2013 12th International Conference on Document Analysis and Recognition

A Radial Neural Convolutional Layer for Multi-Oriented Character Recognition

Hubert Cecotti
University of California, Santa Barbara
Department of Psychological and Brain Sciences
Santa Barbara, CA 93106, USA
Email: [email protected]

Szilárd Vajda
Lister Hill National Center for Biomedical Communications
National Library of Medicine, National Institutes of Health
Bethesda, MD 20894, USA
Email: [email protected]

Abstract—The recognition of fully multi-oriented handwritten characters is a challenging problem. Contrary to univariate signals, where the shift-invariance property of the Fourier transform can be used directly, multivariate signals such as images require special care to extract rotation-invariant features. Several strategies are possible for solving such classification tasks. The proposed method considers input features obtained by the Radon transform or the polar transform. A convolutional neural network is then used for extracting higher-level features. In addition, this classifier includes the Fast Fourier Transform for extracting shift-invariant features at the neural network level. The Radon transform and the convolutional layer process the image at the pixel level, while the Fourier transform and the upper layers of the neural network process rotation-invariant features. The classifier is evaluated on multi-oriented handwritten digits based on the MNIST database (Arabic digits) and on the ISI database (Bangla digits). The average recognition rate for multi-oriented characters is 93.10% for the Arabic digits and 77.01% for the Bangla digits. This neural architecture highlights the interest of the radial convolutional layer for the recognition of multi-oriented shapes.

I. INTRODUCTION

Document analysis and recognition has been one of the application fields in which advanced pattern recognition and neural network techniques have been proposed [1]. While current Optical Character Recognition (OCR) systems can offer a high accuracy, many challenges remain in different applications. Pattern recognition techniques dealing with several constraints, such as invariance to size, position, and rotation, are essential for providing reliable systems in several document analysis applications. These deformations are also a great challenge for classifiers. Indeed, the gap between the accuracy of current techniques and perfect recognition has become rather small for straight characters that are only slightly deformed [2]. Among these constraints, we consider in this paper full multi-orientation (i.e., any possible angle in [0, 2π]), which we distinguish from local multi-orientation (e.g., angles within [−π/8, +π/8]) [3]. At the application level in document analysis, multi-orientation can be an issue for the complete recognition of textual content. First, in modern-day documents, unusual and artistic layouts can be a source of problems for determining the orientation of the text or of isolated characters [4], [5]. In such documents, the lines supporting some characters can be difficult to determine, and the text lines may be curved in various shapes (circle, wave, etc.). Therefore, the characters of a single text line can be multi-oriented. In other documents, the text lines of a single page may not all be parallel.

Second, technical documents, e.g., maps and electrical diagrams, are often composed of symbols or characters that can be multi-oriented [6]. Three main strategies are typically proposed for creating pattern recognition systems that can handle the multi-orientation constraint. The first one consists of removing the multi-orientation constraint: the goal is to transfer the problem back to the case where every image has the same orientation [7]. The second strategy is the most commonly used approach: the goal is to extract a set of descriptors that are invariant to the orientation [8], [9]. In character and symbol recognition, many approaches have been proposed for creating rotation-invariant features; they include methods such as ring projection, counts of structural features (multifork points, strokes, stroke intersections), Fourier descriptors, and Zernike invariant moments [10]–[13]. In [14], rotation-invariant features are based on the angular information of the external and internal border points of the characters. In the third solution, the classifier does not explicitly consider the multi-orientation knowledge: the constraint is directly absorbed by the classifier [15], for instance by feeding it specifically multi-oriented images during training [16]. The proposed technique is a combination of the second and third approaches. Its interest is to combine analytical methods, such as the Radon and Fourier transforms, that aim at solving the multi-orientation problem, while letting the classifier solve the remaining aspects of the classification. In the second approach, the rotation-invariant descriptors are usually given as input to the classifier. The novelty of the proposed method is to embed the extraction of rotation-invariant features inside the classifier. While feature extraction usually precedes the classification step, in our approach features are extracted in a hierarchical manner, both analytically and during the classifier training.

II. INPUT FEATURES

One classical step toward the extraction of rotation-invariant features is to use the polar transform or the Radon transform [17]. This transformation turns a rotation in Cartesian coordinates into a circular shift in polar coordinates. Let us consider an image I(r, θ) in polar coordinates, of size Nr × Nθ, where Nr and Nθ represent the number of radii and angles, respectively. The image is centered on its gravity center. The Radon transform is an efficient shape descriptor that was successfully used for symbol detection [18]. One common solution for extracting circular-shift invariant features is to consider the amplitude of the Fourier transform of each circle defining the image, i.e., of each line 1 ≤ r ≤ Nr of the polar transform. This is an application of the shift-in-time property of the Fourier transform. While this solution allows the creation of rotation-invariant features, one problem may arise: the set of features is also invariant to independent shifts of the different circles; there is no synchronization between the circles. Figure 1 illustrates this problem with a shape that is represented on two disks. With the solution defined previously, based on the amplitude of the Fourier transform, every image of Figure 1 is represented by the same set of features. Therefore, while this feature set is invariant to rotation, it is also invariant to other deformations of the shape, whereas we would prefer to have the same set of features only for the two images on the left.
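To make this input representation concrete, here is a minimal NumPy sketch (our own illustration; the function names and the nearest-neighbor sampling are assumptions, not the authors' implementation) of the polar transform of a centered gray-level image and of the per-ring FFT amplitudes discussed above:

```python
import numpy as np

def polar_transform(img, n_r=15, n_theta=64):
    """Sample a gray-level image on a polar grid centered on its gravity center.

    Nearest-neighbor sampling; a real system may prefer bilinear interpolation.
    """
    ys, xs = np.nonzero(img)
    weights = img[ys, xs].astype(float)
    cy, cx = np.average(ys, weights=weights), np.average(xs, weights=weights)
    r_max = min(cy, cx, img.shape[0] - 1 - cy, img.shape[1] - 1 - cx)
    radii = np.linspace(0, r_max, n_r)
    angles = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    rr, tt = np.meshgrid(radii, angles, indexing="ij")
    yy = np.rint(cy + rr * np.sin(tt)).astype(int)
    xx = np.rint(cx + rr * np.cos(tt)).astype(int)
    return img[yy, xx]  # shape (n_r, n_theta): I(r, theta)

def ring_fft_features(polar_img):
    """Per-ring FFT amplitudes: rotation-invariant, but unsynchronized across rings."""
    return np.abs(np.fft.fft(polar_img, axis=1))
```

A rotation of the original image becomes a circular shift along the angle axis, so the output of ring_fft_features is unchanged; but, as Figure 1 shows, it is equally unchanged when each ring is shifted independently, which motivates the channels introduced next.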

To solve this problem, one solution is to merge the information contained within all the disks at a particular angle into a set of channels, i.e., radial filters, Ci, 1 ≤ i ≤ Ns, where Ns is the number of channels:

C_i(\theta) = g_i\big(I(r_1, \theta), \dots, I(r_{N_r}, \theta)\big)

where g_i is a function that fuses the information of the image at a particular angle θ. In the following, we consider a channel Ci as a linear combination of the pixel values at all radial points:

C_i(\theta) = \sum_{j=1}^{N_r} w_j \cdot I(r_j, \theta)

where 1 ≤ θ ≤ Nθ. An easy way of determining a channel consists of taking the mean of the pixel values at each angle, i.e., wj = 1/Nr for 1 ≤ j ≤ Nr. The Fast Fourier Transform (FFT) can then be applied on each Ci to obtain a set of shift-invariant features. It is worth mentioning that the first value of the FFT corresponds to the mean of the values. If we consider channels defined as:

C_i(\theta) = \begin{cases} I(r_j, \theta) & \text{if } i = j \\ 0 & \text{otherwise} \end{cases}

then the mean value of each circle corresponds to the features obtained by the ring-projection method, which transforms a 2D gray-level image into a rotation-invariant representation in a 1D ring-projection space. In the next section, we consider inputs obtained with the polar transform or the Radon transform, and we propose a new method for finding a set of functions gi automatically through a convolutional neural network, i.e., we propose to let the neural network learn the weights that define the channels.

Fig. 1. Rotation invariant challenge. If we consider rotation invariant features for each of the two disks composing each character, these four characters possess the same set of features.
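Before moving to the network itself, the following sketch (again NumPy; the weight layout and names are our own illustrative assumptions) shows how a set of radial filters turns the polar image into channels, and how the FFT amplitude of each channel yields rotation-invariant features:

```python
import numpy as np

def radial_channels(polar_img, W):
    """Apply Ns radial filters to a polar image.

    polar_img: (n_r, n_theta) array, I(r, theta).
    W: (n_s, n_r) array; row i holds the weights w_j defining channel C_i.
    Returns: (n_s, n_theta) array, C_i(theta) = sum_j W[i, j] * I(r_j, theta).
    """
    return W @ polar_img

def invariant_features(channels, n_f=10):
    """Keep the amplitudes of the n_f lowest frequencies of each channel."""
    return np.abs(np.fft.fft(channels, axis=1))[:, :n_f]

# Special case: W = np.eye(n_r) makes each channel pick one ring, i.e. the
# unsynchronized ring-projection channels described in the text above.
```

Rotating the input image circularly shifts every channel by the same amount, so the amplitudes returned by invariant_features are unchanged; unlike per-ring features, a learned W mixes the rings before the FFT, which restores the synchronization discussed above.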


III. SYSTEM OVERVIEW

The classifier is based on a convolutional neural network. This popular model has been proven efficient for handwritten character recognition [1] and for the classification of EEG (electroencephalogram) signals [19]. It allows automatic feature extraction within its layers while keeping as input the raw information, without specific pre-processing except, for instance, scaling and centering of the input vector. This type of neural network has many advantages when the input data contains an inner structure, as for images, and when invariant features must be discovered [20]. Contrary to the approach presented in [21], which also considers the Fourier transform between two hidden layers, only a subset of relevant frequencies, chosen based on the problem itself, is selected here. For processing multi-oriented characters, the neural network is decomposed into three parts:
• Image processing: the neuron states represent gray-level pixel values in polar coordinates. The first step corresponds to the application of the Ns radial filters of size Nr × 1. Each radial filter corresponds to a function gi defined previously, 1 ≤ i ≤ Ns.
• Transfer from the time domain to the frequency domain: this step transfers the neuron values into the frequency domain. We consider the amplitude of the Nf first frequencies. This set corresponds to a subset of the rotation-invariant features; we assume here that the amplitudes of the high frequencies are less discriminant for the classification.
• Classification: the neuron values have a semantic in the frequency domain; they correspond to amplitude values of the signal. We denote by Nclass the number of classes in the problem. In the next sections, Nclass = 10, corresponding to the classification of digits ([0..9]).

The neural network is composed of five layers. Each layer is composed of one or several maps. We define a map as a layer entity that has a specific semantic: each map of the first and second hidden layers is a channel. The neural topology is presented in Figure 2; a minimal code sketch follows this list. The architecture is described as follows:
• The input layer (L0): I(i, j), with 1 ≤ i ≤ Nr and 1 ≤ j ≤ Nθ. In the experiments, Nr = 15 and Nθ = 64. This layer corresponds to an image in polar coordinates.
• The first hidden layer (L1): it is composed of Ns maps. We define L1Mm as the m-th map. Each map of L1 has Nθ neurons. This layer corresponds to the Ns channels: each map is a projection of the input image onto one dimension (the angles). In L1, the image is represented as Ns vectors.
• The second hidden layer (L2): it is composed of Ns maps. Each map of L2 has Nf neurons, where Nf corresponds to the number of selected frequencies. For each map, the neurons in L2 correspond to shift-invariant features of the corresponding map in L1. Each map in L2 is thus a rotation-invariant description of the input image.
• The third hidden layer (L3): it is composed of one map of 100 neurons. This map is fully connected to the different maps of L2.
• The output layer (L4): this layer has one map of Nclass neurons and is fully connected to L3.

Fig. 2. System overview.
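As a minimal sketch of the forward pass of this topology (our own illustrative code, assuming the hyperbolic tangent at L1 and logistic units at L3/L4 as stated in Section III-A; the weight shapes and names are assumptions):

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(polar_img, W1, b1, W3, b3, W4, b4, n_f=10):
    """Forward pass: radial convolution -> FFT amplitudes -> MLP.

    polar_img: (n_r, n_theta) input I(r, theta).
    W1: (n_s, n_r) radial filters, b1: (n_s,) thresholds   -> layer L1
    W3: (100, n_s * n_f), b3: (100,)                       -> layer L3
    W4: (n_class, 100), b4: (n_class,)                     -> layer L4
    """
    # L1: one value per (map, angle); the weights are shared across angles.
    x1 = np.tanh(W1 @ polar_img + b1[:, None])          # (n_s, n_theta)
    # L2: amplitudes of the n_f first frequencies (no learned weights here);
    # the FFT uses T = 64 points, zero-padding if needed.
    x2 = np.abs(np.fft.fft(x1, n=64, axis=1))[:, :n_f]  # (n_s, n_f)
    # L3 and L4: a standard MLP on the rotation-invariant features.
    x3 = logistic(W3 @ x2.ravel() + b3)                 # (100,)
    return logistic(W4 @ x3 + b4)                       # (n_class,)
```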

A. Propagation

Although the definition of the convolutional neural network (CNN) is classical, the switch from L2 to L3 requires some details. We first define the value of a neuron in layer l, in map m, at position j as x(l,m,j), or x(l,j) when there is only one map in the layer. In the same way, we define σ(l,m,j) as the scalar product between a set of input neurons and the weights on the connections between these neurons and neuron j in map m of layer l: x(l,m,j) = f(σ(l,m,j)), where f is a sigmoid function (hyperbolic tangent for L1, logistic function for L3 and L4) [22].

We define σ(l,m,j) for the four layers. There is no σ(2,m,j) to calculate, as the neurons in L2 do not have any input links: the neuron values of L2 are computed directly from L1, which is a convolutional layer in the radial domain. L2, L3 and L4 can be considered as a multi-layer perceptron (MLP) where L2 is the input layer, L3 the hidden layer, and L4 the output layer. For L1, each neuron of one map shares the same set of weights.

• For L1:

\sigma_{(1,m,j)} = w^{0}_{(1,m)} + \sum_{i=0}^{N_r - 1} I_{i,j} \cdot w_{(1,m,i)} \qquad (1)

where w^{0}_{(1,m)} is a threshold. A set of weights w_{(1,m,i)}, with m fixed and 0 ≤ i < Nr, corresponds to a radial filter. In this layer, there are Nr + 1 weights for each map.

• For L3:

\sigma_{(3,j)} = w^{0}_{(3,j)} + \sum_{i=0}^{N_s - 1} \sum_{k=0}^{N_f - 1} x_{(2,i,k)} \cdot w_{(3,i,k)} \qquad (2)

where w^{0}_{(3,j)} is a threshold. Each neuron of L3 is connected to each neuron of L2; in this layer, each neuron has Ns · Nf + 1 input weights, and L3 contains 100 · (Ns · Nf) input connections.

• For L4:

\sigma_{(4,j)} = w^{0}_{(4,j)} + \sum_{i=0}^{99} x_{(3,i)} \cdot w_{(4,i)} \qquad (3)

where w^{0}_{(4,j)} is a threshold. Each neuron of L4 is connected to each neuron of L3.

The states of the neurons in L2 are not computed by a classical propagation: they are the result of signal processing steps. L2 is obtained from L1 by the Fourier transform followed by a selection of specific amplitudes. The Fourier transform is applied on L1 in order to change from the time domain to the frequency domain. We define the function S, which assigns a frequency to each neuron j of L2. As we select the Nf first frequencies, ym(S(i)) corresponds to the first values of the FFT, 0 ≤ i < Nf.

The value of each neuron of L2 is defined as:

x_{(2,m,j)} = |y_m(S(j))| \qquad (4)

where

y_m(u) = \sum_{v=0}^{T-1} Y_m(v) \cdot e^{-2\pi i \, u v / T} \qquad (5)

is the Fourier transform of Y_m, and Y_m is x_{(1,m)} zero-padded to T = 64 points. The values y_m(u) are not computed only for the u that correspond to S(j), 0 ≤ j < Nf: the phase must be conserved in order to reconstruct the signal in the time domain during the backpropagation. Noting φ_m(u) the phase of the transformed signal,

y_m(u) = |y_m(u)| \cdot e^{i \varphi_m(u)} \qquad (6)
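As a quick numerical check of the circular-shift invariance used in (4)–(5) (an illustrative snippet, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
c = rng.normal(size=64)            # a channel C_i(theta), N_theta = 64
c_rot = np.roll(c, 17)             # rotating the image circularly shifts the channel

a = np.abs(np.fft.fft(c))          # amplitude spectrum of the channel
a_rot = np.abs(np.fft.fft(c_rot))  # amplitude spectrum after the shift

print(np.allclose(a, a_rot))       # True: |FFT| is invariant to circular shifts
```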

B. Backpropagation

The learning algorithm for tuning the weights (w) and thresholds (w0) of the network uses backpropagation, minimizing the least mean square error. At the initialization of the network, the weights and thresholds of each neuron are initialized randomly with a standard distribution around ±1/Ninput, where Ninput is the number of input links of the neuron. For L1, the local gradients are calculated in a different way, as there is no usual learned set of weights between L2 and L1; instead, the relationship has been defined in terms of the FFT. For L2, the local gradient must be transferred back from the frequency domain into the time domain by using the Inverse Fourier Transform, in order to calculate the local gradients of the neurons in L1.

• For L4, the local gradient of each neuron j is defined by:

\delta_{(4,j)} = (o(j) - x_{(4,j)}) \cdot f'(x_{(4,j)}) \qquad (7)

where o(j) is the expected value for neuron j.

• For L3 and L2, the local gradient is defined by:

\delta_{(l,m,j)} = f'(x_{(l,m,j)}) \cdot \sum_{i=0}^{N_{out} - 1} w_{(l+1,m,i)} \cdot \delta_{(l+1,m,i)} \qquad (8)

where Nout is the number of neurons that have neuron n(l,m,j) as input, and the weights w(l+1,m,i) are on the connections between neuron n(l,m,j) and the neurons n(l+1,m,i).

• For L1, we first define:

z_m(v) = \begin{cases} \delta_{(2,m,S^{-1}(v))} \cdot e^{i \varphi_m(v)} & \text{if } S^{-1}(v) \text{ is defined} \\ 0 & \text{otherwise} \end{cases}

The Inverse Fourier Transform is applied on z_m:

Z_m(u) = \sum_{v=0}^{N_\theta - 1} z_m(v) \cdot e^{2\pi i \, u v / N_\theta} \qquad (9)

Only the amplitude of the reconstructed error is used for updating the weights:

\delta_{(1,m,j)} = |Z_m(j)| \qquad (10)

Each weight is updated by Δw(l,m,i):

\Delta w_{(l,m,i)} = \gamma \cdot \delta_{(l+1,m,j)} \cdot x_{(l,m,i)} \qquad (11)

where w(l,m,i) is the weight on the connection between neurons n(l,m,i) and n(l+1,m,j). For training, the learning rate was set to γ = 0.4.
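The gradient transfer between L2 and L1 (equations (8)–(10)) can be sketched as follows; this is our own illustrative reading of the procedure, with assumed shapes and names, and with S taken as the identity mapping on the first Nf frequencies:

```python
import numpy as np

def grad_L2_to_L1(delta2, phases, n_theta=64):
    """Transfer the L2 local gradients back to the time domain (eq. (9)-(10)).

    delta2: (n_s, n_f) local gradients of the selected amplitude neurons;
            frequency u < n_f maps to neuron u (S is assumed to be the identity).
    phases: (n_s, n_theta) phases phi_m(u) saved during the forward FFT.
    Returns delta1: (n_s, n_theta) local gradients for the neurons of L1.
    """
    n_s, n_f = delta2.shape
    # z_m(v): re-attach the conserved phase; zero where S^{-1}(v) is undefined.
    z = np.zeros((n_s, n_theta), dtype=complex)
    z[:, :n_f] = delta2 * np.exp(1j * phases[:, :n_f])
    # Inverse transform (eq. (9)); numpy's ifft includes a 1/n_theta factor
    # that the paper's formula does not, hence the rescaling.
    Z = np.fft.ifft(z, axis=1) * n_theta
    # Eq. (10): only the amplitude of the reconstructed error is kept.
    return np.abs(Z)
```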

IV. DATABASE

The classifiers were tested on the well-known MNIST database [1]. Although this database has been widely used and current state-of-the-art methods reach a recognition rate superior to 99.5% on the original database, i.e., straight characters, it remains interesting. While the initial database still poses a challenge for reaching 100% accuracy, recent studies have extended it by adding distortions and rotations, and by changing the background [16]. In the extended version of MNIST described in [16], the digits were rotated by an angle generated uniformly between 0 and 2π. The goal of such an experiment is to recognize digits under conditions typical of present-day documents. As the proposed classifier is fully invariant to orientation, and not just tolerant to different orientations, it is not necessary to test it on an extended version of MNIST with rotated images: since the initial images are centered on their gravity center, all the steps described in the previous section are invariant to rotation, so different orientations of the database yield the same classifier outputs. The classifier was also tested on Bangla digits from the ISI database [23], created at the Indian Statistical Institute, Kolkata, India. Bangla is the fourth most popular script in the world, used by more than 200 million people [24]. The number of samples per class is smaller in the ISI database than in MNIST. As for Arabic handwritten digits, current systems for the recognition of straight Bangla digits offer very good performance [25]. The MNIST (resp. ISI) database contains 60000 and 10000 (resp. 7938 and 4997) images for training and testing, respectively.

V. RESULTS

Table I presents the recognition rates of the classifier for each class and for the whole database, with input features obtained with the polar and Radon transforms, respectively. 90% of the training database is used for the effective training; the remaining 10% is used for the validation of the network. Several neural network topologies were tested and compared in relation to the number of channels Ns and the number of rotation-invariant features Nf. The actual rotation-invariant features correspond to the neurons in L2; they combine the number of maps Ns and the number of selected amplitudes Nf, so for each topology the number of neurons in L2 is Ns · Nf. The results were identical when the classifier was tested on MNIST and on MNIST with different orientations (every π/4), confirming that the extracted features are indeed invariant to rotation. Hence, the results on the classical database are the same as on the database with oriented images. For both databases, the recognition rate is better with Ns = 30 and Nf = 10. This suggests that the low frequencies are enough for providing relevant rotation-invariant features, but that it is also better to keep a high number of maps. The average recognition rate for the Arabic (respectively, Bangla) digits is 91.68% (respectively, 77.25%) with the polar transform as input. With the Radon transform, the average recognition rate for the Arabic (respectively, Bangla) digits is 93.10% (respectively, 77.01%). The Radon transform thus gives better accuracy on the Arabic digits, while the polar transform gives slightly better results on the Bangla digits.

The classifier was also evaluated without L1, to highlight the relevance of the convolutional layer. In this new topology, the input layer corresponds to L2 and a standard MLP is used for the classification. To stay consistent with the topology of the proposed classifier, the hidden layer has 100 neurons and the output layer has 10 neurons, like L3 and L4. We consider inputs of size Nr × Nf, where Nr = 15 and Nf = 10. When the FFT is applied on the polar transform, the mean accuracy of the MLP classifier on the MNIST and ISI databases is 85.19% and 66.36%, respectively. When the FFT is applied on the Radon transform, the mean accuracy of the MLP classifier on the MNIST and ISI databases is 76.34% and 62.86%, respectively. For this classifier, the polar transform provides a better input than the Radon transform. The recognition rate is given for each class and each database in Table I. Compared to the proposed method, the MLP without the convolutional layer, i.e., without the radial filters, provides worse recognition rates on both databases. These results were expected, as the different 'rings' describing an image are not synchronized, as explained in Section II. Furthermore, some input features are similar: the rings near the gravity center of the image contain information that is spatially closer than the rings far from the center.

Other systems aim at learning rotation-invariant or rotation-tolerant features through the model by using multi-oriented characters. In [16], several classifiers (Deep Belief Network, Stacked Auto-associators, MLP and SVM) were evaluated and compared on a multi-oriented version of MNIST. The best results were obtained with an SVM with a radial basis function (RBF) kernel, with a recognition rate of 89.62%. In [16], the distribution of the samples between training, validation and test databases was different from the presented experiments. To compare the results in similar conditions (based on the number of samples that were used), we considered the topology (Ns = 30, Nf = 10) with the Radon transform as pre-processing for the input features. For this evaluation, 10000, 2000 and 50000 samples were used for training, validation and test, respectively. The obtained recognition rate was 88.80%. Among the neural classifiers tested in [16], the best recognition rate, 88.57%, was obtained with the Stacked Auto-associators method. The proposed strategy differs from the other methods in that it is fully invariant to rotation and considers rotation-invariant features throughout the classification steps.

TABLE I. RECOGNITION RATE (IN %) FOR EACH CLASS, DATABASE, AND METHOD.

                Polar+CNN                        Radon+CNN                         MLP
           MNIST           ISI              MNIST           ISI              MNIST          ISI
Data   (10,30) (30,10) (10,30) (30,10)  (10,30) (30,10) (10,30) (30,10)  Polar  Radon  Polar  Radon
0        95.51   97.76   94.99   96.19    96.43   98.47   94.59   95.79  94.69  95.00  88.38  90.58
1        98.41   98.77   56.31   65.93    97.97   98.68   52.91   69.74  98.06  96.39  46.49  43.09
2        73.93   80.23   76.00   81.40    74.52   86.53   72.20   78.60  71.03  48.64  70.80  69.40
3        87.52   91.49   64.80   75.80    82.87   92.77   79.40   82.40  90.20  85.34  71.80  67.60
4        90.73   93.38   84.20   87.40    90.84   93.69   83.40   83.00  89.10  75.97  81.60  68.20
5        71.19   84.30   61.00   75.80    81.28   86.77   63.20   71.00  68.95  51.91  39.60  46.40
6        78.81   91.54   57.20   60.80    87.27   94.36   60.00   68.00  77.14  68.16  61.00  55.40
7        87.94   90.08   69.34   77.56    85.70   93.97   74.35   77.35  88.62  87.84  70.94  72.14
8        88.91   96.00   79.40   86.40    86.45   92.81   89.60   89.20  88.40  76.18  82.60  80.40
9        86.92   92.17   58.40   65.20    87.91   91.87   46.40   55.00  82.56  72.65  50.40  35.40
Total    86.28   91.68   70.16   77.25    87.26   93.10   71.60   77.01  86.28  76.34  66.36  62.86

VI. CONCLUSION

A new solution for the recognition of multi-oriented shapes has been presented and successfully applied to handwritten character recognition. The goal was to provide a hybrid approach where features are extracted with analytical methods when possible, and with a learning strategy through the neural network otherwise. One interest is not to let the network learn everything, but to let it focus on the procedures that should be tuned in relation to the problem, i.e., the radial filters and the classification. The system takes as input images on which the Radon transform is applied. These descriptors are then processed by a convolutional neural network. In addition, this network includes the Fourier transform within its layers for extracting fully invariant features. The convolutional layer that extracts radial filters increased the accuracy compared to the classical approach that applies the Fourier transform directly to the inputs, in polar coordinates or after the Radon transform.

REFERENCES

[1] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[2] K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun, "What is the best multi-stage architecture for object recognition?" in Proc. International Conference on Computer Vision (ICCV'09), IEEE, 2009, pp. 2146–2153.
[3] P. Y. Simard, Y. LeCun, J. S. Denker, and B. Victorri, "Transformation invariance in pattern recognition – tangent distance and tangent propagation," International Journal of Imaging Systems and Technology, vol. 11, no. 3, pp. 181–197, 2001.
[4] U. Pal, P. P. Roy, N. Tripathy, and J. Lladós, "Multi-oriented Bangla and Devnagari text recognition," Pattern Recognition, vol. 43, no. 12, pp. 4124–4136, 2010.
[5] P. P. Roy, U. Pal, J. Lladós, and M. Delalandre, "Multi-oriented touching text character segmentation in graphical documents using dynamic programming," Pattern Recognition, vol. 45, no. 5, pp. 1972–1983, 2012.
[6] J. M. Ogier, C. Cariou, R. Mullot, J. Gardes, and Y. Lecourtier, "Interpretation of technical document: Application to French telephonic network," Proceedings of ISAS, pp. 457–463, 1998.
[7] H. Hase, M. Yoneda, T. Shinokawa, and C. Suen, "Alignment of free layout color texts for character recognition," Proc. of the 6th International Conference on Document Analysis and Recognition (ICDAR), pp. 932–936, 2001.
[8] S. X. Liao and M. Pawlak, "On image analysis by moments," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 18, pp. 254–266, 1996.
[9] S. Adam, J. M. Ogier, C. Carlon, R. Mullot, J. Labiche, and J. Gardes, "Symbol and character recognition: application to engineering drawings," Int. Journal of Document Analysis and Recognition, vol. 3, pp. 89–101, 2000.
[10] A. Khotanzad and Y. H. Hong, "Invariant image recognition by Zernike moments," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 12, no. 5, pp. 489–497, 1990.
[11] D. M. Tsai and H. C. Chiang, "Rotation-invariant pattern matching using wavelet decomposition," Pattern Recognition Letters, vol. 23, pp. 191–201, 2002.
[12] W.-H. Wong, W.-C. Siu, and K.-M. Lam, "Generation of moment invariants and their uses for character recognition," Pattern Recognition Letters, vol. 16, no. 2, pp. 115–123, 1995.
[13] T. N. Yang and S. D. Wang, "A rotation invariant printed Chinese character recognition system," Pattern Recognition Letters, vol. 22, no. 2, pp. 85–95, 2001.
[14] U. Pal, F. Kimura, K. Roy, and T. Pal, "Recognition of English multi-oriented characters," Proc. of the International Conference on Pattern Recognition, pp. 873–876, 2006.
[15] M. Fukumi, S. Omatu, T. Takeda, and T. Kosaka, "Rotation invariant neural pattern recognition system with application to coin recognition," IEEE Trans. Neural Networks, vol. 3, pp. 272–279, 1992.
[16] H. Larochelle, D. Erhan, A. Courville, J. Bergstra, and Y. Bengio, "An empirical evaluation of deep architectures on problems with many factors of variation," Proc. of the International Conference on Machine Learning, 2007.
[17] S. R. Deans, The Radon Transform and Some of Its Applications. Dover Publications, 2007.
[18] S. Tabbone, O. Ramos-Terrades, and S. Barrat, "Histogram of Radon transform: a useful descriptor for shape retrieval," Proc. of the International Conference on Pattern Recognition, 2008.
[19] H. Cecotti and A. Gräser, "Convolutional neural networks for P300 detection with application to brain-computer interfaces," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 33, no. 3, pp. 433–445, 2011.
[20] Y. Bengio and Y. LeCun, "Scaling learning algorithms towards AI," in L. Bottou, O. Chapelle, D. DeCoste, and J. Weston (Eds.), Large-Scale Kernel Machines, MIT Press, 2007.
[21] H. Cecotti, "A time-frequency convolutional neural network for the offline classification of steady-state visual evoked potential responses," Pattern Recognition Letters, vol. 32, no. 8, pp. 1145–1153, 2011.
[22] Y. LeCun, L. Bottou, G. Orr, and K.-R. Müller, "Efficient backprop," in Neural Networks: Tricks of the Trade (G. Orr and K. Müller, Eds.), 1998.
[23] U. Bhattacharya and B. B. Chaudhuri, "Databases for research on recognition of handwritten characters of Indian scripts," Proc. of the 8th International Conference on Document Analysis and Recognition (ICDAR'05), pp. 789–793, 2005.
[24] B. B. Chaudhuri, "A complete handwritten numeral database of Bangla – a major Indic script," Proc. of the 10th International Workshop on Frontiers in Handwriting Recognition (IWFHR'10), 2006.
[25] U. Bhattacharya and B. B. Chaudhuri, "Handwritten numeral databases of Indian scripts and multistage recognition of mixed numerals," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 31, no. 3, pp. 444–457, 2009.
