

2010 12th International Conference on Frontiers in Handwriting Recognition

Online Bangla Word Recognition Using Sub-Stroke Level Features and Hidden Markov Models

Gernot A. Fink, Szilárd Vajda
TU Dortmund University, 44221 Dortmund, Germany
{Gernot.Fink,Szilard.Vajda}@udo.edu

Ujjwal Bhattacharya, Swapan K. Parui, Bidyut B. Chaudhuri
Indian Statistical Institute, Kolkata 700108, India
{ujjwal,swapan,bbc}@isical.ac.in

978-0-7695-4221-8/10 $26.00 © 2010 IEEE. DOI 10.1109/ICFHR.2010.68

Abstract: Only a few studies on the automatic recognition of Bangla script are reported in the literature, which stands in contrast to the role of Bangla as one of the world's major scripts. In this paper we present a new approach to online Bangla handwriting recognition, one of the first to consider cursively written words instead of isolated characters. Our method uses a sub-stroke level feature representation of the script and a writing model based on hidden Markov models. Since an appropriate internal structure is crucial for the latter, we investigate different approaches to defining model structures for a highly compositional script like Bangla. In experimental evaluations on a writer-independent Bangla word recognition task we show that the use of context-dependent sub-word units achieves quite promising results and significantly outperforms alternatively structured models.

Keywords: online handwriting recognition; Bangla script; sub-stroke level features; hidden Markov models

I. INTRODUCTION

With the availability of electronic tablets at a cost affordable to ordinary users in India, online handwriting recognition for Indian scripts has gained considerable significance. Moreover, for both standard and portable miniature computing devices (such as palmtops, PDAs, and mobile phones) equipped with pen-based input technology, non-keyboard methods of data entry are particularly important for Indian scripts, which have large symbol sets.

A substantial body of research on online unconstrained handwriting recognition can be found in the literature (cf., e.g., [1]–[3]). However, the same is not true for Indian scripts. Most of the available work on online handwriting recognition for Indian scripts deals with isolated character symbols only [4]–[7]. To the best of our knowledge, there exists only one prior work [8] in the literature that deals with unconstrained cursive online handwriting in Bangla. This is despite the fact that Bangla is the second most popular language and script in the Indian sub-continent and the fifth most popular in the world. It is also the official language and script of Bangladesh, and is used by more than 240 million people around the world.

Automatic recognition systems for unconstrained handwriting can be broadly categorized with respect to the degree of segmentation performed prior to recognition (cf. [9]). In analytic approaches an input word is generally segmented into smaller components such as pseudo-characters, graphemes, or allographs (cf., e.g., [10]). These smaller components are classified first before final recognition of the input word is performed. In contrast, no segmentation is performed in holistic approaches and input words are classified directly (cf. [11]). Consequently, holistic methods can only be applied to tasks with a limited lexicon, which, however, is quite common in automatic handwriting recognition.

Recognition methods based on Hidden Markov Models (HMMs) can be considered hybrids between analytic and holistic techniques, as they follow the analysis-by-synthesis approach. This means that an HMM-based writing model can be constructed analytically from smaller modeling units (e.g. characters). However, a segmentation into these units is not required prior to recognition. Rather, segmentation and classification are performed in an integrated scheme, which led to the term segmentation-free being used for HMM-based approaches (cf. [12]).

In this paper we present an HMM-based approach to online Bangla word recognition. It is based on a sub-stroke level feature representation of the script. For building the writing model we considered different model structures, as this is a crucial aspect of HMM-based methods. Especially for a script like Bangla, which has been studied rather little up to now, this choice is far from obvious. We will show that the use of appropriate sub-word units for constructing the writing model is important and that context-dependent sub-word units significantly outperform context-independent ones on our writer-independent Bangla word recognition task.

II. RELATED WORK

For online handwriting recognition many different approaches have been proposed in the literature and a wide variety of pattern recognition methods have been applied (cf., e.g., [2]). Many successful methods are based on HMMs (cf., e.g., [13], [14]), a technique also widely used in the field of offline handwriting recognition (cf. [15]). The success of all recognition methods relies heavily on appropriate pre-processing of the script and the computation of a powerful feature representation. For online handwriting recognition, pre-processing usually comprises normalization of slant, baseline orientation, and writing speed (cf. [16]). The computed feature representations try to capture local shape properties of the pen trajectory together with dynamic information about the writing process (cf., e.g., [17]). The context considered within the pen trajectory can either be defined as a sliding window of fixed size or by pre-segmenting the pen trajectory into stroke-like segments (cf. [18]).

The earliest available work on segmentation of handwritten cursive Bangla words [19] considered offline word images and proposed a recursive contour following approach. In [20], a water reservoir technique was used for segmentation of handwritten Bangla word images, where the "water reservoirs" are the cavities between the different letter segments. The features derived from this notion include the centroid and the size of each reservoir. A fuzzy feature based segmentation technique for Bangla word images was proposed in [21]. In another related work [22], segmentation of touching characters of printed Bangla and Devanagari scripts was considered.

Reports on online isolated Bangla character and numeral recognition are found in [5]–[7]. In an early work on online Bangla handwriting recognition [6], isolated basic characters of Bangla were considered. In [7], an HMM was used for recognition of online handwritten isolated Bangla numerals. A direction code based feature vector was used in [5] for recognition of online handwritten Bangla basic characters. A similar attempt was made in [4] for recognizing online handwritten Tamil characters. Instead of following the natural course of writing, the authors first match the different strokes using a flexible string matching algorithm and then consider a horizontal block to reconstruct the character. As stated in [4], while this horizontal block is obvious in Tamil, for other Indian scripts the grouping of strokes becomes more complicated.

STEP 1: $i = 0$; $j = 1$;
STEP 2: Compute $\theta = \mathrm{angle}(P_i, P_j)$;
STEP 3: If ($j < N + 1$) compute $\alpha_j = \mathrm{angle}(P_i, P_j)$; $\beta_j = \mathrm{angle}(P_{j-1}, P_j)$; else go to STEP 6
STEP 4: If ($\min\{|\theta - \alpha_j|, 360^\circ - |\theta - \alpha_j|\} < 90^\circ$ and $\min\{|\beta_{j-1} - \beta_j|, 360^\circ - |\beta_{j-1} - \beta_j|\} < 90^\circ$) $j = j + 1$; go to STEP 3
STEP 5: Segment the stroke at $P_{j-1}$; $i = j$; $j = j + 1$; go to STEP 2
STEP 6: STOP

Figure 1. Stroke segmentation algorithm
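The steps of Fig. 1 can be sketched in Python as follows. This is only an illustrative reading of the figure, not the authors' implementation: the function names are hypothetical, `angle` is assumed to be the direction of the vector from one point to the next measured from the positive x-axis, and the undefined value $\beta_{j-1}$ for the first segment of a new sub-stroke is assumed to pass the test trivially.

```python
import math


def angle(p, q):
    """Direction of the vector p -> q in degrees, in [0, 360) (assumed meaning of angle())."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 360.0


def circular_diff(a, b):
    """min{|a - b|, 360 - |a - b|} for two angles given in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def segment_stroke(points):
    """Return the indices j-1 at which the stroke P_0..P_N is cut (STEP 5 of Fig. 1)."""
    n = len(points) - 1              # the stroke has N + 1 points
    cuts = []
    i, j = 0, 1                      # STEP 1
    while j <= n:
        theta = angle(points[i], points[j])            # STEP 2
        while j <= n:                                  # STEP 3
            alpha = angle(points[i], points[j])
            beta = angle(points[j - 1], points[j])
            # Assumption: beta_{j-1} does not exist for the first segment
            # after P_i, so the second condition is treated as satisfied.
            beta_prev = angle(points[j - 2], points[j - 1]) if j - 1 > i else beta
            if circular_diff(theta, alpha) < 90.0 and circular_diff(beta_prev, beta) < 90.0:
                j += 1                                 # STEP 4
            else:
                break
        if j > n:                                      # STEP 6
            break
        cuts.append(j - 1)                             # STEP 5: segment at P_{j-1}
        i, j = j, j + 1                                # i = j, as written in Fig. 1
    return cuts
```

A stroke that runs to the right and then turns back sharply, such as `[(0, 0), (1, 0), (2, 0), (3, 0), (2, 1), (1, 2), (0, 3)]`, is cut once at the turning point, whereas a straight stroke is left unsegmented.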

III. ONLINE BANGLA WORD RECOGNITION

For online Bangla word recognition we adopt an HMM-based approach. It works on a sub-stroke level feature representation of the script that will be explained in the following. Afterwards, we will describe different approaches to building HMM-based writing models for Bangla script, which differ in the internal structure imposed and the basic modeling units used.

A. Preprocessing and Sub-Stroke Segmentation

Extraction of sub-strokes from an input sample is preceded by a sequence of preprocessing operations, which include size normalization, smoothing, and re-sampling. We normalize the height of the input word to 100 points and keep its aspect ratio the same as the original one. Then we apply a moving-average method to smooth the sample. Finally, the sample is re-sampled so that the distance along the curve between two consecutive re-sampled points is 7.0.

In the present approach, we divide each stroke of the preprocessed word sample into several sub-strokes as in [23]. Whenever a stroke has a complex arc pattern, it is divided into sub-strokes of simpler shapes. The details are described below. Suppose the sequence of points in a stroke is $P_0, P_1, P_2, \ldots, P_N$, where $N + 1$ is the total number of points in the stroke. The algorithm describing the segmentation process is given in Fig. 1.

We compute the length of each sub-stroke obtained by segmenting a stroke. If the length of such a sub-stroke is less than 5% of the height of the pre-processed sample, it is not considered for further processing. Since we normalize the height of the input sample to 100, each sub-stroke of length less than 5 is discarded. An example of the sub-strokes obtained for an input sample is shown in Fig. 2.

Figure 2. Sub-strokes obtained for a sample word

B. Feature Extraction

From each sub-stroke $S$, 8 scalar feature values representing its shape, size, and relative position are computed. Let the sub-stroke $S$ consist of the sequence of $n + 1$ points $P_0, P_1, P_2, \ldots, P_n$, let $l_i$ be the length of the line segment joining $P_{i-1}$ and $P_i$, and let $(x_i, y_i)$ be the coordinates of $P_i$. The sub-stroke is thus represented as a polyline, i.e. a sequence of straight line segments. The length of the sub-stroke $S$ is given by $l = \sum_{k=1}^{n} l_k$. We next find six equidistant points $Q_0, Q_1, \ldots, Q_5$ on $S$ such that $Q_0 = P_0$, $Q_5 = P_n$, and the distance between $Q_{i-1}$ and $Q_i$ along the polyline is $l/5$ for each $i = 1, 2, \ldots, 5$. Let the angle made with the positive x-axis while moving from $Q_{i-1}$ to $Q_i$ be $\theta_i$, $i = 1, 2, \ldots, 5$ ($0^\circ \le \theta_i < 360^\circ$). Let $W = \max_i x_i - \min_i x_i$ be the width of the input handwritten sample. The normalized coordinates of its center of gravity $(\bar{x}, \bar{y})$ are $X = \bar{x}/W$ and $Y = \bar{y}/100$. The 8-dimensional feature vector for the sub-stroke $S$ is then defined as $F = (\theta_1, \ldots, \theta_5, X, Y, l)$.

C. Writing Model

The use of HMMs for pattern recognition offers the important advantage of solving the problems of segmentation and classification within a unified framework (cf. [12]). For the problem of word recognition this means that word models can be constructed from basic units, e.g. characters, without requiring the explicit segmentation of the appropriate units during the recognition process. In contrast to a holistic recognition method, which would in principle be feasible for a word recognition problem with a limited lexicon, this sharing of parameters within a more complex model structure greatly improves the robustness of the parameter estimates obtained during training.

In contrast to Roman script, the choice of basic modeling units is not obvious for Bangla. The script is built around a set of basic character shapes (Fig. 3). From these, two types of more complex characters can be constructed. By adding a diacritic modifier character to a basic shape, the phonetic properties of the basic character can be altered or augmented (Fig. 5). In addition, multiple characters may be merged into compound characters with quite complicated structure (Fig. 4). These compound characters can again be complemented by adding modifiers.

Figure 3. Bangla Script: Basic character shapes

Figure 4. Bangla Script: Compound characters

Therefore, there are two extreme approaches to defining basic units of writing models for Bangla script. First, the structure of the character shapes can be decomposed to its full extent into elementary units, leaving only basic characters and modifiers as elementary models. Second, the compositional structure of the script can be completely preserved. Such a model uses basic units that correspond to all potential character shapes that can be built by grouping basic characters into compounds and adding modifier characters to basic or compound character shapes.

The first approach has the obvious advantage of using only a rather small inventory of basic units. Therefore, it can be expected that sufficient samples for all basic shapes will be available in the training data and that robust parameter estimates for the associated models can be obtained. However, such an approach is unable to capture mutual contextual influences between basic shapes that are combined into a more complex character shape. Such local context is more adequately described by the second approach. However, the inventory of basic models will then be quite large. Consequently, several of the basic units considered will be observed only rarely in the training data, which will lead to poor parameter estimates for the associated models.

This dilemma, namely that capturing contextual influences requires larger modeling units, which in turn deteriorates the parameter estimates, can partly be resolved by using context-dependent sub-word units, a technique that was invented and pioneered in the field of automatic speech recognition. There is evidence in the literature that the technique is also beneficial for online handwriting recognition tasks [13], [14] but offers little advantage for offline recognition [24].

For the definition of context-dependent sub-word units for Bangla we started from the fully decomposed model (see footnote 1). Units were defined by including one symbol of context to the left and right of every elementary character unit. Introducing context dependence considerably enlarges the total number of sub-word units. Therefore, appropriate techniques have to be applied that group similar units so that they share parameters within the model and can therefore be trained robustly. As in our work on context-dependent modeling for offline recognition, we apply a greedy clustering approach to the intermediate state space defined by the overall writing model without any grouping applied [24]. Compared to alternative methods this technique has the advantage of determining the parameter sets to be shared within a complex model in a purely data-driven manner that does not require any expert knowledge.
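The construction of context-dependent units in Section III-C, where every elementary unit is combined with one symbol of left and one symbol of right context, can be illustrated with a short Python sketch. Everything concrete here is an assumption made for illustration: the function names, the `left/unit/right` label format, the boundary symbol `#` used at word edges, and the romanized unit names in the example are hypothetical and not taken from the paper, which does not state how word boundaries are handled.

```python
def context_dependent_units(units, boundary="#"):
    """Expand elementary units into 'left/unit/right' labels (hypothetical label format).

    `boundary` is an assumed placeholder context for the first and last unit of a word.
    """
    padded = [boundary] + list(units) + [boundary]
    return [
        f"{padded[k - 1]}/{padded[k]}/{padded[k + 1]}"
        for k in range(1, len(padded) - 1)
    ]


def unit_inventory(lexicon):
    """Set of distinct context-dependent units needed for a lexicon of unit sequences."""
    inventory = set()
    for word_units in lexicon:
        inventory.update(context_dependent_units(word_units))
    return inventory


# Hypothetical romanized unit sequences for two lexicon entries.
lexicon = [["ka", "aa", "la"], ["ka", "aa", "na"]]
inventory = unit_inventory(lexicon)
```

Because only the contexts that actually occur in the lexicon produce units, the inventory stays far smaller than the number of all conceivable left/unit/right combinations. The two example entries above share the unit `#/ka/aa` and yield five distinct units in total.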

1 In order to reflect the special temporal characteristic of some modifiers that surround the modified character in the appearance of the script (see, e.g., modifier au in Fig. 5), we split up the respective models into two basic components.
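The feature computation of Section III-B can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the center of gravity is assumed to be the arithmetic mean of the sub-stroke's points (the text does not define it further), and the width $W$ is passed in as a parameter because it is a property of the whole input sample.

```python
import math


def point_at(points, cum, s):
    """Point at arc length s along the polyline, found by linear interpolation."""
    last = len(points) - 1
    for k in range(1, len(points)):
        if s <= cum[k] or k == last:
            seg = cum[k] - cum[k - 1]
            t = 0.0 if seg == 0 else (s - cum[k - 1]) / seg
            t = min(max(t, 0.0), 1.0)
            (x0, y0), (x1, y1) = points[k - 1], points[k]
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))


def substroke_features(points, sample_width, sample_height=100.0):
    """Return F = (theta_1..theta_5, X, Y, l) for one sub-stroke."""
    if len(points) < 2:
        raise ValueError("a sub-stroke needs at least two points")
    # Cumulative arc length; cum[-1] is l, the sum of all segment lengths l_k.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    length = cum[-1]
    # Six equidistant points Q_0..Q_5, spaced l/5 apart along the polyline.
    q = [point_at(points, cum, length * i / 5) for i in range(6)]
    thetas = [
        math.degrees(math.atan2(b[1] - a[1], b[0] - a[0])) % 360.0
        for a, b in zip(q, q[1:])
    ]
    # Assumption: center of gravity = mean of the sub-stroke's points.
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return thetas + [cx / sample_width, cy / sample_height, length]
```

For an L-shaped sub-stroke such as `[(0, 0), (5, 0), (5, 5)]`, the first two angles are 0°, the third is 45° because the connecting chord cuts the corner, and the last two are 90°.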


Figure 5. Bangla Script: Vowel modifiers (left) and consonant modifiers (right)

IV. EVALUATION

In order to evaluate our approach to online Bangla word recognition we performed experiments on a writer-independent recognition task.

A. Dataset of Online Bangla Handwritten Words

We used a database of 14,073 unconstrained online handwritten Bangla samples of 50 Indian city names. These samples were written by 163 writers, each of whom wrote each of the 50 city names one to four times. A few samples written by these writers are not included in the database because the signal was not captured properly by the data capturing devices. We used three different devices for collecting the samples: a Wacom Intuos 2, a Genius G-note 7000, and an HP tc4400 tablet PC. The writers come from different sections of the population with Bangla as their mother tongue. No restrictions were imposed on the writing style. Every writer wrote several words per page, which were automatically segmented into individual words. Each segmented word sample is stored in UNIPEN format together with additional information about the writer such as age, sex, profession, writing skill, and writing habit. A few samples from this database are shown in Fig. 6.

On this database a writer-independent recognition task was defined by splitting the data into disjoint training and test sets. In this task, data from 87 writers, who provided 7,557 samples of handwritten city names in total, are available for model training. Though all the training data is annotated on word level, detailed manual annotations on sub-character level are available for only a tiny fraction of this data (97 samples, i.e. approximately two samples per city name considered). The independent test set consists of data provided by the remaining 76 writers (6,516 word samples in total).

Figure 6. A few word samples from the database used in the present work

B. Recognition System

Our online Bangla word recognition system was built using the open-source development environment ESMERALDA [25]. It is based on the sub-stroke level feature representation described in Section III-B and uses semi-continuous HMMs, just like our previous offline handwriting recognition systems (cf., e.g., [26]).

We compared four different structures of the writing model: the three model variants discussed in Section III-C and a holistic model. In the latter, a single HMM without any further internal structure corresponds directly to each city name in the lexicon. All the other models apply the analysis-by-synthesis paradigm and, consequently, are based on the use of sub-word units. The most compact model is obtained by using only the most elementary pseudo-characters that define Bangla writing, i.e. basic character shapes and modifiers. For our task we obtain 52 basic units (39 basic characters, 9 atomic modifiers, and 2 discontinuous modifiers with 2 partial models each). A considerably larger writing model results from using all combinations of basic characters and modifiers in potential combined character shapes. We obtain 93 basic units in total for the lexicon considered. The largest potential model results from considering each pseudo-character in the context of its left and right neighbor as a distinct unit. However, due to limited variations in the actual contexts present, only 345 context-dependent units are required for our task.

All elementary models have Bakis topology (i.e. the skipping of a single state in a linear sequence is allowed) and share a codebook of 1k Gaussians with diagonal covariance matrices. The number of model states is determined automatically depending on the length of the respective unit in the initialization data.

The parameters of the combined writing model were estimated in two phases. For initialization, a codebook was estimated on the training data by applying the k-means algorithm. Using this set of densities, the remaining HMM parameters were initialized on the small set of manually labelled data. Then joint re-estimation of the model parameters was performed by applying standard Baum-Welch training (20 iterations). Using this first model, a forced alignment of the complete set of training data was computed. In the second phase of model training, the above procedure was repeated using the large set of labelled data for initialization purposes. For estimating context-dependent units, the greedy state clustering in the virtual state space of 1,714 states was performed after the second phase of model training was completed.

C. Experimental Results

The recognition results obtained in our writer-independent online Bangla word recognition task are summarized in Tab. I. Together with the actual recognition rates achieved, the model complexity is given by specifying the number of basic units and the number of independent HMM states within the overall model.

Table I. Recognition results achieved for differently structured writing models

| Writing model | Complexity: units | Complexity: indep. states | Rec. rate [%] |
|---|---|---|---|
| holistic | 50 | 1,645 | 88.0 |
| combined characters | 93 | 876 | 88.5 |
| pseudo-characters | 52 | 286 | 91.0 |
| context-dependent | 293 | 362 (1,714) | 93.1 |

As expected, the worst results are achieved by the holistic writing model, which treats all word samples separately. Interestingly, the model using all 93 potential combined character shapes as basic units performs only slightly (but not significantly) better. The best result for context-independent sub-word units is achieved when only the basic character shapes and the modifiers are considered as basic pseudo-character units. However, by differentiating these pseudo-character units depending on their left and right context and thus defining context-dependent sub-word units, a further significant improvement can be obtained and a quite satisfactory recognition rate of more than 93% is achieved.

V. CONCLUSION

In this paper we presented a new approach to online Bangla handwritten word recognition. It is based on hidden Markov models, which can be considered the major modeling paradigm used for both online and offline word recognition. Our approach combines a sub-stroke level feature representation of Bangla script and an HMM-based writing model using context-dependent sub-word units. In experimental evaluations on a writer-independent online Bangla handwriting recognition task, this model achieved quite promising results and significantly outperformed models using alternative sub-word unit definitions such as pseudo-character models.

ACKNOWLEDGMENTS

Part of this work was supported by the German Research Foundation (DFG) within project Fi799/3. The authors thankfully acknowledge Mr. Tanmay Mondal, CVPR Unit, Indian Statistical Institute, Kolkata, for his unconditional help in the preparation of the database and the development of C code for feature computation.

REFERENCES

[1] J. A. Pittman, “Handwriting recognition: Tablet PC text input,” IEEE Computer, vol. 40, no. 9, pp. 49–54, 2007.

[2] R. Plamondon and S. N. Srihari, “On-line and off-line handwriting recognition: A comprehensive survey,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 63–83, 2000.

[3] C. C. Tappert, C. Y. Suen, and T. Wakahara, “The state of the art in online handwriting recognition,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 12, no. 8, pp. 787–808, 1990.

[4] K. H. Aparna, V. Subramanian, M. Kasirajan, G. V. Prakash, V. S. Chakravarthy, and S. Madhvanath, “Online handwriting recognition for Tamil,” in Proc. Int. Workshop on Frontiers in Handwriting Recognition, Tokyo, 2004, pp. 438–443.

[5] U. Bhattacharya, B. K. Gupta, and S. K. Parui, “Direction code based features for recognition of online handwritten characters of Bangla,” in Proc. Int. Conf. on Document Analysis and Recognition, vol. 1, 2007, pp. 58–62.

[6] U. Garain, B. B. Chaudhuri, and T. Pal, “Online handwritten Indian script recognition: a human motor function based framework,” in Proc. Int. Conf. on Pattern Recognition, 2002, pp. 164–167.

[7] S. K. Parui, U. Bhattacharya, B. Shaw, and K. Guin, “A hidden Markov model for recognition of online handwritten Bangla numerals,” in Proc. of the 41st National Annual Convention of CSI, 2006, pp. 27–31.

[8] U. Bhattacharya, A. Nigam, Y. S. Rawat, and S. K. Parui, “An analytic scheme for online handwritten Bangla cursive word recognition,” in Proc. Int. Conf. on Frontiers in Handwriting Recognition, 2008, pp. 320–325.

[9] N. Arica and F. Yarman-Vural, “An overview of character recognition focused on off-line handwriting,” IEEE Trans. Systems, Man, and Cybernetics-Part C: Applications and Rev., vol. 31, no. 2, pp. 216–233, 2001.

[10] A. Koerich, R. Sabourin, and C. Suen, “Large vocabulary offline handwriting recognition: A survey,” Pattern Anal. Appl., vol. 6, no. 2, pp. 97–121, 2003.

[11] S. Madhvanath and V. Govindaraju, “The role of holistic paradigms in handwritten word recognition,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 23, pp. 149–164, 2001.

[12] G. A. Fink, Markov Models for Pattern Recognition. Berlin Heidelberg: Springer, 2008.

[13] G. Rigoll, A. Kosmala, and D. Willett, “An investigation of context-dependent and hybrid modeling techniques for very large vocabulary on-line cursive handwriting recognition,” in Proc. Int. Workshop on Frontiers in Handwriting Recognition, Taejon, Korea, 1998, pp. 429–438.

[14] J. Tokuno, N. Inami, S. Matsuda, M. Nakai, H. Shimodaira, and S. Sagayama, “Context-dependent substroke model for HMM-based on-line handwriting recognition,” in Proc. Int. Workshop on Frontiers in Handwriting Recognition, 2002, pp. 78–83.

[15] T. Plötz and G. A. Fink, “Markov Models for Offline Handwriting Recognition: A Survey,” Int. Journal on Document Analysis and Recognition, vol. 12, no. 4, pp. 269–298, 2009.

[16] W. Guerfali and R. Plamondon, “Normalizing and restoring on-line handwriting,” Pattern Recognition, vol. 26, no. 3, pp. 419–431, 1993.

[17] A. Graves, M. Liwicki, S. Fernandez, R. Bertolami, H. Bunke, and J. Schmidhuber, “A novel connectionist system for unconstrained handwriting recognition,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 31, no. 5, pp. 855–868, 2009.

[18] J. G. A. Dolfing and R. Haeb-Umbach, “Signal representations for Hidden Markov Model based on-line handwriting recognition,” in Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, vol. IV, München, 1997, pp. 3385–3388.

[19] A. Bishnu and B. B. Chaudhuri, “Segmentation of Bangla handwritten text into characters by recursive contour following,” in Proc. Int. Conf. on Document Analysis and Recognition, 1999, pp. 402–405.

[20] U. Pal and S. Datta, “Segmentation of Bangla unconstrained handwritten text,” in Proc. Int. Conf. on Document Analysis and Recognition, 2003, pp. 1128–1132.

[21] S. Basu, R. Sarkar, N. Das, M. Kundu, M. Nasipuri, and D. K. Basu, “A fuzzy technique for segmentation of handwritten Bangla word images,” in Int. Conf. on Comput.: Theory and Appl., 2007, pp. 427–433.

[22] U. Garain and B. B. Chaudhuri, “Segmentation of touching characters in printed Devanagri and Bangla scripts using fuzzy multifactorial analysis,” IEEE Trans. Systems, Man, and Cybernetics-Part C: Applications and Rev., vol. 22, no. 4, pp. 164–167, 2002.

[23] T. Mondal, U. Bhattacharya, S. K. Parui, K. Das, and V. Roy, “Database generation and recognition of online handwritten Bangla characters,” in Int. Workshop on Multilingual OCR (MOCR), Barcelona, Spain, 2009.

[24] G. A. Fink and T. Plötz, “On the use of context-dependent modelling units for HMM-based offline handwriting recognition,” in Proc. IEEE Int. Conf. on Document Analysis and Recognition (ICDAR 2007), Curitiba, Brazil, 2007, pp. 729–733.

[25] G. A. Fink and T. Plötz, “Developing pattern recognition systems based on Markov models: The ESMERALDA framework,” Pattern Recognition and Image Analysis, vol. 18, no. 2, pp. 207–215, 2008.

[26] T. Plötz and G. A. Fink, “Camera-based whiteboard reading: New approaches to a challenging task,” in Proc. Int. Conf. on Frontiers in Handwriting Recognition, Montreal, 2008.
