Two-Cost Stroke Segment Grouping Mechanism for Off-line Cursive Hand-written Word Recognition

Yong Haur Tay (1), Marzuki Khalid (1), Stefan Knerr (2), Pierre-Michel Lallican (3), Christian Viard-Gaudin (3)

(1) Centre for Artificial Intelligence & Robotics (CAIRO), Universiti Teknologi Malaysia, Jalan Semarak, 54100 Kuala Lumpur, Malaysia. [email protected], [email protected]
(2) Vision Objects SARL, 11, Rue de la Fontaine Caron, 44300 Nantes, France.
(3) Laboratoire SEI/EP CNRS 63, IRESTE, Université de Nantes, Rue Christian Pauc, La Chantrerie, BP 60601, 44306 Nantes Cedex 3, France.
Abstract: This paper presents a new technique for constructing bigger objects, called graphemes, from the smaller stroke segments of hand-written words. The algorithm is part of our off-line cursive hand-written word recognition system. The main idea is a competition between each stroke segment either to become a grapheme by itself or to join an existing grapheme. This eliminates many IF-THEN-ELSE rules in the algorithm by replacing them with quantifiable cost values, and helps to obtain reasonable recognition accuracy. The research is based on the French cheque words taken from the IRONOFF database.

Keywords: cursive word recognition, segmentation, grapheme, off-line word recognition, script recognition.
1 INTRODUCTION
Cursive hand-written word recognition is one of the most challenging tasks in pattern recognition, in view of the great variability in handwriting styles. It is the problem of transforming the two-dimensional spatial form of handwriting into a symbolic representation. Generally, it can be divided into on-line and off-line recognition. On-line recognition systems are those where the data to be recognised is input through a tablet digitizer, which acquires the position of the pen tip, its pressure, and its velocity as the user writes. Off-line recognition systems, on the other hand, acquire the spatial information of the handwriting from a paper document using a scanner or camera. On-line recognition deals with a one-dimensional representation of the input, whereas off-line recognition involves the analysis of a two-dimensional image. Off-line handwriting recognition systems have many applications, such as reading legal and courtesy amounts on cheques [KAB+98][ABP+98], reading addresses on mail pieces, routing facsimile messages, and reading input fields on forms. There are mainly two approaches to analysing off-line cursive handwriting: the holistic and the analytic approach. The holistic approach bypasses the segmentation process by recognising the word using global features. The analytic approach segments the word into smaller objects before recognition. The former approach is simple but offers lower performance; the latter has more potential for better performance but requires more processing. One of the most tedious processes in the analytic approach is determining the segmentation points, which is a research topic in itself.
Our approach segments the words into smaller strokes based on the regularity and curvature of the handwriting. The stroke segments are sometimes very small in nature and thus carry little information. As a result, we developed a new algorithm, named Two-cost grouping, to group one or more stroke segments into bigger objects, called graphemes. In this research, we apply the algorithm to recognise French cheque words, as our hidden Markov models (HMMs) are trained with such data. An example of a French cheque is given in Figure 1. In this paper, we first present the organisation of our cursive word recognition system in Section 2. Section 3 briefly describes the stroke segmentation used in our recognition system. Section 4 describes the mechanism and algorithm of our Two-cost segment grouping. Section 5 describes the succeeding modules that follow the Two-cost grouping. Section 6 presents the experiments and recognition results using this approach. Section 7 concludes the paper.
2 SYSTEM OVERVIEW
The overall organisation of the recognition system is illustrated in Figure 2. The off-line information of the cursive hand-written French cheque words is used as the raw input to the recognition system. The segmentation process cuts the word into smaller stroke segments based on the regularity and curvature of the strokes [Lal96][LV97][LV98]. These stroke segments are sometimes too small in nature and are distributed in two-dimensional (2D) space. Hence, a segment grouping process is needed to group several stroke segments together to form more informative objects, called graphemes. A grapheme is either a character or a sub-character. The graphemes are then ordered in a left-to-right sequence, as required by the HMMs. Currently, we use discrete density HMMs, which only consider discrete observation symbols; therefore, each grapheme must be represented by an observation symbol. Thus, a vector quantization process is required to classify graphemes with similar features into clusters, where each cluster of graphemes can be represented by a symbol. First, we extract the most discriminant information from each grapheme in the feature extraction process. Next, the K-means clustering algorithm is used to classify graphemes with similar features into the same clusters. After that, the sequence of observation symbols is presented to the HMMs for either training or recognition.
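As a minimal illustration of the vector quantization step (the centroid and feature values below are hypothetical, not taken from the paper), graphemes whose feature vectors fall nearest the same K-means centroid receive the same observation symbol:

```python
def assign_symbols(features, centroids):
    """Map each grapheme feature vector to the index (symbol) of its
    nearest centroid, turning the grapheme sequence into a symbol sequence."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(centroids)), key=lambda k: dist2(f, centroids[k]))
            for f in features]

# Hypothetical centroids obtained by K-means over (height, width, ratio) features.
centroids = [(10.0, 5.0, 2.0), (30.0, 10.0, 3.0)]
graphemes = [(9.0, 6.0, 1.5), (31.0, 9.0, 3.4), (11.0, 4.0, 2.8)]
symbols = assign_symbols(graphemes, centroids)  # -> [0, 1, 0]
```

The symbol sequence (here `[0, 1, 0]`) is what the discrete HMMs consume for training and recognition.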
Figure 1. An example of a French cheque.
Figure 2. Recognition system organisation: the 256-grey-scale image passes through stroke segmentation (stroke segments), segment grouping (graphemes), left-right ordering (ordered grapheme sequence), feature extraction (a feature vector of height, width, ratio, etc. for each of the T graphemes), and vector quantization (an observation symbol sequence, e.g. 87, 67, 44, 2, 97, 3, 8, ...); the hidden Markov models (one per dictionary word, e.g. "trente", "trois") then output a word likelihood for each word in the dictionary.
3 SEGMENTATION
The stroke segmentation process, developed at IRESTE, is described in [Lal96][LV97][LV98]. The main idea is to cut a word into smaller stroke segments based on the regularity and curvature of the strokes. A Canny-Deriche edge detector is used to extract the edges of a word. Once all the edges are detected, a matching process is undertaken: each edge pixel is matched with the best corresponding pixel on the opposite edge. When no match is possible, the pixel belongs to a non-regular (or singular) region. This procedure yields a set of lists of ordered cross-sections representing the whole tracing, with a particular list corresponding to a sub-stroke of the tracing between two intersection points (singular regions), or between an intersection point and a terminal point. Figure 3 shows the original 256-grey-level image of the word 'trente'. The result after stroke segmentation is illustrated in Figure 4.
Figure 3. Original image of the word 'trente'.
Figure 4. Result of the segmentation process. Each segment is illustrated in a different colour.

The limitations of this stroke segmentation process, compared to other segmentation processes that use ligatures as the segmentation points, are:
- The stroke segments are distributed in 2D space, and thus are not left-right ordered.
- The stroke segments are too small in nature, and contain little structural information.
4 TWO-COST GROUPING
The stroke segmentation process generates a 2D distribution of stroke segments, whereas HMMs expect 1D signals as input. An intermediate process is therefore necessary to bridge the gap between the two. Consequently, a segment grouping process is implemented to group the stroke segments into bigger objects, called graphemes. A grapheme is either a character or a sub-character. The graphemes are then ordered from left to right, resulting in a 1D signal. Since graphemes consist of one or more stroke segments, they provide more structural information than individual stroke segments. The main objectives of the segment grouping process are:
1. To evaluate the grouping quantitatively, instead of using the usual IF-THEN-ELSE rule-based approach for finding segmentation points in the handwriting signal.
2. To group segments into a left-right ordered sequence of graphemes.
3. To make no use of the character concept.
4. To avoid grouping segments from different characters into a single grapheme.
5. To produce graphemes that provide more structural information than stroke segments.
We use a novel approach to solve this grouping problem, named Two-cost grouping. The main idea is to iteratively group the stroke segments by using their geometrical features (such as height, width, distance to the core-zone, etc.) and their context (distance to the neighbouring graphemes, height and width of the neighbouring graphemes, etc.). For each stroke segment, we calculate two cost functions, namely Cost1 and Cost2. Cost1 is defined as the cost for a stroke segment to become a grapheme by itself, and Cost2 is defined as the cost for a stroke segment to join one of the already existing graphemes.

Figure 5. Algorithm of the Two-cost grouping:
  Initialize SegmentList with all stroke segments; GraphemeList is empty.
  While SegmentList is not empty:
    For each seg in SegmentList:
      Cost1List[seg] = Cost1(seg)
      Cost2List[seg] = Cost2(seg, GraphemeList)
    Action = TakeAction(Cost1List, Cost2List)
    If Action == NewGrapheme: the segment with minimum Cost1 becomes a new grapheme and is added to GraphemeList.
    Else: the segment with minimum Cost2 is added to its best grapheme in GraphemeList.
    Remove that segment from SegmentList.

The process iteratively calculates the two costs for each remaining segment. At the end of each iteration, an action is taken, determined by comparing the minimum Cost1 value with the minimum Cost2 value over all stroke segments: either a stroke segment becomes a grapheme by itself, or it joins an existing grapheme. The two cost values are then recalculated for the remaining stroke segments, and the process continues until no stroke segments remain. The algorithm is summarised in Figure 5.
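The grouping loop can be sketched as follows. This is our reading of the algorithm, with `cost1` and `cost2` left as caller-supplied functions (the actual cost definitions are given in Sections 4.1 and 4.2):

```python
import math

def two_cost_grouping(segments, cost1, cost2):
    """Iteratively group stroke segments into graphemes.

    cost1(seg): cost for seg to become a new grapheme by itself.
    cost2(seg, graphemes): (cost, grapheme_index) for seg to join an
    existing grapheme; only called when graphemes is non-empty.
    """
    graphemes, remaining = [], list(segments)
    while remaining:
        best1 = min(remaining, key=cost1)
        if graphemes:
            s2, c2_val, g_idx = min(
                ((s,) + cost2(s, graphemes) for s in remaining),
                key=lambda t: t[1])
        else:
            c2_val = math.inf  # no grapheme to join yet
        if not graphemes or cost1(best1) <= c2_val:
            graphemes.append([best1])    # NewGrapheme action
            remaining.remove(best1)
        else:
            graphemes[g_idx].append(s2)  # join an existing grapheme
            remaining.remove(s2)
    return graphemes
```

With segments abstracted as positions on a line, a constant `cost1` and a distance-based `cost2` will cluster nearby segments into the same grapheme, which mirrors the competition described above.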
4.1 COST1 FUNCTION
Cost1 is defined as the cost for a stroke segment to become a grapheme by itself. The smaller the value, the more likely the stroke segment is to become a grapheme. The value of Cost1 lies in [0, 1]. Geometrical features of the stroke segment are used to compose the total cost value. Four geometrical features are used in our current experiments:
1. Height-to-width ratio of the segment, R. The associated cost is K_1^1 = Sigmoid^-1(R).
2. Distance to the nearest grapheme, DNG. The associated cost is K_2^1 = Sigmoid^-1(DNG).
3. Distance to the core-zone, DCZ. The associated cost is K_3^1 = Sigmoid(DCZ).
4. Size of the segment, SIZ. The associated cost is K_4^1 = Sigmoid^-1(SIZ).
The total cost value for Cost1 is:

    K^1 = ( sum_{i=1}^{4} K_i^1 w_i^1 ) / ( sum_{i=1}^{4} w_i^1 )    (1)

where K^1 is the total Cost1, K_i^1 is the contributing cost, and w_i^1 is the weighting parameter for the associated contributing cost K_i^1.

4.2 COST2 FUNCTION
Cost2 is defined as the cost for a stroke segment to be grouped into an existing grapheme. The smaller the value, the more easily the segment is attracted to join a neighbouring grapheme. The value of Cost2 lies in [0, 1]. Geometrical features of the stroke segment, and its context with respect to the neighbouring graphemes, are used to compose the total cost value. Seven features are used in our current experiments:
1. Height-to-width ratio of the segment, R. The associated cost is K_1^2 = Sigmoid(R).
2. Distance to the grapheme, DG. The associated cost is K_2^2 = Sigmoid(DG).
3. Distance to the core-zone, DCZ. The associated cost is K_3^2 = Sigmoid^-1(DCZ).
4. Size of the segment, SIZ. The associated cost is K_4^2 = Sigmoid(SIZ).
5. Resulting width when the segment joins the grapheme, RW. The associated cost is K_5^2 = Sigmoid(RW).
6. Width increment when the segment joins the grapheme, WI. The associated cost is K_6^2 = Sigmoid(WI).
7. Connectivity of the segment to the grapheme, C. The associated cost is K_7^2 = 1 if C = Not_Connected, and K_7^2 = 0 if C = Connected.
The total cost value for Cost2 is:

    K^2 = ( sum_{i=1}^{7} K_i^2 w_i^2 ) / ( sum_{i=1}^{7} w_i^2 )    (2)

where K^2 is the total Cost2, K_i^2 is the contributing cost, and w_i^2 is the weighting parameter for the associated contributing cost K_i^2.
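A minimal sketch of the weighted cost combination of equations (1) and (2). We read "Sigmoid^-1" as an inverted (decreasing) sigmoid response, which is our assumption, and the weights are the hand-tuned parameters discussed in the conclusion:

```python
import math

def sigmoid(x):
    """Squash a feature value into (0, 1); larger feature -> larger cost."""
    return 1.0 / (1.0 + math.exp(-x))

def inv_sigmoid(x):
    """Assumed reading of 'Sigmoid^-1': larger feature -> smaller cost."""
    return 1.0 - sigmoid(x)

def total_cost(costs, weights):
    """Weighted average of contributing costs, as in eq. (1) and (2)."""
    return sum(k * w for k, w in zip(costs, weights)) / sum(weights)

# Hypothetical Cost1 for a segment with features (R, DNG, DCZ, SIZ);
# the weights w_i are free parameters tuned by hand.
def cost1(R, DNG, DCZ, SIZ, w=(1.0, 1.0, 1.0, 1.0)):
    costs = [inv_sigmoid(R), inv_sigmoid(DNG), sigmoid(DCZ), inv_sigmoid(SIZ)]
    return total_cost(costs, w)
```

Because each contributing cost lies in [0, 1] and the weights are normalised by their sum, the total cost also stays in [0, 1], as the text requires.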
5 FURTHER PROCESSING
After ordering the graphemes into a left-right sequence, further processing is needed before learning or recognition by the HMMs. The feature extraction process extracts the most discriminant information from each grapheme: fifty-two (52) geometrical features are extracted per grapheme. After that, the vector quantization process, which uses a simple K-means clustering algorithm, groups graphemes with similar features into the same clusters. Graphemes in the same cluster receive the same observation symbol. Thus, left-right ordered grapheme sequences are represented by left-right ordered observation symbol sequences. The HMMs learn and recognise using these observation symbol sequences. A detailed discussion of this processing is beyond the scope of this paper.
6 EXPERIMENTS AND RESULTS
Figure 6. After the segment grouping process, each grapheme is illustrated in a different colour.

An example of the output of the Two-cost grouping is illustrated in Figure 6. To evaluate the recognition accuracy of the system, it is trained on the 30 French cheque words from the IRONOFF database [VLK+99]. About 7000 samples are used to train the system and another 3000 samples for testing. Two performance measures are used to evaluate the recognition process:
1. pos, the average position of the correct class in the candidate list generated by the HMMs. The best value is 1.0, which can only be achieved if all the test data are correctly recognised.
2. Rec(K), K = 1, ..., C, the cumulative percentage of samples for which the correct class has a rank R(w_corr) <= K in the candidate list. For example, Rec(4) = 99% means that for 99% of the samples, the correct class is among the top 4 candidates in the list.
The recognition rates from our experiments are shown in Table 1. 87.3% of the 3000 test samples are correctly recognised in the first position of the ranked candidate list. However, only 98.7% of the samples reach the top-10 of the ranked candidate list, which shows that some words are recognised poorly by the system.
Table 1. The recognition rate of our system.

Rec(1)   Rec(2)   Rec(3)   Rec(5)   Rec(10)   pos
87.3%    93.3%    95.5%    97.1%    98.7%     1.37
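The two measures pos and Rec(K) can be computed directly from the per-sample rank of the correct word; the function below is a sketch with names of our own choosing:

```python
def evaluate(ranks, K_values):
    """ranks[i] = 1-based rank of the correct word for test sample i.

    Returns the mean position `pos` and cumulative recognition
    rates Rec(K) in percent, for each K in K_values.
    """
    n = len(ranks)
    pos = sum(ranks) / n
    rec = {K: 100.0 * sum(1 for r in ranks if r <= K) / n for K in K_values}
    return pos, rec

# Toy example: 4 samples whose correct class ranks are 1, 1, 2, 5.
pos, rec = evaluate([1, 1, 2, 5], [1, 2, 5])
# pos = 2.25; rec = {1: 50.0, 2: 75.0, 5: 100.0}
```

Note that pos = 1.0 is only attainable when every rank equals 1, i.e. when all test samples are recognised in first position, consistent with the definition above.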
7 CONCLUSION
From the visual analysis of the output and the good recognition results of the Two-cost grouping, we conclude that the process is capable of providing reasonable groupings, given a 2D distribution of stroke segments. The advantages of the algorithm are:
- No hard-coded IF-THEN-ELSE statements are used to determine the grouping criteria. A segment grouping framework has been developed, based on a competition between two tendencies for a particular segment, the Cost1 and Cost2 functions. Any enhancement to the processing scheme can be made within the Cost1 and Cost2 functions, without further modification of the framework of the segment grouping process. This shows the feasibility of an optimisation-based approach versus a rule-based approach.
- No knowledge of characters is required in this process.
On the contrary, the disadvantages of the Two-cost grouping are:
- There are many parameters to adjust in order to obtain a reasonably good grouping, which makes fine-tuning a very laborious task. This should be alleviated by replacing the current scheme with a trainable algorithm.
- The sequential nature of the algorithm makes fine-tuning the parameters even harder, because an action in one iteration affects the calculations in succeeding iterations. Hence, one can hardly predict the result of the grouping just by adjusting the parameters.
Actually, the Two-cost grouping process is just a bootstrap process, used in the initial stage of training so that the HMMs can learn from the training data. The segment grouping process will subsequently be replaced by a trainable segment grouping process. The new process will probably be composed of a neural network, and will learn the segment grouping initially from the Two-cost grouping, and afterwards from the trained HMMs. Hence, a non-perfect Two-cost grouping process is adequate as a bootstrap process.
8 ACKNOWLEDGEMENTS
This research is partly funded by the French government through the French Embassy in Kuala Lumpur.
9 REFERENCES
[ABP+98] E. Augustin, O. Baret, D. Price, S. Knerr, "Legal Amount Recognition on French Bank Checks Using a Neural Network-Hidden Markov Model Hybrid", International Workshop on Frontiers in Handwriting Recognition, IWFHR'6, Taejon, Korea, Aug 1998, pp. 45-54.
[KAB+98] S. Knerr, E. Augustin, O. Baret, D. Price, "Hidden Markov Model Based Word Recognition and Its Application to Legal Amount Reading on French Checks", Computer Vision and Image Understanding, vol. 70, no. 3, June 1998, pp. 404-419.
[LV97] P. M. Lallican, C. Viard-Gaudin, "A Kalman Approach for Stroke Order Recovering from Off-line Handwriting", ICDAR'97, Ulm, pp. 519-522.
[LV98] P. M. Lallican, C. Viard-Gaudin, "Off-line Handwriting Modeling as a Trajectory Tracking Problem", International Workshop on Frontiers in Handwriting Recognition, IWFHR'6, Taejon, Korea, Aug 1998, pp. 347-356.
[VLK+99] C. Viard-Gaudin, P. M. Lallican, S. Knerr, P. Binter, "The IRESTE On/Off (IRONOFF) Dual Handwriting Database", to be published in the proceedings of ICDAR'99.