IJRIT International Journal of Research in Information Technology, Volume 2, Issue 11, November 2014, Pg. 213-216
International Journal of Research in Information Technology (IJRIT) www.ijrit.com
ISSN 2001-5569
Robust Text Detection and Extraction in Natural Scene Images Using Conditional Random Field Model and OCR

Pratik Yadav
Lecturer, Computer Engineering, Pune University / VACOE Ahmednagar, Ahmednagar, Maharashtra, India
[email protected]
Abstract

In natural scene images, text detection is an important task used in many content-based image analysis applications. A method based on Maximally Stable Extremal Regions (MSERs) is used for scene text detection; it includes the stages of character candidate extraction, text candidate construction, text candidate elimination, and text candidate classification. The main limitations of this method are detecting highly blurred text in low-resolution natural scene images and detecting multi-oriented text; moreover, the existing technique does not address text extraction. In the proposed system, a conditional random field (CRF) model assigns each candidate component to one of two classes ("text" and "non-text") by considering both unary component properties and binary contextual component relationships. For this purpose we use the connected component analysis method. The proposed system also performs single-character and full text extraction using optical character recognition (OCR).

Keywords: Conditional random field (CRF), Optical character recognition (OCR), Text detection, Text extraction
1. Introduction

Text in natural scene images contains valuable information. The signs and lettering in scene images include building names, street names, company names, commercial advertisements, and announcements. This information is useful for content-based image and video applications such as content-based web image retrieval and video information retrieval. As a main component of scene images, text usually provides an important clue for scene understanding. Given the vast number of text-based search engines, retrieving images by their embedded text offers an efficient supplement to visual search systems. Moreover, with the wide use of smartphones and the rapid development of the mobile internet, capturing information with the cameras embedded in mobile terminals has become part of everyday life, and such camera images often contain text. Text detection in natural scene images is therefore an important task, and methods based on Maximally Stable Extremal Regions (MSERs) are used for this purpose. Once the text is detected, the next question is how to extract it.

Automatically extracting text from scene images after text detection is still an open problem. The main difficulty is the high variability of text appearance, for instance variation in color, font style, size, and language. In addition, complex backgrounds, uneven illumination, and blur make scene text extraction much more challenging. Researchers have reported many methods to address this problem, and some give good results; among them, we use optical character recognition for text extraction. First, we use a fast and accurate MSER pruning algorithm that enables us to detect most characters even when the image is of low quality. Second, we use a self-training distance metric learning algorithm that learns distance weights and the clustering threshold simultaneously; text candidates are constructed by clustering character candidates with the single-link algorithm using the learned parameters. Third, we use a character classifier to estimate the posterior probability that a text candidate is non-text and eliminate candidates with a high non-text probability, which helps to build a more powerful text classifier.
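As an illustration of the MSER idea behind the character candidate detector, the sketch below labels connected components of a grayscale image at a sweep of intensity thresholds and keeps those whose area stays nearly constant across neighboring thresholds. It assumes scipy is available, detects only dark-on-light regions, and is a brute-force toy rather than the pruning algorithm used in the paper; real MSER implementations build a component tree instead.

```python
import numpy as np
from scipy import ndimage

def toy_mser(gray, delta=5, max_variation=0.25):
    """Toy MSER-style detector for dark-on-light regions.

    For each threshold t, the connected components of gray <= t are
    labelled; a component is kept as "stable" when its area changes
    little between thresholds t - delta and t + delta.  Stable regions
    found at several thresholds are not deduplicated here.
    """
    regions = []
    for t in range(delta, 256 - delta, delta):
        lab_lo, _ = ndimage.label(gray <= t - delta)
        lab_mid, n = ndimage.label(gray <= t)
        lab_hi, _ = ndimage.label(gray <= t + delta)
        for r in range(1, n + 1):
            mask = lab_mid == r
            area = int(mask.sum())
            seed = tuple(np.argwhere(mask)[0])   # any pixel of the component
            # area of the matching component at the lower/higher threshold
            a_lo = int((lab_lo == lab_lo[seed]).sum()) if lab_lo[seed] else 0
            a_hi = int((lab_hi == lab_hi[seed]).sum())
            if (a_hi - a_lo) / area <= max_variation:
                regions.append(mask)
    return regions
```

On an image containing a uniform dark square on a bright background, the square is reported as a stable region for the whole range of thresholds between its intensity and the background's.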
2. Related Work

In this paper we use a robust and accurate method based on Maximally Stable Extremal Regions (MSERs). The method includes the following stages:

2.1 Character candidate extraction: A pruning algorithm is used during character candidate extraction, since each character is unique and differs from the others; parent-child relationships among MSERs are used to remove repeated detections of the same text. Character candidates are extracted with the MSER algorithm by minimizing regularized variation, and the fast and accurate MSER pruning algorithm enables us to detect characters even when the image is of low quality.

2.2 Text candidate construction: A single-link clustering algorithm groups character candidates into text candidates using learned distance weights and a learned clustering threshold. In single-link clustering, the two clusters whose closest members have the smallest distance are merged at each step. A distance threshold can be specified so that the clustering process terminates when the distance between the nearest clusters exceeds it; the resulting clusters then form a hierarchical cluster tree, or a cluster forest if a termination threshold is specified. To learn the distance function, one metric learning strategy is to minimize the distance between point pairs in M while maximizing the distance between point pairs in C, where M specifies pairs of points in the same cluster and C specifies pairs of points in different clusters. Because single-link clustering forms clusters by merging smaller ones, the final result is a binary cluster tree in which every non-singleton cluster has exactly two direct sub-clusters.
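The single-link clustering with a termination threshold described above can be sketched in plain Python. The points (standing in for character-candidate positions) and the threshold value in the usage example are illustrative assumptions; the paper learns its distance weights and threshold rather than fixing them.

```python
from math import dist

def single_link_clusters(points, threshold):
    """Single-link agglomerative clustering with a distance cutoff.

    At each step the two clusters whose closest members are nearest
    are merged; clustering stops once the smallest inter-cluster
    distance exceeds `threshold` (the learned termination threshold
    in the paper's setting).  Returns clusters as lists of indices.
    """
    clusters = [[i] for i in range(len(points))]
    # single-link distance = distance between the closest pair of members
    d = lambda a, b: min(dist(points[i], points[j]) for i in a for j in b)
    while len(clusters) > 1:
        (i, j), best = min(
            (((i, j), d(clusters[i], clusters[j]))
             for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda t: t[1])
        if best > threshold:      # nearest clusters are too far apart
            break
        clusters[i] += clusters.pop(j)
    return [sorted(c) for c in clusters]
```

For instance, five points on a line at x = 0, 1, 2, 10, 11 with threshold 2 merge into two clusters, {0, 1, 2} and {3, 4}, because the gap of 8 between the groups exceeds the threshold.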
2.3 Text candidate elimination: In this step, the posterior probabilities of each text candidate being text or non-text are estimated, and candidates with a high non-text probability are removed.
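One plausible way to realize this elimination step, assuming a character classifier that returns per-character non-text probabilities and averaging them per candidate (an illustrative choice, not the paper's exact estimator):

```python
def eliminate_candidates(candidates, char_nontext_prob, threshold=0.5):
    """Drop text candidates whose estimated non-text probability is high.

    Each candidate is a sequence of character candidates; its
    candidate-level non-text probability is taken as the mean of the
    per-character probabilities returned by `char_nontext_prob`.
    """
    kept = []
    for cand in candidates:
        p_nontext = sum(char_nontext_prob(ch) for ch in cand) / len(cand)
        if p_nontext < threshold:
            kept.append(cand)
    return kept

# Hypothetical classifier output: letters score low, clutter scores high.
probs = {"H": 0.1, "i": 0.2, "#": 0.9, "~": 0.8}
survivors = eliminate_candidates([["H", "i"], ["#", "~"]], probs.get)
```

Here the candidate ["#", "~"] is eliminated because its mean non-text probability (0.85) exceeds the threshold.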
3. Proposed Work

3.1 Text candidate classification: Text candidates corresponding to true text are identified by a text classifier, which decides whether a candidate is true text or not. We propose a conditional random field (CRF) model that assigns each candidate component to one of two classes, "text" or "non-text", by considering both unary component properties and binary contextual component relationships. For this purpose we use the connected component analysis method. Connected-component-based methods rest on the observation that text can be seen as a set of connected components, each of which has distinct geometric
features, and neighboring components have close spatial and geometric relationships. These methods normally consist of three stages: 1) connected component (CC) extraction, to segment candidate text components from the image; 2) CC analysis, to classify the candidate components as text or non-text; and 3) post-processing, to group text components into text blocks (e.g., words and lines). We use an SVM classifier to separate text from non-text components among the candidates, which improves performance and accuracy, and the CRF model differentiates text components from non-text components better than other classifiers.

3.2 Text extraction: The next step is text extraction. If the text information can be extracted efficiently, we can provide more reliable content-based access to the image data. Many approaches have been proposed in the literature, but text extraction is still a challenging problem because of varying text sizes and orientations, complex backgrounds, varying lighting, and distortions. Therefore, in the proposed system, text extraction is performed by optical character recognition (OCR). Online OCR techniques additionally capture motion information: the order in which segments are drawn, their direction, and the pattern of putting the pen down and lifting it. This information can make the end-to-end process more accurate; that variant is also known as "dynamic character recognition" or "real-time character recognition".
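A minimal sketch of the CRF labeling idea from Section 3.1: unary costs encode per-component properties, pairwise costs encode contextual relationships between neighboring components, and the lowest-energy assignment wins. The enumeration below only works for tiny graphs (real systems use graph cuts or belief propagation), and all features and costs in the demo are hypothetical.

```python
import itertools

def crf_label(components, unary, pairwise, edges):
    """Exhaustive MAP labeling for a tiny CRF over candidate components.

    The energy of an assignment is the sum of unary costs plus pairwise
    costs over neighboring component pairs; the assignment with the
    minimum energy is returned as a {component: label} mapping.
    """
    labels = ("text", "non-text")
    best, best_e = None, float("inf")
    for assign in itertools.product(labels, repeat=len(components)):
        e = sum(unary(c, lab) for c, lab in zip(components, assign))
        e += sum(pairwise(components[i], components[j], assign[i], assign[j])
                 for i, j in edges)
        if e < best_e:
            best, best_e = assign, e
    return dict(zip(components, best))

# Hypothetical demo: "a" and "b" are tall components (character-like),
# "c" is small; neighbors with similar heights prefer the same label.
heights = {"a": 10, "b": 11, "c": 2}
unary = lambda c, lab: ((0 if heights[c] >= 8 else 2) if lab == "text"
                        else (2 if heights[c] >= 8 else 0))
pairwise = lambda ci, cj, li, lj: (1 if li != lj
                                   and abs(heights[ci] - heights[cj]) < 3 else 0)
labeling = crf_label(["a", "b", "c"], unary, pairwise, edges=[(0, 1), (1, 2)])
```

In the demo the minimum-energy assignment labels "a" and "b" as text and "c" as non-text, since the pairwise term only penalizes disagreeing labels between components of similar height.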
Natural scene image → character candidate extraction → text candidate construction → text candidate elimination → text candidate classification (by the CRF model) → text extraction (by the OCR method)

Fig. 1: Step-by-step process of text detection and extraction
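Before OCR, a detected text region is typically binarized. The sketch below implements Otsu's global thresholding with NumPy and notes, purely as an assumption (the paper does not name an OCR engine), how the binarized region could be handed to an engine such as Tesseract via pytesseract.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: pick the threshold maximizing the between-class
    variance of the two resulting pixel populations of a uint8 image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    cum_w = np.cumsum(hist)                     # pixels with value <= t
    cum_mean = np.cumsum(hist * np.arange(256))
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0, w1 = cum_w[t], total - cum_w[t]
        if w0 == 0 or w1 == 0:
            continue                            # all pixels in one class
        mu0 = cum_mean[t] / w0                  # mean of the dark class
        mu1 = (cum_mean[-1] - cum_mean[t]) / w1 # mean of the bright class
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t

# A detected region would then be binarized and passed to an OCR engine,
# e.g. (pytesseract is one possible engine, not specified by the paper):
#   binary = ((gray > otsu_threshold(gray)) * 255).astype(np.uint8)
#   text = pytesseract.image_to_string(binary)
```

On a bimodal region (dark strokes on a bright background) the returned threshold falls between the two intensity modes, so the strokes and background separate cleanly.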
4. Conclusion

In previous MSER-based scene text detection methods, an AdaBoost classifier was used for text candidate classification; instead, we propose a CRF model for text detection, with text extraction performed by the OCR method. The main advantage of our modified MSER-based method over traditional connected-component-based methods lies in the use of the modified MSER algorithm for character extraction, which is able to detect most characters even when the image is of low quality. The main limitations of our approach are that detecting a variety of multilingual texts remains a challenge, and that similar multiple text lines with skewed distortion need further investigation.