TH-H3-1
SCIS & ISIS 2008
Robust Eye Localization for Lip Reading in Mobile Phone Environments

Thanh Trung Pham*, Jin Young Kim*, Seung Yu Na*, Sung Taek Hwang**
* Dept. of ECE, Chonnam National University, Buk-Gu Yongbong-Dong 300, Gwangju 500-757, South Korea
** Telecommunication R&D Center, Samsung Electronics, 416 Metan-Dong, Yeongtong-Gu, Suwon-si, Gyeonggi-do 443-747, South Korea
Emails: {[email protected], [email protected]}
Abstract - In this paper we present a new robust approach to eye localization for lip reading in mobile environments, where the input image is assumed to contain a single face. First, we segment eye candidate regions using intensity information in the YCbCr color space. Then, coupled eye candidate regions, including assumed eyebrows, are extracted under morphological and geometric constraints on eyes in a frontal face. Finally, a GMM is applied to validate the exact eye couple.

Keywords: eye localization, lip reading, coupled eyes, Gaussian Mixture Model

I. INTRODUCTION
In recent years, lip reading (speech reading) has attracted much attention as a way to enhance speech recognition performance, because the visual movements of the lips carry information about speech articulation. The first step in a lip reading system is to detect the lip region in face images. A common approach is face detection based: the face region is detected using skin color information, and lip detection is then performed using knowledge of the facial structure. However, color information is so variable under illumination change that accurate face detection is difficult in the dynamic lighting conditions of indoor and outdoor environments. Fortunately, in a mobile lip reading application only one face is assumed, located near the image center. We can therefore detect eye centers directly, without face detection. Eye centers are very good features for setting a proper region that includes the lips.

In this paper, we propose a robust eye localization method for mobile phone based lip reading that tolerates small localization errors. The main purpose of the eye detection is to set a sufficient lip region in which fine lip analysis can be performed.

Various approaches to eye localization have been proposed during the last decade. They generally fall into three categories: image matching, machine learning, and image processing based methods. Image matching based methods [1, 2] first construct eye templates and then compare them with sub-images in the search area; the sub-image with the highest matching score is taken as the eye. Machine learning methods usually construct a classifier or detector to distinguish eye from non-eye regions; such methods need a large amount of training data to achieve good performance and consume much time searching for eye regions. In [3], an AdaBoost detector is applied to segment eye regions, and a fast radial symmetry process is then used to locate the eye center. Image processing based methods usually use edges, corners, intensity, color, or other characteristics of the face to locate the eyes. In [9], the face is first extracted based on skin color information, vertical projection of the hue image is then used to locate the eye region, and the eye center is finally located by a peak value extraction algorithm.

In this paper, we present a new robust eye localization algorithm applicable to lip reading systems. The proposed method uses both intensity and geometry information to extract eye candidate regions. In mobile phone environments, the image or video is assumed to contain a single frontal face. The image is converted to the YCbCr color space and binarized using a threshold technique; as a result, we segment eye candidate regions such as eyes, eyebrows, nostrils, lips, spectacles, hair, and ears. Geometric characteristics of eyes are then applied to reject non-eye regions and keep the plausible ones. Finally, we use a GMM to validate the eye candidates and pick the eye couple with the highest probability. Because this eye detection is applied to a lip reading system, an approximately correct eye center is acceptable instead of a precise one, so the center point of the eye region can be taken as the eye center. In the next sections, the whole eye localization algorithm and the experimental results are explained in detail.

II. EYE LOCALIZATION
Our eye localization method is summarized in Figure 1.

Figure 1 Flow chart of eye localization (input RGB image → Y image → eye candidate segmentation → eye validation → eye center).

A. Eye Candidate Segmentation

In the mobile lip reading application, the face is assumed to be located at the center of the screen, so the eye regions should belong to a fixed potential region of the image, called the mask (Figure 2). The input image is first converted to the YCbCr color space, and the Y component is used to segment eye candidate regions. One important property of eyes is that they are always darker than the skin and other regions, and we take advantage of this to segment the image into regions that can be considered eye candidates. The image within the mask region is then binarized using a threshold technique: pixels with intensity below the threshold are set to 1, and all others to 0. In the resulting binary image, the eye candidate regions have pixel value 1 (Figures 3, 4).
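As a minimal sketch of this thresholding step (in Python with NumPy; the threshold fraction, the BT.601 luma weights, and the mask shape are illustrative assumptions, not values from the paper):

```python
import numpy as np

def segment_eye_candidates(rgb, mask, thresh_frac=0.4):
    """Binarize the luma (Y) channel inside a mask to obtain eye
    candidate pixels. `thresh_frac` is an illustrative parameter."""
    rgb = rgb.astype(np.float64)
    # BT.601 luma, as in the RGB -> YCbCr conversion
    y = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    lo, hi = y[mask].min(), y[mask].max()
    thresh = lo + thresh_frac * (hi - lo)
    # dark pixels (eyes, brows, nostrils, ...) become 1, the rest 0
    return ((y < thresh) & mask).astype(np.uint8)
```

In the full algorithm this threshold would be adjusted adaptively (Figure 5) rather than fixed.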
Figure 2 Mask image.

Figure 5 Adaptive segmentation algorithm (thresholds → eye candidate segmentation → eye couple extraction; if fewer than one eye couple is found, the thresholds are decreased and segmentation is repeated; otherwise eye validation proceeds).
Figure 3 Input Y image and binarized image.

The segmentation algorithm (Figure 4) proceeds as follows:
1. Initialize the mask image containing the potential eye region.
2. Compute a minimum and a maximum threshold from the intensity statistics of the Y image.
3. Set each pixel of the Y image with intensity below the minimum threshold to 1 (object), otherwise to 0 (background).
4. Create an edge map image edgeIm from the Y image.
5. Dilate edgeIm with a rectangular structuring element.
6. Invert: edgeIm = 1 - edgeIm.
7. The binarized image is the product of the thresholded Y image (step 3), edgeIm, and the mask.
8. Apply connected component analysis to this binarized image for subsequent processing.

Figure 4 Segmentation algorithm.

Because intensity values are very sensitive to illumination changes, our segmentation algorithm is made adaptive: the threshold value is increased or decreased until the eye candidates are well segmented (Figure 5).

B. Eye Validation using GMM

This step validates which candidate regions are eyes. To reduce the time cost of the validation step, the process first rejects non-eye regions using physical characteristics of eyes: regions whose area is too large or too small, or whose height is more than three times the width, are considered non-eyes and rejected. Figure 6 shows the result after rejecting such non-eye regions.

Figure 6 Eye candidates remaining.

A GMM is a statistical density model composed of a number of component functions. We construct a Gaussian mixture probability density function (pdf) from training data. For each eye candidate we then compute its probability, and the candidate with the highest probability is chosen as the eye region (Figure 7).

Figure 7 Eye validation diagram (eye candidates and training data → feature extraction module → feature vectors → Gaussian mixture pdf → probabilities → max → eye).

In the feature extraction module (Figure 11), the Haar wavelet transform is applied to extract features from each eye image. It is well known that the Haar wavelet transform captures facial features such as nostrils, eyes, and lips very well. PCA is then used to reduce the feature dimension.
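The feature extraction and GMM scoring pipeline described above might be sketched as follows. This is a hedged illustration, not the authors' implementation: the one-level Haar decomposition depth, the diagonal covariances, and all parameter values are assumptions; the mixture parameters would in practice come from EM training on eye samples.

```python
import numpy as np

def haar_ll(img):
    # One-level 2D Haar transform, keeping only the low-low (average)
    # subband; each dimension is halved (e.g. a 36x8 image -> 18x4).
    rows = (img[0::2, :] + img[1::2, :]) / 2.0
    return (rows[:, 0::2] + rows[:, 1::2]) / 2.0

def pca_fit(X, k):
    # X: (n_samples, n_features). Returns the mean and top-k components.
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, vt[:k]

def gmm_logpdf(x, weights, means, variances):
    # Log-density of a diagonal-covariance Gaussian mixture.
    comps = [np.log(w) - 0.5 * np.sum(np.log(2 * np.pi * v) + (x - m) ** 2 / v)
             for w, m, v in zip(weights, means, variances)]
    return np.logaddexp.reduce(comps)

def validate(candidates, mu, comps, weights, means, variances):
    # Score every candidate eye image and return the index of the most
    # probable one, as in the Figure 7 diagram.
    feats = [comps @ (haar_ll(c).ravel() - mu) for c in candidates]
    scores = [gmm_logpdf(f, weights, means, variances) for f in feats]
    return int(np.argmax(scores))
```

The same pipeline applies to single eyes, eye couples, and eye couples with eyebrows; only the image size and mixture order change.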
In our study, we construct three types of training data for the experiments (Figures 8, 9, 10). The first contains single eyes, with left and right eyes treated independently. In the second, each image is a couple of left and right eyes. In the last, each image is an eye couple including the eyebrow regions.

Figure 8 Single eye samples.

Figure 9 Couple eye samples.

Figure 10 Couple eye with eyebrow samples.

1. Validate single eye

All images in the first sample set are normalized to a standard size of 71x31 and downsampled to 36x16 before the Haar wavelet transform is applied, yielding 16x8 features. We also use PCA to reduce the dimensionality of the feature vectors, so each eye image is represented by 8 features. All 600 feature vectors are used to estimate Gaussian mixture pdf models with 3 components. From the center of each eye candidate region, we take an eye image of size 71x31 and apply the same feature extraction procedure, so each eye candidate obtains its own probability. The candidates with the greatest probabilities are chosen as the true eyes. This validation method is simple but does not show excellent performance: most errors are due to eyebrows, thick spectacles, hair, and other objects. Below, a coupled eye validation approach is introduced in which physical properties of eyes, such as symmetry and correlation, are taken into account to improve the detection rate.

Figure 11 Feature extraction module (downsampling → Haar wavelet transform → PCA).

2. Validate couple eye

Among the eye candidates obtained from the previous step, a preprocessing step selects eye couples by considering the geometric relationship between the left and right eyes. All couples with too large or too small a distance, low correlation, or a large height difference are rejected. Figure 12 shows the result of couple eye extraction.

Figure 12 Couple eye candidates.

In this case, we model the second data set with 5 Gaussian mixture components; the validation procedure is the same as before. All sample images are normalized to size 142x31 and downsampled to 36x8; after the Haar wavelet transform we obtain 18x4 features per image, and PCA then extracts the first 15 principal components. The performance of this validation method is much better than that of single eye validation. However, in some cases an eyebrow is very similar to a closed eye, which may cause wrong detections. To overcome this shortcoming, the eyebrow region is included in the eye-couple image for both GMM training and validation (Figure 13).

Figure 13 Some eye and eye-couple candidates.

III. EXPERIMENT RESULTS

In our study, we recorded videos of 105 different persons in standard, indoor, and outdoor environments; each frame has size 640x480. This data is used to verify the proposed algorithm. We constructed 600 single eye samples, 1000 eye couple samples, and 1000 eye-couple-with-eyebrow samples to build the Gaussian mixture pdfs separately. Figures 14-16 show some detection results for different eye states and environments. The experiments show that validating eye candidates together with the eyebrow is much more effective.

Figure 14 Detection results - Standard DB (normal eye; closed eye with gap; eye with spectacles; eye with spectacles and cap).
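As an illustration of the geometric pre-filtering used in couple eye extraction (distance, height-difference, and correlation constraints), a sketch follows; the numeric bounds and the mirrored-patch correlation are assumptions, since the paper does not list its exact values:

```python
import numpy as np

def plausible_couples(regions, min_dist=30, max_dist=200,
                      max_dy=15, min_corr=0.3):
    """regions: list of (cx, cy, patch) tuples, where (cx, cy) is the
    region center and patch is a fixed-size grayscale crop. Returns
    index pairs (left, right) passing the geometric constraints."""
    couples = []
    for i, (xi, yi, pi) in enumerate(regions):
        for j, (xj, yj, pj) in enumerate(regions):
            if xi >= xj:                       # enforce left/right ordering
                continue
            dist = np.hypot(xj - xi, yj - yi)
            if not (min_dist <= dist <= max_dist):
                continue
            if abs(yj - yi) > max_dy:          # eyes lie roughly on one line
                continue
            # mirror the right patch so a symmetric pair correlates well
            a, b = pi.ravel(), pj[:, ::-1].ravel()
            if np.corrcoef(a, b)[0, 1] >= min_corr:
                couples.append((i, j))
    return couples
```

Only the surviving pairs would then be passed to the GMM validation stage.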
Figure 15 Detection results - Indoor DB (normal eye; eye with spectacles; closed eye with spectacles; closed eye with gap).

Figure 16 Detection results - Outdoor DB (normal eye; eye with spectacles; closed eye; closed eye with gap).

Table 1 compares the detection rates of the three validation methods in the standard, indoor, and outdoor environments.

Table 1 Comparison of different validation methods.

  Validation method                     Standard DB   Indoor DB   Outdoor DB   Samples
  Single eye validation                 60%           --          --           105
  Couple eye validation                 90.3%         87.2%       85.6%        105
  Couple eye with eyebrow validation    98.1%         96.6%       94%          105

IV. CONCLUSION

In this paper, we presented a simple and robust method for eye localization. Both intensity information and the physical properties of eyes are used to find the centers of the left and right eyes. The main idea is to validate candidate regions of the left and right eyes simultaneously using coupled images. Intensity information is used to segment eye candidate regions; potential eye couples are then extracted based on the geometric characteristics of eyes in a frontal face image; and these couples are finally validated with a GMM to select the best candidate. The experiments show that the method works well across different eye states, lighting conditions, and environments. It also has a small time cost, making it feasible for lip reading systems in mobile environments.

REFERENCES
[1] S. Kim, S.T. Chung, S. Jung, D. Oh, J. Kim, and S. Cho. Multiscale Gabor Feature based Eye Localization, Proceedings of World Academy of Science, Vol. 21, pp. 483-487, Jan. 2007.
[2] K. Peng and L. Chen. A Robust Algorithm for Eye Detection on Gray Intensity Face without Spectacles, Journal of Computer Science & Technology, Vol. 5, No. 3, 2005.
[3] Z. Wencong, C. Hong, Y. Peng, L. Bin, and Z. Zhenquan. Precise Eye Localization with AdaBoost and Fast Radial Symmetry, Proceedings of Computational Intelligence and Security, Vol. 1, pp. 725-730, Nov. 2006.
[4] H. Lu, W. Zhang, and D. Yang. Eye Detection Based on Rectangle Features and Pixel-pattern-based Texture Features, Proceedings of the International Symposium on Intelligent Signal Processing and Communication Systems, pp. 746-749, Nov. 28-Dec.
[5] Z. Niu, S. Shan, S. Yan, X. Chen, and W. Gao. 2D Cascaded AdaBoost for Eye Localization, Proceedings of the 18th International Conference on Pattern Recognition, Vol. 2, pp. 1216-1219, 2006.
[6] P. Wang, M. Green, Q. Ji, and J. Wayman. Automatic eye detection and its validation, Proceedings of the IEEE Conf. on Computer Vision and Pattern Recognition, Vol. 3, pp. 164-164, 2005.
[7] X. Tang, Z. Ou, T. Su, H. Sun, and P. Zhao. Robust Precise Eye Location by AdaBoost and SVM Techniques, Proceedings of the Int'l Symposium on Neural Networks, pp. 93-98, 2005.
[8] S. Du and R. Ward. A Robust Approach for Eye Localization Under Variable Illuminations, Proceedings of ICIP 2007, Vol. 1, pp. 377-380, 2007.
[9] W. T. Wang, C. Xu, and H. Shen. Eye localization based on hue image processing, Proceedings of ISPACS 2007, pp. 730-733, 2007.
[10] Y. Chen and K. Kubo. A Robust Eye Detection and Tracking Technique using Gabor Filters, Proceedings of the Third International Conference on IIHMSP 2007, Vol. 1, pp. 109-112, 2007.
[11] O. Jesorsky, K. J. Kirchberg, and R. W. Frischholz. Robust Face Detection Using the Hausdorff Distance, Proceedings of Audio- and Video-Based Biometric Person Authentication, pp. 91-95, 2001.
[12] M. Hamouz, J. Kittler, J.K. Kamarainen, and H. Kälviäinen. Affine-Invariant Face Detection and Localization Using GMM-Based Feature Detectors and Enhanced Appearance Model, Proceedings of the IEEE Sixth International Conference on Automatic Face and Gesture Recognition, pp. 67-72, 2004.