
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 21, NO. 6, JUNE 2012

Impact of the Lips for Biometrics

Yun-Fu Liu, Student Member, IEEE, Chao-Yu Lin, and Jing-Ming Guo, Senior Member, IEEE

Abstract—In this paper, the impact of the lips for identity recognition is investigated; in fact, identity recognition solely by the lips is a challenging issue. In the first stage of the proposed system, a fast box filtering is proposed to generate a noise-free source with high processing efficiency. Afterward, five mouth corners are detected through the proposed system, which is also able to resist shadow, beard, and rotation problems. For the feature extraction, two geometric ratios and ten parabolic-related parameters are adopted for further recognition through the support vector machine. Experimental results demonstrate that, when the number of subjects is fewer than or equal to 29, the correct accept rate (CAR) is greater than 98% and the false accept rate (FAR) is smaller than 0.066%. Moreover, the processing speed of the overall system achieves 34.43 frames per second, which meets the real-time requirement. Thus, the proposed system can be an effective candidate for facial biometrics applications when other facial organs are covered or when it is applied to an access control system.

Index Terms—Feature extraction, lip analysis, lip detection, lip recognition, pattern recognition.

I. INTRODUCTION

NOWADAYS, biometric systems are widely used in identification and recognition applications due to the bio-invariant characteristics of some specific structures such as the fingerprint, face, and iris. Among these features, face recognition is able to work at a greater distance between the prospective users and the camera than the other types of features. Yet, one critical issue of a face recognition system is that it cannot work well if the target face is partially covered. Thus, considering a smaller part of a face for further identification/recognition can be an effective way to solve this problem. Detecting the lip contour with high accuracy is an important requirement for a lip identification system, and it has been widely discussed in former works. One studied direction considers the color information [1]-[3], [5]. Hosseini and Ghofrani's work [1] converted an RGB color image into CIE color spaces; selected components of these spaces were combined to generate a new image that emphasizes the lip features, and the

Manuscript received July 17, 2011; revised November 14, 2011; accepted January 14, 2012. Date of publication January 31, 2012; date of current version May 11, 2012. This work was supported by the National Science Council, R.O.C., under Contract NSC 100-2221-E-011-103-MY3. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Patrick Flynn. The authors are with the Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei 106, Taiwan (e-mail: [email protected]; [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIP.2012.2186310

results were analyzed through the 2-D fast wavelet transform. Finally, morphology was employed to smooth and binarize the image, and the noise was then removed to obtain the lip region. In [3], the images were converted into Caetano and Barone's chromatic color space [4]. Afterward, the mean and the threshold were computed from the red channel of each pixel and employed to separate the lip and nonlip regions. In [5], a new color mapping, which integrates color and intensity information, was developed for lip contour extraction, and Otsu's thresholding [6] was adopted to extract the binary result. In fact, the influence incurred by the variability of the environment is a common problem for the aforementioned methods, since the hue or the brightness is employed to distinguish the lip and nonlip regions. To overcome these lighting problems, other methods with an iterative strategy, which rely only on the characteristics of the contour, have been proposed. For instance, the authors of [7] improved the active contour model proposed in [8] to detect the lip contour; Rizon et al. [9] proposed a model that iteratively forms the lip contour through two irregular ellipses with different minor axes, where the fitting parameters of the ellipse equation are optimized online by a genetic algorithm [10]. Wang et al. [11] first calculated a probability map to construct a rough lip region, and three hyperbolas were used to iteratively model this region. Although an accurate lip region can be obtained by these methods, processing efficiency is their common problem. To cope with this, Li and Cheung [12] utilized the grayscale statistics of the lips' connecting part to locate the mouth corners and used the grayscale variation to detect the contours of the upper and lower lips. This method is able to detect the lips, but it cannot handle rotated lips (images). Moreover, the boundary of the lower lip is normally rather complex, and the aforementioned methods cannot fully cope with these issues. The sources for a lip recognition system are of two types: video and image. The methods [13], [14] based on the former are mainly used to recognize different speakers from a changing and moving mouth. In this scenario, a video stream is required since the estimated motion feature of the mouth plays the key role, and audio information is even incorporated in [13]. Notably, real-time processing is not a requirement for these methods. Conversely, the methods solely utilizing an image source are mainly used to identify persons with a neutral expression [15]-[18]. The work of Gómez et al. [15] utilized a color transform to separate the skin and lip colors. In addition, a binarization method was used to extract the envelope contour of the lips, from which the corresponding geometric features were explored. The work of Choraś [16] also transferred a color image into the HSV color domain for the same purpose, but the fixed thresholding manner for segmenting the lip region constrains the possible applications.




Fig. 1. Five feature points of lips.

In addition to the geometric features, that paper also considered a color feature. However, this feature is sensitive to lighting conditions. The work of Briceño et al. [17] is similar to the aforementioned two methods in both the color transformation and the characteristics of the used features; it is noteworthy that an adaptive thresholding method and morphology are employed for lip contour detection. The aforementioned three methods have a common drawback: shadow significantly affects the detection results, because it reduces the contrast between the skin and the lips. Consequently, this issue indirectly degrades the robustness of the extracted geometric features. The work of Choraś [18] adopted a special feature using the lip print, but it is rather sensitive to rotation, since the directions of the lip prints extracted by the employed steerable filters are used to describe a person.

In this paper, both lip detection and recognition are discussed. To detect the lip contour, more grayscale characteristics of the lips and various contrast enhancements are employed to resist the influences of various skin colors and environments. In addition, a few iterative strategies are used to obtain an accurate lip contour while the real-time requirement is still maintained. Moreover, a complete algorithm is designed for detecting the corner of the lower lip. These intensive and focused considerations ensure that lighting changes or even shadow do not noticeably affect the detection result. In the feature extraction and recognition parts, 12 normalized geometric features are used to distinguish the identities of different subjects through the support vector machine (SVM) [19]. As documented in the experimental results, the detection and recognition rates are promising. Thus, the proposed method can be applied to an access control system or can be regarded as an effective candidate for assisting a partially covered face recognition system.

The rest of this paper is organized as follows: Section II introduces the proposed lip detection method, in which five mouth corners are detected for further recognition. Section III explains the proposed lip recognition method and the feature extraction. Section IV gives the experimental results, and the conclusions are drawn in Section V.

II. MOUTH CORNERS DETECTION

To extract the features for recognition, five different points, as shown in Fig. 1, are demanded; in the following, they are denoted P1 (left mouth corner), P2 (right mouth corner), P3 (upper lip corner), P4 (middle point of the lips), and P5 (lower lip corner). Fig. 2 illustrates the proposed algorithm, and the details are discussed below.

A. Preprocessing

In this paper, the face region is directly detected from an image by a powerful method, namely, Viola and Jones' face detection algorithm [20]; an example of which is shown in Fig. 3 with the red box.

Fig. 2. Flowchart of the proposed lip detection scheme.

Fig. 3. Relationship between a face and the possible lip region.

To further refine the possible region of the lips, a subregion is roughly extracted by the estimates (1) and (2), which map the origin and the top-right position of the face's bounding box of size W_f × H_f to the origin and the top-right position of the estimated lip region of size W_l × H_l. To ease the influences of camera noise and various lighting changes while simultaneously achieving a low computational complexity, the proposed fast box filtering (FBF) and the well-known histogram stretching method are used to obtain a contrast-enhanced and smoothed result. The first step of the proposed FBF is to obtain an integral image, which in its standard form is

ii(x, y) = Σ_{x' ≤ x, y' ≤ y} i(x', y')   (3)

where ii(x, y) and i(x, y) denote the integral value and the original grayscale value, respectively. Each obtained ii(x, y) is the summation of all grayscale values within its bottom-left region (taking the image origin at the bottom-left corner).


Fig. 4. Result obtained through the proposed FBF with the histogram stretching.


Fig. 6. Example of the CSR.

Fig. 5. Edge refined results. (a) Generated with the Sobel operator. (b) Quantized result with the obtained threshold.

Fig. 7. Three different lip images with and without shadow at the bottom of the lower lip.

The response of the box filtering (BF) can then be obtained from four lookups of the integral image:

b(x, y) = (1 / w^2) [ ii(x + ⌊w/2⌋, y + ⌊w/2⌋) − ii(x − ⌈w/2⌉, y + ⌊w/2⌋) − ii(x + ⌊w/2⌋, y − ⌈w/2⌉) + ii(x − ⌈w/2⌉, y − ⌈w/2⌉) ]   (4)

where b(x, y) denotes the smoothed result, the odd parameter w denotes the size of the employed filter, and ⌊·⌋ and ⌈·⌉ denote the round-down and round-up operators, respectively. The rough lip region (bounded by the positions obtained from (1) and (2)) is processed with the proposed FBF and the histogram stretching method [2]. The region shown in Fig. 3 is adopted as an example to exhibit the result, and the enhanced, smoothed image (with an illustrative filter size) is shown in Fig. 4. This enhanced image is used extensively throughout the rest of this paper.

B. Left and Right Mouth Corners

CSR: To exclude the unrelated parts and precisely detect the left and right mouth corners, the size of the search region is further reduced. First, the gradients of the enhanced image are obtained through the Sobel operator in its standard magnitude form

G(x, y) = √(G_x(x, y)^2 + G_y(x, y)^2)   (5)

where G_x and G_y denote the horizontal and vertical Sobel responses; an instance is shown in Fig. 5(a). Since the gradients at the left and right corners are always higher than the other gradients, the weak gradients are filtered out by a further refining procedure. For this, the Isodata algorithm [21] is adopted to automatically find a threshold T as follows:

E(x, y) = 255 if G(x, y) ≥ T, and E(x, y) = 0 otherwise   (6)

where the two values 0 and 255 denote black and white in this paper. To derive T, a temporary threshold is first calculated through Isodata applied on the whole gradient map; threshold T is then obtained by applying the same Isodata to the gradients exceeding the temporary threshold. A result of the obtained edge map E is shown in Fig. 5(b).
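For concreteness, the FBF can be sketched in a few lines of NumPy. This is a minimal illustration of (3) and (4) under our own assumptions (function name, edge-replicating borders), not the authors' implementation.

import numpy as np

def fast_box_filter(gray, w):
    # gray: 2-D uint8 image; w: odd filter size.
    r = w // 2
    padded = np.pad(gray.astype(np.int64), r, mode="edge")
    # Integral image with one zero row/column prepended, so that
    # ii[y, x] = sum of padded[:y, :x]; this is the discrete form of (3).
    ii = np.pad(padded.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    h, wd = gray.shape
    # Four-corner difference of (4), evaluated for all pixels at once.
    s = (ii[w:w + h, w:w + wd] - ii[:h, w:w + wd]
         - ii[w:w + h, :wd] + ii[:h, :wd])
    return (s / (w * w)).astype(np.uint8)

For a 15 × 15 filter this touches four integral values per pixel instead of 225 grayscales, which is consistent with the speedup trend reported in Fig. 22.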

Fig. 8. Template for searching the left and right mouth corners.

Scanning each row of edge map E of size W_l × H_l, the isolated edge points (those with no neighbor on their left or right side) are removed. Afterward, for each row, the distance (in pixels) between the left- and rightmost edge points and the number of remaining edge points between them are calculated; the left- and rightmost points of the row with the maximum distance are labeled, provided that the number of edge points is higher than one third of that distance (a sketch of this row scan is given below). In an ideal condition, the two labeled locations are very close to the exact left and right mouth corners. Yet, some problems may be encountered, e.g., when an input face is rotated. For this, the corner search region (CSR) is constructed for the subsequent detection of the left and right mouth corners. A conceptual diagram of this region is illustrated in Fig. 6, where the vertical positions of the top and bottom rows of the CSR are derived from the vertical positions of the left and right labeled points. In the case when the labeled positions are not found, the whole estimated lip region is assigned as the CSR for the following detection.

Acquiring the Left and Right Mouth Corners: The boundary between the two lips is always darker than its neighboring regions because of shadow; Fig. 7 shows some examples that demonstrate this observation. This characteristic is adopted in this paper to robustly detect the left and right mouth corners. Fig. 8 shows the template constructed for detecting the candidates, where the center denotes the position of interest, and the gray and white pixels indicate that the corresponding neighbors should be darker or brighter, respectively. The search is limited to the left half of the CSR on the enhanced image for obtaining the P1 (left mouth corner) candidates. The candidates have to meet two conditions: 1) all neighbors marked with white have to be brighter than all neighbors marked with gray; and 2) every neighbor marked with gray has to be darker than an adjustable threshold, as detailed below.
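The row-scan labeling can be sketched as follows; the tie-breaking and return convention are our own reading of the text.

import numpy as np

def label_corner_rows(edge_map):
    # edge_map: binary image E with values 0 and 255.
    best = None
    for y in range(edge_map.shape[0]):
        cols = set(np.flatnonzero(edge_map[y] == 255).tolist())
        # Remove isolated edge points: keep a point only if it has an
        # edge neighbor immediately to its left or right.
        keep = sorted(c for c in cols if c - 1 in cols or c + 1 in cols)
        if len(keep) < 2:
            continue
        left, right = keep[0], keep[-1]
        dist = right - left
        # Enough edge points must lie between the extremes: more than
        # one third of the span.
        if len(keep) > dist / 3.0 and (best is None or dist > best[0]):
            best = (dist, (y, left), (y, right))
    # Returns the labeled left and right points, or None when no row
    # qualifies (in which case the whole lip region becomes the CSR).
    return None if best is None else (best[1], best[2])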


Fig. 9. Ratio of the mouth corners at various values of t.


Fig. 10. Distribution of the upper lip’s corner difference.


The base threshold is obtained from the whole enhanced image with the Isodata method, and a parameter t is used to adjust it so that the candidates can be detected adaptively: the gray-marked neighbors must be darker than the Isodata threshold lowered by t. To adopt a suitable t, the probability density function and the cumulative distribution function obtained from a corner statistic are illustrated in Fig. 9. To obtain this result, in total, 2158 samples [22] with the five mouth corners manually labeled (as shown in Fig. 1) are adopted. With a loose setting of t, almost 100% of the real corners are detected, but many false positives are also acquired. Thus, t is set at 15 to maintain 25% of the real corners as a tradeoff between true and false positives. When more than one candidate is obtained with t = 15, the leftmost candidate is determined as P1. Conversely, when no candidate is available, a smaller t is used until candidates appear. The procedure for locating P2 (the right mouth corner) is identical to that for P1: a left-right mirrored version of the template in Fig. 8 is used, the corresponding search range is the right half of the CSR on the enhanced image, and the setting of t is identical to that used for P1.

C. Upper Lip Corner

Fig. 7 demonstrates that the upper lip corner (P3, labeled in Fig. 1) exhibits a strong grayscale variation around the boundary between the philtrum and the upper lip; conditions (7) and (8) describe the two corresponding characteristics. The candidates for P3 are searched within a line region of one-pixel width; this line is perpendicular to the segment P1P2, intersects its middle point, and extends upward from this middle point with a length proportional to d(P1, P2), the distance between P1 and P2. The difference between the upper and lower points along this line should be positive and should satisfy an adjustable difference threshold (discussed below). Moreover, the grayscale of the upper point should also be higher than a brightness threshold describing the skin brightness. According to the statistical difference shown in Fig. 10, obtained by (7), a strict initial threshold is adopted.

Fig. 11. Distribution of the difference of the middle point of lips.

The dummy variable used to describe the brightness of the philtrum is initialized at 0. The detection procedure is as follows.

Step 1) Candidates are detected through (7) and (8) with the initial parameter settings. Among these, the topmost candidate is determined as P3 (terminated).
Step 2) If P3 is not found, decrease the difference threshold by a scale of ten and repeat Step 1 until P3 is detected (terminated).
Step 3) If P3 is still not found, adjust the brightness threshold by a scale of ten, restore the initial difference threshold, and repeat Steps 1 and 2 until P3 is detected (terminated).

D. Middle Point

The strategy for detecting P4 (the middle point of the lips) is similar to that for P3, and the search line region starts from the detected P3 and extends downward. The boundary between the two lips is always darker than the rest of the lip region because of shadow. This characteristic is described by conditions (9) and (10), where the minimal grayscale in the search line region and an initial dummy grayscale tolerance determine P4. To obtain a suitable threshold, the distribution shown in Fig. 11 is adopted as a reference, and a strict initial threshold is used. The detection procedure is as follows.

Step 1) Candidates are obtained by (9) and (10) with the initial parameter settings. Among these, the bottommost candidate is determined as P4 (terminated).
Step 2) If P4 is not found, decrease the tolerance threshold by a scale of ten and repeat Step 1 until P4 is detected (terminated).
Step 3) If P4 is still not found, adjust the other threshold by a scale of ten, restore the initial tolerance, and repeat Steps 1 and 2 until P4 is detected (terminated).
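The two step procedures above share one relaxation pattern, sketched generically below; the adjustment directions are only a plausible reading of the text, and all names are ours.

def search_with_relaxation(detect, t_init, g_init, step=10):
    # detect(t, g): returns candidate points under difference threshold t
    # and brightness/tolerance threshold g; the caller keeps the topmost
    # candidate for P3 or the bottommost for P4.
    t, g = t_init, g_init
    while 0 <= g <= 255:
        candidates = detect(t, g)
        if candidates:
            return candidates
        if t - step >= 0:
            t -= step                 # Step 2: relax the first threshold
        else:
            t, g = t_init, g + step   # Step 3: adjust the second, reset the first
    return None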



Fig. 12. Flowchart of the lower lip’s corner determination.

Fig. 14. Joint distribution of the two normalized heights of the upper and lower lips.

TABLE I TOP AND BOTTOM BOUNDARIES OF LSR1 AND LSR2 FOR SEARCHING LOWER LIP CORNER (P5)

Fig. 13. Examples of the two different LSRs.

E. Lower Lip Corner

Compared with the detection of the other corners, the detection of the bottom corner is more complex because of shadow, and the grayscale difference between the colors of the skin and the lower lip is unstable. Fig. 7 shows some instances: around the lower lip corner, the lower lip in Fig. 7(a) is brighter than the neighboring skin, the situation is completely reversed in the case of Fig. 7(b), and the case shown in Fig. 7(c) is rather ambiguous. Consequently, an algorithm is proposed to fully cope with these conditions; Fig. 12 shows the corresponding flowchart. First, the existence of shadow under the lower lip is estimated to select between the two subsequent processing methods, named Methods 1 and 2, since such shadow almost always appears together with a clear lower lip corner. Before this estimation, two local regions, namely, LSR1 and LSR2 [Lower lip Search Regions (LSRs)], are defined. These regions exclude many unrelated parts from the search for the lower lip corner (P5); the proposed method searches only within them, which is sufficient and also directly reduces the computational requirement. Examples of the related LSRs are illustrated in Fig. 13. Based on the reasonable assumption that P5 normally appears only on the line (called the central axis) that is perpendicular to P1P2 and intersects it at P4, the search region should be around the central axis. The left and right boundaries of LSR1 and LSR2 are therefore placed around the central axis, with LSR2 slightly wider than LSR1. These settings are made for the following two reasons: 1) the captured lips may suffer from rotation, so the search region has to be enlarged to cover all possible locations of P5; and 2) the aforementioned requirement is

only for Method 2. Regarding the top and bottom boundaries of LSR1 and LSR2, except for the top boundary of LSR2, which is defined by the second reason, the settings are determined according to the following statistics. Fig. 14 shows the distribution of the relationship between the normalized heights of the upper lip, d(P3, P4)/d(P1, P2), and the lower lip, d(P4, P5)/d(P1, P2), where the longest distance d(P1, P2) is regarded as the normalizing term; this distribution is used to constrain the search region. According to this result, the top and bottom boundaries of LSR1 and LSR2 can be determined through the possible positions of P5, and the search region size is affected by d(P1, P2). Table I organizes the top and bottom boundaries of LSR1 and LSR2 for searching the lower lip corner.

To determine the existence of shadow, the strip of LSR1 around the central axis is investigated. A simple test on the distribution of the vertical grayscale differences along this strip is employed, and most shadows can be found reliably: if the positive differences outnumber the negative ones, shadow is present (since, when shadow appears, the lower lip is brighter than the neighboring skin below it).

Before explaining the lower lip corner determination algorithm shown in Fig. 12, the properties of the three points P5, C1, and C2 (the latter two being the candidates located by Method 1) and their possible positions are described and illustrated in Fig. 15 for better understanding. The upper part of C1 is brighter than its lower part, and vice versa for C2. These two scenarios are handled by Method 1, and the remaining case is handled by Method 2.
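A minimal sketch of this shadow test, assuming the strip bounds are those of LSR1 around the central axis; the function name and bound parameters are ours.

import numpy as np

def shadow_under_lower_lip(enhanced, top, bottom, left, right):
    # When shadow is present, the lower lip is brighter than the skin
    # below it, so downward grayscale differences are mostly positive.
    strip = enhanced[top:bottom, left:right].astype(int)
    diffs = strip[:-1, :] - strip[1:, :]   # pixel minus the pixel below it
    return np.count_nonzero(diffs > 0) > np.count_nonzero(diffs < 0)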


Fig. 15. Practically detected C1 and C2 candidates. (a) Only C1 is found. (b) and (c) Two cases when C2 is found. (d) No shadow is detected.

Fig. 16. Image enhancement with different parameters k.

Fig. 17. Ratios of the lower lip corner among various differences between the lip's color and the neighboring skin color.

Consider the first decision block in Fig. 12 with the answer "yes" leading to Method 1 (the detailed algorithm is revisited later): if C1 is not found, the brightness of the lower lip is similar to that of its neighboring skin, and locating P5 through Method 2 is the only remaining option. In the other case, when C1 can be found, it is close to the exact P5 but may be affected by shadow [an example is shown in Fig. 15(b)]; thus, another candidate, C2, is required. Moreover, to determine which candidate is the exact P5 [since both C1 and C2 are possible candidates, as shown in Fig. 15(b)-(c)], the distances from each candidate to P4 are calculated (the detailed decision is illustrated in Fig. 12). Conversely, if no shadow is found, only Method 2 is adopted for detection. Notably, the case "no P5 can be found" is included only to make the algorithm complete, since in all our experiments P5 was found.

Method 1: This subsection explains the details of Method 1, containing the image enhancement and the feature point detection. To exclude the unrelated, too bright or too dark grayscales and to further increase the contrast, the piecewise mapping in (11) is used,

where the two thresholds of (11) are yielded by the Isodata method: the first threshold is calculated from all values contained in the search region, and the second from the candidates exceeding the first; parameter k denotes a factor that controls the excluded area. Normally, the default k is suited for most cases, but an exception is shown in Fig. 16(a), where the region of the lower lip may be excluded (set at 0); in this case, k is set at 2. To estimate the occurrence of this exception, the three pixels on top of the five pixels in the LSR1 central-axis strip are excluded.
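For reference, the Isodata thresholding [21] used throughout this stage, together with one plausible reading of the enhancement in (11), can be sketched as follows. The exact form of (11) is only described qualitatively above, so the piecewise mapping below (forcing the dark band to 0, the bright band to 255, and stretching the middle) is an assumption.

import numpy as np

def isodata_threshold(values, eps=0.5):
    # Iterate the midpoint of the two class means until it stabilizes.
    values = np.asarray(values, dtype=float)
    t = values.mean()
    while True:
        lo, hi = values[values <= t], values[values > t]
        if lo.size == 0 or hi.size == 0:
            return t
        t_new = (lo.mean() + hi.mean()) / 2.0
        if abs(t_new - t) < eps:
            return t_new
        t = t_new

def enhance_region(region, k):
    # t1 from the whole region, t2 from the values above t1, as in the
    # text; k controls how much of the dark band is excluded (k = 2 in
    # the exceptional case of Fig. 16).
    t1 = isodata_threshold(region)
    above = region[region > t1]
    t2 = isodata_threshold(above) if above.size else 255.0
    lo = t1 / k
    out = (region.astype(float) - lo) * 255.0 / max(t2 - lo, 1.0)
    return np.clip(out, 0, 255).astype(np.uint8)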

The strip of LSR1 around the central axis is adopted for locating C1, where the possible candidates have to meet the following conditions.

Condition 1: The candidate exhibits a grayscale difference within the greatest 25%. The difference between a pixel and the pixel below it is used to determine whether a location provides a large grayscale difference; since C1 should meet this property, a rough percentage is used to ensure that the exact C1 is contained.

Condition 2: The grayscale difference should be greater than 20. Fig. 17 shows a statistic of the differences calculated from the manually labeled exact P5; these differences are mostly higher than 20, so the threshold 20 is employed in this paper.

Condition 3: C1 should be darker than the skin, and it should be brighter than the shadow underneath it. The region below C1 belongs to the shadow area and is therefore darker than the Isodata threshold obtained from the enhanced image. Moreover, since the center of the shadow area provides the darkest grayscale, the lower pixels are closer to that center; this ordering constraint is organized as (12).

When the candidates of C1 are found, the averaged position of those candidates that provide the largest grayscale difference (possibly multiple candidates have an identical difference) is determined as C1.

The search region for locating C2 is also the LSR1 central-axis strip, but the top and bottom boundaries are modified for better accuracy. To avoid the case when C2 is affected by shadow appearing inside the lip region, as shown in Fig. 15(c), the top boundary is lowered; conversely, since C1 is always close to the exact P5, the bottom boundary is raised toward C1 to ease the influence of the shadows below it. Based on this search region, the candidates of C2 have to meet three conditions. Two of them are identical to those for C1, but the grayscale difference is computed in the reversed direction. Fig. 18 illustrates the remaining condition: the grayscale of the exact C2 is darker than that of C1, so this condition is added for detecting C2. The strategy for the case of multiple C2 candidates is identical to that for C1.

Method 2: This method is designed for detecting P5 through the whole contour of the lower lip. Since the lower lips in the processed images mostly contain no shadow, they provide a clear contour for the subsequent detection.



Fig. 18. Example of locating C2, showing the located C1 and C2.

Fig. 19. Example of searching the left half of the lower lip contour.

Fig. 20. Points and curves definitions for feature extraction.

First, since the left half of the lower lip contour is tracked, the search region is enhanced with the piecewise mapping in (13).

This mechanism and the corresponding parameters are identical to those used in (11). To track the left half of the lower lip contour, the search starts from the position of P1 and proceeds toward the central axis. First, the contour point in the column immediately to the right of P1 is detected; the search region, as shown in Fig. 19, is only three pixels high, and the next search area is centered on the previously detected contour point. The contour point has to meet two conditions.

Condition 1: It yields the greatest grayscale difference, where the difference is calculated between vertically adjacent pixels.

Condition 2: The grayscale difference should be greater than 20, which is identical to the test for C1.

When no contour point is detected in the current search region, the position of the previously detected contour point is carried over as the contour point for the current column; an instance is also provided in Fig. 19.
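Method 2's column-by-column tracking can be sketched as follows; the orientation of the grayscale difference and the window handling are our assumptions, and start and center_col come from P1 and the central axis.

def track_left_contour(enhanced, start, center_col):
    # start: (row, col) of the left mouth corner P1.
    contour = [start]
    row = start[0]
    for col in range(start[1] + 1, center_col):
        # Three-pixel-high window around the previously detected point.
        window = range(max(row - 1, 0), min(row + 2, enhanced.shape[0] - 1))
        # Downward jump: the contour sits where this difference is largest.
        diffs = {r: int(enhanced[r + 1, col]) - int(enhanced[r, col])
                 for r in window}
        if diffs:
            r_best = max(diffs, key=diffs.get)
            if diffs[r_best] > 20:     # same threshold as Condition 2
                row = r_best
        # Otherwise the previous contour position is carried over.
        contour.append((row, col))
    return contour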


III. LIPS RECOGNITION

To effectively distinguish each individual, two independent feature sets, i.e., the aspect ratios R_a and the curvatures R_c, are adopted for recognition. The dimension of the overall feature vector is 12. The features contained in R_a are the normalized heights of the upper and lower lips calculated as

R_a = { d(P3, P4) / d(P1, P2), d(P4, P5) / d(P1, P2) }   (14)

where the mouth width d(P1, P2) is the normalizing term. In fact, several different distances can be calculated from the obtained P1 to P5 (excluding the normalizing term, for which the identical normalization is used), but one of the selectable features is not required since it can be easily derived from the two adopted features, and the other selectable features are closely related to them. The remaining ten features are included in R_c, which is constructed with the parameters of five parabolic lines. To describe these lines, Fig. 20 shows a conceptual diagram of the related points, where the lines are in blue, the feature points detected by the algorithm of the previous section are in red, and the rest are in green; the names shown in this figure are used throughout the following explanation. To obtain the five parabolic lines, four curves, namely, the upper lip's left and right curves and the lower lip's left and right curves, are initialized. Focusing on the lower lip, the precise contour is difficult to determine because of the complex shadow. Thus, two approximated curves calculated by (15) are considered as the parabolic lines for feature extraction.

Each approximated curve in (15) is a parabola whose vertex is P5 and which passes through the corresponding mouth corner; a location is assigned to the curve when it satisfies the parabola equation, and the lowest such point in each column (vertical direction) belongs to the curve. This formulation is given as an example for the left curve and is easily adapted to the right one. According to our observation, the bottom contour of the lower lip may be flat and therefore cannot be represented by a single parabolic line; thus, two parabolic lines are utilized for describing the contour of the lower lip in this paper.

Compared with the bottom lip, the upper lip does not suffer from shadow; thus, a robust contour can be provided for further recognition. First, a supporting point is estimated for yielding the upper lip's two initial parabolic lines. This point is expected to be the vertex that, together with P1 or P2, constructs an initial line through (15) that is very close to the exact contour of the lip. The concave feature of the upper lip has to be taken into account; thus, the supporting point is located by the calculation in (16).



Fig. 21. Various upper lip contour search scenarios. (a) No candidate is found in ER. (b) One, (c) and (d) two, and (e) three candidates are found in ER.

The initial curves for the upper lip are used to constrain the search region of the exact lip contour, unlike the procedure for the lower lip. The upper-left curve is adopted as an example for explanation; the derivation for the upper-right one is analogous. The detection of the real contour can be summarized in two steps.

Step 1: Rough search region (RSR) construction. This step eases the influence of unrelated parts. The RSR is built to ensure that all required contour points lie within it: in each column, it spans a fixed number of pixels above and below the initial curve.

Step 2: Contour detection. Searching column by column from the mouth corner (P1 or P2) toward the center of the mouth, the method and parameters employed to detect the upper lip corner, as discussed in Section II-C, are applied.

The candidate selection strategy of Section II-C is modified here to provide better accuracy. For the first search operation (the column to the right of P1), the next contour point must lie in the intersection of the RSR and the ER, where ER denotes the eight-neighbor region of the previously detected contour point (P1 in this example). Some conceptual diagrams are illustrated in Fig. 21, where the cases with various numbers of candidates are also shown. For the case in Fig. 21(a), where no candidate is found in the intersection, the iterative procedure of Section II-C is applied, and the location associated with the initial line is determined as the contour. When several candidates are detected, their averaged position, such as the circles drawn in Fig. 21, is determined as the contour, even when the circle covers two locations. Fig. 21(d) shows the case when the circle contains two locations; the next ER is then enlarged to four pixels in height for the subsequent search. The topmost point of the contour detected within each RSR is taken as the corresponding vertex of the upper lip curve. To further describe the contour of the upper lip, the detected contour points near the top of the upper lip are fitted as another parabolic line for feature extraction. The five parabolic lines, containing the lower lip's two initial curves and the upper lip's three detected contours, are given as

y = a x^2 + b x + c   (17)

where coefficients a, b, and c describe each parabolic line. Since each curve contains many points, the average coefficients are considered as the features.

Fig. 22. Comparison between the proposed FBF and the traditional BF.

However, because coefficient c only describes the intercept, which is sensitive to the size of the region detected by Viola and Jones's face detection algorithm [20], this coefficient is excluded. In addition, since the size of the lips is affected by the distance between the camera and the user, the remaining coefficients of each parabolic line are normalized to avoid this influence. In this paper, LibSVM [19] is adopted as the classifier for recognition.
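The curvature features can be illustrated with a least-squares fit of (17). Dropping c follows the text; normalizing the scale-dependent coefficient a by the mouth width d(P1, P2) is our assumption, since the exact normalization is only described qualitatively above.

import numpy as np

def parabola_features(points, mouth_width):
    # points: (x, y) samples of one extracted contour or initial curve.
    pts = np.asarray(points, dtype=float)
    a, b, _c = np.polyfit(pts[:, 0], pts[:, 1], deg=2)  # y = a x^2 + b x + c
    return a * mouth_width, b

# The 12-D vector stacks the two aspect ratios of (14) with the (a, b)
# pairs of the five parabolic lines: 2 + 5 * 2 = 12 features, which are
# then fed to LibSVM [19].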

IV. EXPERIMENTAL RESULTS

In this section, the performance of the proposed FBF, the mouth corner detection scheme, and the recognition method is evaluated. Since no former public data set for lip recognition is available, the experiments use the database [22] refined from the Cohn-Kanade database [23], which contains 97 different subjects and, in total, 8795 samples of size 640 × 490. The Cohn-Kanade database was originally collected for facial expression recognition; thus, it contains various expressions that are not suited for lip identification. For instance, in an access control application, the number of allowable expressions for recognition is inversely proportional to the security, since more expressions degrade the robustness of the adopted features. Consequently, only the images with a neutral expression are adopted to construct the refined database, in which, in total, 2158 samples are involved. Notably, one of the 97 subjects is not used in the refined database since this individual has no neutral-expression samples.

Fig. 22 shows the processing speed comparison between the traditional BF and the proposed FBF. The simulation platform is equipped with an Intel Pentium 4 2.2-GHz processor and 2 GB of random access memory. The results demonstrate that the efficiency of the proposed FBF is inferior to that of the BF only when the filter is of size 3 × 3, since the integral image must first be constructed in the proposed method. In addition, the accuracy of the detected mouth corners with various FBF filter sizes is simulated for optimization. The corresponding results are organized in Table II, in which the distances between the manually labeled and the detected mouth corners are used for the evaluation. Herein, the best performance of each mouth corner is circled, and "no data" indicates that the corner cannot be found. According to this result, most of the mouth corners, namely, P1-P4, achieve their best accuracy at the same smaller filter size, but the best size for detecting P5 is 5, since the lower lip is more complex and unstable than the other lip corners.

Fig. 23 shows some practical feature extraction results, where the original images of Fig. 23(a) are shown in Fig. 7. This experiment involves three types of variations, namely, shadow, beard, and rotation.



TABLE II AVERAGE ACCURACY OF THE FIVE DETECTED MOUTH CORNERS. [DATA: THE MEAN (STANDARD DEVIATION) OF THE DISTANCE DIFFERENCE (IN PIXELS)]

Fig. 24. Performance of the proposed recognition system. (a) CAR. (b) FAR.

TABLE III PERFORMANCE OF THE PROPOSED LIP RECOGNITION SYSTEM

Fig. 23. Lip contour extraction from various kinds of images. (a) Shadow appearance. (b) Men with beard. (c) Rotated lips.

Two criteria, i.e., the correct accept rate (CAR) and the false accept rate (FAR), are used to estimate the performance of the proposed recognition system:

CAR = N_CA / (N_CA + N_FR) × 100%   (18)

FAR = N_FA / (N_FA + N_CR) × 100%   (19)

where N_CA, N_FA, N_CR, and N_FR denote the numbers of correctly accepted, falsely accepted, correctly rejected, and falsely rejected samples of the different subjects.

According to the results, the features extracted with the proposed scheme are robust against these interferences. To make a fair comparison of the mouth corner detection between Li and Cheung's method [12] and the proposed method, we additionally implemented Li and Cheung's algorithm, and both methods were tested with the same refined database [22]. The corresponding results are provided in Table II. As shown, Li and Cheung's method cannot accurately detect the mouth corners, due to the following facts: 1) the shadows that appear around the mouth corners are not fully considered and analyzed; 2) the assumption that the boundary between the two lips is always a straight horizontal line is unrealistic; 3) the rotation issue cannot be handled by their method; and 4) for the upper and lower lip corners, they consider only the extreme grayscale values to determine the corners' locations, which easily suffer from noise and have no other constraints to limit the possible candidates. Yet, Li and Cheung's method achieves a processing speed of 68.92 frames per second (fps), which is faster than the 45.24 fps of the proposed corner detection, since much of the computation is consumed by the proposed lower lip corner detection. Nonetheless, the processing speed of this stage still does not jeopardize the real-time processing requirement of the overall proposed system.


In this paper, k-fold cross validation is adopted to yield an objective estimation: the samples of each subject are uniformly divided into six (k = 6) subgroups, and each subgroup takes a turn as the test set while the remainder serves as the training set. The results are then averaged and shown in Fig. 24, and some key performance figures are organized in Table III. The results demonstrate that, even when the number of subjects is as high as 79, the CAR is still higher than 90% and the FAR is lower than 0.129%. Moreover, the processing efficiency of the whole lip recognition system, including the FBF, the mouth corner detection, and the recognition procedure, achieves the real-time processing requirement at 34.43 fps.
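The two criteria reduce to simple counts per fold; a minimal sketch follows, with hypothetical counts in the example line.

def car_far(n_ca, n_fa, n_cr, n_fr):
    # CAR (18) and FAR (19) from the four counts defined above.
    car = 100.0 * n_ca / (n_ca + n_fr)
    far = 100.0 * n_fa / (n_fa + n_cr)
    return car, far

# Six-fold protocol: each subject's samples are split into six subgroups,
# each subgroup serves once as the test set, and the per-fold values are
# averaged. Example with hypothetical counts:
print(car_far(n_ca=980, n_fa=2, n_cr=3000, n_fr=20))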


V. CONCLUSION

In this paper, the ultimate capability of the lips as a biometric has been investigated. Accordingly, a lip recognition system with well-designed parameters has been proposed, achieving promising recognition accuracy when only a part of the face is available. An additional advantage, namely, the handling of various sizes of imported frames and different distances between the camera and the prospective users, has been provided by applying Viola and Jones's face detection algorithm with a predefined scaling parameter to adapt to faces of various sizes in a searching window, and the lips' features have been normalized through an adaptive feature parameter. Several stages have been involved in the proposed system, and the corresponding contributions are organized as follows: 1) the proposed FBF provides a faster processing speed than the typical BF scheme when a greater filter size is required; 2) the mouth corner detection scheme demands fewer iterations than former schemes in the literature, so the overall recognition system can operate in a real-time fashion; and 3) the proposed lip recognition is able to handle the critical issues of shadow, beard, and rotation. Although the proposed system achieves promising performance, a possible future improvement is to explore more features along with dimension reduction, since only 12 features are employed in this paper.

REFERENCES

[1] M. M. Hosseini and S. Ghofrani, "Automatic lip extraction based on wavelet transform," in Proc. WRI Global Congr. Intell. Syst., 2009, vol. 4, pp. 393–396.
[2] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2nd ed. Upper Saddle River, NJ: Prentice-Hall, 2002.
[3] Q. C. Chen, G. H. Deng, X. L. Wang, and H. J. Huang, "An inner contour based lip moving feature extraction method for Chinese speech," in Proc. Int. Conf. Mach. Learn. Cybern., 2006, pp. 3859–3864.
[4] T. S. Caetano and D. A. C. Barone, "A probabilistic model for the human skin color," in Proc. 11th Int. Conf. Image Anal. Process., 2001, pp. 279–283.
[5] W. C. Wei, C. Jeon, K. Kim, D. K. Han, and H. Ko, "Effective lip localization and tracking for achieving multimodal speech recognition," in Proc. IEEE Int. Conf. Multisens. Fusion Integr. Intell. Syst., 2008, pp. 90–93.
[6] N. Otsu, "A threshold selection method from gray-level histograms," IEEE Trans. Syst., Man, Cybern., vol. SMC-9, no. 1, pp. 62–66, Jan. 1979.
[7] H. Seyedarabi, W. Lee, and A. Aghagolzadeh, "Automatic lip tracking and action units classification using two-step active contour and probabilistic neural networks," in Proc. Can. Conf. Elect. Comput. Eng., 2007, pp. 2021–2024.
[8] M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: Active contour models," Int. J. Comput. Vis., vol. 1, no. 4, pp. 321–331, Jan. 1988.
[9] M. Rizon, M. Karthigayan, S. Yaacob, and R. Nagarajan, "Japanese face emotions classification using lip features," in Proc. GMAI, 2007, pp. 140–144.
[10] G. G. Yen and N. Nithianandan, "Facial feature extraction using genetic algorithm," in Proc. Evol. Comput., 2002, vol. 2, pp. 1895–1900.
[11] S. L. Wang, W. H. Lau, and S. H. Leung, "Automatic lip contour extraction from color images," Pattern Recognit., vol. 37, no. 12, pp. 2375–2387, Dec. 2004.
[12] M. Li and Y. M. Cheung, "Automatic lip localization under face illumination with shadow consideration," Signal Process., vol. 89, no. 12, pp. 2425–2434, Dec. 2009.
[13] H. E. Çetingül, E. Erzin, Y. Yemez, and A. M. Tekalp, "Multimodal speaker/speech recognition using lip motion, lip texture and audio," Signal Process., vol. 86, no. 12, pp. 3549–3558, Dec. 2006.
[14] U. Saeed, "Person identification using behavioral features from lip motion," in Proc. IEEE Int. Conf. Autom. Face Gesture Recognit. Workshops, 2011, pp. 155–160.
[15] E. Gómez, C. M. Travieso, J. C. Briceño, and M. A. Ferrer, "Biometric identification system by lip shape," in Proc. IEEE Annu. Int. Carnahan Conf. Security Technol., 2002, pp. 39–42.
[16] M. Choraś, "The lip as a biometric," Pattern Anal. Appl., vol. 13, no. 1, pp. 105–112, Feb. 2010.
[17] J. C. Briceño, C. M. Travieso, J. B. Alonso, and M. A. Ferrer, "Robust identification of persons by lips contour using shape transformation," in Proc. Int. Conf. Intell. Eng. Syst., 2010, pp. 203–207.


[18] R. S. Choraś, “Lip-prints feature extraction and recognition,” in Advances in Intelligent and Soft Computing. New York: Springer-Verlag, 2011, pp. 33–42. [19] R. E. Fan, P. H. Chen, and C. J. Lin, “Working set selection using second order information for training support vector machines,” J. Mach. Learn. Res., vol. 6, pp. 1889–1918, Dec. 2005. [20] P. Viola and M. Jones, “Robust real-time face detection,” Int. J. Comput. Vis., vol. 57, no. 2, pp. 137–154, May 2004. [21] A. El-Zaart, “Images thresholding using Isodata technique with gamma distribution,” Pattern Recognit. Image Anal., vol. 20, no. 1, pp. 29–41, Mar. 2010. [22] Multimedia Signal Processing Laboratory, National Taiwan University of Science and Technology, Taipei, Taiwan, “Lip Recognition Database,” 2012. [Online]. Available: http://msp.ee.ntust.edu.tw/publicfile/ FaceRecSet.rar [23] T. Kanade, J. F. Cohn, and Y. Tian, “Comprehensive database for facial expression analysis,” in Proc. IEEE Int. Conf. Autom. Face Gesture Recognit., 2000, pp. 46–53. Yun-Fu Liu (S’09) was born in Hualien, Taiwan, on October 30, 1984. He received the M.S.E.E. degree from Chang Gung University, Taoyuan, Taiwan, in 2009. He is currently working toward the Ph.D. degree in the Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan. His research interests include digital halftoning, steganography, image compression, object tracking, and pattern recognition. Mr. Liu is a student member of the IEEE Signal Processing Society. He was a recipient of the Special Jury Award from Chimei Innolux Corporation in 2009 and the third Master’s Thesis Award from Fuzzy Society, Taiwan, in 2009.

Chao-Yu Lin was born in Taipei, Taiwan, on January 28, 1989. He received the B.S.E.E. degree from the National Taiwan University of Science and Technology, Taipei, Taiwan, in 2011. He is currently working toward the M.S.E.E. degree in the Institute of Electrical Control Engineering, National Chiao Tung University, Hsinchu, Taiwan. His current research interests include emotion recognition, pattern recognition, human-robot interaction, and robot control. Mr. Lin is a member of the Phi Tau Phi Scholastic Honor Society.

Jing-Ming Guo (M’06–SM’10) was born in Kaohsiung, Taiwan, on November 19, 1972. He received the B.S.E.E. and M.S.E.E. degrees from the National Central University, Taoyuan, Taiwan, in 1995 and 1997, respectively, and the Ph.D. degree from the National Taiwan University, Taipei, Taiwan, in 2004. From 1998 to 1999, he was an Information Technique Officer with the Chinese Army. From 2003 to 2004, he was granted the National Science Council scholarship for advanced research from the Department of Electrical and Computer Engineering, University of California, Santa Barbara. He is currently a Professor with the Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei. His research interests include multimedia signal processing, multimedia security, computer vision, and digital halftoning. Dr. Guo is a senior member of the IEEE Signal Processing Society. He was a recipient of the Outstanding Youth Electrical Engineer Award from the Chinese Institute of Electrical Engineering in 2011, the Outstanding Young Investigator Award from the Institute of System Engineering in 2011, the Best Paper Award from the IEEE International Conference on System Science and Engineering in 2011, the Excellence Teaching Award in 2009, the Research Excellence Award in 2008, the Acer Dragon Thesis Award in 2005, the Outstanding Paper Awards from the Institute for Public Policy Research and the Computer Vision and Graphic Image Processing in 2005 and 2006, and the Outstanding Faculty Award in 2002 and 2003.
