Proceedings of the Eighth International Conference on Machine Learning and Cybernetics, Baoding, 12-15 July 2009
A DECISION THEORY BASED MULTIMODAL BIOMETRIC AUTHENTICATION SYSTEM USING WAVELET TRANSFORM
ANWESHA BHATTACHARJEE2, MONISHA SAGGI2, RAMYA BALASUBRAMANIAM1, MR. AKASH TAYAL1, DR. ASHWINI KUMAR1
1 Department of Electronics and Communication Engineering, 2 Department of Computer Science and Engineering
Indira Gandhi Institute of Technology, Guru Gobind Singh Indraprastha University, Delhi, India
E-MAIL:
[email protected],
[email protected],
[email protected],
[email protected],
[email protected]
Abstract: The multiresolution nature and energy compaction of the wavelet transform make it particularly suitable for the analysis of biological features, both behavioral and physiological. This paper proposes the use of Daubechies 3 and 4 wavelets to analyze iris patterns and speech samples, applying decision theory when the stand-alone speech identification and iris recognition systems yield conflicting results.
Keywords: Daubechies wavelets; Pitch period estimation; Decision theory; Feature extraction
1. Introduction

The entire reason behind the use of biometric identification is security. Most biometric systems today are unimodal, i.e. they rely on a single feature to identify an entity. Even if some features cannot be impersonated, identification systems are never completely foolproof, so authentication might fail. Solayappan and Latifi [1, 2] provide a comparative study of various unimodal biometric recognition techniques [3]. In the multimodal biometric system presented here, we couple iris and speech recognition to provide a tool for biometric identification of an individual [4, 5]. The novelty of our approach lies in the use of the wavelet transform for the analysis of both the iris image and the speech signal: instead of Gabor filters or phasor coefficients we use Daubechies wavelets at level-4 decomposition. Sections 2 and 3 describe the preprocessing of the iris image and speech signal. Section 4 deals with the analysis, and Section 5 describes feature extraction and the wavelet family used. Section 6 discusses decision theory and the final authentication. Section 7 then lists our results and the performance evaluation.

2. Iris Localization

The eye image is subjected to preprocessing before we proceed with the most intensive part of feature extraction, the segmentation. The image cannot be used directly when the iris is highly pigmented; in such cases contrast has to be generated through a preprocessing technique called contrast stretching. Contrast is defined as the difference between the gray-scale levels of the constituent pixels in an image. Contrast stretching can be linear or non-linear; we use gamma adjustment, particularly with a gamma greater than 1, to create greater contrast in the darker band of gray-scale levels. Segmentation then has three chief objectives:
(1) Finding the centers and radii of the iris and pupil; these regions are not necessarily concentric [7].
(2) Removing the occluding eyelashes, both bunched and isolated [7].
(3) Removing the specular reflections.
These steps have already been proposed in Daugman's algorithm [11, 12, 13]. The circular contours of the iris and pupil regions are extracted by performing the circular Hough transform, which narrows the search to circular curves. Before even starting the process of tracing the circular contours in the eye image, we need to find the
978-1-4244-3703-0/09/$25.00 ©2009 IEEE
edges. As proposed by Daugman in his papers on iris recognition [11, 12, 13], we use the Canny edge detection technique to detect all the edges that can be found in the image. The edge mapping we use is gradient based; for this we essentially need binary images with only two gray levels [15, 16]. The orientations of the gradients calculated while performing edge detection are used further to create finer edges, as they help in tracing the local maxima [17]. The Canny edge detection proceeds as follows: we begin with an edge of known cross-section bathed in white Gaussian noise (Fig. 2(a)). We convolve this with a filter whose impulse response could be illustrated by either Fig. 2(b) or (d). We mark the center of an edge at a local maximum in the output of the convolution. The design problem then becomes one of finding the filter which gives the best performance [18].
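A one-dimensional sketch of this edge-localization idea, using a derivative-of-Gaussian as the response filter (a common approximation to Canny's optimal filter; the filter width is an assumption, not a value from the paper):

```python
import numpy as np

def edge_center_1d(signal, sigma=2.0, radius=6):
    """Mark the edge center at the local maximum of the magnitude of the
    response to a derivative-of-Gaussian filter, as in Canny's 1-D
    formulation: convolve, then take the strongest response location."""
    t = np.arange(-radius, radius + 1, dtype=np.float64)
    dog = -t * np.exp(-t**2 / (2.0 * sigma**2))  # derivative of a Gaussian
    response = np.convolve(signal, dog, mode="same")
    return int(np.argmax(np.abs(response)))
```

For a clean step edge the returned index falls on (or within a pixel of) the true edge position.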
The 5×5 Gaussian mask (its entries sum to 159, the normalization factor):

2   4   5   4   2
4   9  12   9   4
5  12  15  12   5
4   9  12   9   4
2   4   5   4   2
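Assuming the mask above is applied with its usual normalization (division by the sum of its entries, 159), smoothing an image with it might look like:

```python
import numpy as np

# The 5x5 Gaussian mask from the text, normalized by its sum (159)
GAUSS_MASK = np.array([
    [2,  4,  5,  4, 2],
    [4,  9, 12,  9, 4],
    [5, 12, 15, 12, 5],
    [4,  9, 12,  9, 4],
    [2,  4,  5,  4, 2],
], dtype=np.float64) / 159.0

def smooth(img):
    """Convolve a grayscale image with the Gaussian mask (zero padding),
    the smoothing step that precedes gradient computation in Canny."""
    padded = np.pad(img.astype(np.float64), 2)
    out = np.zeros(img.shape, dtype=np.float64)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + 5, j:j + 5] * GAUSS_MASK)
    return out
```

Because the mask is normalized, a constant region of the image passes through unchanged.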
The Hough transform (HT) is the most well-known method for circle detection. Let (x, y) be an edge pixel on a circle with center coordinates (a, b) and radius r; then the circle can be expressed as

(x − a)² + (y − b)² = r²  (3)

From Eq. (3), every edge pixel in the image can be mapped into a conic surface in the three-dimensional (a, b, r) parameter space. Using the conventional HT (CHT) [6] to detect circles requires a large amount of computing time to vote in such a 3-D array, i.e., a 3-D accumulator [20, 21]. In particular, for the circular limbic or pupillary boundaries and a set of recovered edge points (x_j, y_j), j = 1, …, n, the Hough transform is defined as

H(a, b, r) = Σ_{j=1}^{n} h(x_j, y_j, a, b, r)  (4)

where h(x_j, y_j, a, b, r) = 1 if (x_j − a)² + (y_j − b)² = r², and 0 otherwise [17].
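The voting scheme of Eq. (4) can be sketched as follows (the angular sampling density is an illustrative choice; rounding to integer bins stands in for the exact equality test):

```python
import numpy as np

def circular_hough(edge_points, shape, radii):
    """Vote edge pixels into a 3-D (a, b, r) accumulator.

    Each edge pixel (x, y) votes for every candidate center (a, b)
    lying at distance r from it, for each candidate radius r."""
    h, w = shape
    acc = np.zeros((h, w, len(radii)), dtype=np.int32)
    thetas = np.linspace(0.0, 2.0 * np.pi, 90, endpoint=False)
    for (x, y) in edge_points:
        for ri, r in enumerate(radii):
            a = np.round(x - r * np.cos(thetas)).astype(int)
            b = np.round(y - r * np.sin(thetas)).astype(int)
            ok = (a >= 0) & (a < w) & (b >= 0) & (b < h)
            # np.add.at accumulates repeated (b, a) index pairs correctly
            np.add.at(acc, (b[ok], a[ok], ri), 1)
    return acc
```

The accumulator peak identifies the center and radius of the dominant circular boundary.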
Figure 2. Canny Edge Detection [18].
The Gaussian filter is defined by the following transfer function:

H(u, v) = e^(−((u − µ_u)² + (v − µ_v)²) / (2(σ_u² + σ_v²)))  (2)
where µ_u is the mean in the u direction, µ_v is the mean in the v direction, σ_u is the standard deviation in the u direction, and σ_v is the standard deviation in the v direction. The Gaussian mask used is the 5×5 mask shown above [19].
Figure 3. Circular Hough Transform [22].
There are two types of eyelashes: bunched eyelashes, removed by simple thresholding, and isolated eyelashes, removed by the Radon transform. In the simple thresholding, all pixel values below a certain level v1 are removed. The Radon transform is essential for tracing or extracting lines from a noisy figure; it uses the parametric form of the line, presented below:
ρ = x cos θ + y sin θ  (5)

where ρ is the smallest (perpendicular) distance of the line from the origin and θ is the angle that the perpendicular makes with the real axis. The Radon transform is given by:

ĝ(ρ, θ) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) δ(ρ − x cos θ − y sin θ) dx dy  (6)

where δ is the Dirac delta function, which is nonzero only where its argument is zero, so we are integrating over exactly those points which lie on the line represented by Eq. (5); g(x, y) is a pixel in the image [23]. The points between these inner and outer boundary contours are interpolated linearly by a homogeneous rubber sheet model, which automatically reverses the iris pattern deformations caused by pupillary dilation or constriction. Under the assumption of uniform iris elasticity (which may be questionable), this normalization maps the iris tissue into a doubly dimensionless coordinate system. The remapping, or normalization, of the iris image I(x, y) from raw coordinates (x, y) to a doubly dimensionless and non-concentric coordinate system (r, θ) can be represented as
Figure 4. Segmented Speech signal
We segment out the blank samples, leaving behind the samples carrying information, which we analyze further using frame analysis. For the frame analysis, we pull out the frames with the highest signal density and use them for pitch period estimation.

4. The Analysis

Once the preprocessing stage has been completed for both the iris and the speech, we move on to the feature extraction stage.
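The silence-removal and frame-selection steps just described can be sketched as follows (the frame length and energy threshold are illustrative assumptions, not values from the paper):

```python
import numpy as np

def select_voiced_frames(signal, frame_len=160, energy_ratio=0.1):
    """Drop silent frames, keeping only frames whose short-time energy
    exceeds a fraction of the maximum frame energy.

    frame_len = 160 corresponds to 20 ms at 8 kHz (an assumption; the
    paper does not state its frame size)."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.sum(frames ** 2, axis=1)          # short-time energy per frame
    keep = energy > energy_ratio * energy.max()   # neglect silence durations
    return frames[keep]
```

The surviving high-energy frames are the ones passed on to pitch period estimation.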
I(x(r, θ), y(r, θ)) → I(r, θ)  (7)

where x(r, θ) and y(r, θ) are defined as linear combinations of the set of pupillary boundary points (x_p(θ), y_p(θ)) determined by the internal active contour and the set of outer boundary points along the limbus (x_s(θ), y_s(θ)) determined by the external active contour describing the iris/sclera boundary:

x(r, θ) = (1 − r) x_p(θ) + r x_s(θ)
y(r, θ) = (1 − r) y_p(θ) + r y_s(θ)  (8)

This homogeneous rubber sheet model maps the iris into a dimensionless, normalized coordinate system that is size-invariant, and therefore invariant to changes in the target distance and the optical magnification factor, as well as to the position of the eye in the image and to pupil dilation (assuming uniform iris elasticity) [6].

3. Speech Segmentation and Frame Analysis

Segmentation and frame analysis are the usual pre-processing steps before pitch-period estimation of the speech signal [25, 26, 27]. Segmentation involves the isolation of the voiced/unvoiced components of the input by neglecting the silence durations [24].
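Returning to the iris: the rubber-sheet remapping of Eqs. (7) and (8) can be sketched as follows (nearest-neighbour sampling and the grid sizes are illustrative choices):

```python
import numpy as np

def rubber_sheet(img, pupil, iris, n_r=16, n_theta=64):
    """Daugman rubber-sheet remap: sample I(x, y) on a normalized
    (r, theta) grid between the pupillary and limbic circles.

    pupil and iris are (cx, cy, radius) triples; the two circles need
    not be concentric."""
    out = np.zeros((n_r, n_theta))
    for ti, theta in enumerate(np.linspace(0, 2 * np.pi, n_theta, endpoint=False)):
        # boundary points on the pupillary and limbic circles
        xp = pupil[0] + pupil[2] * np.cos(theta)
        yp = pupil[1] + pupil[2] * np.sin(theta)
        xs = iris[0] + iris[2] * np.cos(theta)
        ys = iris[1] + iris[2] * np.sin(theta)
        for ri, r in enumerate(np.linspace(0, 1, n_r)):
            # linear combination of the two boundary points (Eq. 8)
            x = (1 - r) * xp + r * xs
            y = (1 - r) * yp + r * ys
            out[ri, ti] = img[int(round(y)) % img.shape[0],
                              int(round(x)) % img.shape[1]]
    return out
```

The output is the size-invariant (r, θ) template described in the text.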
Figure 5. The process of analysis [28].

A new user who enrolls into the system provides an eye image and a speech sample, which undergo the preprocessing and segmentation described above. These processed signals are then subjected to the wavelet analysis which, as mentioned above, is superior to Fourier analysis in terms of energy compaction and its multiresolution approach [30, 32, 34]. After wavelet encoding of the iris pattern and speech sample we obtain the templates, which are stored in the locally generated databases in the system. During authentication, that is, the identification of a user as an authorized user or an impostor, we compare the templates acquired in situ with the templates already stored in the database.
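A sketch of such a template comparison, assuming binarization against the mean coefficient and an illustrative acceptance threshold (the paper does not state its threshold value):

```python
import numpy as np

def binarize(coeffs):
    """Coefficient -> 1 if above the mean of all coefficients, else 0."""
    return (coeffs > coeffs.mean()).astype(np.uint8)

def hamming_distance(t1, t2):
    """Fraction of disagreeing bits between two binary templates."""
    return np.count_nonzero(t1 != t2) / t1.size

def iris_match(c1, c2, threshold=0.32):
    """Declare a match when the Hamming distance falls below the
    threshold (0.32 is a commonly cited value, not from the paper)."""
    return hamming_distance(binarize(c1), binarize(c2)) < threshold
```

Identical templates give distance 0; complementary ones give distance 1.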
For iris template comparisons, we use the Hamming distance as the criterion for decision making. A coefficient is set to one if its value is higher than the mean coefficient of the entire image; otherwise it is zero. The Hamming distance [29] is calculated between the two templates and the result is compared to a threshold, which determines whether the iris is a match or not. For the speech template, however, we use pitch period estimation: we find the cross-correlation coefficient of the two templates. If the coefficient is 1, the speech signals are a match; otherwise they are not [24].

5. Feature Extraction Using Wavelets

For the analysis of the iris and speech templates we perform wavelet analysis, which is a multiresolution analysis [30]. The iris pattern is marked by many discontinuities, such as spikes, so it is better modeled using wavelets, which do not extend infinitely in space. Similarly, the speech signal is non-stationary and cannot be analyzed using the Fourier series or transform, which models time-dependent signals whose frequency components remain the same throughout the length of the signal. We consider a mother wavelet function ψ(t), with a translation factor τ and a scaling factor s. The translation factor determines the movement of the mother wavelet over the input vector, and the scaling factor determines the expanse of the wavelet, whether it is highly compressed or highly expanded:

ψ_{s,τ}(t) = (1/√s) ψ((t − τ)/s)  (9)

Like the basis functions of other transforms, this function serves as the basis, the difference being that it exists only in a finite interval; it can be expanded and contracted by the factor s to change its resolution, and it can be time-shifted by the factor τ [31].
If the input vector is v, then for two distinct basis vectors v and w,

⟨v, w⟩ = Σ_n v_n w_n = 0  (10)

If v is a function of time t, then

⟨v, w⟩ = ∫ v(t) w*(t) dt = 0  (11)

This represents the property of orthogonality [32]. We then have orthonormal vectors, in which the vectors are orthogonal to each other and the wavelet function additionally has unit length:

⟨φ, φ⟩ = ∫ φ(t) φ*(t) dt = 1  (12)
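These orthonormality relations can be checked numerically. The sketch below constructs the 4-tap Daubechies filter pair (db2 in MATLAB naming; an illustrative stand-in for the paper's db3/db4, which follow the same construction with 6 and 8 taps) and performs the level-by-level decomposition:

```python
import numpy as np

# Scaling (lowpass) coefficients of the 4-tap Daubechies filter
s3 = np.sqrt(3.0)
H = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4.0 * np.sqrt(2.0))
# Quadrature-mirror highpass: g_k = (-1)^k * h_{N-1-k}
G = np.array([H[3], -H[2], H[1], -H[0]])

def dwt_level(x):
    """One decomposition level: periodic filtering + downsampling by 2,
    returning (approximation, detail) coefficients."""
    n = len(x)
    approx = np.zeros(n // 2)
    detail = np.zeros(n // 2)
    for i in range(n // 2):
        for k in range(4):
            approx[i] += H[k] * x[(2 * i + k) % n]
            detail[i] += G[k] * x[(2 * i + k) % n]
    return approx, detail

def decompose(x, levels=4):
    """Multi-level decomposition, as used to build the templates."""
    details = []
    for _ in range(levels):
        x, d = dwt_level(x)
        details.append(d)
    return x, details
```

For a constant signal the detail coefficients vanish (the filter has a vanishing moment), and each filter has unit energy, reflecting the orthonormality of Eq. (12).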
For the dbN family, the support length of ψ and φ is 2N − 1, and the number of vanishing moments of ψ is N. If the moments of the wavelet function are zero up to a certain order K − 1, then for any polynomial of order lower than K all its wavelet coefficients will be zero [33]. The choice of wavelet algorithm depends on the application. The Haar or Gabor wavelet algorithm has the advantage of being simple to compute and easy to understand. The Daubechies algorithm has a slightly higher computational overhead and is conceptually more complex, and there is overlap between iterations in the Daubechies transform step. This overlap allows the Daubechies algorithm to pick up detail that is missed by the Haar or Gabor wavelet algorithm [34]. We use the Daubechies wavelets, which are orthonormal; specifically, we use the db3 and db4 wavelets and carry out a level-four decomposition in our wavelet analysis. The lowpass filter is given by [35]
h = (h_0, h_1, …, h_{2N−1})  (13)

The highpass filter is given by the quadrature-mirror relation

g_k = (−1)^k h_{2N−1−k},  k = 0, …, 2N − 1  (14)

Once we have the filter coefficients, we can create the synthesis vector and then carry out the four levels of decomposition to obtain the templates for further analysis.

6. The Final Authorization

To decide whether the user is authentic or not, the decisions of both the iris and speech systems must be weighed. If the decisions from both are identical (i.e. both authenticate the user or both detect the user as an imposter), the final choice is easy to make. The problem arises, however, when the two systems do not concur: the iris recognizes a match with user A while the speech recognizes a match with user B, or one authenticates and the other does not. To resolve this problem, we make use of decision theory [36]. We convert the two output templates (for iris and speech
templates) into probability distributions and use the min-max criterion [37]. The min-max decision region is defined by
(15)

for all Z2 ≠ Z1. In other words, Z1 is the decision region that yields the minimum cost for the least favorable P{m1}. Three cases arise, and we examine how to resolve each in turn.

Case I: Iris matches user A, speech matches user B {(1, 1), (Name1 ≠ Name2)}. In this case, we calculate the mean square error (MSE) between the two iris templates and between the two speech templates. The decision follows from whichever MSE is least: the final decision identifies user A if the MSE of the iris is lower; otherwise it matches user B.

MSE(T, T′) = (1/n) Σ_{i=1}^{n} (T_i − T′_i)²  (16)

where T_A and T′_A are the two iris templates of user A, T_B and T′_B are the two speech templates of user B, and the output is the name of the match with minimum MSE.

Case II: Iris states imposter, speech matches user A {(0, 1), (Imposter, Name1)}. Here, we calculate the mean square error of the two respective pairs of templates. If the MSE of the iris Hamming distance matrix is lower, the user is an imposter; if the MSE of the speech matrix is lower, the user is matched.
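The tie-breaking rule of Case I can be sketched as follows (function and variable names are illustrative):

```python
import numpy as np

def mse(t1, t2):
    """Mean square error between two equal-length templates (Eq. 16)."""
    return np.mean((np.asarray(t1, float) - np.asarray(t2, float)) ** 2)

def resolve_conflict(iris_pair, speech_pair, iris_name, speech_name):
    """Case I of the decision rule: when iris matches user A and speech
    matches user B, the modality with the lower template MSE wins."""
    return iris_name if mse(*iris_pair) < mse(*speech_pair) else speech_name
```

Cases II and III apply the same comparison, with "imposter" substituted for one of the names.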
samples and, barring one or two samples, as shown by the graph plotting time taken in seconds on the y-axis and sample number on the x-axis, db4 took less time to encode.
Figure 6. Encoding time
The success rate of the system was found to be 99.6%, with a false reject rate of 0.4%. This was evaluated by comparing the stored templates in the database with templates generated from samples taken randomly or in situ. The PSNR (peak signal-to-noise ratio) between the original image and the images reconstructed from the db3 and db4 fourth-level approximation coefficients was compared, and the PSNR of the db4 reconstruction was found to be better than that of the db3 reconstruction. This is shown in the following figure:
where T_A and T′_A are the two iris templates of user A, and T_B and T′_B are the two speech templates of user B; the decision is that of the template pair with minimum MSE.

Case III: Iris matches user A, speech states imposter {(1, 0), (Name1, Imposter)}; this is treated similarly to Case II. The final decision is then displayed, and the user is either authenticated or rejected.

7. Performance Evaluation and Results

We carried out the analysis in MATLAB 7.0.1 and were able to compare the extraction patterns of the db3 and db4 wavelet transforms and their computation times. For the preliminary performance analysis of our proposed method, this paper uses the CASIA-IrisV3 database collected by the Chinese Academy of Sciences' Institute of Automation (CASIA), containing three types of iris images [38]. The time complexity of the encoding process for iris and speech with respect to db3 and db4 was tested for twelve
Figure 7. Comparison of PSNR values of Db3 and Db4
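The PSNR values compared in Fig. 7 can be computed as:

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between an original image and
    its wavelet reconstruction; higher means better fidelity."""
    err = np.mean((original.astype(np.float64)
                   - reconstructed.astype(np.float64)) ** 2)
    if err == 0:
        return np.inf  # identical images
    return 10.0 * np.log10(peak ** 2 / err)
```

A reconstruction that deviates less from the original yields a larger PSNR, which is the sense in which db4 outperformed db3 here.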
Figure 8(a) Original image
Figure 8(b) Image with noise added
In the eye images we added 'salt and pepper' noise and observed the modification that occurred when the noise was added to the original image. The success rate of authentication of the system with noisy iris images was 92%, compared with the stand-alone iris recognition system, whose noise tolerance is extremely low. The segmentation time and search techniques have scope for further optimization, which the authors propose to work on in the future.
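The noise corruption used in this experiment can be reproduced with a sketch like the following (the noise density is an assumption; the paper does not state one):

```python
import numpy as np

def salt_and_pepper(img, density=0.05, rng=None):
    """Corrupt a grayscale image with salt-and-pepper noise: a fraction
    `density` of pixels is forced to 0 or 255 with equal probability."""
    rng = np.random.default_rng(rng)
    out = img.copy()
    mask = rng.random(img.shape) < density   # pixels to corrupt
    salt = rng.random(img.shape) < 0.5       # salt (255) vs pepper (0)
    out[mask & salt] = 255
    out[mask & ~salt] = 0
    return out
```

Running the authentication pipeline on images corrupted this way is how the 92% noisy-image success rate was measured.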
References
[1] Internet report, "Biometric Identification Systems", Technovelgy.com.
[2] Solayappan, N., Latifi, S., "A Survey of Unimodal Biometric Methods", University of Nevada, Las Vegas.
[3] Ross, A., Jain, A. K., "Multimodal Biometrics: An Overview", Proc. of 12th European Signal Processing Conference (EUSIPCO), Vienna, Austria, pp. 1221-1224, September 2004.
[4] Internet report, "Biometrics", Wikipedia.
[5] Khan, I., "Multimodal Biometric - Is Two Better than One?", Findbiometrics.com.
[6] Daugman, J., "Probing the Uniqueness and Randomness of Iris Codes: Results from 200 Billion Iris Pair Comparisons", Proceedings of the IEEE, vol. 94, no. 11, November 2006.
[7] Gupta, P., Mehrotra, H., Rattani, A., Chatterjee, A., Kaushik, A. K., "Iris Recognition using Corner Detection".
[8] Boles, W. W., Boashash, B., "A Human Identification Technique Using Images of the Iris and Wavelet Transform", IEEE Transactions on Signal Processing, vol. 46, no. 4, April 1998.
[9] Ballard, D. H., Brown, C. M., "Computer Vision", Englewood Cliffs, NJ: Prentice-Hall, 1982.
[10] Pratt, W. K., "Digital Image Processing", New York: Wiley, 1978.
[11] Daugman, J., "High Confidence Visual Recognition of Persons by a Test of Statistical Independence", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, no. 11, pp. 1148-1161, 1993.
[12] Daugman, J., "Biometric Decision Landscapes", Technical Report, www.cl.cam.ac.uk/TechReports.
[13] Daugman, J., "Complete Discrete 2-D Gabor Transforms by Neural Networks for Image Analysis and Compression", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 36, no. 7, 1988.
[14] Daugman, J., Downing, C., "Epigenetic Randomness, Complexity and Singularity of Human Iris Patterns", The Computer Laboratory, University of Cambridge.
[15] Daugman, J., "The Importance of Being Random: Statistical Principles of Iris Recognition", The Computer Laboratory, University of Cambridge.
[16] Daugman, J., "New Methods in Iris Recognition", IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, vol. 37, no. 5, October 2007.
[17] Wildes, R. P., "Iris Recognition: An Emerging Biometric Technology", Proceedings of the IEEE, vol. 85, no. 9, September 1997.
[18] Canny, J., "A Computational Approach to Edge Detection", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-8, no. 6, November 1986.
[19] Green, B., "Canny Edge Detection Tutorial", Internet report, Drexel.edu.
[20] Chen, T.-C., Chung, K.-L., "An Efficient Randomized Algorithm for Detecting Circles", Computer Vision and Image Understanding, vol. 83, pp. 172-191, 2001.
[21] Tian, Q.-C., Pan, Q., Cheng, Y.-M., Gao, Q.-X., "Fast Algorithm and Application of Hough Transform in Iris Segmentation", Proceedings of the Third International Conference on Machine Learning and Cybernetics, Shanghai, 26-29 August 2004.
[22] Just, S., Pedersen, K., "Circular Hough Transform", Aalborg University, Vision, Graphics, and Interactive Systems, November 2007.
[23] Toft, P., "The Radon Transform - Theory and Implementation", PhD Thesis, IMM, DTU, 1996.
[24] Bernadin, S. L., Foo, S. Y., "Wavelet Processing for Pitch Period Estimation", Proceedings of the 38th Southeastern Symposium on System Theory, Tennessee Technological University.
[25] Johnson, D., "Modeling the Speech Signal".
[26] Rabiner, L. R., "Speaker Recognition".
[27] Internet report, "WAV", Wikipedia.
[28] Ma, L., Tan, T., Wang, Y., Zhang, D., "Personal Identification Based on Iris Texture Analysis", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 12, December 2003.
[29] Yang, C. H., "Hamming Distance", Thesis on Prioritized Model Checking, Stanford.edu.
[30] Strang, G., Nguyen, T., "Wavelets and Filter Banks", Wellesley-Cambridge Press.
[31] Boggess, A., Narcowich, F. J., "A First Course in Wavelets with Fourier Analysis".
[32] Polikar, R., "The Wavelet Tutorial".
[33] Qiao, F., Milam, R., "Moments and Vanishing Wavelet Moments", Version 2.1, June 7, 2005.
[34] Internet report, "Daubechies 4 Wavelet", bearcave.com.
[35] Sandberg, K., "The Daubechies Wavelet Transform", Dept. of Applied Mathematics, University of Colorado at Boulder.
[36] Bahl, L. R., Jelinek, F., Mercer, R. L., "A Maximum Likelihood Approach to Continuous Speech Recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence, 1983.
[37] Melsa, J. L., Cohn, D. L., "Decision and Estimation Theory", McGraw-Hill, 1978.
[38] "CASIA-IrisV3", www.cbsr.ia.ac.cn/IrisDatabase.htm.