2010 International Conference on Pattern Recognition

A Gradient Descent Approach for Multi-Modal Biometric Identification

Nalini Ratha, IBM TJ Watson Research Centre, USA, ratha@us.ibm.com

Jayanta Basak, Kiran Kate, Vivek Tyagi, IBM Research - India, India, bjayanta, kirankate, vivetyag@in.ibm.com

1051-4651/10 $26.00 © 2010 IEEE. DOI 10.1109/ICPR.2010.329

Abstract: While biometrics-based identification is a key technology in many critical applications, such as searching for an identity in a watch list or checking for duplicates in a citizen ID card system, there are many technical challenges in building a solution because the database can be very large (often in the hundreds of millions) and because of the intrinsic errors of the underlying biometrics engines. Multi-modal biometrics is often proposed as a way to improve the accuracy of the underlying biometrics. In this paper, we propose a score-based fusion scheme tailored for identification applications. The proposed algorithm uses a gradient descent method to learn weights for each modality such that the weighted sum of genuine scores is larger than the weighted sum of all the impostor scores. During the identification phase, the top K candidates from each modality are retrieved and a superset of identities is constructed. Using the learnt weights, we compute the weighted score for all the candidates in the superset. The highest-scoring candidate is declared the top candidate for identification. The proposed algorithm has been tested on the NIST BSSR1 dataset, and the results in terms of both accuracy and speed (execution time) are shown to be far superior to the published results on this dataset.

I. INTRODUCTION

Biometrics-based recognition systems have already proven useful in many high-security applications. A key step in building a trusted biometrics system is a secure enrolment process that admits no duplicates into the system. For example, citizen ID card systems, passport issuance systems, and voter ID systems cannot function with duplicates in the system. Such 1:N matching systems are also needed for watch-list matching. It is often believed that one can take a 1:1 biometrics system and iterate over the database to pick the best-matching cases. Such an approach can be non-scalable and error-prone.

Multi-biometrics systems have been proposed to improve the accuracy of both verification and identification systems. While multi-biometrics for verification has received considerable interest in the research community, very little research has been reported on identification systems involving biometrics fusion [1]. It is assumed that identification based on such approaches would improve as well, since the core biometric authentication performance improves with multi-biometrics.

Biometrics fusion can happen at several levels [2]: decision level, score level, feature level, and signal level. In decision-level fusion, only the decisions from the biometrics engines are assumed to be available [3], [4]. At score level, the matching score (similarity or dissimilarity) is available for fusion [5], [6]. These two methods tend to be highly popular because they do not require any prior knowledge of the underlying matching algorithm. The next level of fusion integrates features from different biometrics engines. At the lowest level of integration, the signals acquired for a modality can be fused to improve the basic signal or to supplement it with additional information. Feature-level and signal-level fusion, while promising, have not been the focus of much research. Similarly, as pointed out earlier, most current research is focused on the verification problem. For identification systems, rank-based fusion is often also considered, where the candidate list generated by each biometrics classifier is the input to the fusion stage [7], [3].

In this paper, we focus on score-level fusion for identification problems. There are two stages in the proposed algorithm. In the training phase, we learn a weight for each modality such that the weighted score of the genuine candidate is larger than the weighted score of the impostors. The weights are learnt using a gradient descent method. In the testing stage, these weights are applied to the members of the candidate list from each modality. Our method has been tested on the NIST BSSR1 dataset. We compare our results with published results on this dataset and show that our algorithm is better in terms of both accuracy and speed.

The rest of the paper is organized as follows. In Section II, we formulate the weight learning problem and present the gradient descent algorithm for learning these weights. The database used in testing the proposed algorithm and the results are presented in Section III. We present conclusions from our work in Section IV.

II. GRADIENT DESCENT APPROACH

Let $x$ be a query sample for an individual, and let $r(x, y)$ be the matching score when an individual $y$ is matched against $x$. We can have $M$ different modalities. For example, with two different fingerprint matchers and two different face matchers, we have $M = 4$. Let $r_i(x, y)$ denote the matching score of the individual $x$ against $y$ for the $i$th modality. We formulate score fusion such that the total matching score of an individual $x$ matched against $x$ is the highest when compared with the matching scores against other individuals. We assign a weight $w_i$ to each modality. The task is to determine these weights such that, for any $x$, the weighted sum of the matching scores against that particular $x$ is the maximum. In other words,

\[ \sum_i w_i r_i(x, x) > \sum_i w_i r_i(x, y) \tag{1} \]

for all $y \neq x$. In order to have a margin of separation between the maximum and the next maximum value, we modify the condition as

\[ \sum_i w_i r_i(x, x) > \lambda \sum_i w_i r_i(x, y) \tag{2} \]

where $\lambda = 1 + \epsilon$, $\epsilon > 0$ being a small constant. In order to determine the weights, we first define an error measure for a pair of samples $(x, y)$. If the condition in Equation 2 is satisfied for a pair $(x, y)$, there is no error; otherwise the pair contributes to the error measure. The error measure is given as

\[ e_{xy} = \begin{cases} 0 & \text{if } \sum_{i=1}^{M} w_i \left( r_i(x, x) - \lambda r_i(x, y) \right) > 0 \\ \sum_{i=1}^{M} w_i \left( \lambda r_i(x, y) - r_i(x, x) \right) & \text{otherwise} \end{cases} \tag{3} \]

The objective of the gradient descent based approach is to adapt the weights such that the total error thus defined is minimized. We define the total error as

\[ E = \frac{1}{2} \sum_{x,y} e_{xy}^2 \tag{4} \]

Since $e_{xy}$ is not symmetric, it is not possible to perform linear regression over $w_i$ to determine the optimal weights. We determine the optimal weights by gradient descent, given as

\[ \Delta w_i = -\eta \frac{\partial E}{\partial w_i} \tag{5} \]

where $\eta$ is an adaptation or learning rate. Evaluating the partial derivative, we obtain

\[ \Delta w_i = \eta \sum_{x,y} e_{xy} \left( r_i(x, x) - \lambda r_i(x, y) \right) \tag{6} \]

We can observe that the pairs of samples $(x, y)$ for which $e_{xy} = 0$ do not contribute to the weight adaptation. For a pair with $e_{xy} > 0$, a modality $i$ whose own scores violate the margin, $r_i(x, x) < \lambda r_i(x, y)$, makes a negative contribution to $\Delta w_i$ and so has its weight reduced; a modality that satisfies the margin has its weight increased. The convergence depends on the selection of $\eta$. For a very small $\eta$, the convergence can be slow, whereas for a relatively large value of $\eta$, the weights may not converge at all. We determine $\eta$ automatically using a line search method. We adapt the weights in each iteration such that

\[ E + \Delta E = 0 \tag{7} \]

In other words, assuming the local error surface to be linear, we change the weights such that the total error reduces to zero. Further, we have

\[ \Delta E = \sum_i \frac{\partial E}{\partial w_i} \Delta w_i \tag{8} \]

From Equation 5, we have

\[ \Delta E = -\eta \sum_{i=1}^{M} \left( \frac{\partial E}{\partial w_i} \right)^2 \tag{9} \]

Using the linear error surface, from Equations 7 and 9, we have

\[ \eta = \frac{E}{\sum_{i=1}^{M} \left( \frac{\partial E}{\partial w_i} \right)^2} \tag{10} \]

Expanding the partial derivative, we obtain the learning (adaptation) rate $\eta$ as

\[ \eta = \frac{\sum_{x,y} e_{xy}^2}{2 \sum_{i=1}^{M} \left( \sum_{x,y} e_{xy} \left( r_i(x, x) - \lambda r_i(x, y) \right) \right)^2} \tag{11} \]

In the vicinity of the minima of the error surface, the denominator of Equation 11 becomes very small, which can cause instability in the value of $\eta$. We therefore modify the value of $\eta$ as

\[ \eta = \frac{\sum_{x,y} e_{xy}^2}{1 + 2 \sum_{i=1}^{M} \left( \sum_{x,y} e_{xy} \left( r_i(x, x) - \lambda r_i(x, y) \right) \right)^2} \tag{12} \]

During the training phase, we adapt the weights iteratively until the change in the weights falls below a specified threshold or a specified number of iterations is reached.

Once the weights are obtained from the training data, we use them on the test data. For each query, we retrieve the top K samples for each modality and form the superset of these samples. In the superset, some of the samples may not have scores for all modalities. We fill each missing score with the minimum score that was retrieved for the respective modality. For example, let the retrieved samples be A, B, and C in the fingerprint modality and B, C, and D in the face modality, with the orders A > B > C and B > C > D respectively. We do not set the face score of A to zero, nor the finger score of D to zero. Instead, we set the face score of A equal to that of D and the finger score of D equal to that of C (the minimum scores). Here the size of the superset is 4. Once we have the superset with the respective scores, we compute the weighted aggregated scores using the weights learned during the training phase. We then take the sample with the maximum aggregated score as the matched identity. We also obtain the top k identities according to the aggregated score and check whether the identity of the query sample is among them, for a specific value of K.
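The training procedure of this section can be sketched in a few lines. The following is a minimal illustration rather than the authors' implementation: it applies the error measure of Equation 3, the update direction of Equation 6, and the stabilised rate of Equation 12. The function name `learn_weights`, the equal-weight initialisation, the default margin `lam = 1.05`, and the stopping parameters `max_iter` and `tol` are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

# One training pair: (genuine scores r_i(x, x), impostor scores r_i(x, y)),
# each holding one similarity score per modality.
Pair = Tuple[Sequence[float], Sequence[float]]


def learn_weights(pairs: List[Pair], lam: float = 1.05,
                  max_iter: int = 200, tol: float = 1e-8) -> List[float]:
    """Learn one weight per modality (hypothetical helper, not from the paper)."""
    m = len(pairs[0][0])
    w = [1.0 / m] * m  # assumed initialisation: equal weights
    for _ in range(max_iter):
        total_sq_err = 0.0
        # direction[i] accumulates e_xy * (r_i(x, x) - lam * r_i(x, y)), as in Eq. 6
        direction = [0.0] * m
        for genuine, impostor in pairs:
            diff = [g - lam * s for g, s in zip(genuine, impostor)]
            margin = sum(wi * d for wi, d in zip(w, diff))
            e = 0.0 if margin > 0 else -margin  # error measure, Eq. 3
            if e == 0.0:
                continue  # satisfied pairs do not contribute
            total_sq_err += e * e
            for i in range(m):
                direction[i] += e * diff[i]
        if total_sq_err == 0.0:
            break  # every pair satisfies the margin condition of Eq. 2
        # Stabilised adaptation rate, Eq. 12
        eta = total_sq_err / (1.0 + 2.0 * sum(d * d for d in direction))
        step = [eta * d for d in direction]
        w = [wi + si for wi, si in zip(w, step)]
        if max(abs(s) for s in step) < tol:
            break  # weight change below the assumed threshold
    return w
```

Because the rate of Equation 12 shrinks with the squared error, the steps become very small near a solution, so a cap on the number of iterations is a practical stopping rule alongside the threshold on the weight change.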


III. EXPERIMENTAL RESULTS

We have evaluated the identification performance of the gradient descent method on the public-domain NIST BSSR1 dataset [8]. This dataset contains multi-modal scores (two fingerprint and two face) for 517 users (NIST-517). It also contains scores from two fingerprint matchers (left and right index fingerprints) for 6000 users (NIST-Finger) and scores from two face matchers for 3000 users (NIST-Face). We have compared the gradient descent approach for fusion with the individual modalities, a score fusion method, and a rank fusion method. The score fusion technique is the likelihood-ratio-based identification proposed in [1], referred to as LRT-GMM in this section. The rank fusion technique is the well-known highest rank method [9], which assigns each user the minimum of the ranks assigned by the different matchers. In accordance with [1], we observed that this rank fusion technique gives better performance than other methods such as the sum rule, Borda count, and logistic regression for ranks greater than the number of matchers.

The cumulative match characteristic (CMC) curves in Fig. 1 and Fig. 2 show the results for the NIST-Face and NIST-Finger datasets respectively. Since the NIST-517 dataset is very small and both our method and LRT-GMM achieved 100% rank-1 accuracy on it, we have not reported results on that dataset. The identification accuracies are averages over 20 trials, where each trial randomly splits the users in the dataset into one half for training and one half for testing. The value of K is 50 for all these experiments. While the performance of our method was good on the original dataset, we observed that it improved after a simple preprocessing step: raising each score to a positive power. We chose the exponents empirically and report them with the results (1.2 for NIST-Face and 0.5 for NIST-Finger, as given in the legends of Fig. 1 and Fig. 2).

The CMC curves and Table I indicate that our method outperforms the existing methods at all values of rank. For the NIST-Face dataset, the rank-1 to rank-10 accuracies of our method are approximately 0.7-1% better than those of the LRT-GMM technique. The rank-1 accuracy of our method is more than 4% better than that of the highest rank method (87.91% versus 83.06% in Table I), and the rank-2 to rank-10 accuracies are also better by 0.3-1%. Clearly, the fusion achieves a large improvement in identification accuracy over the individual matchers at all values of rank. Similar results are observed for the NIST-Finger dataset. Our method achieves an improvement of 0.5-0.7% in accuracy over LRT-GMM consistently from rank 1 to 10. The rank-1 accuracy is around 9% greater than that of the highest rank fusion.

Table II gives the execution time in seconds for the different methods. We measured the execution time for the calculation of rank-1 accuracy for a single trial of the experiment with a half-half training-test split of the dataset. The table clearly shows that our method is much faster than the LRT-GMM method. The large difference in execution time on the NIST-Finger dataset shows that our method is more scalable than LRT-GMM. Note that the highest rank and gradient descent algorithms were implemented in MATLAB and LRT-GMM in C++. The higher execution time of the highest rank method may be attributed to its sub-optimal implementation in MATLAB. In real-life identification scenarios, the training is performed off-line, and hence only the efficiency of the testing phase affects the identification performance. The testing phase of our method involves ranking according to the weighted sum of the scores of the individual modalities and hence is efficient and scalable.
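The test-phase fusion described in Section II (superset construction, minimum-score filling, and ranking by weighted sum) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function name `fuse_candidates`, the dictionary-based input layout, and the alphabetical tie-breaking are all hypothetical choices.

```python
from typing import Dict, List, Sequence, Tuple


def fuse_candidates(top_k: Sequence[Dict[str, float]],
                    weights: Sequence[float]) -> List[Tuple[str, float]]:
    """Rank the superset of per-modality candidates by weighted score.

    top_k[i] maps candidate identity -> similarity score for modality i and
    holds only the K candidates retrieved for that modality (assumed layout).
    """
    # Superset of all identities retrieved by any modality.
    superset = set()
    for scores in top_k:
        superset.update(scores)
    # A candidate missing from a modality receives that modality's minimum
    # retrieved score instead of zero.
    floors = [min(scores.values()) for scores in top_k]
    fused = {
        ident: sum(w * scores.get(ident, floor)
                   for w, scores, floor in zip(weights, top_k, floors))
        for ident in superset
    }
    # Highest fused score first; ties broken by identity (an assumption)
    # so that the output order is deterministic.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# The A/B/C/D example from Section II, with made-up scores and weights.
finger = {"A": 0.9, "B": 0.8, "C": 0.5}
face = {"B": 0.7, "C": 0.6, "D": 0.4}
ranked = fuse_candidates([finger, face], weights=[0.5, 0.5])
```

With these made-up scores, A inherits D's face score and D inherits C's finger score, and the candidates come out in the order B, A, C, D. The cost per query is linear in the size of the superset (at most M times K identities) plus one sort.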

Figure 1. Comparison of CMC curves on NIST-Face database. [Plot not reproduced. Axes: rank-k identification accuracy (%) versus rank k, for k = 1 to 10. Curves: Face Matcher 1, Face Matcher 2, Highest Rank Fusion, LRT-GMM, Gradient Descent (power = 1.2).]
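For reference, the highest rank baseline compared in these experiments assigns each identity the best (minimum) rank that any matcher gives it. A minimal sketch follows; the function name `highest_rank_fusion` and the alphabetical tie-breaking are assumptions for illustration, since ties are frequent with this rule and the text does not say how they are resolved.

```python
from typing import Dict, List, Sequence, Tuple


def highest_rank_fusion(rankings: Sequence[Sequence[str]]) -> List[Tuple[str, int]]:
    """Fuse best-first candidate lists by keeping each identity's minimum rank.

    rankings[i] is the list of identities returned by matcher i, best first.
    """
    best: Dict[str, int] = {}
    for ranking in rankings:
        for rank, ident in enumerate(ranking, start=1):
            if ident not in best or rank < best[ident]:
                best[ident] = rank
    # Sort by fused rank; ties are broken by identity (an assumption).
    return sorted(best.items(), key=lambda item: (item[1], item[0]))


fused = highest_rank_fusion([["A", "B", "C"], ["B", "C", "D"]])
```

In this small example both A and B receive fused rank 1, which illustrates why a rank-only rule can struggle at rank 1 when there are only a few matchers.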

Figure 2. Comparison of CMC curves on NIST-Finger database. [Plot not reproduced. Axes: rank-k identification accuracy (%) versus rank k, for k = 1 to 10. Curves: Left Index Finger, Right Index Finger, Highest Rank Fusion, LRT-GMM, Gradient Descent (power = 0.5).]
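The rank-k identification accuracies plotted in the CMC curves can be computed from the fused candidate lists as sketched below. This is a generic illustration of the CMC metric rather than the evaluation script used in the paper; the function name `cmc_curve` and its argument layout are assumptions, and at least one query is assumed.

```python
from typing import List, Sequence


def cmc_curve(ranked_lists: Sequence[Sequence[str]],
              true_ids: Sequence[str], max_rank: int = 10) -> List[float]:
    """Return rank-k identification accuracy (%) for k = 1..max_rank.

    ranked_lists[q] is the best-first candidate list for query q and
    true_ids[q] is the correct identity of that query.
    """
    hits = [0] * max_rank
    for ranked, truth in zip(ranked_lists, true_ids):
        top = list(ranked[:max_rank])
        if truth in top:
            position = top.index(truth)  # 0-based rank of the true identity
            # A hit at rank p also counts as a hit at every larger rank.
            for k in range(position, max_rank):
                hits[k] += 1
    return [100.0 * h / len(true_ids) for h in hits]


curve = cmc_curve(
    [["A", "B", "C"], ["B", "A", "C"], ["C", "B", "A"], ["D", "E", "F"]],
    ["A", "A", "A", "A"],
    max_rank=3,
)
```

The curve is non-decreasing in k by construction, and a query whose true identity is absent from its candidate list counts as a miss at every rank.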

Table I. COMPARISON OF RANK-k IDENTIFICATION ACCURACY (%).

Dataset  Method            k=1    k=2    k=3
Face     Matcher 1         84.47  87.25  88.57
Face     Matcher 2         81.13  84.57  86.08
Face     Highest Rank      83.06  89.23  90.68
Face     LRT-GMM           87.24  89.48  90.64
Face     Gradient Descent  87.91  90.51  91.65
Finger   Matcher 1         81.80  83.73  84.56
Finger   Matcher 2         88.71  90.18  90.87
Finger   Highest Rank      85.71  95.00  95.49
Finger   LRT-GMM           94.33  95.09  95.40
Finger   Gradient Descent  94.94  95.65  95.97

Table II. COMPARISON OF EXECUTION TIME (IN SECONDS) FOR DIFFERENT FUSION METHODS.

Method            NIST-Face Train  NIST-Face Test  NIST-Finger Train  NIST-Finger Test
LRT-GMM           338.15           62.46           1809.05            593.18
Gradient Descent  277.14           13.76           194.58             11.54
Highest Rank      N/A              168.88          N/A                671.67

IV. CONCLUSION

Designing effective and scalable biometric fusion algorithms for identification is an important and challenging problem. At the level of score fusion, the weighted sum of scores is a well-known method. In this paper, we proposed a gradient descent technique to learn these weights for the individual modalities. We considered a linear combination of the modalities to perform the identification; however, the method can be extended to incorporate non-linear combinations as well. We conducted experiments on the NIST BSSR1 dataset and demonstrated that the gradient descent approach performs better than the state-of-the-art identification algorithms in terms of accuracy and speed. As a future study, we will perform the experiments on other multi-modal biometric datasets.

REFERENCES

[1] K. Nandakumar, A. K. Jain, and A. Ross, "Fusion in multibiometric identification systems: What about the missing data?," in ICB '09: Proceedings of the Third International Conference on Biometrics, pp. 743-752, 2009.
[2] A. K. Jain, K. Nandakumar, and A. Ross, "Score normalization in multimodal biometric systems," Pattern Recognition, vol. 38, no. 12, pp. 2270-2285, 2005.
[3] M. L. Gavrilova and M. M. Monwar, "Fusing multiple matcher's outputs for secure human identification," International Journal of Biometrics, vol. 1, no. 3, pp. 329-348, 2009.
[4] S. Prabhakar and A. K. Jain, "Decision-level fusion in fingerprint verification," Pattern Recognition, vol. 35, no. 4, pp. 861-874, 2002.
[5] M. Villegas and R. Paredes, "Score fusion by maximizing the area under the ROC curve," in Pattern Recognition and Image Analysis: 4th Iberian Conference, IbPRIA, 2009.
[6] K. Nandakumar, Y. Chen, S. C. Dass, and A. K. Jain, "Likelihood ratio based biometric score fusion," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 30, no. 2, pp. 342-347, 2008.
[7] A. Abaza and A. Ross, "Quality based rank-level fusion in multibiometric systems," in Proc. of IEEE BTAS, 2009.
[8] National Institute of Standards and Technology, "NIST biometric scores set release 1," http://www.itl.nist.gov/iad/894.03/biometricscores/, 2004.
[9] T. K. Ho, J. J. Hull, and S. N. Srihari, "Decision combination in multiple classifier systems," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 16, no. 1, pp. 66-75, 1994.
