Analysis of i-vector Length Normalization in Speaker Recognition Systems
Daniel Garcia-Romero, Carol Espy-Wilson
Department of Electrical & Computer Engineering
University of Maryland, College Park, MD, USA
Introduction
• Probabilistic generative models of i-vectors:
  – Gaussian PLDA (G-PLDA) [Prince, 2007]
    • Simple and fast due to closed-form solutions
  – Heavy-Tailed PLDA (HT-PLDA) [Kenny, 2010]
    • Superior performance -> empirical evidence of non-Gaussianity
• GOAL: Get the best of both worlds!
  – Keep the simple Gaussian model
  – Achieve performance equivalent to HT-PLDA
• HOW?
  – Transform the i-vectors to reduce non-Gaussian behavior
  – Use G-PLDA for the model
Outline
• Overview of the elements of the speaker recognition system relevant to this work
• Identification of a major source of non-Gaussian behavior
• Proposal of a nonlinear transformation of i-vectors to compensate for it
• Validation of the ideas on condition 5 of the SRE10 evaluation
• Conclusions
i-vector extractor (overview)
[Figure: block diagram. Development data -> ML + min DIV subspace training; MFCC extraction -> Alignment with Gaussians -> MAP point estimate (i-vector)]
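The alignment step can be sketched as follows. This is a minimal sketch, assuming a diagonal-covariance UBM for brevity (the system described here uses a full-covariance UBM); it computes the zeroth-order counts and the centered first-order statistics that feed the MAP point estimate. The function names and toy data are hypothetical.

```python
import numpy as np

def gmm_posteriors(X, weights, means, variances):
    """Responsibilities gamma[t, c] of a diagonal-covariance GMM (UBM).

    X: (T, D) frames; weights: (C,); means, variances: (C, D).
    """
    diff = X[:, None, :] - means[None, :, :]                      # (T, C, D)
    log_dens = -0.5 * (np.sum(diff**2 / variances[None], axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_joint = log_dens + np.log(weights)                        # (T, C)
    # Normalize per frame in the log domain (log-sum-exp) for stability.
    peak = log_joint.max(axis=1, keepdims=True)
    log_norm = peak + np.log(np.exp(log_joint - peak).sum(axis=1, keepdims=True))
    return np.exp(log_joint - log_norm)

def baum_welch_stats(X, weights, means, variances):
    """Zeroth-order counts N_c and centered first-order statistics F~_c."""
    gamma = gmm_posteriors(X, weights, means, variances)          # (T, C)
    N = gamma.sum(axis=0)                                         # (C,)
    F = gamma.T @ X                                               # sum_t gamma_tc x_t
    F_centered = F - N[:, None] * means                           # sum_t gamma_tc (x_t - mu_c)
    return N, F_centered

# Toy usage: 200 random 3-dim frames aligned to a 2-component UBM.
rng = np.random.default_rng(0)
frames = rng.standard_normal((200, 3))
N, F_centered = baum_welch_stats(frames, np.array([0.3, 0.7]),
                                 np.array([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]),
                                 np.ones((2, 3)))
```

Since each frame's responsibilities sum to one, the counts N_c always add up to the number of frames.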
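i-vector extractor (details)
• The i-vector is the MAP point estimate of the latent variable $\mathbf{w}$:
  $\mathbf{w} = (\mathbf{I} + \mathbf{T}^{\top}\boldsymbol{\Sigma}^{-1}\mathbf{N}\mathbf{T})^{-1}\,\mathbf{T}^{\top}\boldsymbol{\Sigma}^{-1}\tilde{\mathbf{f}}$
  – Weighted least squares: $(\mathbf{T}^{\top}\boldsymbol{\Sigma}^{-1}\mathbf{N}\mathbf{T})^{-1}\,\mathbf{T}^{\top}\boldsymbol{\Sigma}^{-1}\tilde{\mathbf{f}}$
  – Regularization: the identity term $\mathbf{I}$
  – $\mathbf{T}$: subspace matrix; $\boldsymbol{\Sigma}$: UBM covariances; $\mathbf{N}$: zeroth-order counts; $\tilde{\mathbf{f}}$: centered first-order statistics
• The i-vector is a "shrunk" version of the weighted least squares solution
• The amount of shrinkage of each coordinate depends on the eigenvalues of $\mathbf{T}^{\top}\boldsymbol{\Sigma}^{-1}\mathbf{N}\mathbf{T}$
[Figure: regularization path]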
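To illustrate the shrinkage, the sketch below computes both the MAP point estimate $(\mathbf{I} + \mathbf{L})^{-1}\mathbf{T}^{\top}\boldsymbol{\Sigma}^{-1}\tilde{\mathbf{f}}$ and the weighted least squares solution $\mathbf{L}^{-1}\mathbf{T}^{\top}\boldsymbol{\Sigma}^{-1}\tilde{\mathbf{f}}$, with $\mathbf{L} = \mathbf{T}^{\top}\boldsymbol{\Sigma}^{-1}\mathbf{N}\mathbf{T}$. It assumes the standard i-vector formulation with a diagonal $\boldsymbol{\Sigma}$ and supervector rows ordered component by component; names and toy dimensions are hypothetical. In the eigenbasis of $\mathbf{L}$, each coordinate of the WLS solution is multiplied by $\lambda_i/(1+\lambda_i)$, so directions with small eigenvalues (little data) are shrunk the most.

```python
import numpy as np

def extract_ivector(T, sigma, N, F_centered):
    """MAP point estimate and weighted least squares (WLS) solution.

    T: (C*D, R) subspace matrix; sigma: (C*D,) diagonal UBM covariances,
    ordered component by component; N: (C,) counts; F_centered: (C, D).
    """
    C, D = F_centered.shape
    n_sv = np.repeat(N, D)                     # counts expanded to supervector size
    f_sv = F_centered.reshape(-1)              # centered first-order supervector
    TtSinv = T.T / sigma                       # T' Sigma^-1, shape (R, C*D)
    L = (TtSinv * n_sv) @ T                    # T' Sigma^-1 N T, shape (R, R)
    b = TtSinv @ f_sv                          # T' Sigma^-1 f~
    R = T.shape[1]
    w_map = np.linalg.solve(np.eye(R) + L, b)  # regularized (MAP) solution
    w_wls = np.linalg.solve(L, b)              # unregularized WLS solution
    return w_map, w_wls, L

def shrinkage_factors(L):
    """Per-direction shrinkage lambda_i / (1 + lambda_i) in the eigenbasis of L."""
    lam, Q = np.linalg.eigh(L)
    return lam / (1.0 + lam), Q

# Toy usage: C=4 components, D=3 features, R=2 latent dimensions.
rng = np.random.default_rng(0)
T = rng.standard_normal((12, 2))
w_map, w_wls, L = extract_ivector(T, np.ones(12), np.array([10.0, 5.0, 1.0, 0.5]),
                                  rng.standard_normal((4, 3)))
factors, Q = shrinkage_factors(L)
```

Projecting both solutions onto the eigenvectors Q shows that the MAP estimate equals the WLS estimate scaled by `factors`, coordinate by coordinate.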
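Generative models of i-vectors
• Ignore the i-vector extractor and prescribe a generative model
• Simplified version of PLDA [Kenny, 2010]: Gaussian PLDA
  $\mathbf{w} = \mathbf{m} + \boldsymbol{\Phi}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, with $\boldsymbol{\beta} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ and $\boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Lambda}^{-1})$
• Heavy-tailed PLDA: the same model + Gamma-distributed precision scalings
  $\boldsymbol{\beta} \mid u \sim \mathcal{N}(\mathbf{0}, u^{-1}\mathbf{I})$, $u \sim \mathcal{G}(n_1/2, n_1/2)$; $\boldsymbol{\varepsilon} \mid v \sim \mathcal{N}(\mathbf{0}, v^{-1}\boldsymbol{\Lambda}^{-1})$, $v \sim \mathcal{G}(n_2/2, n_2/2)$
• Hyper-parameters estimated using ML and min. DIV
• The development set should be close to the evaluation set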
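A minimal sampling sketch of the two generative models, assuming the simplified PLDA form $\mathbf{w} = \mathbf{m} + \boldsymbol{\Phi}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$ and, for HT-PLDA, Gamma-distributed precision scalings with mean one that turn the Gaussian priors into Student's t priors. The hyper-parameter values below are arbitrary rather than estimated with ML and min DIV, and the function names are hypothetical; the excess kurtosis of one coordinate gives a quick check of heavy-tailed behavior.

```python
import numpy as np

def sample_plda(rng, m, Phi, W_chol, n_speakers, n_sessions, nu_spk=None, nu_chan=None):
    """Draw i-vectors w = m + Phi beta + eps from a simplified PLDA model.

    With nu_spk = nu_chan = None this is G-PLDA; finite degrees of freedom give
    the heavy-tailed variant. W_chol: Cholesky factor of Lambda^-1.
    """
    d, q = Phi.shape
    beta = rng.standard_normal((n_speakers, q))          # one latent per speaker
    if nu_spk is not None:
        # u ~ Gamma(shape=nu/2, rate=nu/2); NumPy takes scale = 1/rate.
        u = rng.gamma(nu_spk / 2, 2.0 / nu_spk, size=(n_speakers, 1))
        beta /= np.sqrt(u)                               # beta | u ~ N(0, u^-1 I)
    eps = rng.standard_normal((n_speakers, n_sessions, d)) @ W_chol.T
    if nu_chan is not None:
        v = rng.gamma(nu_chan / 2, 2.0 / nu_chan, size=(n_speakers, n_sessions, 1))
        eps /= np.sqrt(v)                                # eps | v ~ N(0, v^-1 Lambda^-1)
    return m + (beta @ Phi.T)[:, None, :] + eps          # (speakers, sessions, d)

def excess_kurtosis(x):
    """Sample excess kurtosis; about 0 for Gaussian data, > 0 for heavy tails."""
    z = (x - x.mean()) / x.std()
    return np.mean(z**4) - 3.0

rng = np.random.default_rng(1)
d, q = 5, 2
m = np.zeros(d)
Phi = rng.standard_normal((d, q))
W_chol = 0.5 * np.eye(d)
g_ivecs = sample_plda(rng, m, Phi, W_chol, 5000, 4)
ht_ivecs = sample_plda(rng, m, Phi, W_chol, 5000, 4, nu_spk=5.0, nu_chan=5.0)
print(excess_kurtosis(g_ivecs[..., 0]), excess_kurtosis(ht_ivecs[..., 0]))
```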
Full recognition system
[Figure: block diagram. DEVELOPMENT STAGE: Development data -> ML + min DIV subspace -> i-vector extractor -> development data i-vectors -> PLDA training. EVALUATION STAGE: Test 1, Test 2 -> i-vector extractor -> PLDA scoring -> Score]
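Because G-PLDA is jointly Gaussian, the PLDA scoring step has a closed form. The sketch below assumes the usual two-covariance view of the model (between-speaker covariance $\boldsymbol{\Phi}\boldsymbol{\Phi}^{\top}$, within-speaker covariance $\boldsymbol{\Lambda}^{-1}$) and returns the log-likelihood ratio of the two test i-vectors under the same-speaker and different-speaker hypotheses. Function names are hypothetical, and no efficiency tricks (such as precomputing the scoring matrices) are used.

```python
import numpy as np

def gauss_logpdf(x, cov):
    """Log-density of N(0, cov) at x, via a Cholesky factorization."""
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, x)
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (z @ z + logdet + x.size * np.log(2 * np.pi))

def gplda_llr(w1, w2, m, Phi, W):
    """Verification log-likelihood ratio for G-PLDA w = m + Phi beta + eps.

    Same speaker: both i-vectors share beta, so their cross-covariance is
    B = Phi Phi'. Different speakers: independent, cross-covariance 0.
    W is the within-speaker covariance Lambda^-1.
    """
    B = Phi @ Phi.T
    Tot = B + W                                   # total covariance of one i-vector
    x = np.concatenate([w1 - m, w2 - m])
    zeros = np.zeros_like(B)
    same = np.block([[Tot, B], [B, Tot]])
    diff = np.block([[Tot, zeros], [zeros, Tot]])
    return gauss_logpdf(x, same) - gauss_logpdf(x, diff)

Phi = 2.0 * np.eye(3)
W = 0.1 * np.eye(3)
m = np.zeros(3)
same_score = gplda_llr(np.array([1.0, 1.0, 1.0]), np.array([1.1, 0.9, 1.0]), m, Phi, W)
diff_score = gplda_llr(np.array([1.0, 1.0, 1.0]), np.array([-1.0, -1.0, -1.0]), m, Phi, W)
```

With a small within-speaker covariance, the nearby pair receives a positive score and the distant pair a negative one; the score is symmetric in its two test i-vectors.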
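i-vector length analysis
• i-vector extractor with min DIV -> i-vectors (approximately) follow the standard normal prior
• Let $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_d)$; then $\|\mathbf{w}\|^2 \sim \chi^2_d$, with mean $d$ and variance $2d$
[Figure: histograms of i-vector lengths for SRE10 eval telephone data (C5) and DEV data: SRE04, 05, 06, Fisher and Switchboard]
Dataset shift
• The i-vector extraction procedure induces a mismatch between dev and eval i-vectors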
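The chi-squared argument can be checked numerically. This sketch draws synthetic standard normal vectors (the dimension $d = 400$ is a hypothetical choice) and compares the moments of $\|\mathbf{w}\|^2$ with the $\chi^2_d$ values. The "eval" set is a synthetic rescaling used only to show how a length mismatch would surface in a simple statistic; it is not SRE10 data, and neither the direction nor the size of the shift is taken from the slides.

```python
import numpy as np

def squared_length_moments(ivectors):
    """Sample mean and variance of ||w||^2 over a set of i-vectors (rows)."""
    sq = np.sum(ivectors**2, axis=1)
    return sq.mean(), sq.var()

def median_length_ratio(eval_ivecs, dev_ivecs):
    """Ratio of median i-vector lengths; 1.0 means no length mismatch."""
    eval_len = np.linalg.norm(eval_ivecs, axis=1)
    dev_len = np.linalg.norm(dev_ivecs, axis=1)
    return np.median(eval_len) / np.median(dev_len)

rng = np.random.default_rng(0)
d = 400                                               # hypothetical i-vector dimension
dev = rng.standard_normal((5000, d))                  # synthetic N(0, I) stand-in
mean_sq, var_sq = squared_length_moments(dev)
print(mean_sq / d, var_sq / (2 * d))                  # chi^2_d predicts both ratios near 1
eval_shifted = 0.9 * rng.standard_normal((5000, d))   # synthetic mismatch, illustration only
print(median_length_ratio(eval_shifted, dev))
```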
i-vector transformation
• Radial Gaussianization (RG) [Lyu et al., 2009]:
  – Nonlinear transformation that Gaussianizes the family of Elliptically Symmetric Densities (ESD) (e.g., multivariate Laplacian, Student’s t, Cauchy, …)
  – The success of HT-PLDA indicates that i-vectors behave according to an ESD
  – Step 1: Whitening
  – Step 2: Histogram warping of the i-vector length
• Length normalization (LN):
  – Scale each whitened i-vector to unit length: $\mathbf{w} \leftarrow \mathbf{w} / \|\mathbf{w}\|$
  – Avoids the need for an additional held-out set to estimate the distribution of evaluation i-vector lengths
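Both transformations can be sketched as below, assuming whitening parameters estimated on development i-vectors; function names and the toy data are hypothetical. The histogram-warping step approximates the target $\chi_d$ length distribution (the length distribution of a standard Gaussian) by Monte Carlo draws to avoid a SciPy dependency, and, for illustration only, estimates the empirical length CDF from the very vectors it warps. In practice that CDF is what calls for a held-out set matching the evaluation data, which LN sidesteps by simply rescaling to unit length.

```python
import numpy as np

def fit_whitener(dev):
    """Mean and whitening matrix estimated on development i-vectors (rows)."""
    mu = dev.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(dev, rowvar=False))
    W = evecs / np.sqrt(evals)        # U Lambda^-1/2, so W' cov W = I
    return mu, W

def whiten(x, mu, W):
    """RG step 1 (also the first step of LN): center and whiten."""
    return (x - mu) @ W

def length_normalize(x):
    """LN: project each whitened i-vector onto the unit sphere."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def radial_gaussianize(x, rng, n_ref=20000):
    """RG step 2: warp the lengths so their histogram matches chi_d."""
    d = x.shape[1]
    r = np.linalg.norm(x, axis=1)
    # Empirical CDF of the observed lengths: ranks mapped into (0, 1).
    u = (np.argsort(np.argsort(r)) + 0.5) / len(r)
    # Monte Carlo reference: lengths of standard normal draws follow chi_d.
    ref = np.linalg.norm(rng.standard_normal((n_ref, d)), axis=1)
    r_new = np.quantile(ref, u)
    return x * (r_new / r)[:, None]   # keep directions, replace lengths

rng = np.random.default_rng(0)
d = 10
g = rng.standard_normal((2000, d))
u = rng.gamma(2.5, 2.0 / 5.0, size=(2000, 1))           # Student's t (5 dof) scale mixture
dev = (g / np.sqrt(u)) @ np.diag(np.linspace(0.5, 3.0, d)) + 3.0
mu, W = fit_whitener(dev)
z = whiten(dev, mu, W)
ln = length_normalize(z)
rg = radial_gaussianize(z, rng)
```

Both transformations leave the direction of each whitened i-vector unchanged; they differ only in the length they assign.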
Experimental setup
• Parameterization: 60 MFCC <- (19 + energy) + Δ + ΔΔ
• UBM: Gender-independent 2048-mixture full-covariance GMM trained on telephone data from SRE04, 05 and 06
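The slides do not give the delta formula, so this sketch assumes the common HTK-style regression window ($N = 2$) with edge replication to stack 20 static coefficients (19 MFCC + energy) with their deltas and double deltas into 60 dimensions. Function names and the synthetic input are hypothetical.

```python
import numpy as np

def deltas(feats, N=2):
    """Regression deltas: sum_n n (c[t+n] - c[t-n]) / (2 sum_n n^2), edges replicated."""
    T = feats.shape[0]
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, N + 1))
    out = np.zeros(feats.shape, dtype=float)
    for n in range(1, N + 1):
        out += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return out / denom

def add_dynamic_features(static):
    """Stack static (19 MFCC + energy), delta and delta-delta features: 20 -> 60 dims."""
    d1 = deltas(static)
    d2 = deltas(d1)
    return np.hstack([static, d1, d2])

rng = np.random.default_rng(0)
static = rng.standard_normal((100, 20))   # synthetic stand-in: 100 frames x 20 coefficients
features = add_dynamic_features(static)
```

On a linear ramp the interior deltas equal the slope, which is a quick sanity check of the regression weights.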
Conclusions
• Identified the mismatch induced by the i-vector extraction procedure (i.e., dataset shift) as a major source of non-Gaussian behavior
• Explored two nonlinear transformation techniques to Gaussianize i-vectors
• Boosted the performance of G-PLDA at all operating points (as much as 50% in EER for male trials)
• The performance of LN G-PLDA is as good as that of HT-PLDA, with the advantage of simplicity and speed
Acknowledgments
• Thanks to BUT for providing i-vectors and to Carlos Vaquero for the HT-PLDA system
• Thanks to Niko Brummer, Lukas Burget and Patrick Kenny for helpful discussions during the preparation of this work
• Thanks to Alan McCree and Ed De Villiers for comments after submission