Analysis of i-vector Length Normalization in Speaker Recognition Systems
Daniel Garcia-Romero, Carol Espy-Wilson
Department of Electrical & Computer Engineering
University of Maryland, College Park, MD, USA

Introduction
• Probabilistic generative models of i-vectors:
  – Gaussian-PLDA (G-PLDA) [Prince, 2007]
    • Simple and fast due to closed-form solutions
  – Heavy-Tailed PLDA (HT-PLDA) [Kenny, 2010]
    • Superior performance -> empirical evidence of non-Gaussianity
• GOAL: Get the best of both worlds!
  – Keep the simple Gaussian model
  – Achieve performance equivalent to HT-PLDA
• HOW?
  – Transform the i-vectors to reduce non-Gaussian behavior
  – Use G-PLDA for the model

Outline
• Overview of the elements of the speaker recognition system relevant to this work
• Identification of a major source of non-Gaussian behavior
• Proposal of a nonlinear transformation of i-vectors to compensate for it
• Validation of the ideas on condition 5 of the SRE10 evaluation
• Conclusions

i-vector extractor (overview)
[Figure: block diagram with the labels "Development data", "ML + min DIV subspace", "MFCC extraction", "Alignment with Gaussians" and "MAP point estimate"]

i-vector extractor (details)
[Equation: i-vector estimate, with terms labeled "Weighted Least Squares" and "Regularization"]
• The i-vector is a “shrunk” version of the weighted least squares solution
• The amount of shrinkage of each coordinate depends on the eigenvalues of …
[Figure: regularization path]
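The shrinkage described above can be illustrated with a small numerical sketch. It assumes the commonly used form of the i-vector point estimate, w = (I + T'Σ⁻¹NT)⁻¹ T'Σ⁻¹f, which is not legible in the extracted slide; the names `T`, `prec`, `counts` and `f` are hypothetical and the data are random. Under that assumption, each coordinate of the weighted least squares solution, expressed in the eigenbasis of T'Σ⁻¹NT, is scaled by λ / (1 + λ).

```python
import numpy as np

def wls_and_map(T, prec, counts, f):
    """Compare the weighted least squares (WLS) solution with the
    regularized (MAP) estimate under an assumed i-vector model.

    T      : (D, d) subspace matrix (hypothetical)
    prec   : (D,) diagonal of the inverse covariance
    counts : (D,) occupation counts, repeated per feature dimension
    f      : (D,) centered first-order statistics
    """
    A = T.T @ (T * (prec * counts)[:, None])   # T' Sigma^-1 N T
    b = T.T @ (prec * f)                        # T' Sigma^-1 f
    w_wls = np.linalg.solve(A, b)               # no regularization
    w_map = np.linalg.solve(np.eye(A.shape[0]) + A, b)  # identity prior
    lam, U = np.linalg.eigh(A)                  # eigen-decomposition of A
    return w_wls, w_map, lam, U

rng = np.random.default_rng(0)
D, d = 50, 5
T = rng.standard_normal((D, d))
prec = rng.uniform(0.5, 2.0, size=D)
counts = rng.uniform(0.1, 3.0, size=D)
f = rng.standard_normal(D)

w_wls, w_map, lam, U = wls_and_map(T, prec, counts, f)
shrink = lam / (1.0 + lam)                      # per-eigendirection factor
print("shrinkage factors:", shrink)
print("max deviation:", np.abs(U.T @ w_map - shrink * (U.T @ w_wls)).max())
```

Directions with small eigenvalues (little data support) are shrunk strongly toward zero, while directions with large eigenvalues are left almost unchanged.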

Generative models of i-vectors
• Ignore the i-vector extractor and prescribe a generative model
• Simplified version of PLDA [Kenny, 2010]:
  – Gaussian PLDA
  – Heavy-tailed PLDA
• Hyper-parameters estimated using ML and min. DIV
• Development set should be close to evaluation set
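As an illustration of why the Gaussian model is simple to use, the following sketch scores a pair of i-vectors with a closed-form log-likelihood ratio. It assumes a G-PLDA model of the form φ = m + Φβ + ε with β ~ N(0, I) and ε ~ N(0, Σ), which is a common formulation but is not legible in the extracted slide; the function names and the toy parameters are hypothetical.

```python
import numpy as np

def gauss_logpdf_zero_mean(x, cov):
    """Log-density of a zero-mean Gaussian with covariance `cov` at x."""
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + quad)

def gplda_llr(phi1, phi2, Phi, Sigma, mean):
    """Log-likelihood ratio: same speaker vs. different speakers."""
    a, b = phi1 - mean, phi2 - mean
    S_ac = Phi @ Phi.T              # across-speaker covariance
    S_tot = S_ac + Sigma            # total covariance
    x = np.concatenate([a, b])
    zeros = np.zeros_like(S_ac)
    # Same speaker: the two i-vectors share the latent variable beta.
    cov_same = np.block([[S_tot, S_ac], [S_ac, S_tot]])
    # Different speakers: the two i-vectors are independent.
    cov_diff = np.block([[S_tot, zeros], [zeros, S_tot]])
    return gauss_logpdf_zero_mean(x, cov_same) - gauss_logpdf_zero_mean(x, cov_diff)

# Toy model: one eigenvoice in three dimensions (illustrative values only).
Phi = np.array([[2.0], [0.0], [0.0]])
Sigma = np.eye(3)
mean = np.zeros(3)
a = np.array([2.0, 0.0, 0.0])

print("same direction:    ", gplda_llr(a, a, Phi, Sigma, mean))
print("opposite direction:", gplda_llr(a, -a, Phi, Sigma, mean))
```

The score only requires Gaussian densities, so no iterative inference is needed at test time. A heavy-tailed model does not admit this closed form.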

Full recognition system
[Figure: block diagram of the full system]
DEVELOPMENT STAGE: Development data -> ML + min DIV subspace -> i-vector extractor -> development data i-vectors -> PLDA training
EVALUATION STAGE: Test 1, Test 2 -> i-vector extractor -> PLDA scoring -> Score

i-vector length analysis
• i-vector extractor with min DIV -> i-vectors …
• Let …, then … with …
[Figure: i-vector lengths, annotated "Dataset shift". Legend: SRE10 eval tel data (C5); DEV data: SRE04, 05, 06, Fisher and Switchboard]
• i-vector extraction procedure -> mismatch between dev and eval
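A quick simulation shows what the lengths would look like if the i-vectors were exactly standard Gaussian. This is an assumption made for illustration, since the expressions on the slide were lost in extraction; the 400 dimensions match the extractor described in the experimental setup. Under that assumption the length follows a chi distribution, which concentrates tightly around the square root of the dimension.

```python
import numpy as np

def gaussian_lengths(n_vectors, dim, seed=0):
    """Euclidean lengths of n_vectors samples drawn from N(0, I_dim)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((n_vectors, dim))
    return np.linalg.norm(w, axis=1)

lengths = gaussian_lengths(n_vectors=5000, dim=400)
print("mean length:", lengths.mean())
print("std of length:", lengths.std())
print("sqrt(dim):", np.sqrt(400))
```

Length histograms of real development and evaluation i-vectors that deviate from this narrow distribution, or from each other, point to non-Gaussian behavior and to a mismatch between the two sets.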

i-vector transformation
• Radial Gaussianization (RG) [Lyu et al., 2009]:
  – Nonlinear transformation that Gaussianizes the family of Elliptically Symmetric Densities (ESD) (e.g., multivariate Laplacian, Student’s t, Cauchy, ...)
  – Success of HT-PLDA indicates that i-vectors behave according to an ESD
  – Step 1: Whitening
  – Step 2: Histogram warping
• Length normalization (LN):
  – Avoids the need for an additional held-out set to estimate the distribution of evaluation i-vector lengths
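The two transformations can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that LN means whitening followed by scaling each vector to unit Euclidean length, and it approximates the histogram warping of RG with an empirical quantile mapping onto simulated chi-distributed lengths. All function names are hypothetical.

```python
import numpy as np

def fit_whitener(dev):
    """Estimate the mean and a symmetric whitening matrix from dev data."""
    mean = dev.mean(axis=0)
    cov = np.cov(dev - mean, rowvar=False)
    lam, U = np.linalg.eigh(cov)
    W = U @ np.diag(1.0 / np.sqrt(lam)) @ U.T
    return mean, W

def whiten(x, mean, W):
    """Center and whiten the rows of x (W is symmetric)."""
    return (x - mean) @ W

def length_normalize(x):
    """Scale every row of x to unit length (rows must be nonzero)."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def radial_gaussianize(x, ref_lengths, rng, n_target=20000):
    """Warp the lengths of whitened rows of x toward a chi distribution.

    ref_lengths: lengths of whitened vectors from the set whose length
    distribution is being warped (for evaluation data this would be a
    held-out set).
    """
    d = x.shape[1]
    r = np.linalg.norm(x, axis=1)
    ref = np.sort(ref_lengths)
    # Empirical CDF value of each length, kept strictly inside (0, 1).
    u = (np.searchsorted(ref, r, side="right") - 0.5) / len(ref)
    u = np.clip(u, 0.5 / len(ref), 1.0 - 0.5 / len(ref))
    # Monte Carlo stand-in for the chi quantile function with d DOF.
    target = np.linalg.norm(rng.standard_normal((n_target, d)), axis=1)
    new_r = np.quantile(target, u)
    return x * (new_r / r)[:, None]

rng = np.random.default_rng(0)
# Synthetic heavy-tailed "i-vectors": Gaussian vectors with random scales.
dev = rng.standard_normal((2000, 10)) * rng.gamma(2.0, 1.0, size=(2000, 1)) + 3.0
mean, W = fit_whitener(dev)
z = whiten(dev, mean, W)
ln = length_normalize(z)
rg = radial_gaussianize(z, np.linalg.norm(z, axis=1), rng)
print("LN lengths (min, max):", np.linalg.norm(ln, axis=1).min(), np.linalg.norm(ln, axis=1).max())
print("RG mean length:", np.linalg.norm(rg, axis=1).mean())
```

Note that `length_normalize` needs no reference set, whereas `radial_gaussianize` needs `ref_lengths`, which reflects the practical advantage of LN stated above.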

Experimental setup
• Parameterization: 60 MFCC <- (19 + energy) + Δ + ΔΔ
• UBM: gender-independent, 2048-mixture full-covariance GMM trained on telephone data from SRE04, 05 and 06
• i-vector extractor*: 400 dimensions
  – Gender-dependent: SRE04, 05, 06, SWB and Fisher
• PLDA models (same data as the i-vector extractor, without Fisher):
  – G-PLDA: 120 eigenvoices and full-covariance residual
  – HT-PLDA: LDA 120 preprocessing -> 120 eigenvoices
  – No score normalization
• EVAL DATA: C5 of SRE10-extended (i.e., tel data)
* i-vectors provided by BUT

Effect of transformation in DOF

Transformation type   Eigenvoices DOF      Residual DOF
                      Male     Female      Male     Female
Raw dev data          11.09    12.39       17.10    17.42
RG dev data           25.35    27.30       13.24    14.81
LN dev data           48.07    54.71        9.21    10.42

• ML point estimates (warning -> may have a lot of uncertainty):
  – Consistent behavior between male and female
  – Both RG and LN increase the eigenvoices DOF and decrease the residual DOF
  – Partially-HT model: eigenvoices have lighter tails and the residual is strongly heavy-tailed

Results I
• LN G-PLDA improves over G-PLDA for all operating points
• LN G-PLDA is as good as the more complex HT-PLDA

Results II

System codes     Male scores         Female scores
                 EER(%)   minDCF     EER(%)   minDCF
UN-UN G-PLDA     3.08     0.4193     3.41     0.4008
UN-RG G-PLDA     1.44     0.3032     2.15     0.3503
UN-LN G-PLDA     1.29     0.3084     1.97     0.3511
LN-LN G-PLDA     1.27     0.3019     2.02     0.3562
RG-RG G-PLDA     1.37     0.3066     2.16     0.3393
UN-UN HT-PLDA    1.48     0.3357     2.21     0.3410
LN-LN HT-PLDA    1.28     0.3036     1.95     0.3297
RG-RG HT-PLDA    1.27     0.3143     1.95     0.3339
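The size of the gains can be read off the Results II numbers with a short calculation. The sketch below assumes that the EER values map to the system codes in the order listed; the dictionary and function names are hypothetical.

```python
# Male and female EER (%) per system, in the order the systems are listed.
male_eer = {
    "UN-UN G-PLDA": 3.08, "UN-RG G-PLDA": 1.44, "UN-LN G-PLDA": 1.29,
    "LN-LN G-PLDA": 1.27, "RG-RG G-PLDA": 1.37, "UN-UN HT-PLDA": 1.48,
    "LN-LN HT-PLDA": 1.28, "RG-RG HT-PLDA": 1.27,
}
female_eer = {
    "UN-UN G-PLDA": 3.41, "UN-RG G-PLDA": 2.15, "UN-LN G-PLDA": 1.97,
    "LN-LN G-PLDA": 2.02, "RG-RG G-PLDA": 2.16, "UN-UN HT-PLDA": 2.21,
    "LN-LN HT-PLDA": 1.95, "RG-RG HT-PLDA": 1.95,
}

def relative_reduction(baseline, system):
    """Relative EER reduction in percent with respect to the baseline."""
    return 100.0 * (baseline - system) / baseline

for name, table in (("male", male_eer), ("female", female_eer)):
    base = table["UN-UN G-PLDA"]
    for system in ("LN-LN G-PLDA", "UN-UN HT-PLDA"):
        gain = relative_reduction(base, table[system])
        print(f"{name:6s} {system:14s} relative EER reduction: {gain:.1f}%")
```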

Conclusions
• Identified the mismatch induced by the i-vector extraction procedure (i.e., dataset shift) as a major source of non-Gaussian behavior
• Explored two nonlinear transformation techniques to Gaussianize i-vectors
• Boosted performance of G-PLDA for all operating points (by as much as 50% in EER for male trials)
• Performance of LN G-PLDA is as good as HT-PLDA, with the advantage of simplicity and speed

Acknowledgments
• Thanks to BUT for providing i-vectors and Carlos Vaquero for the HT-PLDA system
• Thanks to Niko Brummer, Lukas Burget and Patrick Kenny for helpful discussions during preparation
• Thanks to Alan McCree and Ed De Villiers for comments after submission
