Joint Factor Analysis for Speaker Recognition reinterpreted as Signal Coding using Overcomplete Dictionaries
Daniel Garcia-Romero, Carol Espy-Wilson
Department of Electrical & Computer Engineering, University of Maryland, College Park, MD, USA

Outline
• Introduce JFA configuration and notation
• Present an alternative perspective
  – Link JFA with signal coding using an overcomplete dictionary
  – Discuss algorithmic differences
• Experimental validation
• Remarks about cross-pollination opportunities

JFA configuration
• Based on point estimates
• Hyperparameters:
  – Means and covariances taken from the UBM and kept fixed
  – Independent training of the subspaces by ML
• Speaker models:
  – MAP estimation with one EM iteration
• Scoring:
  – Linear scoring

JFA: Model training (1)
• Given the observed feature vectors of an utterance, and assuming a model for the speaker supervector
• Obtain the MAP speaker model:
  – E-step -> Compute counts and centered first-order sufficient statistics
  – M-step -> Minimize the surrogate objective
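The E-step quantities named above can be sketched in NumPy. This is a minimal illustration, assuming a diagonal-covariance UBM; the function name `baum_welch_stats` and all argument names are hypothetical and do not come from the slides.

```python
import numpy as np

def baum_welch_stats(X, weights, means, variances):
    """Counts and centered first-order statistics of frames X (T x F)
    against a diagonal-covariance GMM with C components (assumed UBM form).

    weights: (C,), means: (C, F), variances: (C, F).
    """
    # Log-density of every frame under every component: shape (T, C)
    diff = X[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_gauss
    # Normalize to frame-level posteriors with a log-sum-exp for stability
    log_post -= np.logaddexp.reduce(log_post, axis=1, keepdims=True)
    post = np.exp(log_post)
    counts = post.sum(axis=0)                    # zeroth-order stats, (C,)
    first = post.T @ X                           # first-order stats, (C, F)
    centered = first - counts[:, None] * means   # centered on the UBM means
    return counts, centered
```

The counts sum to the number of frames, which gives a quick sanity check on the alignment.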

JFA: Model training (2)
Processing chain (diagram): Feature extraction -> Data alignment (E-step) -> Solve for the MAP estimate (M-step)

JFA: Hyperparameter estimation
• Given a set of training utterances and an initial estimate of the subspace, use EM to solve for the ML estimate [Kenny et al., 2008]:
  – E-step -> Compute the posterior means and correlation matrices
  – M-step -> Update the subset of rows of the subspace matrix associated with each mixture
• The M-step reduces to solving independent linear systems of equations, each with multiple right-hand sides

JFA: Scoring
• Linear scoring [Glembek et al., 2009]:
  – Inputs: the speaker model and the statistics of the test utterance
  – Output: a score that is linear in the test-utterance statistics
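Linear scoring is commonly written as the inner product between the speaker offset and the channel-compensated, centered first-order statistics of the test utterance, weighted by the inverse UBM covariance. The following is a minimal NumPy sketch under that reading, assuming diagonal covariances; the function and argument names are hypothetical, and the slide's own notation is not reproduced.

```python
import numpy as np

def linear_score(speaker_offset, counts, centered_stats, channel_offset, variances):
    """Linear JFA score (sketch): speaker_offset^T Sigma^{-1} (F - N * channel_offset).

    All arguments are supervector-sized 1-D arrays. `counts` holds each
    mixture's occupation count repeated over its feature dimensions, and
    `variances` is the diagonal of the UBM covariance (assumed diagonal).
    """
    # Remove the estimated channel shift from the centered first-order stats
    compensated = centered_stats - counts * channel_offset
    # Inner product with the speaker offset, weighted by inverse variances
    return float(speaker_offset @ (compensated / variances))
```

Because the score is linear in the statistics, no per-trial likelihood evaluation over frames is needed.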

Definitions
• Overcomplete dictionary [Rubinstein et al., 2010]:
  – Full row-rank matrix with more columns than rows
  – Analytical (e.g., unions of FFT, DCT, and wavelet bases) or data-driven (learned from data)
  – Applications: compression, denoising, source separation, face recognition
• Signal Coding (SC):
  – Given a signal and a dictionary, obtain an optimal encoding according to a given objective

Signal generation (E-step)
Processing chain (diagram): Feature extraction -> Data alignment
• Each utterance is represented by a fixed-length vector and a weighting matrix

Signal Coding (Ridge regression)
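Ridge-regression coding of a weighted signal has a closed-form solution. The sketch below assumes the standard weighted ridge objective with a diagonal weighting matrix, (s - D b)^T W (s - D b) + lam * ||b||^2; the slide's exact objective is not available here, and the names `ridge_code`, `signal`, `dictionary`, `weights` and `lam` are hypothetical.

```python
import numpy as np

def ridge_code(signal, dictionary, weights, lam=1.0):
    """Weighted ridge-regression encoding of `signal` on `dictionary`.

    signal: (M,), dictionary: (M, K), weights: (M,) diagonal of W.
    Returns the K coefficients minimizing the weighted ridge objective.
    """
    # D^T W for a diagonal W: scale each column of D^T by its weight
    DtW = dictionary.T * weights
    # Regularized Gram matrix; lam > 0 keeps it invertible even when K > M
    gram = DtW @ dictionary + lam * np.eye(dictionary.shape[1])
    # Solve the normal equations rather than forming an explicit inverse
    return np.linalg.solve(gram, DtW @ signal)
```

The regularizer is what makes the encoding unique for an overcomplete dictionary, where the unregularized least-squares problem has infinitely many solutions.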


Dictionary learning
• Given a set of training utterances, solve jointly for the dictionary and the encoding of each utterance
• Block-coordinate descent [Bertsekas, 1999]:
  – Alternating optimization: Signal Coding (SC) <-> Dictionary Update (DU)
  – SC is performed keeping the dictionary fixed
  – DU is performed keeping the coefficients fixed
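The alternation between SC and DU can be sketched as follows. This is an illustrative NumPy version, assuming a weighted ridge objective with per-utterance diagonal weights and no constraint on the atom norms; `learn_dictionary` and its arguments are hypothetical names, not the authors' implementation.

```python
import numpy as np

def learn_dictionary(S, W, n_atoms, n_iter=10, lam=1.0, seed=0):
    """Block-coordinate descent on sum_r (s_r - D b_r)^T W_r (s_r - D b_r) + lam * ||b_r||^2.

    S: (R, M) signals, W: (R, M) positive diagonal weights per utterance.
    Returns the dictionary D (M, n_atoms), codes B (R, n_atoms), and the
    objective value recorded after each full SC + DU pass.
    """
    R, M = S.shape
    D = np.random.default_rng(seed).normal(scale=0.1, size=(M, n_atoms))
    eye = np.eye(n_atoms)
    objective = []
    for _ in range(n_iter):
        # SC step: encode every utterance with the dictionary held fixed
        B = np.stack([
            np.linalg.solve((D.T * W[r]) @ D + lam * eye, (D.T * W[r]) @ S[r])
            for r in range(R)
        ])
        # DU step: with the codes fixed, each row of D solves its own
        # weighted least-squares problem, independent of the other rows
        for i in range(M):
            A = (B.T * W[:, i]) @ B
            rhs = B.T @ (W[:, i] * S[:, i])
            D[i] = np.linalg.lstsq(A, rhs, rcond=None)[0]
        resid = S - B @ D.T
        objective.append(float(np.sum(W * resid ** 2) + lam * np.sum(B ** 2)))
    return D, B, objective
```

Since each step minimizes the same objective over one block of variables, the recorded objective values should not increase from one pass to the next.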

DU: Algorithmic opportunity
• Keeping the coefficients fixed, the DU step solves for the dictionary only
• Comparing with JFA ML estimation:
  – E-step -> Posterior means and correlation matrices
  – M-step -> Update of the subset of rows associated with each mixture
• No explicit matrix inversions in the SC step -> 2 times faster

Hybrid approach
• Three dictionary update rules are compared: JFA, DU, and Hybrid
• The Hybrid update uses the average weighting matrix computed from the training utterances

Scoring
• Notions of model and test segments are blurred
  – Both are treated as signals to be encoded on the dictionary
• Given two utterances:
  1. Signal coding -> encode each utterance on the dictionary
  2. Compensation -> zero the selected coefficients of each encoding
  3. Similarity computation -> cosine similarity in supervector (SV) space, under one of several candidate metrics
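The three scoring steps can be illustrated with a short NumPy sketch. It assumes the two utterances are already encoded, that compensation zeroes the coefficients of atoms flagged as nuisance (for example, channel atoms), and that the metric is diagonal; which coefficients are zeroed and which metrics were tried cannot be read from the slide, so `keep`, `metric` and the function name are hypothetical.

```python
import numpy as np

def compensated_cosine(beta_a, beta_b, dictionary, keep, metric=None):
    """Cosine similarity in supervector space after zeroing nuisance coefficients.

    beta_a, beta_b: (K,) encodings of the two utterances.
    dictionary: (M, K). keep: (K,) boolean mask of the atoms to retain.
    metric: (M,) diagonal of an assumed positive metric (identity if None).
    """
    # Compensation: drop the contribution of the atoms that are not kept
    sv_a = dictionary @ np.where(keep, beta_a, 0.0)
    sv_b = dictionary @ np.where(keep, beta_b, 0.0)
    if metric is None:
        metric = np.ones(dictionary.shape[0])
    # Cosine of the angle between the supervectors under the metric
    num = sv_a @ (metric * sv_b)
    den = np.sqrt(sv_a @ (metric * sv_a)) * np.sqrt(sv_b @ (metric * sv_b))
    return float(num / den)
```

The function is symmetric in its two utterances, which reflects the point that model and test segments are treated the same way.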

Experimental setup
• SWB-I database:
  – 520 speakers, balanced in gender, with 4856 speech files
  – Telephone speech (approx. 70% electret and 30% carbon-button handsets)
  – Two balanced partitions, P1 and P2
• Parameterization*:
  – 38 MFCC (19 + delta), band-limited to 300-3400 Hz, computed every 10 ms with a 20 ms Hamming window
• UBM: 2048-mixture GMM trained on P2 data
• Simple dictionary (eigenchannel configuration) also learned from P2
* Feature extraction and UBM training done with MIT-LL software

Analysis of DL procedure
• Evaluate the distance between subspaces:
  – Use the projection distance
  – Each subspace is represented by an orthonormal basis
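One common definition of the projection distance is the Frobenius norm of the difference between the two orthogonal projectors. The sketch below assumes that definition (the slide's formula is not available, and other normalizations exist) and uses an equivalent form that avoids building supervector-sized projection matrices; `projection_distance` is a hypothetical name.

```python
import numpy as np

def projection_distance(A, B):
    """Frobenius-norm projection distance between span(A) and span(B).

    A: (M, Ka), B: (M, Kb), both assumed to have full column rank.
    Equals ||Qa Qa^T - Qb Qb^T||_F for orthonormal bases Qa and Qb.
    """
    # Orthonormal bases of the two column spaces (reduced QR)
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    # ||Pa - Pb||_F^2 = Ka + Kb - 2 * ||Qa^T Qb||_F^2
    overlap = np.linalg.norm(Qa.T @ Qb, ord="fro") ** 2
    squared = Qa.shape[1] + Qb.shape[1] - 2.0 * overlap
    # Guard against tiny negative values caused by rounding
    return float(np.sqrt(max(squared, 0.0)))
```

The distance is zero when the two matrices span the same subspace, regardless of the basis chosen for each.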

Analysis of DL procedure
• Examine recognition accuracy:
  – Closed-set identification on P1
    • 2408 utterances from 260 speakers -> 33,866 trials
    • Model and test utterances encoded and compensated
    • Cosine similarity score under a fixed metric

    Dimension   JFA     Hybrid   DU
    128         95.0%   94.9%    94.9%
    64          94.5%   94.5%    94.5%
    32          93.3%   93.3%    93.3%

  – No apparent degradation from either alternative
  – How these results hold for reduced amounts of data will be studied in the near future

Analysis of encoding and scoring (I)
• Model and test utterances encoded the same way
• Cosine similarity score

Analysis of encoding and scoring (II)
• Verification experiments:
  – Leave-one-out on 2408 utterances from P1
    • 33,866 target trials and 5,764,598 non-target trials
• Mixed results for SRE 2010:
  – Only U and D:
    • Slightly better to treat model and test utterances the same way
  – Full JFA with U, V and D:
    • Mixed results: some tasks are better when both utterances are encoded, others are not

Discussion (Speculation)
• Many public resources for dictionary learning:
  – K-SVD and multiple variations, FOCUSS-DL, MOD
• Discriminatively trained dictionaries:
  – [Mairal et al., 2008], in Proc. CVPR
  – Emphasize that the ultimate goal is discrimination, not representation
• Sparsity-inducing priors (e.g., Laplacian prior):
  – LASSO [Tibshirani, 1996]
  – Reduced amount of data <-> image occlusion, where L1 regularization has proved extremely effective [Wright et al., 2008], in PAMI

Conclusions
• Different perspective on JFA
• Algorithmic suggestion for ML subspace training
• Explored different scorings and metrics
• Opportunities for cross-pollination

Acknowledgments
• Special thanks to MIT-LL for binaries to parametrize data and compute UBM

Sparse Coding (SC)


SRE 2010

