Automatic Speech and Speaker Recognition: Large Margin and Kernel Methods

Edited by Joseph Keshet and Samy Bengio

Contents

Preface

1 Introduction
  Samy Bengio and Joseph Keshet
  1.1 The Traditional Approach to Speech Processing
  1.2 Potential Problems of the Probabilistic Approach
  1.3 Support Vector Machines for Binary Classification
  1.4 Outline
  References

I Foundations

2 Theory and Practice of Support Vector Machines Optimization
  Shai Shalev-Shwartz and Nathan Srebro
  2.1 Introduction
  2.2 SVM and L2-Regularized Linear Prediction
    2.2.1 Binary Classification and the Traditional SVM
    2.2.2 More General Loss Functions
    2.2.3 Examples
    2.2.4 Kernels
    2.2.5 Incorporating a Bias Term
  2.3 Optimization Accuracy from a Machine Learning Perspective
  2.4 Stochastic Gradient Descent
    2.4.1 Subgradient calculus
    2.4.2 Rate of convergence and stopping criteria
  2.5 Dual Decomposition Methods
    2.5.1 Duality
  2.6 Summary
  References

3 From Binary Classification to Categorial Prediction
  Koby Crammer
  3.1 Multi Category Problems
  3.2 Hypothesis Class
  3.3 Loss Functions
    3.3.1 Combinatorial Loss Functions
  3.4 Hinge Loss Functions
  3.5 A Generalized Perceptron Algorithm
  3.6 A Generalized Passive-Aggressive Algorithm
    3.6.1 Dual Formulation
  3.7 A Batch Formulation
  3.8 Concluding Remarks
  3.9 Appendix
    3.9.1 Derivation of the Dual of the Passive-Aggressive Algorithm
    3.9.2 Derivation of the Dual of the Batch Formulation
  References

II Acoustic Modeling

4 A Large Margin Algorithm for Forced Alignment
  Joseph Keshet, Shai Shalev-Shwartz, Yoram Singer and Dan Chazan
  4.1 Introduction
  4.2 Problem Setting
  4.3 Cost and Risk
  4.4 A Large Margin Approach for Forced Alignment
  4.5 An Iterative Algorithm
  4.6 Efficient Evaluation of the Alignment Function
  4.7 Base Alignment Functions
  4.8 Experimental Results
  4.9 Discussion
  References

5 A Kernel Wrapper for Phoneme Sequence Recognition
  Joseph Keshet and Dan Chazan
  5.1 Introduction
  5.2 Problem Setting
  5.3 Frame-based Phoneme Classifier
  5.4 Kernel-based Iterative Algorithm for Phoneme Recognition
  5.5 Non-Linear Feature Functions
    5.5.1 Acoustic Modeling
    5.5.2 Duration Modeling
    5.5.3 Transition Modeling
  5.6 Preliminary Experimental Results
  5.7 Discussion: Can We Hope for Better Results?
  References

6 Augmented Statistical Models: Using Dynamic Kernels for Acoustic Models
  Mark J.F. Gales
  6.1 Introduction
  6.2 Temporal Correlation Modelling
  6.3 Dynamic Kernels
    6.3.1 Static and Dynamic Kernels
    6.3.2 Generative Kernels
    6.3.3 Simple Example
  6.4 Augmented Statistical Models
    6.4.1 Generative Augmented Models
    6.4.2 Conditional Augmented Models
  6.5 Experimental Results
  6.6 Conclusions
  References

7 Large Margin Training of Continuous Density Hidden Markov Models
  Fei Sha and Lawrence K. Saul
  7.1 Introduction
  7.2 Background
    7.2.1 Maximum likelihood estimation
    7.2.2 Conditional maximum likelihood
    7.2.3 Minimum classification error
  7.3 Large margin training
    7.3.1 Discriminant function
    7.3.2 Margin constraints and Hamming distances
    7.3.3 Optimization
    7.3.4 Related work
  7.4 Experimental results
    7.4.1 Large margin training
    7.4.2 Comparison to CML and MCE
    7.4.3 Other variants
  7.5 Conclusion
  References

III Language Modeling

8 A Survey of Discriminative Language Modeling Approaches for Large Vocabulary Continuous Speech Recognition
  Brian Roark
  8.1 Introduction
  8.2 General Framework
    8.2.1 Training Data and the GEN Function
    8.2.2 Feature Mapping
    8.2.3 Parameter Estimation
  8.3 Further Developments
    8.3.1 Novel Features
    8.3.2 Novel Objectives
    8.3.3 Domain Adaptation
  8.4 Summary and Discussion
  References

9 Large Margin Methods for Part of Speech Tagging
  Yasemin Altun
  9.1 Introduction
  9.2 Modeling Sequence Labeling
    9.2.1 Feature Representation
    9.2.2 Empirical Risk Minimization
    9.2.3 Conditional Random Fields and Sequence Perceptron
  9.3 Sequence Boosting
    9.3.1 Objective Function
    9.3.2 Optimization Method
  9.4 Hidden Markov Support Vector Machines
    9.4.1 Objective Function
    9.4.2 Optimization Method
    9.4.3 Algorithm
  9.5 Experiments
    9.5.1 Data and Features for Part of Speech Tagging
    9.5.2 Results of Sequence AdaBoost
    9.5.3 Results of HM-SVMs
  9.6 Discussion
  References

10 A Proposal of a Kernel-Based Algorithm for Large Vocabulary Continuous Speech Recognition
  Joseph Keshet
  10.1 Introduction
  10.2 Segmental Models and Hidden Markov Models
  10.3 Kernel-Based Model
  10.4 Large Margin Training
  10.5 Implementation Details
    10.5.1 Iterative Algorithm
    10.5.2 Recognition Feature Functions
    10.5.3 The Decoder
    10.5.4 Complexity
  10.6 Discussion
  References

IV Applications

11 Discriminative Keyword Spotting
  David Grangier, Joseph Keshet and Samy Bengio
  11.1 Introduction
  11.2 Previous Work
  11.3 Discriminative Keyword Spotting
    11.3.1 Problem Setting
    11.3.2 Loss Function and Model Parameterization
    11.3.3 An Iterative Training Algorithm
    11.3.4 Analysis
  11.4 Experiments and Results
    11.4.1 The TIMIT Experiments
    11.4.2 The WSJ Experiments
  11.5 Conclusions
  References

12 Kernel Based Text-Independent Speaker Verification
  Johnny Mariéthoz, Yves Grandvalet and Samy Bengio
  12.1 Introduction
  12.2 Generative Approaches
    12.2.1 Rationale
    12.2.2 Gaussian Mixture Models
  12.3 Discriminative Approaches
    12.3.1 Support Vector Machines
    12.3.2 Kernels
  12.4 Benchmarking Methodology
    12.4.1 Data Splitting for Speaker Verification
    12.4.2 Performance Measures
    12.4.3 NIST Data
    12.4.4 Pre-Processing
  12.5 Kernels for Speaker Verification
    12.5.1 Mean Operator Sequence Kernels
    12.5.2 Fisher Kernels
    12.5.3 Beyond Fisher Kernels
  12.6 Parameter Sharing
    12.6.1 Nuisance Attribute Projection
    12.6.2 Other Approaches
  12.7 Is the Margin Useful for this Problem?
  12.8 Comparing All Methods
  12.9 Conclusion
  References

13 Spectral Clustering for Speech Separation
  Francis R. Bach and Michael I. Jordan
  13.1 Introduction
  13.2 Spectral clustering and normalized cuts
    13.2.1 Similarity matrices
    13.2.2 Normalized cuts
    13.2.3 Spectral relaxation
    13.2.4 Rounding
    13.2.5 Spectral clustering algorithms
    13.2.6 Variational formulation for the normalized cut
  13.3 Cost functions for learning the similarity matrix
    13.3.1 Distance between partitions
    13.3.2 Cost functions as upper bounds
    13.3.3 Functions of eigensubspaces
    13.3.4 Empirical comparisons between cost functions
  13.4 Algorithms for learning the similarity matrix
    13.4.1 Learning algorithm
    13.4.2 Related work
    13.4.3 Testing algorithm
    13.4.4 Handling very large similarity matrices
    13.4.5 Simulations on toy examples
  13.5 Speech separation as spectrogram segmentation
    13.5.1 Spectrogram
    13.5.2 Normalization and subsampling
    13.5.3 Generating training samples
    13.5.4 Features and grouping cues for speech separation
  13.6 Spectral clustering for speech separation
    13.6.1 Basis similarity matrices
    13.6.2 Combination of similarity matrices
    13.6.3 Approximations of similarity matrices
    13.6.4 Experiments
  13.7 Conclusions
  References

List of Contributors

Yasemin Altun, Dept. Schölkopf, Max Planck Institute for Biological Cybernetics
Francis Bach, INRIA - Willow project, Département d'Informatique, École Normale Supérieure
Samy Bengio, Google Research Labs, Google Inc.
Dan Chazan, Dept. of Electrical Engineering, Technion - Israel Institute of Technology
Koby Crammer, Dept. of Computer and Information Science, University of Pennsylvania
Mark Gales, Dept. of Engineering, University of Cambridge
Yves Grandvalet, Heudiasyc, Université de Technologie de Compiègne
David Grangier, Dept. of Machine Learning, NEC Laboratories America, Inc.
Michael I. Jordan, Computer Science Div. and Dept. of Statistics, University of California at Berkeley
Joseph Keshet, Idiap Research Institute, Martigny, Switzerland
Johnny Mariéthoz, Idiap Research Institute, Martigny, Switzerland
Brian Roark, Dept. of Computer Science and Electrical Eng., OGI School of Science and Engineering
Lawrence Saul, Dept. of Computer Science and Engineering, University of California at San Diego
Fei Sha, Computer Science Dept., University of Southern California
Shai Shalev-Shwartz, Toyota Technological Institute at Chicago
Yoram Singer, Google Research Labs, Google Inc.
Nathan Srebro, Toyota Technological Institute at Chicago


Preface

This is the first book dedicated to uniting research on speech and speaker recognition with recent advances in large margin and kernel methods. The first part of the book presents the theoretical and practical foundations of large margin and kernel methods, from support vector machines to large margin methods for structured learning. The second part is dedicated to acoustic modeling for continuous speech recognizers, where the groundwork for practical large margin sequence learning is laid. The third part introduces large margin methods for discriminative language modeling. The last part of the book is dedicated to the applications of keyword spotting, speaker verification and spectral clustering. The book is an important reference for researchers and practitioners in the field of modern speech and speaker recognition.

The purpose of the book is twofold: first, to set out the theoretical foundations of large margin and kernel methods relevant to the speech recognition domain; second, to offer a practical guide to implementing these methods in that domain. The reader is presumed to have basic knowledge of large margin and kernel methods and of basic algorithms in speech and speaker recognition.

August 2008

Joseph Keshet
Samy Bengio
