Minimum Phone Error and I-Smoothing for Improved Discriminative Training
Dan Povey & Phil Woodland
May 8th 2001
Cambridge University Engineering Department
IEEE ICASSP’2002
Povey & Woodland: Minimum Phone Error
Overview
• Minimum Phone Error (MPE)
  – General introduction.
  – MPE objective function.
  – Comparison with other discriminative objective functions.
• Lattice implementation of MPE.
• Optimising the MPE criterion with the EB formulae.
• Improving generalisation: I-smoothing etc.
• MPE and MMI results on Switchboard (hub5), up to 265 hours training.
• Conclusions.
Minimum Phone Error
• Minimum Phone Error (MPE) is a new criterion for discriminative training.
• Can give better results than MMI.
• CU-HTK submission for the 2002 Switchboard (hub5) evaluation will use MPE.
• Training time and complexity of implementation are not much greater than for MMIE.
MPE Objective Function
• Maximise the following function:
$$ F_{MPE}(\lambda) = \sum_{r=1}^{R} \frac{\sum_{s} p_\lambda(O_r|s)^{\kappa} P(s)^{\kappa}\, \mathrm{RawAccuracy}(s)}{\sum_{s} p_\lambda(O_r|s)^{\kappa} P(s)^{\kappa}} $$
  where λ are the HMM parameters, O_r the speech data for file r, κ a probability scale and P(s) the language model probability pre-scaled by the normal scale factor.
• RawAccuracy(s) is a measure of the number of phones correctly transcribed in sentence s (correct phones in s − inserted phones in s).
• The criterion is a weighted average of RawAccuracy(s) over all s.
• As κ → ∞, the criterion approaches the phone accuracy of the most likely transcription, i.e. it tracks the phone error on the training data.
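The objective above can be illustrated on a toy N-best list. The following is a minimal sketch, not the lattice implementation described later: the function name `mpe_objective` and the tuple layout (acoustic log-likelihood, pre-scaled LM log-probability, RawAccuracy) are assumptions made for illustration.

```python
import math

def mpe_objective(utterances, kappa):
    """Sum over utterances of the posterior-weighted average RawAccuracy.

    utterances: list of hypothesis lists; each hypothesis is a tuple
    (acoustic_loglik, lm_logprob, raw_accuracy). Hypothetical layout.
    """
    total = 0.0
    for hyps in utterances:
        # log of p(O_r|s)^kappa * P(s)^kappa for each hypothesis s
        scores = [kappa * (am + lm) for am, lm, _ in hyps]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(x - top) for x in scores]
        norm = sum(weights)
        total += sum(w * acc for w, (_, _, acc) in zip(weights, hyps)) / norm
    return total

# Two competing hypotheses for one utterance (made-up scores).
toy = [[(-10.0, -2.0, 3.0), (-11.0, -2.0, 1.0)]]
flat = mpe_objective(toy, kappa=0.0)     # uniform weighting of hypotheses
sharp = mpe_objective(toy, kappa=100.0)  # dominated by the best hypothesis
```

With κ = 0 every hypothesis gets equal weight, while a large κ concentrates the weight on the best-scoring hypothesis, which is the limiting behaviour noted in the last bullet.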
MPE & Other Discriminative Objective Functions
• MPE function is an average (weighted by sentence likelihood) of a measure of phone accuracy:
$$ F_{MPE}(\lambda) = \sum_{r=1}^{R} \frac{\sum_{s} p_\lambda(O_r|s)^{\kappa} P(s)^{\kappa}\, \mathrm{RawAccuracy}(s)}{\sum_{s} p_\lambda(O_r|s)^{\kappa} P(s)^{\kappa}} $$
• Objective function in MMIE is the probability of the correct utterance given the speech data:
$$ F_{MMIE}(\lambda) = \sum_{r=1}^{R} \log \frac{p_\lambda(O_r|M_{s_r})^{\kappa} P(s_r)^{\kappa}}{\sum_{s} p_\lambda(O_r|M_s)^{\kappa} P(s)^{\kappa}} $$
  where s_r is the correct transcription of file r and M_s the HMM sequence corresponding to sentence s.
• MCE (Minimum Classification Error) objective function is a differentiable approximation to the sentence error rate.
• MWE/MPE objective functions are closest to what we want: the word error rate.
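For contrast with MPE, the MMIE criterion can be sketched on the same kind of toy N-best list. This is only an illustration under stated assumptions: the name `mmie_objective` and the input layout (index of the reference hypothesis plus a list of log scores) are hypothetical, and the hypothesis list is assumed to contain the reference.

```python
import math

def mmie_objective(utterances, kappa):
    """Sum over utterances of the log posterior of the correct transcription.

    utterances: list of (ref_index, hyps), where hyps is a list of
    (acoustic_loglik, lm_logprob) and hyps[ref_index] is the reference.
    """
    total = 0.0
    for ref_index, hyps in utterances:
        scores = [kappa * (am + lm) for am, lm in hyps]
        top = max(scores)
        # log of the denominator sum, computed stably (log-sum-exp)
        log_den = top + math.log(sum(math.exp(x - top) for x in scores))
        total += scores[ref_index] - log_den
    return total

# Reference and one competitor with equal scores: posterior 0.5 each.
tied = [(0, [(-10.0, -2.0), (-10.0, -2.0)])]
value = mmie_objective(tied, kappa=1.0)
```

MMIE only looks at the posterior of the single correct sentence, whereas MPE credits every hypothesis in proportion to how many phones it gets right.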
Lattice implementation of MPE
• Implement in a lattice framework, for efficiency (as MMIE).
• RawAccuracy(s), defined on sentence level, requires expensive dynamic programming.
• Express RawAccuracy(s) as a sum of PhoneAcc(p) for all phones in the sentence:
$$ \mathrm{PhoneAcc}(p) = \begin{cases} 1 & \text{if correct phone} \\ 0 & \text{if substitution} \\ -1 & \text{if insertion} \end{cases} $$
• Calculating PhoneAcc(p) is still hard.
• Use an approximation to PhoneAcc(p) based on time-alignment information.
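The slide does not spell out the time-alignment approximation. One overlap-based form, assumed here for illustration, scores a hypothesis phone against every reference phone z using e, the fraction of z's duration that the hypothesis phone overlaps: −1 + 2e if the labels match, −1 + e otherwise, taking the maximum over z. The function name and the (label, start, end) tuple layout are hypothetical.

```python
def approx_phone_acc(hyp, refs):
    """Approximate PhoneAcc for one hypothesis phone from time overlaps.

    hyp:  (label, start_frame, end_frame)
    refs: list of (label, start_frame, end_frame) reference phones
    """
    label, start, end = hyp
    best = -1.0  # no overlap with any reference phone: counts as an insertion
    for ref_label, ref_start, ref_end in refs:
        overlap = max(0, min(end, ref_end) - max(start, ref_start))
        e = overlap / (ref_end - ref_start)  # fraction of the reference phone covered
        score = -1.0 + 2.0 * e if ref_label == label else -1.0 + e
        best = max(best, score)
    return best

reference = [("a", 0, 10), ("b", 10, 20)]
correct = approx_phone_acc(("a", 0, 10), reference)       # same label, full overlap
substituted = approx_phone_acc(("c", 0, 10), reference)   # wrong label, full overlap
inserted = approx_phone_acc(("a", 30, 40), reference)     # no overlap at all
```

In the three limiting cases the sketch reproduces the exact values 1, 0 and −1 from the definition of PhoneAcc(p), and it needs only phone boundaries, so no sentence-level dynamic programming is required.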
Optimising the MPE criterion with EB
• Use Extended Baum-Welch (EB) update as in MMI.
• Use two sets of statistics (numerator and denominator) as in MMI.
• Data from each phone q goes in the numerator or denominator statistics depending on the sign of $\partial F_{MPE}(\lambda) / \partial \log p(q)$.
• EB is viewed as a gradient descent technique and can be shown to be a valid update for MPE.
• Up to twice as many iterations of training as MMI to reach best error rates: 8 iterations instead of 4.
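The sign-based split of statistics and the EB re-estimation can be sketched for a single one-dimensional Gaussian. This is a simplified illustration, not the authors' implementation: the helper names, the dictionary layout, and the choice of smoothing constant D = E × (denominator occupancy) are assumptions, and a practical system would also check that the updated variance stays positive.

```python
def new_stats():
    # occupancy, weighted sum of data, weighted sum of squared data
    return {"gamma": 0.0, "x": 0.0, "x2": 0.0}

def accumulate(num, den, signed_occupancy, x):
    """Route one observation to numerator or denominator by the sign of its weight."""
    if signed_occupancy > 0:
        target, weight = num, signed_occupancy
    else:
        target, weight = den, -signed_occupancy
    target["gamma"] += weight
    target["x"] += weight * x
    target["x2"] += weight * x * x

def eb_update(num, den, mu, var, E=2.0):
    """Extended Baum-Welch update of a scalar Gaussian mean and variance."""
    D = E * den["gamma"]  # assumed heuristic for the smoothing constant
    denom = num["gamma"] - den["gamma"] + D
    if denom <= 0.0:
        return mu, var  # no usable statistics: keep the old parameters
    new_mu = (num["x"] - den["x"] + D * mu) / denom
    new_var = (num["x2"] - den["x2"] + D * (var + mu * mu)) / denom - new_mu ** 2
    return new_mu, new_var

num, den = new_stats(), new_stats()
accumulate(num, den, 0.5, 2.0)    # positive weight: numerator
accumulate(num, den, -0.25, 4.0)  # negative weight: denominator
```

Two sanity checks follow from the formula: with empty denominator statistics the update reduces to the ordinary ML estimate, and with identical numerator and denominator statistics the parameters do not move.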
Improving generalisation using I-smoothing
• H-criterion is $h F_{MMIE}(\lambda) + (1 - h) F_{ML}(\lambda)$ (backoff between MMIE and MLE).
• I-smoothing (for MMI) is like the H-criterion except that the proportion of MMI (i.e., h) varies depending on the amount of data for each Gaussian.
• In effect, it is like having τ points of extra MLE data for each Gaussian (done by scaling up the normal MLE counts before updating the Gaussian). Use, say, τ = 100.
• For MMIE, I-smoothing gives an improvement on some tasks (no improvement over MMIE on others).
• For MPE, I-smoothing makes a lot of difference; without I-smoothing, MPE gives little improvement.
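The "τ points of extra MLE data" idea can be sketched as an adjustment to the numerator statistics of one Gaussian before the update. This is a minimal sketch under assumptions: the function name `i_smooth` and the dictionary layout (occupancy, sum of data, sum of squared data) are hypothetical, and the ML statistics are passed separately so the same helper covers the MMI case (where they coincide with the numerator statistics) and the MPE case.

```python
def i_smooth(num, ml, tau=100.0):
    """Add tau points' worth of ML statistics to the numerator statistics.

    num, ml: dicts with 'gamma' (occupancy), 'x' (sum of data),
    'x2' (sum of squared data) for one Gaussian.
    """
    if ml["gamma"] <= 0.0:
        return dict(num)  # no ML data for this Gaussian: nothing to add
    scale = tau / ml["gamma"]  # rescale ML stats to represent exactly tau points
    return {key: num[key] + scale * ml[key] for key in ("gamma", "x", "x2")}

ml_stats = {"gamma": 50.0, "x": 100.0, "x2": 250.0}
smoothed = i_smooth(ml_stats, ml_stats, tau=100.0)  # MMI case: numerator == ML stats
```

Because τ is fixed, Gaussians with little data are pulled strongly towards their ML estimates while well-trained Gaussians are barely affected, which is how the effective interpolation weight varies per Gaussian.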
Improving generalisation: other issues
• Use unigram language model in training (as for MMI).
• Set the probability scale κ to the inverse of the normal language model scale factor (as for MMI).
• Use phones, not words, to calculate accuracy: hence MPE rather than MWE.
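The effect of setting κ to the inverse of the language model scale can be seen in a short calculation. This is an illustrative sketch only: the function name and the scale factor of 12 used in the example are assumptions, not values taken from the slides.

```python
def scaled_log_score(acoustic_loglik, lm_logprob, lm_scale):
    """Log of p(O|s)^kappa * P(s)^kappa with kappa = 1 / lm_scale.

    P(s) is the LM probability pre-scaled by lm_scale, so its log is
    lm_scale * lm_logprob.
    """
    kappa = 1.0 / lm_scale
    return kappa * (acoustic_loglik + lm_scale * lm_logprob)

# Hypothetical values: acoustic log-likelihood -1200, LM log-probability -5.
score = scaled_log_score(-1200.0, -5.0, lm_scale=12.0)
```

The result equals the acoustic log-likelihood divided by the LM scale plus the unscaled LM log-probability, so the acoustic scores are flattened rather than the language model being sharpened.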
Experimental setup on Switchboard.
• HTK large vocabulary recognition system.
• PLP cepstral features + first/second derivatives (39 dimensions in total).
• Training on h5train00 (265 hours) or h5train00sub (68 hours).
• HMM sets with tree-clustered triphone context-dependent states: 6165 HMM states, and 12 or 16 Gaussians/state.
• Testing on eval98.
Results on Switchboard.

Results trained on h5train00sub (68h train)
                    WER Train   WER Test eval98   Abs test improvement
MLE                 26.3        46.6              –
MMIE                18.6        44.3              2.3%
MMIE+I-smoothing    19.7        43.8              2.8%
MPE+I-smoothing     20.6        43.1              3.5%

Results trained on h5train00 (265h train)
                    WER Train   WER Test eval98   Abs test improvement
MLE baseline        30.1        45.6              –
MMIE                23.2        41.8              3.8%
MMIE+I-smoothing    22.2        41.4              4.2%
MPE+I-smoothing     23.9        40.8              4.8%
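The absolute test improvements quoted in the results are simply the MLE test WER minus the test WER of each discriminatively trained system. A small check, using only the WER figures from the tables (the function name is hypothetical):

```python
def abs_improvements(baseline_wer, system_wers):
    """Absolute WER reduction of each system relative to the MLE baseline."""
    return [round(baseline_wer - wer, 1) for wer in system_wers]

# Test WERs on eval98 in the order MMIE, MMIE+I-smoothing, MPE+I-smoothing.
first_table = abs_improvements(46.6, [44.3, 43.8, 43.1])
second_table = abs_improvements(45.6, [41.8, 41.4, 40.8])
```

In both tables the gain from MPE with I-smoothing over plain MMIE is about 1% absolute on the test set.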
Conclusions.
• MPE training gives good improvements, up to about 5% absolute on Switchboard.
  – MPE currently being used in Cambridge University Hub5 evaluation system (2002).
• MPE can be efficiently implemented using lattices.
  – Get around need for dynamic programming by approximating the phone accuracy.
  – Use EB formulae with same setup as MMI, for fast optimisation.