Minimum Phone Error and I-Smoothing for Improved Discriminative Training
Dan Povey & Phil Woodland
May 8th 2001

Cambridge University Engineering Department

IEEE ICASSP’2002

Povey & Woodland: Minimum Phone Error

Overview

• Minimum Phone Error (MPE)
  – General introduction.
  – MPE objective function.
  – Comparison with other discriminative objective functions.
• Lattice implementation of MPE.
• Optimising the MPE criterion with the EB formulae.
• Improving generalisation: I-smoothing etc.
• MPE and MMI results on Switchboard (hub5), up to 265 hours training.
• Conclusions


Minimum Phone Error

• Minimum Phone Error (MPE) is a new criterion for discriminative training.
• Can give better results than MMI.
• CU-HTK submission for the 2002 Switchboard (hub5) evaluation will use MPE.
• Training time and complexity of implementation not much greater than MMIE.


MPE Objective Function

• Maximise the following function:

\[
F_{\mathrm{MPE}}(\lambda) = \sum_{r=1}^{R} \frac{\sum_{s} p_\lambda(O_r \mid s)^{\kappa} \, P(s)^{\kappa} \, \mathrm{RawAccuracy}(s)}{\sum_{s} p_\lambda(O_r \mid s)^{\kappa} \, P(s)^{\kappa}}
\]

  where λ are the HMM parameters, O_r the speech data for file r, κ a probability scale and P(s) the language model probability pre-scaled by the normal scale factor.
• RawAccuracy(s) is a measure of the number of phones correctly transcribed in sentence s (correct phones in s − inserted phones in s).
• The criterion is a weighted average of RawAccuracy(s) over all s.
• As κ → ∞, all the weight falls on the most likely sentence, so the criterion approaches the phone error on the data.
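The objective can be illustrated on a small N-best list instead of a lattice. The sketch below is only a toy reading of the formula: the tuple layout, the function name `mpe_objective` and the numbers are assumptions made for illustration, not part of the original system.

```python
import math

def mpe_objective(utterances, kappa):
    """Sum over utterances of the posterior-weighted average RawAccuracy.

    Each utterance is a list of hypotheses (log_acoustic, log_lm, raw_accuracy),
    where log_lm is assumed to be already scaled by the normal LM scale factor.
    """
    total = 0.0
    for hyps in utterances:
        # Scaled log scores: kappa * (log p(O_r|s) + log P(s))
        scores = [kappa * (la + ll) for la, ll, _ in hyps]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(sc - top) for sc in scores]
        norm = sum(weights)
        total += sum(w * acc for w, (_, _, acc) in zip(weights, hyps)) / norm
    return total

# Hypothetical toy data: one utterance with three competing sentences.
utt = [(-100.0, -10.0, 5.0), (-101.0, -9.5, 4.0), (-105.0, -12.0, 2.0)]
print(mpe_objective([utt], kappa=1.0 / 12.0))  # smooth average over hypotheses
print(mpe_objective([utt], kappa=50.0))        # close to the accuracy of the top hypothesis
```

With a small κ the value is a soft average over all hypotheses; with a large κ it collapses onto the RawAccuracy of the single most likely sentence.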


MPE & Other Discriminative Objective Functions

• MPE function is an average (weighted by sentence likelihood) of a measure of phone accuracy:

\[
F_{\mathrm{MPE}}(\lambda) = \sum_{r=1}^{R} \frac{\sum_{s} p_\lambda(O_r \mid s)^{\kappa} \, P(s)^{\kappa} \, \mathrm{RawAccuracy}(s)}{\sum_{s} p_\lambda(O_r \mid s)^{\kappa} \, P(s)^{\kappa}}
\]

• Objective function in MMIE is the probability of the correct utterance given the speech data:

\[
F_{\mathrm{MMIE}}(\lambda) = \sum_{r=1}^{R} \log \frac{p_\lambda(O_r \mid M_{s_r})^{\kappa} \, P(s_r)^{\kappa}}{\sum_{s} p_\lambda(O_r \mid M_s)^{\kappa} \, P(s)^{\kappa}}
\]

• MCE (Minimum Classification Error) objective function is a differentiable approximation to the sentence error rate.
• MWE/MPE objective functions are closest to what we want: the word error rate.
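For contrast with MPE, the MMIE criterion can be evaluated on the same kind of toy N-best list. This is a minimal sketch under assumed inputs (the name `mmie_objective` and the data layout are hypothetical): it scores only the posterior of the reference sentence, whereas MPE also gives partial credit to hypotheses that get most phones right.

```python
import math

def mmie_objective(utterances, kappa):
    """Sum over utterances of the log posterior of the reference sentence.

    Each utterance is (ref_index, hyps); hyps is a list of
    (log_acoustic, log_lm) pairs and must include the reference itself.
    """
    total = 0.0
    for ref_index, hyps in utterances:
        scores = [kappa * (la + ll) for la, ll in hyps]
        top = max(scores)
        # log of the denominator sum, computed stably (log-sum-exp)
        log_norm = top + math.log(sum(math.exp(sc - top) for sc in scores))
        total += scores[ref_index] - log_norm
    return total

# Hypothetical toy data: the reference (index 0) competes with two other sentences.
hyps = [(-100.0, -10.0), (-101.0, -9.5), (-105.0, -12.0)]
print(mmie_objective([(0, hyps)], kappa=1.0 / 12.0))
```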


Lattice implementation of MPE

• Implement in a lattice framework, for efficiency (as MMIE).
• RawAccuracy(s), defined on sentence level, requires expensive dynamic programming.
• Express RawAccuracy(s) as a sum of PhoneAcc(p) for all phones in the sentence:

\[
\mathrm{PhoneAcc}(p) =
\begin{cases}
1 & \text{if correct phone} \\
0 & \text{if substitution} \\
-1 & \text{if insertion}
\end{cases}
\]

• Calculating PhoneAcc(p) is still hard.
• Use an approximation to PhoneAcc(p) based on time-alignment information.
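The slide does not spell out the time-alignment approximation. The sketch below shows one overlap-based form as an assumption: for each reference phone, e is the fraction of its duration covered by the hypothesis phone, the score is −1 + 2e when the labels match and −1 + e otherwise, and the best score over reference phones is kept. The `(start, end, label)` tuples and function names are hypothetical.

```python
def overlap_fraction(hyp, ref):
    """Fraction of the reference phone's duration covered by the hypothesis phone."""
    h_start, h_end, _ = hyp
    r_start, r_end, _ = ref
    overlap = min(h_end, r_end) - max(h_start, r_start)
    return max(0.0, overlap) / (r_end - r_start)

def approx_phone_acc(hyp, reference):
    """Best score over reference phones: -1 + 2e if labels match, else -1 + e.

    This scoring rule is an assumed illustration, not taken from the slide.
    """
    best = -1.0  # no overlap with any reference phone behaves like an insertion
    for ref in reference:
        e = overlap_fraction(hyp, ref)
        score = -1.0 + (2.0 * e if hyp[2] == ref[2] else e)
        best = max(best, score)
    return best

# Hypothetical reference alignment: two phones, times in frames.
reference = [(0, 10, "ah"), (10, 20, "b")]
print(approx_phone_acc((0, 10, "ah"), reference))   # fully overlapping, same label
print(approx_phone_acc((0, 10, "eh"), reference))   # fully overlapping, different label
print(approx_phone_acc((20, 30, "ah"), reference))  # no overlap with the reference
```

Because each score depends only on the hypothesis phone and the reference alignment, it can be attached to individual lattice arcs without a sentence-level dynamic programming pass. In the limiting cases it reproduces the 1 / 0 / −1 values for a correct phone, a substitution and an insertion.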


Optimising the MPE criterion with EB

• Use Extended Baum-Welch (EB) update as in MMI.
• Use two sets of statistics (numerator and denominator) as in MMI.
• Data from each phone q goes in the numerator or denominator statistics depending on the sign of ∂F_MPE(λ) / ∂ log p(q).
• EB is viewed as a gradient descent technique and can be shown to be a valid update for MPE.
• Up to twice as many iterations of training as MMI to reach best error rates: 8 iterations instead of 4.
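A minimal sketch of the two steps named above, for a single scalar Gaussian. The sign-based routing follows the slide; the update itself is the usual textbook form of the EB mean and variance formulae, and the function names, statistic layout and the smoothing constant `D` are assumptions for illustration.

```python
def split_by_sign(derivative):
    """Route a phone arc's weight to (numerator, denominator) by derivative sign.

    `derivative` stands for dF_MPE / d log p(q) for one phone arc q.
    """
    return (derivative, 0.0) if derivative > 0 else (0.0, -derivative)

def eb_update(mean, var, num, den, D):
    """One Extended Baum-Welch step for a scalar Gaussian.

    num and den are (occupancy, sum_x, sum_x2) statistics; D is the
    smoothing constant that keeps the update stable (assumed given).
    """
    g_n, x_n, s_n = num
    g_d, x_d, s_d = den
    denom = g_n - g_d + D
    new_mean = (x_n - x_d + D * mean) / denom
    new_var = (s_n - s_d + D * (var + mean * mean)) / denom - new_mean ** 2
    return new_mean, new_var

# Hypothetical statistics for one Gaussian.
print(split_by_sign(0.3), split_by_sign(-0.2))
print(eb_update(mean=1.0, var=2.0, num=(10.0, 12.0, 40.0), den=(6.0, 5.0, 20.0), D=12.0))
```

Two sanity checks follow from the formulae: with empty denominator statistics and D = 0 the update reduces to the ML estimate, and with identical numerator and denominator statistics the parameters do not move.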


Improving generalisation using I-smoothing

• The H-criterion is h F_MMIE(λ) + (1 − h) F_ML(λ) (backoff between MMIE and MLE).
• I-smoothing (for MMI) is like the H-criterion except that the proportion of MMI (i.e., h) varies depending on the amount of data for each Gaussian.
• In effect, it is like having τ points of extra MLE data for each Gaussian (do this by scaling up the normal MLE counts before updating the Gaussian). Use, say, τ = 100.
• For MMIE, I-smoothing gives an improvement on some tasks (no improvement over MMIE on others).
• For MPE, I-smoothing makes a lot of difference; without I-smoothing, MPE gives little improvement.
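The "τ points of extra MLE data" idea can be sketched as a change to the numerator statistics before the Gaussian update. The statistic layout `(occupancy, sum_x, sum_x2)` and the function name `i_smooth` are assumptions for illustration; only τ = 100 comes from the slide.

```python
def i_smooth(num, ml, tau=100.0):
    """Add tau points of average ML data to the numerator statistics.

    num and ml are (occupancy, sum_x, sum_x2) for one scalar Gaussian.
    """
    g_n, x_n, s_n = num
    g_ml, x_ml, s_ml = ml
    scale = tau / g_ml  # rescale the ML statistics to a total count of tau
    return (g_n + tau, x_n + scale * x_ml, s_n + scale * s_ml)

# Hypothetical statistics: the same tau matters more for the Gaussian with less data.
print(i_smooth(num=(50.0, 100.0, 300.0), ml=(200.0, 400.0, 1000.0)))
print(i_smooth(num=(5000.0, 10000.0, 30000.0), ml=(20000.0, 40000.0, 100000.0)))
```

Since τ is fixed while the occupancy varies, a Gaussian with little data is pulled strongly towards its ML estimate and one with a lot of data is barely affected, which is the data-dependent backoff described above. When the numerator statistics are the ML statistics themselves (as in MMI), this is the same as scaling the counts by (γ + τ) / γ.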


Improving generalisation: other issues

• Use unigram language model in training (as for MMI).
• Set the probability scale κ to the inverse of the normal language model scale factor (as for MMI).
• Use phones, not words, to calculate accuracy: so MPE, not MWE.


Experimental setup on Switchboard.

• HTK large vocabulary recognition system.
• PLP cepstral features + first/second derivatives (39 dimensions in total).
• Training on h5train00 (265 hours) or h5train00sub (68 hours).
• HMM sets with tree-clustered triphone context-dependent states: 6165 HMM states, and 12 or 16 Gaussians/state.
• Testing on eval98.


Results on Switchboard.

Results trained on h5train00sub

                     %WER Train     %WER Test    Abs test
                     (68h train)    eval98       improvement
MLE                  26.3           46.6         –
MMIE                 18.6           44.3         2.3%
MMIE+I-smoothing     19.7           43.8         2.8%
MPE+I-smoothing      20.6           43.1         3.5%

Results trained on h5train00sub

                     %WER Train     %WER Test    Abs test
                     (68h train)    eval98       improvement
MLE baseline         30.1           45.6         –
MMIE                 23.2           41.8         3.8%
MMIE+I-smoothing     22.2           41.4         4.2%
MPE+I-smoothing      23.9           40.8         4.8%
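The absolute test improvements are the differences between each system's eval98 WER and the MLE baseline of the same table. The short check below recomputes them from the reported WERs; the relative figures it prints are derived here and are not reported on the slides, and the dictionary keys are labels chosen for illustration.

```python
# eval98 %WER values copied from the two results tables.
tables = {
    "first table": {"MLE": 46.6, "MMIE": 44.3,
                    "MMIE+I-smoothing": 43.8, "MPE+I-smoothing": 43.1},
    "second table": {"MLE baseline": 45.6, "MMIE": 41.8,
                     "MMIE+I-smoothing": 41.4, "MPE+I-smoothing": 40.8},
}

def improvements(wers, baseline):
    """Map each system to (absolute, relative %) WER reduction versus the baseline."""
    base = wers[baseline]
    return {
        name: (round(base - wer, 1), round(100.0 * (base - wer) / base, 1))
        for name, wer in wers.items()
        if name != baseline
    }

for title, baseline in (("first table", "MLE"), ("second table", "MLE baseline")):
    for name, (absolute, relative) in improvements(tables[title], baseline).items():
        print(f"{title}: {name}: {absolute}% absolute, {relative}% relative")
```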


Conclusions.

• MPE training gives good improvements, up to about 5% absolute on Switchboard.
  – MPE currently being used in Cambridge University Hub5 evaluation system (2002).
• MPE can be efficiently implemented using lattices.
  – Get around need for dynamic programming by approximating the phone accuracy.
  – Use EB formulae with same setup as MMI, for fast optimisation.
