An Evidence Framework for Bayesian Learning of Continuous-Density Hidden Markov Models
Yu Zhang¹,², Peng Liu², Jen-Tzung Chien³, Frank Soong²
¹Shanghai Jiao Tong University  ²Microsoft Research Asia  ³National Cheng Kung University
(ICASSP 2009)
Outline
Motivation Evidence Framework CDHMMs with VB Experiment
Y.Zhang, P.Liu, J.-T.Chien, F.Soong
Motivation
Robust acoustic modeling is critical when the collected training data is sparse, noisy, and mismatched with the test conditions. Hidden Markov Models (HMMs) are the most commonly used acoustic models, but such ill-posed conditions may severely hamper the trained HMMs' ability to recognize test data robustly. In an evidence Bayesian framework, we can build a better-regularized HMM from the given finite data and hence obtain more robust recognition performance.
In this paper, we
✓ Apply the evidence framework to exponential-family distribution estimation.
✓ Extend it to estimating CDHMMs with naturally built-in model uncertainty.
Evidence Framework

Notations
η: hyper-parameters of the model
{λi}: distribution parameters
{Di}: sets of training data

Model Evidence as the Objective Function
$\hat{\eta} = \arg\max_{\eta} p(D_1, \dots, D_K \mid \eta) = \arg\max_{\eta} \prod_{i=1}^{K} \int p(D_i \mid \lambda_i)\, p(\lambda_i \mid \eta)\, d\lambda_i$
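The objective above can be made concrete with a tiny conjugate example. The sketch below is illustrative, not from the paper: for Bernoulli models λi tied by a shared Beta(c, c) prior (hyper-parameter η = c), each per-set evidence integral has closed form, so the product of evidences can be maximized over a grid of hyper-parameters.

```python
import math

# Tiny numeric instance of the evidence objective: Bernoulli models with a
# shared Beta(c, c) prior.  The per-set evidence has closed form via the
# Beta function, so arg max over c can be found on a grid.
# (Illustrative stand-in; the paper applies the idea to CDHMM priors.)

def log_evidence(heads, n, c):
    """ln of the integral of p(D_i | lam) Beta(lam; c, c) over lam."""
    lbeta = lambda a, b: math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return lbeta(c + heads, c + n - heads) - lbeta(c, c)

def best_c(datasets, grid):
    """arg max_c of sum_i ln p(D_i | c) over a grid of hyper-parameters."""
    return max(grid, key=lambda c: sum(log_evidence(h, n, c)
                                       for h, n in datasets))

# Three coins with similar rates: a concentrated prior (large c) wins.
datasets = [(5, 10), (6, 10), (4, 10)]
c_hat = best_c(datasets, [0.1, 0.5, 1.0, 2.0, 5.0, 20.0])
```

Because the three data sets behave similarly, the evidence favors a sharp shared prior; dissimilar sets would instead push the maximum toward a broad prior.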
EM Solution

Key idea: treat λi as a hidden variable.
E-Step:
$Q(\eta, \eta^{\mathrm{old}}) = \sum_{i=1}^{K} \int p(\lambda_i \mid D_i, \eta^{\mathrm{old}}) \ln p(D_i, \lambda_i \mid \eta)\, d\lambda_i$
M-Step: obtain the solution for all models in the exponential family.
Fundamentals of Exponential Family Distributions

Exponential family:
$p(x_i \mid \lambda_i) = h(x_i)\, g(\lambda_i) \exp[\lambda_i^{\top} u(x_i)]$
Sufficient statistics:
$\sum_{x \in D} u(x)$
Conjugate prior:
$p(\lambda_i \mid \chi_0, \nu_0) = f(\chi_0, \nu_0)\, g(\lambda_i)^{\nu_0} \exp(\nu_0 \lambda_i^{\top} \chi_0)$
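As a concrete instance of these definitions, a univariate Gaussian fits this form with natural parameters λ = (μ/σ², −1/(2σ²)) and u(x) = (x, x²), so a data set enters estimation only through its pooled sufficient statistics. A minimal sketch (function names are illustrative, not from the paper):

```python
# A univariate Gaussian in the exponential-family form
# p(x | lambda) = h(x) g(lambda) exp(lambda^T u(x)), with u(x) = (x, x^2).

def natural_params(mu, var):
    """Natural parameters lambda = (mu/var, -1/(2*var))."""
    return (mu / var, -0.5 / var)

def suff_stats(data):
    """Pooled sufficient statistics sum_x u(x) = (sum x, sum x^2)."""
    return (sum(data), sum(x * x for x in data))

def mle_from_stats(stats, n):
    """ML estimate of (mu, var) recovered from the pooled statistics alone."""
    s1, s2 = stats
    mu = s1 / n
    var = s2 / n - mu * mu
    return mu, var

data = [1.0, 2.0, 3.0, 4.0]
mu, var = mle_from_stats(suff_stats(data), len(data))
print(mu, var)  # 2.5 1.25
```

The same two-number summary is what the conjugate updates on the following slides accumulate.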
EM Empirical Bayesian Learning for the Exponential Family

[Graphical model: the hyper-parameters η are the parent of the model parameters λ1, …, λK, and each λi generates its data set Di; given its parent λi, each Di is independent of η.]
EM Empirical Bayesian Learning for the Exponential Family

Using the two properties
1. With a conjugate prior, the posterior has the same functional form as its prior.
2. Di is conditionally independent of η, given λi (Di ⊥ η | λi).
we get
$Q(\eta, \eta^{\mathrm{old}}) = \sum_{i=1}^{K} \int p(\lambda_i \mid \tilde{\eta}_i^{\mathrm{old}}) \ln p(\lambda_i \mid \eta)\, d\lambda_i + C$
EM Empirical Bayesian Learning for the Exponential Family

E-step:
$\tilde{\nu}_i = \nu + \gamma_i, \qquad \tilde{\chi}_i = \frac{\sum_{n=1}^{\gamma_i} u(x_{i,n}) + \nu\chi}{\tilde{\nu}_i}$
M-step:
$\langle \lambda, \ln g(\lambda) \rangle_{\eta^{\mathrm{new}}} = \frac{1}{K} \sum_{i=1}^{K} \langle \lambda, \ln g(\lambda) \rangle_{\tilde{\eta}_i^{\mathrm{old}}}$
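For the simplest conjugate member of the family — Gaussian means with known unit variance under a shared Normal prior — these E- and M-steps reduce to a few lines. The sketch below is a toy stand-in, not the paper's CDHMM implementation; names like `eb_em` are illustrative:

```python
# Minimal empirical-Bayes EM: K "models" are Gaussian means mu_i with known
# unit variance, tied by a shared prior mu_i ~ N(m, t2).  The E-step computes
# each posterior p(mu_i | D_i, eta_old); the M-step re-estimates eta = (m, t2)
# by matching expectations, mirroring the exponential-family updates.

def eb_em(datasets, m=0.0, t2=1.0, iters=50):
    for _ in range(iters):
        # E-step: by conjugacy, each posterior over mu_i is Gaussian.
        post = []
        for d in datasets:
            n = len(d)
            prec = 1.0 / t2 + n                # posterior precision
            mean = (m / t2 + sum(d)) / prec    # posterior mean
            post.append((mean, 1.0 / prec))
        # M-step: match <mu_i> and <mu_i^2> averaged over the K posteriors.
        K = len(post)
        m = sum(mu for mu, _ in post) / K
        t2 = sum(v + (mu - m) ** 2 for mu, v in post) / K
    return m, t2

datasets = [[0.9, 1.1], [2.0, 1.8], [-1.0, -1.2]]
m_hat, t2_hat = eb_em(datasets)
```

The learned (m, t2) pools evidence across the three small data sets; each posterior mean is shrunk toward the shared m, which is the regularization effect the framework exploits.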
Concavity Analysis

The auxiliary function Q(η, η^old) is concave ⇒ we can obtain its global optimum in the M-step.
In general, the objective function F (the evidence), $\mathcal{F}(\eta) = p(D_1, \dots, D_K \mid \eta)$, is not concave.
Good news: $\nabla^2 \mathcal{F}$ is proportional to $\sum_i \{\mathrm{cov}_{\tilde{\eta}_i} - \mathrm{cov}_{\eta}\}$ (note: the posterior is usually sharper than its prior).
An Introduction to Approximate Inference

In some cases, we can hardly evaluate the joint posterior distribution of the hidden variables. For example, when training Bayesian HMMs empirically, we need to evaluate p(λ, s | D) in the E-step, where λ denotes the HMM parameters and s the state sequence.
Probabilistic approach: Monte Carlo, theoretically sound but computationally infeasible.
Deterministic approach: select a proper q(λ, s) to approximate p(λ, s | D).
Variational Approach

Separability assumption: q(λ, s) = q(λ) q(s).
We get a new lower bound on the model log marginal likelihood:
$\mathcal{F}_m(q(\lambda), q(s)) = \int q(\lambda)\, q(s) \ln \frac{p(\lambda, s, D \mid m)}{q(\lambda)\, q(s)}$
It can be iteratively optimized:
$q^{\mathrm{new}}(\lambda) \propto \exp \langle \ln p(D, s \mid \lambda) \rangle_{q^{\mathrm{old}}(s)}, \qquad q^{\mathrm{new}}(s) \propto \exp \langle \ln p(D, s \mid \lambda) \rangle_{q^{\mathrm{old}}(\lambda)}$
We have closed-form solutions for $q^{\mathrm{new}}(\lambda)$ and $q^{\mathrm{new}}(s)$ in the CDHMM case.
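The q(λ) ↔ q(s) alternation can be shown on a toy latent-variable model. The sketch below is an illustration, not the paper's CDHMM algorithm: data come from a two-component mixture where component 1 is N(μ, 1) with unknown mean μ ~ N(0, 10), component 0 is a fixed N(0, 1), and the mixing weight π is known; the q(μ) update also carries the prior term, as in the standard mean-field recipe.

```python
import math

# Mean-field VB: alternate q(s) (soft assignments r_n) and q(mu) = N(m, v).

def vb_toy(x, pi=0.5, prior_var=10.0, iters=100):
    m, v = 0.0, prior_var                     # initialize q(mu) at the prior
    for _ in range(iters):
        # q_new(s) proportional to exp <ln p(x, s | mu)>_{q(mu)}
        r = []
        for xn in x:
            log1 = math.log(pi) - 0.5 * ((xn - m) ** 2 + v)
            log0 = math.log(1 - pi) - 0.5 * xn ** 2
            r.append(1.0 / (1.0 + math.exp(log0 - log1)))
        # q_new(mu) proportional to p(mu) exp <ln p(x, s | mu)>_{q(s)}:
        # Gaussian, by conjugacy.
        prec = 1.0 / prior_var + sum(r)
        m = sum(rn * xn for rn, xn in zip(r, x)) / prec
        v = 1.0 / prec
    return m, v, r

m, v, r = vb_toy([3.1, 2.9, 0.1, -0.2])
```

Each sweep can only raise the lower bound, and the coordinate updates quickly separate the two points near 3 from the two near 0.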
The Pseudo-Code of Evidence-Framework-Based Bayesian Training for CDHMMs

iteration loop:
    variational E-step: conduct Baum-Welch on the training set, using expected log-likelihoods instead of Gaussian probabilities, and collect the statistics γi, γi(o), γi(oo⊤)
    variational M-step (maximum-evidence EM):
        E-step: calculate η̃i^old for all the CDHMM parameters
        M-step: solve η^new with the expectation equation
while the evidence gap is larger than a threshold
Optimization Procedure

[Figure: each VB Baum-Welch (VBEM) step tightens the lower bound on the log marginal likelihood by reducing KL(q‖p); each hyper-parameter estimate from the maximum-evidence EM then yields a new lower bound.]
Parameters Update

Gaussian parameters: the conjugate prior is Gaussian-Wishart,
$p(\mu_i, \Lambda_i \mid \eta) = \mathcal{N}(\mu_i;\, \mu_0, \beta_0^{-1}\Lambda_i^{-1})\, \mathcal{W}(\Lambda_i;\, \Lambda_0, \nu_0)$
Weight parameters: the conjugate prior is the Dirichlet distribution,
$p(\omega_{i1}, \dots, \omega_{iM} \mid \alpha_{01}, \dots, \alpha_{0M}) \propto \prod_{m=1}^{M} \omega_{im}^{\alpha_{0m}-1}$
See the paper for more details.
Experiments

The evidence framework for CDHMMs was tested on Aurora2, a connected-digit recognition task.
Whole-word HMMs: eleven digits, 'zero' to 'nine' plus 'oh'; a 3-component GMM for each state; diagonal covariance matrices.
Training: different-sized clean training sets; different-sized multi-condition training sets.
Test: 0 dB-20 dB test data; we compared the average word accuracy on the test sets.
Bayesian Predictive Classifier

Bayesian prediction: $\tilde{s} = \arg\max_s \int p(o_1^T \mid \lambda_s)\, p(\lambda_s \mid D)\, d\lambda_s$
Usually, p(λs | D) is the trained posterior distribution, used as the prior distribution in testing.
Frame-based approximation (used in this study): assume the prior to be independent for each frame; we then have a Student-t, instead of a Gaussian, distribution.
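The frame-level effect of integrating the parameters out can be sketched for a single univariate Gaussian: with a Normal-Gamma posterior over its (mean, precision), the predictive density is Student-t rather than a Gaussian plug-in. Hyper-parameter names (m, beta, a, b) below follow the usual Normal-Gamma convention, not necessarily the paper's notation.

```python
import math

# Student-t predictive for a Gaussian with a Normal-Gamma(m, beta, a, b)
# posterior over (mean, precision): df = 2a, location m,
# scale^2 = b (beta + 1) / (a beta).

def student_t_logpdf(x, df, loc, scale):
    z = (x - loc) / scale
    return (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
            - 0.5 * math.log(df * math.pi) - math.log(scale)
            - (df + 1) / 2 * math.log1p(z * z / df))

def predictive_logpdf(x, m, beta, a, b):
    """log p(x | D) after integrating out the Gaussian's parameters."""
    df = 2 * a
    scale = math.sqrt(b * (beta + 1) / (a * beta))
    return student_t_logpdf(x, df, loc=m, scale=scale)

# With little data (small counts) the t tails stay heavy; as the posterior
# sharpens, the predictive approaches the plug-in Gaussian.
lp = predictive_logpdf(0.5, m=0.0, beta=2.0, a=1.5, b=1.5)
```

The heavier tails are what make the Bayesian predictive classifier more forgiving of parameter uncertainty at test time.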
Experimental Results

[Figure: word accuracy (%) versus number of training utterances, for models trained on different-sized clean training data (left) and different-sized multi-condition training data (right). Curves compare ML training, the evidence framework, and fixed priors β0 = ν0 = 0.1, 0.5, 2; among the fixed priors, β0 = ν0 = 2 is best on clean training and β0 = ν0 = 0.1 on multi-condition training.]
Conclusion

We apply the evidence framework to CDHMM training, which automatically learns the priors and their posteriors from data. The evidence-framework-trained CDHMMs have better regularization and higher recognition performance.
Questions?
Thank You!