An Evidence Framework for Bayesian Learning of Continuous-Density Hidden Markov Models

Yu Zhang (1,2), Peng Liu (2), Jen-Tzung Chien (3), Frank Soong (2)
(1) Shanghai Jiao Tong University, (2) Microsoft Research Asia, (3) National Cheng Kung University

(Presented at ICASSP 2009)

Outline

Motivation
Evidence Framework
CDHMMs with VB
Experiment


Motivation

Robust acoustic modeling is critical when the collected training data are sparse, noisy, and mismatched with the test conditions. Hidden Markov Models (HMMs) are the most commonly used acoustic models, but such ill-posed conditions can severely hamper the ability of trained HMMs to recognize test data robustly. Within an evidence-based Bayesian framework, we can build a better-regularized HMM from the given finite data, and hence obtain more robust recognition performance.


In this paper, we

✓ Apply the evidence framework to exponential-family distribution estimation.
✓ Extend it to estimating CDHMMs with naturally built-in model uncertainty.


Evidence Framework

Notations
η: hyper-parameters of the model
{λ_i}: distribution parameters
{D_i}: sets of training data

Model Evidence as the Objective Function
\hat{\eta} = \arg\max_{\eta} p(D_1, \ldots, D_K \mid \eta) = \arg\max_{\eta} \prod_{i=1}^{K} \int p(D_i \mid \lambda_i)\, p(\lambda_i \mid \eta)\, d\lambda_i
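To make the objective concrete, here is a minimal sketch (ours, not from the paper) for the simplest conjugate case, a Beta-Bernoulli model, where the integral over each λ_i has a closed form; the Beta hyper-parameters (a, b) play the role of η:

```python
import numpy as np
from scipy.special import betaln

def log_evidence(data, a, b):
    """Log marginal likelihood of Bernoulli data under a Beta(a, b) prior:
    log p(D | a, b) = log B(a + heads, b + tails) - log B(a, b)."""
    heads = int(np.sum(data))
    tails = len(data) - heads
    return betaln(a + heads, b + tails) - betaln(a, b)

# The objective sums log evidence over data sets sharing one prior:
datasets = [np.array([1, 1, 0, 1]), np.array([0, 0, 1])]
print(sum(log_evidence(D, a=2.0, b=2.0) for D in datasets))
```

Maximizing this quantity over (a, b) is exactly the arg max above, restricted to this toy family.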


EM Solution

Key idea: treat λ_i as a hidden variable.

E-Step:
Q(\eta, \eta^{old}) = \sum_{i=1}^{K} \int p(\lambda_i \mid D_i, \eta^{old}) \ln p(D_i, \lambda_i \mid \eta)\, d\lambda_i

M-Step: obtain the solution for all models in the exponential family.


Fundamentals of Exponential Family Distributions

Exponential family:
p(x_i \mid \lambda_i) = h(x_i)\, g(\lambda_i) \exp[\lambda_i^\top u(x_i)]

Sufficient statistics:
\sum_{x \in D} u(x)

Conjugate prior:
p(\lambda_i \mid \chi_0, \nu_0) = f(\chi_0, \nu_0)\, g(\lambda_i)^{\nu_0} \exp(\nu_0 \lambda_i^\top \chi_0)
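For instance (a sketch in our own notation), a univariate Gaussian is a member of this family with natural parameters λ = (μ/σ², −1/(2σ²)) and sufficient statistics u(x) = (x, x²), so the statistics of a data set are just two sums:

```python
import numpy as np

def natural_params(mu, sigma2):
    """Natural parameters of N(mu, sigma2) written as
    p(x|lam) = h(x) g(lam) exp(lam^T u(x)) with u(x) = (x, x^2)."""
    return np.array([mu / sigma2, -0.5 / sigma2])

def sufficient_stats(data):
    """Sum of u(x) = (x, x^2) over the data set D."""
    x = np.asarray(data, dtype=float)
    return np.array([x.sum(), (x ** 2).sum()])

print(natural_params(1.0, 2.0), sufficient_stats([0.5, -1.2, 2.0]))
```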


EM Empirical Bayesian Learning for the Exponential Family

(Figure: graphical model. The hyper-parameter node η is the parent of the model-parameter nodes λ_1, λ_2, ..., λ_K, and each λ_i is the parent of its data set D_i. Given their parent nodes, the succeeding nodes are independent.)

EM Empirical Bayesian Learning for the Exponential Family

Using two properties:
1. With a conjugate prior, the posterior has the same functional form as the prior.
2. D_i is conditionally independent of η given λ_i (D_i ⊥ η | λ_i).

we get
Q(\eta, \eta^{old}) = \sum_{i=1}^{K} \int p(\lambda_i \mid \tilde{\eta}_i^{old}) \ln p(\lambda_i \mid \eta)\, d\lambda_i + C


EM Empirical Bayesian Learning for the Exponential Family

E-step:
\tilde{\nu}_i = \nu + \gamma_i
\tilde{\chi}_i = \frac{\sum_{n=1}^{\gamma_i} u(x_{i,n}) + \nu \chi}{\tilde{\nu}_i}

M-step:
\langle \lambda, \ln g(\lambda) \rangle_{\eta^{new}} = \frac{1}{K} \sum_{i=1}^{K} \langle \lambda, \ln g(\lambda) \rangle_{\tilde{\eta}_i^{old}}
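These updates become fully explicit once a member of the family is chosen. A hedged toy instantiation (ours, not the paper's): a Gaussian mean with known variance σ² and conjugate prior μ_i ~ N(χ, σ²/ν). The E-step gives the per-set posterior hyper-parameters, and the M-step's moment-matching equation (matching the first and second moments of μ) has a closed-form solution:

```python
import numpy as np

def evidence_em(datasets, sigma2, chi=0.0, nu=1.0, iters=50):
    """Toy evidence-framework EM for K Gaussian means with known variance
    sigma2 and a shared conjugate prior mu_i ~ N(chi, sigma2 / nu)."""
    for _ in range(iters):
        # E-step: posterior hyper-parameters for each data set D_i.
        nu_t = np.array([nu + len(D) for D in datasets])
        chi_t = np.array([(np.sum(D) + nu * chi) / (nu + len(D))
                          for D in datasets])
        # M-step: match E[mu] and E[mu^2] across data sets; from
        # sigma2/nu + chi^2 = mean(sigma2/nu_t + chi_t^2) we get:
        chi = chi_t.mean()
        nu = 1.0 / ((1.0 / nu_t).mean() + np.var(chi_t) / sigma2)
    return chi, nu

rng = np.random.default_rng(0)
data = [rng.normal(loc=mu, scale=1.0, size=20) for mu in (-1.0, 0.0, 1.5)]
print(evidence_em(data, sigma2=1.0))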


Concavity Analysis

The auxiliary function Q(η, η^{old}) is concave ⇒ we can obtain its global optimum in the M-step.

In general, the objective function F (the evidence) is not concave:
F(\eta) = p(D_1, \ldots, D_K \mid \eta)

Good news: \nabla^2 F is proportional to \sum_i \{\mathrm{cov}_{\tilde{\eta}_i} - \mathrm{cov}_{\eta}\}
(Note: the posterior is usually sharper than its prior.)


An Introduction to Approximate Inference

In some cases, the joint posterior distribution of the hidden variables can hardly be evaluated directly. For example, when training Bayesian HMMs empirically, we need to evaluate p(λ, s|D) in the E-step, where λ denotes the HMM parameters and s the state sequence.

Probabilistic approach: Monte Carlo, theoretically sound but computationally infeasible.
Deterministic approach: select a proper q(λ, s) to approximate p(λ, s|D).


Variational Approach

Separability assumption: q(λ, s) = q(λ) q(s).

We obtain a new lower bound on the model log marginal likelihood:
F_m(q(\lambda), q(s)) = \int q(\lambda)\, q(s) \ln \frac{p(\lambda, s, D \mid m)}{q(\lambda)\, q(s)}

It can be optimized iteratively:
q^{new}(\lambda) \propto \exp \langle \ln p(D, s \mid \lambda) \rangle_{q^{old}(s)}
q^{new}(s) \propto \exp \langle \ln p(D, s \mid \lambda) \rangle_{q^{old}(\lambda)}

We have closed-form solutions for q^{new}(\lambda) and q^{new}(s) in the CDHMM case.
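As a hedged illustration of the two alternating updates, here is a minimal sketch that simplifies the CDHMM case to i.i.d. mixture states with known variance σ², a Dirichlet prior on the weights, and Gaussian priors on the means (our simplification, not the paper's implementation):

```python
import numpy as np
from scipy.special import digamma, logsumexp

def vb_step(x, alpha, m, v, sigma2, alpha0, m0, v0):
    """One q(s) / q(lambda) sweep for a 1-D mixture with known variance
    sigma2; alpha, m, v are the current variational hyper-parameters
    (Dirichlet counts, and the mean and variance of each q(mu_k))."""
    # q^new(s): responsibilities from *expected* log-likelihoods.
    e_ln_pi = digamma(alpha) - digamma(alpha.sum())
    e_ll = -0.5 * ((x[:, None] - m) ** 2 + v) / sigma2  # E_q[ln N] + const
    log_r = e_ln_pi + e_ll
    r = np.exp(log_r - logsumexp(log_r, axis=1, keepdims=True))
    # q^new(lambda): conjugate updates from the soft counts.
    Nk = r.sum(axis=0)
    alpha = alpha0 + Nk
    v = 1.0 / (1.0 / v0 + Nk / sigma2)
    m = v * (m0 / v0 + (r * x[:, None]).sum(axis=0) / sigma2)
    return alpha, m, v, r

# One iteration on toy data with K = 2 components:
x = np.array([-2.0, -1.8, 2.1, 1.9])
alpha, m, v, r = vb_step(x, np.ones(2), np.array([-1.0, 1.0]),
                         np.ones(2), 1.0, 1.0, 0.0, 10.0)
```

In the actual CDHMM case, the q(s) update is carried out by a Baum-Welch pass using the expected log-likelihoods, as the pseudo code below shows.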


The Pseudo Code of Evidence-Framework-Based Bayesian Training for CDHMMs

repeat
    variational E-step: run Baum-Welch on the training set, using expected log-likelihoods instead of Gaussian probabilities, and collect the statistics γ_i, γ_i(o), γ_i(oo^⊤)
    variational M-step / maximum-evidence E-step: calculate η̃_i^{old} for all the CDHMM parameters
    maximum-evidence M-step: solve for η^{new} with the expectation equation
while the evidence gap is larger than a threshold
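The same loop as a Python skeleton; the four helper functions are hypothetical placeholders for the steps named above, not an actual API:

```python
def train_evidence_cdhmm(data, hmm, prior, threshold=1e-4):
    """Evidence-framework Bayesian training loop for CDHMMs (sketch).
    collect_vb_stats, update_posteriors, solve_expectation_equation and
    evidence_lower_bound are hypothetical helpers for the steps above."""
    old_bound = -float("inf")
    while True:
        # Variational E-step: Baum-Welch with expected log-likelihoods,
        # collecting gamma_i, gamma_i(o), gamma_i(o o^T).
        stats = collect_vb_stats(data, hmm, prior)
        # Variational M-step / maximum-evidence E-step: posterior
        # hyper-parameters eta_i for all CDHMM parameters.
        posteriors = update_posteriors(stats, prior)
        # Maximum-evidence M-step: solve the expectation equation for eta.
        prior = solve_expectation_equation(posteriors)
        bound = evidence_lower_bound(data, hmm, prior)
        if bound - old_bound < threshold:  # evidence gap small enough
            return hmm, prior
        old_bound = bound
```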

Optimization Procedure

(Figure: the VB Baum-Welch (VBEM) step tightens the lower bound on the log marginal likelihood by shrinking KL(q||p); the maximum-evidence EM step then re-estimates the hyper-parameters, raising the log marginal likelihood and giving a new lower bound. The two steps alternate.)

Parameters Update

Gaussian Parameters
The conjugate prior is Gaussian-Wishart:
p(\mu_i, \Lambda_i \mid \eta) = \mathcal{N}(\mu_i; \mu_0, \beta_0^{-1} \Lambda_i^{-1})\, \mathcal{W}(\Lambda_i; \Lambda_0, \nu_0)

Weight Parameters
The conjugate prior is the Dirichlet distribution:
p(\omega_{i1}, \ldots, \omega_{iM} \mid \alpha_{01}, \ldots, \alpha_{0M}) \propto \prod_{m=1}^{M} \omega_{im}^{\alpha_{0m} - 1}

See the paper for more detail.
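A sketch of the resulting conjugate posterior updates, using the standard VB-GMM formulas given soft counts N_k, soft means x̄_k, and soft scatter matrices S_k from the E-step (our notation; the paper's exact update equations may differ in parameterization):

```python
import numpy as np

def update_gaussian_wishart(Nk, xbar, S, m0, beta0, W0_inv, nu0):
    """Posterior Gaussian-Wishart hyper-parameters for one component,
    from soft count Nk, soft mean xbar (D,), soft scatter S (D, D)."""
    beta = beta0 + Nk
    m = (beta0 * m0 + Nk * xbar) / beta
    nu = nu0 + Nk
    d = xbar - m0
    W_inv = W0_inv + Nk * S + (beta0 * Nk / beta) * np.outer(d, d)
    return m, beta, W_inv, nu

def update_dirichlet(alpha0, Nk):
    """Posterior Dirichlet hyper-parameters for the mixture weights."""
    return alpha0 + Nk
```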



Experiments

The evidence framework for CDHMMs was tested on Aurora2, a connected-digit recognition task.

Whole-word HMMs
Eleven digits, from 'zero' to 'nine', plus 'oh'.
3-component GMM for each state.
Diagonal covariance matrices.

Training
Different-sized clean training sets.
Different-sized multi-condition training sets.

Test
0 dB to 20 dB test data.
We compared the average word accuracy on the test sets.


Bayesian Predictive Classifier

Bayesian prediction:
\tilde{s} = \arg\max_s \int p(o_T \mid \lambda_s)\, p(\lambda_s \mid D)\, d\lambda_s

Usually, p(λ_s | D) is the posterior distribution obtained in training, used as the prior distribution in testing.

Frame-based approximation (used in this study)
Assume the prior to be independent for each frame. We then obtain a Student-t, instead of a Gaussian, distribution.
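For a single univariate Gaussian with a Normal-Gamma posterior over its mean and precision, the predictive density is Student-t with a closed form; a hedged sketch (standard conjugate result in our notation, not the paper's multivariate case):

```python
from scipy.stats import t

def predictive_logpdf(x, m, beta, a, b):
    """Posterior predictive log-density under a Normal-Gamma(m, beta, a, b)
    posterior: Student-t with 2a degrees of freedom, location m, and
    scale sqrt(b * (beta + 1) / (a * beta))."""
    scale = (b * (beta + 1.0) / (a * beta)) ** 0.5
    return t.logpdf(x, df=2.0 * a, loc=m, scale=scale)

print(predictive_logpdf(0.3, m=0.0, beta=5.0, a=3.0, b=2.0))
```

The heavier tails of the Student-t, relative to a plug-in Gaussian, are what encode the model uncertainty at test time.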


Experimental Results

(Figure: word accuracy (%) versus number of training utterances for models trained with different-sized clean training data. Curves compare the evidence framework (best) against ML training and fixed priors with β_0 = ν_0 ∈ {0.1, 0.5, 2}.)

(Figure: word accuracy (%) versus number of training utterances for models trained with different-sized multi-condition training data, with the same set of curves.)


Conclusion

We apply the evidence framework to CDHMM training, which automatically learns the priors and their posteriors from data. The evidence-framework-trained CDHMM achieves better regularization and higher recognition performance.


Questions?

Thank You!


rely on ambient sources such as solar or wind in which the amount of energy harvested strongly depends on envi- ronmental factors. The RF energy source can ...