Speech recognizer for EllaVator
Arif Khan
Saarland University

Wednesday 3rd June, 2015

Arif Khan (UdS)

ASR for EllaVator

Overview

1. Speech recognizer - background
   Components of speech recognizer
   Acoustic model (AM)
   Language model (LM)
     Grammar based LM
     Statistical LM

2. Using LM with opendial

3. Training Models
   Installing Sphinx for training
   Training acoustic model

Components of speech recognizer

Figure: Main components of speech recognizer

Acoustic model (AM)

Figure: Acoustic model

Language model

Grammar based LM
- Write grammars to specify the possible sentence structures.
- Works well for a small set of sentences (small domains).
- In some grammar specifications, weights can be assigned to sentence structures.

With the Sphinx plugin of opendial, the JSpeech Grammar Format (JSGF) is used to specify the grammar.
- JSGF provides a lot of flexibility for writing grammars (enough for EllaVator); see the JSGF documentation for the complete specification and examples.
- JSGF also inherits the drawbacks of grammar based LMs.

Language Model

Example

grammar ellavator;
public = | ;
= ( Hello | "Good morning" | "Good evening" ) Ella;
= ( );
= [ Take me to ] | [ please ];
= ( first | second | third | fourth ) floor;
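To see which sentences a small grammar of this shape accepts, it can help to enumerate them. The following is a minimal Python sketch, not the opendial or Sphinx implementation. The rule names (`utterance`, `greeting`, `request`, `floor`) and the way the rules reference each other are assumptions made for illustration, since the rule names of the grammar above are not shown.

```python
from itertools import product

# Grammar nodes: a str is a literal phrase, ("seq", ...) a sequence,
# ("alt", ...) a choice, ("opt", x) an optional part, ("ref", name) a rule reference.
# All rule names below are hypothetical; they are not taken from the slide.
RULES = {
    "utterance": ("alt", ("ref", "greeting"), ("ref", "request")),
    "greeting": ("seq", ("alt", "Hello", "Good morning", "Good evening"), "Ella"),
    "request": ("alt",
                ("seq", ("opt", "Take me to"), ("ref", "floor")),
                ("seq", ("ref", "floor"), ("opt", "please"))),
    "floor": ("seq", ("alt", "first", "second", "third", "fourth"), "floor"),
}


def expand(node):
    """Return every word string the node can produce (finite grammars only)."""
    if isinstance(node, str):
        return [node]
    kind, *parts = node
    if kind == "ref":
        return expand(RULES[parts[0]])
    if kind == "opt":
        # An optional part is either absent (empty string) or present.
        return [""] + expand(parts[0])
    if kind == "alt":
        return [s for part in parts for s in expand(part)]
    if kind == "seq":
        combos = product(*(expand(part) for part in parts))
        # Join the parts, skipping the empty strings left by absent optionals.
        return [" ".join(w for w in combo if w) for combo in combos]
    raise ValueError(f"unknown node kind: {kind}")


# Deduplicate: "first floor" is reachable through both request alternatives.
sentences = sorted(set(expand(RULES["utterance"])))
```

Printing `sentences` lists every utterance this assumed grammar covers, which is a quick way to spot sentences that are missing or that should not be accepted.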

Language Model

Statistical LM
- A statistical LM gives probabilistic estimates of word strings, computed from large text corpora of transcribed speech.
- The probability of a word is approximated from the sequence of words preceding it.
- The preceding sequence can be one word (bigram) or two words (trigram).

For trigrams we have:

$$P(w_k \mid w_{k-1}, w_{k-2}) = \frac{\mathrm{count}(w_{k-2}, w_{k-1}, w_k)}{\mathrm{total}(w_{k-2}, w_{k-1})} \tag{1}$$


Statistical LM - Example
<001> Hello Ella fourth floor
<002> Good morning Ella fourth floor
<003> Good evening Ella four floor

The probability of "Ella" given that "Hello" has already been spoken:

$$P(\text{Ella} \mid \text{Hello}) = \frac{\mathrm{count}(\text{Hello}, \text{Ella})}{\mathrm{total}(\text{Hello})} \tag{2}$$
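The estimate can be checked by counting directly on the three example sentences. This is a minimal Python sketch under two assumptions that the slides do not state: no sentence-boundary markers are added, and no smoothing is applied, so unseen bigrams get probability zero. The function name `bigram_prob` is illustrative only.

```python
from collections import Counter

# The three example transcripts, without their utterance ids.
corpus = [
    "Hello Ella fourth floor",
    "Good morning Ella fourth floor",
    "Good evening Ella four floor",
]

bigram_counts = Counter()   # count(previous word, word)
history_counts = Counter()  # total(previous word): times it is followed by any word

for sentence in corpus:
    words = sentence.split()
    for prev, word in zip(words, words[1:]):
        bigram_counts[(prev, word)] += 1
        history_counts[prev] += 1


def bigram_prob(word, prev):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for an unseen history."""
    if history_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / history_counts[prev]


p_ella_given_hello = bigram_prob("Ella", "Hello")    # 1 / 1
p_fourth_given_ella = bigram_prob("fourth", "Ella")  # 2 / 3
```

"Hello" occurs once and is followed by "Ella" once, so the estimate is 1. "Ella" occurs three times and is followed by "fourth" twice and by "four" once, which shows how a single transcript variant shifts the estimates in a corpus this small.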


Language Model

Statistical LM - Example
- With Sphinx we can also use a statistical LM trained with various tools.
- Some tools we can use for training: cmuclmtk, IRSTLM, MITLM, SRILM (see http://cmusphinx.sourceforge.net/wiki/tutoriallm).
- For a small set of sentences we can also use the online interface of cmuclmtk: http://www.speech.cs.cmu.edu/tools/lmtool-new.html
- Don't forget to pre-process the data before using the online tool (web interface).
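What pre-processing is needed depends on the tool and on the transcripts. As a rough sketch, the Python below strips a leading utterance id such as `<001>`, removes punctuation, collapses whitespace, and lowercases each line. These normalization choices and the function names are assumptions for illustration, not requirements documented by the online tool.

```python
import re


def normalize_transcript(line):
    """Return one cleaned sentence: no utterance id, no punctuation, lowercase."""
    # Drop a leading id in angle brackets, e.g. "<001> ".
    line = re.sub(r"^\s*<[^>]*>\s*", "", line)
    # Replace anything that is not a word character, whitespace, or apostrophe.
    line = re.sub(r"[^\w\s']", " ", line)
    # Collapse runs of whitespace and lowercase the result.
    return " ".join(line.split()).lower()


def normalize_corpus(lines):
    """Normalize every line and skip the ones that end up empty."""
    cleaned = (normalize_transcript(line) for line in lines)
    return [sentence for sentence in cleaned if sentence]
```

The result is one sentence per list entry, which can be written out one per line as a plain-text corpus file.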

Language Model

Statistical LM - Example

Figure: Bigram language model for Ella

Language Model

Which one is better?
- A statistical LM captures the actual probabilities from the corpus.
- Weights can be assigned in a grammar based LM, but the weights are static.
- If the grammar is poorly written, empty loops are valid sentences in a grammar based LM.

Using LM with opendial

- Write a grammar for EllaVator in JSGF that covers the user utterances.
- Come up with example utterances using the grammar you wrote.

Installing Sphinx for training acoustic model

- Download sphinxbase, sphinxtrain, and pocketsphinx from https://github.com/cmusphinx
- Tutorial for training the acoustic model: http://cmusphinx.sourceforge.net/wiki/tutorialam
