Speech recognizer for EllaVator
Arif Khan
Saarland University
[email protected]

Wednesday 3rd June, 2015


Overview

1 Speech recognizer - background
    Components of speech recognizer
    Acoustic model (AM)
    Language model (LM)
        Grammar based LM
        Statistical LM

2 Using LM with opendial

3 Training Models
    Installing Sphinx for training
    Training acoustic model


Components of speech recognizer

Figure: Main components of speech recognizer


Acoustic model

Acoustic model - AM

Figure: Acoustic model


Language model

Grammar based LM
- Write grammars to specify the possible sentence structures.
- Good for a small set of sentences (small domains).
- In some specifications, weights can be assigned to sentence structures.

With the Sphinx plugin of opendial
- JSpeech Grammar Format (JSGF) is used for specifying the grammar.
- It provides a lot of flexibility for writing grammars (enough for EllaVator).
- See the documentation for the complete specification and examples.
- JSGF also inherits the drawbacks of grammar based LM.


Language Model

Example
grammar ellavator;
public = | ;
= ( Hello | "Good morning" | "Good evening" ) Ella;
= ( );
= [ Take me to ] | [ please ];
= ( first | second | third | fourth ) floor;
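To see which utterances a grammar of this shape covers, it can help to enumerate them. The following is a minimal Python sketch, not the JSGF file from the slide: the rule names (`utterance`, `greeting`, `request`, `floor`) and the exact alternatives are assumptions made for illustration, chosen so that the word lists match the ones shown above.

```python
from itertools import product

# Hypothetical rules for illustration only: "utterance", "greeting",
# "request" and "floor" are assumed names, not taken from the slide.
GRAMMAR = {
    "utterance": [["<greeting>"], ["<greeting>", "<request>"], ["<request>"]],
    "greeting": [["Hello", "Ella"], ["Good morning", "Ella"], ["Good evening", "Ella"]],
    "request": [["take me to", "<floor>"], ["<floor>", "please"], ["<floor>"]],
    "floor": [["first", "floor"], ["second", "floor"],
              ["third", "floor"], ["fourth", "floor"]],
}


def expand(symbol):
    """Return every word sequence that the given rule can produce."""
    results = []
    for alternative in GRAMMAR[symbol]:
        # Expand each token of the alternative, then combine the options.
        options = []
        for token in alternative:
            if token.startswith("<") and token.endswith(">"):
                options.append(expand(token[1:-1]))  # rule reference
            else:
                options.append([token])              # literal words
        for combo in product(*options):
            results.append(" ".join(combo))
    return results


utterances = expand("utterance")
for sentence in utterances[:5]:
    print(sentence)
```

Listing the covered sentences this way makes gaps visible early, for example a phrasing that users are likely to say but that no alternative produces.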


Language Model

Statistical LM
- A statistical LM gives probabilistic estimates of word strings, learned from large text corpora of transcribed speech.
- The probability of a word is approximated from the preceding word sequence.
- The preceding sequence can be one word (bigram) or two words (trigram).
- For trigrams we have:

$$P(w_k \mid w_{k-1}, w_{k-2}) = \frac{\mathrm{count}(w_{k-2}, w_{k-1}, w_k)}{\mathrm{total}(w_{k-2}, w_{k-1})} \tag{1}$$
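For the bigram case the same estimate conditions on a single preceding word. Written in the notation of equation (1), and assuming that total(·) counts how often the history occurs in the corpus, it would read:

```latex
P(w_k \mid w_{k-1}) = \frac{\mathrm{count}(w_{k-1}, w_k)}{\mathrm{total}(w_{k-1})}
```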


Statistical LM - Example
<001> Hello Ella fourth floor
<002> Good morning Ella fourth floor
<003> Good evening Ella four floor

For the probability of “Ella” given that “Hello” has already been spoken:

$$P(\text{Ella} \mid \text{Hello}) = \frac{\mathrm{count}(\text{Hello}, \text{Ella})}{\mathrm{total}(\text{Hello})} \tag{2}$$
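The estimate can be checked by counting directly on the three transcripts. This is a minimal sketch of an unsmoothed maximum-likelihood bigram estimate; it assumes that total(·) counts how often a word appears as a history (that is, followed by another word), and it ignores sentence-boundary markers and smoothing, which real toolkits add.

```python
from collections import Counter

# The three transcripts from the slide, without their utterance ids.
corpus = [
    "Hello Ella fourth floor",
    "Good morning Ella fourth floor",
    "Good evening Ella four floor",
]

bigram_counts = Counter()
history_counts = Counter()
for sentence in corpus:
    words = sentence.split()
    for previous, current in zip(words, words[1:]):
        bigram_counts[(previous, current)] += 1
        history_counts[previous] += 1


def bigram_probability(current, previous):
    """Unsmoothed maximum-likelihood estimate of P(current | previous)."""
    if history_counts[previous] == 0:
        return 0.0  # unseen history: no estimate without smoothing
    return bigram_counts[(previous, current)] / history_counts[previous]


p_ella_given_hello = bigram_probability("Ella", "Hello")    # 1 / 1
p_fourth_given_ella = bigram_probability("fourth", "Ella")  # 2 / 3
print(p_ella_given_hello, p_fourth_given_ella)
```

“Hello” occurs once and is followed by “Ella” once, so the estimate is 1. “Ella” occurs three times and is followed by “fourth” twice, so P(fourth | Ella) is 2/3.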


Language Model

Statistical LM - Example
- With Sphinx we can also use a statistical LM trained with various tools.
- Some tools that we can use for training: cmuclmtk, IRSTLM, MITLM, SRILM. See http://cmusphinx.sourceforge.net/wiki/tutoriallm
- We can also use the online interface of cmuclmtk for a small set of sentences: http://www.speech.cs.cmu.edu/tools/lmtool-new.html
- Don't forget to pre-process the data before using the online tool (web interface).
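The slides do not say which pre-processing steps are meant. A minimal sketch, assuming the usual clean-up for LM training text (drop utterance ids, remove punctuation, unify case, one sentence per line), could look like this. The function name `normalize_line` and the sample lines are hypothetical.

```python
import re


def normalize_line(line):
    """Lower-case a transcript line and keep only letters and apostrophes."""
    line = re.sub(r"<[^>]*>", " ", line)    # drop utterance ids such as <001>
    line = line.lower()
    line = re.sub(r"[^a-z' ]+", " ", line)  # replace punctuation and digits with spaces
    return " ".join(line.split())           # collapse repeated whitespace


# Hypothetical raw lines, made up for illustration.
raw = [
    "<001> Hello, Ella: fourth floor!",
    "<002> Good morning Ella,   fourth floor.",
]
cleaned = [normalize_line(line) for line in raw]
print("\n".join(cleaned))
```

Note that this sketch simply discards digits, so numbers would need to be spelled out as words beforehand if they matter for the domain.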


Language Model

Statistical LM - Example

Figure: Bigram language model for Ella


Language Model

Which one is better?
- A statistical LM captures the actual probabilities from the corpus.
- Weights can be assigned in a grammar based LM, but the weights are static.
- Empty loops are valid sentences in a grammar based LM if the grammar is poorly written.
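One possible reading of the last point is that a grammar in which every part is optional also accepts an empty utterance. The sketch below illustrates that reading with regular expressions as stand-ins for two hypothetical grammars. It is not JSGF syntax and not the EllaVator grammar.

```python
import re

# Regular-expression stand-ins for two hypothetical grammars (not JSGF).
# In the first, every part is optional, so the empty utterance is accepted.
all_optional = re.compile(r"(hello ella)?( ?(first|second|third|fourth) floor)?")
# In the second, the floor is mandatory, so an empty utterance is rejected.
floor_required = re.compile(r"(hello ella )?(first|second|third|fourth) floor")

accepts_empty = all_optional.fullmatch("") is not None
rejects_empty = floor_required.fullmatch("") is None
print(accepts_empty, rejects_empty)
```

Making at least one element of the top-level rule mandatory is one way to avoid this in a hand-written grammar.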


Using LM with opendial

- Write a grammar for EllaVator in JSGF that covers the user utterances.
- Come up with example utterances using the grammar you wrote.


Installing Sphinx for training acoustic model

- Download sphinxbase, sphinxtrain, and pocketsphinx from https://github.com/cmusphinx
- Tutorial for training an acoustic model: http://cmusphinx.sourceforge.net/wiki/tutorialam


Buddha means 'a man with supreme enlightenment'. His. renouncement of all worldly pleasures at the royal palace,. pilgrimage for truth and attainment of enlightenment speak. volumes about his perception of the essence of human life. His. teachings al