Challenges in Automatic Speech Recognition
2010-2020: Speech Technology for the Next Decade - Visions from Academia and Industry
Ciprian Chelba, Michiel Bacchiani, Johan Schalkwyk
{ciprianchelba,michiel,johans}@google.com

Google

09/29/2010 Ciprian Chelba et al., Challenges in ASR – p. 1

Case Study: Google Search by Voice

Carries 25% of USA Google mobile search queries!
What contributed to success:
- user expectations clearly set by the existing text app
- excellent language model built from the query stream
- clean speech: users are motivated to articulate clearly
- phones do high-quality speech capture
- speech is transferred error-free to the server over IP
Challenges:
- making and measuring progress: manual transcriptions have about the same word error rate as the system (15%)
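The 15% figure above is a word error rate (WER). A minimal sketch, assuming whitespace tokenization and no text normalization (the name `word_error_rate` is ours, not a Google tool): WER is the word-level edit distance (substitutions, deletions, insertions) between a hypothesis and a reference transcript, divided by the number of reference words. The same computation applies when the "reference" is a manual transcript, so if the references are themselves about 15% off, differences between system variants can be swamped by reference errors.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumes whitespace tokenization and no text normalization
    (casing, punctuation, numbers), which real scoring tools handle.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and hyp[:j]; start with the empty reference (j insertions).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j in range(1, len(hyp) + 1):
            substitute = prev[j - 1] + (ref[i - 1] != hyp[j - 1])  # zero cost on a match
            delete = prev[j] + 1
            insert = curr[j - 1] + 1
            curr[j] = min(substitute, delete, insert)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("weather" -> "whether") in four reference words.
    print(word_error_rate("weather in new york", "whether in new york"))  # 0.25
```

A single rolling row keeps memory linear in the hypothesis length. Production scorers such as NIST's sclite also report the substitution/deletion/insertion breakdown, which this sketch omits.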

Case Study: Google Labs GAudi Demo

This demo was the study for the YouTube feature that has now launched for all and is integrated with translation.
Main challenge: lack of coverage due to ASR limitations:
- noise robustness
- speaker/accent/channel variability
- language model mismatches
- the web is multilingual


ASR for Retrieval and Ranking

On large document collections, search is truly about precision in the top-N results (precision@N): there is seldom a good reason to replace a result in the top N with one that has hits in the (noisy) ASR transcript.
Future directions:
- improve retrieval for "hard queries", which return very few documents based strictly on keyword hits in the text metadata
- speech-rich sub-domains, such as lectures/talks in English recorded in a controlled setup, where current ASR capabilities are adequate after manual tuning to the sub-domain
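A minimal sketch of the top-N argument above, with hypothetical names and a deliberately simple policy (not the ranking actually used at Google): `precision_at_n` scores a ranked list, and `backfill_with_asr` keeps metadata-based results in place and lets documents found only through the noisy ASR transcript fill the top-N slots that a "hard query" leaves empty.

```python
from typing import List, Sequence, Set


def precision_at_n(ranked: Sequence[str], relevant: Set[str], n: int) -> float:
    """Fraction of the top-n positions filled by relevant documents.

    Divides by n even when fewer than n results exist, a common convention.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return sum(1 for doc in ranked[:n] if doc in relevant) / n


def backfill_with_asr(metadata_hits: Sequence[str], asr_hits: Sequence[str], n: int) -> List[str]:
    """Hypothetical policy: never displace a metadata hit with an ASR-transcript hit.

    Metadata results keep their order; documents found only in the noisy ASR
    transcript fill whatever top-n slots remain, which matters mostly for
    "hard queries" with few metadata matches.
    """
    results: List[str] = []
    seen: Set[str] = set()
    # Metadata hits come first, so ASR hits can only occupy leftover slots.
    for doc in list(metadata_hits) + list(asr_hits):
        if len(results) == n:
            break
        if doc not in seen:  # skip documents already placed via metadata
            results.append(doc)
            seen.add(doc)
    return results


if __name__ == "__main__":
    # A "hard query": only one metadata hit, so ASR hits fill the rest.
    top3 = backfill_with_asr(["v7"], ["v3", "v7", "v9"], n=3)
    print(top3)  # ['v7', 'v3', 'v9']
    print(precision_at_n(top3, {"v7", "v9"}, 3))  # two of the three slots are relevant
```

The design is conservative: ASR hits can only add recall where metadata fails, so on queries with at least N metadata hits the top N is unchanged by construction.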

Core Technology

Current state:
- automatic speech recognition is an incredibly complex problem
- the problem is fundamentally unsolved
- data availability and computing have changed significantly since the mid-nineties
Challenges and Directions:
- re-visit (simplify!) modeling choices made on corpora of modest size; 2-3 orders of magnitude more data is available
- multilinguality built in from the start
- noise robustness and speaker/channel variability

