Japanese Pronunciation Prediction as Phrasal SMT Jun Hatori (University of Tokyo) Hisami Suzuki (Microsoft Research)
In IJCNLP-2011 Chiang Mai, Thailand 2011/11/9
Task
Predict the pronunciation of Japanese text. ◦ Input – ◦ Output – (pron.) - kyoo wa yomitanson ni itta
Applications ◦ Text-to-speech conversion ◦ Transliteration of proper nouns for MT & search ◦ Training data creation for input methods (a.k.a. kana-to-kanji conversion)
Japanese Orthography
A Japanese text consists of ◦ Kanji: ideographic characters (e.g. “village”) ◦ Kana: phonetic characters (e.g. /ni/)
Kanji is the source of pron. ambiguity ◦ A kanji has 2.5 pronunciations on average. ◦ Frequent kanji characters tend to have many (10–20) pronunciations.
An Example of Japanese Text 国境の長いトンネルを抜けると The train came out of the long tunnel into the snow country.
雪国であった。夜の底が白くなった。 The earth lay white under the night sky. – Snow Country (Yasunari Kawabata)
An Example of Japanese Text 国境の長いトンネルを抜けると no i ton-neru o keruto
雪国であった。夜の底が白くなった。 de a tta. no ga kuna tta. Each of “kana” characters has a unique pronunciation.
An Example of Japanese Text 国境の長いトンネルを抜けると kuni guni kuna koku kok kou
sakai kyo kei zakai
naga osa take na cho choo
nu batsu batt bachi
雪国であった。夜の底が白くなった。 yuki setsu sett sechi susu soso
kuni guni kuna koku kok kou
yoru ya yo
soko zoko tei tai
shiro shira haku byaku
“kanji” characters usually have multiple pronunciations.
Ambiguity and Idiosyncrasy
Character-level pronunciations are highly ambiguous. ◦ E.g. the three kanji of /kan-naoto/ have 3, 14, and 12 candidate pronunciations: 3 × 14 × 12 = 504 possibilities!
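The figure of 504 is the product of the per-character candidate counts (3 × 14 × 12). A minimal sketch of that arithmetic follows; the function name is illustrative and does not come from the slides.

```python
from math import prod

def num_readings(candidates_per_char):
    """Number of character-level pronunciation combinations for a word."""
    return prod(candidates_per_char)

# Candidate counts for the three kanji of /kan-naoto/, as given on the slide.
print(num_readings([3, 14, 12]))  # 504
```

The count grows multiplicatively with word length, which is why enumerating character-level readings without a dictionary or a model quickly becomes impractical.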
Idiosyncrasy (non-compositionality) ◦ Pronunciation of a word is commonly a meaning-based mapping of the sound of a Japanese word to a Chinese writing form. E.g.
/ashita/ “tomorrow” (mei + jitsu)
Word-level pronunciation dictionaries are essential.
Pointwise Approach [Mori+ 10]
Two-step approach ◦ Step 1: word segmentation ◦ Step 2: pronunciation disambiguation as SVM-based classification for each word E.g. “ ” /ninki/ (popularity), /jinki/ (people’s atmosphere), /hitoke/ (sign of life)
Requires a separate model for handling OOV (out-of-vocabulary) words ◦ A simple noisy channel model with a character bigram probability is used.
Substring-based Model [Hatori+ 11]
Focuses on pron. prediction for OOV words. SMT-like unsupervised (no-dictionary) approach
◦ Pronunciations are learned from parallel corpora ◦ Monotone alignments; no insertion or deletion
Single-character translation operations. Composed operations are used to capture substring-level information and context. No mechanism to accommodate dictionary information.
Our Approach
Dictionary-based phrasal SMT ◦ Dictionary entries serve as the minimal translation units. Entries: word- and character-level pronunciations. For known words, word-level pron. are used. For OOV words, the pronunciation is guessed from character-level pronunciations.
◦ A unified approach that can deal with the sentence-level pronunciation assignment, while integrating OOV pron. prediction as part of the whole task.
Dictionary-based Operations
Dictionary words are the standard units. As a back-off, character-level pron. are used for OOV words. Dictionary-based alignments are obtained using our dictionary-based phrasal decoder. ◦ Unreachable operations are discarded. ◦ Makes the model more robust to noise.
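The idea of dictionary words as standard units with a character-level back-off can be sketched as a small dynamic program over monotone segmentations. This is a simplified illustration, not the authors' decoder: the toy lexicons, their probabilities, `BACKOFF_PENALTY`, and `MAX_LEN` are all assumptions made up for the example.

```python
import math

# Toy lexicons with made-up probabilities (illustrative only).
WORD_DICT = {
    "今日": [("kyoo", 0.9), ("konnichi", 0.1)],
    "は": [("wa", 0.8), ("ha", 0.2)],
    "に": [("ni", 1.0)],
    "行った": [("itta", 1.0)],
}
CHAR_DICT = {
    "読": [("yomi", 0.6), ("doku", 0.4)],
    "谷": [("tan", 0.6), ("tani", 0.4)],
    "村": [("son", 0.6), ("mura", 0.4)],
}
BACKOFF_PENALTY = math.log(0.1)  # assumed cost of using a character-level unit
MAX_LEN = 4                      # assumed maximum unit length in characters

def decode(text):
    """Return (log score, [(surface, pron), ...]) for the best monotone
    segmentation of text, or None if no segmentation covers it."""
    # best[i] holds the best hypothesis covering text[:i].
    best = [None] * (len(text) + 1)
    best[0] = (0.0, [])
    for i in range(len(text)):
        if best[i] is None:
            continue  # position i is unreachable
        score, units = best[i]
        for j in range(i + 1, min(len(text), i + MAX_LEN) + 1):
            surface = text[i:j]
            options = [(p, math.log(q)) for p, q in WORD_DICT.get(surface, [])]
            if j == i + 1:
                # Back off to single-character pronunciations.
                options += [(p, math.log(q) + BACKOFF_PENALTY)
                            for p, q in CHAR_DICT.get(surface, [])]
            for pron, logp in options:
                cand = (score + logp, units + [(surface, pron)])
                if best[j] is None or cand[0] > best[j][0]:
                    best[j] = cand
    return best[len(text)]

result = decode("今日は読谷村に行った")
print(" ".join(pron for _, pron in result[1]))
```

With these toy entries, the known words are read from the word dictionary, while the place name, which has no word-level entry here, is assembled from character-level readings. Text that cannot be covered by any unit yields `None`, mirroring the slide's point that unreachable operations are discarded.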
Composed Operations
During decoding, translation operations are composed so as to maximize the overall probability. In our current work, composed operations are the compositions of dictionary words.
◦ Allows us to consider wider, phrasal context
SMT-based Framework
Simplified SMT ◦ Monotone alignment, no insertion or deletion
Linear model: Score(s, t, λ) = Σ_i λ_i f_i(s, t)
◦ Weights are trained with averaged perceptron. ◦ Stack decoder [Zens+ 04]
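The linear score and averaged-perceptron training can be sketched as follows. This is a simplification under stated assumptions: each training example supplies a fixed list of candidate outputs as feature dictionaries (the slides use a stack decoder to search instead), and the feature names `lm` and `tm` are hypothetical.

```python
from collections import defaultdict

def score(weights, feats):
    """Linear model: sum_i lambda_i * f_i(s, t)."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())

def train_averaged_perceptron(examples, epochs=5):
    """examples: list of (candidates, gold_index), where each candidate
    is a dict mapping feature names to values."""
    weights = defaultdict(float)
    totals = defaultdict(float)  # running sum of weight vectors
    steps = 0
    for _ in range(epochs):
        for candidates, gold in examples:
            pred = max(range(len(candidates)),
                       key=lambda i: score(weights, candidates[i]))
            if pred != gold:
                # Move toward the gold features, away from the prediction.
                for name, value in candidates[gold].items():
                    weights[name] += value
                for name, value in candidates[pred].items():
                    weights[name] -= value
            for name, value in weights.items():
                totals[name] += value
            steps += 1
    # Averaging over all steps reduces sensitivity to late updates.
    return {name: total / steps for name, total in totals.items()}

# Hypothetical log-probability features for two candidate pronunciations.
cand_a = {"lm": -1.0, "tm": -0.5}
cand_b = {"lm": -2.0, "tm": -0.2}
w = train_averaged_perceptron([([cand_a, cand_b], 1)])
print(score(w, cand_b) > score(w, cand_a))
```

Returning the average of the weight vectors rather than the final one is what distinguishes the averaged perceptron from the plain perceptron.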
Features (component models) ◦ Bi-directional translation probability P(t|s), P(s|t) ◦ Character 5-gram probability P(t) ◦ Number of phrases/characters ◦ Joint trigram probability P(s,t)
Joint N-gram Language Model
Joint n-gram: a language model for the sequence of translation operations [Bisani+ 04] ◦ Provides smoothed context for pron. disambiguation ◦ Incorporates single-kanji pronunciation dependencies into OOV pronunciation prediction
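A joint n-gram treats each translation operation, a (source, pronunciation) pair, as a single token and models the sequence of such tokens. The sketch below uses a bigram with add-k smoothing for brevity; the slides use a joint trigram and do not specify the smoothing method, so the order, the value of `k`, and the toy training sequences are all assumptions.

```python
import math
from collections import Counter

BOS = ("<s>", "<s>")  # sentence-start marker

def train_joint_bigram(sequences):
    """sequences: lists of (source, pronunciation) operations."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for seq in sequences:
        prev = BOS
        for op in seq:
            bigram_counts[(prev, op)] += 1
            context_counts[prev] += 1
            vocab.add(op)
            prev = op
    return context_counts, bigram_counts, vocab

def log_prob(model, seq, k=0.1):
    """Add-k smoothed log-probability of an operation sequence."""
    context_counts, bigram_counts, vocab = model
    v = len(vocab) + 1  # one extra slot for unseen operations
    total, prev = 0.0, BOS
    for op in seq:
        total += math.log((bigram_counts[(prev, op)] + k)
                          / (context_counts[prev] + k * v))
        prev = op
    return total

# Made-up training data for the ambiguous word from the earlier slide.
data = [
    [("人気", "ninki"), ("商品", "shoohin")],
    [("人気", "ninki"), ("商品", "shoohin")],
    [("人気", "ninki"), ("が", "ga"), ("ある", "aru")],
    [("人気", "hitoke"), ("が", "ga"), ("ない", "nai")],
]
model = train_joint_bigram(data)
ninki = [("人気", "ninki"), ("商品", "shoohin")]
hitoke = [("人気", "hitoke"), ("商品", "shoohin")]
print(log_prob(model, ninki) > log_prob(model, hitoke))
```

Because the context is itself a (source, pronunciation) pair, the model can prefer a reading based on which readings of neighbouring units it co-occurred with in training, which is the disambiguation effect the slide describes.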
Summary of Training
←Substring-based model ↓Proposed model
Related Work
Japanese Pronunciation Prediction ◦ SVM-based two-step approach [Mori+ 10] ◦ Substring-based word pron. prediction [Hatori+ 11]
Our contribution ◦ Integrating word- and character-level pronunciations based on dictionary-based alignment. ◦ Capturing larger context by the composition of word-level pronunciations (e.g. 8–24 alignment) ◦ Scalable: probabilities of the component models are obtained from the frequencies in the training corpus.
Experiment – Baseline Models
SubStr: Substring-based model [Hatori+ 11] SubStr+: Extended substring-based model ◦ Additionally uses joint n-gram probability and dictionary features
KyTea [Mori+ 10] ◦ A state-of-the-art Japanese pronunciation prediction system ◦ Performs SVM-based classification of word pronunciations, along with a simple OOV model
Wikipedia-derived pairs (460k instances) ◦ Word-pronunciation pairs extracted by pattern matching on parentheses (noisy)
Newspaper corpus (1.4M sentences)
Experiment – Evaluation Dataset
Nikkei/Kyodo Newspaper (News-1/2) ◦ Consisting of complete sentences
Bing Search query log (Query-1/2) ◦ General noun phrases/proper nouns
Difficult-to-pronounce word corpus (Name) ◦ Consisting mostly of person names
Wikipedia instances (Wiki) ◦ Mostly named entities and technical words
Test set    News-1  News-2  Query-1  Query-2  Name   Wiki
Avg. len.   51.8    44.9    3.8      5.7      3.0    4.1
OOV rate    0.3%    0.3%    3.5%     12.7%    23.4%  13.7%
Final Result
[Figure: bar chart comparing SubStr, SubStr+, and Proposed on News-1, News-2, Query-1, Query-2, Names, and Wiki; vertical axis 0 to 100.]
Comparison to KyTea (training with BCCWJ and UniDic)
[Figure: bar chart comparing KyTea (w/noise), KyTea, and Proposed on News-1, News-2, Query-1, and Query-2; vertical axis 50 to 100.]
Comparison to KyTea (training with BCCWJ, Wiki, and UniDic)
[Figure: bar chart comparing KyTea (w/noise), KyTea, and Proposed on News-1, News-2, Query-1, and Query-2; vertical axis 50 to 100.]
Conclusion
We proposed an SMT-based pronunciation prediction model that makes effective use of dictionary-based operations. ◦ Achieved about 90% accuracy across various domains. ◦ Robust to OOV words, and works effectively on standard text.
Future work ◦ Investigate the use of contextual features such as character- and pronunciation-type dependencies.