Japanese Pronunciation Prediction as Phrasal SMT

Jun Hatori (University of Tokyo), Hisami Suzuki (Microsoft Research)

IJCNLP 2011, Chiang Mai, Thailand, 2011/11/9

Task 

Predict the pronunciation of Japanese text. ◦ Input: a Japanese sentence ◦ Output (pron.): kyoo wa yomitanson ni itta



Applications ◦ Text-to-speech conversion ◦ Transliteration of proper nouns for MT & search ◦ Training data creation for input methods (a.k.a. kana-to-kanji conversion)

Japanese Orthography 

A Japanese text consists of ◦ Kanji: ideographic characters (e.g. “village”) ◦ Kana: phonetic characters (e.g. /ni/)



Kanji is the source of pron. ambiguity ◦ A kanji has 2.5 pronunciations on average. ◦ Frequent kanji characters tend to have many (10–20) pronunciations.

An Example of Japanese Text 国境の長いトンネルを抜けると The train came out of the long tunnel into the snow country.

雪国であった。夜の底が白くなった。 The earth lay white under the night sky. – Snow Country (Yasunari Kawabata)

An Example of Japanese Text (kana readings)

国境の長いトンネルを抜けると ◦ no, i, ton-neru, o, keruto

雪国であった。夜の底が白くなった。 ◦ de a tta. no, ga, kuna tta.

Each kana character has a unique pronunciation.

An Example of Japanese Text (kanji readings)

国境の長いトンネルを抜けると
◦ 国: kuni, guni, kuna, koku, kok, kou
◦ 境: sakai, kyo, kei, zakai
◦ 長: naga, osa, take, na, cho, choo
◦ 抜: nu, batsu, batt, bachi

雪国であった。夜の底が白くなった。
◦ 雪: yuki, setsu, sett, sechi, susu, soso
◦ 国: kuni, guni, kuna, koku, kok, kou
◦ 夜: yoru, ya, yo
◦ 底: soko, zoko, tei, tai
◦ 白: shiro, shira, haku, byaku

Kanji characters usually have multiple pronunciations.

Ambiguity and Idiosyncrasy 

Character-level pronunciations are highly ambiguous. ◦ E.g. /kan-naoto/: 3 × 14 × 12 = 504 possibilities!
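The count in the /kan-naoto/ example is the product of the per-character candidate counts. A minimal sketch of that arithmetic, using the counts 3, 14 and 12 from the slide (the function name `count_readings` is illustrative, not from the talk):

```python
from math import prod

def count_readings(candidates_per_char):
    """Number of character-level pronunciation combinations for a word."""
    return prod(candidates_per_char)

# Candidate counts for the three characters of the /kan-naoto/ example.
print(count_readings([3, 14, 12]))  # 504
```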

Idiosyncrasy (non-compositionality) ◦ The pronunciation of a word is commonly a meaning-based mapping of the sound of a Japanese word onto a Chinese written form. E.g. /ashita/ “tomorrow” (mei + jitsu)

Word-level pronunciation dictionaries are essential.

Pointwise Approach [Mori+ 10] 

Two-step approach ◦ Step 1: word segmentation ◦ Step 2: pronunciation disambiguation as SVM-based classification for each word. E.g. a single written form can be read /ninki/ (popularity), /jinki/ (people’s atmosphere), or /hitoke/ (sign of life)



Requires a separate model for handling OOV (out-of-vocabulary) words ◦ A simple noisy channel model with a character bigram probability is used.

Substring-based Model [Hatori+ 11]

Focuses on pron. prediction for OOV words.

SMT-like unsupervised (no-dictionary) approach ◦ Pronunciations are learned from parallel corpora ◦ Monotone alignments; no insertion or deletion

Single-character translation operations

Uses composed operations to capture substring-level information and context.

No mechanism to accommodate dictionary information.

Our Approach 

Dictionary-based phrasal SMT ◦ Dictionary entries as minimal translation unit  Entries: word and character-level pronunciations  For known words, word-level pron. are used.  For OOV words, the pronunciation is reasonably guessed by using character-level pronunciations.

◦ A unified approach that can deal with the sentence-level pronunciation assignment, while integrating OOV pron. prediction as part of the whole task.
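The back-off behaviour described above can be sketched as a small monotone search: word-level entries are preferred, and single characters are used only where no word entry covers the text. This is an illustrative toy, not the authors' decoder. The unit costs (1.0 for a word entry, 2.0 for a character back-off) are assumptions standing in for the real model score, and each entry carries only one pronunciation for brevity.

```python
def decode(text, word_dict, char_dict):
    """Monotone segmentation that prefers word entries and backs off to characters."""
    n = len(text)
    # best[i] = (cost, units) for the cheapest analysis of text[:i]
    best = [None] * (n + 1)
    best[0] = (0.0, [])
    for i in range(n):
        if best[i] is None:
            continue
        cost, units = best[i]
        for j in range(i + 1, n + 1):
            piece = text[i:j]
            if piece in word_dict:
                # Known word: use its word-level pronunciation (assumed cost 1.0).
                cand = (cost + 1.0, units + [(piece, word_dict[piece])])
            elif j == i + 1:
                # Back-off: character-level pronunciation (assumed cost 2.0).
                # Characters missing from char_dict are passed through unchanged.
                cand = (cost + 2.0, units + [(piece, char_dict.get(piece, piece))])
            else:
                continue
            if best[j] is None or cand[0] < best[j][0]:
                best[j] = cand
    return best[n][1]

# Toy dictionaries (hypothetical entries).
word_dict = {"雪国": "yukiguni", "の": "no"}
char_dict = {"雪": "yuki", "国": "kuni", "夜": "yoru"}
print(decode("雪国の夜", word_dict, char_dict))
```

Here 夜 is not a word entry, so it is read through the character-level dictionary, while 雪国 receives its word-level reading rather than the character-by-character one.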

Dictionary-based Operations   

Dictionary words are the standard units. As a back-off, character-level pron. are used for OOV words. Dictionary-based alignments are obtained using our dictionary-based phrasal decoder. ◦ Unreachable operations are discarded. ◦ Makes the model more robust to noise.
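One way to picture dictionary-based alignment with discarding of unreachable pairs is the sketch below. It searches for a monotone split of a (surface, pronunciation) pair into dictionary entries and returns `None` when no such split exists. The function `align` and the toy entries are assumptions for illustration; the actual alignments in the talk come from the authors' phrasal decoder.

```python
from functools import lru_cache

def align(source, pron, entries):
    """Return one monotone alignment of source to pron using entries, or None."""
    @lru_cache(maxsize=None)
    def solve(i, j):
        # i: position in source, j: position in pron
        if i == len(source):
            return [] if j == len(pron) else None
        for k in range(i + 1, len(source) + 1):
            for reading in entries.get(source[i:k], ()):
                if pron.startswith(reading, j):
                    rest = solve(k, j + len(reading))
                    if rest is not None:
                        return [(source[i:k], reading)] + rest
        return None
    return solve(0, 0)

# Toy character-level entries (hypothetical subset).
entries = {"雪": ("yuki", "setsu"), "国": ("kuni", "guni", "koku")}

pairs = [("雪国", "yukiguni"), ("雪国", "sekkoku")]
kept = [(s, p, a) for s, p in pairs if (a := align(s, p, entries)) is not None]
print(kept)  # only the reachable pair survives
```

The second pair cannot be derived from the entries, so it is dropped, which is the sense in which discarding unreachable operations filters noisy training pairs.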

Composed Operations During decoding, translation operations are composed so as to maximize the overall probability.  In our current work, composed operations are the compositions of dictionary words. 

◦ Allows us to consider wider, phrasal context
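Composition of adjacent dictionary operations can be illustrated as follows. The sketch only enumerates the composed units that an aligned sequence gives rise to (for example, to collect their counts); choosing among them during decoding is left to the model. The name `compose` and the `max_len` limit are assumptions for illustration.

```python
def compose(units, max_len=3):
    """Enumerate compositions of 1..max_len adjacent (surface, pron) units."""
    composed = []
    for start in range(len(units)):
        last = min(start + max_len, len(units))
        for end in range(start + 1, last + 1):
            span = units[start:end]
            surface = "".join(s for s, _ in span)
            pron = "".join(p for _, p in span)
            composed.append((surface, pron))
    return composed

units = [("雪国", "yukiguni"), ("で", "de"), ("あった", "atta")]
for surface, pron in compose(units, max_len=2):
    print(surface, pron)
```

With `max_len=2` this yields the three original units plus the two-unit phrases 雪国で and であった, each carrying a wider context than its parts.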

SMT-based Framework 

Simplified SMT ◦ Monotone alignment, no insertion or deletion



Linear model: $\mathrm{Score}(s, t, \lambda) = \sum_i \lambda_i f_i(s, t)$

◦ Weights are trained with averaged perceptron. ◦ Stack decoder [Zens+ 04]



Features (component models) ◦ Bi-directional translation probability P(t|s), P(s|t) ◦ Character 5-gram probability P(t) ◦ Number of phrases/characters ◦ Joint trigram probability P(s,t)
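The linear model and its averaged-perceptron training can be sketched in a few lines. This is a simplified illustration under stated assumptions: candidates are given as fixed lists of feature dictionaries rather than produced by a stack decoder, the feature names `tm` and `lm` are hypothetical stand-ins for the component models, and the averaging is the naive running sum over all steps.

```python
from collections import defaultdict

def linear_score(features, weights):
    """Score(s, t, lambda) = sum_i lambda_i * f_i(s, t)."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def train_averaged_perceptron(data, epochs=5):
    """data: list of (candidates, gold_index); each candidate is a feature dict."""
    weights = defaultdict(float)
    totals = defaultdict(float)
    steps = 0
    for _ in range(epochs):
        for candidates, gold in data:
            pred = max(range(len(candidates)),
                       key=lambda k: linear_score(candidates[k], weights))
            if pred != gold:
                # Move weights toward the gold candidate, away from the prediction.
                for name, value in candidates[gold].items():
                    weights[name] += value
                for name, value in candidates[pred].items():
                    weights[name] -= value
            steps += 1
            for name, value in weights.items():
                totals[name] += value
    return {name: total / steps for name, total in totals.items()}

# One toy training example with two candidates; the second one is correct.
data = [([{"tm": -1.0, "lm": -5.0}, {"tm": -2.0, "lm": -1.0}], 1)]
averaged = train_averaged_perceptron(data)
print(averaged)
```

Averaging the weight vector over all update steps, rather than keeping only the final one, is what distinguishes the averaged perceptron from the plain perceptron.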

Joint N-gram Language Model 

Joint n-gram: a language model for the sequence of translation operations [Bisani+ 04] ◦ Provide smoothed context for pron. disambiguation ◦ Incorporate single-kanji pronunciation dependencies into OOV pronunciation prediction
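A joint n-gram model treats each translation operation (a surface string paired with its pronunciation) as a single token and estimates n-gram probabilities over sequences of such tokens. The sketch below uses bigrams with add-alpha smoothing purely for brevity; the talk uses a joint trigram, and its smoothing method is not stated on the slide. All function names and the toy data are assumptions.

```python
from collections import Counter

START = ("<s>", "<s>")

def train_joint_bigram(sequences):
    """Count bigrams over sequences of (surface, pron) translation operations."""
    contexts, bigrams = Counter(), Counter()
    for ops in sequences:
        padded = [START] + list(ops)
        for prev, cur in zip(padded, padded[1:]):
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    return contexts, bigrams

def bigram_prob(prev, cur, contexts, bigrams, vocab_size, alpha=1.0):
    """Add-alpha smoothed P(cur | prev) over joint units."""
    return (bigrams[(prev, cur)] + alpha) / (contexts[prev] + alpha * vocab_size)

# Toy aligned data: character-level operations for /hitoke/ and /ninki/.
sequences = [
    [("人", "hito"), ("気", "ke")],
    [("人", "nin"), ("気", "ki")],
    [("人", "nin"), ("気", "ki")],
]
contexts, bigrams = train_joint_bigram(sequences)
vocab_size = len({op for seq in sequences for op in seq})

p_ki = bigram_prob(("人", "nin"), ("気", "ki"), contexts, bigrams, vocab_size)
p_ke = bigram_prob(("人", "nin"), ("気", "ke"), contexts, bigrams, vocab_size)
print(p_ki, p_ke)
```

Because the context includes the pronunciation chosen for the previous character, the model prefers ki after nin over ke after nin, which is the kind of single-kanji pronunciation dependency the slide refers to.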

Summary of Training

[Figure: training pipeline, with labels marking the substring-based model and the proposed model]

Related Works 

Japanese Pronunciation Prediction ◦ SVM-based two-step approach [Mori+ 10] ◦ Substring-based word pron. prediction [Hatori+ 11]



Transliteration / letter-to-phoneme conversion ◦ Joint n-gram & discriminative features [Jiampojamarn+ 10]  2–2 (source-target) substring-based alignment



Our contribution ◦ Integrating word- and character-level pronunciations based on dictionary-based alignment. ◦ Capturing larger context by the composition of word-level pronunciations (e.g. 8–24 alignment) ◦ Scalable: probabilities of the component models are obtained from the frequencies in the training corpus.

Experiment – Baseline Models  

SubStr: Substring-based model [Hatori+ 11]

SubStr+: Extended substring-based model ◦ Additionally uses joint n-gram probability and dictionary features

KyTea [Mori+ 10] ◦ A state-of-the-art Japanese pronunciation prediction system ◦ Performs SVM-based classification of word pronunciations, along with a simple OOV model

Experiment - Training Data 

Dictionary (770k token pairs) ◦ UniDic (630k entries) ◦ Iwanami Dictionary (107k entries) ◦ in-house dictionary (226k entries)



Wikipedia-derived pairs (460k instances) ◦ Word-pronunciation pairs extracted by pattern matching on parentheses (noisy)
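Extraction by pattern matching on parentheses might look like the sketch below. The regular expression is a guess at the kind of pattern involved (a run of kanji followed by a hiragana reading in full-width or ASCII parentheses); the slide does not give the actual pattern, and real Wikipedia text would need more cases and filtering.

```python
import re

# Hypothetical pattern: kanji run, then a hiragana reading in parentheses.
PAIR = re.compile(r"([\u4e00-\u9fff]+)[（(]([\u3041-\u3096ー]+)[）)]")

def extract_pairs(text):
    """Return (word, reading) pairs found in text."""
    return PAIR.findall(text)

sample = "雪国（ゆきぐに）は川端康成（かわばたやすなり）の小説。"
print(extract_pairs(sample))
```

Pairs harvested this way are noisy (the parenthesized text is not always a reading of the preceding word), which is consistent with the "noisy" label on the slide.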



Newspaper corpus (1.4m sents)

Experiment – Evaluation Dataset 

Nikkei/Kyodo Newspaper (News-1/2) ◦ Consisting of complete sentences

Bing Search query log (Query-1/2) ◦ General noun phrases/proper nouns



Difficult-to-pronounce word corpus (Name) ◦ Consisting mostly of person names



Wikipedia instances (Wiki) ◦ Mostly named entities and technical words

Test set    News-1  News-2  Query-1  Query-2  Name   Wiki
Avg. len.   51.8    44.9    3.8      5.7      3.0    4.1
OOV rate    0.3%    0.3%    3.5%     12.7%    23.4%  13.7%

Final Result

[Bar chart: SubStr, SubStr+, and Proposed compared on News-1, News-2, Query-1, Query-2, Names, and Wiki; y-axis from 0 to 100. The bar values are not recoverable from the extracted text.]

Comparison to KyTea (training with BCCWJ and UniDic)

[Bar chart: KyTea (w/noise), KyTea, and Proposed compared on News-1, News-2, Query-1, and Query-2; y-axis from 50 to 100. The bar values are not recoverable from the extracted text.]

Comparison to KyTea (training with BCCWJ, Wiki, and UniDic)

[Bar chart: KyTea (w/noise), KyTea, and Proposed compared on News-1, News-2, Query-1, and Query-2; y-axis from 50 to 100. The bar values are not recoverable from the extracted text.]

Conclusion 

We proposed an SMT-based pronunciation prediction model that makes effective use of dictionary-based operations. ◦ Achieved ~90% accuracy in various domains. ◦ Robust to OOV words, and works effectively on standard texts.



Future work ◦ Investigate the use of contextual features such as character- and pronunciation-type dependencies.
