Confusion Network Based System Combination for Chinese Translation Output: Word-Level or Character-Level?
Introduction
Recently, confusion network based system combination has been applied successfully to various machine translation tasks. It picks one hypothesis as the skeleton and aligns the other hypotheses against the skeleton to form a confusion network. The path with the highest score represents the consensus translation. Previous work on system combination mostly focuses on combining translation outputs in Latin alphabet-based languages, in which sentences are already segmented into word sequences by white space before the confusion network is constructed.
When combining Chinese translation outputs, the first step is to segment the translation output into a sequence of words. An alternative is to split the translation output into characters. Both approaches are possible. In this work, we compare the translation performance of confusion network based system combination when the Chinese translation output is segmented into words versus characters.
Related work
Whether the word or the character is the appropriate unit for Chinese NLP is a long-debated issue.
J. Xu, et al. investigated Chinese word segmentation (CWS) for Chinese-English phrase-based SMT.
R. Zhang, et al. reported that the most accurate word segmentation is not the best word segmentation for SMT.
P-C Chang, et al. optimized CWS granularity with respect to the SMT task.
M. Li, et al. compared word-level metrics with character-level metrics.
J. Du utilized a character-level strategy to improve translation quality for spoken language translation.
Confusion network based system combination for Chinese translation output
The IHMM monolingual hypothesis alignment approach is used to align each hypothesis to the skeleton. The IHMM approach uses a similarity model and a distortion model to calculate the conditional probability that the hypothesis is generated by the skeleton. The similarity model interpolates two terms, p_sem and p_sur, with a weight \alpha:

$$p(e'_j \mid e_i) = \alpha \, p_{sem}(e'_j \mid e_i) + (1 - \alpha) \, p_{sur}(e'_j \mid e_i)$$

Given a source sentence:
Pakistan cleric says would rather die than surrender
and three translation hypotheses:
巴基斯坦称死不投诚
巴基斯坦说死不投诚
巴基斯坦说死于投诚
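The interpolation in the similarity model can be sketched in a few lines. This is only an illustration of the weighted sum: the function name `ihmm_similarity`, the reading of the weight as `alpha`, and the input probabilities are assumptions made here, and how p_sem and p_sur themselves are estimated is not covered.

```python
def ihmm_similarity(p_sem: float, p_sur: float, alpha: float) -> float:
    """Interpolate two similarity terms: alpha * p_sem + (1 - alpha) * p_sur."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * p_sem + (1.0 - alpha) * p_sur


# Made-up probabilities for one (hypothesis token, skeleton token) pair.
score = ihmm_similarity(p_sem=0.2, p_sur=0.8, alpha=0.3)
print(round(score, 2))
```

With `alpha` close to 1 the score follows p_sem; with `alpha` close to 0 it follows p_sur.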
We can construct a word-level and a character-level confusion network from this example.
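A character-level network for the three example hypotheses can be sketched as follows. This is a simplified stand-in, not the method used in this work: alignment is done with Python's `difflib.SequenceMatcher` instead of IHMM, and the consensus is picked by unweighted majority vote instead of tuned feature scores. The helper names are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher

EPS = ""  # empty arc for skeleton positions with no aligned token


def align_to_skeleton(skeleton, hypothesis):
    """For each skeleton position, return the hypothesis token aligned to it."""
    aligned = [EPS] * len(skeleton)
    matcher = SequenceMatcher(a=skeleton, b=hypothesis, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("equal", "replace"):
            for offset in range(min(i2 - i1, j2 - j1)):
                aligned[i1 + offset] = hypothesis[j1 + offset]
        # Insertions and deletions are ignored in this simplified sketch.
    return aligned


def build_network(token_lists):
    """Use the first hypothesis as skeleton; return one vote Counter per position."""
    skeleton = token_lists[0]
    columns = [Counter([token]) for token in skeleton]
    for hypothesis in token_lists[1:]:
        for column, token in zip(columns, align_to_skeleton(skeleton, hypothesis)):
            column[token] += 1
    return columns


def consensus(columns):
    """Pick the most frequent token in every column and drop empty arcs."""
    best = [column.most_common(1)[0][0] for column in columns]
    return "".join(token for token in best if token != EPS)


hypotheses = ["巴基斯坦称死不投诚", "巴基斯坦说死不投诚", "巴基斯坦说死于投诚"]
network = build_network([list(h) for h in hypotheses])
print(consensus(network))  # 巴基斯坦说死不投诚
```

The same functions accept lists of words instead of lists of characters, so a word-level network can be built by passing in segmented hypotheses.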
Experimental Data
We conducted experiments on two datasets.
The NIST'08 English-to-Chinese translation task: 127 documents with 1,830 segments; 4 human reference translations; the best 7 submitted system outputs are chosen to participate in system combination; 3-fold cross-validation.
The IWSLT'08 English-to-Chinese CRR challenge task: the development set contains 757 segments and the test set contains 300 segments; 4 human reference translations.
Experimental Setting
It has been reported that character-level automatic metrics correlate with human judgment better than word-level automatic metrics for Chinese translation evaluation. The performance of the Chinese translation output is therefore measured with character-level metrics: character-level BLEU, character-level NIST, character-level METEOR, character-level GTM, and character-level TER.
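To make "character-level" concrete for one of these metrics, here is a minimal sketch of a sentence-level BLEU computed over character n-grams. It is an unsmoothed textbook formulation written for illustration, not the scoring script used in the experiments; the function names are hypothetical and the reference sentence is simply one of the example hypotheses reused as a stand-in.

```python
import math
from collections import Counter


def chars(text):
    """Tokenize into characters, ignoring any whitespace."""
    return [ch for ch in text if not ch.isspace()]


def char_ngrams(text, n):
    tokens = chars(text)
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def char_bleu(hypothesis, references, max_n=4):
    """Unsmoothed sentence-level BLEU over character n-grams."""
    hyp_len = len(chars(hypothesis))
    if hyp_len == 0:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = char_ngrams(hypothesis, n)
        max_ref = Counter()
        for reference in references:
            for gram, count in char_ngrams(reference, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        # Clip each hypothesis n-gram count by its maximum count in any reference.
        clipped = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty against the reference length closest to the hypothesis.
    ref_lens = [len(chars(reference)) for reference in references]
    closest = min(ref_lens, key=lambda r: (abs(r - hyp_len), r))
    penalty = 1.0 if hyp_len > closest else math.exp(1 - closest / hyp_len)
    return penalty * math.exp(sum(log_precisions) / max_n)


reference = "巴基斯坦说死不投诚"  # stand-in reference, for illustration only
print(char_bleu("巴基斯坦说死不投诚", [reference]))
print(char_bleu("巴基斯坦说死于投诚", [reference]))
```

Because the tokens are characters, the score does not depend on any word segmenter, which is what makes it usable for comparing outputs segmented in different ways.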
Because better automatic evaluation metrics lead to better translation performance in parameter optimization, the feature weights of the confusion network based combination system are tuned on the character-level BLEU score. We experimented with three different CWS tools: ICTCLAS, the Stanford Chinese word segmenter (STANFORD), and Urheen.
Results on NIST’08 EC Tasks
The submitted outputs of 7 systems are combined: System 01, System 03, System 17, System 18, System 24, System 28, and System 31. Because words are not demarcated in the system outputs, we divide the outputs into words with the different CWS tools, or into characters, to facilitate hypothesis alignment before combining them.
The "Character" row shows the translation performance after the system outputs are split into characters. The "ICTCLAS", "STANFORD", and "Urheen" rows show the scores when the system outputs are segmented into words by the respective CWS tools. The experimental results in Table 1 show that the character-level combination system significantly improves the translation performance (p < 0.01).
Results on IWSLT’08 EC CRR challenge Tasks
We segment the Chinese sentences in the bilingual training data into word sequences, and train several English-to-Chinese SMT systems to decode the development set and test set. JoshuaICTCLAS denotes the Joshua system for which the Chinese sentences in the training data were segmented into words by the ICTCLAS tool; its outputs can thus be regarded as already segmented into words by ICTCLAS. JoshuaSTANFORD denotes the Joshua system for which the Chinese sentences in the training data were segmented into words by the STANFORD tool.
Because the outputs to be combined have been segmented into words with different granularity, we must consistently re-segment the outputs into words or characters before system combination. The "ICTCLAS" and "STANFORD" rows show the scores when the system outputs are re-segmented into words by the respective Chinese word segmenters. The experimental results in Table 2 show that, when the translation outputs to be combined have different word granularity, the character-level combination system significantly improves the translation performance.
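The character-level re-segmentation step can be illustrated with a short sketch. The two segmented strings below are hand-written, hypothetical segmentations chosen to differ in granularity; they are not actual output of ICTCLAS or the Stanford segmenter, and the function name is an assumption.

```python
def resegment_to_characters(segmented_output):
    """Discard existing word boundaries and return one token per character."""
    return [ch for ch in segmented_output if not ch.isspace()]


# Two hypothetical segmentations of the same string, with different granularity.
output_a = "巴基斯坦 说 死 不 投诚"
output_b = "巴基 斯坦 说 死不 投诚"

words_a, words_b = output_a.split(), output_b.split()
chars_a = resegment_to_characters(output_a)
chars_b = resegment_to_characters(output_b)

print(words_a == words_b)  # False: the word sequences disagree
print(chars_a == chars_b)  # True: the character sequences coincide
```

Re-segmenting into words would instead require stripping the boundaries and running a single CWS tool over every output, so that all hypotheses share one segmentation convention.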
To examine the case where the outputs to be combined have been segmented into words by the same CWS tool, ICTCLAS, we combined the outputs generated by two SMT systems: MosesICTCLAS and JoshuaICTCLAS. Table 3 shows that the character-level combination system still consistently outperforms the word-level combination system ("ICTCLAS"), even though the translation outputs to be combined have the same word granularity.
Conclusion and discussion
We conducted a study of character-level versus word-level confusion network based system combination for Chinese translation output. The experimental results show that the character-level combination system significantly outperforms the word-level combination systems.
Reasons:
Chinese sentences can be split into characters with perfect accuracy, whereas no CWS tool yet achieves 100% accuracy. Outputs can therefore be segmented into characters more consistently, which leads to higher-quality monolingual hypothesis alignment and helps construct the confusion network.
A Chinese character is a smaller unit than a Chinese word (which contains at least one character) for constructing the confusion network. The character-level approach thus has more choices from which to produce a better consensus translation.
Thanks!