
TSINGHUA SCIENCE AND TECHNOLOGY ISSN 1007-0214 14/22 pp515-521 Volume 13, Number 4, August 2008

A Synthesis Instance Pruning Approach Based on Virtual Non-uniform Replacements*

ZHANG Wei (张巍)1,2,**, LING Zhenhua (凌震华)2, HU Guoping (胡国平)3, WANG Renhua (王仁华)2

1. Department of Computer Science, Ocean University of China, Qingdao 266100, China;
2. Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027, China;
3. Anhui USTC Iflytek Co., Ltd., Hefei 230088, China

Abstract: The use of non-uniform processing greatly helps corpus-based text-to-speech (TTS) systems to synthesize natural speech. However, tailoring a TTS voice font, or pruning redundant synthesis instances, usually results in the loss of non-uniform synthesis instances. To solve this problem, we propose the concept of virtual non-uniform instances. Based on this concept and the synthesis frequency of each instance, an algorithm named StaRp-VPA is constructed to make up for the loss of non-uniform instances. In experimental testing, the naturalness scored by the mean opinion score (MOS) remains almost unchanged when fewer than 50% of the instances are pruned, and the MOS is only slightly degraded for reduction rates above 50%. The test results show that the StaRp-VPA algorithm is effective.

Key words: text-to-speech system; speech synthesis; synthesis instance pruning; non-uniform unit

Introduction

Corpus-based approaches (selection-based approaches) have been successfully applied to many state-of-the-art text-to-speech (TTS) systems[1-4]. These approaches can generate highly natural speech because they use not only digital signal processing techniques but also data-driven techniques from knowledge discovery and data mining. Here, we consider the basic unit chosen for synthesis to be a syllable in Mandarin or Cantonese. During synthesis, proper syllables are selected from a very large speech database using the Viterbi[5] algorithm. In the database, all recorded speech is indexed by trees, named non-uniform units. A non-uniform unit includes one syllable or several succeeding syllables. Acoustic instances (variants or voice fonts) belonging to the same non-uniform unit are indexed in a tree according to their prosodic, phonetic, and acoustic context. The tree, named a non-uniform tree, is constructed by clustering the instances (generally with the CART[6] approach) based on questions concerning prosodic, phonetic, and acoustic context. Figure 1 gives an example of a non-uniform tree.

Received: 2007-09-10; revised: 2008-03-02

* Supported by the National Natural Science Foundation of China (No. 60602017)

** To whom correspondence should be addressed. E-mail: [email protected] Webpage: http://cheung.colin.googlepages.com

Fig. 1 Non-uniform tree

With corpus-based TTS systems, speech synthesis becomes a problem of collecting, annotating, indexing,
and retrieving from a very large speech database[1-4]. In order to synthesize natural-sounding speech, several or even tens of hours of speech waveforms are required from a diverse text input. Thus, storing, loading, and searching such a huge corpus becomes a major issue in many applications. Because of this, corpus-based TTS usually requires high performance hardware to synthesize natural-sounding speech. Approaches are therefore sought that retain the natural quality of corpus-based TTS but allow shrinkage of the speech database, such that the corpus-based TTS can be more flexible and scalable for use with all kinds of hardware. One approach is called pruning redundant synthesis instances, or tailoring of the TTS voice font. There is always some redundancy in the speech database. For example, some instances are almost never used for synthesis, while some instances can be replaced by others. Several approaches for reducing redundant synthesis instances have been proposed. The approach described in Black and Taylor[7] clusters similar units (diphones) with a decision tree that asks questions concerning the prosodic and phonetic context. Units that are the furthest from the cluster center are pruned. It is claimed that pruning up to 50% of the units produced no serious degradation in speech quality. The method proposed in Hon et al.[8] is based on a unified hidden Markov model (HMM) framework. Only instances (single or multiple) with the highest HMM scores are kept to represent a cluster of similar ones. Kim et al. presented a weighted vector quantization (WVQ) method that prunes the least important instances[9,10] and a 50% reduction rate is reached without significant distortions. In another paper[11], Rutten et al. proposed a database reduction technique based on the statistical behavior of unit selection. They claimed that pruning the database down to 50% of its original size was possible without a significant drop in the output speech quality. 
Zhao et al.[12] proposed a prosodic outlier criterion, an importance criterion, and a combination of the two, and pruned voice fonts with these criteria. They reported that the naturalness remained almost unchanged when 50% of instances were pruned using the combined criteria. We have also studied a clustering-based approach to pruning synthesis instances for embedded systems[13].

Non-uniform instances are a very important concept in corpus-based TTS. Non-uniform instances of different granularity improve the match between the text being synthesized and the speech database. With regard to speech naturalness, the quality of the non-uniform instances determines the TTS system performance. Almost all state-of-the-art corpus-based TTS systems therefore use a non-uniform technique. However, tailoring the TTS voice font, that is, pruning redundant synthesis instances, usually results in the loss of non-uniform instances. None of the pruning methods mentioned above addresses this problem. To solve it, we propose the concept of virtual non-uniform instances. Based on this concept and the synthesis frequency of each instance, an algorithm, StaRp-VPA, is constructed on the KBCE TTS system (the prototype TTS system of China Iflytek Co., Ltd. (www.iflytek.com), whose commercial TTS system is named interphonic) to make up for the loss of non-uniform instances.

1 Problem Investigation and Virtual Non-uniform Instances

The redundancy of a speech database can exist only in units (instance index trees) or in acoustic instances. Generally speaking, in Mandarin or Cantonese, units are words or sequences of succeeding syllables that appear frequently[3]. Pruning a unit means losing some prosodic and phonetic environment, so units are rarely redundant. In practice, redundancy usually comes from acoustic instances. Some instances are only rarely selected by the TTS system, so they can be pruned. Others are very similar to other instances and so can also be pruned. The purpose of this paper is to investigate how to remove redundant instances from a speech database automatically and flexibly. There are two problems concerning the pruning of redundant instances: (1) How many instances of a unit are redundant? (2) Within a unit, which instances are redundant? In other words, it is necessary to determine the importance of the instances that belong to the same unit. For the first problem, because most state-of-the-art TTS systems adopt the framework of Liu[3] or something similar, it can be considered that more instances in a
unit mean more redundancy. Thus, we propose an approach named vibration-rate pruning. The total reserve rate is given as a configuration parameter, and the instance reserve rate of each unit is tuned according to the frequency of its instances (more instances mean a smaller reserve rate, and reserve rates that become too small are compensated). Two quantities can be used to measure the importance of an instance: (i) how frequently the instance is selected by the TTS system (frequently selected instances must be reserved); (ii) the capability of the instance to replace other instances (instances that can replace more instances are more important, and the replacements should cover non-uniform instances of different granularity). The importance measure is a function of these two quantities, which we call the instance-importance-score (IIS) function. For problem (2), we rank the instances of the same unit by their IIS values; the instances with the smallest IIS values are the redundant ones. Based on the discussion above, four key points require further consideration.

Point 1 Reserve rate computation of vibration-rate pruning

Suppose that we want to prune the speech database to a fraction $\beta$ ($0 < \beta \le 1$) of its original size. From the analysis of problem (1) above, each unit i needs its own reserve rate $g_i$ (pruning rate $t_i = 1 - g_i$), such that the total reserve rate equals $\beta$. $g_i$ is computed as follows. Suppose that the instances belonging to unit i make up a fraction $p_i$ of all instances, so that, summing over the $I$ units,

$$\sum_{i=1}^{I} p_i g_i = \beta .$$

Let $p_i g_i = \beta / I$, so

$$g_i = \frac{\beta}{I p_i} \qquad (1)$$

Equation (1) shows that $g_i$ is inversely proportional to $p_i$. This means that more instances result in a smaller reserve rate, which is consistent with the discussion above. For those units with $\beta / (I p_i) > 1$, we set $g_i = 1$. The remnant is given by Eq. (2):

$$\sigma = \sum_{i=1}^{I} \left( \frac{\beta}{I} - p_i g_i \right) = \sum_{g_i = 1} \left( \frac{\beta}{I} - p_i \right) + \sum_{g_i < 1} \left( \frac{\beta}{I} - p_i g_i \right) \qquad (2)$$

The expectation probability $\mathit{Efr}_i$ describes the probability of unit i appearing in a text. The value of $\mathit{Efr}_i$ can be estimated statistically from a large corpus (in this paper, we used more than 300 MB of text to estimate each $\mathit{Efr}_i$). The current frequency ratio $\mathit{Sfr}_i$ describes, during each iteration, the ratio of unit i's frequency to the frequency of all units. The value of $\mathit{Sfr}_i$ can be computed exactly by counting the number of instances in the current speech database ($\mathit{Sfr}_i$ is updated in each iteration of the reserve rate computation). The parameter $x_i = \mathit{Efr}_i / \mathit{Sfr}_i$ describes the gap between $\mathit{Sfr}_i$ and $\mathit{Efr}_i$, and $x_i$ is used for reserve rate compensation in Eq. (3), which is applied to all units with $g_i < 1$:

$$G_i = g_i + \sigma \frac{x_i}{\sum_{g_j < 1} x_j} = \frac{\beta}{I p_i} + \sigma \frac{x_i}{\sum_{g_j < 1} x_j} \qquad (3)$$
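As a rough illustration, Eqs. (1) to (3), together with the cap at 1 and the iteration described around them, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `reserve_rates`, the tolerance `tol`, and the iteration cap `max_iter` are hypothetical; Sfr_i is assumed to be unit i's share of the instances currently kept; and the loop is assumed to stop once the remnant σ is used up or every unit has a reserve rate of 1.

```python
def reserve_rates(p, efr, beta, tol=1e-9, max_iter=1000):
    """Per-unit reserve rates g_i for an overall reserve rate beta.

    p[i]   -- fraction of all instances that belong to unit i (sums to 1)
    efr[i] -- expectation probability Efr_i of unit i
    """
    n_units = len(p)
    # Eq. (1), capped at 1 for units with beta / (I * p_i) > 1.
    g = [min(1.0, beta / (n_units * p_i)) for p_i in p]
    for _ in range(max_iter):
        # Eq. (2): remnant left over by the capped units.
        sigma = sum(beta / n_units - p_i * g_i for p_i, g_i in zip(p, g))
        below_one = [i for i, g_i in enumerate(g) if g_i < 1.0]
        if sigma <= tol or not below_one:
            break  # assumed stopping rule
        # Assumption: Sfr_i is unit i's share of the currently kept instances.
        kept = sum(p_i * g_i for p_i, g_i in zip(p, g))
        x = {i: efr[i] / (p[i] * g[i] / kept) for i in below_one}
        x_total = sum(x.values())
        for i in below_one:
            # Eq. (3), followed by the cap at 1.
            g[i] = min(1.0, g[i] + sigma * x[i] / x_total)
    return g


# Toy data (made up): one large unit and two small ones.
shares = [0.7, 0.2, 0.1]
rates = reserve_rates(p=shares, efr=[0.5, 0.3, 0.2], beta=0.6)
kept_fraction = sum(p_i * g_i for p_i, g_i in zip(shares, rates))
```

In this toy example the two small units keep all of their instances, and the compensation raises the rate of the large unit until the kept fraction approaches β = 0.6.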

If $G_i \le 1$, then $g_i = G_i$; if $G_i > 1$, then $g_i = 1$. The compensation process is iterative and terminates when the reserve rates are less than or equal to 1. Reserve rate compensation is required to prevent over-pruning of the instances that belong to units containing many instances. Compensation helps to preserve the prosodic and phonetic integrity of the original speech database.

Point 2 Virtual non-uniform instance and IIS

Two parameters are related to the IIS: the usage and the replacing score. The usage of instance L is defined as the loss of dynamic coverage after deleting the leaf of the unit that instance L belongs to. Suppose that the coverage (see Liu[3] for computation details) is $A_{0L}$ before deleting and $A_L$ after deleting. The usage of instance L is therefore $\alpha_L = (A_{0L} - A_L) / A_{0L}$. The usage weights the importance of an instance in the corpus. If the prosodic and phonetic environments are the same, the usages of different instances are the same; otherwise, they usually differ.

Pruning of redundant synthesis instances usually results in the loss of non-uniform instances. To solve this problem, we introduce the concept of virtual non-uniform instances. Imagine that we remove a given instance from a TTS system and then let the TTS system select a replacement for this instance using a measure or an algorithm, such as Viterbi search, acoustic distance, or a trainable approach (the instance itself is named replacement $R_0$). This replacement is named the 1st replacement $R_1$ of $R_0$. $R_1$ is not a real non-uniform instance; it is the best replacement of the non-uniform instance $R_0$. In order to select the best one, the selection only happens among instances with close fidelity to the original prosodic and phonetic environment of $R_0$. We name this process speech completion. In a similar way, we can remove $R_0$ and $R_1$ and let the TTS system select the 2nd replacement $R_2$ of $R_0$. In general, we remove $R_0, \ldots, R_{N-1}$, and the TTS system selects the N-th replacement $R_N$. Replacement $R_i$ ($0 < i < N$) is named the virtual non-uniform instance of $R_0$. The measure or algorithm that the TTS system uses to select the replacement (in this paper, the Viterbi algorithm) gives each $R_K$ a cost $Q_K$. The cost describes the difference between the virtual non-uniform instance and the real non-uniform instance, and it is monotonic: $Q_0 = 0$, $Q_{K-1} \le Q_K$. The score of each replacement is

$$M_K = \exp\left( -\frac{Q_K^2}{\sigma} \right),$$

where $\sigma$, named the width, controls the response range of $Q_K$. For the original non-uniform instance $R_0$, $M_0 = 1$. Because $Q_K$ is monotonic, $M_K$ is monotonic as well: $M_K \le M_{K-1} \le 1 = M_0$.

Pruning of synthesis instances is ultimately pruning of redundant syllable instances, and thus we should add the score of each replacement to the syllable instances that compose it. For example, a replacement $V_1V_2V_3$ (with score $M_F$) is composed of $V_1$, $V_2$, and $V_3$. We add the score $\alpha_F M_F$ to each of $V_1$, $V_2$, and $V_3$, where $\alpha_F$ is the usage of the original real non-uniform instance $V_1V_2V_3$. The IIS of a syllable instance m is

$$S_m = \sum_{j=L_m}^{J} \eta_j \, \mathrm{mark}_j, \qquad \mathrm{mark}_j = \sum_{n=0}^{N} F_n, \qquad F_n = \alpha_n M_n .$$

Here, the length of an instance is its number of syllables. $M_n$ is the score of instance n, where instance n is a length-j replacement (a real or virtual non-uniform instance) that contains instance m. The value $\alpha_n$ is the usage of instance n, and $F_n$ is the usage-weighted replacement score. With $\alpha_n$ as the weight, $\mathrm{mark}_j$ is the weighted sum over all length-j replacements that contain instance m. $L_m$ is the number of syllables of instance m. The IIS of instance m is thus the weighted sum over the replacements of all lengths that contain instance m. The weight $\eta_j$ (usually $\eta_j = 1$) keeps a balance between different instance lengths.
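The scoring described above can be sketched in Python as follows. This is only an illustrative sketch: the function names `replacement_score` and `iis_scores`, the tuple layout of a replacement, and the toy identifiers such as `a1` are hypothetical, and the usage and cost values are made up rather than taken from the KBCE database.

```python
import math
from collections import defaultdict


def replacement_score(cost, width):
    """M_K = exp(-Q_K^2 / sigma), with sigma the width."""
    return math.exp(-(cost ** 2) / width)


def iis_scores(replacements, width, eta=None):
    """Accumulate S_m over every replacement that contains syllable instance m.

    replacements -- iterable of (syllables, usage, cost) tuples, where
                    syllables is a tuple of syllable-instance ids, usage is
                    alpha_n and cost is Q_n (0 for a real instance R_0).
    eta          -- optional {length: weight}; missing lengths default to 1.
    """
    eta = eta or {}
    scores = defaultdict(float)
    for syllables, usage, cost in replacements:
        f_n = usage * replacement_score(cost, width)  # F_n = alpha_n * M_n
        weight = eta.get(len(syllables), 1.0)         # eta_j for length j
        for m in syllables:
            scores[m] += weight * f_n
    return dict(scores)


example = [
    (("a1", "b1"), 0.5, 0.0),  # real two-syllable instance, M = 1
    (("a1", "b2"), 0.5, 1.0),  # virtual replacement with cost 1
    (("a1",), 0.2, 0.0),       # real one-syllable instance
]
scores = iis_scores(example, width=1.0)
```

Here `a1` collects credit from all three replacements, so it ends up with the highest score, while `b2` only appears in a costly virtual replacement and scores lowest.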

From the above, we can see that the IIS measures the replacement ability of each instance. In calculating the IIS, we take into account how frequently each instance is selected by the TTS system; this frequency is reflected by the usage of each instance. If we regard the length (number of syllables) of an instance as its granularity, the IIS also takes into account replacements of all granularities.

Point 3 Non-uniform instances adjustment

When a syllable instance is pruned, all instances that contain that syllable instance are also pruned passively. Thus, each instance should record the information of N replacements. If a one-syllable instance that is part of the replacement $R_K$ of $R_0$ is pruned, the replacement $R_K$ is pruned passively. The replacement $R_L$ with the smallest L among the reserved replacements is the final virtual non-uniform replacement of $R_0$. This adjustment reduces the loss of non-uniform instances: those $R_0$ with high IIS values are reserved. In that case, $R_L = R_0$, and there is no loss of non-uniform instances. If $R_0$ is pruned, $R_0$ is replaced by a virtual non-uniform instance $R_L$. Because $R_L$ is worse than $R_0$, there is a small loss of non-uniform instances. Only when all replacements are pruned is the non-uniform instance lost completely. The value of N is therefore very important. If N is too small, all replacements of a majority of instances are likely to be pruned. If N is too large, the computation becomes time and storage intensive. In this paper, we use N=5.

Point 4 Associated scoring elimination

A problem arises as we score each instance. If two instances can replace each other, there is a chance that they will be scored twice. For example, if $V_1$ and $V_2$ can replace each other, $V_2$ is scored when scoring all replacements of $V_1$, and $V_1$ is scored when scoring all replacements of $V_2$. Although only one of $V_1$ and $V_2$ should be reserved, both may be reserved because of this double scoring. This problem is called associated scoring. We propose the following process, named associated-scoring elimination (ASE), to eliminate it. For each instance V there are two structures: V.REL, composed of the replacements that can replace V, and V.RIL, composed of the replacements that V can replace. The IIS of V can thus be expressed as

$$S_V = \sum_{R \in V.\mathrm{RIL}} \mathrm{score}_R .$$

We arrange the instances from high to low according to their IIS values. Suppose that the arrangement is $V_1, V_2, \ldots, V_H$, and only k (k < H) instances can be reserved according to the reserve rate. First, we reserve instance $V_1$ because it has the highest IIS value. For every $R_X \in V_1.\mathrm{REL}$, we have $R_X \in V_1.\mathrm{REL} \wedge V_1 \in R_X.\mathrm{RIL}$, so we remove $V_1$ from $R_X.\mathrm{RIL}$. Secondly, we adjust the IIS value of $R_X$ ($R_X = V_2, \ldots, V_H$): $S_{R_X} = S_{R_X} - \mathrm{score}_{V_1}$. Finally, we rearrange the remaining instances $V_2, \ldots, V_H$ and reserve the instance with the highest IIS value. This process is then repeated. If $V_2$ and $V_1$ can replace each other, two relations hold: (a) $V_1 \in V_2.\mathrm{REL} \wedge V_2 \in V_1.\mathrm{RIL}$ and (b) $V_2 \in V_1.\mathrm{REL} \wedge V_1 \in V_2.\mathrm{RIL}$. However, once ASE removes $V_1$ from $V_2.\mathrm{RIL}$, relation (b) no longer holds. Thus, $V_1$ and $V_2$ are not scored repeatedly, and the associated scoring is eliminated. These steps are shown in Fig. 2. With ASE, pruning of synthesis instances reduces to a problem in graph theory: we construct an edge-weighted directed graph. Then, beginning with the vertex of maximum out-degree, we search for the vertices with not only maximum out-degree but also minimum in-degree[14].
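The greedy ASE loop described above can be sketched in Python as follows. This is a minimal sketch under assumptions: the function name `ase_select` and the nested-dictionary layout `credit[x][v]` (the score x earns because v is in x.RIL) are hypothetical, the toy scores are made up, and ties are broken by name only to keep the example deterministic.

```python
def ase_select(credit, k):
    """Greedy selection with associated-scoring elimination.

    credit[x][v] -- score that instance x earns for being able to replace
                    instance v, i.e. v is in x.RIL (hypothetical layout).
    k            -- number of instances to reserve in this unit.
    """
    ril = {x: dict(targets) for x, targets in credit.items()}
    iis = {x: sum(targets.values()) for x, targets in ril.items()}
    candidates = sorted(ril)
    reserved = []
    while candidates and len(reserved) < k:
        best = max(candidates, key=lambda x: iis[x])  # highest current IIS
        reserved.append(best)
        candidates.remove(best)
        for x in candidates:
            # best is kept, so x no longer earns credit for replacing it.
            lost = ril[x].pop(best, 0.0)
            iis[x] -= lost
    return reserved


credit = {
    "V1": {"V1": 1.0, "V2": 0.9, "V3": 0.2},
    "V2": {"V2": 1.0, "V1": 0.9},
    "V3": {"V3": 1.0, "V4": 0.5},
    "V4": {"V4": 1.0},
}
# Ranking by the raw scores keeps both members of the interchangeable pair.
naive_top2 = sorted(credit, key=lambda x: -sum(credit[x].values()))[:2]
ase_top2 = ase_select(credit, k=2)
```

In the toy data V1 and V2 can replace each other. Ranking by raw scores would reserve both, whereas the ASE loop reserves V1 and then V3, because V2 loses the credit it earned for replacing V1.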

Fig. 2 Associated-scoring elimination

2 StaRp-VPA Algorithm

The statistics and replacing-based variant pruning algorithm (StaRp-VPA) has three main steps:

Step 1 Computing the instance reserve rate of each unit

Input: the overall reserve rate of the speech database
Output: the instance reserve rate of each unit U, namely Reserve_rate (U)
The computation of Step 1 follows Eqs. (1) to (3).

Step 2 IIS scoring of instances

Input: each Reserve_rate (U), and other frequency information of the instances
Output: information about the reserved instances, replaced instances, and pruned instances
The process of Step 2 can be described as follows:
For L=max length to 1, execute (1) and (2):
(1) ∀V, Length(V)=L, execute (1.1) and (1.2):
(1.1) V.REL={R0, R1, …, RN}, scores=F0, F1, …, FN.
(1.2) ∀W∈V.REL, W.RIL={V}∪W.RIL.
(2) ∀U, execute (2.1) and (2.2):
(2.1) Reserve_Num=Number (U)×Reserve_rate (U)
(2.2) ∀V∈U, execute (2.2.1) and (2.2.2):
(2.2.1) For i=1 to Reserve_Num, execute ASE
(2.2.2) Tailor all variants remaining after ASE.
Note: L is the syllable number of each instance; its maximum is max length. V=Variant, represents each instance; U=Unit, represents units. Number (U) is the number of instances that unit U includes.

Step 3 Adjustment of the speech database

Input: information on the reserved instances, replaced instances, and pruned instances, and the original speech database
Output: the speech database after pruning
(1) For reserved instances, do nothing;
(2) For replaced instances, replace the instance with its virtual non-uniform replacement;
(3) For pruned instances, delete all information connected with them from the speech database.

3 Objective and Subjective Evaluation

3.1 Objective evaluation

Because the purpose of StaRp-VPA is to make up for the loss of non-uniform instances, it is natural to evaluate the pruning results by examining the distribution of non-uniform instances (including syllables) after pruning. The objective measurements are the ratio of the number of reserved non-uniform instances to the number of original non-uniform instances (rONU), the ratio of the number of virtual non-uniform instances to the number of original non-uniform instances (rVNU), the ratio of the number of pruned non-uniform instances to the number of original non-uniform instances (rTNU), and the values of λO and λOV defined below. The distributions of non-uniform instances under different reserve rates are shown in Table 1. Here, β is the reserve rate, so the pruning rate of the speech database is 1 – β, and

$$\lambda_O = \frac{r_{ONU}}{\beta}, \qquad \lambda_{OV} = \frac{r_{ONU} + r_{VNU}}{\beta} .$$

Table 1 Distributions of non-uniform instances

β/%    rONU/%    rVNU/%    rTNU/%    λO      λOV
73     62.53     36.43     1.24      0.86    1.36
62     47.80     48.85     3.35      0.77    1.56
50     33.98     56.44     9.58      0.68    1.81
30     15.93     48.22     35.85     0.53    2.14
10     4.36      19.15     76.49     0.43    2.35
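As a quick arithmetic check, the two ratios can be recomputed from the rONU and rVNU columns of Table 1. The short Python sketch below does this; the names `TABLE_1` and `loss_ratios` are hypothetical, and the numbers are copied from the table.

```python
# (beta, rONU, rVNU, lambda_O, lambda_OV) rows copied from Table 1; rates in percent.
TABLE_1 = [
    (73, 62.53, 36.43, 0.86, 1.36),
    (62, 47.80, 48.85, 0.77, 1.56),
    (50, 33.98, 56.44, 0.68, 1.81),
    (30, 15.93, 48.22, 0.53, 2.14),
    (10, 4.36, 19.15, 0.43, 2.35),
]


def loss_ratios(beta, r_onu, r_vnu):
    """Return (lambda_O, lambda_OV) for one reserve rate."""
    return r_onu / beta, (r_onu + r_vnu) / beta


max_gap = 0.0  # largest difference between recomputed and tabulated ratios
for beta, r_onu, r_vnu, lam_o, lam_ov in TABLE_1:
    calc_o, calc_ov = loss_ratios(beta, r_onu, r_vnu)
    max_gap = max(max_gap, abs(calc_o - lam_o), abs(calc_ov - lam_ov))
    print(f"beta={beta}%  lambda_O={calc_o:.3f}  lambda_OV={calc_ov:.3f}")
```

The recomputed ratios agree with the tabulated λO and λOV to within 0.01.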

The parameters λO and λOV describe how the reserve rate affects the loss of non-uniform instances. Generally speaking, some non-uniform instances are lost whenever the speech database is pruned. In fact, the theoretical optimum rONU = β cannot be reached, for the following reason. For a syllable instance V, there are LV instances of different lengths that contain V. When V is pruned, those LV original instances are pruned as well. Different V have different LV values. If an instance V with a large LV is pruned, rONU << β. Table 1 and Fig. 3 show that λO decreases slowly as β decreases. This demonstrates that StaRp-VPA tends to reserve syllable instances with large LV. The loss of non-uniform instances is to some degree made up by virtual non-uniform instances, as shown by λOV in Table 1 and Fig. 3 (usually, λOV > 1).

Fig. 3 λO and λOV with different β

3.2 Subjective evaluation

We used two kinds of texts in a listening test to evaluate the effect of StaRp-VPA on synthesis quality. The subjective measurement is the mean opinion score (MOS). Text 1 includes 150 sentences, automatically gathered from a large corpus using the approach described by Liu[3]. The first 100 sentences of Text 1 are of high coverage, while the last 50 sentences are of low coverage. Five formal listeners (A1, A2, A3, A4, and A5) performed a listening test on the KBCE speech database with different reserve rates. The MOS values of the first 100 and the last 50 sentences are shown in Tables 2 and 3. The figures demonstrate that the MOS degrades quite slowly as the pruning rate increases. Even when the speech database is pruned to 30% of its original size, the MOS is degraded by only 0.07. In particular, for the last 50 sentences the MOS at 73% is even slightly higher than that of the original database.

Table 2 MOS of the first 100 sentences in Text 1

β/%    A1      A2      A3      A4      A5      MOS
30     3.83    3.63    3.60    3.51    3.77    3.668
50     3.85    3.69    3.61    3.52    3.83    3.700
62     3.85    3.69    3.62    3.54    3.86    3.712
73     3.88    3.71    3.65    3.54    3.88    3.732
100    3.87    3.70    3.64    3.54    3.93    3.736

Table 3 MOS of the last 50 sentences in Text 1

β/%    A1      A2      A3      A4      A5      MOS
30     3.83    3.69    3.51    3.40    3.72    3.630
50     3.86    3.76    3.57    3.44    3.78    3.682
62     3.86    3.79    3.55    3.43    3.83    3.692
73     3.85    3.79    3.58    3.42    3.85    3.698
100    3.85    3.77    3.58    3.45    3.83    3.696
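The MOS column in Tables 2 and 3 appears to be the arithmetic mean of the five listener scores (an observation from the tabulated numbers, not a statement made in the text). The Python sketch below recomputes it and the drop relative to the unpruned database (β = 100%); the names `TABLE_2`, `TABLE_3`, and `mos_check` are hypothetical.

```python
from statistics import mean

# Listener scores (A1..A5) and reported MOS, copied from Tables 2 and 3.
TABLE_2 = {
    30: ([3.83, 3.63, 3.60, 3.51, 3.77], 3.668),
    50: ([3.85, 3.69, 3.61, 3.52, 3.83], 3.700),
    62: ([3.85, 3.69, 3.62, 3.54, 3.86], 3.712),
    73: ([3.88, 3.71, 3.65, 3.54, 3.88], 3.732),
    100: ([3.87, 3.70, 3.64, 3.54, 3.93], 3.736),
}
TABLE_3 = {
    30: ([3.83, 3.69, 3.51, 3.40, 3.72], 3.630),
    50: ([3.86, 3.76, 3.57, 3.44, 3.78], 3.682),
    62: ([3.86, 3.79, 3.55, 3.43, 3.83], 3.692),
    73: ([3.85, 3.79, 3.58, 3.42, 3.85], 3.698),
    100: ([3.85, 3.77, 3.58, 3.45, 3.83], 3.696),
}


def mos_check(table):
    """Return {beta: (recomputed MOS, drop relative to beta = 100)}."""
    full = mean(table[100][0])
    return {
        beta: (round(mean(scores), 3), round(full - mean(scores), 3))
        for beta, (scores, _reported) in table.items()
    }


check_2 = mos_check(TABLE_2)
check_3 = mos_check(TABLE_3)
```

Under this reading, the drop at β = 30% is 0.068 for the first 100 sentences and 0.066 for the last 50, which matches the degradation of about 0.07 quoted in the text, and the drop at β = 73% in Table 3 is slightly negative.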

Text 2 includes 100 sentences, automatically gathered from the Internet, also using the approach in Liu[3]. A different group of five formal listeners (B1, B2, B3, B4, and B5) performed a listening test on the KBCE speech database with different reserve rates. The MOS values for Text 2 are shown in Table 4. Table 4 demonstrates that the MOS of Text 2 is still degraded only slightly, though the degradation is greater than for Text 1. We were able to prune the speech database to 10% of its original size with a MOS degradation of only 0.22.

Table 4 MOS of Text 2

β/%    B1      B2      B3      B4      B5      MOS
10     3.50    3.83    3.45    3.73    3.09    3.520
30     3.61    3.87    3.57    3.90    3.26    3.642
50     3.60    3.89    3.62    3.89    3.30    3.660
73     3.72    3.90    3.69    4.00    3.32    3.726
100    3.69    3.93    3.73    4.00    3.35    3.740

From Tables 2-4, it can be seen that the MOS decreases slowly as the reserve rate is lowered. For reserve rates above 50%, the MOS is almost unchanged. Even when the reserve rate is under 50%, the MOS is only slightly degraded.

4 Discussion and Conclusions

In both of the listening tests mentioned above, no severe degradation of the MOS is seen. There are three possible reasons: (1) Because of the specific StaRp-VPA mechanism, the reserved instances are able to replace others and are frequently selected by the TTS system; (2) The use of virtual non-uniform instances, in some sense, makes up for the loss of replaced non-uniform instances; (3) Vibration-rate pruning preserves most prosodic and phonetic environments, and only unimportant instances are pruned.

In this paper, we have proposed the concept of virtual non-uniform instances to make up for the loss of non-uniform instances. Based on this idea and the usage of instances, we have constructed an algorithm (StaRp-VPA) and used it to prune the KBCE speech database to different sizes. In listening tests, the naturalness, as scored by the MOS, remains almost unchanged when fewer than 50% of the instances are pruned, and the MOS value is not severely degraded even for reduction rates above 50%.

References

[1] Hunt A, Black A. Unit selection in a concatenative speech synthesis system using a large speech database. In: Proceedings of ICASSP1996. Atlanta, USA, 1996, 1: 373-376.
[2] Sagisaka Y, Kaiki N, Iwahashi N, Mimura K. ATR-vTALK speech synthesis system. In: Proceedings of ICSLP1992, 1992, 1: 483-486.
[3] Liu Q F. Speech synthesis study based on perception quantification [Dissertation]. Hefei, China: University of Science and Technology of China, 2003. (in Chinese)
[4] Chu M, Peng H, Yang H, Chang E. Selecting non-uniform units from a very large corpus for concatenative speech synthesizer. In: Proceedings of ICASSP2001. Salt Lake City, USA, 2001.
[5] Rabiner L R. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE, 1989, 77(2): 257-285.
[6] Breiman L, Friedman J, Olshen R, Stone C. Classification and Regression Trees. Pacific Grove, CA: Wadsworth & Brooks, 1984.
[7] Black A W, Taylor P A. Automatically clustering similar units for units selection in speech synthesis. In: Proceedings of Eurospeech 1997. Rhodes, Greece, 1997, 2: 601-604.
[8] Hon H, Acero A, Huang X, Liu J, Plumpe M. Automatic generation of synthesis units for trainable text-to-speech systems. In: Proceedings of ICASSP 1998. Seattle, USA, 1998, 1: 293-296.
[9] Kim S H, Lee Y L, Hirose K. Pruning of redundant synthesis instances based on weight vector quantization. In: Proceedings of Eurospeech 2001. Aalborg, Denmark, 2001: 2231-2234.
[10] Kim S H, Lee Y L, Hirose K. Unit generation based on phrase break strength and pruning for corpus-based text-to-speech. ETRI Journal, 2001, 23(4): 168-176.
[11] Rutten P, Aylett M, Fackrell J, Taylor P. A statistically motivated database pruning technique for unit selection synthesis. In: Proceedings of ICSLP2002. Denver, USA, 2002: 125-128.
[12] Zhao Y, Chu M, Peng H, Chang E. Custom-tailoring TTS voice font - keeping the naturalness when reducing database size. In: Proceedings of Eurospeech 2003. Geneva, Switzerland, 2003: 2957-2960.
[13] Ling Z H, Hu Y, Shuang Z W, Wang R H. Compression of speech database by feature separation and pattern clustering using STRAIGHT. In: Proceeding of ICSLP2004. Jeju Island, Korea, 2004: 766-769.
[14] Bondy J A, Murty U S R. Graph Theory with Applications. New York: American Elsevier, 1976.
