Discriminative Pruning for Discriminative ITG Alignment
Shujie Liu, Chi-Ho Li and Ming Zhou

Outline
• Introduction
• Discriminative Model for Pruning
  • Training Sample Extraction
  • Training: MERT
  • Features
• Experiments and Analysis


Alignment and ITG
• Alignment problem: finding translation pairs in bitext sentences, e.g. between the Chinese words 向 / 财政 / 负责 and the English words "be accountable to the Financial Secretary".
• ITG (Wu, 1997) parsing does synchronous parsing of the two languages; word alignment is the by-product.
• Lexical rules: C → ei/fi, C → Ɛ/fi, C → ei/Ɛ (Ɛ is the empty word).
• Structural rules: in the simplest form, X → [XX] (straight) and X → ⟨XX⟩ (inverted); the normal-form grammar used on the slides is
  A → [AB] | [BB] | [CB] | [AC] | [BC] | [CC]
  B → ⟨AB⟩ | ⟨BB⟩ | ⟨CB⟩ | ⟨AC⟩ | ⟨BC⟩ | ⟨CC⟩
  S → A | B | C
[Figure: the word-alignment grid for the two sentences, with the ITG parse tree built over the aligned spans; in the derivation some words, e.g. "the", pair with the empty word Ɛ.]
(A minimal encoding of this grammar is sketched below.)
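To make the rule inventory concrete, here is a minimal sketch (my own encoding, not from the paper; the tuple representation is a hypothetical choice for illustration) of the normal-form grammar above as plain Python data:

```python
# Normal-form ITG grammar from the slide, encoded as Python data.
# Lexical rules: C -> e/f, C -> eps/f, C -> e/eps (eps = the empty word).
# Straight rules combine the two children in the same order in both
# languages; inverted rules reverse the order on the target side.

CHILD_PAIRS = [("A", "B"), ("B", "B"), ("C", "B"),
               ("A", "C"), ("B", "C"), ("C", "C")]

STRAIGHT_RULES = [("A", left, right) for left, right in CHILD_PAIRS]  # A -> [XY]
INVERTED_RULES = [("B", left, right) for left, right in CHILD_PAIRS]  # B -> <XY>
START_RULES = [("S", "A"), ("S", "B"), ("S", "C")]                    # S -> A|B|C
```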

Why Pruning
• ITG has achieved state-of-the-art results against gold-standard alignments (Haghighi et al., 2009).
• Speed is a major obstacle in ITG parsing: for each F-span [s, t] (O(n²) of them) and each E-span [u, v] (O(n²) of them), we try to find an optimal split-point pair [S, U] (an O(n²) search) that splits the span pair [s/u, t/v] into two smaller span pairs, [s/u, S/U] and [S/U, t/v].
• The complexity of ITG parsing without pruning is therefore O(n⁶); it takes more than one hour to parse a sentence pair longer than 60 words.
• Pruning is necessary. (The parsing loop is sketched below.)
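The loop structure behind the O(n⁶) bound can be sketched as follows. This is my own illustration, not the authors' implementation; the names itg_parse, leaf_score and keep are invented, and empty-word rules and k-best bookkeeping are omitted for brevity. The keep hook is where a pruning method such as DPDI would discard unpromising span pairs before the O(n²) split search runs.

```python
def itg_parse(n_f, n_e, leaf_score, keep=lambda span_pair: True):
    """Viterbi ITG parsing over span pairs [s,t]/[u,v]; O(n^6) unpruned."""
    chart = {}  # (s, t, u, v) -> best score of the span pair
    # initialize 1x1 span pairs from the lexical rules
    for s in range(n_f):
        for u in range(n_e):
            chart[(s, s + 1, u, u + 1)] = leaf_score(s, u)
    for f_len in range(1, n_f + 1):          # O(n) F-span lengths
        for e_len in range(1, n_e + 1):      # O(n) E-span lengths
            for s in range(n_f - f_len + 1):
                t = s + f_len
                for u in range(n_e - e_len + 1):
                    v = u + e_len
                    if not keep((s, t, u, v)):    # pruning hook (e.g. DPDI)
                        continue
                    best = chart.get((s, t, u, v), float("-inf"))
                    for S in range(s + 1, t):     # O(n^2) split-point pairs
                        for U in range(u + 1, v):
                            # straight: [s,t]/[u,v] = [s,S]/[u,U] + [S,t]/[U,v]
                            a = chart.get((s, S, u, U))
                            b = chart.get((S, t, U, v))
                            if a is not None and b is not None:
                                best = max(best, a + b)
                            # inverted: the E-side halves are swapped
                            a = chart.get((s, S, U, v))
                            b = chart.get((S, t, u, U))
                            if a is not None and b is not None:
                                best = max(best, a + b)
                    chart[(s, t, u, v)] = best
    return chart.get((0, n_f, 0, n_e))
```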

Three Kinds of Pruning
• Discard F-spans and/or E-spans: this discards too many span pairs and is (empirically) highly harmful to alignment performance.
• Discard some alignments for a span pair: this amounts to minimizing the beam size of each span pair, i.e. k-best parsing.
• Discard some unpromising span pairs, i.e. limit the E-spans considered per F-span: this is what our research is about.

Related Work
• Tic-tac-toe pruning (Zhang and Gildea, 2005): uses inside and outside scores to prune candidate E-spans for each F-span.
• Tree-constraint pruning (Cherry and Lin, 2006): invalid spans are spans that interrupt the phrases of a dependency tree, i.e. [x1, j] and [j, x2].
• High-precision alignment pruning (Haghighi et al., 2009): prune all bitext cells that would invalidate more than 8 of the high-precision alignment links.
• 1-1 alignment posterior pruning (Haghighi et al., 2009): prune all 1-1 bitext cells with a posterior below 10⁻⁴ in both HMM models.


Linear Model
• As all these techniques contribute to making good pruning decisions, we incorporate them all as features in ITG pruning.
• DPDI: Discriminative Pruning for Discriminative ITG.
• The pruning model is a log-linear model over candidate E-spans:

  P(e | f) = exp(λ · Ψ(e, f)) / Σ_{e' ∈ T} exp(λ · Ψ(e', f))

  where λ are the feature weights and Ψ the features. (A sketch of applying the model follows below.)
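As a sketch of how the model is applied, the following code (helper names are my own; psi and lam stand for Ψ and λ) scores every candidate E-span for an F-span and normalizes over the candidate set T. Note that for pruning only the ranking matters, so the normalization can be skipped at parse time.

```python
import math

def espan_probs(f_span, candidates, psi, lam):
    """psi(f_span, e_span) -> feature vector; lam -> feature weights.

    Returns P(e|f) for every candidate E-span under the log-linear model."""
    scores = [sum(w * x for w, x in zip(lam, psi(f_span, e)))
              for e in candidates]
    z = max(scores)                       # subtract max to stabilize exp()
    exps = [math.exp(s - z) for s in scores]
    total = sum(exps)
    return [v / total for v in exps]
```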


Training Sample Extraction
• Training samples consist of various F-spans and their corresponding E-spans, extracted from the word alignment annotation.
• Example: for the sentence pair 书 就会 来 的 / "the book is to come", the extracted samples include:
  书 → the book
  就会 → is to
  书 就会 → the book is to
  书 就会 来 → the book is to come
  书 就会 来 的 → the book is to come
  ...
• ITG constraints are applied to the annotated alignment before extraction.
[Figure: the alignment grid for 书 就会 来 的 / "the book is to come", before and after applying the ITG constraints; in the ITG derivation, "the" and "is" pair with the empty word (Ɛ/the, 书/book, Ɛ/is, 就会/to, 来/come, 的/Ɛ).]
(A sketch of the extraction follows below.)
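A minimal sketch of the extraction idea, under my own simplifying assumptions (not the authors' tool): for each F-span, take the minimal E-span covering all the words it links to, and keep the pair only if it is consistent with the gold alignment. The slide's samples additionally extend spans over unaligned words (e.g. "the", "is"), which this sketch omits.

```python
def extract_samples(n_f, n_e, links):
    """links: set of (f_index, e_index) gold alignment links (0-based)."""
    samples = []
    for s in range(n_f):
        for t in range(s + 1, n_f + 1):          # F-span [s, t)
            es = [e for f, e in links if s <= f < t]
            if not es:
                continue
            u, v = min(es), max(es) + 1          # minimal covering E-span
            # consistency: no word in E-span [u, v) links outside [s, t)
            if all(s <= f < t for f, e in links if u <= e < v):
                samples.append(((s, t), (u, v)))
    return samples

# e.g. with links {(0, 1), (1, 3), (2, 4)} for 书 就会 来 的 / "the book is
# to come", this yields ((0, 1), (1, 2)) for 书/book; extending over the
# unaligned "the" would give the slide's sample 书 / "the book".
```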


Loss Function

  loss(rs, ê(fs; λ)) = -rank(rs)    if rs ∈ ê(fs; λ)
                       -100,000     otherwise (the penalty)

• fs: an F-span; rs: its correct E-span.
• ê(fs; λ): the N-best list of candidate E-spans given fs and the feature weights λ = λ1 … λM.
• rank(rs): the rank of rs in the N-best list.
• penalty: if rs is not in the N-best list at all, the loss is defined to be the penalty, -100,000.
• Rationale: keep as many correct E-spans as possible in the N-best lists, and push the correct E-spans upward as much as possible.
[Figure: two 10-best lists of E-spans for an F-span. When the correct E-span [1,3] ranks first in the list, the loss is -1; when the correct E-span (e.g. [1,10]) is not in the list at all, the loss is -100,000.]
(A sketch of this loss follows below.)
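The loss is straightforward to compute given an N-best list; a minimal sketch with 1-based ranks, as on the slide:

```python
PENALTY = -100_000   # loss when the correct E-span misses the N-best list

def span_loss(correct_espan, nbest):
    """nbest: candidate E-spans ordered best-first; rank is 1-based."""
    try:
        return -(nbest.index(correct_espan) + 1)   # -rank(r_s)
    except ValueError:
        return PENALTY

# e.g. span_loss((1, 3), [(1, 3), (1, 2), (2, 3)]) == -1
#      span_loss((1, 10), [(1, 7), (1, 8), (1, 9)]) == -100000
```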

MERT: Minimum Error Rate Training
• The training method is very similar to MERT for SMT.
• An important part of MERT for SMT is a line search: finding the best value of one weight while the others are held fixed. As that weight varies, the best-scoring candidate changes; the successive best candidates form the upper envelope, and the points where the best candidate changes are the interval boundaries. Finding the interval boundaries is the key step in normal MERT.
[Figure: normal MERT line search, with the upper envelope as a red curve and the interval boundaries as green points.]
• Difference: instead of finding the interval boundaries at which the optimal candidate changes, we find the interval boundaries at which the index of the correct result changes. These boundaries are the intersections between the line of the correct (golden) E-span and the lines of all other candidate E-spans.
• The performance gain at each boundary is the difference between the loss before and after the boundary: with N = 10, the loss changes by ±1 where the correct E-span gains or loses one rank, and by ±99,991 (between -9 and -100,000 in the slide's example) where it enters or leaves the N-best list.
[Figure: modified MERT line search, with the correct E-span's line intersecting the other candidates' lines at the interval boundaries (red points).]
(A sketch of this boundary computation follows below.)
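A minimal sketch of the modified line-search step, assuming the standard MERT setup in which each candidate's score along the search direction is a line a + b·w (the helper name rank_boundaries is invented): the weights at which the correct E-span's rank can change are exactly the intersections of its line with the other candidates' lines.

```python
def rank_boundaries(correct, others):
    """correct: (a, b) line of the correct E-span along the search direction;
    others: [(a, b)] lines of the other candidate E-spans.

    Returns the sorted weights where the correct span's rank may change."""
    a0, b0 = correct
    ws = []
    for a, b in others:
        if b != b0:                        # parallel lines never intersect
            ws.append((a - a0) / (b0 - b))
    return sorted(ws)

# Between consecutive boundaries the total loss over all training F-spans
# is constant, so as in normal MERT we evaluate the loss at one point
# (e.g. the midpoint) of each interval and keep the best interval.
```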


Features
Features for the pruning model, for an F-span [i, j] and an E-span [l, m]:
• Inside probability (tic-tac-toe)
• Outside probability (tic-tac-toe)
• Alignment count ratio (similar to Haghighi's):
  2 * Count(links in this span pair) / (j - i + m - l)
• Alignment invalid count ratio (similar to Haghighi's):
  2 * Count(links linked to outside) / (j - i + m - l)
• Length ratio:
  |(j - i) / (m - l) - 1.15|
  (the ratio of span lengths should be close to 1.15, the average ratio of sentence lengths)
• Position ratio:
  |(j + i) / (2 * length(src sent)) - (l + m) / (2 * length(trg sent))|
  (monotonic assumption: Position(F-span) ≈ Position(E-span))
(A sketch computing these features follows below.)
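A sketch of the feature computation for one span pair, using half-open spans [i, j) and [l, m) and treating the inside and outside probabilities as given (in the slides' setup they come from the tic-tac-toe dynamic program); the function and argument names are my own.

```python
def pruning_features(i, j, l, m, links, src_len, trg_len, inside, outside):
    """links: set of (f_index, e_index) links; spans assumed non-empty."""
    in_links = [(f, e) for f, e in links if i <= f < j and l <= e < m]
    # "invalid" links: one end inside the span pair, the other outside
    out_links = [(f, e) for f, e in links
                 if (i <= f < j) != (l <= e < m)]
    size = (j - i) + (m - l)
    return {
        "inside": inside,                                  # tic-tac-toe
        "outside": outside,                                # tic-tac-toe
        "align_count": 2 * len(in_links) / size,
        "align_invalid": 2 * len(out_links) / size,
        "length_ratio": abs((j - i) / (m - l) - 1.15),
        "position_ratio": abs((i + j) / (2 * src_len)
                              - (l + m) / (2 * trg_len)),
    }
```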


Small-Scale Alignment Evaluation
• The first set of experiments evaluates the three pruning methods on the Berkeley annotated data: the first 250 sentence pairs are used as training data and the remaining 241 pairs as test data.
• The corresponding numbers of E-spans in the training and test data are 4,590 and 3,951 respectively.
• Two ITG models are used: W-DITG and HP-DITG.
• The F-score upper bound, the actual F-score and the time cost are compared.

Small-Scale Alignment Evaluation

ID  Pruning  Beam size  Pruning / total time  F-score upper bound  F-score
1   DPDI     10         72''/3'03''           88.5%                82.5%
2   TTT      10         58''/2'38''           87.5%                81.1%
3   TTT      20         53''/6'55''           88.6%                82.4%
4   DP       --         11''/6'01''           86.1%                80.5%

Table 1: Evaluation of DPDI against TTT (tic-tac-toe) and DP (dynamic program) for W-DITG.

• With the same beam size, DPDI spends a bit more time, but its F-score upper bound is one point higher.
• DPDI achieves an even larger improvement in actual F-score.

Small-Scale Alignment Evaluation

ID  Pruning  Beam size  Pruning / total time  F-score upper bound  F-score
1   DPDI     10         72''/5'18''           93.9%                87.0%
2   TTT      10         58''/4'51''           93.0%                84.8%
3   TTT      20         53''/12'5''           94.0%                86.5%
4   DP       --         11''/15'39''          91.4%                83.6%

Table 2: Evaluation of DPDI against TTT (tic-tac-toe) and DP (dynamic program) for HP-DITG.

• Roughly the same observations as for W-DITG can be made.
• In addition to the superiority of DPDI, note that HP-DITG achieves a much higher F-score and F-score upper bound (for details, see our COLING 2010 paper).

Large-Scale End-to-End Experiment
Machine translation evaluation:
• Bilingual training data: the NIST training set, excluding the Hong Kong Law and Hong Kong Hansard corpora.
• Language model: a 5-gram language model trained on the Xinhua section of the Gigaword corpus.
• Development corpus: the NIST'03 test set.
• Test corpora: the NIST'05 and NIST'08 test sets.

Large-Scale End-to-End Experiment

ID  Pruning  Beam size  Time cost  BLEU-05  BLEU-08
1   DPDI     10         1092h      38.57    28.31
2   TTT      10         972h       37.96    27.37
3   TTT      20         2376h      38.13    27.58
4   DP       --         2068h      37.43    27.12

Table 3: Evaluation of DPDI against TTT and DP for HP-DITG.

• HP-DITG using DPDI achieves the best BLEU score with acceptable time cost.
• An explanation of the better performance of HP-DITG is the better phrase-pair extraction due to DPDI: good ITG pruning like DPDI guides the subsequent ITG alignment process so that fewer links inconsistent with good phrase pairs are produced.

Large-Scale End-to-End Experiment

ID  Aligner   F-score  BLEU-05  BLEU-08
1   HMM       80.1%    36.91    26.86
2   Giza++    84.2%    37.70    27.33
3   BITG      85.9%    37.92    27.85
4   W-DITG    82.5%    --       --
5   HP-DITG   87.0%    38.57    28.31

Table 4: Evaluation of DPDI against HMM, Giza++ and BITG.

• W-DITG is not as good as HMM, Giza++ and BITG, since it suffers from the 1-to-1 alignment constraint.
• HP-DITG (with DPDI) is better than the three baselines in both alignment F-score and BLEU score.

Summary
• A discriminative pruning method (DPDI) is proposed, which uses minimum error rate training and various features.
• DPDI is an effective way to reduce the number of bitext cells in bilingual parsing.
• DPDI improves not only alignment performance but also SMT performance.

Thanks



Training Sample Extraction
• The annotated data for training: we use the phrase pairs extracted from the gold-alignment sentence pairs as annotated data.
• Example: an ITG derivation for the span pair [e1,e3]/[f1,f2], whose possible link sets are {e1/f1, e3/f2} and {e2/f1, e3/f2}:
  A: [e1,e3]/[f1,f2], A → [C, C]
    C: [e1,e2]/[f1], link sets {e1/f1} or {e2/f1}
      C → [Cw, Ce]: Cw: e1/f1, Ce: e2/Ɛ → {e1/f1}
      C → [Ce, Cw]: Ce: e1/Ɛ, Cw: e2/f1 → {e2/f1}
    C: [e2,e3]/[f2], link set {e3/f2}
      C → [Ce, Cw]: Ce: e2/Ɛ, Cw: e3/f2 → {e3/f2}

Evaluation Criteria
• The upper bound on the alignment F-score: how many links of the annotated alignment can be kept in an ITG parse.

  hit(Cw[u,v]) = 1 if ⟨u,v⟩ ∈ R, 0 otherwise
  hit(Ce) = 0,  hit(Cf) = 0
  hit(X[f,e]) = max over Y, Z, f1, e1, f2, e2 of ( hit(Y[f1,e1]) + hit(Z[f2,e2]) )

  where X, Y, Z are variables for the categories of the ITG grammar, and R comprises the golden links of the annotated alignment.

• Worked example (gold links e1/f1, e2/f1, e3/f2):
  A: [e1,e3]/[f1,f2], hit = max{1+1, 1+1} = 2 (A → [C, C])
  C: [e1,e2]/[f1], hit = max{0+1} = 1 (Cw: e1/f1 hit 1, Ce: e2/Ɛ hit 0; or Ce: e1/Ɛ hit 0, Cw: e2/f1 hit 1)
  C: [e2,e3]/[f2], hit = max{0+1} = 1 (Ce: e2/Ɛ hit 0, Cw: e3/f2 hit 1)
(A sketch of this computation follows below.)
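A minimal sketch (my own dynamic program, following the hit() recursion above, not the authors' code) that computes the numerator of the F-score upper bound: the maximum number of golden links any ITG parse can keep for a sentence pair.

```python
from functools import lru_cache

def max_hits(n_f, n_e, gold):
    """gold: set of (f_index, e_index) golden links R (0-based)."""
    @lru_cache(maxsize=None)
    def hit(s, t, u, v):
        if t - s == 0 or v - u == 0:       # empty-word node (Ce / Cf)
            return 0
        if t - s == 1 and v - u == 1:      # lexical node Cw: e/f
            return 1 if (s, u) in gold else 0
        best = 0
        for S in range(s, t + 1):
            for U in range(u, v + 1):
                # straight split; skip degenerate splits reproducing the parent
                if (S, U) != (s, u) and (S, U) != (t, v):
                    best = max(best, hit(s, S, u, U) + hit(S, t, U, v))
                # inverted split; same guard against degenerate splits
                if (S, U) != (s, v) and (S, U) != (t, u):
                    best = max(best, hit(s, S, U, v) + hit(S, t, u, U))
        return best
    return hit(0, n_f, 0, n_e)

# For the slide's example (gold links e1/f1, e2/f1, e3/f2, 0-based):
# max_hits(3, 2, {(0, 0), (1, 0), (2, 1)}) == 2, matching hit = 2 above.
```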

Features
Worked example (span pair as marked in the slide's 4×4 alignment grid, with one link inside the span pair and two links leaving it):
• Inside probability and outside probability: computed by the tic-tac-toe dynamic program.
• Length ratio: |(j - i) / (m - l) - 1.15| = |(3 - 1) / (4 - 2) - 1.15| = |-0.15| = 0.15
• Position ratio: |(j + i) / (2 * length(src sent)) - (l + m) / (2 * length(trg sent))| = |4/(2*4) - 5/(2*4)| = 0.125
• Alignment count ratio: 2 * Count(links in this span pair) / (j - i + m - l) = 2 * 1 / 4 = 0.5
• Alignment invalid count ratio: 2 * Count(links linked to outside) / (j - i + m - l) = 2 * 2 / 4 = 1
