Trimming CFG Parse Trees for Sentence Compression Using Machine Learning Approach

Yuya Unno, Takashi Ninomiya, Yusuke Miyao and Jun’ichi Tsujii
University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, Japan

Introduction

Sentence compression is a summarization task: given an input sentence, we produce a shorter, grammatical sentence that preserves its meaning.

Original: Yesterday I went to Tokyo by train
Compressed: I went to Tokyo

Method 2. Bottom-Up Method

We cannot learn some compression patterns using the previous method, because the two parse trees sometimes have different structures.

We can only drop words.

Input: sentence $l$
Output: $\arg\max_s P(s \mid l)$

Knight and Marcu’s noisy-channel model [1]
1. Parse sentences in the training corpus
2. Compare the corresponding nodes of compressed and original parse trees from the root nodes
3. Estimate rewriting probabilities using count of applied CFG rules

We revised this model in two points
1. Maximum-entropy model
2. Bottom-up method
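Step 3 of the noisy-channel procedure can be illustrated with a relative-frequency estimate. The sketch below is not taken from the poster. It assumes that the training data has already been reduced to a list of (compressed rule, original rule) pairs, and that $P_{exp}(r_l \mid r_s)$ is estimated as count($r_s$, $r_l$) / count($r_s$). The function name, the rule encoding, and the toy data are hypothetical.

```python
from collections import Counter

def estimate_rewriting_probs(rule_pairs):
    """Relative-frequency estimate of P_exp(r_l | r_s).

    rule_pairs: iterable of (r_s, r_l); each rule is a (mother, daughters) tuple.
    This estimator is an assumption, not the poster's stated procedure.
    """
    rule_pairs = list(rule_pairs)
    pair_counts = Counter(rule_pairs)
    short_counts = Counter(r_s for r_s, _ in rule_pairs)
    return {
        (r_s, r_l): count / short_counts[r_s]
        for (r_s, r_l), count in pair_counts.items()
    }

# Toy data (hypothetical): the compressed rule S -> NP VP is observed three times.
np_vp = ("S", ("NP", "VP"))
advp_np_vp = ("S", ("ADVP", "NP", "VP"))
pairs = [(np_vp, advp_np_vp), (np_vp, advp_np_vp), (np_vp, np_vp)]

probs = estimate_rewriting_probs(pairs)
print(probs[(np_vp, advp_np_vp)])
```

Because the estimate is a plain ratio of counts, it can only see the mother and daughter labels of each rule, which is the limitation the maximum-entropy revision addresses.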

[Figure annotations: left-most daughter ‘ADVP’; ‘Yesterday’ is removed]

$$P(s \mid l) = \prod_{(r_s, r_l) \in R} \frac{1}{Z} \exp\Big( \sum_i \lambda_i f_i(r_s, r_l) \Big)$$
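The exponent of the maximum-entropy formula is a weighted sum of feature values over the rule pairs in $R$. The sketch below computes only that unnormalized log score and leaves out the normalizer $Z$. The feature templates (mother, depth, left-most and right-most daughter) follow the feature names on the poster, but their encoding as strings, the weights, and the function names are assumptions made for illustration.

```python
import math

def rule_features(r_s, r_l, depth):
    """Binary features for one (compressed rule, original rule) pair.

    Rules are (mother, daughters) tuples; the feature names are hypothetical.
    r_s is accepted for symmetry with f_i(r_s, r_l) but unused by these templates.
    """
    mother, daughters = r_l
    return {
        f"mother={mother}": 1.0,
        f"depth={depth}": 1.0,
        f"leftmost={daughters[0]}": 1.0,
        f"rightmost={daughters[-1]}": 1.0,
    }

def log_score(rule_pairs, weights):
    """Sum of lambda_i * f_i(r_s, r_l) over all rule pairs (no normalization)."""
    total = 0.0
    for r_s, r_l, depth in rule_pairs:
        for name, value in rule_features(r_s, r_l, depth).items():
            total += weights.get(name, 0.0) * value
    return total

# Toy weights (hypothetical values, not learned from any corpus).
weights = {"mother=S": 0.5, "leftmost=ADVP": 1.0}
pairs = [(("S", ("NP", "VP")), ("S", ("ADVP", "NP", "VP")), 0)]

score = log_score(pairs, weights)
print(score, math.exp(score))
```

Adding a new feature only requires adding a key to the dictionary returned by `rule_features`, which mirrors the flexibility that motivates the maximum-entropy model.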


Avg. length of original sentences: 23.8
Avg. length of compressed sentences: 12.5
Training set: 527 sentences
Development set: 263 sentences
Test set: 264 sentences

Method                          F-measure  Bigram F-measure  BLEU score
Noisy-channel                   63.3       50.2              47.1
Maximum Entropy                 75.3       64.1              62.1
Maximum Entropy with Bottom-up  80.9       72.0              69.5

Select the nodes which dominate the compressed sentence

Daughter nodes correspond

Probabilities depend on various features of a parse tree:
• Mother node
• Daughter nodes sequence
• Daughter terminals that are removed

[Figure: compressed tree and original tree; annotation: ‘S’ is the root]

We can easily introduce various features to the maximum-entropy model, such as the depth from the root node and which words are removed.

Experimental Results

In the noisy-channel model:

$$P(s \mid l) \propto P(l \mid s)\, P(s)$$

$$P(l \mid s) = \prod_{(r_l, r_s) \in R} P_{exp}(r_l \mid r_s) \prod_{r \in R'} P_{cfg}(r)$$

$P_{exp}(r_l \mid r_s)$: probability of rewriting $r_s$ to $r_l$

Probabilities only depend on mother and daughter nodes


{DT, N} is not a subsequence of {NP, PP}
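The correspondence condition can be checked mechanically: a compressed rule matches an original rule only if its daughter sequence can be obtained by deleting daughters from the original one. The sketch below is a minimal illustration with a hypothetical function name. The first example pair is chosen for illustration, and the second is the case shown on the poster.

```python
def is_subsequence(short, long):
    """Return True if `short` can be obtained from `long` by deleting items."""
    remaining = iter(long)
    # `label in remaining` advances the iterator, so order is respected.
    return all(label in remaining for label in short)

print(is_subsequence(["NP", "VP"], ["ADVP", "NP", "VP"]))  # True
print(is_subsequence(["DT", "N"], ["NP", "PP"]))           # False
```

When the check fails, the root-down comparison has no rule pair to count, so that compression pattern cannot be learned.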

Bottom-up method

Rewriting probabilities only depend on mother and daughter nonterminals in Knight and Marcu’s model.


In the bottom-up method, we parse only the original sentence and extract the compressed tree from the original parse tree.
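One possible reading of this extraction step is sketched below: keep every node of the original parse tree that dominates at least one word of the compressed sentence, and drop the rest. This is an illustration under stated assumptions, not the authors' implementation. The tree encoding as `(label, children)` tuples, the greedy left-to-right word alignment, the node labels, and both function names are hypothetical.

```python
def align(original, compressed):
    """Greedy left-to-right alignment; returns indices of the kept original words.

    Assumption: the compressed sentence is formed only by dropping words.
    Repeated words may be aligned differently from what an annotator intended.
    """
    kept, j = set(), 0
    for i, word in enumerate(original):
        if j < len(compressed) and word == compressed[j]:
            kept.add(i)
            j += 1
    if j != len(compressed):
        raise ValueError("compressed sentence is not a subsequence of the original")
    return kept

def extract(tree, kept, start=0):
    """Return (pruned tree or None, index of the next word after this subtree)."""
    label, children = tree
    new_children, pos = [], start
    for child in children:
        if isinstance(child, str):          # terminal word
            if pos in kept:
                new_children.append(child)
            pos += 1
        else:                               # nonterminal subtree
            subtree, pos = extract(child, kept, pos)
            if subtree is not None:
                new_children.append(subtree)
    return ((label, new_children) if new_children else None), pos

original_tree = ("S", [
    ("ADVP", ["Yesterday"]),
    ("NP", ["I"]),
    ("VP", [("V", ["went"]), ("PP", ["to", "Tokyo"])]),
])
kept = align("Yesterday I went to Tokyo".split(), "I went to Tokyo".split())
extracted_tree, _ = extract(original_tree, kept)
print(extracted_tree)
```

Because the extracted tree is cut out of the original tree, every rule in it has daughters that correspond to daughters of an original rule.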

Method 1. Maximum-Entropy Model


Algorithm

Knight and Marcu’s noisy-channel model

Daughter nodes do not correspond

Results of N-gram based evaluation (Noisy-channel, Maximum Entropy, Maximum Entropy with Bottom-up)

                 Grammar  Importance
Human            4.94     4.31
Noisy-channel    3.81     3.38
Maximum Entropy  3.88     3.38
ME + Bottom-up   4.22     4.06

Features also include the depth from the root, the left-most and right-most daughters, etc.

We used the same corpus as Knight and Marcu [1]. We evaluated the results using F-measure, BLEU score [2], and human judgment. Our method outperforms the previous method on all evaluation criteria. In particular, the maximum-entropy model with the bottom-up method obtained the highest scores.
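The poster does not define how the F-measure and bigram F-measure are computed. A common choice, assumed here, is the harmonic mean of n-gram precision and recall between the system output and the reference compression. The function names and the example sentences are for illustration only.

```python
from collections import Counter

def ngrams(words, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def f_measure(system, reference, n=1):
    """Harmonic mean of n-gram precision and recall (assumed definition)."""
    sys_ngrams, ref_ngrams = ngrams(system, n), ngrams(reference, n)
    overlap = sum((sys_ngrams & ref_ngrams).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(sys_ngrams.values())
    recall = overlap / sum(ref_ngrams.values())
    return 2 * precision * recall / (precision + recall)

system = "I went to Tokyo".split()
reference = "Yesterday I went to Tokyo".split()
unigram_f = f_measure(system, reference, n=1)
bigram_f = f_measure(system, reference, n=2)
print(round(unigram_f, 3), round(bigram_f, 3))
```

The scores in the evaluation are reported on a 0 to 100 scale, so values from this sketch would need to be multiplied by 100 to be comparable.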

Results of human evaluation
• Grammar: whether the output is grammatically correct
• Importance: whether the important words remain

[1] K. Knight and D. Marcu. 2000. Statistics-Based Summarization - Step One: Sentence Compression. In Proc. of AAAI/IAAI'00.
[2] K. Papineni, S. Roukos, T. Ward, and W. Zhu. 2002. Bleu: a Method for Automatic Evaluation of Machine Translation. In Proc. of ACL'02.
