Trimming CFG Parse Trees for Sentence Compression Using Machine Learning Approach
Yuya Unno, Takashi Ninomiya, Yusuke Miyao and Jun’ichi Tsujii
University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, Japan
Introduction
Sentence compression is one of the summarization tasks. We compress an input sentence and create a new, shorter grammatical sentence that preserves its meaning. We can only drop words.

Example: "Yesterday I went to Tokyo by train" → "I went to Tokyo"

Input: sentence $l$
Output: $\arg\max_s P(s \mid l)$

Knight and Marcu's noisy-channel model [1]
1. Parse the sentences in the training corpus.
2. Compare the corresponding nodes of the compressed and original parse trees, starting from the root nodes.
3. Estimate rewriting probabilities from the counts of the applied CFG rules.

We cannot learn some compression patterns using this previous method, because the two parse trees sometimes have different structures.

We revised this model in two points:
1. Maximum-entropy model
2. Bottom-up method
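Step 3 of Knight and Marcu's procedure above (estimating rewriting probabilities from rule counts) might look roughly like the following sketch. It assumes a plain relative-frequency estimate, which the poster does not spell out. The function name, the rule representation (a mother label plus a tuple of daughter labels), and the toy rule pairs are all hypothetical, not taken from the corpus.

```python
from collections import Counter


def estimate_rewrite_probs(rule_pairs):
    """Relative-frequency estimate of P_exp(r_l | r_s).

    rule_pairs: iterable of (r_s, r_l) tuples, where r_s is a CFG rule applied in
    the compressed tree and r_l the corresponding rule in the original tree.
    This estimator is an assumption made for illustration only.
    """
    rule_pairs = list(rule_pairs)  # allow generators; we iterate twice
    pair_counts = Counter(rule_pairs)
    source_counts = Counter(r_s for r_s, _ in rule_pairs)
    return {
        (r_s, r_l): count / source_counts[r_s]
        for (r_s, r_l), count in pair_counts.items()
    }


# Hypothetical toy data: a rule is (mother, daughters).
S_SHORT = ("S", ("NP", "VP"))
S_LONG = ("S", ("ADVP", "NP", "VP"))
toy_pairs = [
    (S_SHORT, S_LONG),   # the ADVP was dropped once
    (S_SHORT, S_SHORT),  # nothing was dropped three times
    (S_SHORT, S_SHORT),
    (S_SHORT, S_SHORT),
]

probs = estimate_rewrite_probs(toy_pairs)
print(probs[(S_SHORT, S_LONG)])
```

Because the estimate is keyed on whole rules, a rule pair never seen in training gets no probability at all, which is one reason to prefer a feature-based model.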
Knight and Marcu's noisy-channel model: probabilities only depend on mother and daughter nodes.

$$P(s \mid l) \propto P(l \mid s)\, P(s)$$

$$P(l \mid s) = \prod_{(r_l, r_s) \in R} P_{\mathrm{exp}}(r_l \mid r_s) \prod_{r \in R'} P_{\mathrm{cfg}}(r)$$

$P_{\mathrm{exp}}(r_l \mid r_s)$: probability of rewriting $r_s$ to $r_l$

Method 1. Maximum-Entropy Model

Rewriting probabilities only depend on mother and daughter nonterminals in Knight and Marcu's model. We can easily introduce various features to the maximum-entropy model, such as the depth from the root node and which words are removed.

Maximum entropy model: probabilities depend on various features of a parse tree.

$$P(s \mid l) = \prod_{(r_s, r_l) \in R} \frac{1}{Z} \exp\left(\sum_i \lambda_i f_i(r_s, r_l)\right)$$

Features: mother node; daughter nodes sequence; daughter terminals that are removed; depth from the root; left-most and right-most daughters; etc.

[Figure: original tree for "Yesterday I went to Tokyo" and compressed tree for "I went to Tokyo", with rules labeled r_l and r_s. Example feature values: 'S' is the root; the left-most daughter is 'ADVP'; 'Yesterday' is removed.]

Method 2. Bottom-Up Method

In the bottom-up method, we only parse the original sentence, and extract a tree from the original parse tree.

Algorithm
Select the nodes which dominate the compressed sentence.

[Figure: "The apple on the table is red" compressed to "The apple is red". Previous method (original tree and compressed tree): daughter nodes are not corresponding, since {DT, N} is not a subsequence of {NP, PP}. Bottom-up method (original tree and extracted tree): daughter nodes are corresponding.]

Experimental Result

We used the same corpus as Knight and Marcu. We evaluated the results using F-measure, BLEU score [2], and human judgment. Our method exceeds the previous method in all evaluation criteria. In particular, we obtained the highest scores using the maximum-entropy model with the bottom-up method.

Avg. length of original sentences: 23.8
Avg. length of compressed sentences: 12.5
Training set: 527 sentences
Development set: 263 sentences
Test set: 264 sentences

Results of N-gram based evaluation

Method                           F-measure   Bigram F-measure   BLEU score
Noisy-channel                    63.3        50.2               47.1
Maximum Entropy                  75.3        64.1               62.1
Maximum Entropy with Bottom-up   80.9        72.0               69.5

Results of human evaluation
Grammar: whether the output is grammatically correct
Importance: whether the important words remain

Method             Grammar   Importance
Human              4.94      4.31
Noisy-channel      3.81      3.38
Maximum Entropy    3.88      3.38
ME + Bottom-up     4.22      4.06
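The poster does not define how the F-measure and bigram F-measure are computed. As an illustration only, the sketch below assumes the harmonic mean of n-gram precision and recall between a system output and a reference compression, with clipped (multiset) counts. The function names and the example sentences are hypothetical choices, not the authors' evaluation script.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_f_measure(candidate, reference, n=1):
    """Harmonic mean of n-gram precision and recall (assumed definition)."""
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    overlap = sum((cand & ref).values())  # multiset intersection clips counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the system kept two words that the reference drops.
system_output = "I went to Tokyo by train".split()
reference = "I went to Tokyo".split()

unigram_f = ngram_f_measure(system_output, reference, n=1)
bigram_f = ngram_f_measure(system_output, reference, n=2)
print(round(unigram_f, 3), round(bigram_f, 3))
```

Here every reference word is kept, so recall is 1 and the score is lowered only by the extra words in the output.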
[1] K. Knight and D. Marcu. 2000. Statistics-Based Summarization - Step One: Sentence Compression. In Proc. of AAAI/IAAI'00.
[2] K. Papineni, S. Roukos, T. Ward, and W. Zhu. 2002. Bleu: a Method for Automatic Evaluation of Machine Translation. In Proc. of ACL'02.