Information Extraction Using the Structured Language Model
Ciprian Chelba, Milind Mahajan

• Information Extraction from Text
• Structured Language Model (SLM)
• SLM for Information Extraction
• Experiments and Error Analysis
• Conclusions and Future Directions

Microsoft Research Speech.Net

Information Extraction from Text

• Data-driven approach with minimal annotation effort: clearly identifiable semantic slots and frames

• Information extraction viewed as the recovery of a two-level semantic parse S for a given word sequence W

• Sentence independence assumption: the sentence W is sufficient for identifying the semantic parse S

[Figure: two-level semantic parse of the sentence "Schedule meeting with Megan Hokins about internal lecture at two thirty p.m." Frame level: Calendar Task. Slot level: Person, Subject, Time.]
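The two-level parse in the example can be pictured as a small data structure. The following is a minimal Python sketch, not the authors' representation: the class names `Slot` and `SemanticParse`, the word-index convention, and the exact word spans assigned to each slot are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Slot:
    label: str   # slot-level semantic tag, e.g. "Person"
    start: int   # index of the first word in the slot (inclusive)
    end: int     # index of the last word in the slot (inclusive)


@dataclass
class SemanticParse:
    frame: str          # frame-level tag covering the whole sentence
    slots: List[Slot]   # slot-level constituents inside the frame


def slot_fillers(parse: SemanticParse, words: List[str]) -> Dict[str, str]:
    """Map each slot label to the words it spans."""
    return {s.label: " ".join(words[s.start:s.end + 1]) for s in parse.slots}


words = ("Schedule meeting with Megan Hokins about internal lecture "
         "at two thirty p.m.").split()

# Assumed spans: the slide does not show the exact slot boundaries.
parse = SemanticParse(
    frame="Calendar Task",
    slots=[Slot("Person", 3, 4), Slot("Subject", 6, 7), Slot("Time", 9, 11)],
)
fillers = slot_fillers(parse, words)
```

With these assumed spans, `fillers` maps Person, Subject, and Time to the corresponding word strings of the sentence W.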

Syntactic Parsing Using the Structured Language Model

• Generalize trigram modeling (local) by taking advantage of sentence structure (influence by more distant past)

• Develop a hidden syntactic structure assignment $T_i$, with headwords, for a given word prefix $W_i$

• Assign a probability $P(W_i, T_i)$

[Figure: partial parse with headword-annotated constituents for the word prefix "the contract ended with a loss of 7 cents", followed by the next word "after". Tagged words: the_DT contract_NN ended_VBD with_IN a_DT loss_NN of_IN 7_CD cents_NNS. Constituents: contract_NP, ended_VP’, with_PP, loss_NP, of_PP, loss_NP, cents_NP.]

[Figure: the partial parse of "the contract ended with a loss of 7 cents" (constituents contract_NP, ended_VP’, with_PP, loss_NP, of_PP, loss_NP, cents_NP), annotated with the sequence of model actions: ...; null; predict cents; POStag cents; adjoin-right-NP; adjoin-left-PP; ...; adjoin-left-VP’; null; ...]


[Figure: the SLM drawn as three modules, PREDICTOR, TAGGER, and PARSER, over the same partial parse and action sequence. Transition labels: predict word, tag word, adjoin_{left,right}, null.]
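The effect of the parser moves named in the action sequence can be traced on a stack of exposed heads. This is a rough Python sketch under stated assumptions: the `(headword, tag)` tuple representation, the function names, and the starting stack are illustrative and read off the figure, and headword percolation is assumed to follow the move name (adjoin-left keeps the left child's headword, adjoin-right keeps the right child's). Only the moves listed explicitly in the figure are replayed; the elided ones are left out.

```python
def adjoin(stack, direction, nt_label):
    """Replace the two rightmost exposed heads with one constituent.

    The new constituent takes its headword from the left child for
    adjoin-left and from the right child for adjoin-right.
    """
    right = stack.pop()
    left = stack.pop()
    head_word = left[0] if direction == "left" else right[0]
    stack.append((head_word, nt_label))


def apply_moves(stack, moves):
    """Apply predictor/tagger/parser actions to a stack of exposed heads."""
    for move in moves:
        kind = move[0]
        if kind == "word":        # predict the word, then attach its POS tag
            stack.append((move[1], move[2]))
        elif kind == "adjoin":    # (adjoin-left, NTtag) or (adjoin-right, NTtag)
            adjoin(stack, move[1], move[2])
        elif kind == "null":      # parser stops; control returns to the predictor
            continue
        else:
            raise ValueError(f"unknown move: {move!r}")
    return stack


# Assumed state just before "cents" is predicted (read off the figure).
stack = [("contract", "NP"), ("ended", "VBD"), ("with", "IN"),
         ("loss", "NP"), ("of", "IN"), ("7", "CD")]

# predict cents; POStag cents; adjoin-right-NP; adjoin-left-PP
moves = [("word", "cents", "NNS"),
         ("adjoin", "right", "NP"),
         ("adjoin", "left", "PP")]
stack = apply_moves(stack, moves)
```

After these moves the two rightmost exposed heads are loss_NP and of_PP, which matches the constituents shown in the figure.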

Word and Structure Generation

$$P(T_{n+1}, W_{n+1}) = \prod_{i=1}^{n+1} \underbrace{P(w_i \mid h_{-2}, h_{-1})}_{\text{predictor}} \cdot \underbrace{P(g_i \mid w_i, h_{-1}.\text{tag}, h_{-2}.\text{tag})}_{\text{tagger}} \cdot \underbrace{P(T_i \mid w_i, g_i, T_{i-1})}_{\text{parser}}$$

• The predictor generates the next word $w_i$ with probability $P(w_i = v \mid h_{-2}, h_{-1})$
• The tagger attaches tag $g_i$ to the most recently generated word $w_i$ with probability $P(g_i \mid w_i, h_{-1}.\text{tag}, h_{-2}.\text{tag})$
• The parser builds the partial parse $T_i$ from $T_{i-1}$, $w_i$, and $g_i$ in a series of moves ending with null, where a parser move $a$ is made with probability $P(a \mid h_{-2}, h_{-1})$; $a \in \{(\text{adjoin-left}, \text{NTtag}), (\text{adjoin-right}, \text{NTtag}), \text{null}\}$
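The factorization into predictor, tagger, and parser terms can be illustrated with a few lines of Python. This is only a sketch: the lookup-table layout, the key order, the function name `joint_log_prob`, and every probability value below are made up for illustration, and a real model would use smoothed estimates rather than bare dictionaries.

```python
import math


def joint_log_prob(derivation, predictor, tagger, parser):
    """Sum the log-probabilities of all predictor, tagger, and parser events."""
    total = 0.0
    for step in derivation:
        h2, h1 = step["heads"]            # exposed heads (word, tag) before w_i
        w, g = step["word"], step["tag"]
        total += math.log(predictor[(w, h2, h1)])         # P(w_i | h_-2, h_-1)
        total += math.log(tagger[(g, w, h1[1], h2[1])])   # P(g_i | w_i, h_-1.tag, h_-2.tag)
        for move, (m2, m1) in step["moves"]:              # P(a | h_-2, h_-1), ending with null
            total += math.log(parser[(move, m2, m1)])
    return total


# One toy step: predict "cents", tag it NNS, adjoin-right-NP, then null.
of_in, seven, cents, cents_np = ("of", "IN"), ("7", "CD"), ("cents", "NNS"), ("cents", "NP")
derivation = [{
    "heads": (of_in, seven),
    "word": "cents",
    "tag": "NNS",
    "moves": [(("adjoin-right", "NP"), (seven, cents)),
              (("null",), (of_in, cents_np))],
}]

# Hypothetical component probabilities.
predictor = {("cents", of_in, seven): 0.5}
tagger = {("NNS", "cents", "CD", "IN"): 0.8}
parser = {(("adjoin-right", "NP"), seven, cents): 0.5,
          (("null",), of_in, cents_np): 0.25}

log_p = joint_log_prob(derivation, predictor, tagger, parser)
```

Working in log space avoids underflow when the product runs over all n+1 positions of a sentence.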

Model Parameter Reestimation

Need to re-estimate the model component probabilities such that we decrease the model perplexity:

$P(w_i = v \mid h_{-2}, h_{-1})$; $P(g_i \mid w_i, h_{-1}.\text{tag}, h_{-2}.\text{tag})$; $P(a \mid h_{-2}, h_{-1})$

N-best variant of the Expectation-Maximization (EM) algorithm:

• We seed the re-estimation process with parameter estimates gathered from manually or automatically parsed sentences

• We retain the N "best" parses $\{T^1, \ldots, T^N\}$ for the complete sentence W

• The hidden events in the EM algorithm are restricted to those occurring in the N "best" parses
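One pass of the N-best EM idea above can be sketched as follows. This is a simplified illustration, not the authors' procedure: the event representation as `(context, outcome)` pairs, the function name `nbest_em_step`, and the use of unsmoothed relative frequencies in the M-step are all assumptions, and the probabilities are invented.

```python
from collections import defaultdict


def nbest_em_step(nbest):
    """One EM pass over the N-best parses of a single sentence.

    nbest: list of (joint_prob, events) pairs, where joint_prob stands for
    P(T^k, W) under the current model and events is the list of
    (context, outcome) pairs that occur in parse T^k.
    """
    total = sum(p for p, _ in nbest)
    counts = defaultdict(lambda: defaultdict(float))
    # E-step: weight each parse by its posterior within the N-best list.
    for p, events in nbest:
        weight = p / total
        for context, outcome in events:
            counts[context][outcome] += weight
    # M-step: renormalize the fractional counts per context.
    return {
        context: {o: c / sum(outcomes.values()) for o, c in outcomes.items()}
        for context, outcomes in counts.items()
    }


# Two hypothetical parses that differ in one parser move.
nbest = [
    (0.03, [("ctx", "adjoin-left")]),
    (0.01, [("ctx", "adjoin-right")]),
]
new_probs = nbest_em_step(nbest)
```

Here the first parse receives posterior weight 0.75 and the second 0.25, so the re-estimated move probabilities in context `ctx` follow those weights.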

SLM for Information Extraction

Training:
• initialization: Initialize SLM as a syntactic parser from treebank
• syntactic parsing: Train SLM as a matched constrained parser and parse the training data: boundaries of semantic constituents are matched
• augmentation: Enrich the non/pre-terminal labels in the resulting treebank with semantic tags
• syntactic+semantic parsing: Train SLM as an L-matched constrained parser: boundaries and tags of the semantic constituents are matched

Test:
– syntactic+semantic parsing of test sentences; retrieve the semantic parse by taking the semantic projection of the most likely parse:

$$S = \mathrm{SEM}\left(\arg\max_{T_i} P(T_i, W)\right)$$
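The test-time rule, pick the most likely parse and keep its semantic projection, can be sketched in Python. The representation is an assumption made for illustration: each parse is a list of `(left, right, syntactic_label, semantic_tag)` constituents with `None` for constituents that carry no semantic tag, and the candidate probabilities are invented.

```python
def semantic_projection(parse):
    """SEM(.): keep only the constituents that carry a semantic tag."""
    return [(l, r, sem) for (l, r, _syn, sem) in parse if sem is not None]


def extract(candidates):
    """Return SEM of the parse with the highest joint probability P(T, W).

    candidates: list of (parse, joint_prob) pairs for one sentence W.
    """
    best_parse, _ = max(candidates, key=lambda pair: pair[1])
    return semantic_projection(best_parse)


# Two hypothetical candidate parses of a 12-word sentence.
parse_a = [(0, 11, "S", "Calendar Task"),
           (2, 4, "PP", None),
           (3, 4, "NP", "Person")]
parse_b = [(0, 11, "S", "Calendar Task"),
           (3, 4, "NP", None)]

S = extract([(parse_a, 0.004), (parse_b, 0.001)])
```

In this toy case `parse_a` wins, and its projection keeps the frame-level constituent and the Person slot while dropping the purely syntactic PP.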

Constrained Parsing Using the SLM

• a semantic parse S is equivalent to a set of constraints
• each constraint c is a 3-tuple $c = \langle l, r, Q \rangle$: $l$/$r$ is the left/right boundary of the semantic constituent to be matched and $Q$ is the set of allowable non-terminal tags for the constituent

☞ Match parsing (syntactic parsing stage):
   1. parses match the constraint boundaries $c.l$, $c.r$, $\forall c$ for a given sentence
☞ L-Match parsing (syntactic+semantic parsing stage):
   1. parses match the constraint boundaries and the set of labels Q: $c.l$, $c.r$, $c.Q$, $\forall c$
   2. the semantic projection of the parse trees must have exactly two levels

✔ Both Match and L-Match parsing can be efficiently implemented in the left-to-right, bottom-up, binary parsing strategy of the SLM
✔ On test sentences the only constraint available is the identity of the semantic tag at the root node
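A simple check of the two matching conditions might look like the sketch below. It rests on an assumed reading of the slide: a parse "matches" a constraint when it contains a constituent with exactly the boundaries $c.l$, $c.r$, and it "L-matches" when that constituent's label is also in $c.Q$. The tuple layouts and the function name `matches` are hypothetical, and the two-level requirement on the semantic projection is not checked here.

```python
def matches(parse, constraints, check_labels=False):
    """Check a parse against a set of constraints <l, r, Q>.

    parse: list of (left, right, label) constituents.
    constraints: list of (left, right, allowed_labels) tuples.
    check_labels=False tests Match; check_labels=True tests L-Match.
    """
    labels_by_span = {}
    for l, r, label in parse:
        labels_by_span.setdefault((l, r), set()).add(label)
    for l, r, allowed in constraints:
        labels = labels_by_span.get((l, r))
        if labels is None:                       # boundary not matched
            return False
        if check_labels and not (labels & allowed):
            return False                         # boundary matched, label is not in Q
    return True


# Hypothetical constraints and parse for a 12-word sentence.
constraints = [(3, 4, {"Person"}), (9, 11, {"Time"})]
parse = [(0, 11, "S"), (3, 4, "Person"), (9, 11, "NP")]

is_match = matches(parse, constraints)
is_l_match = matches(parse, constraints, check_labels=True)
```

The example parse satisfies Match, since both spans are present, but fails L-Match because the constituent over words 9 to 11 carries a label outside Q.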

Experiments

MiPad data (personal information management)

• training set: 2,239 sentences (27,119 words) and 5,431 slots
• test set: 1,101 sentences (8,652 words) and 1,698 slots
• vocabulary: 1,035 words, closed over test data

                                Error Rate (%)
Training Iteration              Training          Test
Stage 2            Stage 4      Slot    Frame     Slot    Frame
Baseline                        43.41    7.20     57.36   14.90
0, MiPad/NLPwin    0             9.78    1.65     37.87   21.62
1, UPenn Trbnk     0             8.44    2.10     36.93   16.08
1, UPenn Trbnk     1             7.82    1.70     36.98   16.80
1, UPenn Trbnk     2             7.69    1.50     36.98   16.80

• baseline is a semantic grammar developed manually that makes no use of syntactic information

• initialize the syntactic SLM from in-domain MiPad treebank (NLPwin) and out-of-domain Wall Street Journal treebank (UPenn)

• 3 iterations of N-best EM parameter reestimation algorithm

Would More Data Help?

• big difference in performance between training and test suggests overtraining
• studied the performance of the model with decreasing amounts of training data

                                               Error Rate (%)
Training       Training Iteration              Training          Test
Corpus Size    Stage 2            Stage 4      Slot    Frame     Slot    Frame
               Baseline                        43.41    7.20     57.36   14.90
all            1, UPenn Trbnk     0             8.44    2.10     36.93   16.08
1/2 all        1, UPenn Trbnk     0                -       -     43.76   18.44
1/4 all        1, UPenn Trbnk     0                -       -     49.47   22.98

✔ performance degradation with decreasing training data size is severe
✔ more training data and a model parameterization that makes more effective use of the training data are likely to help

Error Analysis

• investigated the correlation between the semantic frame/slot accuracy and the number of semantic slots in a sentence

No. slots/sent   Error Rate (%)      No. Sent
                 Slot     Frame
1                43.97    18.01      755
2                39.23    16.27      209
3                26.44     5.17       58
4                26.50     4.00       50
5+               21.19     6.90       29

• Sentences containing more semantic slots are less ambiguous from an information extraction point of view
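As a consistency check on the breakdown above, the per-bucket frame error rates can be recombined into an overall figure. The short Python sketch below assumes that each frame error rate is the number of sentences with a wrong frame divided by the number of sentences in that bucket; the slides do not define the metric, so this is an interpretation.

```python
# (slots per sentence, number of sentences, frame error rate in %)
buckets = [
    ("1", 755, 18.01),
    ("2", 209, 16.27),
    ("3", 58, 5.17),
    ("4", 50, 4.00),
    ("5+", 29, 6.90),
]

total_sentences = sum(n for _, n, _ in buckets)

# Recover whole-sentence error counts from the rounded percentages.
frame_errors = sum(round(n * rate / 100) for _, n, rate in buckets)

overall_frame_error = 100 * frame_errors / total_sentences
```

Under that assumption the buckets cover all 1,101 test sentences, and the recombined rate rounds to the 16.08% test frame error reported for the "1, UPenn Trbnk" model at Stage 4 iteration 0, which suggests the breakdown was computed for that configuration.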

Conclusions

✔ Presented a data-driven approach to information extraction that outperforms a manually written semantic grammar
✔ Coupling of syntactic and semantic information improves information extraction accuracy, as shown previously by Miller et al., NAACL 2000

Future Work

✘ Use a statistical modeling technique that makes better use of limited amounts of training data and rich conditioning information: maximum entropy
✘ Aim at information extraction from speech: treat the word sequence as a hidden variable, thus finding the most likely semantic parse given a speech utterance
