Richer Syntactic Dependencies for Structured Language Modeling

Ciprian Chelba, STG, Microsoft Research, [email protected]

Peng Xu, CLSP, Johns Hopkins University, [email protected]

Abstract

Two simple methods of enriching the dependencies in the syntactic parse trees used for initializing the structured language model (SLM) achieve improvements in perplexity (PPL) and word-error-rate (WER, N-best rescoring) over the baseline results reported using the SLM on the UPenn Treebank and Wall Street Journal (WSJ) corpora, respectively.

Structured Language Model

✔ Generalize trigram modeling (local) by taking advantage of sentence structure (influence by more distant past) [1]

✔ Use exposed heads $h$ (words $w$ and their corresponding non-terminal tags $l$) for prediction:

$$P(w_{i+1} \mid W_i, T_i(W_i)) = P(w_{i+1} \mid h_{-2}(T_i(W_i)), h_{-1}(T_i(W_i)))$$

$T_i$ is the partial hidden structure, with head assignment, assigned to $W_i = w_1 \ldots w_i$

✔ Model will assign joint probability to sequences of words and hidden parse structure: $P(T_i, W_i)$

✔ Number of parses $T_k$ for a given word prefix $W_k$ is $|\{T_k\}| \sim O(2^k)$, so we need to prune it by discarding the unlikely ones

✔ Word level probability assignment:

$$P(w_{k+1} \mid W_k) = \sum_{T_k \in S_k} P(w_{k+1} \mid W_k T_k) \cdot \rho(W_k, T_k)$$

$$\rho(W_k, T_k) = P(W_k T_k) \Big/ \sum_{T_k \in S_k} P(W_k T_k)$$

– $S_k$ is the set of all parses present in the stacks at the current stage $k$

✔ Model statistics estimation: unsupervised algorithm for maximizing $P(W)$ (minimizing perplexity) that belongs to the class of Expectation-Maximization algorithms

✔ Parameters are initialized on parse trees that have been binarized and in which the non-terminal (NT) tags at each node have been enriched with headwords
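The word-level probability assignment above can be illustrated with a minimal Python sketch. It assumes the parses kept in the stacks are represented only by their joint probabilities P(W_k T_k) and by the predictor probability each parse assigns to the next word; the function name and the toy numbers are hypothetical and do not come from the SLM release.

```python
from typing import Sequence


def next_word_probability(
    parse_joint_probs: Sequence[float],
    word_probs_given_parse: Sequence[float],
) -> float:
    """Return P(w_{k+1} | W_k) as a rho-weighted sum over the surviving parses.

    parse_joint_probs[j]      -- P(W_k T_k) for the j-th parse in the stacks
    word_probs_given_parse[j] -- P(w_{k+1} | W_k T_k) under that parse
    """
    if len(parse_joint_probs) != len(word_probs_given_parse):
        raise ValueError("one word probability is needed per parse")
    total = sum(parse_joint_probs)
    if total <= 0.0:
        raise ValueError("at least one parse must have positive probability")
    # rho(W_k, T_k): joint probability renormalized over the parses in S_k
    rho = [p / total for p in parse_joint_probs]
    return sum(r * pw for r, pw in zip(rho, word_probs_given_parse))


# Toy example with three surviving parses (made-up numbers).
joint = [0.006, 0.003, 0.001]
cond = [0.20, 0.05, 0.10]
p_next = next_word_probability(joint, cond)
```

Because rho is renormalized over the pruned set S_k only, the result is a proper weighted average of the per-parse predictions even after unlikely parses have been discarded.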

[Figure: SLM operation on the sentence prefix "the_DT contract_NN ended_VBD with_IN a_DT loss_NN of_IN 7_CD cents_NNS ...". Components: PREDICTOR (predict word), TAGGER (tag word), PARSER (adjoin_{left,right}, null). Headword-annotated constituents shown: contract_NP, ended_VP', with_PP, loss_NP, of_PP, cents_NP.]

Operation sequence: ...; null; predict cents; POStag cents; adjoin-right-NP; adjoin-left-PP; ...; adjoin-left-VP'; null; ...

$$P(T_{n+1}, W_{n+1}) = \prod_{i=1}^{n+1} \underbrace{P(w_i \mid h_{-2}, h_{-1})}_{\text{predictor}} \cdot \underbrace{P(g_i \mid w_i, h_{-1}.tag, h_{-2}.tag)}_{\text{tagger}} \cdot \underbrace{P(T_i \mid w_i, g_i, T_{i-1})}_{\text{parser}}$$
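As a rough illustration of the factorization into predictor, tagger and parser terms, the sketch below accumulates the joint probability in log space. It assumes the three component probabilities for each word position are already available as plain numbers; the function name and the input layout are hypothetical.

```python
import math
from typing import Iterable, Tuple


def joint_log_prob(steps: Iterable[Tuple[float, float, float]]) -> float:
    """Sum the log of predictor, tagger and parser probabilities over positions.

    Each step is (P(w_i | h_-2, h_-1),
                  P(g_i | w_i, h_-1.tag, h_-2.tag),
                  P(T_i | w_i, g_i, T_{i-1})).
    """
    return sum(
        math.log(p_word) + math.log(p_tag) + math.log(p_parse)
        for p_word, p_tag, p_parse in steps
    )


# Two word positions with made-up component probabilities.
log_p = joint_log_prob([(0.5, 0.5, 1.0), (0.25, 1.0, 0.5)])
```

Working in log space avoids underflow when the product runs over a full sentence.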

Richer Syntactic Dependencies

Enrich the non-terminal tag of a node in the binarized parse tree with the NT tag of one of its children, or of both:

1. same: we use the non-terminal tag of the node from which the headword is being percolated
2. opposite: we use the non-terminal tag of the sibling of the node from which the headword is being percolated
3. both: both of the above

A given binarized tree is traversed recursively in depth-first order and each constituent is enriched in the above manner.
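The three schemes can be sketched as a recursive traversal in Python. This is a minimal illustration under stated assumptions, not the SLM implementation: the `Node` class, the `head_from_left` flag, the "+" separator, and the order of the two tags in the `both` scheme are all hypothetical choices. Enrichment uses the children's original (un-enriched) tags.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    word: str                        # headword of the constituent
    tag: str                         # NT tag, or POS tag for a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    head_from_left: bool = True      # child the headword is percolated from


def enrich(node: Node, scheme: str) -> Node:
    """Return a copy of a binarized tree with enriched NT tags (depth first)."""
    if node.left is None or node.right is None:
        return Node(node.word, node.tag)  # leaves keep their POS tag
    if node.head_from_left:
        head, sibling = node.left, node.right
    else:
        head, sibling = node.right, node.left
    if scheme == "baseline":
        tag = node.tag
    elif scheme == "same":
        tag = f"{node.tag}+{head.tag}"
    elif scheme == "opposite":
        tag = f"{node.tag}+{sibling.tag}"
    elif scheme == "both":
        tag = f"{node.tag}+{head.tag}+{sibling.tag}"  # tag order is assumed
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return Node(node.word, tag, enrich(node.left, scheme),
                enrich(node.right, scheme), node.head_from_left)


# "of 7 cents": of_PP over of_IN and cents_NP; cents_NP over 7_CD and cents_NNS.
cents_np = Node("cents", "NP", Node("7", "CD"), Node("cents", "NNS"),
                head_from_left=False)
of_pp = Node("of", "PP", Node("of", "IN"), cents_np, head_from_left=True)
enriched = enrich(of_pp, "opposite")
print(enriched.tag, enriched.right.tag)  # PP+NP NP+CD
```

With the `opposite` scheme this toy fragment yields the labels of_PP+NP and cents_NP+CD.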

[Figure: SLM operation with enriched non-terminal tags on the sentence prefix "the_DT contract_NN ended_VBD with_IN a_DT loss_NN of_IN 7_CD cents_NNS ...". Components: PREDICTOR (predict word), TAGGER (tag word), PARSER (adjoin_{left,right}, null). Headword-annotated constituents shown: contract_NP+DT, ended_VP'+PP, with_PP+NP, loss_NP+DT, loss_NP+PP, of_PP+NP, cents_NP+CD.]

Operation sequence: ...; null; predict cents; POStag cents; adjoin-right-NP+CD; adjoin-left-PP+NP; ...; adjoin-left-VP'+PP; null; ...

Perplexity experiments

Evaluate the perplexity on the UPenn Treebank.

• Training set: 1Mwds (Sections 00-22)
• Test set: 82.4kwds (Sections 23-24)
• Vocabulary: 10kwds open
• POS-tagger vocabulary: 40
• NT tag vocabulary: 52 baseline, 954 opposite, 712 same, 3816 both
• CONSTRUCTOR operation vocabulary: 157 baseline, 2863 opposite, 2137 same, 11449 both
• The SLM was interpolated with the 3-gram model: $P(\cdot) = \lambda \cdot P_{3gram}(\cdot) + (1 - \lambda) \cdot P_{SLM}(\cdot)$

Perplexity by enriching scheme and interpolation weight $\lambda$:

Enriching scheme   Train Iter   λ = 0.0   λ = 0.6   λ = 1.0
baseline           3            158.75    148.67    166.63
opposite           3            150.83    144.08    166.63
same               3            155.29    146.39    166.63
both               3            153.30    144.99    166.63
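The interpolation with the 3-gram model and the resulting perplexity can be sketched as follows. The sketch assumes both models expose a probability for each test word; the function names and the toy probabilities are hypothetical, and perplexity is computed with its standard definition, the exponential of the negative average log probability.

```python
import math
from typing import Sequence


def interpolate(p_3gram: float, p_slm: float, lam: float) -> float:
    """Linear interpolation: lam * P_3gram + (1 - lam) * P_SLM."""
    return lam * p_3gram + (1.0 - lam) * p_slm


def perplexity(word_probs: Sequence[float]) -> float:
    """exp of the negative mean log probability over the test words."""
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


# Made-up per-word probabilities from the two models on a three-word test text.
p3 = [0.10, 0.02, 0.05]
ps = [0.20, 0.01, 0.05]
for lam in (0.0, 0.6, 1.0):
    mixed = [interpolate(a, b, lam) for a, b in zip(p3, ps)]
    print(f"lambda={lam:.1f}  PPL={perplexity(mixed):.2f}")
```

With lam = 1.0 only the 3-gram contributes, which is why the last column of the table is identical for all enriching schemes; lam = 0.0 gives the SLM alone.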

✘ The opposite initialization scheme performed best

✘ 5% relative reduction in perplexity compared to the SLM baseline (158.75 to 150.83 at λ = 0.0)

✘ 3% relative improvement after interpolation with the 3-gram (148.67 to 144.08 at λ = 0.6)
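The relative reductions quoted above follow directly from the perplexity values reported for the baseline and opposite schemes. A short check in Python (the helper name is just for illustration):

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Fractional reduction of `new` relative to `baseline`."""
    return (baseline - new) / baseline


# SLM alone (lambda = 0.0): baseline vs. opposite scheme.
slm_only = relative_reduction(158.75, 150.83)
# Interpolated with the 3-gram (lambda = 0.6): baseline vs. opposite scheme.
interpolated = relative_reduction(148.67, 144.08)

print(f"{100 * slm_only:.1f}%  {100 * interpolated:.1f}%")  # 5.0%  3.1%
```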

WER (N-best rescoring) results

Evaluate the WER performance of the SLM in the WSJ DARPA'93 HUB-1 test setup.

• Training set: 20Mwds (SLM) / 40Mwds (3-gram), WSJ
• Test set size: 213 utterances, 3446 words
• Vocabulary: 20kwds open
• baseline: standard (LDC) 3-gram model trained on 40Mwds; lattices and the N-best lists
• SLM: trained on 20Mwds subset of WSJ automatically parsed (Ratnaparkhi), opposite NT tag scheme

                                      Interpolation weight
                              Iter    0.0     0.2     0.4     0.6     0.8     1.0
baseline SLM WER, %           0       13.1    13.1    13.1    13.0    13.4    13.7
opposite SLM WER, %           0       12.7    12.8    12.7    12.7    13.1    13.7
MPSS significance test p-value        0.020   0.017   0.014   0.005   0.070   -

✘ 0.3-0.4% absolute reduction in WER over the baseline SLM

✘ 1.0% absolute reduction in WER over the baseline 3-gram

✘ SLM performance as a second pass language model is the same even without interpolating it with the 3-gram model
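For readers unfamiliar with the metric, WER is conventionally computed as the word-level edit distance (substitutions, deletions, insertions) between the recognizer output and the reference transcript, divided by the number of reference words. The sketch below is a generic illustration of that definition, not the scoring tool used for the results above; the example hypothesis is made up.

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(reference)


ref = "the contract ended with a loss".split()
hyp = "the contract ended a lost".split()
wer = word_error_rate(ref, hyp)  # one deletion + one substitution over 6 words
```

In N-best rescoring, the hypothesis with the best rescored total is selected for each utterance and the WER is accumulated over the whole test set.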

Conclusions and Future Directions

☞ Simple but effective method of enriching the syntactic dependencies in the structured language model (SLM) that achieves 0.3-0.4% absolute reduction in WER over the best previous results reported using the SLM on WSJ.

☞ Implementation could be greatly improved by predicting only the relevant part of the enriched non-terminal tag and then adding the part inherited from the child.

☞ A more comprehensive study of the most productive ways of increasing the probabilistic dependencies in the parse tree would be desirable.

Acknowledgements

The authors would like to thank Brian Roark for making available the N-best lists for the HUB-1 test set.

SLM publicly available: ftp://ftp.clsp.jhu.edu/pub/clsp/chelba/SLM RELEASE

References

[1] Ciprian Chelba and Frederick Jelinek, "Structured language modeling," Computer Speech and Language, vol. 14, no. 4, pp. 283-332, October 2000.
