Cross-Lingual Syntactically Informed Distributed Word Representations

Ivan Vulić

University of Cambridge

EACL 2017; Valencia; April 6, 2017

Motivation (High-Level)

The NLP community has developed useful features for several tasks, but finding features that are...

1. task-invariant (POS tagging, SRL, NER, parsing, ...) (monolingual word embeddings)

2. language-invariant (English, Dutch, Chinese, Spanish, ...) (cross-lingual word embeddings: this talk)

...is non-trivial and time-consuming (20+ years of feature engineering...)

Learn word-level features which generalise across tasks and languages

Motivation (Low-Level)

Inject syntactic information into cross-lingual word embeddings

→ Similar structures in English and Italian

→ Universal Dependencies: syntactic contexts in multiple languages

Learning from Context

Skip-gram with negative sampling (SGNS) [Mikolov et al.; NIPS 2013]

Learning from the set $D$ of (word, context) pairs observed in a corpus:

$(w, v) = (w_t, w_{t \pm i})$; $i = 1, \ldots, c$; $c$ = context window size

SG learns to predict the context of each pivot word.

John saw a cute gray huhblub running in the field.

$D$ = {(huhblub, cute), (huhblub, gray), (huhblub, running), (huhblub, in)}

vec(huhblub) = [−0.23, 0.44, −0.76, 0.33, 0.19, ...]
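The pair set for the example sentence can be reproduced with a few lines of Python. This is a minimal sketch, assuming whitespace tokenisation and a window size of c = 2 (the value implied by the four pairs listed for "huhblub"); the function name `skipgram_pairs` is illustrative and not part of word2vec.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Return every (pivot, context) pair within `window` positions of each pivot."""
    pairs = []
    for t, pivot in enumerate(tokens):
        # Clip the window at the sentence boundaries.
        start = max(0, t - window)
        end = min(len(tokens), t + window + 1)
        for j in range(start, end):
            if j != t:  # a word is not its own context
                pairs.append((pivot, tokens[j]))
    return pairs


# Example sentence from the slide; window size 2 is an assumption.
sentence = "John saw a cute gray huhblub running in the field".split()
pairs = skipgram_pairs(sentence, window=2)
huhblub_pairs = [p for p in pairs if p[0] == "huhblub"]
print(huhblub_pairs)
```

SGNS then treats these observed pairs as positive examples and samples random contexts as negatives.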

Learning from Context

Representation model → Skip-gram with negative sampling (SGNS)

SGNS may be trained with arbitrary contexts [Levy and Goldberg, ACL 2014]

Context is crucial: different context types result in different SGNS vectors. [Schwartz et al., NAACL 2016; Melamud et al., NAACL 2016]

Some standard context types:

1. (Ordinary) bag-of-words (BOW)

2. Positional (POSIT)

3. Dependency-based: Basic (DEPS-NAIVE)

4. Dependency-based: with prepositional arc collapsing (DEPS-ARC)
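To illustrate how positional contexts differ from plain BOW contexts, the sketch below attaches the relative offset to each context word, so "gray" one position to the left and "gray" one position to the right become distinct contexts. The `word_offset` label format and the function name are assumptions made for illustration; the slides do not specify an encoding.

```python
from typing import List, Tuple


def positional_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Return (pivot, context) pairs where each context carries its signed offset."""
    pairs = []
    for t, pivot in enumerate(tokens):
        for offset in range(-window, window + 1):
            j = t + offset
            if offset != 0 and 0 <= j < len(tokens):
                # Hypothetical label format: context word, underscore, signed offset.
                pairs.append((pivot, f"{tokens[j]}_{offset:+d}"))
    return pairs


sentence = "John saw a cute gray huhblub running in the field".split()
posit_pairs = [p for p in positional_pairs(sentence, window=2) if p[0] == "huhblub"]
print(posit_pairs)
```

With BOW contexts the offset is simply dropped, which is why the two context types yield different vocabularies of contexts and therefore different SGNS vectors.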

Context Types: Dependency-Based

4. (Universal) Dependency-based: with prepositional arc collapsing

{(discovers, scientist_nsubj), (discovers, stars_dobj), (discovers, telescope_nmod), (stars, discovers_dobj-1), (scientist, australian_amod), (discovers, telescope_prep_with), (telescope, discovers_prep_with-1), ...}

→ Simple but important post-processing: prepositional arc collapsing
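A minimal sketch of dependency-based context extraction with prepositional arc collapsing follows. It assumes the underlying sentence is "Australian scientist discovers stars with telescope" (inferred from the pairs above), that arcs are given as (head, dependent, relation) triples with UD-style `nmod` and `case` labels, and that each word occurs once so that word strings can stand in for token positions. The function names are illustrative.

```python
from typing import List, Tuple

Arc = Tuple[str, str, str]  # (head, dependent, relation)


def collapse_prepositions(arcs: List[Arc]) -> List[Arc]:
    """Fold each preposition (a `case` dependent) into the label of its noun's nmod arc."""
    # In UD the preposition attaches to the noun: case(telescope, with).
    case_of = {head: dep for head, dep, rel in arcs if rel == "case"}
    collapsed = []
    for head, dep, rel in arcs:
        if rel == "case":
            continue  # the preposition no longer appears as a separate word
        if rel == "nmod" and dep in case_of:
            rel = "prep_" + case_of[dep]
        collapsed.append((head, dep, rel))
    return collapsed


def dependency_contexts(arcs: List[Arc]) -> List[Tuple[str, str]]:
    """Emit (head, dep_rel) and the inverse (dep, head_rel-1) for every arc."""
    pairs = []
    for head, dep, rel in arcs:
        pairs.append((head, f"{dep}_{rel}"))
        pairs.append((dep, f"{head}_{rel}-1"))
    return pairs


arcs = [
    ("scientist", "australian", "amod"),
    ("discovers", "scientist", "nsubj"),
    ("discovers", "stars", "dobj"),
    ("discovers", "telescope", "nmod"),
    ("telescope", "with", "case"),
]
contexts = dependency_contexts(collapse_prepositions(arcs))
for pair in contexts:
    print(pair)
```

Without the collapsing step, "discovers" would only see the uninformative context `telescope_nmod`, and "with" would be treated as a word of its own.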

Cross-Lingual Word Embeddings

Representation of a word $w_1^S \in V^S$:

$\mathrm{vec}(w_1^S) = [f_1^1, f_2^1, \ldots, f_{dim}^1]$

Exactly the same representation for $w_2^T \in V^T$:

$\mathrm{vec}(w_2^T) = [f_1^2, f_2^2, \ldots, f_{dim}^2]$

Language-independent word representations in the same shared semantic (or embedding) space!

Cross-Lingual Word Embeddings

Monolingual vs. Bilingual

Q1 → How to align semantic spaces in two different languages?

Q2 → Which bilingual signals are used for the alignment?

See also: [Upadhyay et al., ACL 2016; Vulić and Korhonen, ACL 2016]

Exploiting Syntax and Translation Pairs

Using translation dictionaries, e.g., [en_stars, it_stelle], [en_scientist, it_scienzato]

Extracting context pairs from hybrid cross-lingual trees

Online training with monolingual and cross-lingual dependency-based contexts

Extracting Cross-Lingual Dep-Based Contexts

Monolingual contexts: (discovers, scientist_nsubj), (stars, discovers_dobj-1), (scienzato, australiano_amod), (scopre, stelle_dobj)

Cross-lingual contexts: (scientist, australiano_amod), (australiano, scientist_amod-1), (stars, scopre_dobj-1), (discovers, scienzato_nsubj)

Training: word2vecf SGNS on these (word, context) pairs
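One plausible reading of the example pairs is that each dependency arc contributes its monolingual contexts plus hybrid contexts in which either the head or the dependent is swapped for its dictionary translation. The sketch below implements that reading; it is an assumption based on the listed pairs, not a confirmed description of the author's procedure. The entry discovers/scopre is inferred from the example pairs, and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

Arc = Tuple[str, str, str]  # (head, dependent, relation)
Pair = Tuple[str, str]      # (word, context)


def arc_contexts(head: str, dep: str, rel: str) -> List[Pair]:
    """Contexts for one arc: (head, dep_rel) and the inverse (dep, head_rel-1)."""
    return [(head, f"{dep}_{rel}"), (dep, f"{head}_{rel}-1")]


def hybrid_contexts(arcs: List[Arc], dictionary: Dict[str, str]) -> List[Pair]:
    """Monolingual contexts plus contexts with one arc endpoint translated."""
    pairs: List[Pair] = []
    for head, dep, rel in arcs:
        pairs.extend(arc_contexts(head, dep, rel))  # monolingual
        if head in dictionary:                      # translate the head only
            pairs.extend(arc_contexts(dictionary[head], dep, rel))
        if dep in dictionary:                       # translate the dependent only
            pairs.extend(arc_contexts(head, dictionary[dep], rel))
    return pairs


en_arcs = [("discovers", "scientist", "nsubj"), ("discovers", "stars", "dobj")]
it_arcs = [("scienzato", "australiano", "amod"), ("scopre", "stelle", "dobj")]
# Toy dictionary; spellings follow the slides.
en_to_it = {"scientist": "scienzato", "stars": "stelle", "discovers": "scopre"}
it_to_en = {it: en for en, it in en_to_it.items()}

training_pairs = hybrid_contexts(en_arcs, en_to_it) + hybrid_contexts(it_arcs, it_to_en)
for word, context in training_pairs:
    print(word, context)
```

Because words from both languages end up predicting contexts from both languages, a single SGNS run over the combined pair list places them in one shared space.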

Experimental Setup

Language pairs

Results reported with two language pairs: IT-EN, DE-EN. Experiments conducted with more language pairs (SV-EN, FR-EN, NL-EN).

Translation dictionaries

1. BNC-Lemma+GT

2. dict.cc

Training Data and Setup

→ SGNS model; Data: Wikipedias in EN, IT, DE

→ Universal Dependencies v1.4

→ SOTA UPOS tagger [Martins et al., ACL 2013]

→ SOTA dependency parser [Bohnet, COLING 2010]

[Vulić and Korhonen, ACL 2016]

Baselines

Cross-lingual embeddings relying on exactly the same supervision signal: translation dictionaries

[Mikolov et al., arXiv 2013], [Lazaridou et al., ACL 2015], ...

word2vecf SGNS trained with three context types:

1. BOW (win = 2)

2. Positional (win = 2)

3. Monolingual DEPS (exactly the same signal used as with our model)

Online vs offline: These models train monolingual SGNS offline and learn a mapping function

Task I: (Monolingual) Word Similarity

Results on multilingual SimLex-999 [Leviant and Reichart, arXiv 2015]

| Model | IT All | IT Verbs | DE All | DE Verbs | EN (with IT) All | EN (with IT) Verbs |
|---|---|---|---|---|---|---|
| Mono-sgns | 0.235 | 0.318 | 0.305 | 0.259 | 0.331 | 0.281 |
| off-bow2 | 0.254 | 0.317 | 0.306 | 0.263 | 0.328 | 0.279 |
| off-posit2 | 0.227 | 0.323 | 0.283 | 0.194 | 0.336 | 0.316 |
| off-deps | 0.199 | 0.308 | 0.258 | 0.214 | 0.334 | 0.311 |
| CL-DepEmb | 0.287 | 0.358 | 0.306 | 0.319 | 0.356 | 0.308 |
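Word similarity benchmarks such as SimLex-999 are conventionally scored with Spearman's rank correlation between model similarities and human ratings. The slides do not name the metric, so treating the scores above as Spearman's ρ over cosine similarities is an assumption. The sketch below uses only the standard library; the vectors and gold ratings are invented toy values, not SimLex data.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho is Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented toy embeddings and gold ratings, for illustration only.
vectors: Dict[str, List[float]] = {
    "smart": [0.9, 0.1],
    "intelligent": [0.8, 0.2],
    "hard": [0.1, 0.9],
    "easy": [-0.2, 0.8],
}
gold: List[Tuple[str, str, float]] = [
    ("smart", "intelligent", 9.2),
    ("hard", "easy", 0.95),
    ("smart", "hard", 2.0),
]
model_scores = [cosine(vectors[a], vectors[b]) for a, b, _ in gold]
gold_scores = [score for _, _, score in gold]
rho = spearman(model_scores, gold_scores)
print(round(rho, 3))
```

In the toy data the antonym pair hard/easy gets a high cosine but a low gold rating, which is the kind of relatedness-versus-similarity mismatch that lowers the correlation.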

Task II: (Bilingual) Lexicon Induction

Results on three BLI datasets:

1. Translations of SimLex words (IT-EN and DE-EN)

2. IT-EN test set [Vulić and Moens, EMNLP 2013]

3. DE-EN test set [Upadhyay et al., ACL 2016]

| Model | IT-EN: SL-TRANS | IT-EN: VULIC1k | DE-EN: SL-TRANS | DE-EN: UP1328 |
|---|---|---|---|---|
| off-bow2 | 0.328 [0.457] | 0.405 | 0.218 [0.246] | 0.317 |
| off-posit2 | 0.219 [0.242] | 0.272 | 0.115 [0.056] | 0.185 |
| off-deps | 0.169 [0.065] | 0.271 | 0.108 [0.051] | 0.162 |
| CL-DepEmb | 0.541 [0.597] | 0.532 | 0.503 [0.385] | 0.436 |

Table: BLI results (Top-1 scores). For SL-TRANS we also report results on the verb translation subtask (numbers in square brackets).
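A Top-1 BLI score can be read as the fraction of source words whose nearest neighbour in the shared space is the gold translation. The sketch below assumes cosine similarity as the ranking function (the slides do not state it) and uses invented three-dimensional vectors; the function names are illustrative.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest_target(src_vec: Sequence[float], target_vectors: Dict[str, List[float]]) -> str:
    """Return the target-language word closest to `src_vec` in the shared space."""
    return max(target_vectors, key=lambda w: cosine(src_vec, target_vectors[w]))


def top1_accuracy(
    source_vectors: Dict[str, List[float]],
    target_vectors: Dict[str, List[float]],
    gold: Dict[str, str],
) -> float:
    """Share of source words whose nearest target word equals the gold translation."""
    correct = sum(
        1 for src, tgt in gold.items()
        if nearest_target(source_vectors[src], target_vectors) == tgt
    )
    return correct / len(gold)


# Invented toy vectors in a shared IT-EN space, for illustration only.
it_vectors = {
    "stelle": [0.9, 0.1, 0.0],
    "scienzato": [0.1, 0.9, 0.1],
    "scopre": [0.0, 0.2, 0.9],
    "telescopio": [0.9, 0.2, 0.0],
}
en_vectors = {
    "stars": [1.0, 0.0, 0.0],
    "scientist": [0.0, 1.0, 0.0],
    "discovers": [0.0, 0.0, 1.0],
    "telescope": [0.5, 0.5, 0.1],
}
gold = {
    "stelle": "stars",
    "scienzato": "scientist",
    "scopre": "discovers",
    "telescopio": "telescope",
}
accuracy = top1_accuracy(it_vectors, en_vectors, gold)
print(accuracy)
```

In this toy setup "telescopio" lands closer to "stars" than to "telescope", so three of the four test words are translated correctly.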

More Results: Highlights (Not Really)

Improvements with CL-DepEmb on verb similarity; tested on SimVerb-3500

→ DE SimLex-999, adjectives: 0.585, best baseline: 0.417

→ DE SimLex-999, verbs: 0.319, best baseline: 0.263

→ IT SimLex-999, adjectives: 0.334, best baseline: 0.266

→ IT SimLex-999, verbs: 0.358, best baseline: 0.323

Future Work

These preliminary experiments show that injecting syntactic information into cross-lingual tasks helps semantic tasks which stress similarity...

Porting this idea to more (typologically diverse) languages

More accurate dependency parsers?

Selection of (reliable) translation pairs?

More sophisticated approaches to constructing hybrid cross-lingual trees

Other semantic tasks: cross-lingual lexical entailment, lexical substitution?

Questions?
