University of Amsterdam

Distributed Representations of Sentences and Documents

Valentin Vogelmann and Cassandra Loor

April 18, 2016

Outline

Introduction
Algorithms
Paragraph2Vec
Experiments
Conclusions
Discussion
Sources


Introduction What do we want to do?



Text classification



Existing techniques aren’t good enough



New techniques show promise


Algorithms Which approaches exist?



Bag-of-words    

Simple

Efficient

Loses word order

Little sense of semantics
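To make "loses word order" concrete, here is a minimal sketch using a whitespace tokenizer, which is an assumption for illustration and not part of the slides. Two sentences with opposite meanings get identical bag-of-words representations.

```python
from collections import Counter

def bag_of_words(text):
    """Return word counts for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())

# Same words, different order, opposite meaning.
a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")

# The two count vectors are indistinguishable.
same_representation = (a == b)
print(same_representation)
```

Any classifier built on these counts alone cannot tell the two sentences apart.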


Algorithms What techniques will we apply?



Remember Word2Vec?



We just expand it!


Algorithms What does Word2Vec do?  

Map a word to a vector

King - Man + Woman = Queen

Source: Presentation by Tom Kenter [1]
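The analogy arithmetic can be sketched with toy vectors. The 2-d embeddings below are hand-made and hypothetical (axes chosen to mean roughly "royalty" and "gender"); real Word2Vec vectors are learned from text and have many more dimensions. The nearest neighbor is found by cosine similarity, which is a common choice and an assumption here.

```python
import math

# Hypothetical 2-d embeddings: (royalty, gender). Real vectors are learned.
vectors = {
    "king":  [1.0,  1.0],
    "queen": [1.0, -1.0],
    "man":   [0.0,  1.0],
    "woman": [0.0, -1.0],
    "apple": [-1.0, 0.2],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

result = analogy("king", "man", "woman")
print(result)
```

With learned vectors the match is approximate, so the result is the nearest word rather than an exact equality.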


Algorithms How does Word2Vec work?

Source: [4]


Algorithms How can Word2Vec be trained?

Source: [3]


Algorithms What does this look like in the paper?

Source: [2]
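For reference, the word-vector framework in [2] can be summarized by its training objective. The notation below follows the paper as far as we recall it: $w_1, \dots, w_T$ is the training sequence, $k$ the context size, $W$ the word-vector matrix, $U$ and $b$ the softmax parameters, and $h$ the concatenation or average of the context word vectors.

```latex
% Objective: maximize the average log probability of each word given its context
\frac{1}{T} \sum_{t=k}^{T-k} \log p(w_t \mid w_{t-k}, \dots, w_{t+k})

% Prediction via softmax over the vocabulary
p(w_t \mid w_{t-k}, \dots, w_{t+k}) = \frac{e^{y_{w_t}}}{\sum_i e^{y_i}}

% Unnormalized log-probabilities for each output word i
y = b + U \, h(w_{t-k}, \dots, w_{t+k}; W)
```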


Paragraph2Vec How was this adjusted?

Figure: PV-DBOW

Figure: PV-DM

Source: [2]
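The difference between the two variants can be sketched as a single forward pass. This is a minimal illustration with random, untrained weights and made-up sizes, not the authors' implementation: PV-DM concatenates the paragraph vector with the context word vectors to predict the next word, while PV-DBOW uses the paragraph vector alone to predict words sampled from the paragraph.

```python
import math
import random

random.seed(0)

vocab = ["the", "cat", "sat", "on", "mat"]
dim = 3            # size of each word and paragraph vector (illustrative)
context_size = 2   # number of context words fed to PV-DM

def rand_vec(n):
    return [random.uniform(-0.5, 0.5) for _ in range(n)]

W = {w: rand_vec(dim) for w in vocab}   # word vectors, shared across paragraphs
D = {"paragraph_0": rand_vec(dim)}      # one vector per paragraph

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(hidden, U, b):
    """Softmax distribution over the vocabulary given a hidden vector."""
    scores = [sum(u * h for u, h in zip(row, hidden)) + b_k
              for row, b_k in zip(U, b)]
    return softmax(scores)

b = [0.0] * len(vocab)

# PV-DM: paragraph vector + context words (concatenated) predict the next word.
h_dm = D["paragraph_0"] + W["the"] + W["cat"]   # list concatenation
U_dm = [rand_vec(len(h_dm)) for _ in vocab]
p_dm = predict(h_dm, U_dm, b)

# PV-DBOW: the paragraph vector alone predicts words sampled from the paragraph.
h_dbow = D["paragraph_0"]
U_dbow = [rand_vec(dim) for _ in vocab]
p_dbow = predict(h_dbow, U_dbow, b)

print(len(h_dm), len(h_dbow))
```

PV-DBOW stores no word vectors on the input side, which is why it needs less memory than PV-DM.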


Experiments

Two tasks, three datasets: 

Sentiment analysis: Stanford Sentiment Treebank, IMDB



Information retrieval: Queries

→ understand how the model is different from bag-of-words et al.

→ insight into how the model captures the semantics of words and paragraphs


Experiments Stanford Sentiment Treebank



11,855 parsed sentences with labels for subphrases (ratings between 0 and 1)



logistic regression on vector representation predicts rating



testing: freeze word vectors, learn representation of new subphrase, feed into regression
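The test-time step can be sketched as gradient descent on a single new vector while everything else stays fixed. This is a minimal PV-DBOW-style illustration, not the authors' code: the softmax weights `U` are random stand-ins for trained parameters, and the sizes, learning rate, and step count are arbitrary assumptions.

```python
import math
import random

random.seed(1)
vocab_size, dim = 6, 4

# Stand-in for trained softmax weights; frozen during inference.
U = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab_size)]
U_before = [row[:] for row in U]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def loss_and_grad(d, word_ids):
    """Mean negative log-likelihood of the words given d, and its gradient w.r.t. d."""
    probs = softmax([sum(u * x for u, x in zip(row, d)) for row in U])
    loss = -sum(math.log(probs[t]) for t in word_ids) / len(word_ids)
    grad = [0.0] * dim
    for t in word_ids:
        for k, row in enumerate(U):
            coeff = probs[k] - (1.0 if k == t else 0.0)
            for j in range(dim):
                grad[j] += coeff * row[j] / len(word_ids)
    return loss, grad

def infer_paragraph_vector(word_ids, steps=200, lr=0.1):
    """Learn a vector for an unseen paragraph; only d is updated, never U."""
    d = [0.0] * dim
    first_loss, _ = loss_and_grad(d, word_ids)
    for _ in range(steps):
        _, grad = loss_and_grad(d, word_ids)
        d = [x - lr * g for x, g in zip(d, grad)]
    final_loss, _ = loss_and_grad(d, word_ids)
    return d, first_loss, final_loss

new_paragraph = [0, 2, 2, 5]   # word ids of a hypothetical unseen subphrase
d_new, first_loss, final_loss = infer_paragraph_vector(new_paragraph)
print(final_loss < first_loss)
```

The resulting `d_new` is what would then be passed to the already trained classifier.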



Experiments IMDB



100,000 paragraphs (labeled and unlabeled; binary labels)



Neural network with one hidden layer predicts the label from the vector representation



Testing: freeze word vectors, learn representation of new paragraph, feed into NN



Experiments Information retrieval



Search results for 1,000,000 queries, grouped into triplets of paragraphs



Distance between paragraph vectors determines which paragraphs belong together



Bag-of-words baselines use tf-idf scores instead of raw counts
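The triplet evaluation can be sketched as follows. In each triplet, two paragraphs come from the same query and the third from a different one. A triplet counts as an error when the same-query pair is not closer than the cross-query pair. The vectors and the Euclidean distance below are illustrative assumptions.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def triplet_error_rate(triplets, distance=euclidean):
    """Fraction of triplets where the same-query pair is not the closer pair.

    Each triplet is (a, b, c): a and b come from the same query,
    c comes from a different query.
    """
    errors = sum(1 for a, b, c in triplets if distance(a, b) >= distance(a, c))
    return errors / len(triplets)

# Hypothetical 2-d paragraph vectors, for illustration only.
triplets = [
    ([0.0, 0.0], [0.1, 0.0], [1.0, 1.0]),    # a is closer to b: correct
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),    # a is closer to b: correct
    ([0.0, 1.0], [1.0, 0.0], [0.1, 1.0]),    # a is closer to c: error
    ([0.5, 0.5], [0.4, 0.5], [0.5, -0.5]),   # a is closer to b: correct
]
rate = triplet_error_rate(triplets)
print(rate)
```

The same function works for any representation, so bag-of-words vectors and paragraph vectors can be compared on identical triplets.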



Experiments



PV-DM is consistently better than PV-DBOW; the combination of both is best (and is what is used)



Concatenation of vectors is used rather than sum



Best context size between 5 and 12 words (empirically obtained)



Inferring vectors at test time is expensive, but can be done in parallel


Conclusions



unsupervised algorithm for learning fixed-length representations



extends the successful paradigm of distributed representations

→ overcomes some of the problems of these approaches



outperforms some state-of-the-art approaches in common NLP tasks

→ captures semantics of the broader context

→ alternative to bag-of-words models where parsing is not available


Discussion



What is an appropriate number of words to sample from each paragraph? Does this depend on the paragraph?



Which other methods for evaluation would you choose?



Does the model capture ambiguity in words?



Does the constantly increasing number of columns in the paragraph matrix hurt the model?


Sources

[1] Tom Kenter. Word2vec, April 2016.

[2] Quoc V. Le and Tomas Mikolov. Distributed representations of sentences and documents. arXiv preprint arXiv:1405.4053, 2014.

[3] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.

[4] Xin Rong. word2vec parameter learning explained. arXiv preprint arXiv:1411.2738, 2014.
