Making Sense of Word Embeddings

Maria Pelevina (1), Nikolay Arefiev (2), Chris Biemann (1), Alexander Panchenko (1)

(1) TU Darmstadt, Germany
(2) Moscow State University, Russia

Introduction



We present a simple yet effective approach for learning word sense embeddings. In contrast to existing techniques, which either directly learn sense representations from corpora or rely on sense inventories from lexical resources, our approach can induce a sense inventory from existing word embeddings via clustering of ego-networks of related words. An integrated WSD mechanism enables labeling of words in context with the learned sense vectors, which gives rise to downstream applications. Experiments show that the performance of our method is comparable to state-of-the-art unsupervised WSD systems.

Word Sense Induction

Sense embeddings are learned from word embeddings in four stages:
1. Learning word vectors from a text corpus.
2. Calculating a word similarity graph from the word vectors.
3. Word sense induction via clustering of ego-networks, yielding a sense inventory.
4. Pooling of word vectors into sense vectors according to the induced inventory.

Figure: Schema of the word sense embeddings learning method.

Figure: Visualization of the ego-network of the word "table" with the "furniture" and the "data" sense clusters.

Table: Word sense clusters from inventories derived from the Wikipedia corpus via crowdsourcing (TWSI), JoBimText (JBT) and word embeddings (w2v).
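The induction step above can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it uses tiny hand-made vectors (in practice these come from word2vec), builds the ego-network of "table" from its nearest neighbours, links neighbours whose cosine similarity exceeds a threshold (the threshold value is an assumption), and treats connected components as induced senses. The real system uses a graph clustering algorithm such as Chinese Whispers; connected components stand in for it here.

```python
import numpy as np

# Toy word vectors (hypothetical); in practice these come from word2vec.
vecs = {
    "table":       np.array([0.9, 0.1, 0.5]),
    "chair":       np.array([0.95, 0.05, 0.1]),
    "desk":        np.array([0.9, 0.1, 0.15]),
    "furniture":   np.array([0.85, 0.2, 0.1]),
    "row":         np.array([0.1, 0.9, 0.1]),
    "column":      np.array([0.05, 0.95, 0.1]),
    "spreadsheet": np.array([0.1, 0.85, 0.2]),
}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def ego_network(word, vecs, n=6, threshold=0.8):
    """Build the ego-network of `word`: its n nearest neighbours,
    connected to each other when they are similar enough. The ego word
    itself is excluded, so the graph falls apart into sense clusters."""
    neighbours = sorted((w for w in vecs if w != word),
                        key=lambda w: cos(vecs[word], vecs[w]),
                        reverse=True)[:n]
    return {w: {u for u in neighbours
                if u != w and cos(vecs[w], vecs[u]) >= threshold}
            for w in neighbours}

def clusters(edges):
    """Connected components of the ego-network = induced senses."""
    seen, out = set(), []
    for start in edges:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            w = stack.pop()
            if w in comp:
                continue
            comp.add(w)
            stack.extend(edges[w] - comp)
        seen |= comp
        out.append(comp)
    return out

senses = clusters(ego_network("table", vecs))
# With these toy vectors, "table" splits into a furniture-like
# and a data-like cluster.
```

Pooling the vectors of each cluster (e.g. averaging them) then yields one sense vector per induced sense.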

Word Sense Disambiguation with Sense Embeddings

The WSD mechanism combines:
- a context representation based on the k context word vectors;
- similarity- and probability-based disambiguation in context;
- sense vectors pooled from the word vectors of each cluster;
- filtering of the k context words.

Figure: Neighbours of the word "table" and its senses.
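The similarity-based variant of the disambiguation step can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the sense names, toy vectors, and two-dimensional space are invented, the context is represented as the mean of its word vectors, and the predicted sense is the one whose sense vector is most cosine-similar to that context.

```python
import numpy as np

# Toy vectors (hypothetical); real sense vectors are pooled from word2vec.
word_vecs = {
    "chair": np.array([0.9, 0.1]), "desk":   np.array([0.8, 0.2]),
    "row":   np.array([0.1, 0.9]), "column": np.array([0.2, 0.8]),
}
# Sense vectors: average of the word vectors in each induced cluster.
sense_vecs = {
    "table#furniture": (word_vecs["chair"] + word_vecs["desk"]) / 2,
    "table#data":      (word_vecs["row"] + word_vecs["column"]) / 2,
}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def disambiguate(context_words, sense_vecs, word_vecs):
    """Similarity-based WSD: represent the context as the mean of its
    word vectors, then pick the sense with maximal cosine similarity."""
    ctx = np.mean([word_vecs[w] for w in context_words if w in word_vecs],
                  axis=0)
    return max(sense_vecs, key=lambda s: cos(sense_vecs[s], ctx))

print(disambiguate(["row", "column"], sense_vecs, word_vecs))
# → table#data
```

The probability-based variant mentioned on the poster would replace the argmax over raw similarities with a probabilistic score; filtering the k context words keeps only the most sense-discriminative ones before averaging.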

Results: WSD Evaluation on the TWSI and the SemEval 2013 Task 13 Datasets

Performance of our method trained on the Wikipedia corpus on the full (left) and the sense-balanced (right) TWSI dataset.

The best configurations of our method on the SemEval 2013 Task 13 dataset. All systems were trained on the ukWaC corpus.

Code & Data: https://github.com/tudarmstadt-lt/sensegram
Data & Code: http://github.com/cental/stc
