Entity Linking and Retrieval for Semantic Search

Edgar Meij – @edgarmeij

Yahoo Labs

Krisztian Balog – @krisztianbalog

University of Stavanger

Daan Odijk – @dodijk

University of Amsterdam

Entity?

- Uniquely identifiable thing or object

- “A thing with a distinct and independent existence”

What’s so special about entities?

- ID

- Name(s)

- Type(s)

- Attributes (/Descriptions)

- Relationships to other entities
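As a rough illustration of the properties listed above, an entity record might be modelled as follows. This is a minimal sketch: the class name `Entity`, its field names, and the `ex:` identifiers are hypothetical and not taken from any particular knowledge base.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entity:
    """Hypothetical entity record mirroring the five properties on the slide."""
    id: str                                                   # unique identifier
    names: List[str] = field(default_factory=list)            # surface forms / aliases
    types: List[str] = field(default_factory=list)            # e.g. "Tower", "Landmark"
    attributes: Dict[str, str] = field(default_factory=dict)  # literal-valued properties
    # Each relationship is a (predicate, target entity id) pair.
    relationships: List[Tuple[str, str]] = field(default_factory=list)


# Illustrative instance; identifiers and predicate names are made up.
eiffel = Entity(
    id="ex:Eiffel_Tower",
    names=["Eiffel Tower", "Tour Eiffel"],
    types=["Tower", "Landmark"],
    attributes={"opened": "1889"},
    relationships=[("locatedIn", "ex:Paris")],
)
```

Keeping names as a list rather than a single string reflects that one entity can be referred to in several ways, which is what entity linking has to resolve.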

Knowledge graphs

- The “backbone” of semantic search

- They define
  - entities
  - attributes
  - types
  - relations
  - (provenance, sometimes)

- and more
  - external links, homepages, features, …
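One common way to hold what a knowledge graph defines is as subject-predicate-object triples. The sketch below is illustrative only: the `ex:` identifiers, the predicate names, and the helper `objects` are assumptions made up for this example, not the schema of any specific knowledge base.

```python
from collections import defaultdict

# Toy triples: types, a name, and relations between two entities.
triples = [
    ("ex:Eiffel_Tower", "rdf:type", "ex:Tower"),
    ("ex:Eiffel_Tower", "rdfs:label", "Eiffel Tower"),
    ("ex:Eiffel_Tower", "ex:locatedIn", "ex:Paris"),
    ("ex:Paris", "rdf:type", "ex:City"),
    ("ex:Paris", "ex:capitalOf", "ex:France"),
]

# Index triples as subject -> predicate -> list of objects.
index = defaultdict(lambda: defaultdict(list))
for s, p, o in triples:
    index[s][p].append(o)


def objects(subject, predicate):
    """Return all objects for (subject, predicate); empty list if unknown."""
    return list(index.get(subject, {}).get(predicate, []))
```

For example, `objects("ex:Eiffel_Tower", "ex:locatedIn")` looks up the relation from the tower to the city, and an unknown subject simply yields an empty list.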

Entity Linking?

[Figure: a news story that has been automatically augmented with links to relevant Wikipedia articles. Image taken from Milne and Witten (2008b), Learning to Link with Wikipedia, in CIKM '08.]

Entity Linking/Retrieval?

Entity Retrieval?

Semantic search

- Improve search accuracy by understanding searcher intent and the contextual meaning of terms/documents/…

- Move beyond “ten blue links” (towards actually answering information needs) using rich context

‘hilton paris’

‘countries in africa’

‘good camera under $300’

Semantic search

- Centers around entities
  - “Who was the first human in outer space?”
  - “How tall is the Eiffel tower?”
  - “Who is Brad Pitt married to?”
  - “Where is the closest Starbucks?”
  - “Which airlines fly the Airbus A380?”
  - “What is the best Chinese restaurant in Montreal?”

- Entity/Attribute/Relationship retrieval
  - + social, + personal
  - + (hyper)local

Semantic search

- Combination of entity-related techniques, from various fields
  - IR
  - NLP
  - DB
  - Semantic Web

IR

[Figure: documents (shown as placeholder text) that mention an entity e, matched against a query q.]

\[
P(q \mid \theta_e) = \prod_{t \in q} P(t \mid \theta_e)^{n(t,q)}
\]

\[
P(t \mid \theta_e) = (1 - \lambda)\, P(t \mid e) + \lambda\, P(t)
\]

\[
P(t \mid e) = \sum_{d} P(t \mid d, e)\, P(d \mid e)
\]
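The query-likelihood scoring that this slide illustrates can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the tutorial's implementation: the smoothing weight (called `lam` here, with an arbitrary default of 0.1) is assumed, P(t|d,e) is approximated by the relative frequency of t in d, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def term_prob_given_entity(term, docs, assoc):
    """P(t|e) = sum over d of P(t|d,e) * P(d|e).

    Assumption: P(t|d,e) is the relative frequency of `term` in document d.
    `assoc` maps document ids to P(d|e) for one entity.
    """
    total = 0.0
    for doc_id, tokens in docs.items():
        p_d_e = assoc.get(doc_id, 0.0)
        if p_d_e == 0.0 or not tokens:
            continue
        total += (Counter(tokens)[term] / len(tokens)) * p_d_e
    return total


def query_log_likelihood(query, docs, assoc, collection_prob, lam=0.1):
    """log P(q|theta_e), with P(t|theta_e) = (1-lam)*P(t|e) + lam*P(t)."""
    score = 0.0
    for term, n_tq in Counter(query).items():
        p = ((1 - lam) * term_prob_given_entity(term, docs, assoc)
             + lam * collection_prob.get(term, 0.0))
        if p == 0.0:
            return float("-inf")  # term unseen everywhere
        score += n_tq * math.log(p)
    return score


# Toy data (made up): two documents, each fully associated with one entity.
docs = {
    "d1": ["eiffel", "tower", "paris"],
    "d2": ["paris", "hilton", "hotel"],
}
all_tokens = [t for tokens in docs.values() for t in tokens]
collection_prob = {t: c / len(all_tokens) for t, c in Counter(all_tokens).items()}

tower_score = query_log_likelihood(["eiffel", "tower"], docs, {"d1": 1.0}, collection_prob)
hotel_score = query_log_likelihood(["eiffel", "tower"], docs, {"d2": 1.0}, collection_prob)
```

Working in log space avoids underflow for longer queries; ranking entities by `query_log_likelihood` gives the same order as ranking by the product itself.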

NLP

- Question answering, relationship extraction

Databases

Semantic Web

Birds-eye view

- Information need: many ways to express (keyword, keyword++, natural language, structured query languages)

- Data collection(s): different types of data (unstructured, semistructured, structured)

- Retrieval system

- Result(s): result format (ranked list, tuples, (sub)graphs, natural language)

The possibilities are endless

- Fields: IR, DB, NLP, SW

- Query: keyword, keyword++, structured query language, natural language

- Data collection: unstructured, semistructured, structured

- Results: ranked list, tuples, (sub)graphs, natural language

Our focus

- Query: keyword, keyword++, structured query language, natural language

- Data collection: unstructured, semistructured, structured

- Results: ranked list, tuples, (sub)graphs, natural language

Data collection

- Unstructured: documents, web pages, snippets, …

- Semistructured: XML, RDF, …

- Structured: relational DBs, RDF, …

Often organized around entities

Popular (semi)structured data sources

- Wikipedia

- Wikidata

- DBpedia

- Freebase

- YAGO

Linking Open Data (LOD)?

RDFa

- schema.org, sitemaps.org

- used by Google, Bing, Yandex, Yahoo!, IPTC, etc.

Queries

- Keyword queries
  - Single-search-box paradigm
  - Typical web search queries
  - “Telegraphic”, i.e., neither well-formed nor grammatically correct

- Keyword++ queries
  - Augmented with context
  - form/facet-based input
  - location/date/time of day (TOD)/…
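To make the distinction concrete, a keyword++ query can be thought of as a keyword query plus explicit context fields. The sketch below is hypothetical: the class `KeywordPlusPlusQuery` and its field names (`facets`, `location`, `date`) are invented for illustration and are not an API from the tutorial.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class KeywordPlusPlusQuery:
    """Hypothetical container: free-text keywords plus optional context."""
    keywords: str
    facets: Dict[str, str] = field(default_factory=dict)  # form/facet-based input
    location: Optional[str] = None
    date: Optional[str] = None

    def is_plain_keyword(self) -> bool:
        """True when no extra context accompanies the keywords."""
        return not self.facets and self.location is None and self.date is None


# A plain keyword query versus the same need expressed with added context.
plain = KeywordPlusPlusQuery("good camera under $300")
augmented = KeywordPlusPlusQuery(
    "camera",
    facets={"max_price_usd": "300"},
    location="Montreal",
)
```

Here the price constraint moves out of the telegraphic text and into a facet, which a retrieval system could match against structured attributes instead of raw terms.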

Example keyword++ queries

Interplay: (un)structured data

- Adding structure to text (unstructured to structured)

- Adding text to structure (structured to unstructured)

Menu

- Introduction

- Part 1 – Entity Linking

- Part 2 – Entity Retrieval

- Part 3 – Semantic Search

- Wrap up


See http://ejmeij.github.io/entity-linking-and-retrieval-tutorial/ (or http://bit.ly/ELR-slides) for the slides.

Program

10:00 - 10:30  Welcome, introduction
10:30 - 11:30  Entity linking
11:30 - 12:00  Entity linking practical
12:00 - 13:00  Lunch
13:00 - 14:00  Entity retrieval
14:00 - 14:30  Entity retrieval practical
14:30 - 15:00  Coffee break
15:00 - 15:30  Semantic search
15:30 - 16:00  Wrap up, Q&A

References

http://www.mendeley.com/groups/3339761/entity-linking-and-retrieval/ (or http://bit.ly/ELR-bib)
