Unsupervised, efficient and semantic expertise retrieval
Christophe Van Gysel, Maarten de Rijke and Marcel Worring

What is expertise retrieval?
• The task of finding the right person with the appropriate skills and knowledge w.r.t. a topic. For example, an area chair looking for reviewers.
• Document collections where documents are associated with one or more experts.
• Given a textual topic (e.g., “Information Retrieval”), rank experts in descending order of expertise.

[Diagram: experts and the documents they are associated with.]
Example: an area chair looking for a review committee on “information retrieval”.
Query: “information retrieval”. Rank experts in decreasing order of expertise.

How to do this?
• Document-centric: first score documents using language models, then aggregate the scores per expert (see the sketch below).
• Profile-centric: concatenate the documents associated with each expert into a pseudo-document per expert, then perform retrieval over these pseudo-documents using language models.
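A minimal sketch of the document-centric variant, assuming hypothetical score_document (a language-model retrieval score) and document_experts (the document-expert associations) helpers supplied by the caller:

```python
from collections import defaultdict

def rank_experts_document_centric(query, documents, score_document, document_experts):
    """Score every document for the query, then aggregate document scores
    per associated expert; finally rank experts by aggregated score."""
    expert_scores = defaultdict(float)
    for doc in documents:
        doc_score = score_document(query, doc)      # language-model score for this document
        for expert in document_experts(doc):        # experts associated with this document
            expert_scores[expert] += doc_score
    return sorted(expert_scores.items(), key=lambda kv: kv[1], reverse=True)
```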

Challenges
• Queries and documents use different representations to describe the same concepts.
• Scoring the whole document collection during retrieval is costly when we are only interested in experts.
• Improve retrieval performance without requiring relevance judgments for machine-learned ranking.

How to learn representations?
• Given a query q = t_1, ..., t_k, consisting of k terms, and a set of candidate experts C.
• [Diagram: term t_i → embedding of t_i → transform and apply soft-max → distribution over experts P(C | t_i)] (a minimal code sketch follows below).
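As a rough illustration (not necessarily the authors' exact implementation), the per-term step can be written in a few lines of NumPy; the matrix names and shapes below are assumptions:

```python
import numpy as np

def softmax(x):
    x = x - x.max()               # numerical stability
    e = np.exp(x)
    return e / e.sum()

def expert_distribution_for_term(term_id, W_embed, W_out, b_out):
    """P(C | t_i): embed the term, transform it to the expert space,
    and apply a soft-max over the candidate experts.

    Assumed shapes: W_embed (|V|, d), W_out (d, |C|), b_out (|C|,)."""
    e_t = W_embed[term_id]        # embedding of t_i
    logits = e_t @ W_out + b_out  # linear transformation
    return softmax(logits)        # distribution over experts
```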

How to learn representations?
• The per-term distributions are combined into a distribution over experts for the whole query, e.g.
  P(C | “information”) · P(C | “retrieval”) ∝ P(C | “information retrieval”)
  (a sketch of one way to compute this follows below).
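A sketch of the combination step, assuming an element-wise product of the per-term distributions (computed in log space) followed by renormalisation, and reusing expert_distribution_for_term from the previous sketch:

```python
import numpy as np

def expert_distribution_for_query(term_ids, W_embed, W_out, b_out):
    """P(C | q): multiply the per-term distributions element-wise
    (summed in log space for stability) and renormalise."""
    log_p = np.zeros(b_out.shape[0])
    for term_id in term_ids:
        p_term = expert_distribution_for_term(term_id, W_embed, W_out, b_out)
        log_p += np.log(p_term + 1e-12)   # avoid log(0)
    log_p -= log_p.max()                  # numerical stability
    p = np.exp(log_p)
    return p / p.sum()
```

Scoring a query this way only touches the |C| expert entries, which is consistent with the retrieval-time complexity linear in the number of experts discussed later in the deck.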

How to learn representations?
• The embeddings and the transformation are trained using batched stochastic gradient descent.
• The word embeddings become specialised for the domain.

How to learn representations?
• [Diagram: the predicted distribution over experts is compared with a reference distribution using cross-entropy; the errors are backpropagated.]
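A hedged PyTorch sketch of one training step, assuming mean-pooling over a window's term embeddings before the transformation; the architecture details and hyperparameters are assumptions, not the authors' exact setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LogLinearExpertModel(nn.Module):
    """Word embeddings, a linear transformation to the expert space,
    and a soft-max over the candidate experts."""

    def __init__(self, vocab_size, num_experts, dim=300):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.to_experts = nn.Linear(dim, num_experts)

    def forward(self, windows):                   # windows: (batch, window_size) of term ids
        pooled = self.embed(windows).mean(dim=1)  # pool term embeddings (assumption)
        return F.log_softmax(self.to_experts(pooled), dim=-1)

def train_step(model, optimiser, windows, target_dist):
    """One batched SGD step: cross-entropy between the predicted expert
    distribution and the reference distribution derived from the
    document-expert associations, followed by backpropagation."""
    optimiser.zero_grad()
    log_p = model(windows)
    loss = -(target_dist * log_p).sum(dim=-1).mean()  # cross-entropy
    loss.backward()                                   # backpropagate errors
    optimiser.step()
    return loss.item()
```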

Experimental set-up
• Build and evaluate models on expert finding benchmarks:
  • TREC Enterprise Track (2005 - 2008):
    • W3C (715 experts, 331k docs, 99 queries)
    • CERC (3,479 experts, 370k docs, 127 queries)
  • TU Expert Collection (977 experts, 31k docs, 1,662 queries)
• Compare the log-linear model to LSI, TF-IDF and language modelling approaches (Model 1 and Model 2).

What window size to choose?
[Figure: MAP as a function of window size (1, 2, 4, 8, 16, 32) on the W3C (2005, 2006) and CERC (2007, 2008) collections.]

Results
• Outperforms language models on 4 out of 6 benchmarks.
  • 17% to 86% relative increase in MAP over state-of-the-art language models.
  • No significant difference for the other benchmarks.
• Compared to semantic matching methods (LSI):
  • Relative increase in MAP ranges from 83% to 1000%.

Per-topic difference in MAP w.r.t. document-centric language models
[Figure: per-topic ΔMAP (ranging from -1.0 to 1.0) for the W3C and CERC collections.]

What if we combine our approach with language models?
• For every topic q_i, rank experts using the log-linear model and Model 2.
• Combine these two rankings according to the reciprocal rank of each expert c_j (a sketch follows below):

  rank_ensemble(c_j, q_i) ∝ 1 / rank_model2(c_j, q_i) · 1 / rank_log-linear(c_j, q_i)
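A small sketch of the reciprocal-rank combination above; how unranked experts are handled is an assumption:

```python
def reciprocal_rank_fusion(ranking_a, ranking_b):
    """Combine two expert rankings (best expert first) by the product of
    reciprocal ranks, then sort experts by the combined score."""
    rank_a = {expert: i + 1 for i, expert in enumerate(ranking_a)}
    rank_b = {expert: i + 1 for i, expert in enumerate(ranking_b)}
    default = max(len(ranking_a), len(ranking_b)) + 1   # rank for unranked experts (assumption)
    experts = set(rank_a) | set(rank_b)
    scores = {e: 1.0 / rank_a.get(e, default) / rank_b.get(e, default)
              for e in experts}
    return sorted(experts, key=lambda e: scores[e], reverse=True)
```

For a single topic, reciprocal_rank_fusion(model2_ranking, log_linear_ranking) would then yield the ensemble ranking.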

What if we combine our approach with language models?
• The ensemble outperforms our approach alone on 5 out of 6 benchmarks.
• Combining our discriminative approach with generative language models yields a relative increase in MAP of 15% to 31%.

Can generative language models benefit from the log-linear model?
• Perform semantic query expansion using the learned word embeddings.
• For a given query (e.g. “information retrieval”), add the k terms closest to each query term in embedding space (e.g. “knowledge” and “search”).
• With k = 1, instead of querying for “information retrieval” we query for “information retrieval knowledge search” (see the sketch below).
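A sketch of the expansion step, assuming cosine similarity in the learned embedding space (the similarity measure is an assumption):

```python
import numpy as np

def expand_query(query_term_ids, W_embed, k=1):
    """Add, for each query term, the k nearest terms in embedding space."""
    unit = W_embed / np.maximum(np.linalg.norm(W_embed, axis=1, keepdims=True), 1e-12)
    expanded = list(query_term_ids)
    for term_id in query_term_ids:
        sims = unit @ unit[term_id]
        sims[term_id] = -np.inf                  # exclude the query term itself
        expanded.extend(np.argsort(-sims)[:k].tolist())
    return expanded
```

With k = 1 and the query terms for “information retrieval”, this appends the nearest neighbour of each term, matching the “information retrieval knowledge search” example above (assuming those neighbours are indeed “knowledge” and “search”).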

Can generative language models benefit from the log-linear model?
• Semantic query expansion increases MAP on most benchmarks.
• The benchmarks that did not benefit were those on which our method did not outperform language models.
• Our intuition: some benchmarks require semantic matching, while others benefit more from lexical matching.

How efficient is the log-linear model compared to generative models?
• During retrieval, time complexity is linear in the number of experts.
• Previous state-of-the-art: linear in the number of documents.

Code is available at https://github.com/cvangysel/sert

[cvangysel@ilps SERT] ./W3C-expert-finding.sh
Verifying W3C corpus.
Creating output directory.
Fetching topics and relevance judgments.
Constructing log-linear model on W3C collection.
Evaluating on TREC Enterprise tracks.
2005 Enterprise Track: ndcg=0.5474; map=0.2603; recip_rank=0.6209; P_5=0.4098;
2006 Enterprise Track: ndcg=0.7883; map=0.4937; recip_rank=0.8834; P_5=0.7000;

Conclusions
• Our log-linear model performs competitively with existing methods, while its retrieval time complexity is linear in the number of experts.
• An ensemble of the log-linear model and generative language models performs best.
• Word embeddings learned by the log-linear model can be used to improve retrieval with language models.

Thank you! Christophe Van Gysel @cvangysel

Maarten de Rijke @mdr

Marcel Worring @marcelworring

Slides will be made available on http://chri.stophr.be

