Enhancing Expert Search through Query Modeling

Pavel Serdyukov1, Sergey Chernov2, and Wolfgang Nejdl2

1 Database Group, University of Twente, PO Box 217, 7500 AE Enschede, The Netherlands
2 L3S / University of Hannover, Appelstr. 9a, D-30167 Hannover, Germany
[email protected], {chernov, nejdl}@l3s.de

Abstract. Expert finding is a very common task among enterprise search activities, yet its typical retrieval performance is far below the quality of Web search. Query modeling helps to improve traditional document retrieval, so we propose to apply it in this new setting. We adopt a general language modeling framework for expert finding and show how expert language models can be used for advanced query modeling. A preliminary experimental evaluation on the TREC Enterprise Track 2006 collection shows that our method improves retrieval precision on the expert finding task.

1 The Expert Finding Task

The emerging field of Enterprise Search poses new challenges for the information retrieval research community [2]. The diversity of complex information needs within a typical enterprise, together with the heterogeneity of Intranet data, makes it difficult to improve the quality of search in general. Instead, researchers concentrate on several important search tasks. One important example of such a task is finding a relevant expert within an organization. In this setting, a user needs to find the most knowledgeable expert to answer her query personally. The user submits several keywords to a local Intranet search engine and receives a list of candidates, ranked by their likelihood of being an expert for the query. Current developments in expert search are driven by the Expert Finding task within the TREC 2006 Enterprise Track initiative (footnote 3).

So far, one of the most comprehensive descriptions of the problem and of possible solutions using the language modeling approach is presented in [1]. We also adopt a theoretically sound language modeling method, while using different techniques for model estimation and ranking. Numerous ad-hoc query expansion methods and language-model-based query modeling methods operate on the top-k ranked documents. However, these methods have not yet been applied to the expert finding task, which in our opinion is an omission. Our algorithm performs query modeling that consists of pseudo-relevance feedback and query expansion. To the best of our knowledge, this is the first study of query modeling applied to the expert search task. The preliminary evaluation on the official TREC Enterprise Track 2006 test collection shows that our method improves retrieval performance.

3 http://www.ins.cwi.nl/projects/trec-ent/wiki/index.php

2 Expert Finding as a 2-Step Process

A comprehensive description of a language modeling approach to the expert finding task is presented in [1]. We adopt the notation from that work and omit some details of model estimation; the interested reader can refer to the original paper. Step 1 of our expert finding method is similar to the Model 1 approach from [1], while Step 2 contains the actual refinement and is the core of our proposal.

2.1 Step 1: Using Language Model for Expert Ranking

The basic idea of language modeling is to estimate a language model for each expert, and then to rank experts by the cross-entropy of the estimated query model w.r.t. the expert language model [3]. In our setup, each document d in the collection is associated with a candidate ca; the association is denoted a(d, ca). Following the probability ranking principle in IR, the expert finding problem is rephrased as: "What is the probability of a candidate ca being an expert given the query q?" Each candidate ca is represented by a multinomial probability distribution p(t|ca) over a term vocabulary. The expert language model θca is computed as the maximum likelihood estimate of the term generation probability, smoothed by the background language model. The query q is also represented by a probability distribution p(t|q), and the query language model is denoted θq. The system output should therefore contain the ranking of candidates in descending order of cross-entropy between the language models θq and θca. The cross-entropy of the query model w.r.t. the expert model is computed as shown in Eq. 1:

\[
\mathrm{ExpertScore}_{ca}(q) = -\sum_{t \in q} p(t \mid \theta_q) \log p(t \mid \theta_{ca}) \tag{1}
\]
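To make Eq. 1 concrete, the following is a minimal sketch, not the authors' implementation. It assumes Jelinek-Mercer interpolation with a hypothetical weight `mu` for the background smoothing (the text does not name the smoothing scheme), a maximum likelihood query model, and hypothetical function names throughout. Because log p(t|θca) ≤ 0, the value of Eq. 1 as written is non-negative and shrinks as the expert model assigns more probability to the query terms; the sketch returns the raw value and leaves the sort direction to the caller rather than guessing at the intended convention.

```python
import math
from collections import Counter


def expert_language_model(docs, background, mu=0.5):
    """Return a function term -> p(t | theta_ca) for one candidate.

    docs: list of token lists, the documents associated with the candidate.
    background: dict term -> p(t | collection).
    mu: weight of the background model (assumed Jelinek-Mercer smoothing).
    """
    counts = Counter(t for doc in docs for t in doc)
    total = sum(counts.values())

    def p(term):
        ml = counts[term] / total if total else 0.0
        # A tiny floor keeps the probability positive for unseen terms.
        return (1 - mu) * ml + mu * background.get(term, 1e-9)

    return p


def query_model(query_tokens):
    """Maximum likelihood estimate of p(t | theta_q)."""
    counts = Counter(query_tokens)
    n = len(query_tokens)
    return {t: c / n for t, c in counts.items()}


def expert_score(theta_q, p_ca):
    """Eq. 1 as written: -sum_t p(t | theta_q) * log p(t | theta_ca)."""
    return -sum(p_q * math.log(p_ca(t)) for t, p_q in theta_q.items())
```

A caller would build one model per candidate from the documents linked to it through a(d, ca), score every candidate against the same θq, and keep the k best for Step 2.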

The top-k experts with the highest scores are returned to the system (not to the user) as the result of Step 1, where k is set empirically. So far we have described the state-of-the-art approach; Step 2 contains our enhancement for expert search.

2.2 Step 2: Expert Ranking Refinement Using Query Modeling

In order to model a user query more precisely, we need a source of additional knowledge about her information need. Traditionally, the top-k documents retrieved for the query have served as such a source in IR and have been used to build an expanded and more detailed query model. Expert search, however, differs noticeably from standard document retrieval. Users search not for specific pieces of information, but for the people who generate and/or collect that information. This means that even when the query is very specific, the experts on its topic may have expertise in related topics too. Moreover, the broader their expertise, the higher the chance that they can advise on a more specific question. Therefore, we need to utilize two sources of evidence about the user's information need in the context of the expert finding task:

1. The top-k documents retrieved from the whole collection (using the classic LM approach to document retrieval).
2. The top-k persons whom we can consider relevant experts (retrieved in Step 1).

The first source enriches our knowledge about the initial user information need, whereas the second makes it less specific and relaxes the query towards a broader topic. So, as the new query model we use a mixture of two query models: a document-based one (DocumentBasedNewθq) built on the top-k documents and an expert-based one (ExpertBasedNewθq) built on the top-k experts:

\[
p(t \mid \mathrm{New}\theta_q) = \lambda\, p(t \mid \mathrm{DocumentBasedNew}\theta_q) + (1 - \lambda)\, p(t \mid \mathrm{ExpertBasedNew}\theta_q) \tag{2}
\]

To estimate both query models, instead of the methods proposed in [1], we use the principled and theoretically sound method by Zhai and Lafferty [3], which outperformed other similar algorithms in our previous experiments on distributed IR. Once the new query model is computed, we mix it with the initial query to prevent topic drift. As a result, we build a new expert ranking using the expanded query and the term generation probabilities. In Eq. 3 we again measure the cross-entropy, but using the new query model:

\[
\mathrm{NewExpertScore}_{ca}(q) = -\sum_{t \in q} p(t \mid \mathrm{New}\theta_q) \log p(t \mid \theta_{ca}) \tag{3}
\]
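The mixture in Eq. 2 and the rescoring in Eq. 3 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the two feedback models are taken as already estimated (the method of [3] is not reproduced here), the weight `alpha` used to mix the new model with the initial query is hypothetical because the text gives no value, and the sum in Eq. 3 is read as ranging over the terms with non-zero probability under the new query model. All function names are hypothetical.

```python
import math


def mix_models(doc_based, expert_based, lam):
    """Eq. 2: lam * document-based model + (1 - lam) * expert-based model.

    Both inputs are dicts term -> probability.
    """
    terms = set(doc_based) | set(expert_based)
    return {
        t: lam * doc_based.get(t, 0.0) + (1 - lam) * expert_based.get(t, 0.0)
        for t in terms
    }


def anchor_to_query(new_model, initial_query_model, alpha):
    """Mix the new query model with the initial query to limit topic drift.

    alpha is an assumed interpolation weight on the initial query.
    """
    terms = set(new_model) | set(initial_query_model)
    return {
        t: alpha * initial_query_model.get(t, 0.0)
        + (1 - alpha) * new_model.get(t, 0.0)
        for t in terms
    }


def new_expert_score(new_theta_q, p_ca):
    """Eq. 3 as written, summed over terms with non-zero query probability.

    p_ca: function term -> smoothed p(t | theta_ca), strictly positive.
    """
    return -sum(
        p_q * math.log(p_ca(t)) for t, p_q in new_theta_q.items() if p_q > 0
    )
```

Since both inputs to each mixture are probability distributions and the weights sum to one, the result is again a distribution, so the scores of different candidates remain comparable.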

3 Preliminary Results and Discussion

In our experiments we used the W3C collection, provided by the TREC 2006 Enterprise Track, and the Lucene (footnote 4) open source information retrieval library. We indexed the mailing lists of the W3C dataset (footnote 5) and searched using the Title part of the official topics of the Expert Finding task 2006. The comparison between the precision at the first 10 results (P@10) of the baseline method (Step 1 only) and of our method (Step 1 and Step 2) is presented in Fig. 1 and Fig. 2.

We observe that the improvement achieved by our method is promising, although not significant in the current experiment. Our method is effective when the average precision is already high at Step 1, and fails where the average precision is below the median. This is explainable, since our method uses the top-k experts and documents from Step 1 for the subsequent query modeling. If the initial ranking is poor, the query modeling is poor too. However, the precision for the best queries was improved by 10-20%, so the method is suitable for application on top of already effective retrieval systems. It appears that prediction of query performance could be crucial for query modeling. Further study of expert-search-specific query modeling and of query performance prediction is the main focus of our future research.

4 http://lucene.apache.org/
5 For a rapid experimental setup we used only the mailing list part; we plan to evaluate our method on the whole collection later.
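For reference, the measure plotted in the figures can be computed as in the sketch below. P@10 is the standard fraction of relevant candidates among the first ten results; the per-query difference corresponds to what Fig. 2 reports. The function names and the data layout (ranked lists and relevance sets keyed by query identifier) are assumptions made for illustration.

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the first k ranked candidates that are relevant."""
    return sum(1 for c in ranked[:k] if c in relevant) / k


def per_query_difference(baseline_runs, refined_runs, qrels, k=10):
    """P@k of the refined run minus P@k of the baseline, per query.

    baseline_runs, refined_runs: dict query_id -> ranked list of candidates.
    qrels: dict query_id -> set of relevant candidates.
    """
    return {
        q: precision_at_k(refined_runs[q], qrels[q], k)
        - precision_at_k(baseline_runs[q], qrels[q], k)
        for q in baseline_runs
    }
```

A positive difference for a query means the Step 2 refinement placed more relevant experts in the top ten than the Step 1 ranking did.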

[Figure 1 plot: "Performance comparison between the baseline language modeling method and query modeling approach". Series: Baseline, Query Modeling. Y-axis: Precision at 10 (P@10), 0 to 1. X-axis: Queries.]

Fig. 1. Performance of the baseline language modeling ranking and query modeling approach.

[Figure 2 plot: "Performance comparison between the baseline language modeling method and query modeling approach". Series: Query Modeling w.r.t. baseline. Y-axis: Precision at 10 (P@10), -0.15 to 0.25. X-axis: Queries.]

Fig. 2. Difference in performance of the baseline language modeling ranking and query modeling approach.

References

1. K. Balog, L. Azzopardi, and M. de Rijke. Formal models for expert finding in enterprise corpora. In SIGIR '06: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, USA, pages 43–50. ACM Press, 2006.
2. D. Hawking. Challenges in enterprise search. In Proceedings of the Australasian Database Conference ADC2004, pages 15–26, Dunedin, New Zealand, 2004.
3. C. Zhai and J. D. Lafferty. Model-based feedback in the language modeling approach to information retrieval. In CIKM '01: Proceedings of the 2001 ACM CIKM International Conference on Information and Knowledge Management, Atlanta, Georgia, USA, November 5-10, 2001, pages 403–410, 2001.
