Precision-Oriented Query Facet Extraction
Weize Kong and James Allan
Center for Intelligent Information Retrieval
College of Information and Computer Sciences
University of Massachusetts Amherst

What are query facets?
Query: baggage allowance
Facet 1 ❑ AA ❑ Delta ❑ JetBlue
Facet 2 ❑ Business ❑ Economy
Facet 3 ❑ International ❑ Domestic
• A query facet is a list of terms in a semantic class
• Each facet covers one aspect (facet) of the query
• Query facets help clarify search intent
• They assist faceted search and exploratory search

Query facet extraction [Kong & Allan SIGIR’13]
Query: baggage allowance
Step 1: apply patterns to the search results to obtain candidate facets
1. Delta, Facebook, Login
2. AA, Delta, British Airways
3. JetBlue, first, business, economy
…
Step 2: refine the candidate facets into query facets
1. AA, Delta, JetBlue, …
2. international, domestic
3. weight, size, quantity
4. business, economy
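Step 1 can be sketched as a regular expression that pulls comma-separated lists out of search-result text. This is a minimal sketch under stated assumptions. It uses a single hypothetical lexical pattern of the form "item, item, and/or item" with single-word items. The names `ITEM`, `LIST_PATTERN` and `extract_candidates` are illustrative and are not the pattern set of the SIGIR’13 model. Step 2 (refinement) is not shown. Candidates produced this way are noisy, as in candidate 3 on the slide.

```python
import re

# Hypothetical lexical pattern: single-word items separated by commas, with
# the last item introduced by "and" or "or", e.g. "AA, Delta, and JetBlue".
ITEM = r"[A-Za-z][\w-]*"
LIST_PATTERN = re.compile(rf"\b({ITEM}(?:, {ITEM})+),? (?:and|or) ({ITEM})")


def extract_candidates(snippets):
    """Return one candidate facet (a list of lowercased terms) per list match."""
    candidates = []
    for text in snippets:
        for match in LIST_PATTERN.finditer(text):
            items = match.group(1).split(", ") + [match.group(2)]
            candidates.append([item.lower() for item in items])
    return candidates


snippets = [
    "Carriers such as AA, Delta, and JetBlue publish their rules.",
    "Fees differ for business, economy or first class.",
]
print(extract_candidates(snippets))
# → [['aa', 'delta', 'jetblue'], ['business', 'economy', 'first']]
```

The second candidate keeps "first" and drops "class". A purely lexical pattern cannot tell these apart, which is why a refinement step is needed afterwards.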

Faceted search
• Facets are not available for the web

Using query facets to extend faceted search to the web [Kong & Allan CIKM’14]
Query: baggage allowance
Search results:
– American Airlines Baggage Allowance Information (www.aa.com/i18n/.../baggage/baggageAllowance.jsp)
– Airline baggage allowance information from netflights (www.netflights.com)
– Delta Baggage | Baggage Fees | Delta Air Lines (www.delta.com/content/www/en_US/.../baggage.html)
– United Airlines - Baggage Information | Baggage Policy (www.gsa.gov)
⋮
Users select terms in the query facets (☒ = selected), and matching results are re-ranked to the top:
Facet 1 ❑ AA ☒ Delta ❑ JetBlue
Facet 2 ☒ International ❑ Domestic
Facet 3 ❑ Weight ❑ Size ❑ Quantity
Facet 4 ❑ Business ☒ Economy
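The "re-rank to the top" step can be illustrated with a simple count-based boost. This is an assumption for illustration only and is not the ranking model of the CIKM’14 work. In the sketch, results that contain more of the user-selected facet terms move up, and ties keep the engine's original order. The function `rerank` and the snippet texts are hypothetical. Only single-word terms are matched.

```python
import re


def rerank(results, selected_terms):
    """Move results that mention more selected facet terms to the top.

    `results` is a list of (title, snippet) pairs in the engine's original
    order. Python's sort is stable (also with reverse=True), so results with
    the same number of matches keep their original relative order.
    """
    wanted = {term.lower() for term in selected_terms}

    def match_count(result):
        title, snippet = result
        tokens = set(re.findall(r"\w+", f"{title} {snippet}".lower()))
        return len(wanted & tokens)

    return sorted(results, key=match_count, reverse=True)


results = [
    ("American Airlines Baggage Allowance Information", "Checked bag rules for AA flights"),
    ("Airline baggage allowance information from netflights", "Compare carriers"),
    ("Delta Baggage | Baggage Fees | Delta Air Lines", "International economy fares include one bag"),
]
for title, _ in rerank(results, ["Delta", "International", "Economy"]):
    print(title)
```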

Precision-oriented scenarios
Ideal:
Facet 1 ❑ AA ❑ Delta ❑ JetBlue
Facet 2 ❑ International ❑ Domestic
Facet 3 ❑ Weight ❑ Size ❑ Quantity
Facet 4 ❑ Business ❑ Economy
High “recall”:
Facet 1 ❑ Delta ❑ Economy ❑ AA ❑ JetBlue
Facet 2 ❑ Boarding ❑ Lounges
Facet 3 ❑ International ❑ Domestic ❑ Business
Facet 4 ❑ Quantity ❑ Weight ❑ Size
High “precision” (users would prefer this):
Facet 1 ❑ AA ❑ Delta
Facet 2 ❑ Weight ❑ Size
Facet 3 ❑ Business ❑ Economy ❑ Lounges
Users care more about the correctness of the presented facets than about their completeness.

Previous models don’t work well in precision-oriented scenarios: low precision
[Figure: bar chart of term precision, term recall and term clustering F1 (y-axis 0 to 0.7) for LDA, QDM, QFI and QFJ; one bar is labeled 0.4450]

Overview of this work
• Improve our previous extraction model under precision-oriented scenarios
– Likelihood is a bad training objective
– Directly optimize the performance measure
• Selective query faceting
– Avoid showing facets for poorly performing queries, e.g. “self motivation”
– Only trigger faceting for well-performing ones, e.g. “used cars”
– Predict extraction performance
• Improve evaluation measures (not included in this talk)

Optimize the performance measure

Evaluation measure
• Compare the extracted facets with human-created facets
Extracted facets:
Facet 1 ❑ International ❑ Domestic ❑ Business
Facet 2 ❑ AA ❑ Delta ❑ Twitter
Annotator facets (ground truth):
Facet 1 ❑ AA ❑ Delta ❑ JetBlue
Facet 2 ❑ International ❑ Domestic
Facet 3 ❑ Business ❑ Economy
• Measures: how to measure the similarity between the two
– Term classification
– Term clustering
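The two views, term classification and term clustering, can be made concrete on the example above. Here the extracted facets are read as two lists: one with International, Domestic and Business, and one with AA, Delta and Twitter. Term precision and term recall treat every extracted term as a yes/no decision. Pair F1 counts the term pairs that share a facet. This is a minimal sketch, and the names `same_facet_pairs` and `term_measures` are hypothetical. It also makes one labeled assumption: clustering quality is judged only on terms that appear in both the extracted facets and the ground truth.

```python
from itertools import combinations


def same_facet_pairs(facets, terms):
    """Unordered term pairs that share a facet, restricted to `terms`."""
    pairs = set()
    for facet in facets:
        kept = sorted(t for t in facet if t in terms)
        pairs.update(combinations(kept, 2))
    return pairs


def term_measures(extracted, truth):
    """Return (term precision, term recall, pair F1) for two lists of facets."""
    ext_terms = {t for facet in extracted for t in facet}
    true_terms = {t for facet in truth for t in facet}
    correct = ext_terms & true_terms
    tp = len(correct) / len(ext_terms) if ext_terms else 0.0
    tr = len(correct) / len(true_terms) if true_terms else 0.0

    # Assumption: clustering is judged only on correctly extracted terms.
    ext_pairs = same_facet_pairs(extracted, correct)
    true_pairs = same_facet_pairs(truth, correct)
    hits = len(ext_pairs & true_pairs)
    if hits == 0:
        return tp, tr, 0.0
    pair_p = hits / len(ext_pairs)
    pair_r = hits / len(true_pairs)
    return tp, tr, 2 * pair_p * pair_r / (pair_p + pair_r)


extracted = [["International", "Domestic", "Business"], ["AA", "Delta", "Twitter"]]
truth = [["AA", "Delta", "JetBlue"], ["International", "Domestic"], ["Business", "Economy"]]
print(term_measures(extracted, truth))  # TP = 5/6, TR = 5/7, PF = 2/3
```

Twitter lowers term precision and the missing JetBlue and Economy lower term recall. Grouping Business with International and Domestic adds two wrong pairs, which halves pair precision.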

PRF_{α,β}
• Combine three factors
– Term precision (TP)
– Term recall (TR)
– Pair F1 (PF, the term clustering F-measure)
• Using a weighted harmonic mean:
$$\mathrm{PRF}_{\alpha,\beta} = \frac{\alpha^2 + \beta^2 + 1}{\dfrac{\alpha^2}{TP} + \dfrac{\beta^2}{TR} + \dfrac{1}{PF}}$$
• α and β adjust the emphasis between the factors. Holding α = 1:
– β = 1: equal importance
– β = ½: TR is ½ as important as TP and PF
– β = ⅓: TR is ⅓ as important as TP and PF
⋮
[Rijsbergen 1979]
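The weighted harmonic mean translates directly into a short function. This is a minimal sketch, and the name `prf` is hypothetical. The usage inputs are TP, TR and PF values computed by hand for the "high precision" and "high recall" example facets in the precision-oriented scenarios slide. That calculation assumes pair F1 is judged only on correctly extracted terms. Under that assumption, the recall-heavy output scores higher at β = 1. The precision-heavy output wins once term recall is down-weighted to β = ½.

```python
def prf(tp, tr, pf, alpha=1.0, beta=1.0):
    """Weighted harmonic mean of term precision, term recall and pair F1.

    Weights are alpha**2 for TP, beta**2 for TR and 1 for PF, i.e.
    PRF = (alpha^2 + beta^2 + 1) / (alpha^2/TP + beta^2/TR + 1/PF).
    """
    weighted = [(alpha ** 2, tp), (beta ** 2, tr), (1.0, pf)]
    # A zero-valued factor with positive weight drives the harmonic mean to 0.
    if any(w > 0 and v <= 0 for w, v in weighted):
        return 0.0
    numerator = sum(w for w, _ in weighted)
    denominator = sum(w / v for w, v in weighted if w > 0)
    return numerator / denominator


high_precision = (6 / 7, 0.6, 1.0)   # 7 terms extracted, 6 correct, clusters clean
high_recall = (5 / 6, 1.0, 0.7)      # 12 terms extracted, all 10 true terms found
for beta in (1.0, 0.5):
    print(beta, prf(*high_precision, beta=beta), prf(*high_recall, beta=beta))
```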

Optimize PRF_{α,β} directly
• Use the performance measure PRF_{α,β} as the training objective of the query faceting model (empirical utility maximization, EUM):
$$u(\theta) = \sum_{(Y^*, Z^*)} \mathrm{PRF}_{\alpha,\beta}(Y^*, Z^*; \theta)$$
where the sum runs over the annotated labels (Y^*, Z^*) of the training queries.

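Optimize PRF_{α,β} directly
• But it is difficult. Predictions are thresholded,
$$\hat{y}_i = \mathbb{1}\left[P(y_i = 1) > \lambda\right],$$
so PRF_{α,β} is non-continuous and non-differentiable in θ.
• Solution: approximate it by its expectation
$$\tilde{y}_i = E[y_i] = P(y_i = 1; \theta)$$
$$\widetilde{\mathrm{PRF}}_{\alpha,\beta} = E[\mathrm{PRF}_{\alpha,\beta}] \approx \mathrm{PRF}_{\alpha,\beta}(\tilde{Y}, \tilde{Z})$$
(the approximation relies on an independence assumption)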
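The expectation trick can be sketched by feeding probabilities instead of 0/1 labels into the measure. In that case each ratio of expected counts stands in for the expected ratio. The sketch makes two simplifying assumptions. First, TP, TR and pair F1 are computed over candidate terms and candidate term pairs with per-item probabilities `p_term` and `q_pair`. Second, pairs are scored independently of their terms. All names here are hypothetical, and this is not the QFJ training code. The comparison shows the practical difference. A small change in one probability moves the expected PRF. The thresholded PRF does not change, because every hard label stays the same.

```python
def _ratio(num, den):
    return num / den if den > 0 else 0.0


def soft_measures(p_term, y_true, q_pair, z_true):
    """TP, TR and pair F1 computed from expected counts instead of hard labels.

    p_term[i] = P(y_i = 1; theta) for candidate term i, y_true[i] in {0, 1};
    q_pair[k] = P(z_k = 1; theta) for candidate term pair k, z_true[k] in {0, 1}.
    """
    term_hits = sum(p * y for p, y in zip(p_term, y_true))
    tp = _ratio(term_hits, sum(p_term))
    tr = _ratio(term_hits, sum(y_true))
    pair_hits = sum(q * z for q, z in zip(q_pair, z_true))
    pair_p = _ratio(pair_hits, sum(q_pair))
    pair_r = _ratio(pair_hits, sum(z_true))
    pf = _ratio(2 * pair_p * pair_r, pair_p + pair_r)
    return tp, tr, pf


def prf(tp, tr, pf, alpha=1.0, beta=1.0):
    if min(tp, tr, pf) <= 0:  # any zero factor makes the harmonic mean 0
        return 0.0
    return (alpha**2 + beta**2 + 1) / (alpha**2 / tp + beta**2 / tr + 1 / pf)


def expected_prf(p_term, y_true, q_pair, z_true, alpha=1.0, beta=1.0):
    """Smooth surrogate: plug the probabilities straight into the measure."""
    return prf(*soft_measures(p_term, y_true, q_pair, z_true), alpha, beta)


def thresholded_prf(p_term, y_true, q_pair, z_true, lam=0.5, alpha=1.0, beta=1.0):
    """Original target: hard labels 1[P > lam], piecewise constant in theta."""
    y_hat = [1.0 if p > lam else 0.0 for p in p_term]
    z_hat = [1.0 if q > lam else 0.0 for q in q_pair]
    return prf(*soft_measures(y_hat, y_true, z_hat, z_true), alpha, beta)


p, y = [0.9, 0.8, 0.3, 0.6], [1, 1, 0, 0]
q, z = [0.7, 0.2, 0.4], [1, 0, 1]
print(expected_prf(p, y, q, z), thresholded_prf(p, y, q, z))
```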

Utility is a better learning objective than likelihood for precision-oriented scenarios
Compare EUM & MLE
• EUM: trained by optimizing PRF_{1,0.5}
• MLE: trained by optimizing likelihood
• Both for the QFJ model
[Figure: PRF_{1,β} (y-axis, 0.33 to 0.57) for EUM and MLE as 1/β grows from 0 to 10 (x-axis), i.e. as term recall is down-weighted. †: significant (p < 0.05) over the MLE baseline.]

Selective query faceting
• Only trigger faceting for well-performing queries

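Predicting extraction performance
• Predict PRF_{α,β} based on its expectation
• Trigger faceting only when the predicted PRF exceeds a threshold
[Figure: real PRF (y-axis, 0 to 1) versus predicted PRF (x-axis, 0 to 1), with the threshold marked; feature: expected PRF. Results based on 10-fold cross validation.]
Correlation: 0.6112 (p-value: 1.4 × 10⁻¹¹)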


Performance for the selected queries
Selective query faceting can improve average performance while still covering a fair share of the search traffic.
• PRF_{1,1} = 0.5792 when 20 queries are selected
• PRF_{1,1} = 0.4720 without selective faceting
[Figure: the gray area indicates the standard error (95% confidence intervals).]
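A curve like the one on this slide can be sketched by ranking queries by predicted PRF and averaging the real PRF over the top k for every k. The last point is then the average over all queries, i.e. no selective faceting. The function name `performance_by_coverage` and the three toy queries are hypothetical. The sketch does not reproduce the slide's numbers.

```python
def performance_by_coverage(predicted, real):
    """Mean real PRF over the k queries with the highest predicted PRF, for k = 1..n.

    The last entry is the average over all queries, i.e. no selective faceting.
    """
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    curve, total = [], 0.0
    for k, i in enumerate(order, start=1):
        total += real[i]
        curve.append(total / k)
    return curve


print(performance_by_coverage([0.9, 0.1, 0.5], [0.8, 0.2, 0.5]))
```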

Conclusions
• Precision-oriented scenarios
• Use a utility objective instead of likelihood
• Expectation-based approximation is effective
• Selective query faceting can be useful

Future work
• Label query facets, e.g. Facet 1 (❑ AA ❑ Delta ❑ JetBlue) → “Airline”
• Rank/select facets and facet terms
– Critical for mobile search (smaller screens)
• Use query facets for exploratory search
– Recall-oriented?
– How to set up the task and evaluate it?

Thanks! Demo to play with =)
http://brooloo.cs.umass.edu
