PARAPHRASING ADAPTATION FOR WEB SEARCH RANKING
Chenguang Wang (Peking University), Nan Duan (Microsoft Research Asia), Ming Zhou (Microsoft Research Asia), Ming Zhang (Peking University)

August 5, 2013

MOTIVATION
Mismatch between queries and documents is a key issue for the web search task
• Caused by expressing the same meaning in different natural language expressions
• E.g., "X is the author of Y" vs. "Y was written by X"
• E.g., "Who is the author of Gone with the Wind?" vs. "Gone with the Wind was written by whom?"

A paraphrasing engine produces alternative expressions that convey the same meaning as the input text
• Paraphrasing can be improved from different perspectives
• E.g., paraphrase extraction, paraphrase generation, model optimization

MOTIVATION (CONT.)
Q1: Can a paraphrasing engine alleviate the mismatch between a query and its relevant documents?

Q2: How should the paraphrasing engine be adapted specifically to the web search ranking task?

SOLUTION OVERVIEW
Pipeline: Raw Data → Paraphrase Extraction → Paraphrase Model → Model Optimization (on DEV data, tuning the weights in $\sum_i \lambda_i \cdot h_i(\cdot)$) → Ranking Model. The original query together with its N-best paraphrase candidates is passed to the ranking model.

Paraphrase Extraction
• Extract paraphrase pairs from various data sources

Paraphrase Model
• A search-oriented model that generates candidates for each original query

Parameter Optimization
• Optimize the weights of the features used in the paraphrasing model on development data

Ranking Model
• An enhanced ranking model using augmented features computed on paraphrases of the original queries

PARAPHRASE EXTRACTION
Bilingual-based (Bannard and Callison-Burch, 2005)
• Hypothesis: phrases that align with identical pivot phrases tend to have similar meanings
• E.g., the source-language phrases "corporation director", "company president", and "chair of firm" align with the same target (pivot) language phrases 公司 领导 / 公司 主席, so they are extracted as paraphrases

Monolingual-based (Lin and Pantel, 2001)
• Hypothesis: words/phrases that share the same context tend to have similar meanings
• E.g., "#1 is the author of #2" and "#1 is #2 's author" are extracted as a paraphrase pair because they occur with the same fillers (e.g., "who" and "A Christmas Carol")
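To make the bilingual pivoting concrete, here is a minimal Python sketch of the Bannard and Callison-Burch (2005) computation p(e2 | e1) = Σ_f p(e2 | f) · p(f | e1); the phrase tables and probabilities below are toy values, not the paper's data.

```python
from collections import defaultdict

def pivot_paraphrases(p_f_given_e, p_e_given_f):
    """Pivot-based paraphrase probabilities:
    p(e2 | e1) = sum over pivot phrases f of p(e2 | f) * p(f | e1)."""
    scores = defaultdict(lambda: defaultdict(float))
    for e1, pivots in p_f_given_e.items():
        for f, p_f in pivots.items():
            for e2, p_e2 in p_e_given_f.get(f, {}).items():
                if e2 != e1:  # skip the trivial self-paraphrase
                    scores[e1][e2] += p_f * p_e2
    return scores

# Toy phrase tables with invented probabilities (illustrative only).
p_f_given_e = {"corporation director": {"公司 领导": 0.6, "公司 主席": 0.4}}
p_e_given_f = {"公司 领导": {"chair of firm": 0.3, "corporation director": 0.7},
               "公司 主席": {"chair of firm": 0.8, "corporation director": 0.2}}
print(dict(pivot_paraphrases(p_f_given_e, p_e_given_f)["corporation director"]))
# -> roughly {'chair of firm': 0.5}  (0.6*0.3 + 0.4*0.8)
```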

SEARCH-ORIENTED PARAPHRASING MODEL
Given an original query $Q$, the model selects the best candidate $\hat{Q}$ from the hypothesis space $\mathcal{H}(Q)$ with a linear feature model:

$\hat{Q} = \arg\max_{Q' \in \mathcal{H}(Q)} P(Q' \mid Q) = \arg\max_{Q' \in \mathcal{H}(Q)} \sum_{m=1}^{M} \lambda_m \, h_m(Q, Q')$

Search-oriented features:
• Word Addition
• Word Deletion
• Word Overlap
• Word Alteration
• Word Reordering
• Length Difference
• Edit Distance
E.g., "found a company" ↔ "start a business"

Traditional features (Koehn et al., 2003):
• Translation Probability
• Lexical Weight
• Word Count
• Paraphrase Rule Count
• Language Model
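As a rough illustration of how such a linear model scores candidates, the sketch below implements three of the search-oriented features and the argmax decision rule; the exact feature definitions and the weights are invented here, not taken from the paper.

```python
def word_overlap(q, c):
    # Fraction of original-query words that also appear in the candidate.
    qw, cw = set(q.split()), set(c.split())
    return len(qw & cw) / max(len(qw), 1)

def length_difference(q, c):
    return abs(len(q.split()) - len(c.split()))

def edit_distance(q, c):
    # Word-level Levenshtein distance between query and candidate.
    a, b = q.split(), c.split()
    row = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        prev, row[0] = row[0], i
        for j, wb in enumerate(b, 1):
            prev, row[j] = row[j], min(row[j] + 1, row[j - 1] + 1,
                                       prev + (wa != wb))
    return row[-1]

FEATURES = [word_overlap, length_difference, edit_distance]

def best_paraphrase(query, candidates, weights):
    # Q_hat = argmax over Q' of sum_m lambda_m * h_m(Q, Q')
    return max(candidates,
               key=lambda c: sum(w * h(query, c)
                                 for w, h in zip(weights, FEATURES)))

weights = [1.0, -0.2, -0.1]  # hand-set for illustration; tuned in the paper
print(best_paraphrase("who is the author of gone with the wind",
                      ["gone with the wind was written by whom",
                       "who wrote gone with the wind"], weights))
```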

NDCG-BASED PARAMETER OPTIMIZATION
For each original query in the development data, the paraphrasing model generates N-best candidates (Candidate-1, ..., Candidate-N), each with its own feature vector (Feature vector-1, ..., Feature vector-N).
• Each candidate is sent to the ranker, which returns an NDCG score for it (NDCG-1, ..., NDCG-N)
• NDCG-based minimum error rate (MER) training updates the feature weights $\lambda_i$ in $\sum_i \lambda_i \cdot h_i(\cdot)$
• After optimization, candidates with higher NDCG scores are preferred and ranked at the top of the N-best list
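For reference, a short sketch of one common NDCG formulation (the exact gain and discount used by the ranker are not spelled out on the slides):

```python
import math

def dcg_at_k(relevances, k):
    # relevances: graded relevance labels of the returned documents, in rank order.
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# A paraphrase candidate is issued to the ranker; the labels of the documents
# it brings back (judged for the ORIGINAL query) determine its NDCG score.
print(ndcg_at_k([3, 2, 0, 1, 0], k=5))
```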

NDCG-BASED PARAMETER OPTIMIZATION (CONT.)
Minimum error rate training (MERT) (Och, 2003)
• Finds the feature weight vector that minimizes the error criterion Err, computed from the NDCG scores of the top-1 paraphrase candidates over the $S$ development queries:

$\hat{\lambda}_1^M = \arg\min_{\lambda_1^M} \left\{ \sum_{i=1}^{S} \mathrm{Err}(D_i^{Label}, \hat{Q}_i; \lambda_1^M, \mathcal{R}) \right\}$

$\mathrm{Err}(D_i^{Label}, \hat{Q}_i; \lambda_1^M, \mathcal{R}) = 1 - \mathrm{N}(D_i^{Label}, \hat{Q}_i, \mathcal{R})$

where $D_i^{Label}$ is the set of labeled documents for the $i$-th original query, $\hat{Q}_i$ is its best paraphrase candidate, $\mathcal{R}$ is the ranking model, and $\mathrm{N}(\cdot)$ is the NDCG score.
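The optimization loop can be sketched as below. This is a heavily simplified stand-in for MERT's exact line search, using a coordinate-wise grid search instead, and the development-data layout (a dict per query holding the candidates' feature vectors and NDCG scores) is purely hypothetical.

```python
def avg_error(weights, dev_queries):
    # Err = 1 - NDCG of the top-1 candidate under the current weights,
    # averaged over the development queries.
    total = 0.0
    for q in dev_queries:
        top1 = max(q["candidates"],
                   key=lambda c: sum(w * h for w, h in zip(weights, c["features"])))
        total += 1.0 - top1["ndcg"]
    return total / len(dev_queries)

def tune_weights(init_weights, dev_queries, grid=(-1.0, -0.5, 0.0, 0.5, 1.0),
                 n_rounds=5):
    # Simplified MERT-style tuning: adjust one weight at a time,
    # keeping any grid value that lowers the error.
    weights = list(init_weights)
    for _ in range(n_rounds):
        for m in range(len(weights)):
            best_val, best_err = weights[m], avg_error(weights, dev_queries)
            for v in grid:
                trial = weights[:m] + [v] + weights[m + 1:]
                err = avg_error(trial, dev_queries)
                if err < best_err:
                    best_val, best_err = v, err
            weights[m] = best_val
    return weights
```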

ENHANCED RANKING MODEL
Ranking model
• The paraphrase candidates act as hidden variables and supply expanded matching features between queries and documents

$\mathcal{R}(Q, D_Q) = \sum_{k=1}^{K} \lambda_k F_k(Q, D_Q)$

$\mathcal{R}(Q, D_Q^i) > \mathcal{R}(Q, D_Q^j) \Leftrightarrow r_{D_Q^i} > r_{D_Q^j}$

where $Q$ is the original query, $D_Q$ a retrieved document for $Q$, and $r_{D_Q}$ its relevance rating.

Matching features between a query and a document:
• Unigram/bigram/trigram BM25
• Original/normalized Perfect-Match

Expanded matching features: the feature vector $\mathbf{F} = (F_1, F_2, \ldots, F_K)$ computed on the original query $Q$ is augmented with the same features computed on each of the N-best paraphrase candidates $Q'_1, \ldots, Q'_N$, giving $\{\mathbf{F}, \mathbf{F}^1, \mathbf{F}^2, \ldots, \mathbf{F}^N\}$.
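A sketch of how the expanded feature set {F, F¹, …, Fᴺ} can be assembled per query–document pair. For brevity only unigram BM25 and an exact Perfect-Match are computed (the paper also uses bigram/trigram BM25 and a normalized Perfect-Match), and the idf table and average document length are assumed to be precomputed.

```python
import math

def bm25(query_terms, doc_terms, idf, avgdl, k1=1.2, b=0.75):
    # Okapi BM25 over unigrams; idf values are assumed precomputed.
    tf = {t: doc_terms.count(t) for t in set(query_terms)}
    norm = k1 * (1 - b + b * len(doc_terms) / avgdl)
    return sum(idf.get(t, 0.0) * tf[t] * (k1 + 1) / (tf[t] + norm)
               for t in query_terms)

def perfect_match(query, doc_text):
    # 1 if the whole query string occurs verbatim in the document, else 0.
    return 1.0 if query in doc_text else 0.0

def matching_features(query, doc_text, idf, avgdl):
    return [bm25(query.split(), doc_text.split(), idf, avgdl),
            perfect_match(query, doc_text)]

def augmented_features(query, paraphrases, doc_text, idf, avgdl):
    # {F, F^1, ..., F^N}: original-query features plus the same features
    # recomputed with each of the N paraphrase candidates.
    feats = matching_features(query, doc_text, idf, avgdl)
    for p in paraphrases:
        feats += matching_features(p, doc_text, idf, avgdl)
    return feats
```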

EXPERIMENTS: DATASETS
Paraphrase extraction training data
• Bilingual corpus (NIST 2008 constrained track): 5.1M sentence pairs
• Monolingual corpus (Bing's query log): 16.7M queries
• Human-annotated data (WordNet dictionary): 0.3M synonym pairs
• Extracted paraphrase pairs: 58M

Evaluation sets (sampled from Bing's query log)
• Development: 1,419 queries
• Test: 1,419 queries

SYSTEMS
Paraphrasing systems
• BL-Para (baseline): traditional features; optimization metric: BLEU
• BL-Para+SF: traditional features + search-oriented features; optimization metric: BLEU
• BL-Para+SF+Opt: traditional features + search-oriented features; optimization metric: NDCG

Ranking models
• BL-Rank (baseline, Liu et al., 2007): query-document matching features (unigram/bigram/trigram BM25 and original/normalized Perfect-Match)
• BL-Rank+Para (enhanced ranking model): query + paraphrase-document matching features

*The ranking models are learned with the SVMrank toolkit (Joachims, 2006) using its default parameter settings.
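SVMrank is an external toolkit; as a rough illustration of the pairwise reduction behind such ranking SVMs, here is a sketch using scikit-learn's LinearSVC as a stand-in, with invented toy feature vectors, relevance ratings, and query ids.

```python
import numpy as np
from sklearn.svm import LinearSVC

def to_pairwise(X, y, qid):
    # Turn graded relevance judgments into pairwise preference examples:
    # for documents of the same query, (x_i - x_j) gets label +1 iff y_i > y_j.
    Xp, yp = [], []
    for q in set(qid):
        idx = [i for i in range(len(y)) if qid[i] == q]
        for i in idx:
            for j in idx:
                if y[i] > y[j]:
                    Xp.append(X[i] - X[j]); yp.append(1)
                    Xp.append(X[j] - X[i]); yp.append(-1)
    return np.array(Xp), np.array(yp)

# Toy data: each row would hold the original-query and paraphrase-based
# matching features of one query-document pair.
X = np.array([[0.9, 1.0], [0.4, 0.0], [0.7, 1.0], [0.1, 0.0]])
y = np.array([2, 0, 1, 0])        # relevance ratings
qid = np.array([1, 1, 2, 2])      # query ids
Xp, yp = to_pairwise(X, y, qid)
ranker = LinearSVC(C=1.0).fit(Xp, yp)
print(ranker.coef_)  # documents are then ranked by X @ ranker.coef_.ravel()
```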

IMPACTS OF SEARCH-ORIENTED FEATURES
Test set, NDCG@1 of the original query vs. the top-1 paraphrase candidate (Cand@1):
• Original query: 27.28%
• BL-Para, Cand@1: 26.44%
• BL-Para+SF, Cand@1: 26.53%

BL-Para: paraphrase baseline with traditional features, optimized for BLEU
BL-Para+SF: paraphrase baseline with traditional + search-oriented features, optimized for BLEU

IMPACTS OF OPTIMIZATION ALGORITHM
Test set, NDCG@1 of the original query vs. the top-1 paraphrase candidate (Cand@1):
• Original query: 27.28%
• BL-Para+SF, Cand@1: 26.53%
• BL-Para+SF+Opt, Cand@1: 27.06% (+0.53%)

BL-Para+SF: paraphrase baseline with traditional + search-oriented features, optimized for BLEU
BL-Para+SF+Opt: paraphrase baseline with traditional + search-oriented features, optimized for NDCG

IMPACTS OF ENHANCED RANKING MODEL
Dev set:
• BL-Rank (ranking model baseline, Liu et al., 2007): NDCG@1 25.31%, NDCG@5 33.76%
• BL-Rank+Para (enhanced ranking model): NDCG@1 28.59% (+3.28%), NDCG@5 34.25% (+0.49%)

Test set:
• BL-Rank: NDCG@1 27.28%, NDCG@5 34.79%
• BL-Rank+Para: NDCG@1 28.42% (+1.14%), NDCG@5 35.68% (+0.89%)

BL-Rank: query-document matching features (unigram/bigram/trigram BM25 and original/normalized Perfect-Match)
BL-Rank+Para: query + top-1 paraphrase-document matching features (same feature types)

CONCLUSION
We present an in-depth study on adapting paraphrasing to web search
• Paraphrasing model with search-oriented features
• NDCG-based optimization method

Future directions:
• Compare and combine paraphrasing with other query reformulation techniques to further improve search quality
• E.g., pseudo-relevance feedback and conditional random field-based approaches

THANK YOU! QUESTIONS, EMAIL CHENGUANG WANG [email protected]
