A General Magnitude-Preserving Boosting Algorithm for Search Ranking
Chenguang Zhu*, Weizhu Chen, Zeyuan Allen Zhu*, Gang Wang, Dong Wang*, Zheng Chen
Microsoft Research Asia, *Tsinghua University
November 2009

Motivation 

Traditional learning to rank algorithms

Pairwise: Ranking SVM, RankBoost, RankNet

Deficiency: e.g. mis-ranking two documents with ratings 5 and 1 incurs the same loss as mis-ranking two documents with ratings 5 and 4

Challenge

Challenges in incorporating magnitude information into pairwise learning to rank:

More complex than classification: it deals with the order of a document list

More complex than regression: the prediction is carried out for each single document, not for document pairs

Our Contribution

Basis: apply magnitude-preserving labels for pairwise learning

Algorithm: leverage Boosting for learning to rank: MPBoost

Theory: prove the convergence property on ranking accuracy

Formalization for Learning to Rank

Input: query set Q and, for each q ∈ Q, a list {(x_qi, r_qi)}_{i=1}^{n_q}

Learning:
  Generate pairwise learning instances: S = ∪_q {((x_qi, x_qj), y_qij) | r_qi ≠ r_qj}
  Learn a ranking function F(x)

Prediction: for a retrieved document list {x_qi}_{i=1}^{n_q}, rank the documents from the largest F(x_qi) to the smallest.
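The pair-generation step above can be sketched as follows; this is an illustrative snippet, not the paper's code, and the per-query input format (a list of `(feature, rating)` tuples) is an assumption.

```python
# Illustrative sketch of pairwise-instance generation. Each query is a list
# of (x, r) tuples; x is any document representation, r its rating.

def generate_pairs(queries):
    """For each query, emit ((x_i, x_j), y_ij) for every pair with r_i > r_j."""
    pairs = []
    for docs in queries:
        for i, (x_i, r_i) in enumerate(docs):
            for j, (x_j, r_j) in enumerate(docs):
                if r_i > r_j:               # keep one orientation per pair
                    pairs.append(((x_i, x_j), 1))
    return pairs

# One query with ratings 5, 1, 5 yields two pairs: (d1, d2) and (d3, d2).
print(generate_pairs([[("d1", 5), ("d2", 1), ("d3", 5)]]))
```

Note that pairs are only formed within a query, never across queries.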


Pairwise Approach

Human Ratings → Generate Pairwise Labels → Learning Preference

Traditional pairwise labels:
  y_ij = +1 if r_i is preferred to r_j
  y_ij = -1 if r_j is preferred to r_i

Magnitude-Preserving Labels

Directed Distance Function (DDF)

Preserve the preference relationship:
  sgn(dist(r_i, r_j)) = sgn(r_i - r_j)

Preserve magnitude information:
  |r_i - r_j| > |r_i' - r_j'|  ⇒  |dist(r_i, r_j)| > |dist(r_i', r_j')|

Directed Distance Function

The form of the DDF can vary. We investigate three kinds:

Linear Directed Distance (LDD):
  dist(r_i, r_j) = α(r_i - r_j)

Logarithmic Directed Distance (LOGDD):
  dist(r_i, r_j) = sgn(r_i - r_j) · log(1 + α|r_i - r_j|)

Logistic Directed Distance (LOGITDD):
  dist(r_i, r_j) = sgn(r_i - r_j) · 1 / (1 + e^{-α|r_i - r_j|})
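A minimal sketch of the three DDFs, assuming a scale hyperparameter α (its tuning is not specified on this slide):

```python
import math

def sgn(x):
    """Sign function: +1, 0, or -1."""
    return (x > 0) - (x < 0)

def ldd(r_i, r_j, alpha=1.0):
    """Linear Directed Distance."""
    return alpha * (r_i - r_j)

def logdd(r_i, r_j, alpha=1.0):
    """Logarithmic Directed Distance: sign-preserving, compresses large gaps."""
    return sgn(r_i - r_j) * math.log(1 + alpha * abs(r_i - r_j))

def logitdd(r_i, r_j, alpha=1.0):
    """Logistic Directed Distance: sign-preserving, bounded in (-1, 1)."""
    return sgn(r_i - r_j) / (1 + math.exp(-alpha * abs(r_i - r_j)))

# All three preserve sgn(r_i - r_j) and grow with |r_i - r_j|:
print(ldd(5, 1), logdd(5, 1), logitdd(5, 1))
```

This makes the deficiency from the Motivation slide concrete: the (5, 1) pair now receives a strictly larger directed distance than the (5, 4) pair under every DDF.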

Pairwise Approach

Human Ratings → Generate Pairwise Labels → Learning Preference

We propose the Magnitude-Preserving Boosting algorithm (MPBoost)

MPBoost Algorithm

Loss J for a ranking function F:
  J(F) = Σ e^{-dist(r_i, r_j)·(F(x_i) - F(x_j))}

Optimization:
  GentleBoost (Friedman et al.): stage-wise gradient descent
  Requires |dist(r_i, r_j)| ≤ 1 for all pairs with r_i ≠ r_j

MPBoost Algorithm
Input: query set Q and {(x_qi, r_qi)}_{i=1}^{n_q} for each q ∈ Q
Output: ranking function F(x)
1: Generate S = ∪_q {((x_qi, x_qj), dist(r_qi, r_qj)) | r_qi ≠ r_qj}
2: Generate index set I = {(i, j) | ((x_i, x_j), dist(r_i, r_j)) ∈ S}
3: Initialize w_ij^(1) = 1/|I|, for (i, j) ∈ I
4: for t = 1...T do
5:   Fit the weak ranker f_t, such that f_t = argmin_f J_wse(f)
6:   Update: w_ij^(t+1) = w_ij^(t) · e^{-dist(r_i, r_j)·(f_t(x_i) - f_t(x_j))} / Z_t,
     where Z_t = Σ_{(i,j)∈I} w_ij^(t) · e^{-dist(r_i, r_j)·(f_t(x_i) - f_t(x_j))}
7: end for
8: Output the final ranking function F(x) = Σ_{t=1}^T f_t(x)

Convergence Property of MPBoost

Theorem 1. The normalized ranking loss (mis-ordering) of F is bounded:

  Σ_{(i,j)∈I | r_i > r_j} w_ij^(1) [[F(x_i) ≤ F(x_j)]]
  + Σ_{(i,j)∈I | r_i < r_j} w_ij^(1) [[F(x_i) ≥ F(x_j)]]  ≤  Π_{t=1}^T Z_t

where [[π]] is defined to be 1 if the condition π holds and 0 otherwise.

Proof: omitted
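Although the slide omits the proof, the bound follows the standard RankBoost-style argument; a sketch under that assumption (not the paper's proof verbatim):

```latex
% Unravelling the weight update of step 6 over all T rounds gives
w^{(T+1)}_{ij} \;=\; \frac{w^{(1)}_{ij}\,
  e^{-\mathrm{dist}(r_i,r_j)\,(F(x_i)-F(x_j))}}{\prod_{t=1}^{T} Z_t}.
% For a mis-ordered pair the exponent is non-negative
% (e.g. r_i > r_j gives dist > 0 while F(x_i) - F(x_j) <= 0), so
[[\,F(x_i) \le F(x_j)\,]] \;\le\; e^{-\mathrm{dist}(r_i,r_j)\,(F(x_i)-F(x_j))}.
% Summing over pairs and using \sum_{(i,j)\in I} w^{(T+1)}_{ij} = 1:
\sum_{(i,j)\in I} w^{(1)}_{ij}\,[[\,\text{mis-ordered}\,]]
  \;\le\; \sum_{(i,j)\in I} w^{(1)}_{ij}\,
  e^{-\mathrm{dist}(r_i,r_j)\,(F(x_i)-F(x_j))}
  \;=\; \prod_{t=1}^{T} Z_t.
```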

Datasets

OHSUMED dataset (from LETOR 3.0)
Web-1 dataset (from the Yandex competition)
Web-2 dataset (from a commercial English search engine)
3+1+1 five-fold cross validation

Dataset  | Queries | Documents | Rating
OHSUMED  | 106     | 16140     | {0,1,2}
Web-1    | 9124    | 97290     | [0,4]
Web-2    | 467     | 50000     | {0,1,2,3,4}

Experiment Methodology

Baselines: RankBoost, ListNet, AdaRank-NDCG, MPBoost.BINARY

Proposed methods: MPBoost.LDD, MPBoost.LOGDD, MPBoost.LOGITDD

Experimental Results on OHSUMED

MPBoost.* outperforms the pairwise baselines
MPBoost.BINARY outperforms RankBoost

Experimental Results on Web-1

Experimental Results on Web-2

Discussion

The magnitude-preserving versions of MPBoost outperform the baselines by 0.16% to 2.2% in average NDCG

MPBoost.LOGDD and MPBoost.LOGITDD outperform MPBoost.LDD:
  the linear directed distance can hardly guarantee |dist(r_i, r_j)| ≤ 1 for all pairs with r_i ≠ r_j

Overfitting issues

Conclusion

Magnitude-preserving labels can effectively improve ranking accuracy
The Directed Distance Function can take various forms
MPBoost inherits theoretical properties from RankBoost

Q&A Thanks!

Related Work

Qin et al.
  Based on RankSVM
  Multiple Hyperplane Ranker (MHR)
  Complex when the number of ratings is large

Cortes et al.
  Based on RankSVM
  Incorporates magnitude differences
  Limited due to the σ-admissibility of the cost function

MPBoost & RankBoost

                        | MPBoost                                             | RankBoost
Based on                | GentleBoost                                         | AdaBoost
Loss function           | Σ e^{-dist(r_i, r_j)·(F(x_i) - F(x_j))}             | Σ_{x_0,x_1} D(x_0, x_1) [[H(x_1) ≤ H(x_0)]]
Weak learner's criteria | Σ_{ij} w_ij [dist(r_i, r_j) - (f(x_i) - f(x_j))]^2  | Σ_{x_0,x_1} D(x_0, x_1) (h(x_1) - h(x_0))
Combination             | F(x) = Σ_{t=1}^T f_t(x)                             | H(x) = Σ_{t=1}^T α_t h_t(x)

