Optimizing Unified Loss for Web Ranking Specialization

Fan Li (Yahoo! Labs), [email protected]
Xin Li (Microsoft Bing), [email protected]
Jiang Bian (College of Computing, Georgia Institute of Technology), [email protected]
Zhaohui Zheng (Yahoo! Labs), [email protected]

ABSTRACT

In this paper, we propose a novel divide-and-conquer approach that optimizes the overall relevance in a unified framework for query clustering and query-based ranking. Latent topics and specialized ranking models are learned iteratively so that a unified objective function, which lower-bounds the conditional probability of the grades annotated by human editors on the training data, is maximized. We conducted experiments comparing the proposed method with several baseline approaches on two datasets. The results show that our method significantly improves ranking relevance over these baselines.

Categories and Subject Descriptors
H.3.3 [Information Systems]: Information Search and Retrieval—Retrieval functions; H.4.m [Information Systems]: Miscellaneous—Machine learning

General Terms
Algorithms, Experimentation, Theory

Keywords
Ranking specialization, Ranking-based Clustering, Unified Loss

1. INTRODUCTION

In the general web search ranking scenario, training and testing data usually consist of different types of queries that vary significantly in semantics, user intentions, etc. Different queries, such as navigational, personal, product, or local queries, may behave very differently in the ranking process. To overcome the problems caused by the heterogeneous nature of web search queries, the IR community has proposed several solutions based on the topical ranking idea. The basic idea of such approaches is as follows: in the training process, each query in the training set is assigned to one or more topics, and a specialized ranking model is trained for each topic. At testing time, a new query is mapped to the one or more topics it most likely belongs to, and the corresponding specialized ranking models are applied to make predictions. The query categories/clusters used by topical ranking methods in previous works come from two sources: they are either pre-defined by humans or automatically learned by clustering algorithms. However, neither may be the best choice for web-ranking purposes, as discussed below:

• Human-defined categories have been used as query partitions for topical ranking in many previous works ([2], [3], [10], [12]). However, in most cases the categories are defined for their semantic meanings rather than for maximizing the overall ranking performance. In fact, semantically similar queries may have very different result-set feature values and may not be coherent in feature space. Thus these approaches may not be the best way to handle heterogeneous queries from the ranking point of view.

• Another choice is to automatically learn the latent topics from training data using clustering algorithms. In the training phase, the training data is partitioned into K clusters based on query-level similarity, which is calculated from the result-set features of the given queries. Recent works following this line include Topical RankSVM proposed in [4] and the query-dependent ranking models (off-line version) proposed in [7]. The limitation of such methods is that the clustering procedure is still a step separate from ranking model training. The clustering procedure relies only on query result-set features and does not exploit the information from the labels of query-URL pairs annotated by human editors in the training data, so it is not optimized for the final ranking purpose. This can lead to unexpected results: for example, result-set features that play dominant roles in the clustering step may simply be irrelevant for the ranking task, in which case the final ranking results do not benefit from the clustering procedure.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CIKM'10, October 26-30, 2010, Toronto, Ontario, Canada. Copyright 2010 ACM 978-1-4503-0099-5/10/10 ...$10.00.

In this paper we propose a novel approach that overcomes the limitations mentioned above. The key idea is to maximize the overall ranking performance by iteratively optimizing two steps at training time: Ranking Specialization and Ranking-based Clustering. The first step (Ranking Specialization) trains a specialized ranking model for each latent topic, and the second step (Ranking-based Clustering) maps each query to the latent topics whose specialized ranking models best fit the query. Both steps are designed to decrease the value of a unified loss function on the training data, and the process is repeated until convergence. This makes our method significantly different from previous works in this line, and we believe it is an important advantage that enables our approach to serve the ranking purpose better. The rest of the paper is organized as follows. In Section 2, we describe our model and the algorithm to solve it. In Section 3, we present our experimental settings and results. In Section 4, we summarize the conclusions.

2. METHOD

2.1 Our loss function

In the ranking problem, we are given a training set {q_i, U_i, G_i | i = 1, ..., N}, where q_i is the i-th query, U_i is the list of query-URL feature vectors associated with q_i, and G_i is the list of human grades assigned to the URLs in U_i. We use U_ij to denote the j-th URL in U_i, and G_ij to denote the human grade assigned to the query-URL pair (q_i, U_ij). We also assume there is a set of query-dependent features, denoted F_i, for each query q_i. In this paper, we propose the loss function:

    L = Σ_i Σ_j Σ_k z_k(F_i, β_k) (G_ij − S_k(q_i, U_ij, m_k))²    (1)

    s.t. for all i, k:  z_k(F_i, β_k) ≥ 0  and  Σ_k z_k(F_i, β_k) = 1

where m_k represents the specialized ranking model for the k-th cluster, and z_k(F_i, β_k) represents a function mapping queries to latent topics; β_k are the parameters of the mapping function. In the training procedure, the loss function in formula (1) is minimized by optimizing β_k and m_k iteratively. The learning of m_k corresponds to the Ranking Specialization step and the learning of β_k corresponds to the Ranking-based Clustering step. In the testing procedure, given a test query q_i and a list of associated URLs U_i, the predicted score of the ij-th query-URL pair is calculated as:

    Ĝ_ij = Σ_k z_k(F_i, β_k) S_k(q_i, U_ij, m_k)    (2)

Our framework allows us to plug our preferred ranking model (with least-squares loss) into formula (1) as m_k. In this paper we use Gradient Boosting Decision Trees (GBDT) as an example.^1 In the training step, we use an EM-style algorithm to learn {β_k | k = 1, ..., K−1} and {m_k | k = 1, ..., K} iteratively so that the objective function in formula (1) is minimized. The pseudo code of our training algorithm is listed as Algorithm 1.

Algorithm 1: Overall training
1. Initialize values for {β_k | k = 1, ..., K−1}.
2. Iterate until convergence:
   (a) Fix the current values of {β_k | k = 1, ..., K−1}, and learn {m_k | k = 1, ..., K} (using the standard GBDT learning algorithm with sample weights set as z_k(F_i, β_k)) so that formula (1) is minimized.
   (b) Fix the current {m_k | k = 1, ..., K}, and use linear programming to learn {β_k | k = 1, ..., K−1} so that formula (1) is minimized, with the constraint that z_k(F_i, β_k) ≥ 0 for all i, k.
3. Return {m_k | k = 1, ..., K} and {β_k | k = 1, ..., K−1}.

• Step 2(a) is easy to solve, since it amounts to learning regular ranking functions with additional weights associated with the training examples. We can use standard GBDT learning algorithms, with sample weights set as z_k(F_i, β_k), to learn the ranking models {m_k | k = 1, ..., K}.

• Step 2(b) can be solved by standard linear programming. When m_k is fixed, Σ_j (G_ij − S_k(q_i, U_ij, m_k))² becomes a fixed number and the objective function reduces to a standard linear programming form.
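To make the pieces above concrete, here is a minimal NumPy/SciPy sketch (our own illustration, not the authors' code) of the linear mapping z_k, the unified loss of formula (1), and the linear-programming step 2(b). The per-cluster squared errors c_ik = Σ_j (G_ij − S_k(q_i, U_ij, m_k))² are taken as given, since they are fixed inside step 2(b); all function names are ours.

```python
import numpy as np
from scipy.optimize import linprog

def z_weights(F, betas):
    """Mixing weights of formula (1): z_k = F_i . beta_k for k < K,
    and z_K = 1 - sum_{t<K} F_i . beta_t for the last cluster."""
    Z = F @ betas.T                                   # (N, K-1)
    return np.hstack([Z, 1.0 - Z.sum(axis=1, keepdims=True)])

def unified_loss(costs, Z):
    """L = sum_i sum_k z_ik * c_ik, with c_ik = sum_j (G_ij - S_k(...))^2 fixed."""
    return float((Z * costs).sum())

def lp_step(F, costs):
    """Step 2(b): with the m_k fixed, L is linear in the betas,
    so the betas can be found by a standard LP solver."""
    N, d = F.shape
    K = costs.shape[1]
    # Objective: sum_i sum_{k<K} (F_i . beta_k)(c_ik - c_iK), plus a constant.
    diff = costs[:, :K - 1] - costs[:, [K - 1]]       # (N, K-1)
    c_obj = (F.T @ diff).T.ravel()                    # stacked beta_k blocks
    # Constraints: F_i . beta_k >= 0 and sum_{t<K} F_i . beta_t <= 1.
    A_nonneg = -np.kron(np.eye(K - 1), F)             # ((K-1)N, (K-1)d)
    A_sum = np.tile(F, (1, K - 1))                    # (N, (K-1)d)
    res = linprog(c_obj,
                  A_ub=np.vstack([A_nonneg, A_sum]),
                  b_ub=np.concatenate([np.zeros((K - 1) * N), np.ones(N)]),
                  bounds=(None, None))                # betas may be negative
    return res.x.reshape(K - 1, d)

# Toy example: 3 queries, 2 query-dependent features, K = 2 clusters.
F = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
costs = np.array([[5.0, 1.0], [1.0, 4.0], [3.0, 3.0]])  # c_ik, fixed in 2(b)
betas = lp_step(F, costs)
Z = z_weights(F, betas)
# Each query is pushed toward the cluster whose fixed model fits it best.
```

Step 2(a) would then refit each m_k with the columns of Z as sample weights; any ranker trained under weighted least-squares loss (GBDT in the paper) can play that role.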

2.2 Learning Algorithm

2.2.1 z_k(F_i, β_k) and m_k in our model

In this paper, we assume the mapping function z_k(F_i, β_k) has the following linear form:

    z_k(F_i, β_k) = F_i · β_k,                       if k < K
    z_K(F_i, β_K) = 1 − Σ_{t=1..K−1} F_i · β_t,      if k = K

The main reason we adopt this linearity assumption is to simplify the final loss function in formula (1), so that the parameters β_k can be solved efficiently using linear programming.

^1 GBDT is a model successfully applied to the learning-to-rank problem ([6], [13], [14]). The basic idea of GBDT is to compute a sequence of binary trees, where each successive tree is built to predict the residuals of the preceding tree. Training data is partitioned into two samples at each split node. In the tree-growing process, we find the node and feature to split so that the global loss over all queries in the training data is minimized.

3. EXPERIMENTS

3.1 Data collections used in the experiments

We conducted experiments on two datasets: the public benchmark dataset LETOR 3.0, and a dataset obtained from a commercial search engine.

• LETOR 3.0: LETOR 3.0 [11] is a benchmark dataset for research on ranking [1]. We use the TREC2003 and TREC2004 datasets in LETOR 3.0 to evaluate our approach. TREC2003 contains 350 queries and TREC2004 contains 225 queries. For each query, there are about 1,000 associated documents. Each query-document pair is given a binary judgment: relevant

or irrelevant. In total, there are 64 features for each query-document pair; see [11] for details. Both of these two tracks classify all the queries into three pre-defined categories, namely topic distillation (TD), homepage finding (HP) and named page finding (NP), according to search intent.

• Commercial search engine dataset (SE-Dataset): We also conduct experiments on a dataset obtained from a major commercial search engine. It contains 71,810 training queries with 1,227,094 query-document pairs, and 7,668 testing queries with 252,086 query-document pairs. Each query is associated with its retrieved documents, along with five-level human-judged labels that represent degrees of relevance. The features for each query-document pair used in building the ranking functions can be roughly grouped into the following categories: text-matching features, link-based features, user-click features, and query and page classification features. This dataset classifies all the queries into five semantic topics: the autos, local, product, travel, and "others" domains. We denote this dataset as SE-Dataset.

3.2 Generating Query Features

In this paper, we generate a set of query-dependent features by taking advantage of the ranking features of the top pseudo feedbacks of the query. For each training query q in the training set, we first retrieve a set of pseudo feedbacks, D(q) = {d_1, d_2, ..., d_T}, consisting of the top T documents ranked by a reference model (we use BM25 in this paper). Then we take the mean and variance of the ranking feature values of these T documents to generate our query-dependent features.

3.3 Baseline approaches

We compared our method with the following baseline approaches in our experiments.

• Single Ranking Model (Single-RM): This baseline trains a single model on all the training data, and applies it to all the testing queries.

• Ranking Model with pre-defined topics (Semantic-Topical-RM): This baseline trains a model for each pre-defined semantic topic on the training data. Given a testing query, the model for this query's topic is invoked to generate the ranking results.

• Ranking Model with topics generated by traditional hard clustering (Hard-Clustering-Topical-RM): In this method, we implement the hard-clustering-based topical ranking approach. After identifying the topics using traditional clustering, we assign each training query to its closest query cluster. Based on this hard partition of the training queries, we train a separate ranking model for each query cluster using its own fraction of training queries. At testing time, according to the correlation between the test query and the query clusters, the ranking model of the most correlated query cluster is selected to generate the ranking results.

• Ranking Model with topics generated by traditional soft clustering (Soft-Clustering-Topical-RM): In this method, we implement the soft-clustering-based topical ranking approach. We first follow the idea in [4] to generate topics and membership probabilities for the training queries. Then we train a separate ranking model for each query cluster, using the membership probabilities as query weights. At testing time, we also follow [4] and set the final predictive score to the weighted sum of the predictive scores of the ranking models of the different clusters.

• Ranking Model with topics generated by KNN-based clusters (Offline-KNN-Topical-RM): We simulate the idea of the KNN offline-2 model proposed in [7] and construct topics based on K nearest neighbors.^2

^2 In order to make this method more scalable, we made a slight modification in our implementation: if there are more than 5,000 queries in the training set, we randomly sample 5,000 queries, and only build clusters and train models for these selected queries.

3.4 Experimental Results

We use Normalized Discounted Cumulative Gain (NDCG) [9] as the evaluation metric in this paper. The number of clusters, for our method and for the baseline methods, is tuned by cross-validation on the training corpus. On SE-Dataset we used GBDT as the ranking model; on the TREC datasets we tried both GBDT and Rank-SVM. In Tables 1 and 2, we report the NDCG5 scores (averaged over five-fold cross-validation) of our method compared with Single-RM, Semantic-Topical-RM, Hard-Clustering-Topical-RM, Soft-Clustering-Topical-RM and Offline-KNN-Topical-RM on the TREC2003 and TREC2004 datasets, based on GBDT and Rank-SVM respectively. The results indicate that our method achieves much better relevance than all the baselines. We conducted t-tests on the improvements, and the results indicate that the improvements of our method over the other ranking methods are statistically significant (p-value < 0.05).

[Figure 1: The values of metric NDCG@K with K = 1, 5, 10 for our method, Single-RM, Semantic-Topical-RM, Hard-Clustering-Topical-RM, Soft-Clustering-Topical-RM and Offline-KNN-Topical-RM on SE-Dataset, using GBDT.]
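As a side note on Section 3.2, the query-level features there are simply moments of the top-T pseudo-feedback documents' ranking features. A small sketch of that construction (our own; it assumes a document-by-feature matrix already sorted by the reference BM25 score):

```python
import numpy as np

def query_features(doc_features, T=10):
    """Query-dependent features in the style of Section 3.2: the per-feature
    mean and variance over the top-T documents from the reference model."""
    top = doc_features[:T]               # rows assumed sorted by BM25 score
    return np.concatenate([top.mean(axis=0), top.var(axis=0)])

# Two ranking features over three retrieved documents, best-ranked first.
docs = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
feats = query_features(docs, T=2)        # -> means [2, 3], variances [1, 1]
```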

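The NDCG5 numbers reported in Tables 1 and 2 follow the standard NDCG definition [9]. One common formulation (a sketch; the exact gain/discount variant the authors used is not stated) is:

```python
import math

def ndcg_at_k(grades, k):
    """NDCG@K for a single query; `grades` are the editorial labels of the
    documents in the order the system ranked them."""
    def dcg(gs):
        # Exponential gain, log2 position discount (a common NDCG variant).
        return sum((2 ** g - 1) / math.log2(r + 2) for r, g in enumerate(gs[:k]))
    ideal = dcg(sorted(grades, reverse=True))  # best possible ordering
    return dcg(grades) / ideal if ideal > 0 else 0.0
```

Averaging `ndcg_at_k(..., 5)` over all test queries gives the NDCG5 scores of the tables below.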

Table 1: Results on TREC2003 data: five-fold averaged NDCG5 values of our method, Single-RM, Semantic-Topical-RM, Hard-Clustering-Topical-RM, Soft-Clustering-Topical-RM and Offline-KNN-Topical-RM, using GBDT and Rank-SVM respectively.

    Ranking Method               GBDT    Gain    Rank-SVM  Gain
    Single-RM                    0.682   -       0.594     -
    Semantic-Topical-RM          0.687   +0.7%   0.636     +7.1%
    Hard-Clustering-Topical-RM   0.691   +1.3%   0.628     +5.7%
    Soft-Clustering-Topical-RM   0.698   +2.3%   0.640     +7.7%
    Offline-KNN-Topical-RM       0.693   +1.6%   0.631     +6.3%
    Our method                   0.721   +5.7%   0.684     +15%

Table 2: Results on TREC2004 data: five-fold averaged NDCG5 values of our method, Single-RM, Semantic-Topical-RM, Hard-Clustering-Topical-RM, Soft-Clustering-Topical-RM and Offline-KNN-Topical-RM, using GBDT and Rank-SVM respectively.

    Ranking Method               GBDT    Gain    Rank-SVM  Gain
    Single-RM                    0.579   -       0.563     -
    Semantic-Topical-RM          0.593   +2.4%   0.586     +4.1%
    Hard-Clustering-Topical-RM   0.595   +2.7%   0.583     +3.6%
    Soft-Clustering-Topical-RM   0.596   +2.9%   0.581     +3.5%
    Offline-KNN-Topical-RM       0.593   +2.4%   0.577     +2.5%
    Our method                   0.621   +7.2%   0.614     +9.1%
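The Gain columns above are relative improvements over Single-RM; a quick check (our own arithmetic) reproduces them, up to rounding of the published scores:

```python
def relative_gain(score, baseline):
    """Percentage improvement over the Single-RM baseline, as in the Gain columns."""
    return 100.0 * (score - baseline) / baseline

# Table 1, GBDT column: our method 0.721 vs Single-RM 0.682 -> +5.7%.
print(round(relative_gain(0.721, 0.682), 1))
```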

In Figure 1, we plot the values of metric NDCG@K with K = 1, 5, 10 for our method and the baselines on SE-Dataset. The results show that our method is consistently better.

4. CONCLUSIONS

In this paper, we explored improving overall ranking performance with a divide-and-conquer approach that learns multiple specialized ranking functions for different types of queries. Compared with previous works that treat clustering and ranking as separate steps, our approach generates query partitions and specialized ranking models within a consistent framework, and the human-annotated relevance grades are exploited to supervise the implicit clustering procedure in our model. We therefore expect our method to achieve better overall ranking performance than previous works. Experiments were conducted against several state-of-the-art baselines on two datasets. The empirical results show that our method significantly outperforms these baselines on both datasets.

5. REFERENCES

[1] LETOR dataset website. http://research.microsoft.com/en-us/um/beijing/projects/letor/.
[2] S. M. Beitzel, E. C. Jensen, A. Chowdhury, and O. Frieder. Varying approaches to topical web query classification. In SIGIR, pages 783-784. ACM, 2007.
[3] S. M. Beitzel, E. C. Jensen, O. Frieder, D. Grossman, D. D. Lewis, A. Chowdhury, and A. Kolcz. Automatic web query classification using labeled and unlabeled training data. In SIGIR, pages 581-582. ACM, 2005.
[4] J. Bian, X. Li, F. Li, H. Zha, and Z. Zheng. Ranking specialization for web search: A divide-and-conquer approach by using topical RankSVM. In Proc. of WWW, 2010.
[5] B. Bolstad, R. Irizarry, M. Astrand, and T. Speed. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics, 19:185-193, 2003.
[6] J. H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29:1189-1232, 2001.
[7] X. Geng, T.-Y. Liu, T. Qin, A. Arnold, H. Li, and H.-Y. Shum. Query dependent ranking using k-nearest neighbor. In SIGIR, 2008.
[8] R. Herbrich, T. Graepel, and K. Obermayer. Support vector learning for ordinal regression. In Proc. of ICANN, 1999.
[9] K. Jarvelin and J. Kekalainen. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems, 2002.
[10] U. Lee, Z. Liu, and J. Cho. Automatic identification of user goals in web search. In WWW, pages 391-400, 2005.
[11] T.-Y. Liu, J. Xu, T. Qin, W. Xiong, and H. Li. LETOR: Benchmark dataset for research on learning to rank for information retrieval. In Proc. of SIGIR, 2007.
[12] T.-Y. Liu, Y. Yang, H. Wan, H.-J. Zeng, Z. Chen, and W.-Y. Ma. Support vector machines classification with a very large-scale taxonomy. In SIGKDD, pages 36-43, 2005.
[13] Z. Zheng, H. Zha, T. Zhang, O. Chapelle, K. Chen, and G. Sun. A general boosting method and its application to learning ranking functions for web search. In NIPS, 2007.
[14] Z. Zheng, H. Zha, T. Zhang, O. Chapelle, K. Chen, and G. Sun. A regression framework for learning ranking functions using relative relevance judgments. In SIGIR, pages 287-294, 2007.
