Re-ranking Search Results based on Perturbation of Concept-Association Graphs

Gaurav Chandalia and Rohini Srihari
Computer Science, SUNY – University at Buffalo

With thanks to Matthew Beal for discussions on Matrix Perturbation theory

Objective 

Capture the semantics of the document corpus



Explore the inherent relationships between concepts (words) across all the documents



Develop an effective algorithm that can exploit such a semantic representation to re-rank search results

4/29/2006


Motivation 

For advanced IR applications like question answering, the traditional bag-of-words model is not enough

Semantic Representation
  - How such information can be leveraged and used as building blocks for advanced IR applications

Technique should…
  - Not be explicitly dependent on the query
  - Be domain independent


Approach 











Process the document corpus to extract salient concepts and associations between the terms (offline)
For a given query, retrieve the documents based on the Vector Space Model
Consider the Concept-Association graph of a relevant subset of search results
Perturb the graph and see if the resulting graph is still relevant to the query
Re-rank the search results based on the amount of perturbation introduced in the original graph
Raises two questions…
  - What is the nature of perturbation?
  - How to measure the change in the graph after perturbation?
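
The baseline retrieval step can be illustrated with a small Vector Space Model sketch. The slides do not give the term weighting or tokenization used, so the TF-IDF weights, whitespace tokenizer, and the function names `tfidf_vectors`, `cosine`, and `retrieve` below are assumptions made only for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of document strings."""
    tokenized = [d.lower().split() for d in docs]  # assumed: whitespace tokens
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # assumed weighting
    vecs = [{t: c * idf[t] for t, c in Counter(toks).items()}
            for toks in tokenized]
    return vecs, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def retrieve(query, docs):
    """Return indices of documents with non-zero similarity, best first."""
    vecs, idf = tfidf_vectors(docs)
    q = {t: c * idf.get(t, 0.0)
         for t, c in Counter(query.lower().split()).items()}
    scored = [(cosine(q, v), i) for i, v in enumerate(vecs)]
    return [i for s, i in sorted(scored, reverse=True) if s > 0]

docs = ["red cross relief activities",
        "mad cow disease outbreak",
        "red cross earthquake relief"]
hits_for_query = retrieve("mad cow", docs)
```

The documents returned here would form the candidate set whose Concept-Association graph is then built and perturbed.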


Approach… 



Nature of Perturbation
  - Perturb the original graph by adding associations between certain concepts
  - Associations are obtained from documents that are to be re-ranked
Measure the change after Perturbation
  - Perturbed Subspace HITS algorithm
  - Considers projection of eigenvectors on subspace representing the original graph


System Overview


Concept-Association Graph 





Concepts
  - Named Entity objects representing items such as names of persons, organizations, etc.
  - Noun Groups
  - Arguments of General Events
Associations
  - Capture relationships between the concepts, such as an affiliation associated with a person
  - General Events such as Subject-Verb-Object patterns
Graph Construction
  - Represent the Concept-Association graph as an n x n adjacency matrix over the n concepts
  - An association between two concepts (i, j) is indicated by setting the (i, j)th entry of the matrix to 1; all other entries are 0
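
A minimal sketch of the graph construction step, assuming concepts arrive as a list of strings and associations as (source, target) pairs. The function name `build_adjacency` and the sample concepts are hypothetical; the slides do not say whether associations are stored symmetrically, so only the (i, j) entry is set here.

```python
def build_adjacency(concepts, associations):
    """Return an n x n 0/1 adjacency matrix and the concept-to-index map."""
    index = {concept: i for i, concept in enumerate(concepts)}
    n = len(concepts)
    matrix = [[0] * n for _ in range(n)]
    for source, target in associations:
        # Mark the association between concept i and concept j.
        matrix[index[source]][index[target]] = 1
    return matrix, index

# Hypothetical concepts and associations, for illustration only.
concepts = ["Red Cross", "relief workers", "earthquake"]
associations = [("Red Cross", "relief workers"),
                ("relief workers", "earthquake")]
A, concept_index = build_adjacency(concepts, associations)
```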


Re-ranking search results 



Perform link analysis on the Concept-Association graph to measure the effects of perturbation
Algorithm is based on…
  - Hypertext Induced Topic Selection (HITS)
      - Identify authoritative web pages
      - Represent the Web as an adjacency matrix and use the iterative power method to compute the principal eigenvectors of the matrix
  - Subspace HITS
      - Identify authoritative web pages by projecting each eigenvector representing a hub or an authority on a subspace
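
For reference, the power-method form of standard HITS can be sketched as below. This is the textbook iteration, not the perturbed variant used for re-ranking; the function name `hits`, the iteration count, and the toy graph are illustrative assumptions.

```python
def hits(adj, iterations=50):
    """Power iteration for HITS on a 0/1 adjacency matrix (list of lists).

    Returns (hub, authority) score lists, each normalized to unit length.
    """
    n = len(adj)
    hub = [1.0] * n
    auth = [1.0] * n
    for _ in range(iterations):
        # Authority update: a = A^T h (sum of hub scores pointing at j).
        auth = [sum(adj[i][j] * hub[i] for i in range(n)) for j in range(n)]
        norm = sum(a * a for a in auth) ** 0.5 or 1.0
        auth = [a / norm for a in auth]
        # Hub update: h = A a (sum of authority scores that i points to).
        hub = [sum(adj[i][j] * auth[j] for j in range(n)) for i in range(n)]
        norm = sum(h * h for h in hub) ** 0.5 or 1.0
        hub = [h / norm for h in hub]
    return hub, auth

# Toy graph: node 0 links to 1 and 2, node 1 links to 2.
toy = [[0, 1, 1],
       [0, 0, 1],
       [0, 0, 0]]
hub_scores, auth_scores = hits(toy)
```

In the toy graph, node 2 receives the most links and so ends up with the highest authority score, while node 0 is the strongest hub.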


Perturbed Subspace HITS algorithm 



Compute the top k eigenvectors $V^a = (v_1^a, v_2^a, \ldots, v_k^a)$ and the eigenvalues $d_1^a, d_2^a, \ldots, d_k^a$ of $T = A^T A$, where A is the adjacency matrix of the original graph
Compute the authority scores of $T = A^T A$ as
    $x_s^a = \sum_{p=1}^{k} d_p^a \left( (v_s^a)^T v_p^a \right)^2$, where s = 1 to k

For each document doc that is to be re-ranked
  - Initialize B = A. Perturb the graph B using the concepts and associations from doc
  - Compute the top k eigenvectors $V^b = (v_1^b, v_2^b, \ldots, v_k^b)$ and the eigenvalues $d_1^b, d_2^b, \ldots, d_k^b$ of $S = B^T B$
  - Compute the change in authority scores for document doc with respect to the original graph in the following way:
    $f_s = \sum_{p=1}^{k} d_p^b \left( (v_s^b)^T v_p^a \right)^2 - x_s^a$, where s = 1 to k
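
The two formulas for the authority scores and their change can be sketched with NumPy as follows. This is an illustrative reading of the slide, not the authors' implementation: the helper names `top_k_eig`, `perturb`, and `authority_change`, the example matrix, and the choice of `numpy.linalg.eigh` for the symmetric matrices A^T A and B^T B are all assumptions.

```python
import numpy as np

def top_k_eig(M, k):
    """Top-k eigenvalues and eigenvectors of a symmetric matrix M."""
    vals, vecs = np.linalg.eigh(M)          # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]      # indices of the k largest
    return vals[order], vecs[:, order]      # columns are eigenvectors

def perturb(A, doc_associations):
    """Copy A and add the document's associations as new edges."""
    B = A.copy()
    for i, j in doc_associations:
        B[i, j] = 1.0
    return B

def authority_change(A, B, k):
    """Return the vector f with f_s = sum_p d_p^b ((v_s^b)^T v_p^a)^2 - x_s^a."""
    d_a, V_a = top_k_eig(A.T @ A, k)
    d_b, V_b = top_k_eig(B.T @ B, k)
    # x_s^a = sum_p d_p^a ((v_s^a)^T v_p^a)^2
    x_a = ((V_a.T @ V_a) ** 2) @ d_a
    # proj[s, p] = ((v_s^b)^T v_p^a)^2
    proj = (V_b.T @ V_a) ** 2
    return proj @ d_b - x_a

# Hypothetical 4-concept graph and document associations (index pairs).
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 0]], dtype=float)
B = perturb(A, [(3, 0), (1, 3)])
f = authority_change(A, B, k=2)
```

With no perturbation (B equal to A) the change vector is zero up to rounding, which gives a quick sanity check on the implementation.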

Perturbed Subspace HITS algorithm… 

Re-rank the documents using Borda Count
  - Vectors representing the change in authority scores act as voters and documents act as candidates
  - Each voter ranks all the candidates
  - Ranks from all the voters are then combined to give a single ranked list of documents
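
A minimal Borda Count sketch, under one reading of the slide: each of the k components of the change vectors is treated as a voter that ranks every document. The slides do not state whether a larger or smaller change should rank higher, so the direction is exposed as the hypothetical `higher_is_better` parameter; the function name `borda_rerank` and the sample values are also illustrative.

```python
def borda_rerank(changes, higher_is_better=True):
    """Combine per-voter rankings into one ranked list of documents.

    changes: dict mapping doc id -> list of k change values (one per voter).
    """
    docs = list(changes)
    k = len(next(iter(changes.values())))
    points = {doc: 0 for doc in docs}
    for s in range(k):
        # Voter s ranks all candidates by its own component of the change.
        ranked = sorted(docs, key=lambda d: changes[d][s],
                        reverse=higher_is_better)
        for position, doc in enumerate(ranked):
            # The top candidate gets n-1 points, the last gets 0.
            points[doc] += len(docs) - 1 - position
    return sorted(docs, key=lambda d: points[d], reverse=True)

# Made-up change vectors for three documents and k = 3 voters.
sample_changes = {"d1": [0.9, 0.8, 0.7],
                  "d2": [0.1, 0.5, 0.6],
                  "d3": [0.5, 0.1, 0.2]}
final_ranking = borda_rerank(sample_changes)
```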


Dataset 



High Accuracy Retrieval of Documents (HARD) track, 2003 Text REtrieval Conference (TREC)
Results were evaluated using the standard TREC Evaluation Code

               NYT       APW      XIE       CR       FR       Total
No. of docs.   137,806   77,876   104,698   16,609   35,230   372,219
Size           750MB     245MB    310MB     147MB    330MB    1.7GB

Dataset…

Query No.   Query                                  Relevant Docs.      Docs. Retrieved
                                                   in the collection   in baseline run
33          Animal Protection                      401                 211
48          Y2K Crisis                             562                 301
51          Hate Crimes Prevention                 168                 140
65          Mad Cow Disease                        145                 126
69          Environmental Protection               513                 101
70          Red Cross activities                   111                 90
77          Insect-Borne illnesses                 194                 25
84          Recent Earthquakes                     86                  43
99          Globalization and Democracy            399                 170
102         Microsoft monopoly                     285                 249
116         Genetic Modification technology        200                 116
146         NATO/UN Tension over Balkans crisis    305                 131
147         Regional Economic integration          327                 181
187         National Leadership Transactions       194                 51

Results 

[Figure: Query vs. Exact Precision graph, HITS and Perturbed Subspace HITS]

Results… 

[Figure: Query vs. Exact Precision at 200 documents retrieved graph, Perturbed Subspace HITS]

Conclusions & Future Work 



Conclusions
  - Our approach is not explicitly dependent on the query
  - Exploits the information contained in the associations (semantic links) between concepts (words)
  - Re-ranking technique is based on blind feedback
Future Work
  - Evaluation on a larger set of queries
  - Formalize the Perturbed Subspace HITS re-ranking algorithm
  - Improve the extraction of concepts and associations


References 











J. Aslam and M. Montague. Models for metasearch. In Proc. SIGIR 2001, September 2001.
J. Kleinberg. Authoritative sources in a hyperlinked environment. In Proc. 9th Ann. ACM-SIAM Symp. Discrete Algorithms, pages 668–677. ACM, 1998.
A. Y. Ng, A. X. Zheng, and M. I. Jordan. Link analysis, eigenvectors and stability. In Proc. 17th International Joint Conference on Artificial Intelligence, 2001.
A. Y. Ng, A. X. Zheng, and M. I. Jordan. Stable algorithms for link analysis. In Proc. 24th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2001.
G. W. Stewart and J.-G. Sun. Matrix Perturbation Theory. Academic Press, Inc., London, UK, 1990.
R. K. Srihari, W. Li, C. Niu, and T. Cornell. Semantex: A customizable intermediate level information extraction engine. To appear in the Journal of Natural Language Engineering, 12(4), 2006.
