MRR: an Unsupervised Algorithm to Rank Reviews by Relevance

Vinicius Woloszyn, Henrique D. P. dos Santos, et al.

Department of Computer Science, Federal University of Rio Grande do Sul and Pontifical Catholic University of Rio Grande do Sul

2017 IEEE/WIC/ACM International Conference on Web Intelligence

Leipzig, August 24, 2017

Vinicius Woloszyn, Henrique D. P. dos Santos, et al. (UFRGS). MRR: an Unsupervised Algorithm to Rank Reviews by Relevance. Leipzig, August 24, 2017.

Introduction

Many works address the problem of ranking documents by their relevance.
Most of them rely on supervised algorithms such as classification and regression.
Annotated: Neural Network, SVM
Statistics: TF-IDF, Readability, POS-Tag


Introduction

The quality of results produced by supervised algorithms depends on the existence of a large, domain-dependent training dataset.
Example domains: Amazon, Yelp, Netflix, IMDB

Unsupervised methods are an attractive alternative that avoids the labor-intensive and error-prone task of manually annotating training datasets.


MRR - Ranking documents by their relevance

Graph-based
Vertices are the documents (reviews), and edges are defined by the similarity between pairs of documents (rating score and text).

$$f(u, v) = \alpha \cdot \mathrm{sim\_txt}(u, v) + (1 - \alpha) \cdot \mathrm{sim\_star}(u, v) \tag{1}$$

α: tunes the similarity function by weighting textual similarity against star similarity.


MRR - Ranking documents by their relevance

Similarity Functions

Textual: cosine similarity of TF-IDF vectors

$$\mathrm{sim\_txt}(u, v) = \cos(\mathrm{tfidf}(u.t), \mathrm{tfidf}(v.t)) \tag{2}$$

Stars: Euclidean distance normalized by Min-Max scaling

$$\mathrm{sim\_star}(u, v) = 1 - \frac{|u.rs - v.rs| - \min(rs)}{\max(rs) - \min(rs)} \tag{3}$$

Here u.t is the text of review u and u.rs is its rating score.
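The two similarity functions and their combination f(u, v) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes whitespace tokenization, a smoothed IDF weighting, and that min and max in the star similarity range over the pairwise rating distances, which is one possible reading of Equation (3). The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (dict: term -> weight) per text."""
    # Assumption: lowercase whitespace tokenization.
    docs = [Counter(text.lower().split()) for text in texts]
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in doc)
    # Assumption: smoothed IDF, so terms found in every document keep a weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(texts, ratings, alpha):
    """W[u][v] = alpha * sim_txt(u, v) + (1 - alpha) * sim_star(u, v)."""
    n = len(texts)
    vectors = tfidf_vectors(texts)
    pairs = list(combinations(range(n), 2))
    # Assumption: min/max are taken over the pairwise rating distances.
    distances = [abs(ratings[u] - ratings[v]) for u, v in pairs]
    d_min = min(distances, default=0)
    span = max(distances, default=0) - d_min
    W = [[0.0] * n for _ in range(n)]  # diagonal stays 0: no self-loops
    for u, v in pairs:
        sim_txt = cosine(vectors[u], vectors[v])
        d = abs(ratings[u] - ratings[v])
        sim_star = 1.0 - (d - d_min) / span if span else 1.0
        W[u][v] = W[v][u] = alpha * sim_txt + (1 - alpha) * sim_star
    return W


# Hypothetical toy reviews: the first two agree in wording and in stars.
reviews = ["great battery life", "battery life is great", "terrible screen"]
stars = [5, 4, 1]
W = similarity_matrix(reviews, stars, alpha=0.5)
```

With α = 0.5 both signals count equally; setting α closer to 1 makes the graph depend mostly on the review text.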


MRR - Ranking documents by their relevance

Graph Centrality
Hypothesis: a relevant document has a high centrality index, since it is similar to many other documents.
The centrality index ranks the vertices by importance, which gives the relevance ranking of the documents.


MRR - Graph-Specific Similarity Threshold

Graph Pruning
Centrality depends on the existence of edges between nodes.
Prune the graph based on a minimum similarity between reviews.
E: mean similarity of the graph

$$W'(u, v) = \begin{cases} 1, & f(u, v) \geq E \cdot \beta \\ 0, & \text{otherwise} \end{cases} \tag{4}$$

β: tunes the pruning function by scaling the threshold.
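As a small worked example of the graph-specific threshold, the sketch below binarizes a similarity matrix with the rule of Equation (4). It assumes that E is the mean over distinct pairs (u ≠ v), which the slide does not spell out. The function name `prune` and the matrix values are hypothetical.

```python
def prune(W, beta):
    """Binarize a similarity matrix W: keep an edge when W[u][v] >= E * beta."""
    n = len(W)
    # Assumption: E is the mean similarity over distinct pairs (u != v).
    pairs = [(u, v) for u in range(n) for v in range(n) if u != v]
    pruned = [[0] * n for _ in range(n)]
    if not pairs:
        return pruned, 0.0
    E = sum(W[u][v] for u, v in pairs) / len(pairs)
    for u, v in pairs:
        if W[u][v] >= E * beta:
            pruned[u][v] = 1
    return pruned, E


# Hypothetical similarities between three reviews.
W = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.5],
    [0.1, 0.5, 0.0],
]
adjacency, mean_similarity = prune(W, beta=0.9)
# The mean similarity is about 0.5, so the threshold is about 0.45:
# the edges 0-1 and 1-2 survive, while the weak edge 0-2 is removed.
```

Because the threshold is derived from each graph's own mean, a product with uniformly similar reviews and a product with very diverse reviews are pruned relative to their own similarity levels.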


Main steps of the MRR algorithm

[Figure: main steps of MRR, with panels (A) Similarity Function, (B) Graph-Specific Threshold, (C) PageRank Scores]

(A) Build a similarity graph G between pairs of documents;
(B) Prune the graph by removing all edges below the similarity threshold;
(C) Apply PageRank to obtain the centrality scores.


MRR Algorithm

Algorithm 1 - MRR Algorithm (R, α, β): S

for each u, v ∈ R do
    W[u, v] ← α ∗ sim_txt(u, v) + (1 − α) ∗ sim_star(u, v)
end for
E ← mean(W)
for each u, v ∈ R do
    if W[u, v] ≥ E ∗ β then
        W′[u, v] ← 1
    else
        W′[u, v] ← 0
    end if
end for
S ← PageRank(W′)
Return S
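The thresholding and PageRank steps of Algorithm 1 could look roughly like this in Python, starting from a precomputed similarity matrix W. It is a sketch under stated assumptions rather than the authors' code: the damping factor 0.85, the fixed number of power iterations, the handling of isolated vertices, and the mean over distinct pairs are all choices made here, and the names `pagerank` and `mrr` are hypothetical.

```python
def pagerank(adj, damping=0.85, iterations=100):
    """Power-iteration PageRank on a 0/1 adjacency matrix (list of lists)."""
    # Assumption: damping and iteration count are not given in the slides.
    n = len(adj)
    scores = [1.0 / n] * n
    out_degree = [sum(row) for row in adj]
    for _ in range(iterations):
        # Score held by vertices without edges is spread uniformly.
        dangling = sum(scores[u] for u in range(n) if out_degree[u] == 0)
        new_scores = []
        for v in range(n):
            incoming = sum(scores[u] / out_degree[u] for u in range(n) if adj[u][v])
            new_scores.append((1 - damping) / n + damping * (incoming + dangling / n))
        scores = new_scores
    return scores


def mrr(W, beta):
    """Return (ranking, scores) for a similarity matrix W with at least two reviews."""
    n = len(W)
    # Assumption: E is the mean similarity over distinct pairs (u != v).
    pairs = [(u, v) for u in range(n) for v in range(n) if u != v]
    E = sum(W[u][v] for u, v in pairs) / len(pairs)
    pruned = [
        [1 if u != v and W[u][v] >= E * beta else 0 for v in range(n)]
        for u in range(n)
    ]
    scores = pagerank(pruned)
    ranking = sorted(range(n), key=lambda u: scores[u], reverse=True)
    return ranking, scores


# Hypothetical similarities: review 1 is similar to both other reviews.
W = [
    [0.0, 0.9, 0.1],
    [0.9, 0.0, 0.5],
    [0.1, 0.5, 0.0],
]
ranking, scores = mrr(W, beta=0.9)
```

In this toy graph, review 1 keeps both of its edges after pruning, so it ends up with the highest centrality and is ranked first.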


Experiment Design

Dataset: reviews (rating score and text) of electronics and books from the Amazon website.

Gold Standard: human perception of helpfulness:

$$h(r \in R) = \frac{vote_{+}(r)}{vote_{+}(r) + vote_{-}(r)} \tag{5}$$

Metric: Normalized Discounted Cumulative Gain, reported as NDCG@n
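To make the evaluation concrete, the following sketch computes the helpfulness score of Equation (5) and scores a ranking with NDCG@n. The slides do not say which NDCG variant was used, so the linear gain with a log2 discount is an assumption, as is returning 0.0 for reviews without votes. All function names and vote counts are hypothetical.

```python
import math


def helpfulness(votes_pos, votes_neg):
    """h(r) = vote+ / (vote+ + vote-)."""
    total = votes_pos + votes_neg
    # Assumption: a review without votes gets helpfulness 0.0.
    return votes_pos / total if total else 0.0


def dcg(gains):
    """Discounted cumulative gain with a log2(position + 1) discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_n(ranking, gold, n):
    """NDCG@n of `ranking` (review indices, best first) against gold gains."""
    ideal_dcg = dcg(sorted(gold, reverse=True)[:n])
    if ideal_dcg == 0:
        return 0.0
    return dcg([gold[r] for r in ranking[:n]]) / ideal_dcg


# Hypothetical (positive, negative) vote counts for three reviews.
votes = [(8, 2), (1, 9), (5, 5)]
gold = [helpfulness(p, q) for p, q in votes]  # helpfulness 0.8, 0.1, 0.5

perfect = ndcg_at_n([0, 2, 1], gold, n=3)  # ranking sorted by helpfulness
poor_top = ndcg_at_n([1, 0, 2], gold, n=1)  # least helpful review placed first
```

A ranking that matches the helpfulness order scores 1.0, while NDCG@1 drops sharply when the least helpful review is placed first.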


Amazon Dataset

            Electronics          Books
Votes       48.20 (± 302.84)     29.71 (± 73.58)
Positive    40.12 (± 291.99)     20.60 (± 64.18)
Negative    8.08 (± 22.27)       9.11 (± 21.44)
Rating      3.73 (± 1.50)        3.41 (± 1.54)
Words       350.32 (± 402.02)    287.44 (± 273.75)
Products    383                  461
Total       19,756               24,234

Table: Profiling of the Amazon dataset.


MRR Evaluation

Experiments:
Baseline comparison;
Graph-Specific Threshold Assessment;
Parameter Sensitivity; and
Run-time Performance.


Experiment Design

Baselines:

Tsur et al. (2009) as REVRANK: builds a Core Virtual Review (200 most frequent words) and ranks reviews by similarity distance to the core.

Wu et al. (2011) as PR HS LEN: sentence similarity based on POS tags, PageRank, HITS and length.

SVM Regression: a) textual TF-IDF features and the star score; b) the same features used by Wu et al. (2011).


Relevance Ranking Assessment

             NDCG@1     NDCG@5
SVM WU       0.80770    0.91817
SVM TFIDF    0.85539    0.93119
REVRANK      0.66052    0.68172
PR HS LEN    0.72689    0.77131
MRR          0.79877    0.81876

Table: Mean Performance on Book Reviews

MRR statistically outperformed all unsupervised baselines


Relevance Ranking Assessment

             NDCG@1     NDCG@5
SVM WU       0.76416    0.91535
SVM TFIDF    0.88986    0.94621
REVRANK      0.67903    0.72133
PR HS LEN    0.87434    0.87184
MRR          0.89403    0.89246

Table: Mean Performance on Electronics Reviews

MRR statistically outperformed all unsupervised baselines.
MRR is comparable to supervised methods.


Graph-Specific Threshold Assessment

MRR performance is always better using a Graph-Specific threshold.


Parameter Sensitivity: α and β

α had a low influence in all settings (4%).
β produced the highest variation (17%). Nevertheless, when 0.8 ≤ β ≤ 0.9, MRR's performance varies by only 6%.


Run-time Assessment

Time required for producing a ranking for 383 products (log scale)

MRR presents a significantly lower running time.

Final Remarks

Contributions:
Unsupervised method: does not depend on an annotated training set;
Faster than other graph-centrality methods;
Performs well in different domains (e.g. closed vs. open-ended);
Significantly superior to the unsupervised baselines, and comparable to a supervised approach in a specific setting.


Further Work

Next steps:
Other clustering techniques for graphs;
Methods to select the most relevant reviews;
Segmented Bushy Path, widely explored in text summarization.


Thanks

Thank you! Questions?
source: https://github.com/vwoloszyn/MRR
contact: [email protected]
