Unifying Personalized PageRank and Prolog

William W. Cohen with: William Yang Wang, Katie Mazaitis, Einat Minkov, Ni Lao, Tom Mitchell & others

Machine Learning Dept. and Language Technologies Inst. School of Computer Science Carnegie Mellon University

My History

Machine Learning

Representation languages: DBs, KR

Text cat, IR, IE

History 82 84 86 88 90 92 94 96 98 00 04 08 12

•  1982/1984: Ehud Shapiro’s thesis:
  –  MIS: Learning logic programs as debugging an empty Prolog program
  –  Thesis contained 17 figures and a 25-page appendix that were a full implementation of MIS in Prolog
  –  Incredibly elegant work
•  “Computer science has a great advantage over other experimental sciences: the world we investigate is, to a large extent, our own creation, and we are the ones to determine if it is simple or messy.”

History 82 84 86 88 90 92 94 96 98 00 04 08 12

•  Grad school Rutgers, job at AT&T
•  Worked in group doing KR, DB, learning, information retrieval, …
•  My work: learning logical (description-logic-like, Prolog-like, rule-based) representations that model large noisy real-world datasets.

History 82 84 86 88 90 92 94 96 98 00 04 08 12

•  The web takes off
  –  as predicted by William Gibson
•  IR folks start looking at retrieval and question answering with the Web
•  Alon Halevy (DB guy) starts the Information Manifold project to integrate data on the web
  –  VLDB 2006 10-year Best Paper Award for 1996 paper on IM
•  I got very interested in information integration….

History 82 84 86 88 90 92 94 96 98 00 04 08 12

•  As the world of computer science gets richer and more complex, computer science can no longer limit itself to studying “our own creation”.
•  Tension exists between
  –  Elegant theories of representation
  –  The not-so-elegant real world that is being represented
•  Concise logical representations often “don’t fit” complex real-world data

History 82 84 86 88 90 92 94 96 98 00 04 08 12

•  The beauty of the real world is its complexity….

WHIRL language:

SELECT R.a, S.a, S.b, T.b FROM R, S, T WHERE R.a~S.a and S.b~T.b
(where ~ means TFIDF-similar)

Link items as needed by Q.

Incrementally produce a ranked list of possible links, with “best matches” first. User (or downstream process) decides how much of the list to generate and examine.

Query Q:

R.a      S.a     S.b     T.b
Anhai    Anhai   Doan    Doan
Dan      Dan     Weld    Weld
William  Will    Cohen   Cohn
Steve    Steven  Minton  Mitton
William  David   Cohen   Cohn
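The similarity join on the WHIRL slide can be sketched in a few lines of Python. This is only an illustration of the idea of ranking candidate links by TFIDF similarity, not WHIRL's implementation: it scores every pair by brute force instead of generating the list incrementally, and it tokenizes names into character trigrams (an assumption made here so that one-word names like "Steve" and "Steven" share tokens). The function names `char_ngrams`, `tfidf_vectors`, `cosine` and `soft_join` are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def char_ngrams(s, n=3):
    """Lowercased character n-grams of s, padded with start/end markers."""
    padded = f"^{s.lower()}$"
    return [padded[i:i + n] for i in range(max(1, len(padded) - n + 1))]


def tfidf_vectors(strings):
    """Map each distinct string to a unit-length TF-IDF vector over its n-grams."""
    docs = {s: Counter(char_ngrams(s)) for s in set(strings)}
    df = Counter(tok for counts in docs.values() for tok in counts)
    n_docs = len(docs)
    vectors = {}
    for s, counts in docs.items():
        vec = {t: (1 + math.log(c)) * math.log(1 + n_docs / df[t])
               for t, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors[s] = {t: v / norm for t, v in vec.items()}
    return vectors


def cosine(u, v):
    """Dot product of two sparse unit vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def soft_join(left, right):
    """Score every (l, r) pair and return (score, l, r) tuples, best first."""
    vecs = tfidf_vectors(list(left) + list(right))
    scored = [(cosine(vecs[l], vecs[r]), l, r) for l, r in product(left, right)]
    return sorted(scored, reverse=True)


# Column values taken from the example table (R.a and S.a).
R_a = ["Anhai", "Dan", "William", "Steve"]
S_a = ["Anhai", "Dan", "Will", "Steven", "David"]
ranked = soft_join(R_a, S_a)
for score, l, r in ranked[:5]:
    print(f"{score:.2f}  {l} ~ {r}")
```

A consumer would read `ranked` from the top and stop wherever the matches stop looking plausible, which mirrors the "user decides how much of the list to examine" point above.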

History 82 84 86 88 90 92 94 96 98 00 04 08 12

•  Alon Halevy (DB guy) starts the Information Manifold project to integrate data on the web
  –  VLDB 2006 10-year Best Paper Award for 1996 paper on IM
•  William Cohen (ML guy) wrote the WHIRL system, bridging KR/DB ideas with a key IR idea: integration by reasoning about the similarity of strings
•  Combining complex models of similarity and logic
  –  SIGMOD 2008 10-Year Best Paper Award for 1998 paper on WHIRL

Beyond TFIDF: graph similarity 82 84 86 88 90 92 94 96 98 00 04 08 12

[Figure: graph whose nodes are name strings (“William W. Cohen, CMU”, “Dr. W. W. Cohen”, “Christos Faloutsos, CMU”, “George H. W. Bush”, “George W. Bush”) and the terms cohen, dr, william, w, cmu]

Personal Info Management as Similarity Queries on a Graph Einat Minkov, Univ Haifa [SIGIR 2006, EMNLP 2008, TOIS 2010]

[Figure: email graph with nodes NSF, William, graph, proposal, CMU, 6/17/07, 6/18/07 and an email address; edge labels include Sent To and Term In Subject]

Beyond TFIDF: graph similarity 82 84 86 88 90 92 94 96 98 00 04 08 12

•  Personalized PageRank aka Random Walk with Restart:
  –  Similarity measure for nodes in a graph, analogous to TFIDF for text in a WHIRL database
  –  natural extension to PageRank
  –  amenable to learning parameters of the walk (gradient search, w/ various optimization metrics):
    •  Toutanova, Manning & Ng, ICML 2004; Nie et al, WWW 2005; Xi et al, SIGIR 2005
  –  very fast to compute
  –  queries:
    Given type t* and node x, find y: T(y)=t* and y~x
    Given type t* and nodes X, find y: T(y)=t* and y~X
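A typed similarity query of the form "given type t* and node x, find y: T(y)=t* and y~x" can be sketched as follows. This is a minimal power-iteration version of random walk with restart, written for illustration only; the toy graph reuses the name strings and terms from the earlier figure, but the edges (one per distinct term in a string), the restart probability `alpha=0.2`, and the helper names `build_undirected`, `personalized_pagerank` and `similar_nodes` are assumptions, not taken from the talk.

```python
from collections import defaultdict


def build_undirected(edges):
    """Adjacency lists for an undirected graph given (u, v) pairs."""
    graph = defaultdict(list)
    for u, v in edges:
        graph[u].append(v)
        graph[v].append(u)
    return dict(graph)


def personalized_pagerank(graph, seeds, alpha=0.2, iters=100):
    """Random walk with restart: at each step, jump back to a seed with prob. alpha."""
    seeds = set(seeds)
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(restart)
    for _ in range(iters):
        nxt = {n: alpha * restart[n] for n in nodes}
        for u in nodes:
            nbrs = graph.get(u, [])
            if not nbrs:
                # Dangling node: return its walk mass to the seeds.
                for s in seeds:
                    nxt[s] += (1 - alpha) * p[u] / len(seeds)
                continue
            share = (1 - alpha) * p[u] / len(nbrs)
            for v in nbrs:
                nxt[v] += share
        p = nxt
    return p


def similar_nodes(graph, node_type, seeds, target_type, alpha=0.2):
    """Rank nodes of target_type by their PPR score with respect to the seeds."""
    scores = personalized_pagerank(graph, seeds, alpha)
    hits = [(s, n) for n, s in scores.items()
            if node_type[n] == target_type and n not in seeds]
    return sorted(hits, reverse=True)


# Toy string/term graph (hypothetical edges: each string links to its terms).
strings = {
    "William W. Cohen, CMU": ["william", "w", "cohen", "cmu"],
    "Dr. W. W. Cohen": ["dr", "w", "cohen"],
    "Christos Faloutsos, CMU": ["christos", "faloutsos", "cmu"],
    "George W. Bush": ["george", "w", "bush"],
    "George H. W. Bush": ["george", "h", "w", "bush"],
}
graph = build_undirected([(s, t) for s, terms in strings.items() for t in terms])
node_type = {n: ("string" if n in strings else "term") for n in graph}

query = "William W. Cohen, CMU"
scores = personalized_pagerank(graph, [query])
ranked = similar_nodes(graph, node_type, [query], "string")
for score, name in ranked:
    print(f"{score:.4f}  {name}")
```

Passing several seeds instead of one gives the second query form (nodes X rather than a single node x), since the restart distribution is spread uniformly over the seed set.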

Tasks can be reduced to similarity queries

Person name disambiguation
[ term “andy” file msgId ] “person”

Threading
•  What are the adjacent messages in this thread?
•  A proxy for finding “more messages like this one”
[ file msgId ] “file”

Alias finding
What are the email-addresses of Jason?...
[ term Jason ] “email-address”

Meeting attendees finder
Which email-addresses (persons) should I notify about this meeting?
[ meeting mtgId ] “email-address”

Results on one task + Learning

[Figure: PERSON NAME DISAMBIGUATION, Mgmt. game; recall (0% to 100%) vs. rank (1 to 10)]

Beyond TFIDF: graph similarity 82 84 86 88 90 92 94 96 98 00 04 08 12

•  Personalized PageRank aka Random Walk with Restart:
  –  Given type t* and nodes X, find y: T(y)=t* and y~X
•  New and better learning methods
  –  richer parameterization
  –  faster PPR inference
  –  structure learning
•  Other tasks:
  –  relation-finding in parsed text
  –  information management for biologists
  –  inference in large noisy knowledge bases
  –  work with Ni Lao (formerly CMU, now Google)

History

Machine Learning

Representation languages: DBs, KR

Linguistic similarity: NLP, IE, IR

Machine Learning

Representation languages: DBs, KR

Linguistic→graph similarity: NLP, IE, IR

Machine Learning

Representation languages: DBs, KR

????

Linguistic→graph similarity: NLP, IE, IR

Unifying Personalized PageRank and Prolog: ProPPR

William Yang Wang, Katie Mazaitis

Sample ProPPR program….

[Figure: sample program, with labels “Horn rules” and “features of rules”]

.. and search space…

[Figure: proof search space] D’oh! This is a graph!

•  Score for a query soln (e.g., “Z=sport” for “about(a,Z)”) depends on probability of reaching a ☐ node*
•  learn transition probabilities based on features of the rules
•  implicit “reset” transitions with (p≥α) back to query node
•  Looking for answers supported by many short proofs

“Grounding” size is O(1/(αε)), i.e., independent of DB size → fast approx incremental inference (Andersen, Chung, Lang, 08)

Learning: supervised variant of personalized PageRank (Backstrom & Leskovec, 2011)

*Exactly as in Stochastic Logic Programs [Cussens, 2001]
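The claim that the grounding has size O(1/(αε)) regardless of database size can be illustrated with a local "push" approximation of personalized PageRank. The sketch below is a simplified, non-lazy variant in the spirit of the cited push procedure, run on a plain undirected graph; it is not ProPPR's grounding code, and the names `approx_ppr` and `ring` and the parameter values are assumptions chosen for the example. Each push moves at least α·ε of probability mass into `p`, and `p` never holds more than 1 in total, so there can be at most 1/(αε) pushes however large the graph is.

```python
from collections import deque


def approx_ppr(graph, seed, alpha=0.2, eps=1e-2):
    """Approximate PPR from `seed` by local pushes.

    graph: dict mapping each node to a non-empty list of neighbors.
    Returns (p, r, pushes): approximate scores, leftover residuals, push count.
    """
    p = {}                  # approximate PPR scores (only touched nodes)
    r = {seed: 1.0}         # residual mass still to be distributed
    queue = deque([seed])   # nodes whose residual may exceed eps * degree
    pushes = 0
    while queue:
        u = queue.popleft()
        deg = len(graph[u])
        ru = r.get(u, 0.0)
        if ru < eps * deg:
            continue
        pushes += 1
        p[u] = p.get(u, 0.0) + alpha * ru   # keep an alpha fraction at u
        r[u] = 0.0
        share = (1 - alpha) * ru / deg      # spread the rest over neighbors
        for v in graph[u]:
            before = r.get(v, 0.0)
            r[v] = before + share
            # Enqueue v only when it first crosses its threshold.
            if before < eps * len(graph[v]) <= r[v]:
                queue.append(v)
    return p, r, pushes


def ring(n):
    """Hypothetical test graph: n nodes in a cycle, each with two neighbors."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


alpha, eps = 0.2, 1e-2
bound = 1 / (alpha * eps)
p_small, r_small, pushes_small = approx_ppr(ring(1_000), 0, alpha, eps)
p_big, r_big, pushes_big = approx_ppr(ring(100_000), 0, alpha, eps)
print(pushes_small, pushes_big, bound)
```

The set of nodes that ever appear in `p` or `r` plays the role of the grounded graph: only nodes near the query are touched, so a hundredfold larger database does not make the local computation larger.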

Sample Task: Citation Matching

•  Task:
  •  citation matching (Alchemy: Poon & Domingos).
•  Dataset:
  •  CORA dataset, 1295 citations of 132 distinct papers.
  •  Training set: section 1-4.
  •  Test set: section 5.
•  ProPPR program:
  •  translated from corresponding Markov logic network (dropping non-Horn clauses)
  •  # of rules: 21.

Task: Citation Matching

Time: Citation Matching vs Alchemy

“Grounding” is independent of DB size

Accuracy: Citation Matching

Our rules / UW rules

AUC scores: 0.0=low, 1.0=hi
w=1 is before learning

It gets better…..

•  Learning uses many example queries
  •  e.g: sameCitation(c120,X) with X=c123+, X=c124-, …
•  Each query is grounded to a separate small graph (for its proof)
•  Goal is to tune weights on these edge features to optimize RWR on the query-graphs.
•  Can do SGD and run RWR separately on each query-graph
•  Graphs do share edge features, so there’s some synchronization needed

Learning can be parallelized by splitting on the separate “groundings” of each query
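The learning setup described above can be sketched on toy data. This is a rough illustration under several assumptions, not ProPPR's learner: the two query graphs, the feature names (`sameTitle`, `sameAuthor`, `sameVenue`), the exponential edge weighting, the log-loss, and the finite-difference gradient (standing in for an analytic one) are all made up for the example. It does show the two points on the slide: each query has its own small graph, and the graphs are tied together only through shared feature weights.

```python
import math


def transition_probs(edges, theta):
    """edges: node -> list of (target, [features]). Edge weight = exp(sum of feature weights)."""
    probs = {}
    for u, out in edges.items():
        weights = [math.exp(sum(theta.get(f, 0.0) for f in feats)) for _, feats in out]
        total = sum(weights)
        probs[u] = [(v, w / total) for (v, _), w in zip(out, weights)]
    return probs


def rwr(edges, theta, query, alpha=0.2, iters=100):
    """Random walk with restart to `query` on one query graph."""
    probs = transition_probs(edges, theta)
    nodes = set(edges) | {v for out in edges.values() for v, _ in out}
    p = {n: 0.0 for n in nodes}
    p[query] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        nxt[query] += alpha                         # reset transitions
        for u in nodes:
            out = probs.get(u)
            if not out:
                nxt[query] += (1 - alpha) * p[u]    # answer nodes walk back to the query
                continue
            for v, pr in out:
                nxt[v] += (1 - alpha) * p[u] * pr
        p = nxt
    return p


def loss(example, theta):
    """Log-loss: positive answers should score high, negative answers low."""
    edges, query, pos, neg = example
    p = rwr(edges, theta, query)
    return (-sum(math.log(p[n]) for n in pos)
            - sum(math.log(1 - p[n]) for n in neg))


def train(examples, features, lr=0.5, epochs=20, h=1e-5):
    """One gradient step per query graph (fixed order here; SGD would shuffle)."""
    theta = {f: 0.0 for f in features}
    for _ in range(epochs):
        for ex in examples:
            base = loss(ex, theta)
            grad = {}
            for f in features:
                bumped = dict(theta)
                bumped[f] += h
                grad[f] = (loss(ex, bumped) - base) / h   # numerical gradient
            for f in features:
                theta[f] -= lr * grad[f]
    return theta


# Two hypothetical query graphs that share edge features.
examples = [
    ({"q1": [("a", ["sameTitle"]), ("b", ["sameVenue"])]}, "q1", ["a"], ["b"]),
    ({"q2": [("c", ["sameTitle", "sameAuthor"]), ("d", ["sameVenue"])]}, "q2", ["c"], ["d"]),
]
features = ["sameTitle", "sameAuthor", "sameVenue"]
theta = train(examples, features)
before = rwr(examples[0][0], {}, "q1")
after = rwr(examples[0][0], theta, "q1")
print(theta, before["a"], after["a"])
```

Because each step only needs one query graph plus the current `theta`, the per-query work could be handed to separate workers, with `theta` as the only shared state that needs synchronizing.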

Another Sample Task

Lao: A learned random walk strategy is a weighted set of random-walk “experts”, each of which is a walk constrained by a path (i.e., sequence of relations)

Recommending papers to cite in a paper being prepared:
1) papers co-cited with on-topic papers
6) approx. standard IR retrieval
7,8) papers cited during the past two years
12-13) papers published during the past two years
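The idea of a weighted set of path-constrained random-walk "experts" can be sketched as follows. The tiny citation KB, the relation names `cites` and `citedBy`, and the hand-set path weights are hypothetical; in the approach described above the weights are learned rather than fixed by hand. Each expert follows one sequence of relations, choosing uniformly among the available edges at every step, and the final score of a candidate is the weighted sum of the probabilities the experts assign to it.

```python
def path_walk(kb, start, path):
    """Distribution over nodes reached from `start` by following `path`.

    kb: relation name -> {source node -> list of target nodes}.
    path: sequence of relation names; each step picks a target uniformly.
    """
    dist = {start: 1.0}
    for rel in path:
        nxt = {}
        for node, mass in dist.items():
            targets = kb.get(rel, {}).get(node, [])
            for t in targets:
                nxt[t] = nxt.get(t, 0.0) + mass / len(targets)
        dist = nxt   # mass at nodes with no `rel` edge is dropped
    return dist


def score(kb, start, weighted_paths):
    """Combine the path experts: sum of weight * path probability per node."""
    totals = {}
    for weight, path in weighted_paths:
        for node, prob in path_walk(kb, start, path).items():
            totals[node] = totals.get(node, 0.0) + weight * prob
    return totals


# Hypothetical citation KB: "draft" is the paper being prepared.
kb = {
    "cites": {"draft": ["p1"], "p3": ["p1", "p2"], "p4": ["p1", "p2"], "p5": ["p6"]},
    "citedBy": {"p1": ["draft", "p3", "p4"], "p2": ["p3", "p4"], "p6": ["p5"]},
}

# Two experts with made-up weights: co-citation, and direct citation.
weighted_paths = [
    (1.0, ("cites", "citedBy", "cites")),
    (0.5, ("cites",)),
]
totals = score(kb, "draft", weighted_paths)
for paper, s in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{s:.3f}  {paper}")
```

Here `p2` is never cited by the draft but still receives a score through the co-citation path, while `p6` is unreachable along either path and gets none.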

Another study: learning inference rules for a noisy KB (Lao, Cohen, Mitchell 2011)

[Figure: KB graph with nodes HinesWard, Steelers, NFL, American; edges AthletePlaysForTeam, TeamPlaysInLeague, IsA, PlaysIn, isa-1; query edge AthletePlaysInLeague ?; annotation “Synonyms of the query team”]

•  Paths learned are like ProPPR rules
•  …but they are learned separately for each relation type, and one learned rule can’t call another

athletePlaysSport(Athlete,Sport) ← onTeam(Athlete,Team), teamPlaysSport(Team,Sport).
teamPlaysSport(Team,Sport) ← memberOf(Team,Conference), hasMember(Conference,Team2), plays(Team2,Sport).
teamPlaysSport(Team,Sport) ← onTeam(Athlete,Team), athletePlaysSport(Athlete,Sport).

•  Paths learned are like ProPPR rules
•  …but they are learned separately for each relation type, and one learned rule can’t call another

athletePlaysSportViaRule(Athlete,Sport) ← onTeamViaKB(Athlete,Team), teamPlaysSportViaKB(Team,Sport).
teamPlaysSportViaRule(Team,Sport) ← memberOfViaKB(Team,Conference), hasMemberViaKB(Conference,Team2), playsViaKB(Team2,Sport).
teamPlaysSportViaRule(Team,Sport) ← onTeamViaKB(Athlete,Team), athletePlaysSportViaKB(Athlete,Sport).

Experiment:

•  Take top K paths for each predicate learned by Lao’s PRA
  •  (I don’t know how to do structure learning for ProPPR yet)
•  Convert to a mutually recursive ProPPR program
•  Train weights on entire program (≈800 rules, 12k queries)

athletePlaysSport(Athlete,Sport) ← onTeam(Athlete,Team), teamPlaysSport(Team,Sport).
athletePlaysSport(Athlete,Sport) ← athletePlaysSportViaKB(Athlete,Sport).
teamPlaysSport(Team,Sport) ← memberOf(Team,Conference), hasMember(Conference,Team2), plays(Team2,Sport).
teamPlaysSport(Team,Sport) ← onTeam(Athlete,Team), athletePlaysSport(Athlete,Sport).
teamPlaysSport(Team,Sport) ← teamPlaysSportViaKB(Team,Sport).

Joint Inference for Relation Prediction

•  Task: link prediction.
•  Dataset: a subset of 19,527 beliefs from NELL.
•  Training set: 12,331 queries.
•  Test set: 1,185 queries.
•  # Rules: 797.

You can do more with ProPPR…

Machine Learning

Representation languages: DBs, KR

ProPPR

Linguistic→graph similarity: NLP, IE, IR

•  Semantically simple
•  Extends PPR and Prolog
•  Scalable and flexible:
  •  Applicable to very large databases even with arbitrary recursion in a logic program
  •  Easily parallelizable learning-to-perform-PPR method
o  Not (yet) fast
o  Learned probabilities are about a proof process on a logic program, not about state of the world
