NIPS Learning Semantics 2014


Embedding Probabilistic Logic for Machine Reading

Sebastian Riedel (University College London)

1

Overview

Machine Reading & Reasoning …
… with Probabilistic Logics and Embeddings
Challenges
Injecting Explanations
Extracting Explanations

2

Machine Reading

“Who works in London and is interested in NLP?”

in(UCL,London)

interest(x,NLP), worksFor(x,y), in(y,London)

Relational DB

topic(Seb,NLP)
worksFor(Seb,UCL)

[Kwiatkowski et al., 2013]

Narrow domain-specific schema

[Mintz et al., 2009]

Semantics
Statistical NLP
Syntax
Coreference

“Sebastian Riedel works in the area of NLP and is now Lecturer at UCL”

3

Machine Reading [Riedel et al., 2013]

“Who works in London and is interested in NLP?”

in(UCL,London)
works-in-area-of(Seb,NLP)
lecturer-at(Seb,UCL)

Relational DB

interest(x,NLP), worksFor(x,y), in(y,London)

Semantics
Wide universal schema
Syntax
Coreference
Statistical NLP

“Sebastian Riedel works in the area of NLP and is now Lecturer at UCL”

4

Semantics as Reasoning [Riedel et al., 2013]

“Who works in London and is interested in NLP?”

interest(x,NLP), worksFor(x,y), in(y,London)

in(UCL,London)
works-in-area-of(Seb,NLP)
lecturer-at(Seb,UCL)

Statistical Relational Learner and Reasoner

worksFor(x,y): faculty-at(x,y)
interest(x,y): works-in-area-of(x,y) [0.9]
faculty-at(x,y): lecturer-at(x,y)

Wide universal schema
Syntax
Coreference
Statistical NLP

“Sebastian Riedel works in the area of NLP and is now Lecturer at UCL”

5
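The facts and rules on this slide are enough to answer the question by chaining. The following is a minimal sketch, assuming each rule reads as "head holds if body holds" and ignoring the weight [0.9]; the names `forward_chain` and `answer_query` are illustrative and do not come from the talk.

```python
# Facts from the slide, as (relation, arg1, arg2) triples.
FACTS = {
    ("in", "UCL", "London"),
    ("works-in-area-of", "Seb", "NLP"),
    ("lecturer-at", "Seb", "UCL"),
}

# Rules from the slide as (head, body) pairs: head(x,y) holds if body(x,y) holds.
# Assumption: the weight [0.9] is ignored, so every rule is treated as hard.
RULES = [
    ("worksFor", "faculty-at"),
    ("interest", "works-in-area-of"),
    ("faculty-at", "lecturer-at"),
]


def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for rel, x, y in list(derived):
                if rel == body and (head, x, y) not in derived:
                    derived.add((head, x, y))
                    changed = True
    return derived


def answer_query(facts):
    """Answer interest(x,NLP), worksFor(x,y), in(y,London) for x."""
    answers = set()
    for rel, x, y in facts:
        if (rel == "worksFor"
                and ("interest", x, "NLP") in facts
                and ("in", y, "London") in facts):
            answers.add(x)
    return answers


closure = forward_chain(FACTS, RULES)
print(answer_query(closure))
```

Deriving worksFor(Seb,UCL) takes two steps (lecturer-at to faculty-at, then faculty-at to worksFor), so the query has no answer on the raw facts and one answer after chaining.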

Benefit: Transitive Reasoning

“Who works in London and is interested in NLP?”

interest(x,NLP), worksFor(x,y), in(y,London)

in(UCL,London)
works-in-area-of(Seb,NLP)
lecturer-at(Seb,UCL)

Statistical Relational Learner and Reasoner

worksFor(x,y): faculty-at(x,y)
interest(x,y): works-in-area-of(x,y) [0.9]
faculty-at(x,y): lecturer-at(x,y)

Wide universal schema
Syntax
Coreference
Statistical NLP

“Sebastian Riedel works in the area of NLP and is now Lecturer at UCL”

6

Benefit: More Coverage

“Who is faculty in London and interested in NLP?”

interest(x,NLP), worksFor(x,y), in(y,London)

in(UCL,London)
works-in-area-of(Seb,NLP)
lecturer-at(Seb,UCL)

Statistical Relational Learner and Reasoner

worksFor(x,y): faculty-at(x,y)
interest(x,y): works-in-area-of(x,y) [0.9]
faculty-at(x,y): lecturer-at(x,y)

Wide universal schema
Syntax
Coreference
Statistical NLP

“Sebastian Riedel works in the area of NLP and is now Lecturer at UCL”

7

Benefit: Code Reuse

“Who lives in London and is interested in NLP?”

interest(x,NLP), worksFor(x,y), in(y,London)

in(UCL,London)
works-in-area-of(Seb,NLP)
lecturer-at(Seb,UCL)

Statistical Relational Learner and Reasoner

worksFor(x,y): faculty-at(x,y)
interest(x,y): works-in-area-of(x,y) [0.9]
livesIn(x,z): worksFor(x,y), locatedIn(y,z) [0.6]

[Lao et al., 2011]

Wide universal schema
Syntax
Coreference
Statistical NLP

“Sebastian Riedel works in the area of NLP and is now Lecturer at UCL”

8

Reasoner and Learner

Statistical Relational Learner and Reasoner

?

9

Probabilistic Logics

Use (weighted) logics to define graphical models

[Figure: graphical model over lecturer-at, prof-at, works-for]

Examples
Markov Logic [Richardson and Domingos, 2006]
Bayesian Logic Programs [Kersting, 2007]

10
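As a reminder of what "weighted logics define graphical models" means for Markov Logic, the standard form of the distribution is sketched below. The symbols are not on the slide and are introduced here for illustration: x is a possible world (a truth assignment to all ground atoms), w_i is the weight of formula i, n_i(x) is the number of true groundings of formula i in x, and Z is the normalizer.

```latex
P(X = x) = \frac{1}{Z} \exp\Big( \sum_i w_i \, n_i(x) \Big),
\qquad
Z = \sum_{x'} \exp\Big( \sum_i w_i \, n_i(x') \Big)
```

A weighted formula such as lecturer-at(x,y) ⇒ works-for(x,y) therefore makes worlds that violate it less probable rather than impossible. The sum over all worlds in Z is one reason inference in such models is expensive.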

Probabilistic Logics

Use (weighted) logics to define graphical models

[Figure: graphical model over lecturer-at, prof-at, works-for]

Problems
Inference
Rule Learning

11

Matrix Factorization

Think of database as a matrix or tensor

[Figure: matrix with relation columns lecturer-at, prof-at, works-for; observed facts are cells set to 1]

12

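Matrix Factorization

Embed entity (pairs) in low dimensional vector spaces

[Figure: matrix with relation columns lecturer-at, prof-at, works-for; observed cells set to 1; each entity (pair) gets an unknown embedding vector, shown as ??]

13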

Matrix Factorization

Embed relations in low dimensional vector spaces

[Figure: matrix with observed cells set to 1; unknown embedding vectors (??) for entity pairs and for the relations lecturer-at, prof-at, works-for]

14

Matrix Factorization

Find a matrix-matrix product that approximates observed DB

[Figure: observed matrix ≈ entity-pair embeddings × relation embeddings (lecturer-at, prof-at, works-for); embedding values shown as ??]

15

Matrix Factorization

Or a non-linear function of this product

[Figure: observed matrix ≈ sigmoid(embedding matrix × embedding matrix)]

16

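Matrix Factorization

Low rank forces some 0 cells to become non-zero => prediction

[Figure: observed matrix ≈ sigmoid(embedding matrix × embedding matrix); a cell that was empty in the observed matrix is reconstructed as .9]

[Nickel, Bordes, …]

17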
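The factorization described on these slides can be sketched in a few lines. This is a minimal illustration, not the training procedure of the cited work: it assumes unobserved cells are treated as negatives under a log loss with L2 regularization, and the toy matrix, hyperparameters, and function names are all made up for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def predict(rows, cols, i, j):
    """Probability that cell (i, j) is true: sigmoid of the embedding product."""
    return sigmoid(dot(rows[i], cols[j]))


def log_loss(rows, cols, observed):
    """Negative log-likelihood over all cells (unobserved cells count as 0)."""
    total = 0.0
    for i in range(len(rows)):
        for j in range(len(cols)):
            p = predict(rows, cols, i, j)
            total -= math.log(p) if (i, j) in observed else math.log(1.0 - p)
    return total


def train(observed, n_rows, n_cols, rank=2, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Learn row (entity-pair) and column (relation) embeddings by SGD."""
    rng = random.Random(seed)
    rows = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    cols = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n_cols)]
    cells = [(i, j) for i in range(n_rows) for j in range(n_cols)]
    for _ in range(epochs):
        rng.shuffle(cells)
        for i, j in cells:
            target = 1.0 if (i, j) in observed else 0.0
            err = predict(rows, cols, i, j) - target  # d(log loss)/d(score)
            for k in range(rank):
                r, c = rows[i][k], cols[j][k]
                rows[i][k] -= lr * (err * c + reg * r)
                cols[j][k] -= lr * (err * r + reg * c)
    return rows, cols


# Toy matrix: rows are entity pairs, columns are
# 0 = lecturer-at, 1 = prof-at, 2 = works-for.
OBSERVED = {(0, 0), (0, 2), (1, 1), (1, 2), (2, 0), (3, 1), (3, 2)}

rows, cols = train(OBSERVED, n_rows=4, n_cols=3)
# Row 2 has lecturer-at but no observed works-for; print the model's score.
print("P(works-for | pair 2) =", round(predict(rows, cols, 2, 2), 3))
```

How strongly the unobserved cell is filled in depends on the rank and the regularization; a lower rank leaves the model less room to memorize the zeros, which is the effect the slide describes.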

Results for Relation Extraction [Riedel et al. 2013, NAACL]

[Figure: averaged 11-point precision/recall curves (precision against recall, both from 0 to 1) for the systems SU12, N, F, NF, NFE]

18
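For readers unfamiliar with the metric in the plot, the following sketch computes an 11-point interpolated precision/recall curve for one ranked list and averages several curves pointwise. It assumes the usual interpolation (precision at recall level r is the best precision at any recall of at least r) and that "averaged" means a pointwise mean across relations; the function names are illustrative.

```python
def eleven_point_curve(ranked_labels):
    """Interpolated precision at recall 0.0, 0.1, ..., 1.0.

    ranked_labels: 0/1 correctness flags, highest-scored prediction first.
    """
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        return [0.0] * 11
    points = []  # (recall, precision) after each rank
    hits = 0
    for rank, label in enumerate(ranked_labels, start=1):
        hits += label
        points.append((hits / total_relevant, hits / rank))
    curve = []
    for level in range(11):
        recall_level = level / 10
        candidates = [prec for rec, prec in points if rec >= recall_level]
        curve.append(max(candidates) if candidates else 0.0)
    return curve


def average_curves(curves):
    """Pointwise mean of several 11-point curves (e.g. one per relation)."""
    return [sum(values) / len(curves) for values in zip(*curves)]


curve_a = eleven_point_curve([1, 0, 1, 0])
curve_b = eleven_point_curve([1, 1])
print(average_curves([curve_a, curve_b]))
```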

Challenge 1: Injecting Symbolic Rules

“lecturers are employees!”

[Figure 1: Injecting Logic into Matrix Factorization. Labels: Facts, First-order Formulae, KB, |P|, sigmoid, ×. Formulae shown (cut off in the slide): ∀x, y: #2-unit-of-#1(x, y) ⇒ organi…, with example “Boeing and the Sikorsky Aircraf…”; ∀x, y: #1-city-in-#2(x, y) ⇒ locati…, with example “With 900,000 people, San Jose#1…”. Caption fragments: “G… entity-pairs P and predicates/relations R, matrix factori… embeddings that approximate the observed matrix. In… entities and relations to learn the embeddings such that th…”]

19

Challenge 1: Injecting Symbolic Rules

“a liquid turns into a solid when its temperature is lowered below its freezing point”

[Figure: the sigmoid matrix factorization diagram, marked with a question mark]

20

Some Experiments

“Zero-shot” learning
Given: a lot of relational data, but none for worksFor
Goal: given a few worksFor rules, learn to predict worksFor

Results (in MAP for several relations)
Only rules: 0.23
Apply rules after factorization: 0.34
Apply rules before factorization: 0.43
Incorporate rules into training objective: 0.52

[Rocktaeschel et al. 2014, SP14]

21
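One way to picture "incorporate rules into training objective" is to turn each rule into a differentiable penalty on the predicted probabilities. The sketch below assumes a product-style relaxation in which the truth of body ⇒ head is 1 − p(body)·(1 − p(head)) and the loss is its negative log; this particular relaxation and all names here are assumptions for illustration, not details confirmed by the slides.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def implication_truth(p_body, p_head):
    """Soft truth value of body => head under a product-style relaxation."""
    return 1.0 - p_body * (1.0 - p_head)


def rule_loss(pair_vecs, body_vec, head_vec):
    """Negative log truth of body(pair) => head(pair), summed over all pairs.

    pair_vecs: embeddings of entity pairs; body_vec and head_vec: embeddings
    of the two relations in the rule. This term would be added to the
    factorization loss over observed facts.
    """
    loss = 0.0
    for pair in pair_vecs:
        p_body = sigmoid(dot(pair, body_vec))
        p_head = sigmoid(dot(pair, head_vec))
        loss -= math.log(implication_truth(p_body, p_head))
    return loss


# Hypothetical 2-d embeddings for the rule lecturer-at(x,y) => worksFor(x,y).
pairs = [[1.0, 0.0]]
lecturer_at = [2.0, 0.0]
print(rule_loss(pairs, lecturer_at, head_vec=[2.0, 0.0]))   # head agrees with body
print(rule_loss(pairs, lecturer_at, head_vec=[-2.0, 0.0]))  # head contradicts body
```

The penalty is small when the head relation scores high wherever the body does and grows when the embeddings violate the rule, so gradient descent on the combined objective pushes the worksFor embedding toward consistency with the rules even without worksFor training facts.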


Challenge 2: Extracting Explanations

“lecturers are employees!”

[Figure 1: Injecting Logic into Matrix Factorization. Labels: Facts, First-order Formulae, KB, |P|, sigmoid, ×. Formulae shown (cut off in the slide): ∀x, y: #2-unit-of-#1(x, y) ⇒ organi…; ∀x, y: #1-city-in-#2(x, y) ⇒ locati…]

23

Challenge 2: Extracting Explanations

“I returned Sebastian because we know he is a lecturer at UCL, which is in London, so he most likely lives in London …”

[Thrun 1995, NIPS, Craven 1996, NIPS]

[Figure 1: Injecting Logic into Matrix Factorization. Labels: Facts, First-order Formulae, KB, |P|, sigmoid, ×. Formulae shown (cut off in the slide): ∀x, y: #2-unit-of-#1(x, y) ⇒ organi…; ∀x, y: #1-city-in-#2(x, y) ⇒ locati…]

24

Summary

Do semantics in a probabilistic relational reasoner
Reasoner: matrix/tensor factorization (or other LV models)
Challenges: inject explanations, extract explanations

Do this for: deeper downstream tasks such as question answering, fact checking, machine comprehension

We are hiring (thanks to the Paul G. Allen Foundation)

25

Thanks

26
