Word Context Entropy
Kenneth Heafield, Google Inc

January 16, 2008

Example code from Hadoop 0.13.1 used under the Apache License Version 2.0 and modified for presentation. Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.


Outline

1. Problem: Context, Entropy
2. Implementation: Streaming Entropy, Reducer Sorting


Problem: Word Weighting

Idea: Measure how specific a word is.

Applications: query refinement, automatic tagging.

Example (lower score = more specific):

  Specific           Generic
  airplane   5.0     a      9.8
  whistle    4.2     is     9.6
  purple     5.3     any    8.7
  pangolin   1.6     from   9.6


Problem: Neighbors

Idea: Non-specific words appear in random contexts.

Example (quotes copied from the fortune "computers" file):
  A bug in the code is worth two in the documentation.
  A complex system that works is invariably found to have evolved from a simple system that works.
  A computer scientist is someone who fixes things that aren't broken.
  I'm still waiting for the advent of the computer science groupie.
  If I'd known computer science was going to be like this, I'd never have given up being a rock 'n' roll star.

Neighbors of "a": bug, complex, from, simple, computer, being, rock
Neighbors of "computer": A, scientist, the, science, known, science


Problem: Context Distribution

[Figure: two bar charts of context distributions over Boston, City, England, MA, University. "Cambridge" is ambiguous, so its distribution is closer to equal. "Attleboro" is just a city, so its distribution is spiked.]

Problem: Entropy

Definition: Entropy measures how uncertain a random variable N is:

  Entropy(N) = -\sum_n p(N = n) \log_2 p(N = n)

Properties:
- Minimized at 0 when only one outcome is possible.
- Maximized at \log_2 k when k outcomes are equally probable.

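As a sanity check on the definition, here is a small plain-Java helper (illustrative, not the talk's Hadoop code) that evaluates the formula directly; it reproduces the worked tables on the next slide.

// Entropy of a discrete distribution: H(p) = -sum_i p_i log2 p_i.
public final class Entropy {
  public static double entropy(double[] p) {
    double h = 0.0;
    for (double pi : p) {
      if (pi > 0.0) {  // outcomes with zero probability contribute nothing
        h -= pi * Math.log(pi) / Math.log(2.0);
      }
    }
    return h;
  }

  public static void main(String[] args) {
    // Distributions from the next slide:
    System.out.println(entropy(new double[] {0.1, 0.2, 0.3, 0.2, 0.2}));  // ~2.246
    System.out.println(entropy(new double[] {0.2, 0.3, 0.1, 0.3, 0.1}));  // ~2.171
  }
}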

Problem: Context Distribution Entropy

[Figure: the same two context distributions, Cambridge and Attleboro.]

Cambridge:
  n           Boston   City    England   MA      University   Entropy
  p           0.1      0.2     0.3       0.2     0.2
  -p log2 p   0.332    0.464   0.521     0.464   0.464        2.246

Attleboro:
  n           Boston   City    England   MA      University   Entropy
  p           0.2      0.3     0.1       0.3     0.1
  -p log2 p   0.464    0.521   0.332     0.521   0.332        2.171


Problem: Summary

Goal: Measure how specific a word is.

Approach:
1. Count the surrounding words.
2. Normalize to make a probability distribution.
3. Evaluate entropy.


Implementation: All At Once

Mapper outputs key word and value neighbor.
Reducer:
1. Counts each neighbor using a hash table.
2. Normalizes counts.
3. Computes entropy and outputs key word, value entropy.

Example:
  Reduce values   City, Boston, City, MA, England, City, England
  Hash table      City -> 3, Boston -> 1, MA -> 1, England -> 2
  Normalize       City -> 3/7, Boston -> 1/7, MA -> 1/7, England -> 2/7
  -p log2 p       City -> .523, Boston -> .401, MA -> .401, England -> .516
  Entropy         .523 + .401 + .401 + .516 ~ 1.84

Issues:
- Too many neighbors of "the" to fit in memory.

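A minimal sketch of this Reducer against the classic org.apache.hadoop.mapred API (a later descendant of the 0.13.1 interfaces the talk modified; the class name and generic signatures are assumptions, not the talk's code). The Mapper is trivial: for each word it collects every adjacent word as a value.

import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// All-at-once Reducer: buffers every neighbor count for the word in memory,
// then normalizes and computes entropy. This is the version that dies on "the".
public class AllAtOnceReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, DoubleWritable> {
  public void reduce(Text word, Iterator<Text> neighbors,
                     OutputCollector<Text, DoubleWritable> out, Reporter reporter)
      throws IOException {
    Map<String, Long> counts = new HashMap<String, Long>();  // the hash table
    long total = 0;
    while (neighbors.hasNext()) {
      String n = neighbors.next().toString();
      Long c = counts.get(n);
      counts.put(n, c == null ? 1L : c + 1L);
      ++total;
    }
    double entropy = 0.0;
    for (long c : counts.values()) {
      double p = (double) c / total;               // normalize
      entropy -= p * Math.log(p) / Math.log(2.0);  // add -p log2 p
    }
    out.collect(word, new DoubleWritable(entropy));
  }
}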

Implementation: Two Phases

1. Count
   Mapper outputs key (word, neighbor) and empty value.
   Reducer counts values, then outputs key word and value count.

2. Entropy
   Mapper is the identity. All counts for a word go to one Reducer.
   Reducer buffers counts, normalizes, and computes entropy.

Issues:
+ The Entropy Reducer needs only counts in memory.
- There can still be a lot of counts.

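A sketch of the Count phase under the same API assumptions; serializing the composite key as "word<TAB>neighbor" is an illustrative choice, not necessarily the talk's.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Count Reducer: each key is one (word, neighbor) pair, so the number of
// (empty) values is the neighbor count c(n). Outputs key word, value c(n).
public class CountReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, LongWritable> {
  public void reduce(Text wordNeighbor, Iterator<Text> values,
                     OutputCollector<Text, LongWritable> out, Reporter reporter)
      throws IOException {
    long count = 0;
    while (values.hasNext()) {
      values.next();  // values are empty; only how many there are matters
      ++count;
    }
    String word = wordNeighbor.toString().split("\t", 2)[0];
    out.collect(new Text(word), new LongWritable(count));
  }
}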

Implementation: Streaming Entropy

Observation: Normalization and entropy can be computed simultaneously. With neighbor counts c(n) and total t = \sum_n c(n):

  Entropy(N) = -\sum_n p(N = n) \log_2 p(N = n)             (1)
             = -\sum_n (c(n)/t) (\log_2 c(n) - \log_2 t)    (2)
             = \log_2 t - \sum_n (c(n)/t) \log_2 c(n)       (3)
             = \log_2 t - (1/t) \sum_n c(n) \log_2 c(n)     (4)

Moral: Provided counts c(n), the Reducer need only remember the total t and a running sum of c(n) \log_2 c(n).

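The moral reduces to a two-field accumulator. A plain-Java sketch (illustrative, not the talk's code):

// Streaming entropy: feed in counts c(n) one at a time; only the total t and
// the running sum of c(n) log2 c(n) are kept.
public final class StreamingEntropy {
  private long total = 0;         // t = sum of all counts
  private double sumCLogC = 0.0;  // sum_n c(n) log2 c(n)

  private static double log2(double x) {
    return Math.log(x) / Math.log(2.0);
  }

  public void addCount(long c) {
    total += c;
    sumCLogC += c * log2(c);  // a count of 1 contributes 0
  }

  // H = log2 t - (1/t) sum_n c(n) log2 c(n); call after at least one count.
  public double entropy() {
    return log2(total) - sumCLogC / total;
  }
}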

Implementation: Two Phases with Streaming

1. Count
   Mapper outputs key (word, neighbor) and empty value.
   Reducer counts values, then outputs key word and value count.

2. Entropy
   Mapper is the identity. All counts for a word go to one Reducer.
   Reducer computes streaming entropy.

Issues:
+ Constant-memory Reducer.
- Not enough disk to store the counts thrice on HDFS (HDFS replicates data three times by default).

Kenneth Heafield (Google Inc)

Word Context Entropy

January 16, 2008

12 / 15
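The Entropy phase then becomes a one-pass Reducer over the counts, reusing the accumulator above (same API assumptions; the LongWritable count type is mine, not confirmed by the talk):

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Entropy Reducer: all counts for a word arrive in a single reduce() call;
// memory use is constant no matter how many neighbors the word has.
public class EntropyReducer extends MapReduceBase
    implements Reducer<Text, LongWritable, Text, DoubleWritable> {
  public void reduce(Text word, Iterator<LongWritable> counts,
                     OutputCollector<Text, DoubleWritable> out, Reporter reporter)
      throws IOException {
    StreamingEntropy h = new StreamingEntropy();
    while (counts.hasNext()) {
      h.addCount(counts.next().get());
    }
    out.collect(word, new DoubleWritable(h.entropy()));
  }
}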

Implementation: Reducer Sorting

Keys arrive at each Reducer in sorted order:

  Unsorted               Sorted
  Word    Neighbor       Word    Neighbor
  A       Plane          A       Bird
  Foo     Bar            A       Plane
  A       Bird           A       Plane
  A       Plane          A       Plane
  A       The            A       The
  Alice   Bob            A       The
  The     Problem        Alice   Bob
  A       Plane          Foo     Bar
  A       The            The     Problem

With key (word, neighbor), each Reduce call counts one pair, so a word's counts emerge consecutively:

  Word    Neighbors             Reduce output
  A       Bird                  (A, 1)
  A       Plane, Plane, Plane   (A, 3)
  A       The, The              (A, 2)
  Alice   Bob                   (Alice, 1)
  Foo     Bar                   (Foo, 1)
  The     Problem               (The, 1)

Because of this ordering, a single Reducer can carry state across Reduce calls: it appends each count to the current word's list and emits the list when a new word appears, with close() emitting the last word:

  Call                   State          Output
  Reduce(A; Bird)        (A; 1)
  Reduce(A; Plane x3)    (A; 1, 3)
  Reduce(A; The x2)      (A; 1, 3, 2)
  Reduce(Alice; Bob)     (Alice; 1)     (A; 1, 3, 2)
  Reduce(Foo; Bar)       (Foo; 1)       (Alice; 1)
  Reduce(The; Problem)   (The; 1)       (Foo; 1)
  close()                               (The; 1)

Implementation: Recall Streaming Entropy

Observation: Normalization and entropy can be computed simultaneously.

  Entropy(N) = -\sum_n p(N = n) \log_2 p(N = n)             (5)
             = -\sum_n (c(n)/t) (\log_2 c(n) - \log_2 t)    (6)
             = \log_2 t - \sum_n (c(n)/t) \log_2 c(n)       (7)
             = \log_2 t - (1/t) \sum_n c(n) \log_2 c(n)     (8)

Moral: The total t and the sum can be computed in parallel, accumulating both as the counts stream past.


Implementation: Reducer Sorting with Streaming

Replacing the buffered list with the streaming pair (t, \sum_n c(n) \log_2 c(n)) makes the carried state constant-size:

  Call                   State           Output
  Reduce(A; Bird)        (A, 1, 0)
  Reduce(A; Plane x3)    (A, 4, 4.7)
  Reduce(A; The x2)      (A, 6, 6.7)
  Reduce(Alice; Bob)     (Alice, 1, 0)   (A, (6, 6.7))
  Reduce(Foo; Bar)       (Foo, 1, 0)     (Alice, (1, 0))
  Reduce(The; Problem)   (The, 1, 0)     (Foo, (1, 0))
  close()                                (The, (1, 0))

(For A: t = 6 and \sum_n c(n) \log_2 c(n) = 1 \log_2 1 + 3 \log_2 3 + 2 \log_2 2 ~ 6.7, so Entropy(A) = \log_2 6 - 6.7/6 ~ 1.46.)

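Putting the two ideas together gives a single-phase, constant-memory Reducer. A sketch under the same assumptions as before (composite "word<TAB>neighbor" keys sorted by the framework, a partitioner that sends all of a word's keys to one reduce task, classic mapred interfaces; the code is mine, not the talk's). The flush computes entropy from the (t, sum) pair shown in the trace above.

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Keys "word\tneighbor" arrive sorted by word, so the (t, sum) state for the
// current word can be carried across reduce() calls and flushed on a change.
public class SortedEntropyReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, DoubleWritable> {
  private String currentWord = null;
  private long t = 0;             // total count for currentWord
  private double sumCLogC = 0.0;  // sum of c(n) log2 c(n) for currentWord
  private OutputCollector<Text, DoubleWritable> out;

  private static double log2(double x) {
    return Math.log(x) / Math.log(2.0);
  }

  private void flush() throws IOException {
    if (currentWord != null) {
      out.collect(new Text(currentWord),
                  new DoubleWritable(log2(t) - sumCLogC / t));
    }
    t = 0;
    sumCLogC = 0.0;
  }

  public void reduce(Text wordNeighbor, Iterator<Text> values,
                     OutputCollector<Text, DoubleWritable> output,
                     Reporter reporter) throws IOException {
    out = output;  // saved so close() can emit the final word
    long c = 0;
    while (values.hasNext()) {
      values.next();  // count this (word, neighbor) pair's occurrences
      ++c;
    }
    String word = wordNeighbor.toString().split("\t", 2)[0];
    if (!word.equals(currentWord)) {
      flush();  // the previous word is complete; emit its entropy
      currentWord = word;
    }
    t += c;
    sumCLogC += c * log2(c);
  }

  public void close() throws IOException {
    flush();  // the trace's close() step, emitting the last word
  }
}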
