Hung Bui SRI International

Dinh Phung, Svetha Venkatesh, Hai Phan Curtin University of Technology

Talk Outline

Why model permutations?

Distribution of random permutations

Hidden Permutation Model (HPM)

How to estimate HPM parameters?

How to perform approximate inference?

Experiments with location-based activity recognition

Why Model Permutations?

Permutations arise in many real-world problems

Usually, there is an unknown matching that needs to be recovered

Data association, information extraction from text, machine translation, activity recognition

Correspondence in data association

Field-to-value matching in IR

Word/phrase matching in machine translation

A permutation is the simplest form of matching

Brute-force computation is at least O(n!)

Permutations in Activity Recognition

Many activities require carrying out a collection of substeps, each performed just once (or repeated a small number of times)

AAAI travel = (get_approval, book_hotel, book_air_ticket, register, prepare_slides, do_travel)

Ordering of steps is an unknown permutation that needs to be recovered

Factors affecting ordering between steps:

Strongly ordered: A enables B; A and B follow a timetable

Weakly ordered: A performed before B out of habit

Unordered: A performed before B by chance

Learning these ordering constraints from data can lead to better recognition performance

Permutations and Markov Models

Standard HMM does not enforce permutation constraints

$x_n = x_1$? $x_n = x_2$? ...

Permutation constraints lead to awkward graphical models, since conditional independence is lost

Need a more direct way of defining a distribution on permutations

Distributions on Permutations

Let Per(n) = permutations of {1,2,…,n}

Multinomial over Per(n) requires n! parameters (Kirshner et al, ICML 2003)

Exponential Family

$f : \mathrm{Per}(n) \to \mathbb{R}^d$: feature function

$\lambda \in \mathbb{R}^d$: natural parameters

Very general

Few parameters

E.F. distribution on permutations

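$$\Pr(x \mid \lambda) = \exp\{\langle f(x), \lambda \rangle - A(\lambda)\}$$

Log-partition function:

$$A(\lambda) = \ln \sum_{x \in \mathrm{Per}(n)} \exp(\langle f(x), \lambda \rangle)$$

Expensive: a sum over n! terms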
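To make the cost of the log-partition function concrete, here is a minimal brute-force sketch that evaluates A(λ) by enumerating every permutation, which is only feasible for very small n. The function names and the two-dimensional toy feature function are hypothetical, chosen purely for illustration; they are not taken from the talk.

```python
import itertools
import math

def dot(features, lam):
    """Inner product <f(x), lambda>."""
    return sum(f * l for f, l in zip(features, lam))

def log_partition(n, feature_fn, lam):
    """A(lambda) = ln sum over Per(n) of exp(<f(x), lambda>), by enumeration."""
    scores = [dot(feature_fn(x), lam)
              for x in itertools.permutations(range(1, n + 1))]
    top = max(scores)  # log-sum-exp trick for numerical stability
    return top + math.log(sum(math.exp(s - top) for s in scores))

def prob(x, feature_fn, lam):
    """Pr(x | lambda) = exp(<f(x), lambda> - A(lambda))."""
    return math.exp(dot(feature_fn(x), lam)
                    - log_partition(len(x), feature_fn, lam))

def toy_features(x):
    """Hypothetical 2-d feature: step 1 comes first; step n comes last."""
    return [float(x[0] == 1), float(x[-1] == len(x))]
```

With λ = 0 every permutation gets probability 1/n!, and for any λ the probabilities sum to one over Per(n).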

Exponential Family on Permutations (cont.)

What features to use?

Factors affecting ordering between activity steps:

Strongly ordered: A enables B; A and B follow a timetable

Weakly ordered: A performed before B out of habit

Unordered: A performed before B by chance

Does step i appear before step j in x?

$$f_{ij}(x) = \mathbb{I}\{x^{-1}_i < x^{-1}_j\}$$

With no loss of information, keep only fij (x) for i < j

$d = \frac{n(n-1)}{2}$ features (also the number of parameters)
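The pairwise features can be sketched in a few lines. This assumes a permutation is stored as a tuple whose t-th entry is the step performed at position t, so the inverse permutation gives each step's position. Because f_ji(x) = 1 − f_ij(x), only the pairs with i < j are kept. The function name is hypothetical.

```python
def pairwise_features(x):
    """Return {(i, j): f_ij(x)} for all i < j; f_ij(x) = 1 if step i precedes step j."""
    n = len(x)
    position = {step: t for t, step in enumerate(x)}  # inverse permutation
    return {(i, j): int(position[i] < position[j])
            for i in range(1, n + 1)
            for j in range(i + 1, n + 1)}

features = pairwise_features((2, 4, 1, 5, 3))
```

For n = 5 this yields 5·4/2 = 10 features, compared with the 5! = 120 parameters of a full multinomial over Per(5).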

Exponential Family on Permutations (cont.)

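Simplified density forms

$$\Pr(x \mid \lambda) = \exp\Big\{ \sum_{i<j,\; x^{-1}_i < x^{-1}_j} \lambda_{ij} - A(\lambda) \Big\}$$

$$\Pr(x \mid \lambda) = \exp\Big\{ \sum_{l<k,\; x_l < x_k} \lambda_{x_l x_k} - A(\lambda) \Big\}$$

Sum over all in-order pairs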

Example x = (2 4 1 5 3)

λ2,4 + λ2,5 + λ2,3 + λ4,5 + λ1,5 + λ1,3
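The example can be reproduced with a short sketch that scans the permutation left to right and collects every pair of positions l < k with x_l < x_k. The names `in_order_pairs` and `score` are hypothetical; `lam` is assumed to be a dict keyed by (i, j) with i < j.

```python
def in_order_pairs(x):
    """Pairs (x_l, x_k) with l < k and x_l < x_k, scanned left to right."""
    n = len(x)
    return [(x[l], x[k])
            for l in range(n)
            for k in range(l + 1, n)
            if x[l] < x[k]]

def score(x, lam):
    """Exponent of the density without A(lambda): sum of lambda over in-order pairs."""
    return sum(lam[pair] for pair in in_order_pairs(x))

pairs = in_order_pairs((2, 4, 1, 5, 3))
```

For x = (2, 4, 1, 5, 3), `pairs` lists the same six index pairs as the sum above, in the same order.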

Some Properties

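Swapping $x_i$ and $x_{i+1}$

$$x' = (x_1, \ldots, x_{i+1}, x_i, \ldots, x_n)$$

$$\frac{\Pr(x' \mid \lambda)}{\Pr(x \mid \lambda)} = \begin{cases} e^{-\lambda_{x_i, x_{i+1}}} & \text{if } x_i < x_{i+1} \\ e^{\lambda_{x_{i+1}, x_i}} & \text{if } x_i > x_{i+1} \end{cases}$$

Cost of switching adjacent (i, j), i < j, is $e^{\lambda_{ij}}$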

Reverse permutation

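$$x' = (x_n, x_{n-1}, \ldots, x_1)$$

$$\Pr(x \mid \lambda)\,\Pr(x' \mid \lambda) = \exp\Big( \sum_{i<j} \lambda_{ij} - 2A(\lambda) \Big) = \mathrm{const}(\lambda)$$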
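Both properties can be checked numerically on a small example. The sketch below assumes the pairwise-feature density and uses randomly drawn λ values, an arbitrary seed, and hypothetical helper names. It compares the exact probability ratio after an adjacent swap with the predicted factor, and checks that the product of the probabilities of a permutation and its reverse does not depend on the permutation, since every pair is in order in exactly one of the two.

```python
import itertools
import math
import random

def score(x, lam):
    """Sum of lambda over in-order pairs of x."""
    n = len(x)
    return sum(lam[(x[l], x[k])]
               for l in range(n) for k in range(l + 1, n) if x[l] < x[k])

def distribution(n, lam):
    """Exact Pr(x | lambda) for every x in Per(n), by enumeration."""
    perms = list(itertools.permutations(range(1, n + 1)))
    weights = [math.exp(score(x, lam)) for x in perms]
    z = sum(weights)
    return {x: w / z for x, w in zip(perms, weights)}

def swap_adjacent(x, i):
    """Return x with positions i and i+1 exchanged."""
    y = list(x)
    y[i], y[i + 1] = y[i + 1], y[i]
    return tuple(y)

def predicted_swap_ratio(x, i, lam):
    """Predicted Pr(x') / Pr(x) for the adjacent swap at position i."""
    a, b = x[i], x[i + 1]
    return math.exp(-lam[(a, b)]) if a < b else math.exp(lam[(b, a)])

rng = random.Random(0)  # arbitrary seed, for reproducibility only
n = 4
lam = {(i, j): rng.uniform(-1, 1)
       for i in range(1, n + 1) for j in range(i + 1, n + 1)}
p = distribution(n, lam)

max_swap_error = max(
    abs(p[swap_adjacent(x, i)] / p[x] - predicted_swap_ratio(x, i, lam))
    for x in p for i in range(n - 1))
products = [p[x] * p[x[::-1]] for x in p]  # one value per permutation
```

`max_swap_error` should be at floating-point noise level, and all entries of `products` should coincide.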

Hidden Permutation Model

“Graphical Model”

Pr(x|λ)

$\Pr(o_t \mid x_t = i, \eta) = \mathrm{Mult}(\eta_i)$

Joint distribution

$$\Pr(x, o \mid \lambda, \eta) = \Pr(x \mid \lambda) \prod_{t=1}^{n} \Pr(o_t \mid x_t, \eta_{x_t})$$
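The joint distribution can be evaluated directly for small n. The following is a minimal sketch, assuming the pairwise-feature prior, a brute-force normalizer, and emission tables stored as nested dicts of probabilities. The parameter values and the symbols "a" and "b" are made up for illustration.

```python
import itertools
import math

def score(x, lam):
    """Sum of lambda over in-order pairs of x."""
    n = len(x)
    return sum(lam[(x[l], x[k])]
               for l in range(n) for k in range(l + 1, n) if x[l] < x[k])

def prior(x, lam):
    """Pr(x | lambda), with the normalizer computed by enumeration."""
    z = sum(math.exp(score(y, lam))
            for y in itertools.permutations(sorted(x)))
    return math.exp(score(x, lam)) / z

def joint(x, o, lam, eta):
    """Pr(x, o | lambda, eta) = Pr(x | lambda) * prod_t eta[x_t][o_t]."""
    emission = math.prod(eta[step][symbol] for step, symbol in zip(x, o))
    return prior(x, lam) * emission

# Hypothetical toy parameters: 3 steps, 2 observation symbols.
lam = {(1, 2): 1.0, (1, 3): 0.5, (2, 3): -0.2}
eta = {1: {"a": 0.9, "b": 0.1},
       2: {"a": 0.2, "b": 0.8},
       3: {"a": 0.5, "b": 0.5}}
```

Summing `joint` over all permutations and all observation sequences of length 3 gives one, which is a quick sanity check on the factorization.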

Max. Likelihood Estimation, Permutation Known

Log-likelihood function: $L(\lambda, \eta) = \ln P(x \mid \lambda) + \ln P(o \mid x, \eta)$

Optimize η

trivial (count frequency)

Optimize λ

Convex problem

Derivative:

$$\nabla_{\lambda_{ij}}(L) = \underbrace{f_{ij}(x)}_{i \text{ appears before } j\,?} - \underbrace{\sum_{x} f_{ij}(x) P(x \mid \lambda)}_{\Pr(i \text{ appears before } j)}$$
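The derivative for λ is the observed feature minus its expectation under the model. A minimal sketch follows, with the expectation computed by brute-force enumeration (only viable for small n) and hypothetical function names. In practice the expectation would be approximated rather than enumerated.

```python
import itertools
import math

def precedes(x, i, j):
    """f_ij(x): 1 if step i appears before step j in x, else 0."""
    return int(x.index(i) < x.index(j))

def score(x, lam):
    """<f(x), lambda> for the pairwise features."""
    return sum(value * precedes(x, i, j) for (i, j), value in lam.items())

def log_prob(x, lam):
    """ln Pr(x | lambda), with A(lambda) computed by enumeration."""
    perms = itertools.permutations(sorted(x))
    log_z = math.log(sum(math.exp(score(y, lam)) for y in perms))
    return score(x, lam) - log_z

def grad_lambda(x_obs, lam):
    """Observed feature minus expected feature, for each (i, j)."""
    perms = list(itertools.permutations(sorted(x_obs)))
    weights = [math.exp(score(y, lam)) for y in perms]
    z = sum(weights)
    grad = {}
    for (i, j) in lam:
        expected = sum(w * precedes(y, i, j)
                       for y, w in zip(perms, weights)) / z
        grad[(i, j)] = precedes(x_obs, i, j) - expected
    return grad
```

At λ = 0 every pair is in order with probability 1/2, so each gradient entry is +1/2 or −1/2 depending on whether the observed permutation has i before j.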

Max. Likelihood Estimation, Permutation Unknown

Log-likelihood function

$$l(\lambda, \eta) = \sum_{k=1}^{K} \log \sum_{x} P(o^k, x \mid \lambda, \eta)$$

Need to jointly optimize both λ and η; non-convex problem

Can we use EM? The M-step for λ does not have a closed form

Can try coordinate ascent:

Fix η and improve λ by one gradient step

Fix λ and improve η by EM (now has closed form)

Didn’t work as well as simple gradient ascent

Max. Likelihood Estimation, Permutation Unknown

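Derivative for λ

$$\nabla_{\lambda_{ij}}(l) = \underbrace{\sum_{x} f_{ij}(x) P(x \mid o, \lambda, \eta)}_{\Pr(i \text{ appears before } j \text{ given } o)} - \underbrace{\sum_{x} f_{ij}(x) P(x \mid \lambda)}_{\Pr(i \text{ appears before } j)}$$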

Derivative for η

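Avoid dealing with constraints by transforming to the natural parameters of the multinomial

$$\nabla_{\eta_{iv}}(l) = \underbrace{\sum_{x} \mathbb{I}\{x^{-1}_i \in o[v]\} P(x \mid o, \lambda, \eta)}_{\Pr(i \text{ appears at one of } v\text{'s position(s) given } o)} - \Pr(v \mid \eta_i)$$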
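Both derivatives can be sketched for a single observation sequence by enumerating the posterior over permutations. This assumes the pairwise-feature prior and a softmax (natural-parameter) form for each emission multinomial. All function names are hypothetical, and the enumeration stands in for whatever approximate inference is used in practice.

```python
import itertools
import math

def precedes(x, i, j):
    """f_ij(x): 1 if step i appears before step j in x, else 0."""
    return int(x.index(i) < x.index(j))

def score(x, lam):
    return sum(v * precedes(x, i, j) for (i, j), v in lam.items())

def emission_prob(eta, i, v):
    """Pr(v | eta_i), with eta_i as unconstrained natural parameters (softmax)."""
    z = sum(math.exp(w) for w in eta[i].values())
    return math.exp(eta[i][v]) / z

def weights(o, lam, eta, with_obs):
    """Unnormalized weight of each permutation: prior only, or prior times emissions."""
    out = {}
    for x in itertools.permutations(range(1, len(o) + 1)):
        w = math.exp(score(x, lam))
        if with_obs:
            w *= math.prod(emission_prob(eta, step, sym)
                           for step, sym in zip(x, o))
        out[x] = w
    return out

def log_likelihood(o, lam, eta):
    """ln sum_x P(o, x | lambda, eta) for one observation sequence o."""
    joint_sum = sum(weights(o, lam, eta, True).values())
    prior_sum = sum(weights(o, lam, eta, False).values())
    return math.log(joint_sum) - math.log(prior_sum)

def normalize(w):
    z = sum(w.values())
    return {x: v / z for x, v in w.items()}

def gradients(o, lam, eta):
    """Gradients of the single-sequence log-likelihood w.r.t. lambda and eta."""
    post = normalize(weights(o, lam, eta, True))    # P(x | o, lambda, eta)
    prior = normalize(weights(o, lam, eta, False))  # P(x | lambda)
    g_lam = {(i, j): sum(p * precedes(x, i, j) for x, p in post.items())
                     - sum(p * precedes(x, i, j) for x, p in prior.items())
             for (i, j) in lam}
    # I{x^{-1}_i in o[v]}: the symbol observed at step i's position equals v.
    g_eta = {(i, v): sum(p for x, p in post.items() if o[x.index(i)] == v)
                     - emission_prob(eta, i, v)
             for i in eta for v in eta[i]}
    return g_lam, g_eta
```

For K sequences the gradients add up over sequences. A finite-difference comparison against `log_likelihood` is an easy way to validate an implementation of this kind.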

Approximate Inference via MCMC

Typical “inference” problem requires calculating an expectation.

Expectations can be approximated if we can generate samples x ∼ Pr(x|λ)

How to draw random permutations?

Try a well-known MCMC idea

Start with a random initial permutation

Randomly switch two positions

Accept new permutation with probability

$$\min\left( \frac{P(x' \mid \lambda)}{P(x \mid \lambda)},\; 1 \right)$$
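The three steps above amount to a Metropolis sampler with a symmetric swap proposal. Here is a minimal sketch under the pairwise-feature density. The burn-in length, the seed, the sparse `lam` dict (missing pairs count as 0), and the function names are illustrative assumptions, not settings from the talk.

```python
import math
import random

def score(x, lam):
    """Sum of lambda over in-order pairs; pairs missing from lam count as 0."""
    n = len(x)
    return sum(lam.get((x[l], x[k]), 0.0)
               for l in range(n) for k in range(l + 1, n) if x[l] < x[k])

def sample_permutations(n, lam, num_samples, burn_in=1000, seed=0):
    """Draw approximate samples from Pr(x | lambda) by random position swaps."""
    rng = random.Random(seed)
    x = list(range(1, n + 1))
    rng.shuffle(x)                      # random initial permutation
    current = score(x, lam)
    samples = []
    for step in range(burn_in + num_samples):
        a, b = rng.sample(range(n), 2)  # two distinct positions
        x[a], x[b] = x[b], x[a]         # propose the swap
        proposed = score(x, lam)
        # Accept with probability min(P(x')/P(x), 1); A(lambda) cancels in the ratio.
        if proposed >= current or rng.random() < math.exp(proposed - current):
            current = proposed
        else:
            x[a], x[b] = x[b], x[a]     # reject: undo the swap
        if step >= burn_in:
            samples.append(tuple(x))
    return samples
```

Because only the ratio of probabilities is needed, the sampler never computes the log-partition function. An expectation such as Pr(i appears before j) is then estimated as the fraction of samples in which i precedes j.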

Location-Based Activity Recognition on Campus

Detection Problem

Student Activity Routines (Permutation with Partial-Order Constraints)

Atomic activity    Physical locations
Banking            Bank
Lecture 1          Watson theater
Lecture 2          Hayman theater
Lecture 3          Davis theater
Lecture 4          Jones theater
Group meeting 1    Bookmark cafe, Library, CBS
Group meeting 2    Library, CBS, Psychology Bld
Group meeting 3    Angazi cafe, Psychology Bld
Coffee             TAV, Angazi cafe, Bookmark cafe
Breakfast          TAV, Angazi cafe, Bookmark cafe
Lunch              TAV, Bookmark cafe

GPS “Places”


“Places” from GPS

Preprocessing

Removal of points above a speed threshold

Often missing precisely the samples we want! (e.g. buildings)

Interpolation within a day and across days

Clustered into groups to find significant places using DBSCAN

Detection Performance

In a long sequence of GPS “places”, detect occurrences of an activity routine

Simulated Data, Supervised (Atomic Activities Given)

             Activity 1               Activity 2
             HMM     KIR     HPM      HMM     KIR     HPM
TP           18.2    18.5    19.1     17.9    18.0    18.8
FP           19.5    2.0     4.1      4.4     0.7     0.4
Precision    48.3%   90.2%   82.3%    80.3%   96.3%   97.9%
Recall       91.0%   92.5%   95.5%    89.5%   90.5%   94.0%

Simulated Data, Unsupervised

             Activity 1                       Activity 2
             NBC     HMM     KIR     HPM      NBC     HMM     KIR     HPM
TP           16.6    18.3    18.3    19.1     17.1    17.7    18.1    18.5
FP           11.1    19.8    8.5     5.1      11.0    3.8     4.7     0.5
Precision    59.9%   48.0%   68.3%   78.9%    60.9%   82.3%   79.4%   97.4%
Recall       80.3%   91.5%   91.5%   95.5%    85.5%   88.5%   90.5%   92.5%

Real Data, Unsupervised

             NBC     HMM     HPM
TP           6       8.5     9.8
FP           4       5.3     1.9
Precision    60%     61.6%   83.8%
Recall       60%     85%     98%
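For reading the tables, the precision column follows from the TP and FP counts as TP / (TP + FP). The slides do not state the number of true occurrences per experiment, so the recall helper below takes it as an explicit argument; both function names are hypothetical.

```python
def precision(tp, fp):
    """Fraction of detections that are correct: TP / (TP + FP)."""
    return tp / (tp + fp)

def recall(tp, num_true):
    """Fraction of true occurrences that are detected: TP / num_true."""
    return tp / num_true
```

For example, TP = 18.2 and FP = 19.5 give a precision of about 48.3%, and TP = 9.8 with FP = 1.9 gives about 83.8%, matching the reported values.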

Conclusion

Modelling permutations is hard, but not impossible

A general way to parameterize distributions over permutations using the exponential family

If the permutation is not observed, use the Hidden Permutation Model (HPM)

Demonstrated better performance than other models that do not exploit permutation constraints, as well as the naïve multinomial permutation model (Kirshner et al.)

Future work

Generalize to permutations with repetitions

In supervised mode, a discriminative formulation similar to CRF might work better