Hidden Permutation Model and Location-Based Activity Recognition
Hung Bui SRI International
Dinh Phung, Svetha Venkatesh, Hai Phan Curtin University of Technology
Talk Outline
Why model permutations?
Distribution of random permutations
Hidden Permutation Model (HPM)
How to estimate HPM parameters?
How to perform approximate inference?
Experiments with location-based activity recognition
Why Model Permutations?
Permutations arise in many real-world problems
Usually, there is an unknown matching that needs to be recovered
Data association, information extraction from text, machine translation, activity recognition
Correspondence in data association
Field-to-value matching in IR
Word/phrase matching in machine translation
A permutation is the simplest form of matching
Brute-force computation over all n! permutations costs Ω(n!)
Permutations in Activity Recognition
Many activities require carrying out a collection of substeps, each performed just once (or repeated a small number of times)
AAAI travel = (get_approval, book_hotel, book_air_ticket, register, prepare_slides, do_travel)
Ordering of steps is an unknown permutation that needs to be recovered
Factors affecting ordering between steps:
Strongly ordered: A enables B; A and B follow a timetable
Weakly ordered: A performed before B out of habit
Unordered: A performed before B by chance
Learning these ordering constraints from data can lead to better recognition performance
Permutations and Markov Models
Standard HMM does not enforce permutation constraints
Can $x_n = x_1$? Can $x_n = x_2$? . . . (an HMM cannot rule these out)
Permutation constraints lead to awkward graphical models, since conditional independence is lost
Need a more direct way of defining a distribution on permutations
Distributions on Permutations
Let Per(n) = permutations of {1,2,…,n}
A multinomial over Per(n) requires n! parameters (Kirshner et al., ICML 2003)
Exponential Family
$f : \mathrm{Per}(n) \to \mathbb{R}^d$: feature function
$\lambda \in \mathbb{R}^d$: natural parameters
Very general; few parameters
E.F. distribution on permutations
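$\Pr(x \mid \lambda) = \exp\{\langle f(x), \lambda \rangle - A(\lambda)\}$
Log-partition function $A(\lambda) = \ln \sum_{x \in \mathrm{Per}(n)} \exp \langle f(x), \lambda \rangle$
Expensive: the sum runs over all n! permutations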
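The cost of $A(\lambda)$ shows up directly in a brute-force evaluation. A minimal Python sketch, assuming a feature function `f` that returns a list of d floats and a parameter list `lam` (both names are illustrative, not from the slides); it enumerates all n! permutations, so it is usable only for very small n.

```python
import itertools
import math


def log_partition(f, lam, n):
    """Brute-force A(lam) = ln sum_{x in Per(n)} exp(<f(x), lam>).

    Enumerates all n! permutations of (1, ..., n); the factorial growth
    is exactly why the log-partition function is expensive.
    """
    scores = [
        sum(fk * lk for fk, lk in zip(f(x), lam))
        for x in itertools.permutations(range(1, n + 1))
    ]
    m = max(scores)  # subtract the max before exponentiating (log-sum-exp)
    return m + math.log(sum(math.exp(s - m) for s in scores))


def prob(x, f, lam):
    """Pr(x | lam) = exp(<f(x), lam> - A(lam)) for a permutation x of 1..n."""
    score = sum(fk * lk for fk, lk in zip(f(x), lam))
    return math.exp(score - log_partition(f, lam, len(x)))


def starts_with_one(x):
    """Hypothetical toy feature: does the permutation start with item 1?"""
    return [1.0 if x[0] == 1 else 0.0]


print(prob((1, 2, 3), starts_with_one, [2.0]))
```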
Exponential Family on Permutations (cont.)
What features to use?
Factors affecting ordering between activity steps:
Strongly ordered: A enables B; A and B follow a timetable
Weakly ordered: A performed before B out of habit
Unordered: A performed before B by chance
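Does step i appear before step j in x? $f_{ij}(x) = \mathbb{I}\{x^{-1}_i < x^{-1}_j\}$, where $x^{-1}_i$ is the position of step i in x
With no loss of information, keep only $f_{ij}(x)$ for i < j, since $f_{ji}(x) = 1 - f_{ij}(x)$
$d = n(n-1)/2$ features (also the number of parameters)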
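A sketch of the pairwise order features, assuming steps are labelled 1..n and a permutation is a tuple `x` listing the step performed at each position; the helper name `order_features` is hypothetical. The dictionary `pos` plays the role of $x^{-1}$, and only pairs with i < j are kept.

```python
from itertools import combinations


def order_features(x):
    """Return {(i, j): f_ij(x)} for every pair of items i < j in x.

    f_ij(x) = 1 if item i appears before item j in x, else 0.
    pos[i] is the position of item i, i.e. x^{-1}_i.
    """
    pos = {item: t for t, item in enumerate(x)}
    return {
        (i, j): 1 if pos[i] < pos[j] else 0
        for i, j in combinations(sorted(x), 2)
    }


feats = order_features((2, 4, 1, 5, 3))
print(len(feats))            # 10 = n(n-1)/2 for n = 5
print(sum(feats.values()))   # 6 pairs appear in increasing order
```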
Exponential Family on Permutations (cont.)
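Simplified density forms
$\Pr(x \mid \lambda) = \exp\Big\{\sum_{i<j,\; x^{-1}_i < x^{-1}_j} \lambda_{ij} - A(\lambda)\Big\}$
$\Pr(x \mid \lambda) = \exp\Big\{\sum_{l<k,\; x_l < x_k} \lambda_{x_l x_k} - A(\lambda)\Big\}$
Sum over all in-order pairs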
Example x = (2 4 1 5 3)
$\lambda_{2,4} + \lambda_{2,5} + \lambda_{2,3} + \lambda_{4,5} + \lambda_{1,5} + \lambda_{1,3}$
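The in-order pairs and the resulting density can be checked on the example above. A minimal sketch, assuming `lam` is a dict keyed by pairs `(i, j)` with i < j (a hypothetical representation); `log_prob` computes $A(\lambda)$ by enumeration, so it only scales to small n.

```python
import itertools
import math


def in_order_pairs(x):
    """Pairs (x_l, x_k) with l < k and x_l < x_k, in scan order."""
    n = len(x)
    return [(x[l], x[k]) for l in range(n) for k in range(l + 1, n) if x[l] < x[k]]


def log_score(x, lam):
    """Unnormalized log-density: sum of lam[(i, j)] over in-order pairs."""
    return sum(lam[p] for p in in_order_pairs(x))


def log_prob(x, lam):
    """ln Pr(x | lam), with A(lam) computed by brute-force enumeration."""
    scores = [log_score(y, lam) for y in itertools.permutations(sorted(x))]
    m = max(scores)
    log_a = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_score(x, lam) - log_a


print(in_order_pairs((2, 4, 1, 5, 3)))
# [(2, 4), (2, 5), (2, 3), (4, 5), (1, 5), (1, 3)]
```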
Some Properties
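Swapping $x_i$ and $x_{i+1}$
$x' = (x_1, \ldots, x_{i+1}, x_i, \ldots, x_n)$
$\dfrac{\Pr(x' \mid \lambda)}{\Pr(x \mid \lambda)} = \begin{cases} e^{-\lambda_{x_i, x_{i+1}}} & \text{if } x_i < x_{i+1} \\ e^{\lambda_{x_{i+1}, x_i}} & \text{if } x_i > x_{i+1} \end{cases}$
Cost of switching adjacent (i, j), i < j, is $e^{\lambda_{ij}}$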
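The swap identity can be verified numerically. This sketch, assuming λ is a dict keyed by pairs (i, j) with i < j (a hypothetical representation), compares the closed-form ratio against a direct difference of unnormalized log-scores; $A(\lambda)$ cancels, so it is never computed. The random parameter values are arbitrary.

```python
import random


def log_score(x, lam):
    """Unnormalized log-density: sum of lam[(x_l, x_k)] over l < k with x_l < x_k."""
    n = len(x)
    return sum(lam[(x[l], x[k])] for l in range(n) for k in range(l + 1, n) if x[l] < x[k])


def swap_log_ratio(x, i, lam):
    """ln Pr(x'|lam) - ln Pr(x|lam), where x' swaps positions i and i+1 of x.

    Only the relative order of x[i] and x[i+1] changes, so a single
    parameter enters and A(lam) cancels out of the ratio.
    """
    a, b = x[i], x[i + 1]
    return -lam[(a, b)] if a < b else lam[(b, a)]


def max_swap_error(x, lam):
    """Largest gap between swap_log_ratio and a direct score difference."""
    worst = 0.0
    for i in range(len(x) - 1):
        y = list(x)
        y[i], y[i + 1] = y[i + 1], y[i]
        direct = log_score(y, lam) - log_score(x, lam)
        worst = max(worst, abs(direct - swap_log_ratio(x, i, lam)))
    return worst


rng = random.Random(0)  # arbitrary seed and parameter range, for illustration
lam = {(i, j): rng.uniform(-2, 2) for i in range(1, 7) for j in range(i + 1, 7)}
print(max_swap_error([3, 1, 6, 2, 5, 4], lam))  # < 1e-12: only float rounding remains
```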
Reverse permutation
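$x' = (x_n, x_{n-1}, \ldots, x_1)$
$\Pr(x \mid \lambda)\,\Pr(x' \mid \lambda) = \exp\Big(\sum_{i<j} \lambda_{ij} - 2A(\lambda)\Big) = \mathrm{const}(\lambda)$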
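A quick numerical check of the reversal property, assuming the same hypothetical dict-of-pairs representation for λ: the unnormalized log-scores of x and of its reverse should always sum to $\sum_{i<j} \lambda_{ij}$, which is why the product of the two probabilities does not depend on x.

```python
import itertools
import random


def log_score(x, lam):
    """Unnormalized log-density: sum of lam[(x_l, x_k)] over l < k with x_l < x_k."""
    n = len(x)
    return sum(lam[(x[l], x[k])] for l in range(n) for k in range(l + 1, n) if x[l] < x[k])


def reverse_deviation(n, lam):
    """Max over x in Per(n) of |score(x) + score(reversed x) - sum_{i<j} lam_ij|.

    Reversal turns every in-order pair into an out-of-order pair and vice
    versa, so the two unnormalized log-scores always add up to sum(lam).
    """
    total = sum(lam.values())
    return max(
        abs(log_score(x, lam) + log_score(x[::-1], lam) - total)
        for x in itertools.permutations(range(1, n + 1))
    )


rng = random.Random(1)  # arbitrary illustrative parameters
lam = {(i, j): rng.gauss(0, 1) for i, j in itertools.combinations(range(1, 6), 2)}
print(reverse_deviation(5, lam))
```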
Hidden Permutation Model
“Graphical Model”
Permutation prior: $\Pr(x \mid \lambda)$
Emission: $\Pr(o_t \mid x_t = i, \eta) = \mathrm{Mult}(\eta_i)$
Joint distribution
$\Pr(x, o \mid \lambda, \eta) = \Pr(x \mid \lambda) \prod_{t=1}^{n} \Pr(o_t \mid x_t, \eta_{x_t})$
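The joint distribution can be written as a short function. A minimal sketch, assuming λ is a dict keyed by pairs (i, j), i < j, and `eta[i]` is a list of emission probabilities over observation symbols 0..V−1 for step i (a hypothetical encoding of $\mathrm{Mult}(\eta_i)$); $A(\lambda)$ is again computed by brute force.

```python
import itertools
import math


def log_score(x, lam):
    """Unnormalized log Pr(x | lam): sum of lam over in-order pairs."""
    n = len(x)
    return sum(lam[(x[l], x[k])] for l in range(n) for k in range(l + 1, n) if x[l] < x[k])


def log_partition(n, lam):
    """Brute-force A(lam) over all n! permutations of 1..n."""
    scores = [log_score(y, lam) for y in itertools.permutations(range(1, n + 1))]
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))


def hpm_joint_log_prob(x, o, lam, eta):
    """ln Pr(x, o | lam, eta) = ln Pr(x | lam) + sum_t ln Pr(o_t | eta_{x_t})."""
    log_prior = log_score(x, lam) - log_partition(len(x), lam)
    return log_prior + sum(math.log(eta[x_t][o_t]) for x_t, o_t in zip(x, o))


lam = {(1, 2): 0.5, (1, 3): -1.0, (2, 3): 2.0}
eta = {1: [0.7, 0.3], 2: [0.2, 0.8], 3: [0.5, 0.5]}
print(hpm_joint_log_prob((2, 1, 3), (1, 0, 0), lam, eta))
```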
Max. Likelihood Estimation, Permutation Known
Log-likelihood function: $L(\lambda, \eta) = \ln P(x \mid \lambda) + \ln P(o \mid x, \eta)$
Optimize η: trivial (count frequencies)
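With x observed, the emission estimate is a frequency count. A sketch, assuming training data given as `(x, o)` pairs of equal length and observation symbols encoded as 0..V−1; the function name and the `pseudocount` smoothing option are additions for illustration, not part of the slide (a pseudocount of 0 gives the plain maximum-likelihood counts).

```python
from collections import Counter


def estimate_eta(data, num_symbols, pseudocount=0.0):
    """Frequency estimate of emission parameters when x is observed.

    data: list of (x, o) pairs; x is a permutation of step labels and o is
    the observation sequence of the same length.
    Returns eta with eta[i][v] = fraction of times step i emitted symbol v.
    """
    counts = {}
    for x, o in data:
        for step, symbol in zip(x, o):
            counts.setdefault(step, Counter())[symbol] += 1
    eta = {}
    for step, c in counts.items():
        total = sum(c.values()) + pseudocount * num_symbols
        eta[step] = [(c[v] + pseudocount) / total for v in range(num_symbols)]
    return eta


data = [((1, 2, 3), (0, 1, 1)), ((2, 1, 3), (1, 0, 0))]
print(estimate_eta(data, num_symbols=2))
```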
Optimize λ: convex problem
Derivative: $\nabla_{\lambda_{ij}} L = f_{ij}(x) - \sum_{y \in \mathrm{Per}(n)} f_{ij}(y)\, P(y \mid \lambda)$
First term: does i appear before j in x? Second term: Pr(i appears before j)
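The supervised gradient is the observed feature minus its expectation under the model. A minimal sketch of plain gradient ascent, assuming exact expectations by enumeration (feasible only for small n); the function names, step count, learning rate and toy training data are illustrative.

```python
import itertools
import math


def in_order(x):
    """f_ij(x) for all i < j: 1.0 if item i appears before item j in x."""
    pos = {item: t for t, item in enumerate(x)}
    return {(i, j): 1.0 if pos[i] < pos[j] else 0.0
            for i, j in itertools.combinations(sorted(x), 2)}


def expected_features(n, lam):
    """Pr(i appears before j) under Pr(. | lam), for every pair, by enumeration."""
    feats = [in_order(y) for y in itertools.permutations(range(1, n + 1))]
    scores = [sum(lam[p] * v for p, v in f.items()) for f in feats]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return {p: sum(w * f[p] for w, f in zip(weights, feats)) / z for p in lam}


def gradient(xs, lam):
    """d/d lam_ij of sum_k ln Pr(x^k | lam): observed minus expected f_ij."""
    expected = expected_features(len(xs[0]), lam)
    grad = {p: 0.0 for p in lam}
    for x in xs:
        for p, v in in_order(x).items():
            grad[p] += v - expected[p]
    return grad


def fit_lambda(xs, steps=200, lr=0.1):
    """Plain gradient ascent from lam = 0 on the concave log-likelihood."""
    n = len(xs[0])
    lam = {p: 0.0 for p in itertools.combinations(range(1, n + 1), 2)}
    for _ in range(steps):
        g = gradient(xs, lam)
        lam = {p: lam[p] + lr * g[p] for p in lam}
    return lam


# Hypothetical training routines in which step 1 always comes first.
lam_hat = fit_lambda([(1, 2, 3), (1, 3, 2), (1, 2, 3)])
print(expected_features(3, lam_hat))
```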
Max. Likelihood Estimation, Permutation Unknown
Log-likelihood function over K observation sequences $o^1, \ldots, o^K$: $l(\lambda, \eta) = \sum_{k=1}^{K} \log \sum_{x} P(o^k, x \mid \lambda, \eta)$
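For small n the marginal log-likelihood can be evaluated exactly by summing over all permutations. A sketch, assuming `obs_seqs` holds the K observation sequences and `eta[i][v]` is the probability that step i emits symbol v (hypothetical names); each inner sum uses log-sum-exp for stability.

```python
import itertools
import math


def log_score(x, lam):
    """Unnormalized log Pr(x | lam): sum of lam over in-order pairs."""
    n = len(x)
    return sum(lam[(x[l], x[k])] for l in range(n) for k in range(l + 1, n) if x[l] < x[k])


def log_likelihood(obs_seqs, lam, eta):
    """l(lam, eta) = sum_k ln sum_x P(o^k, x | lam, eta), by enumerating x."""
    n = len(obs_seqs[0])
    perms = list(itertools.permutations(range(1, n + 1)))
    scores = [log_score(x, lam) for x in perms]
    m = max(scores)
    log_a = m + math.log(sum(math.exp(s - m) for s in scores))
    total = 0.0
    for o in obs_seqs:
        terms = [s - log_a + sum(math.log(eta[x_t][o_t]) for x_t, o_t in zip(x, o))
                 for x, s in zip(perms, scores)]
        mt = max(terms)
        total += mt + math.log(sum(math.exp(t - mt) for t in terms))
    return total


lam = {(1, 2): 0.4, (1, 3): -0.6, (2, 3): 1.2}
eta = {1: [0.9, 0.1], 2: [0.3, 0.7], 3: [0.6, 0.4]}
print(log_likelihood([(0, 1, 1), (1, 0, 0)], lam, eta))
```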
Need to jointly optimize both λ and η; non-convex problem
Can we use EM? The M-step for λ does not have a closed form
Can try coordinate ascent:
Fix η and improve λ by one gradient step
Fix λ and improve η by EM (now has closed form)
Didn’t work as well as simple gradient ascent
Max. Likelihood Estimation, Permutation Unknown (cont.)
Derivative for λ: $\nabla_{\lambda_{ij}} l = \sum_{x} f_{ij}(x)\, P(x \mid o, \lambda, \eta) - \sum_{x} f_{ij}(x)\, P(x \mid \lambda)$
First term: Pr(i appears before j given o); second term: Pr(i appears before j)
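Derivative for η
Avoid dealing with constraints by transforming to the natural parameters of the multinomial: $\nabla_{\eta_{iv}} l = \sum_{x} \mathbb{I}\{x^{-1}_i \in o[v]\}\, P(x \mid o, \lambda, \eta) - \Pr(v \mid \eta_i)$, where o[v] is the set of positions at which symbol v is observed
First term: Pr(i appears at one of v’s position(s) given o)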
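Both derivatives can be computed exactly for tiny n by enumerating permutations, which also gives a reference when testing a sampling-based implementation. A sketch, assuming natural (softmax) parameters `eta_nat[i][v]` for the emissions and a dict-of-pairs λ; all names are hypothetical.

```python
import itertools
import math


def in_order(x):
    """f_ij(x) for all i < j: 1.0 if item i appears before item j in x."""
    pos = {item: t for t, item in enumerate(x)}
    return {(i, j): 1.0 if pos[i] < pos[j] else 0.0
            for i, j in itertools.combinations(sorted(x), 2)}


def normalize(log_weights):
    """Turn unnormalized log-weights into probabilities (stable softmax)."""
    m = max(log_weights)
    w = [math.exp(s - m) for s in log_weights]
    z = sum(w)
    return [wk / z for wk in w]


def hpm_gradients(o, lam, eta_nat):
    """Gradients of ln P(o | lam, eta) for one observation sequence o.

    Pr(v | eta_i) = softmax(eta_nat[i])[v]. Expectations are exact sums
    over all n! permutations, so this is for tiny n only.
    """
    perms = list(itertools.permutations(range(1, len(o) + 1)))
    emis = {i: normalize(row) for i, row in eta_nat.items()}
    feats = [in_order(x) for x in perms]
    prior = [sum(lam[p] * v for p, v in f.items()) for f in feats]
    joint = [s + sum(math.log(emis[x_t][o_t]) for x_t, o_t in zip(x, o))
             for x, s in zip(perms, prior)]
    p_prior, p_post = normalize(prior), normalize(joint)

    # lambda: Pr(i before j | o) - Pr(i before j)
    grad_lam = {p: sum((q - r) * f[p] for q, r, f in zip(p_post, p_prior, feats))
                for p in lam}
    # eta: Pr(step i sits at a position where v was observed | o) - Pr(v | eta_i)
    grad_eta = {
        i: [sum(q for q, x in zip(p_post, perms) if o[x.index(i)] == v) - emis[i][v]
            for v in range(len(row))]
        for i, row in eta_nat.items()
    }
    return grad_lam, grad_eta
```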
Approximate Inference via MCMC
Typical “inference” problem requires calculating an expectation.
Expectations can be approximated if we can generate samples from x ∼ Pr(x|λ)
How to draw random permutations?
Try a well-known MCMC idea (the Metropolis algorithm)
Start with a random initial permutation
Randomly switch two positions
Accept the new permutation $x'$ with probability $\min\left(\frac{P(x' \mid \lambda)}{P(x \mid \lambda)}, 1\right)$
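A minimal Python sketch of this sampler for the pairwise model, assuming λ is a dict keyed by pairs (i, j) with i < j. Swapping two uniformly chosen positions is a symmetric proposal, so the standard Metropolis acceptance rule applies and $A(\lambda)$ cancels in the ratio. The burn-in length, seed and function names are illustrative choices, not from the slides.

```python
import math
import random


def log_score(x, lam):
    """Unnormalized log Pr(x | lam): sum of lam over in-order pairs."""
    n = len(x)
    return sum(lam[(x[l], x[k])] for l in range(n) for k in range(l + 1, n) if x[l] < x[k])


def metropolis_permutations(lam, n, num_samples, burn_in=1000, seed=0):
    """Metropolis sampler for Pr(x | lam) with random-transposition proposals."""
    rng = random.Random(seed)
    x = list(range(1, n + 1))
    rng.shuffle(x)                       # random initial permutation
    cur = log_score(x, lam)
    samples = []
    for step in range(burn_in + num_samples):
        a, b = rng.sample(range(n), 2)   # two distinct positions to switch
        y = x[:]
        y[a], y[b] = y[b], y[a]
        new = log_score(y, lam)
        # accept with probability min(P(y)/P(x), 1)
        if rng.random() < math.exp(min(0.0, new - cur)):
            x, cur = y, new
        if step >= burn_in:
            samples.append(tuple(x))
    return samples


lam = {(1, 2): 1.5, (1, 3): 0.0, (1, 4): -1.0, (2, 3): 0.5, (2, 4): 0.0, (3, 4): 2.0}
samples = metropolis_permutations(lam, 4, 20000)
est = sum(1 for s in samples if s.index(1) < s.index(2)) / len(samples)
print(est)  # Monte Carlo estimate of Pr(1 appears before 2)
```

Expectations such as Pr(i appears before j) are then estimated as the fraction of samples in which i precedes j.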
Location-Based Activity Recognition on Campus: Detection Problem
Student Activity Routines (Permutation with Partial-Order Constraints)
Hierarchy: atomic activities → corresponding physical locations → GPS “places”
Atomic activity: corresponding physical location(s)
Banking: Bank
Lecture 1: Watson theater
Lecture 2: Hayman theater
Lecture 3: Davis theater
Lecture 4: Jones theater
Group meeting 1: Bookmark cafe, Library, CBS
Group meeting 2: Library, CBS, Psychology Bld
Group meeting 3: Angazi cafe, Psychology Bld
Coffee: TAV, Angazi cafe, Bookmark cafe
Breakfast: TAV, Angazi cafe, Bookmark cafe
Lunch: TAV, Bookmark cafe
“Places” from GPS
Preprocessing
Removal of points above a speed threshold
Often missing precisely the samples we want! (e.g. inside buildings)
Interpolation within a day and across days
Points clustered into groups to find significant places using DBSCAN
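Two of these preprocessing steps can be sketched as follows, assuming GPS fixes already projected to planar coordinates in metres with timestamps in seconds; the function names, thresholds and the minimal O(N²) DBSCAN are illustrative, and the interpolation step is omitted. A library implementation such as scikit-learn's `sklearn.cluster.DBSCAN` (parameters `eps`, `min_samples`) would normally replace the hand-written clustering.

```python
import math


def drop_fast_points(track, max_speed):
    """Drop GPS fixes reached from the previous fix faster than max_speed (m/s).

    track: time-ordered list of (t_seconds, x_metres, y_metres).
    """
    kept = track[:1]
    for (t0, x0, y0), (t, x, y) in zip(track, track[1:]):
        dt = t - t0
        speed = math.hypot(x - x0, y - y0) / dt if dt > 0 else 0.0
        if speed <= max_speed:
            kept.append((t, x, y))
    return kept


def dbscan(points, eps, min_pts):
    """Minimal O(N^2) DBSCAN: returns a cluster id per point, -1 for noise."""
    n = len(points)
    labels = [None] * n

    def neighbours(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1           # noise (may later become a border point)
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: join, but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_pts:  # core point: keep expanding
                queue.extend(more)
    return labels


fixes = [(0, 0, 0), (10, 1, 0), (20, 500, 0), (30, 501, 0)]
print(drop_fast_points(fixes, max_speed=2.0))
places = [(0, 0), (1, 0), (0, 1), (100, 100), (101, 100), (100, 101), (500, 500)]
print(dbscan(places, eps=2.0, min_pts=3))   # [0, 0, 0, 1, 1, 1, -1]
```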
Detection Performance
           Activity 1             Activity 2
           HMM    KIR    HPM      HMM    KIR    HPM
TP         18.2   18.5   19.1     17.9   18.0   18.8
FP         19.5    2.0    4.1      4.4    0.7    0.4
Precision  48.3%  90.2%  82.3%    80.3%  96.3%  97.9%
Recall     91.0%  92.5%  95.5%    89.5%  90.5%  94.0%
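In the table above, every Precision value matches TP / (TP + FP), and most Recall values match TP / 20, which suggests 20 true routine occurrences per test run; that count is inferred from the numbers rather than stated on the slide. A small sketch of the two formulas (function names are illustrative):

```python
def precision(tp, fp):
    """Fraction of reported detections that are correct: TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp, num_true):
    """Fraction of true occurrences that were detected: TP / num_true."""
    return tp / num_true


# Activity 1, HMM column; num_true = 20 is inferred, not given on the slide.
print(f"{100 * precision(18.2, 19.5):.1f}%")   # 48.3%
print(f"{100 * recall(18.2, 20):.1f}%")        # 91.0%
```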
In a long sequence of GPS “places”, detect occurrences of activity routine
Simulated Data, Supervised (Atomic Activities Given)
           Activity 1                    Activity 2
           NBC    HMM    KIR    HPM      NBC    HMM    KIR    HPM
TP         16.6   18.3   18.3   19.1     17.1   17.7   18.1   18.5
FP         11.1   19.8    8.5    5.1     11.0    3.8    4.7    0.5
Precision  59.9%  48.0%  68.3%  78.9%    60.9%  82.3%  79.4%  97.4%
Recall     80.3%  91.5%  91.5%  95.5%    85.5%  88.5%  90.5%  92.5%
Simulated Data, Unsupervised
           NBC    HMM    HPM
TP         6      8.5    9.8
FP         4      5.3    1.9
Precision  60%    61.6%  83.8%
Recall     60%    85%    98%
Real Data, Unsupervised
Conclusion
Modelling permutations is hard, but not impossible
A general way to parameterize distributions over permutations using the exponential family
If the permutation is not observed, use the Hidden Permutation Model (HPM)
Demonstrated better performance than other models that do not exploit permutation constraints, as well as the naïve multinomial permutation model (Kirshner et al.)
Future work
Generalize to permutations with repetitions
In supervised mode, a discriminative formulation similar to a CRF might work better