Hidden Permutation Model and Location-Based Activity Recognition

Hung Bui (SRI International)

Dinh Phung, Svetha Venkatesh, Hai Phan (Curtin University of Technology)

Talk Outline

- Why model permutations?
- Distributions of random permutations
- Hidden Permutation Model (HPM)
- How to estimate HPM parameters?
- How to perform approximate inference?
- Experiments with location-based activity recognition

Why Model Permutations?

Permutations arise in many real-world problems. Usually, there is an unknown matching that needs to be recovered:

- Correspondence in data association
- Field-to-value matching in information extraction from text
- Word/phrase matching in machine translation
- Ordering of substeps in activity recognition

A permutation is the simplest form of matching. Brute-force computation over matchings is at least O(n!).

Permutations in Activity Recognition

Many activities require carrying out a collection of substeps, each performed just once (or repeated a small number of times):

    AAAI travel = (get_approval, book_hotel, book_air_ticket, register, prepare_slides, do_travel)

The ordering of the steps is an unknown permutation that needs to be recovered.

Factors affecting the ordering between steps:

- Strongly ordered: A enables B; A and B follow a timetable
- Weakly ordered: A performed before B out of habit
- Unordered: A performed before B by chance

Learning these ordering constraints from data can lead to better recognition performance.

Permutations and Markov Models

A standard HMM does not enforce permutation constraints: nothing prevents the same step from being visited twice. Imposing the constraints directly (x_n ≠ x_1, x_n ≠ x_2, …) leads to awkward graphical models, since conditional independence is lost. We need a more direct way of defining a distribution on permutations.

Distributions on Permutations

Let Per(n) = the set of permutations of {1, 2, …, n}.

- A multinomial over Per(n) is very general but requires n! parameters (Kirshner et al., ICML 2003).
- An exponential family (E.F.) distribution on permutations needs only a few parameters:

      Pr(x | λ) = exp{ ⟨f(x), λ⟩ − A(λ) }

  where f : Per(n) → R^d is a feature function, λ ∈ R^d are the natural parameters, and the log-partition function

      A(λ) = ln Σ_{x ∈ Per(n)} exp⟨f(x), λ⟩

  is expensive to compute.
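For small n, the log-partition function can be computed by enumerating all n! permutations, which makes the definition above concrete. A minimal sketch (the single feature and the λ values are illustrative placeholders, not from the talk):

```python
import itertools
import math

def exp_family_prob(x, lam, feature_fn):
    """Pr(x | lam) = exp(<f(x), lam> - A(lam)), with the log-partition
    A(lam) computed by brute force over all n! permutations."""
    n = len(x)
    perms = list(itertools.permutations(range(1, n + 1)))
    def score(p):
        return sum(fi * li for fi, li in zip(feature_fn(p), lam))
    A = math.log(sum(math.exp(score(p)) for p in perms))  # log-partition
    return math.exp(score(tuple(x)) - A)

# Illustrative one-dimensional feature: does 1 appear before 2?
feat = lambda p: [1.0 if p.index(1) < p.index(2) else 0.0]

# With lam = 0 the distribution is uniform: Pr = 1/3! for every permutation.
p_uniform = exp_family_prob([1, 2, 3], [0.0], feat)
# With lam = 2 permutations placing 1 before 2 are favored.
p_biased = exp_family_prob([1, 2, 3], [2.0], feat)
```

The O(n!) enumeration is exactly the expense the slide flags; it is only viable here because n is tiny.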

Exponential Family on Permutations (cont.)

What features to use? Recall the factors affecting the ordering between activity steps (strongly ordered, weakly ordered, unordered by chance). These are all pairwise, suggesting precedence features: does step i appear before step j in x?

    f_ij(x) = I{ x⁻¹_i < x⁻¹_j }

With no loss of information, keep only f_ij(x) for i < j, giving

    d = n(n−1)/2

features (and the same number of parameters).
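The precedence features can be sketched directly from the definition (function name is illustrative):

```python
from itertools import combinations

def precedence_features(x):
    """f_ij(x) = I{x^-1_i < x^-1_j} for all i < j: does value i appear
    before value j in x? Returns a dict with n(n-1)/2 entries."""
    pos = {v: t for t, v in enumerate(x)}   # pos[v] = x^-1_v
    n = len(x)
    return {(i, j): int(pos[i] < pos[j])
            for i, j in combinations(range(1, n + 1), 2)}

f = precedence_features([2, 4, 1, 5, 3])   # the slide's running example
```

Restricting to i < j loses nothing because f_ji(x) = 1 − f_ij(x).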

Exponential Family on Permutations (cont.)

With precedence features the density simplifies to a sum over all in-order pairs of values:

    Pr(x | λ) = exp( Σ_{i<j : x⁻¹_i < x⁻¹_j} λ_ij − A(λ) )
              = exp( Σ_{k<l : x_k < x_l} λ_{x_k, x_l} − A(λ) )

Example: for x = (2 4 1 5 3) the exponent is

    λ_{2,4} + λ_{2,5} + λ_{2,3} + λ_{4,5} + λ_{1,5} + λ_{1,3} − A(λ)
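Enumerating the in-order pairs reproduces exactly the six λ terms in the slide's example:

```python
def inorder_pairs(x):
    """All value pairs (x_k, x_l) with k < l and x_k < x_l; the unnormalized
    log-density is the sum of lambda over exactly these pairs."""
    return [(x[k], x[l])
            for k in range(len(x)) for l in range(k + 1, len(x))
            if x[k] < x[l]]

pairs = inorder_pairs([2, 4, 1, 5, 3])
```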

Some Properties

Swapping adjacent positions: for x′ = (x_1, …, x_{i+1}, x_i, …, x_n),

    Pr(x′ | λ) / Pr(x | λ) = exp(−λ_{x_i, x_{i+1}})  if x_i < x_{i+1}
                           = exp(+λ_{x_{i+1}, x_i})  if x_i > x_{i+1}

so the cost of switching an adjacent in-order pair (i, j), i < j, is e^{λ_ij}.

Reversing the permutation: for x′ = (x_n, x_{n−1}, …, x_1), every in-order pair becomes out-of-order, so

    Pr(x′ | λ) · Pr(x | λ) = exp( Σ_{i<j} λ_ij − 2A(λ) ) = const(λ)

a constant depending only on λ, not on x.
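Both properties can be checked numerically on the unnormalized log-density (the λ values below are arbitrary illustrative numbers, since the normalizer cancels in both identities):

```python
import math

def score(x, lam):
    """Unnormalized log-density: sum of lam[(i, j)] over in-order pairs."""
    return sum(lam[(x[k], x[l])]
               for k in range(len(x)) for l in range(k + 1, len(x))
               if x[k] < x[l])

# Arbitrary illustrative parameters for n = 4.
lam = {(1, 2): 0.5, (1, 3): -0.3, (1, 4): 0.2,
       (2, 3): 1.0, (2, 4): 0.0, (3, 4): -0.7}

# Swapping the adjacent in-order values 2 < 3 divides the density by e^{lam_{2,3}}.
x = (1, 2, 3, 4)
x_swapped = (1, 3, 2, 4)
swap_ratio = math.exp(score(x_swapped, lam) - score(x, lam))

# Reversal: score(x) + score(reverse(x)) equals the sum of all lam_{ij},
# so Pr(x) * Pr(reverse(x)) depends only on lam.
reverse_sum = score(x, lam) + score(x[::-1], lam)
```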

Hidden Permutation Model

The HPM ("graphical model" view) couples the exponential-family permutation prior with a per-step observation model:

- Hidden permutation: Pr(x | λ)
- Observations: Pr(o_t | x_t = i, η) = Mult(η_i)

Joint distribution:

    Pr(x, o | λ, η) = Pr(x | λ) Π_{t=1}^n Pr(o_t | x_t, η_{x_t})
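The joint distribution can be sketched for a toy HPM with a brute-force prior (the λ and η values are illustrative placeholders; here λ = 0 gives a uniform prior over the 3! orderings):

```python
import itertools
import math

def hpm_joint(x, o, lam, eta):
    """Pr(x, o | lam, eta) = Pr(x | lam) * prod_t Pr(o_t | x_t, eta_{x_t}).
    eta[i][v] is the multinomial probability of observing v from step i."""
    n = len(x)
    def score(p):
        return sum(lam[(p[k], p[l])]
                   for k in range(n) for l in range(k + 1, n)
                   if p[k] < p[l])
    A = math.log(sum(math.exp(score(p))
                     for p in itertools.permutations(range(1, n + 1))))
    prior = math.exp(score(tuple(x)) - A)
    emission = math.prod(eta[x[t]][o[t]] for t in range(n))
    return prior * emission

lam = {(1, 2): 0.0, (1, 3): 0.0, (2, 3): 0.0}   # uniform over Per(3)
eta = {1: {"a": 0.9, "b": 0.1},
       2: {"a": 0.1, "b": 0.9},
       3: {"a": 0.5, "b": 0.5}}
p = hpm_joint([2, 1, 3], ["b", "a", "a"], lam, eta)
```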

Max. Likelihood Estimation, Permutation Known

Log-likelihood function:

    L(λ, η) = ln P(x | λ) + ln P(o | x, η)

- Optimizing η is trivial (count frequencies).
- Optimizing λ is a convex problem, with derivative

    ∇_{λ_ij} L = f_ij(x) − Σ_x f_ij(x) P(x | λ)

  i.e., the observed indicator "i appears before j" minus the model probability Pr(i appears before j).
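The gradient can be evaluated exactly for small n by brute-force expectation (function and variable names are illustrative):

```python
import itertools
import math

def grad_lambda(x_obs, lam, n):
    """Gradient of the log-likelihood wrt lam_{ij} when the permutation is
    observed: f_ij(x_obs) minus Pr(i appears before j) under the model."""
    def feats(p):
        pos = {v: t for t, v in enumerate(p)}
        return {(i, j): float(pos[i] < pos[j])
                for i in range(1, n + 1) for j in range(i + 1, n + 1)}
    def score(p):
        f = feats(p)
        return sum(lam[k] * f[k] for k in lam)
    perms = list(itertools.permutations(range(1, n + 1)))
    Z = sum(math.exp(score(p)) for p in perms)
    expect = {k: sum(feats(p)[k] * math.exp(score(p)) for p in perms) / Z
              for k in lam}
    fx = feats(tuple(x_obs))
    return {k: fx[k] - expect[k] for k in lam}

lam0 = {(1, 2): 0.0, (1, 3): 0.0, (2, 3): 0.0}
g = grad_lambda([1, 2, 3], lam0, 3)
```

At λ = 0 the model says Pr(i before j) = 0.5 for every pair, so for the fully in-order observation each gradient component is 1 − 0.5 = 0.5.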

Max. Likelihood Estimation, Permutation Unknown

Log-likelihood function over K observation sequences:

    l(λ, η) = Σ_{k=1}^K log Σ_x P(o^k, x | λ, η)

We need to jointly optimize λ and η; the problem is non-convex. Can we use EM? The M-step for λ does not have a closed form. We can try coordinate ascent:

- Fix η and improve λ by one gradient step
- Fix λ and improve η by EM (which now has a closed form)

In practice this did not work as well as simple gradient ascent.

Max. Likelihood Estimation, Permutation Unknown (cont.)

Derivative for λ:

    ∇_{λ_ij} l = Σ_x f_ij(x) P(x | o, λ, η) − Σ_x f_ij(x) P(x | λ)

i.e., Pr(i appears before j given o) minus Pr(i appears before j).

Derivative for η: avoid dealing with simplex constraints by transforming to the natural parameters of the multinomial:

    ∇_{η_iv} l = Σ_x I{x⁻¹_i ∈ o[v]} P(x | o, λ, η) − Pr(v | η_i)

where the first term is Pr(i appears at one of v's position(s) given o).

Approximate Inference via MCMC

A typical inference problem requires calculating an expectation. Expectations can be approximated if we can generate samples x ∼ Pr(x | λ). How do we draw random permutations? Try a well-known MCMC idea (Metropolis):

- Start with a random initial permutation
- Randomly switch two positions to obtain x′
- Accept the new permutation with probability

    min( P(x′ | λ) / P(x | λ), 1 )

The acceptance ratio is cheap to compute because the log-partition function A(λ) cancels.
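The steps above can be sketched as a Metropolis sampler over permutations (the λ value is an illustrative placeholder; a random-transposition proposal is symmetric, so the acceptance ratio is just the density ratio):

```python
import math
import random

def metropolis_permutations(n, lam, steps=20000, seed=0):
    """Metropolis sampler over Per(n): propose a random transposition and
    accept with probability min(Pr(x')/Pr(x), 1). A(lam) cancels in the
    ratio, so only the unnormalized score is needed."""
    rng = random.Random(seed)
    def score(x):
        return sum(lam.get((x[k], x[l]), 0.0)
                   for k in range(n) for l in range(k + 1, n)
                   if x[k] < x[l])
    x = list(range(1, n + 1))
    rng.shuffle(x)
    s = score(x)
    samples = []
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)      # propose: swap two positions
        x[i], x[j] = x[j], x[i]
        s_new = score(x)
        if math.log(rng.random()) < s_new - s:
            s = s_new                       # accept
        else:
            x[i], x[j] = x[j], x[i]         # reject: undo the swap
        samples.append(tuple(x))
    return samples

# Hypothetical parameters: a strong preference for 1 appearing before 2.
samples = metropolis_permutations(3, {(1, 2): 3.0})
frac = sum(s.index(1) < s.index(2) for s in samples) / len(samples)
```

Under this λ the stationary probability of "1 before 2" is e³/(e³ + 1) ≈ 0.95, and the empirical fraction should be close to it.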

Location-Based Activity Recognition on Campus

Detection problem: student activity routines (permutations with partial-order constraints) over atomic activities and their corresponding physical locations.

    Atomic activity   | Physical locations
    ------------------|---------------------------------
    Banking           | Bank
    Lecture 1         | Watson theater
    Lecture 2         | Hayman theater
    Lecture 3         | Davis theater
    Lecture 4         | Jones theater
    Group meeting 1   | Bookmark cafe, Library, CBS
    Group meeting 2   | Library, CBS, Psychology Bld
    Group meeting 3   | Angazi cafe, Psychology Bld
    Coffee            | TAV, Angazi cafe, Bookmark cafe
    Breakfast         | TAV, Angazi cafe, Bookmark cafe
    Lunch             | TAV, Bookmark cafe

"Places" from GPS — preprocessing:

- Remove points above a speed threshold (often missing precisely the samples we want, e.g. inside buildings)
- Interpolate within a day and across days
- Cluster the points into groups to find significant places, using DBSCAN

Detection Performance

Task: in a long sequence of GPS "places", detect occurrences of an activity routine.

Simulated Data, Supervised (Atomic Activities Given)

                TP    FP    Precision  Recall
    Activity 1
      NBC       16.6  11.1  59.9%      80.3%
      HMM       18.3  19.8  48.0%      91.5%
      KIR       18.3   8.5  68.3%      91.5%
      HPM       19.1   5.1  78.9%      95.5%
    Activity 2
      NBC       17.1  11.0  60.9%      85.5%
      HMM       17.7   3.8  82.3%      88.5%
      KIR       18.1   4.7  79.4%      90.5%
      HPM       18.5   0.5  97.4%      92.5%

Simulated Data, Unsupervised

                TP    FP    Precision  Recall
    Activity 1
      HMM       18.2  19.5  48.3%      91.0%
      KIR       18.5   2.0  90.2%      92.5%
      HPM       19.1   4.1  82.3%      95.5%
    Activity 2
      HMM       17.9   4.4  80.3%      89.5%
      KIR       18.0   0.7  96.3%      90.5%
      HPM       18.8   0.4  97.9%      94.0%

Real Data, Unsupervised

                TP    FP    Precision  Recall
      NBC       6     4     60%        60%
      HMM       8.5   5.3   61.6%      85%
      HPM       9.8   1.9   83.8%      98%

Conclusion

- Modelling permutations is hard, but not impossible
- A general way to parameterize distributions over permutations using the exponential family
- If the permutation is not observed, use the Hidden Permutation Model (HPM)
- Demonstrated better performance than models that do not exploit permutation constraints, as well as the naïve multinomial permutation model (Kirshner et al.)

Future work:

- Generalize to permutations with repetitions
- In supervised mode, a discriminative formulation similar to CRF might work better
