Exploiting the graphical model structure of latent Gaussian models
Helen Ogden

Here i ∼ j indicates a pair of players who compete against each other, in the pairwise competition model defined below. This model is an example of a latent Gaussian model. We develop methods to approximate the likelihood, which may be used across this wider class of models.

Laplace approximation to the likelihood

The Laplace approximation to the likelihood replaces the integrand

g(u; θ) = ∏_{i∼j} logit⁻¹{ y_ij [β(x_i − x_j) + σ(u_i − u_j)] } ∏_{i=1}^{n} φ(u_i)

with a normal approximation g̃(u; θ) = c_θ φ(u; μ_θ, Σ_θ), giving L̃(θ; y) ≈ c_θ. In our example, the Laplace approximation should be good if each player competes in a lot of matches, but may be poor if each player competes in a small number of matches.
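As a sketch of how c_θ is obtained, the following toy example (a hypothetical one-dimensional integrand, not the poster's model) locates the mode of log g by Newton's method and matches a normal at that mode:

```python
import numpy as np

def log_g(u):
    # toy integrand: logistic term times a standard normal density
    return -np.log1p(np.exp(-u)) - 0.5 * u**2 - 0.5 * np.log(2 * np.pi)

def laplace_integral(log_g, u0=0.0, h=1e-5):
    # find the mode of log g by Newton's method on finite-difference derivatives
    u = u0
    for _ in range(50):
        d1 = (log_g(u + h) - log_g(u - h)) / (2 * h)
        d2 = (log_g(u + h) - 2 * log_g(u) + log_g(u - h)) / h**2
        u -= d1 / d2
    # c = g(u_hat) * sqrt(2 * pi * Sigma), with Sigma = -1 / (log g)''(u_hat)
    d2 = (log_g(u + h) - 2 * log_g(u) + log_g(u - h)) / h**2
    return float(np.exp(log_g(u)) * np.sqrt(-2 * np.pi / d2))

# compare with brute-force quadrature on a fine grid
grid = np.linspace(-10.0, 10.0, 200001)
y = np.exp(log_g(grid))
exact = float(np.sum((y[1:] + y[:-1]) * np.diff(grid)) / 2.0)
print(laplace_integral(log_g), exact)   # the exact integral is 0.5
```

The one-dimensional Laplace value lands within about one percent of the true integral here; the approximation degrades when the integrand is far from normal-shaped, which is the situation of interest below.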

Graphical models

Graphical models are distributions for u = (u_1, . . . , u_n) such that

p(u) ∝ g(u) = ∏_C g_C(u_C).

We represent this factorization structure on a dependence graph, with one node per variable, and an edge between any two variables contained within the same term of the factorization. Variables not joined by an edge are conditionally independent, given all the other variables.
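The construction of the dependence graph can be read off directly from the factor scopes; the four-factor example below is made up for illustration:

```python
from itertools import combinations

# scopes of the factors g_C: variables appearing together in each term
cliques = [(0, 1), (1, 2), (1, 3), (3, 4)]

# edge between any two variables contained in the same factor
edges = set()
for C in cliques:
    edges.update(combinations(sorted(C), 2))

print(sorted(edges))  # → [(0, 1), (1, 2), (1, 3), (3, 4)]
```

Variables 0 and 2 share no edge here, so they are conditionally independent given the remaining variables.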

For a tree-structured pairwise competition model with n = 63 players, we approximate the difference in the loglikelihood at two points close to the MLE; the results are shown in the figure below.

Variable elimination with continuous variables

At each stage, we must compute ḡ_{N_i}(u_{N_i}) = ∫_{−∞}^{∞} ∏_{C: i∈C} g_C(u_C) du_i. For each fixed u_{N_i}, we can use numerical integration to approximate ḡ_{N_i}(u_{N_i}). We want an approximate representation of the function ḡ_{N_i}(·), so we store ḡ_{N_i}(u_{N_i}) at a fixed set of points u_{N_i}, and interpolate between those points. The approximate representation is based on the normal approximation used to construct the Laplace approximation, so the method is always at least as accurate as the Laplace approximation. See [3] for details.
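The storage-and-interpolation idea can be sketched numerically. Below, a made-up two-variable factor is integrated over u_1 by the trapezoidal rule at a fixed set of u_2 storage points, with linear interpolation in between (standing in for the method's normal-approximation-based representation):

```python
import numpy as np

def g12(u1, u2):
    # hypothetical factor linking u1 and u2 (for illustration only)
    return np.exp(-0.5 * (u1**2 + u2**2) - 0.3 * u1 * u2)

def trapz(y, x):
    # trapezoidal rule, written out to stay self-contained
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

u1_grid = np.linspace(-8.0, 8.0, 2001)   # quadrature grid for the eliminated variable
u2_store = np.linspace(-5.0, 5.0, 21)    # fixed storage points for the new factor

# g_bar(u2) = integral over u1 of g12(u1, u2), stored at the fixed points
g_bar_store = np.array([trapz(g12(u1_grid, u2), u1_grid) for u2 in u2_store])

def g_bar(u2):
    # approximate representation: interpolate between the stored values
    return np.interp(u2, u2_store, g_bar_store)

# check the interpolated factor against direct quadrature at an off-grid point
direct = trapz(g12(u1_grid, 0.37), u1_grid)
print(g_bar(0.37), direct)
```

Increasing the number of storage points shrinks the interpolation error, at the cost of more quadratures per elimination step.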


Variable elimination with discrete variables

Suppose we have a graphical model where each u_i ∈ {0, 1}, and we want to compute Z = Σ_u g(u). Naively, we could just sum 2^n contributions. We can reduce that cost by exploiting the structure of the dependence graph. If the dependence graph is a tree (has no cycles), we can compute Z at cost linear in n.
1. Choose a 'leaf' variable i.
2. Sum over u_i: it is involved in only one term g_ij(u_i, u_j) of the factorization. Compute ḡ_j(u_j) = Σ_{u_i} g_ij(u_i, u_j), and replace g_ij with ḡ_j in the factorization.
3. The resulting marginal distribution for the remaining variables is a graphical model with a new tree-structured dependence graph. Repeat until all variables are removed.
This is a special case of the variable elimination algorithm. More generally, a step of the algorithm involves computing ḡ_{N_i}(u_{N_i}) = Σ_{u_i} ∏_{C: i∈C} g_C(u_C), at cost O(2^{|N_i|}), where N_i are the neighbours of i in the dependence graph. We get a factorization over a new dependence graph, with i removed and the variables in N_i joined together. See [2] for details.
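The leaf-elimination steps above can be sketched for binary variables as follows; the four-node tree, `g_edge` and `theta` are made up for illustration:

```python
import numpy as np
from itertools import product

edges = [(0, 1), (1, 2), (1, 3)]   # a small tree: node 1 is the hub
n, theta = 4, 0.8

def g_edge(a, b):
    # pairwise factor: favours agreeing neighbours
    return np.exp(theta if a == b else -theta)

def z_brute(n, edges):
    # naive sum of g(u) over all 2^n configurations
    return sum(np.prod([g_edge(u[i], u[j]) for i, j in edges])
               for u in product((0, 1), repeat=n))

def z_tree(n, edges):
    # unary "messages" accumulated onto each remaining variable
    msg = {v: np.ones(2) for v in range(n)}
    edges = list(edges)
    while edges:
        # a leaf appears in exactly one remaining edge
        degree = {}
        for i, j in edges:
            degree[i] = degree.get(i, 0) + 1
            degree[j] = degree.get(j, 0) + 1
        leaf = next(v for v in degree if degree[v] == 1)
        edge = next(e for e in edges if leaf in e)
        other = edge[1] if edge[0] == leaf else edge[0]
        # g_bar(u_other) = sum over u_leaf of g(u_leaf, u_other) * msg[leaf]
        g_bar = np.array([sum(g_edge(ul, uo) * msg[leaf][ul] for ul in (0, 1))
                          for uo in (0, 1)])
        msg[other] = msg[other] * g_bar
        edges.remove(edge)
        del msg[leaf]
    (last,) = msg            # one variable is left on a tree
    return float(msg[last].sum())

print(z_brute(n, edges), z_tree(n, edges))
```

The two computations agree exactly; the brute-force sum touches 2^n terms, while the elimination pass does a constant amount of work per edge.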


Example: pairwise competition models

We observe binary matches Y_ij played between pairs of players, and model Pr(Y_ij = 1 | λ_i, λ_j) = logit⁻¹(λ_i − λ_j), where λ_i is a latent 'ability' of player i. We have covariates x_i for player i, and model λ_i = β x_i + σ u_i, where u_i ∼ N(0, 1). The likelihood is

L(θ; y) = ∫_{R^n} ∏_{i∼j} logit⁻¹{ y_ij [β(x_i − x_j) + σ(u_i − u_j)] } ∏_{i=1}^{n} φ(u_i) du.

Rephrasing the problem

The likelihood for the pairwise competition model is the normalizing constant Z corresponding to a graphical model. The dependence graph has a node for each player, and an edge for each match. The Laplace approximation may fail if this dependence graph is sparse, so we focus on this case.

The likelihood for a latent Gaussian model is a high-dimensional integral. We exploit the structure of the integrand to reduce the cost of computing an accurate approximation of the likelihood, by modifying methods designed for graphical models with discrete variables.

Comparing likelihood approximations
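As a simple baseline for such comparisons, the likelihood of a small pairwise competition model can be estimated by plain Monte Carlo over u ∼ N(0, I); the tournament, covariates and parameter values below are made up for illustration (this is not the n = 63 experiment):

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical toy tournament: (i, j, y_ij) with y_ij = +1 if i beat j, -1 otherwise
matches = [(0, 1, 1), (1, 2, -1), (2, 3, 1)]
n = 4
x = np.array([0.2, -0.1, 0.0, 0.3])   # made-up covariates

def likelihood_mc(beta, sigma, draws=200_000):
    # L(theta; y) = E over u ~ N(0, I) of
    #   prod over matches of logit^{-1}( y_ij [beta(x_i - x_j) + sigma(u_i - u_j)] )
    u = rng.standard_normal((draws, n))
    p = np.ones(draws)
    for i, j, y in matches:
        eta = y * (beta * (x[i] - x[j]) + sigma * (u[:, i] - u[:, j]))
        p *= 1.0 / (1.0 + np.exp(-eta))
    return float(p.mean())

print(likelihood_mc(0.5, 1.0))
```

Plain Monte Carlo ignores the graph structure entirely, so its cost for a fixed accuracy grows quickly with the number of matches; it serves only as a sanity check on the structured approximations.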

[Figure: approximations to the difference in the loglikelihoods, plotted against computation time in seconds (1 to 10000, log scale).]

The black line is an importance sampling approximation, based on the same normal distribution used to construct the Laplace approximation. The red line is the approximation based on variable elimination, as the number of points used for storage increases.
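In the same spirit as the black line, importance sampling with a normal proposal can be sketched in one dimension; the toy integrand and the proposal parameters below are made up (a Laplace fit would supply the mean and scale in practice):

```python
import numpy as np

rng = np.random.default_rng(1)

def g(u):
    # toy unnormalized integrand: logistic term times a standard normal density
    return 1.0 / (1.0 + np.exp(-u)) * np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

# normal proposal; mu and s are chosen by hand here for illustration
mu, s = 0.2, 1.0
u = rng.normal(mu, s, 100_000)
phi = np.exp(-0.5 * ((u - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

Z_hat = np.mean(g(u) / phi)   # importance sampling estimate of Z = integral of g
print(Z_hat)                  # the true value is 0.5, by symmetry of the logistic function
```

When the proposal is close to the integrand's shape, as here, the weights g(u)/φ(u) are well behaved and the estimate converges at the usual Monte Carlo rate.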

Sparse but non-tree-like graphs

Many latent Gaussian models have dependence graphs which are 'close' to trees, and variable elimination works well. However, challenging cases remain: if we have a graphical model with binary variables, and a square lattice dependence graph, the cost of computing Z with variable elimination is O(n 2^√n). I have been working on a new approximation to Z in these cases, controlled by a parameter t. For a square lattice, the cost is O(n 2^t), and the error shrinks quickly with t. I hope to combine this new approximation with approximate function storage to get better approximations to the likelihood in latent Gaussian models. See [1].
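To see where the O(n 2^√n) comes from, the sketch below (not from the poster) simulates elimination on an m × m lattice in row-major order and records the largest neighbourhood |N_i| encountered; each step costs O(2^{|N_i|}), and the largest |N_i| grows like √n:

```python
def max_neighbourhood(m):
    # build the m x m lattice adjacency
    n = m * m
    adj = {v: set() for v in range(n)}
    for r in range(m):
        for c in range(m):
            v = r * m + c
            if c + 1 < m:
                adj[v].add(v + 1); adj[v + 1].add(v)
            if r + 1 < m:
                adj[v].add(v + m); adj[v + m].add(v)
    # eliminate variables in row-major order, joining the neighbours of each
    worst = 0
    for v in range(n):
        N = adj.pop(v)
        worst = max(worst, len(N))
        for a in N:
            adj[a].discard(v)
            adj[a].update(N - {a})   # eliminating v joins its neighbours
    return worst

for m in (3, 5, 8):
    print(m * m, max_neighbourhood(m))   # largest |N_i| equals m = sqrt(n)
```

Row-major order is in fact optimal up to constants for the lattice, so no elimination order avoids the 2^√n factor; this is what motivates a different approximation in these cases.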

References

[1] https://github.com/heogden/rgraphpass
[2] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. The MIT Press, 2009.
[3] H. E. Ogden. A sequential reduction method for inference in generalized linear mixed models. Electron. J. Statist., 9:135–152, 2015.

warwick.ac.uk/heogden
