Extracting Coactivated Features from Multiple Data Sets

Michael U. Gutmann, University of Helsinki ([email protected])
Aapo Hyvärinen, University of Helsinki ([email protected])

ICANN 2011

Contents

This talk is about a new method to find related features (structure) in multiple data sets.

■ Introduction: background on the extraction of related features from multiple data sets
■ Extraction of coactivated features: the statistical model underlying our method
■ Testing our method on artificial data
■ Application to real data (here: natural images; in the paper: also brain imaging data)
■ Summary

Introduction


An example

[Figure: scatter plots of Data set 1 and Data set 2]

Goals:
1. Characterize each data set separately → eigenvalues and eigenvectors of the covariance matrices
2. Find relations between the two data sets

Correlation based method (1/3)

[Figure: whitened data set 1 and whitened data set 2 (normalized representation)]

■ After a change of basis (rotation), the data is still white.
■ Whitening is therefore defined only up to a rotation.
■ Choose the coordinate systems which best describe the relation between the two data sets 1 and 2.
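As a concrete illustration of the whitening step, here is a minimal sketch in Python/NumPy (our own code, not the authors'); the function name `whiten` and the toy data are assumptions for illustration. It also shows that an arbitrary rotation of whitened data is still white, which is why whitening alone does not pin down the coordinate system.

```python
import numpy as np

def whiten(X):
    """Whiten an (N, d) data matrix via the eigendecomposition of its covariance."""
    Xc = X - X.mean(axis=0)                  # center the data
    C = np.cov(Xc, rowvar=False)             # d x d covariance matrix
    eigval, eigvec = np.linalg.eigh(C)       # C is symmetric
    W = eigvec / np.sqrt(eigval)             # whitening matrix E D^{-1/2}
    return Xc @ W, W                         # whitened data has identity covariance

# Whitened data stays white under any rotation R
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[4.0, 1.5], [1.5, 1.0]], size=2000)
Z, _ = whiten(X)
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.round(np.cov(Z @ R.T, rowvar=False), 2))   # ≈ identity matrix
```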

Correlation based method (2/3)

[Figure: whitened data sets 1 and 2 with candidate basis vectors e1 and e2, parameterized by the angles α1 and α2; correlation between the projections on e1 and e2 as a function of (α1, α2)]

■ The x-coordinate of a data point in the coordinate system defined by e_i is given by its projection on e_i.
■ Compute the correlation between the x-coordinates for the different coordinate systems.
■ Choose the coordinate systems for which the x-coordinates are most strongly correlated (here: correlation coefficient ≈ 0.6).
■ The described method is Canonical Correlation Analysis (CCA).

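A minimal sketch of this procedure (our own code, not an implementation from the paper): after whitening both data sets, the pair of directions with maximally correlated projections is given by the leading singular vectors of the cross-covariance matrix; `cca_first_pair` is a hypothetical helper name.

```python
import numpy as np

def cca_first_pair(X1, X2):
    """Return the leading pair of CCA directions and their correlation."""
    def whiten(X):
        Xc = X - X.mean(axis=0)
        eigval, eigvec = np.linalg.eigh(np.cov(Xc, rowvar=False))
        return Xc @ (eigvec / np.sqrt(eigval))
    Z1, Z2 = whiten(X1), whiten(X2)
    C12 = Z1.T @ Z2 / (len(Z1) - 1)       # cross-covariance of the whitened data
    U, s, Vt = np.linalg.svd(C12)
    return U[:, 0], Vt[0], s[0]           # directions e1, e2 and their correlation
```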

Correlation based method (3/3)

[Figure: data sets 1 and 2 and their whitened versions with basis vectors e1, e2; the correlation between the projections on e_i stays close to zero (within ±0.08) for all angles (α1, α2)]

→ The method does not seem to work here.

What does “not work” mean?

It means
1. that we did not find meaningful features within each data set,
2. that we did not find any relation between the two data sets.

[Figure: whitened data sets 1 and 2 with axes 1 and 2, and the projections on these axes; blue: original data, red: data with the same marginal distributions]

→ Identify independent components (within each data set) and identify coactivation (across the data sets).
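To make "coactivation" concrete, here is a small toy illustration of our own (not the slides' data): two signals can be linearly uncorrelated while their magnitudes rise and fall together because they share a common variance variable, which is exactly the kind of dependency the method looks for.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = rng.lognormal(size=100_000)            # common variance variable
s1 = sigma * rng.standard_normal(100_000)
s2 = sigma * rng.standard_normal(100_000)
print(np.corrcoef(s1, s2)[0, 1])               # ≈ 0: no linear correlation
print(np.corrcoef(s1**2, s2**2)[0, 1])         # clearly positive: coactivated variances
```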

In this talk . . .

I will present a method where
1. the features for each data set are maximally statistically independent,
2. the features across the data sets tend to be jointly activated: they have statistically dependent variances,
3. multiple data sets can be analyzed.

Extraction of coactivated features


Statistical model underlying our method (1/2)

■ Given n data sets, we assume that each set is formed by i.i.d. observations of a random vector z_i ∈ R^d.
■ To model structure within a data set, we assume that

      z_i = \sum_{k=1}^{d} q_{ik} s_{ik},   i = 1, \ldots, n,

  where the q_{ik} are orthonormal and s_{i1}, . . . , s_{id} are statistically independent.
■ To model structure across the data sets, we assume that s_{1k}, . . . , s_{nk} are statistically dependent.

[Figure: scatter plots of (z_11, z_12) and (z_21, z_22) with the feature vectors q_11, q_12 and q_21, q_22, and of the sources (s_11, s_21); blue: original data, red: data with the same marginal distributions]

Statistical model underlying our method (2/2)

■ Dependency assumptions: the k-th sources from all the data sets share a common (latent) variance variable σ_k:

      s_{1k} = \sigma_k \tilde{s}_{1k}, \quad s_{2k} = \sigma_k \tilde{s}_{2k}, \quad \ldots, \quad s_{nk} = \sigma_k \tilde{s}_{nk}

■ The \tilde{s}_{1k}, . . . , \tilde{s}_{nk} are Gaussian (zero mean, possibly correlated).
■ Choosing a prior for σ_k completes the model specification.

[Figure: correlation of squares versus linear correlation for the sources s_11, s_12, s_21, s_22, and a scatter plot of (s_11, s_21); blue: original data, red: data with the same marginal distributions]
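The generative model of the two preceding slides is easy to simulate. Below is a hedged sketch of our own (not the authors' code): the lognormal prior for σ_k, the uncorrelated Gaussian \tilde{s}_{ik}, and the function and variable names are all illustrative assumptions.

```python
import numpy as np

def sample_coactivated_data(n=3, d=4, T=10000, seed=0):
    """Sample n data sets whose k-th sources share a common variance variable."""
    rng = np.random.default_rng(seed)
    # Common variance variables sigma_k(t), shared by all n data sets
    sigma = rng.lognormal(size=(T, d))
    # One random orthonormal mixing matrix Q_i per data set (columns = q_ik)
    Qs = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in range(n)]
    data = []
    for Q in Qs:
        s_tilde = rng.standard_normal((T, d))   # Gaussian, here uncorrelated
        S = sigma * s_tilde                     # s_ik = sigma_k * s~_ik
        data.append(S @ Q.T)                    # z_i(t) = sum_k q_ik s_ik(t)
    return data, Qs
```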

Applying the method – learning the parameters

■ The most interesting parameters of the model are the features q_{ik}. They can be learned by maximizing the log-likelihood ℓ(q_{11}, . . . , q_{nd}).
■ For the special case of uncorrelated sources,

      \ell(q_{11}, \ldots, q_{nd}) = \sum_{t=1}^{T} \sum_{k=1}^{d} G_k\left( \sum_{i=1}^{n} \left( q_{ik}^\top z_i(t) \right)^2 \right),   (1)

  where

      G_k(u) = \log \int \frac{p_{\sigma_k}(\sigma_k)}{(2\pi\sigma_k^2)^{n/2}} \exp\left( -\frac{u}{2\sigma_k^2} \right) \mathrm{d}\sigma_k.   (2)

■ Equations (1) and (2) show that the presented method is related to Independent Subspace Analysis (see the paper).

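A minimal sketch of the objective in Eq. (1) (our own code, not the authors' implementation). It assumes a single nonlinearity G shared by all k and plugs in the ad hoc choice G(u) = −√(0.1 + u) from the simulations on the next slides as a default; the argument names are ours. Maximization over the orthonormal q_{ik} (e.g. by gradient ascent with re-orthogonalization) is not shown.

```python
import numpy as np

def log_likelihood(Q_list, Z_list, G=lambda u: -np.sqrt(0.1 + u)):
    """Evaluate Eq. (1).

    Q_list: one (d, d) orthonormal matrix per data set, column k holding q_ik.
    Z_list: one (T, d) data matrix per data set, row t holding z_i(t).
    """
    # Projections: P_i[t, k] = q_ik^T z_i(t)
    projections = [Z @ Q for Z, Q in zip(Z_list, Q_list)]
    # Inner sum of Eq. (1): squared projections summed over the data sets
    u = np.sum([P ** 2 for P in projections], axis=0)    # shape (T, d)
    # Outer sums over t and k
    return np.sum(G(u))
```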

Testing on artificial data


Simulation setup and goals

■ Setup
  ◆ Three data sets (n = 3) of dimension four (d = 4)
  ◆ No linear correlation in the sources s_{ik}
  ◆ Randomly chosen orthonormal mixing matrices Q_1, Q_2, Q_3
  ◆ 10000 observations
  ◆ Learning of the parameters by maximization of the log-likelihood, with the ad hoc nonlinearity G(u) = −√(0.1 + u)
■ Quantities of interest
  ◆ Error in the mixing matrices
  ◆ Identification of the coupling
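The two quantities of interest can be computed roughly as sketched below (our own choices, not necessarily the paper's exact metrics). Features of this kind are recovered only up to sign and permutation, so we first match estimated to true features with `scipy.optimize.linear_sum_assignment`; the per-data-set matching permutations then indicate whether the coupling was identified correctly.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_features(Q_true, Q_est):
    """Match columns of Q_est to Q_true; return the matching permutation and the
    squared estimation error after correcting signs and permutation."""
    sim = np.abs(Q_true.T @ Q_est)                 # |q_true_k . q_est_l|
    _, cols = linear_sum_assignment(-sim)          # maximize total similarity
    Q_matched = Q_est[:, cols]
    signs = np.sign(np.sum(Q_true * Q_matched, axis=0))
    error = np.sum((Q_true - Q_matched * signs) ** 2)
    return cols, error

# The coupling is identified correctly when all n data sets yield the same
# permutation, i.e. their k-th estimated features really belong together.
```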

Results

[Figure: objective value and squared estimation error (×10^-3) versus the fraction of correctly coupled sources; results for one estimation problem, optimization performed for 20 different initializations]

■ Correct coupling at the maximum of the log-likelihood
■ Presence of local maxima (not nice! but no catastrophe, see the following slides)
■ Learning the right mixing matrices without learning the right coupling seems possible

Application to real data


Setup

■ Data set 1: 10000 image patches of size 25px × 25px, extracted at random locations from natural video data
■ Data set 2: the same image patches, 40 ms later
■ For each data set separately: whitening and dimension reduction (98% of the variance retained)
■ Learning of 50 features per data set by maximization of the log-likelihood of the model (same objective function as for the artificial data)
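The preprocessing step above can be sketched as follows (our own code, applied independently to each patch matrix; the function name and the handling of the 98% threshold are illustrative assumptions).

```python
import numpy as np

def whiten_reduce(X, var_retained=0.98):
    """Whiten an (N, D) patch matrix and keep only the leading principal
    components needed to retain the requested fraction of the variance."""
    Xc = X - X.mean(axis=0)
    eigval, eigvec = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigval)[::-1]                  # sort by decreasing variance
    eigval, eigvec = eigval[order], eigvec[:, order]
    cum = np.cumsum(eigval) / np.sum(eigval)
    k = int(np.searchsorted(cum, var_retained)) + 1   # smallest k reaching 98%
    W = eigvec[:, :k] / np.sqrt(eigval[:k])           # whitening + reduction matrix
    return Xc @ W, W                                  # (N, k) whitened data
```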

Learned features

[Figure: features learned by our method and by canonical correlation analysis, for the first and the second data set]


Summary

Presented a new method to find related features (structure) in multiple data sets:
1. The features for each data set are maximally statistically independent.
2. The features across the data sets tend to be jointly activated: they have statistically dependent variances.
3. Multiple data sets can be analyzed.

In the paper:
◆ more theory (in particular, more on the relation to CCA)
◆ more simulations with natural images
◆ simulations with brain imaging data
