Extracting Coactivated Features from Multiple Data Sets

Michael U. Gutmann, University of Helsinki ([email protected])
Aapo Hyvärinen, University of Helsinki ([email protected])

ICANN 2011

Contents

This talk is about a new method to find related features (structure) in multiple data sets.

■ Introduction: background on the extraction of related features from multiple data sets
■ Extraction of coactivated features: the statistical model underlying our method
■ Testing our method on artificial data
■ Application to real data (here: natural images; in the paper: also brain imaging data)
■ Summary

Introduction


An example

[Figure: scatter plots of Data set 1 and Data set 2]

Goals:
1. Characterize each data set separately → eigenvalues and eigenvectors of the covariance matrices
2. Find relations between the two data sets

Correlation based method (1/3)

[Figure: whitened data set 1 and whitened data set 2 (normalized representation)]

■ After a change of basis (rotation), the data is still white.
■ Whitening is therefore defined only up to a rotation.
■ Choose the coordinate systems which best describe the relation between the two data sets 1 and 2.
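As a concrete illustration of the whitening step, here is a minimal sketch in Python/NumPy (our own code, not the authors'); the function name `whiten` and the toy data are assumptions for illustration. It also shows that an arbitrary rotation of whitened data is still white, which is why whitening alone does not pin down the coordinate system.

```python
import numpy as np

def whiten(X):
    """Whiten an (N, d) data matrix via the eigendecomposition of its covariance."""
    Xc = X - X.mean(axis=0)                  # center the data
    C = np.cov(Xc, rowvar=False)             # d x d covariance matrix
    eigval, eigvec = np.linalg.eigh(C)       # C is symmetric
    W = eigvec / np.sqrt(eigval)             # whitening matrix E D^{-1/2}
    return Xc @ W, W                         # whitened data has identity covariance

# Whitened data stays white under any rotation R
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[4.0, 1.5], [1.5, 1.0]], size=2000)
Z, _ = whiten(X)
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.round(np.cov(Z @ R.T, rowvar=False), 2))   # ≈ identity matrix
```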

Correlation based method (2/3)

[Figure: whitened data sets 1 and 2 with candidate basis vectors e1 and e2, parameterized by the angles α1 and α2; correlation between the projections on e1 and e2 as a function of (α1, α2)]

■ The x-coordinate of a data point in the coordinate system defined by e_i is given by its projection on e_i.
■ Compute the correlation between the x-coordinates for the different coordinate systems.
■ Choose the coordinate systems for which the x-coordinates are most strongly correlated (here: correlation coefficient ≈ 0.6).
■ The described method is Canonical Correlation Analysis (CCA).

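A minimal sketch of this procedure (our own code, not an implementation from the paper): after whitening both data sets, the pair of directions with maximally correlated projections is given by the leading singular vectors of the cross-covariance matrix; `cca_first_pair` is a hypothetical helper name.

```python
import numpy as np

def cca_first_pair(X1, X2):
    """Return the leading pair of CCA directions and their correlation."""
    def whiten(X):
        Xc = X - X.mean(axis=0)
        eigval, eigvec = np.linalg.eigh(np.cov(Xc, rowvar=False))
        return Xc @ (eigvec / np.sqrt(eigval))
    Z1, Z2 = whiten(X1), whiten(X2)
    C12 = Z1.T @ Z2 / (len(Z1) - 1)       # cross-covariance of the whitened data
    U, s, Vt = np.linalg.svd(C12)
    return U[:, 0], Vt[0], s[0]           # directions e1, e2 and their correlation
```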

Correlation based method (3/3)

[Figure: data sets 1 and 2 and their whitened versions with basis vectors e1, e2; the correlation between the projections on e_i stays close to zero (within ±0.08) for all angles (α1, α2)]

→ The method does not seem to work here.

What does “not work” mean?

It means
1. that we did not find meaningful features within each data set,
2. that we did not find any relation between the two data sets.

[Figure: whitened data sets 1 and 2 with axes 1 and 2, and the projections on these axes; blue: original data, red: data with the same marginal distributions]

→ Identify independent components (within each data set) and identify coactivation (across the data sets).
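To make "coactivation" concrete, here is a small toy illustration of our own (not the slides' data): two signals can be linearly uncorrelated while their magnitudes rise and fall together because they share a common variance variable, which is exactly the kind of dependency the method looks for.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = rng.lognormal(size=100_000)            # common variance variable
s1 = sigma * rng.standard_normal(100_000)
s2 = sigma * rng.standard_normal(100_000)
print(np.corrcoef(s1, s2)[0, 1])               # ≈ 0: no linear correlation
print(np.corrcoef(s1**2, s2**2)[0, 1])         # clearly positive: coactivated variances
```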

In this talk . . .

I will present a method where
1. the features for each data set are maximally statistically independent,
2. the features across the data sets tend to be jointly activated: they have statistically dependent variances,
3. multiple data sets can be analyzed.

Extraction of coactivated features


Statistical model underlying our method (1/2)

■ Given n data sets, we assume that each set is formed by i.i.d. observations of a random vector z_i ∈ R^d.
■ To model structure within a data set, we assume that

      z_i = \sum_{k=1}^{d} q_{ik} s_{ik},   i = 1, \ldots, n,

  where the q_{ik} are orthonormal and s_{i1}, . . . , s_{id} are statistically independent.
■ To model structure across the data sets, we assume that s_{1k}, . . . , s_{nk} are statistically dependent.

[Figure: scatter plots of (z_11, z_12) and (z_21, z_22) with the feature vectors q_11, q_12 and q_21, q_22, and of the sources (s_11, s_21); blue: original data, red: data with the same marginal distributions]

Statistical model underlying our method (2/2)

■ Dependency assumptions: the k-th sources from all the data sets share a common (latent) variance variable σ_k:

      s_{1k} = \sigma_k \tilde{s}_{1k}, \quad s_{2k} = \sigma_k \tilde{s}_{2k}, \quad \ldots, \quad s_{nk} = \sigma_k \tilde{s}_{nk}

■ The \tilde{s}_{1k}, . . . , \tilde{s}_{nk} are Gaussian (zero mean, possibly correlated).
■ Choosing a prior for σ_k completes the model specification.

[Figure: correlation of squares versus linear correlation for the sources s_11, s_12, s_21, s_22, and a scatter plot of (s_11, s_21); blue: original data, red: data with the same marginal distributions]
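The generative model of the two preceding slides is easy to simulate. Below is a hedged sketch of our own (not the authors' code): the lognormal prior for σ_k, the uncorrelated Gaussian \tilde{s}_{ik}, and the function and variable names are all illustrative assumptions.

```python
import numpy as np

def sample_coactivated_data(n=3, d=4, T=10000, seed=0):
    """Sample n data sets whose k-th sources share a common variance variable."""
    rng = np.random.default_rng(seed)
    # Common variance variables sigma_k(t), shared by all n data sets
    sigma = rng.lognormal(size=(T, d))
    # One random orthonormal mixing matrix Q_i per data set (columns = q_ik)
    Qs = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in range(n)]
    data = []
    for Q in Qs:
        s_tilde = rng.standard_normal((T, d))   # Gaussian, here uncorrelated
        S = sigma * s_tilde                     # s_ik = sigma_k * s~_ik
        data.append(S @ Q.T)                    # z_i(t) = sum_k q_ik s_ik(t)
    return data, Qs
```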

Applying the method – learning the parameters

■ The most interesting parameters of the model are the features q_{ik}. They can be learned by maximizing the log-likelihood ℓ(q_{11}, . . . , q_{nd}).
■ For the special case of uncorrelated sources,

      \ell(q_{11}, \ldots, q_{nd}) = \sum_{t=1}^{T} \sum_{k=1}^{d} G_k\left( \sum_{i=1}^{n} \left( q_{ik}^\top z_i(t) \right)^2 \right),   (1)

  where

      G_k(u) = \log \int \frac{p_{\sigma_k}(\sigma_k)}{(2\pi\sigma_k^2)^{n/2}} \exp\left( -\frac{u}{2\sigma_k^2} \right) \mathrm{d}\sigma_k.   (2)

■ Equations (1) and (2) show that the presented method is related to Independent Subspace Analysis (see the paper).

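A minimal sketch of the objective in Eq. (1) (our own code, not the authors' implementation). It assumes a single nonlinearity G shared by all k and plugs in the ad hoc choice G(u) = −√(0.1 + u) from the simulations on the next slides as a default; the argument names are ours. Maximization over the orthonormal q_{ik} (e.g. by gradient ascent with re-orthogonalization) is not shown.

```python
import numpy as np

def log_likelihood(Q_list, Z_list, G=lambda u: -np.sqrt(0.1 + u)):
    """Evaluate Eq. (1).

    Q_list: one (d, d) orthonormal matrix per data set, column k holding q_ik.
    Z_list: one (T, d) data matrix per data set, row t holding z_i(t).
    """
    # Projections: P_i[t, k] = q_ik^T z_i(t)
    projections = [Z @ Q for Z, Q in zip(Z_list, Q_list)]
    # Inner sum of Eq. (1): squared projections summed over the data sets
    u = np.sum([P ** 2 for P in projections], axis=0)    # shape (T, d)
    # Outer sums over t and k
    return np.sum(G(u))
```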

Testing on artificial data


Simulation setup and goals

■ Setup
  ◆ Three data sets (n = 3) of dimension four (d = 4)
  ◆ No linear correlation in the sources s_{ik}
  ◆ Randomly chosen orthonormal mixing matrices Q_1, Q_2, Q_3
  ◆ 10000 observations
  ◆ Learning of the parameters by maximization of the log-likelihood, with the ad hoc nonlinearity G(u) = −√(0.1 + u)
■ Quantities of interest
  ◆ Error in the mixing matrices
  ◆ Identification of the coupling
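The two quantities of interest can be computed roughly as sketched below (our own choices, not necessarily the paper's exact metrics). Features of this kind are recovered only up to sign and permutation, so we first match estimated to true features with `scipy.optimize.linear_sum_assignment`; the per-data-set matching permutations then indicate whether the coupling was identified correctly.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_features(Q_true, Q_est):
    """Match columns of Q_est to Q_true; return the matching permutation and the
    squared estimation error after correcting signs and permutation."""
    sim = np.abs(Q_true.T @ Q_est)                 # |q_true_k . q_est_l|
    _, cols = linear_sum_assignment(-sim)          # maximize total similarity
    Q_matched = Q_est[:, cols]
    signs = np.sign(np.sum(Q_true * Q_matched, axis=0))
    error = np.sum((Q_true - Q_matched * signs) ** 2)
    return cols, error

# The coupling is identified correctly when all n data sets yield the same
# permutation, i.e. their k-th estimated features really belong together.
```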

Results

[Figure: objective value and squared estimation error (×10^-3) versus the fraction of correctly coupled sources; results for one estimation problem, optimization performed for 20 different initializations]

■ Correct coupling at the maximum of the log-likelihood
■ Presence of local maxima (not nice! but no catastrophe, see the following slides)
■ Learning the right mixing matrices without learning the right coupling seems possible

Application to real data


Setup

■ Data set 1: 10000 image patches of size 25px × 25px, extracted at random locations from natural video data
■ Data set 2: the same image patches, 40 ms later
■ For each data set separately: whitening and dimension reduction (98% of the variance retained)
■ Learning of 50 features per data set by maximization of the log-likelihood of the model (same objective function as for the artificial data)
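The preprocessing step above can be sketched as follows (our own code, applied independently to each patch matrix; the function name and the handling of the 98% threshold are illustrative assumptions).

```python
import numpy as np

def whiten_reduce(X, var_retained=0.98):
    """Whiten an (N, D) patch matrix and keep only the leading principal
    components needed to retain the requested fraction of the variance."""
    Xc = X - X.mean(axis=0)
    eigval, eigvec = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(eigval)[::-1]                  # sort by decreasing variance
    eigval, eigvec = eigval[order], eigvec[:, order]
    cum = np.cumsum(eigval) / np.sum(eigval)
    k = int(np.searchsorted(cum, var_retained)) + 1   # smallest k reaching 98%
    W = eigvec[:, :k] / np.sqrt(eigval[:k])           # whitening + reduction matrix
    return Xc @ W, W                                  # (N, k) whitened data
```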

Learned features

[Figure: features learned by our method and by canonical correlation analysis, for the first and the second data set]


Summary

Presented a new method to find related features (structure) in multiple data sets:
1. The features for each data set are maximally statistically independent.
2. The features across the data sets tend to be jointly activated: they have statistically dependent variances.
3. Multiple data sets can be analyzed.

In the paper:
◆ more theory (in particular, more on the relation to CCA)
◆ more simulations with natural images
◆ simulations with brain imaging data
