Introduction to Kernel Methods

Fabio A. González, Ph.D.
Depto. de Ing. de Sistemas e Industrial
Universidad Nacional de Colombia, Bogotá

September 30, 2010

Outline

1 Introduction
    Motivation
2 The Kernel Trick
    Mapping the input space to the feature space
    Calculating the dot product in the feature space
3 The Kernel Approach to Machine Learning
4 A Kernel Pattern Analysis Algorithm
    Primal linear regression
    Dual linear regression
5 Kernel Functions
    Mathematical characterisation
    Visualizing kernels in input space
6 Kernel Algorithms
7 Kernels in Complex Structured Data


Problem 1

• How to separate these two classes using a linear function?

Problem 2

• How to do symbolic regression?

\[
\Sigma = \{A, C, G, T\}, \qquad f : \Sigma^d \to \mathbb{R}
\]

\[
\begin{aligned}
\text{ACGTA} &\mapsto 10.0 \\
\text{GTCCA} &\mapsto 11.3 \\
\text{GGTAC} &\mapsto 1.0 \\
\text{CCTGA} &\mapsto 4.5 \\
&\;\;\vdots
\end{aligned}
\]


Problem 1

• How to separate these two classes using a linear function?

Solution

• Map to $\mathbb{R}^3$:

\[
\phi : \mathbb{R}^2 \to \mathbb{R}^3, \qquad (x, y) \mapsto (x^2, y^2, xy)
\]

Input space vs. feature space


Dot product in the feature space

\[
\phi : \mathbb{R}^2 \to \mathbb{R}^3, \qquad (x_1, x_2) \mapsto (x_1^2,\; x_2^2,\; \sqrt{2}\, x_1 x_2)
\]

\[
\begin{aligned}
\langle \phi(x), \phi(z) \rangle
  &= \left\langle (x_1^2,\; x_2^2,\; \sqrt{2}\, x_1 x_2),\; (z_1^2,\; z_2^2,\; \sqrt{2}\, z_1 z_2) \right\rangle \\
  &= x_1^2 z_1^2 + x_2^2 z_2^2 + 2 x_1 x_2 z_1 z_2 \\
  &= (x_1 z_1 + x_2 z_2)^2 \\
  &= \langle x, z \rangle^2
\end{aligned}
\]

• A function $k : X \times X \to \mathbb{R}$ such that $k(x, z) = \langle \phi(x), \phi(z) \rangle$ is called a kernel
• Moral: you don't need to apply $\phi$ explicitly to calculate the dot product in the feature space!
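The identity above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names `phi` and `k` and the two test points are illustrative choices, not part of the original slides.

```python
import numpy as np

def phi(x):
    """Explicit feature map R^2 -> R^3: (x1, x2) -> (x1^2, x2^2, sqrt(2) x1 x2)."""
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2.0) * x1 * x2])

def k(x, z):
    """Kernel k(x, z) = <x, z>^2, evaluated entirely in the input space."""
    return float(np.dot(x, z)) ** 2

# Two arbitrary points in the input space (illustrative values).
x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = float(np.dot(phi(x), phi(z)))  # dot product computed in the feature space
implicit = k(x, z)                        # same quantity without ever applying phi

print(explicit, implicit)
```

Both values agree up to floating-point rounding, which is the point of the kernel trick: the feature-space dot product is obtained from input-space quantities alone.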


Kernel induced feature space

• The feature space induced by the kernel is not unique: the kernel $k(x, z) = \langle x, z \rangle^2$ also calculates the dot product in the four-dimensional feature space

\[
\phi : \mathbb{R}^2 \to \mathbb{R}^4, \qquad (x_1, x_2) \mapsto (x_1^2,\; x_2^2,\; x_1 x_2,\; x_2 x_1)
\]

• The example can be generalised to $\mathbb{R}^n$
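One way to read the generalisation to $\mathbb{R}^n$ is to take every ordered product $x_i x_j$ as a feature, giving an $n^2$-dimensional feature space. The sketch below assumes this particular feature map (the name `phi_all_pairs` is hypothetical) and checks that it reproduces $\langle x, z \rangle^2$; for $n = 2$ it yields the same four coordinates as above, in a different order.

```python
import numpy as np

def phi_all_pairs(x):
    """Feature map R^n -> R^(n*n) listing every ordered product x_i * x_j."""
    return np.outer(x, x).ravel()

rng = np.random.default_rng(0)
x = rng.normal(size=5)
z = rng.normal(size=5)

lhs = float(phi_all_pairs(x) @ phi_all_pairs(z))  # dot product in the n^2-dimensional space
rhs = float(x @ z) ** 2                            # kernel evaluated in the input space

print(lhs, rhs)
```

The explicit map costs on the order of $n^2$ operations per point, while the kernel evaluation costs on the order of $n$.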


The Process

The Approach

• Data items are embedded into a vector space called the feature space
• Linear relations are sought among the images of the data items in the feature space
• The pattern analysis algorithms are based only on the pairwise dot products; they do not need the actual coordinates of the embedded points
• The pairwise dot products in the feature space can be efficiently calculated using a kernel function
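As an illustration of an algorithm that needs only pairwise dot products, squared distances between embedded points follow from the kernel matrix alone, since $\|\phi(x) - \phi(z)\|^2 = k(x, x) - 2k(x, z) + k(z, z)$. The sketch below assumes the degree-2 kernel $k(x, z) = \langle x, z \rangle^2$ and three hand-picked points; the helper names are illustrative.

```python
import numpy as np

def poly2_kernel(x, z):
    """k(x, z) = <x, z>^2."""
    return float(np.dot(x, z)) ** 2

def gram_matrix(points, kernel):
    """Matrix of all pairwise kernel evaluations K[i, j] = kernel(x_i, x_j)."""
    l = len(points)
    K = np.empty((l, l))
    for i in range(l):
        for j in range(l):
            K[i, j] = kernel(points[i], points[j])
    return K

def feature_space_sq_distances(K):
    """||phi(x_i) - phi(x_j)||^2 = K_ii - 2 K_ij + K_jj, using only K."""
    d = np.diag(K)
    return d[:, None] - 2.0 * K + d[None, :]

points = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
K = gram_matrix(points, poly2_kernel)
D2 = feature_space_sq_distances(K)

print(K)
print(D2)
```

The coordinates of the embedded points never appear: everything downstream of `gram_matrix` works from `K`.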


Problem definition

• Given a training set $S = \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_l, y_l)\}$ of points $\mathbf{x}_i \in \mathbb{R}^n$ with corresponding labels $y_i \in \mathbb{R}$, the problem is to find a real-valued linear function that best interpolates the training set:

\[
g(\mathbf{x}) = \langle \mathbf{w}, \mathbf{x} \rangle = \mathbf{w}'\mathbf{x} = \sum_{i=1}^{n} w_i x_i
\]

• If the data points were generated by a function like $g(\mathbf{x})$, it is possible to find the parameters $\mathbf{w}$ by solving

\[
\mathbf{X}\mathbf{w} = \mathbf{y}, \qquad \text{where} \quad
\mathbf{X} = \begin{pmatrix} \mathbf{x}_1' \\ \vdots \\ \mathbf{x}_l' \end{pmatrix}
\]


Graphical representation

Loss function

• Minimize

\[
L(g, S) = L(\mathbf{w}, S) = \sum_{i=1}^{l} (y_i - g(\mathbf{x}_i))^2 = \sum_{i=1}^{l} \xi_i^2 = \sum_{i=1}^{l} L(g, (\mathbf{x}_i, y_i))
\]

• This could be written as

\[
L(\mathbf{w}, S) = \|\boldsymbol{\xi}\|^2 = (\mathbf{y} - \mathbf{X}\mathbf{w})'(\mathbf{y} - \mathbf{X}\mathbf{w})
\]
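As a worked step (a standard expansion, not taken from the slides), multiplying out the quadratic form makes the gradient with respect to $\mathbf{w}$ easy to read off:

\[
\begin{aligned}
L(\mathbf{w}, S) &= (\mathbf{y} - \mathbf{X}\mathbf{w})'(\mathbf{y} - \mathbf{X}\mathbf{w}) \\
                 &= \mathbf{y}'\mathbf{y} - 2\,\mathbf{w}'\mathbf{X}'\mathbf{y} + \mathbf{w}'\mathbf{X}'\mathbf{X}\mathbf{w}
\end{aligned}
\]

Differentiating term by term, the first term does not depend on $\mathbf{w}$, the second contributes $-2\mathbf{X}'\mathbf{y}$, and the third contributes $2\mathbf{X}'\mathbf{X}\mathbf{w}$ because $\mathbf{X}'\mathbf{X}$ is symmetric.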


Solution

\[
\frac{\partial L(\mathbf{w}, S)}{\partial \mathbf{w}} = -2\mathbf{X}'\mathbf{y} + 2\mathbf{X}'\mathbf{X}\mathbf{w} = \mathbf{0},
\]

therefore

\[
\mathbf{X}'\mathbf{X}\mathbf{w} = \mathbf{X}'\mathbf{y},
\]

and

\[
\mathbf{w} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}
\]
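The closed-form primal solution can be sketched in a few lines of NumPy. The data below are synthetic and noise-free, and `w_true` is an arbitrary illustrative choice; the sketch also assumes $\mathbf{X}'\mathbf{X}$ is invertible, which holds here because there are many more samples than dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
l, n = 50, 3                        # l samples, n input dimensions
X = rng.normal(size=(l, n))         # rows of X are the training points x_i'
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true                      # labels generated by a linear function g(x) = <w, x>

# Solve the normal equations X'X w = X'y (preferred over forming the inverse explicitly).
w = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least-squares routine.
w_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

print(w, w_lstsq)
```

Using `np.linalg.solve` on the normal equations mirrors the formula on the slide; in practice a least-squares routine is usually the numerically safer choice.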


Dual representation of the problem

• $\mathbf{w} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-2}\mathbf{X}'\mathbf{y} = \mathbf{X}'\boldsymbol{\alpha}$
• So, $\mathbf{w}$ is a linear combination of the training samples, $\mathbf{w} = \sum_{i=1}^{l} \alpha_i \mathbf{x}_i$.
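Reading off $\boldsymbol{\alpha} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-2}\mathbf{X}'\mathbf{y}$ from the identity above, the sketch below checks on random data that $\mathbf{X}'\boldsymbol{\alpha}$ reproduces the primal $\mathbf{w}$. It assumes $\mathbf{X}'\mathbf{X}$ is invertible (more samples than dimensions); the data and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
l, n = 20, 4
X = rng.normal(size=(l, n))
y = rng.normal(size=l)

XtX = X.T @ X
w = np.linalg.solve(XtX, X.T @ y)   # primal solution w = (X'X)^{-1} X'y

# alpha = X (X'X)^{-2} X'y, computed with two successive linear solves.
alpha = X @ np.linalg.solve(XtX, np.linalg.solve(XtX, X.T @ y))

# w expressed as a linear combination of the training samples: w = X' alpha.
w_from_alpha = X.T @ alpha

print(np.max(np.abs(w - w_from_alpha)))
```

There is one coefficient $\alpha_i$ per training sample, so `alpha` has length $l$ while `w` has length $n$.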


Solution

• From the solution of the primal problem:

\[
\mathbf{X}'\mathbf{X}\mathbf{w} = \mathbf{X}'\mathbf{y},
\]

• then

\[
\mathbf{X}\mathbf{X}'\mathbf{X}\mathbf{w} = \mathbf{X}\mathbf{X}'\mathbf{y},
\]

• using the dual representation

\[
\mathbf{X}\mathbf{X}'\mathbf{X}\mathbf{X}'\boldsymbol{\alpha} = \mathbf{X}\mathbf{X}'\mathbf{y},
\]

• then

\[
\boldsymbol{\alpha} = (\mathbf{X}\mathbf{X}')^{-1}\mathbf{y},
\]

• and

\[
g(\mathbf{x}) = \mathbf{w}'\mathbf{x} = \boldsymbol{\alpha}'\mathbf{X}\mathbf{x}.
\]

• Note: $\mathbf{X}\mathbf{X}'$ may be close to singular, or singular to machine precision.
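The dual formula and the note about singularity can both be illustrated with a small sketch. It assumes random Gaussian data, for which $\mathbf{X}\mathbf{X}'$ is generically invertible when there are no more samples than dimensions, and shows that $\mathbf{X}\mathbf{X}'$ is rank-deficient as soon as there are more samples than dimensions. All sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Fewer samples than dimensions: the l x l matrix XX' is generically invertible.
l, n = 3, 5
X = rng.normal(size=(l, n))
y = rng.normal(size=l)

G = X @ X.T
alpha = np.linalg.solve(G, y)          # alpha = (XX')^{-1} y
w = X.T @ alpha                        # w = X' alpha

x_new = rng.normal(size=n)
g_primal = float(w @ x_new)            # g(x) = w'x
g_dual = float(alpha @ (X @ x_new))    # g(x) = alpha' X x

# More samples than dimensions: XX' has rank at most n < l, hence it is singular.
X_tall = rng.normal(size=(6, 2))
rank_tall = int(np.linalg.matrix_rank(X_tall @ X_tall.T))

print(g_primal, g_dual, rank_tall)
```

In the second case a direct inverse of $\mathbf{X}\mathbf{X}'$ does not exist, which is one motivation for adding a regularisation term.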


Ridge regression

• If $\mathbf{X}\mathbf{X}'$ is singular, the pseudo-inverse could be used to find the $\mathbf{w}$ that satisfies $\mathbf{X}'\mathbf{X}\mathbf{w} = \mathbf{X}'\mathbf{y}$ with minimal norm.
• Optimisation problem:

\[
\min_{\mathbf{w}} L_\lambda(\mathbf{w}, S) = \min_{\mathbf{w}} \; \lambda \|\mathbf{w}\|^2 + \sum_{i=1}^{l} (y_i - g(\mathbf{x}_i))^2,
\]

where $\lambda$ defines the trade-off between norm and loss. This controls the complexity of the model (the process is called regularization).


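Solution

• Taking the derivative and making it equal to zero:

\[
\mathbf{X}'\mathbf{X}\mathbf{w} + \lambda\mathbf{w} = (\mathbf{X}'\mathbf{X} + \lambda\mathbf{I}_n)\mathbf{w} = \mathbf{X}'\mathbf{y},
\]

where $\mathbf{I}_n$ is the $n \times n$ identity matrix,

• then,

\[
\mathbf{w} = (\mathbf{X}'\mathbf{X} + \lambda\mathbf{I}_n)^{-1}\mathbf{X}'\mathbf{y}.
\]

• In terms of $\boldsymbol{\alpha}$:

\[
\mathbf{w} = \lambda^{-1}\mathbf{X}'(\mathbf{y} - \mathbf{X}\mathbf{w}) = \mathbf{X}'\boldsymbol{\alpha},
\]

• then, substituting $\mathbf{w} = \mathbf{X}'\boldsymbol{\alpha}$ so that $\lambda\boldsymbol{\alpha} = \mathbf{y} - \mathbf{X}\mathbf{X}'\boldsymbol{\alpha}$,

\[
\boldsymbol{\alpha} = \lambda^{-1}(\mathbf{y} - \mathbf{X}\mathbf{w}) = (\mathbf{X}\mathbf{X}' + \lambda\mathbf{I}_l)^{-1}\mathbf{y}.
\]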
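The primal and dual ridge solutions can be compared numerically. The following is a minimal sketch on random data; the value of `lam` (standing for $\lambda$) is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(3)
l, n = 30, 4
X = rng.normal(size=(l, n))
y = rng.normal(size=l)
lam = 0.1  # regularisation parameter lambda (illustrative value)

# Primal: w = (X'X + lambda I_n)^{-1} X'y, an n x n system.
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

# Dual: alpha = (XX' + lambda I_l)^{-1} y, an l x l system, then w = X' alpha.
alpha = np.linalg.solve(X @ X.T + lam * np.eye(l), y)
w_dual = X.T @ alpha

# alpha should also equal the scaled residual lambda^{-1} (y - Xw).
residual_form = (y - X @ w_primal) / lam

print(np.max(np.abs(w_primal - w_dual)))
```

The primal form solves an $n \times n$ system and the dual form an $l \times l$ system, so which one is cheaper depends on whether there are more dimensions or more samples. With $\lambda > 0$ both systems are invertible.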


Prediction function

\[
g(\mathbf{x}) = \langle \mathbf{w}, \mathbf{x} \rangle
= \left\langle \sum_{i=1}^{l} \alpha_i \mathbf{x}_i,\; \mathbf{x} \right\rangle
= \sum_{i=1}^{l} \alpha_i \langle \mathbf{x}_i, \mathbf{x} \rangle
\]

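Ridge regression as a kernel method

• The Gram matrix $\mathbf{G} = \mathbf{X}\mathbf{X}'$ is the matrix of dot products

\[
\mathbf{G} = \mathbf{X}\mathbf{X}' =
\begin{pmatrix} \mathbf{x}_1' \\ \vdots \\ \mathbf{x}_l' \end{pmatrix}
\begin{pmatrix} \mathbf{x}_1 & \cdots & \mathbf{x}_l \end{pmatrix}
=
\begin{pmatrix}
\langle \mathbf{x}_1, \mathbf{x}_1 \rangle & \cdots & \langle \mathbf{x}_1, \mathbf{x}_l \rangle \\
\vdots & \ddots & \vdots \\
\langle \mathbf{x}_l, \mathbf{x}_1 \rangle & \cdots & \langle \mathbf{x}_l, \mathbf{x}_l \rangle
\end{pmatrix}
\]

• $\mathbf{G}$ may be replaced by a general kernel matrix, $\mathbf{K}$, with $k_{ij} = k(\mathbf{x}_i, \mathbf{x}_j) = \langle \phi(\mathbf{x}_i), \phi(\mathbf{x}_j) \rangle$
• The $\alpha$'s are calculated as:

\[
\boldsymbol{\alpha} = (\mathbf{K} + \lambda\mathbf{I}_l)^{-1}\mathbf{y}
\]

• The prediction function is then computed as:

\[
g(\mathbf{x}) = \sum_{i=1}^{l} \alpha_i k(\mathbf{x}, \mathbf{x}_i)
= \mathbf{y}'(\mathbf{K} + \lambda\mathbf{I}_l)^{-1}
\begin{pmatrix} k(\mathbf{x}, \mathbf{x}_1) \\ \vdots \\ k(\mathbf{x}, \mathbf{x}_l) \end{pmatrix}
\]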

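The kernel form of ridge regression, with $\boldsymbol{\alpha} = (\mathbf{K} + \lambda\mathbf{I}_l)^{-1}\mathbf{y}$ and $g(\mathbf{x}) = \sum_i \alpha_i k(\mathbf{x}, \mathbf{x}_i)$, can be sketched as follows. The choice of a Gaussian kernel, the sine-curve data, and the values of `lam` and `sigma` are assumptions made for illustration; the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel_matrix(A, B, sigma):
    """K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2)) for the rows a_i of A and b_j of B."""
    sq = (A**2).sum(axis=1)[:, None] - 2.0 * A @ B.T + (B**2).sum(axis=1)[None, :]
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def fit_kernel_ridge(X, y, lam, sigma):
    """Dual coefficients alpha = (K + lambda I_l)^{-1} y."""
    K = gaussian_kernel_matrix(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict_kernel_ridge(X_train, alpha, X_new, sigma):
    """g(x) = sum_i alpha_i k(x, x_i), evaluated for every row x of X_new."""
    return gaussian_kernel_matrix(X_new, X_train, sigma) @ alpha

# Noise-free samples of a non-linear target (illustrative data).
X_train = np.linspace(0.0, 2.0 * np.pi, 40)[:, None]
y_train = np.sin(X_train[:, 0])

alpha = fit_kernel_ridge(X_train, y_train, lam=1e-3, sigma=1.0)

X_test = np.array([[1.0], [2.5], [4.0]])
y_pred = predict_kernel_ridge(X_train, alpha, X_test, sigma=1.0)

print(y_pred)
print(np.sin(X_test[:, 0]))
```

The model is linear in the feature space induced by the kernel, yet it fits a non-linear function of the input. Only kernel evaluations are needed for both training and prediction.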


Characterisation

F. Gonz´ alez Introduction The Kernel Trick The Kernel Approach to Machine Learning A Kernel Pattern Analysis Algorithm Kernel Functions Mathematical characterisation Visualizing kernels in input space

Kernel Algorithms Kernels in Complex Structured Data

Theorem (Mercer’s Theorem) A function k : X × X → R, which is either continuous or has a countable domain, can be decomposed k (x, z) = hφ(x), φ(z)i into a feature map φ into a Hilbert space F applied to both its arguments followed by the evaluation of the inner product in F if and only if it satisfies the finitely positive semi-definite property.
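The finitely positive semi-definite property requires that, for every finite sample $x_1, \dots, x_l$, the kernel matrix $K$ is symmetric and satisfies $v'Kv \ge 0$ for all vectors $v$. The sketch below illustrates the property numerically on a small sample. The helper names and sample points are assumptions made for illustration, and a numerical check of this kind can only exhibit a counterexample; it cannot prove that a function is a kernel.

```python
import math
import random

def gram(kernel, points):
    """Kernel matrix K with K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]

def quadratic_form(K, v):
    """Compute v' K v."""
    n = len(v)
    return sum(v[i] * K[i][j] * v[j] for i in range(n) for j in range(n))

def gaussian(x, z, sigma=1.0):
    """Gaussian kernel on real numbers."""
    return math.exp(-(x - z) ** 2 / (2.0 * sigma ** 2))

def neg_sq_dist(x, z):
    """A symmetric similarity score that is NOT a kernel."""
    return -(x - z) ** 2

points = [0.0, 0.5, 1.0, 2.5]   # hypothetical sample
rng = random.Random(0)
K = gram(gaussian, points)
min_form = min(
    quadratic_form(K, [rng.uniform(-1, 1) for _ in points]) for _ in range(1000)
)
print(min_form >= -1e-12)   # no negative quadratic form found for the Gaussian kernel

# One vector with a negative quadratic form is enough to rule a function out.
K_bad = gram(neg_sq_dist, [0.0, 1.0])
print(quadratic_form(K_bad, [1.0, 1.0]))   # -2.0
```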

Some kernel functions

Assume $k_1$ and $k_2$ are kernels. Then each of the following is also a kernel:
• $k(x, z) = p(k_1(x, z))$, where $p$ is a polynomial with positive coefficients.
• $k(x, z) = \exp(k_1(x, z))$.
• $k(x, z) = \exp(-\|x - z\|^2 / (2\sigma^2))$. Gaussian kernel.
• $k(x, z) = k_1(x, z)\, k_2(x, z)$
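These closure properties translate directly into higher-order functions that build new kernels from existing ones. The following is a minimal sketch; all function names and the example vectors are assumptions chosen for illustration, with the linear kernel playing the role of $k_1$.

```python
import math

def linear(x, z):
    """Base kernel k1(x, z) = <x, z>."""
    return sum(a * b for a, b in zip(x, z))

def polynomial_of(k1, coeffs):
    """k(x, z) = p(k1(x, z)) with p(t) = sum_i coeffs[i] * t**i."""
    if any(c < 0 for c in coeffs):
        raise ValueError("negative coefficients do not preserve the kernel property")
    return lambda x, z: sum(c * k1(x, z) ** i for i, c in enumerate(coeffs))

def exp_of(k1):
    """k(x, z) = exp(k1(x, z))."""
    return lambda x, z: math.exp(k1(x, z))

def product_of(k1, k2):
    """k(x, z) = k1(x, z) * k2(x, z)."""
    return lambda x, z: k1(x, z) * k2(x, z)

def gaussian(sigma):
    """k(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    def k(x, z):
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
        return math.exp(-sq_dist / (2.0 * sigma ** 2))
    return k

x, z = (1.0, 2.0), (3.0, 4.0)
poly = polynomial_of(linear, [1.0, 2.0, 1.0])   # p(t) = 1 + 2t + t^2
print(linear(x, z))                     # 11.0
print(poly(x, z))                       # 144.0
print(product_of(linear, poly)(x, z))   # 1584.0
print(gaussian(1.0)(x, x))              # 1.0
```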


Embeddings corresponding to kernels

• It is possible to calculate the feature space induced by a kernel (Mercer's Theorem)
• This can be done in a constructive way
• The feature space can even be of infinite dimension.
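A concrete finite-dimensional case helps here. For the quadratic kernel $k(x, z) = \langle x, z \rangle^2$ on $\mathbb{R}^2$, one valid feature map is $\phi(x) = (x_1^2,\ x_2^2,\ \sqrt{2}\, x_1 x_2)$, since expanding $(x_1 z_1 + x_2 z_2)^2$ gives exactly $\langle \phi(x), \phi(z) \rangle$. The sketch below checks this numerically; the function names and test vectors are assumptions, and the feature map shown is one choice among several equivalent ones.

```python
import math

def quadratic_kernel(x, z):
    """k(x, z) = <x, z>^2 for x, z in R^2."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    """Explicit feature map into R^3 for the quadratic kernel on R^2."""
    return (x[0] ** 2, x[1] ** 2, math.sqrt(2.0) * x[0] * x[1])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, z = (1.0, 2.0), (3.0, -1.0)
print(quadratic_kernel(x, z))   # 1.0
print(dot(phi(x), phi(z)))      # same value, up to floating-point rounding
```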


How to visualize?

• Choose a point in input space, $p_0$
• Calculate the distance from another point $x$ to $p_0$ in the feature space:

$$
\begin{aligned}
\|\phi(p_0) - \phi(x)\|_F^2
&= \langle \phi(p_0) - \phi(x),\ \phi(p_0) - \phi(x) \rangle_F \\
&= \langle \phi(p_0), \phi(p_0) \rangle_F + \langle \phi(x), \phi(x) \rangle_F - 2 \langle \phi(p_0), \phi(x) \rangle_F \\
&= k(p_0, p_0) + k(x, x) - 2 k(p_0, x)
\end{aligned}
$$

• Plot $f(x) = \|\phi(p_0) - \phi(x)\|_F^2$
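The last line of the derivation means $f$ can be evaluated with kernel calls only. A minimal sketch follows, assuming a 2-D input space, a reference point $p_0 = (1, 0)$, and a coarse grid; these choices and the function names are illustrative, and the resulting values could be handed to any plotting library.

```python
import math

def linear(x, z):
    return sum(a * b for a, b in zip(x, z))

def gaussian(x, z, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def feature_distance_sq(kernel, p0, x):
    """||phi(p0) - phi(x)||^2 = k(p0, p0) + k(x, x) - 2 k(p0, x)."""
    return kernel(p0, p0) + kernel(x, x) - 2.0 * kernel(p0, x)

p0 = (1.0, 0.0)   # hypothetical reference point

# With the linear kernel this is the ordinary squared Euclidean distance.
print(feature_distance_sq(linear, p0, (0.0, 2.0)))   # 5.0

# With the Gaussian kernel the squared distance is bounded above by 2.
grid = [(i * 0.5, j * 0.5) for i in range(-4, 5) for j in range(-4, 5)]
values = [feature_distance_sq(gaussian, p0, x) for x in grid]
print(min(values), max(values) < 2.0)
```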


Identity kernel

$$
k(x, z) = \langle x, z \rangle
$$

Quadratic kernel (1)

$$
k(x, z) = \langle x, z \rangle^2
$$

Quadratic kernel (2)

$$
k(x, z) = \langle x, z \rangle^2
$$

Gaussian kernel

$$
k(x, z) = e^{-\frac{\|x - z\|^2}{2\sigma^2}}
$$


Basic computations in feature space

• Means
• Distances
• Projections
• Covariance


Classification and regression

• Support Vector Machines
• Support Vector Regression
• Kernel Fisher Discriminant
• Kernel Perceptron


Dimensionality reduction and clustering

• Kernel PCA
• Kernel CCA
• Kernel k-means
• Kernel SOM


Kernels in complex structured data

• Since kernel methods do not require an attribute-based representation of objects, it is possible to perform learning over complex structured data (or unstructured data)
• We only need to define a dot product operation (similarity, dissimilarity measure)
• Examples:
  • Strings
  • Texts
  • Trees
  • Graphs


Problem 2

How to do symbolic regression?

$$
\Sigma = \{A, C, G, T\}
$$

$$
\begin{array}{rcl}
f : \Sigma^d & \to & \mathbb{R} \\
\text{ACGTA} & \mapsto & 10.0 \\
\text{GTCCA} & \mapsto & 11.3 \\
\text{GGTAC} & \mapsto & 1.0 \\
\text{CCTGA} & \mapsto & 4.5 \\
\vdots & & \vdots
\end{array}
$$

Solution

• Define a kernel on strings: $k : \Sigma^d \times \Sigma^d \to \mathbb{R}$
• Use the kernel along with a kernel-based regression algorithm to find the regression function
• What is a good candidate for $k$?
  • a function that measures string similarity
  • higher value for similar strings, smaller value for different strings
• Candidate:

$$
k(s_1 \dots s_d,\ t_1 \dots t_d) = \sum_{i=1}^{d} \mathrm{equal}(s_i, t_i),
\qquad
\mathrm{equal}(s_i, t_i) =
\begin{cases}
1 & \text{if } s_i = t_i \\
0 & \text{otherwise}
\end{cases}
$$

• $k(\text{ACTAG}, \text{CCTCG}) = ?$
• Is it a kernel?
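The candidate kernel simply counts the positions at which two equal-length strings agree. A minimal Python sketch follows; the function names are assumptions, and the length check is an added safeguard rather than part of the definition on the slide.

```python
def equal(s_i, t_i):
    """1 if the two symbols match, 0 otherwise."""
    return 1 if s_i == t_i else 0

def string_kernel(s, t):
    """k(s, t) = number of positions i with s_i == t_i (strings of equal length d)."""
    if len(s) != len(t):
        raise ValueError("both strings must have the same length d")
    return sum(equal(a, b) for a, b in zip(s, t))

print(string_kernel("ACTAG", "CCTCG"))   # 3 (positions 2, 3 and 5 match)

# Kernel matrix for the four example strings of Problem 2.
strings = ["ACGTA", "GTCCA", "GGTAC", "CCTGA"]
K = [[string_kernel(s, t) for t in strings] for s in strings]
for row in K:
    print(row)
```

A matrix such as `K` is all a kernel regression algorithm needs from the strings; no attribute vector is ever built.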


Induced Feature Space

• What is the feature space induced by $k$?
• The feature map is

$$
\begin{aligned}
\phi : \Sigma^d &\to \mathbb{R}^{4d} \\
s_1 \dots s_d &\mapsto (x_{11}, \dots, x_{41},\ x_{12}, \dots, x_{42},\ \dots,\ x_{1d}, \dots, x_{4d})
\end{aligned}
$$

$$
(x_{1j}, \dots, x_{4j}) =
\begin{cases}
(1, 0, 0, 0) & \text{if } s_j = \text{'A'} \\
(0, 1, 0, 0) & \text{if } s_j = \text{'C'} \\
(0, 0, 1, 0) & \text{if } s_j = \text{'G'} \\
(0, 0, 0, 1) & \text{if } s_j = \text{'T'}
\end{cases}
$$
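This feature map is a position-wise one-hot encoding, and the string kernel is the ordinary dot product of the encoded vectors, which is why it is a valid kernel. The sketch below checks the identity on one pair of strings; the names `ALPHABET`, `phi` and `string_kernel` are assumptions made for illustration.

```python
ALPHABET = "ACGT"   # symbol order fixes the order of the 4 coordinates per position

def phi(s):
    """Map a string of length d over {A, C, G, T} to a 0/1 vector of length 4d."""
    features = []
    for symbol in s:
        block = [0] * len(ALPHABET)
        block[ALPHABET.index(symbol)] = 1   # raises ValueError for unknown symbols
        features.extend(block)
    return features

def string_kernel(s, t):
    """Number of positions at which s and t carry the same symbol."""
    return sum(1 for a, b in zip(s, t) if a == b)

s, t = "ACTAG", "CCTCG"
dot = sum(a * b for a, b in zip(phi(s), phi(t)))
print(len(phi(s)))                 # 20, i.e. 4d with d = 5
print(dot, string_kernel(s, t))    # 3 3
```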


References

Shawe-Taylor, J. and Cristianini, N. (2004). Kernel Methods for Pattern Analysis. Cambridge University Press.
