Support Vector Machines

Guide : Prof. P Bhattacharya

Oct. 2004

Motivation

Drawbacks of neural networks that motivate SVMs:

– Getting stuck at local minima is a problem
– Considering only training error may result in overfitting
– Training takes a long time
– Finding the optimal number of hidden units is a problem



Perceptron Training Algorithm

Let $w \leftarrow 0$, $b \leftarrow 0$.
Repeat: for each training example $(x_i, y_i)$,
  if $y_i(\langle w, x_i \rangle + b) \le 0$ then
    $w \leftarrow w + \eta\, y_i x_i$ and $b \leftarrow b + \eta\, y_i$
Until some stopping criterion is met,
where $\eta > 0$ is the learning rate.

So the final $w$ is a linear combination of the training points: $w = \sum_i \alpha_i y_i x_i$, where $\alpha_i$ is proportional to the number of updates made on $x_i$.
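A minimal sketch of this training loop in Python (the toy data set and the function name are illustrative assumptions, not part of the original slides):

```python
import numpy as np

# Primal perceptron: update w and b whenever a point is misclassified.
def perceptron_train(X, y, eta=1.0, max_epochs=100):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:   # misclassified (or on boundary)
                w += eta * yi * xi
                b += eta * yi
                mistakes += 1
        if mistakes == 0:                       # converged on separable data
            break
    return w, b

# Toy linearly separable data with labels in {+1, -1}
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = perceptron_train(X, y)
print(np.sign(X @ w + b))  # should reproduce y
```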



Dual Representation

The decision function can be written as
$f(x) = \operatorname{sgn}\big(\sum_i \alpha_i y_i \langle x_i, x \rangle + b\big)$

The update rule can now be written as: if $y_j \big(\sum_i \alpha_i y_i \langle x_i, x_j \rangle + b\big) \le 0$ then $\alpha_j \leftarrow \alpha_j + 1$

$\alpha_i$ represents the amount of information provided by point $x_i$
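The same loop in its dual form, as a sketch: only the counters $\alpha_i$ and the matrix of pairwise dot products are needed (names are illustrative):

```python
import numpy as np

# Dual perceptron: w is never stored explicitly; it is implicitly
# w = sum_i alpha_i * y_i * x_i, and alpha_j counts the updates on x_j.
def dual_perceptron_train(X, y, max_epochs=100):
    n = len(X)
    alpha, b = np.zeros(n), 0.0
    gram = X @ X.T                      # all pairwise dot products <x_i, x_j>
    for _ in range(max_epochs):
        mistakes = 0
        for j in range(n):
            if y[j] * (np.sum(alpha * y * gram[:, j]) + b) <= 0:
                alpha[j] += 1.0         # point j contributed one more update
                b += y[j]
                mistakes += 1
        if mistakes == 0:
            break
    return alpha, b
```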



Margin

[Figure: the same data separated by two hyperplanes, (a) with a smaller margin and (b) with a larger margin.]

Better generalization is expected in case of fig. (b), where the margin is larger.



Maximum Margin Classifier

The equation of a hyperplane is $\langle w, x \rangle + b = 0$.

The geometric distance of a point $x_i$ from the hyperplane is $\dfrac{y_i(\langle w, x_i \rangle + b)}{\|w\|}$.

If for the closest point $y_i(\langle w, x_i \rangle + b) = 1$, then the margin is $\dfrac{1}{\|w\|}$.

Minimizing $\|w\|$ would therefore increase the geometric margin.
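As a quick numeric illustration of this definition (the hyperplane and point below are arbitrary choices):

```python
import numpy as np

# Geometric distance of a labeled point from the hyperplane <w, x> + b = 0:
# y * (<w, x> + b) / ||w||  (positive iff the point is correctly classified)
def geometric_margin(w, b, x, y):
    return y * (np.dot(w, x) + b) / np.linalg.norm(w)

w, b = np.array([2.0, 1.0]), -1.0
print(geometric_margin(w, b, np.array([1.0, 1.0]), +1))  # ~0.894
```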



Maximum Margin Classifier (Contd.)

Quadratic Programming:

Minimize $\frac{1}{2}\|w\|^2$
subject to $y_i(\langle w, x_i \rangle + b) \ge 1$ for $i = 1, \ldots, l$.

– The optimal separating hyperplane is the one with the maximum margin
– There is a unique optimal hyperplane
– The greater the margin, the better the generalization
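A sketch of this QP solved numerically with scipy (the toy data and the use of a general-purpose solver are assumptions for illustration; dedicated QP solvers are normally used):

```python
import numpy as np
from scipy.optimize import minimize

# Variables packed as z = (w_1, w_2, b); minimize 0.5 * ||w||^2
# subject to y_i * (<w, x_i> + b) - 1 >= 0 for every training point.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

objective = lambda z: 0.5 * np.dot(z[:2], z[:2])
constraints = [{'type': 'ineq',
                'fun': lambda z, xi=xi, yi=yi: yi * (np.dot(z[:2], xi) + z[2]) - 1.0}
               for xi, yi in zip(X, y)]

res = minimize(objective, x0=np.zeros(3), constraints=constraints)
w, b = res.x[:2], res.x[2]
print("w =", w, " b =", b, " margin =", 1.0 / np.linalg.norm(w))
```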


Primal and Dual Lagrangian

The primal Lagrangian is
$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \big[ y_i(\langle w, x_i \rangle + b) - 1 \big]$, where $\alpha_i \ge 0$.

At the optimum, $\partial L / \partial w = 0$ and $\partial L / \partial b = 0$, therefore $w = \sum_i \alpha_i y_i x_i$ and $\sum_i \alpha_i y_i = 0$.

Dual Problem

Maximize $W(\alpha) = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$,
subject to $\alpha_i \ge 0$ and $\sum_i \alpha_i y_i = 0$.
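The same toy problem can be sketched in its dual form (again using a general-purpose solver purely for illustration):

```python
import numpy as np
from scipy.optimize import minimize

# Maximize W(alpha) <=> minimize -W(alpha), subject to alpha_i >= 0
# and the equality constraint sum_i alpha_i * y_i = 0.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
K = X @ X.T                             # Gram matrix <x_i, x_j>

def neg_dual(alpha):
    v = alpha * y
    return -np.sum(alpha) + 0.5 * v @ K @ v

res = minimize(neg_dual, x0=np.zeros(len(y)),
               bounds=[(0.0, None)] * len(y),
               constraints=[{'type': 'eq', 'fun': lambda a: np.dot(a, y)}])
alpha = res.x
w = (alpha * y) @ X                     # recover w = sum_i alpha_i y_i x_i
print("alpha =", np.round(alpha, 3), " w =", w)
```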


At the solution, $\alpha_i \big[ y_i(\langle w, x_i \rangle + b) - 1 \big] = 0$, so $\alpha_i > 0$ only when $y_i(\langle w, x_i \rangle + b) = 1$.

Points with $\alpha_i > 0$ are called support vectors.

– Support vectors are the only points necessary for the decision function
– Support vectors, in a sense, support the decision surface

The decision function is now
$f(x) = \operatorname{sgn}\big(\sum_{i \in SV} \alpha_i y_i \langle x_i, x \rangle + b\big)$.

The bias can be calculated as $b = y_s - \sum_{i \in SV} \alpha_i y_i \langle x_i, x_s \rangle$ for any support vector $(x_s, y_s)$.
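Continuing the toy dual sketch above, support vectors and the bias can be recovered like this (the tolerance 1e-6 and the averaging over support vectors are illustrative choices):

```python
import numpy as np

# Assumes X, y, K and the optimal alpha from the dual sketch above.
sv = alpha > 1e-6                      # support vectors: alpha_i > 0
b_vals = [y[s] - np.sum(alpha[sv] * y[sv] * K[sv, s])
          for s in np.flatnonzero(sv)]
b = np.mean(b_vals)                    # average over SVs for numerical stability
print("support vectors:", np.flatnonzero(sv), " b =", b)
```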



Soft Margin Classifier

Non-linearly separable data, linear decision surface.

Minimize $\frac{1}{2}\|w\|^2 + C \sum_i \xi_i$
subject to $y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$, for $i = 1, \ldots, l$.

– $\xi_i$ indicates the amount of constraint violation by point $x_i$
– $C$ indicates how much we penalize violation of the constraints
– $C$ controls the trade-off between generalization error and training error
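A small sketch of the role of $C$ using scikit-learn's SVC (library availability and the noisy toy data are assumptions):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=40) > 0, 1, -1)

# Small C tolerates margin violations (more support vectors, smoother fit);
# large C penalizes them heavily (lower training error, risk of overfitting).
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    print("C =", C, " support vectors:", len(clf.support_))
```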



Contd.

[Figure: soft margin geometry, showing the weight vector $w$, the slack $\xi$ of a margin-violating point, and the margin.]



Contd.

The corresponding dual is:

Maximize $W(\alpha) = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle$,
subject to $0 \le \alpha_i \le C$ and $\sum_i \alpha_i y_i = 0$.

At the optimum point we have $w = \sum_i \alpha_i y_i x_i$.

The decision function is the same as in the linearly separable case, with the added upper-bound constraint $\alpha_i \le C$.



Non-linear Decision Surfaces

– Map the input space to a high-dimensional feature space
– Introduce a linear decision surface in the high-dimensional space

Mapping: $\phi : X \to F$, $x \mapsto \phi(x)$



XOR example

[Figure: the four XOR points plotted in three dimensions with axes $x_1$, $x_2$, $x_3$.]

Mapping: $(x_1, x_2) \mapsto (x_1, x_2, x_3)$ with $x_3 = x_1 x_2$.

The equation of a hyperbola in the input space is changed to the equation of a plane in the feature space.
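A sketch of this in code (the separating plane below is one hand-picked assumption that works for these four points):

```python
import numpy as np

# The four XOR points are not linearly separable in 2-D, but after adding
# the product coordinate x3 = x1 * x2 they are separable by a plane.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])                      # XOR labels

phi = np.column_stack([X, X[:, 0] * X[:, 1]])     # (x1, x2) -> (x1, x2, x1*x2)

# One separating plane in feature space (hand-picked for illustration):
w, b = np.array([1.0, 1.0, -2.0]), -0.5
print(np.sign(phi @ w + b))                       # matches y
```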



Kernel

– For solving the dual problem, we only need to calculate dot products of vectors in the feature space
– A kernel function does this implicitly!
– This makes the method insensitive to the dimension of the feature space

For example, for $x, z \in \mathbb{R}^2$ the mapping $\phi(x) = (x_1^2, x_2^2, \sqrt{2}\, x_1 x_2)$ gives $\langle \phi(x), \phi(z) \rangle = \langle x, z \rangle^2$.

In general, $K(x, z) = (\langle x, z \rangle + 1)^d$ is known as the polynomial kernel function.
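A quick numeric check of this identity (values arbitrary):

```python
import numpy as np

# For phi(x) = (x1^2, x2^2, sqrt(2)*x1*x2), the feature-space dot product
# <phi(x), phi(z)> equals the implicit kernel value (<x, z>)^2.
def phi(v):
    return np.array([v[0] ** 2, v[1] ** 2, np.sqrt(2) * v[0] * v[1]])

x, z = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(np.dot(phi(x), phi(z)))   # explicit mapping: 16.0
print(np.dot(x, z) ** 2)        # kernel, no mapping needed: 16.0
```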

Several powerful kernel functions exist:

– Gaussian RBF kernel: $K(x, z) = \exp\!\big(-\|x - z\|^2 / 2\sigma^2\big)$
– Two-layer perceptron: $K(x, z) = \tanh\!\big(\kappa \langle x, z \rangle - \theta\big)$

The choice of kernel is made by the user. The decision function is given by
$f(x) = \operatorname{sgn}\big(\sum_i \alpha_i y_i K(x_i, x) + b\big)$.
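Sketches of these kernels in code ($\sigma$, $\kappa$, $\theta$ are user-chosen parameters; the defaults below are arbitrary):

```python
import numpy as np

def rbf_kernel(x, z, sigma=1.0):
    return np.exp(-np.dot(x - z, x - z) / (2.0 * sigma ** 2))

def perceptron_kernel(x, z, kappa=1.0, theta=0.0):
    return np.tanh(kappa * np.dot(x, z) - theta)

x, z = np.array([1.0, 2.0]), np.array([2.0, 0.0])
print(rbf_kernel(x, z), perceptron_kernel(x, z))
```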



Some snapshots

[Several screenshots of SVM classification examples; images not reproduced in this text version.]


Training Algorithm

Training algorithm for fixed bias $b$ and learning rates $\eta_i$:

Given training set $S = \{(x_i, y_i)\}_{i=1}^{l}$, initialize $\alpha \leftarrow 0$.
Repeat
  for i = 1 to l
    $\alpha_i \leftarrow \alpha_i + \eta_i \big( 1 - y_i \sum_j \alpha_j y_j K(x_j, x_i) \big)$
    if $\alpha_i < 0$ then $\alpha_i \leftarrow 0$
  end for
Until some stopping criterion is satisfied
return $\alpha$
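A sketch of such a fixed-bias dual training loop (this reconstruction follows the kernel-adatron style of update; the XOR data, kernel, and parameters are illustrative):

```python
import numpy as np

# Fixed-bias (b = 0) dual training: gradient ascent on the dual objective,
# projecting each alpha_i back to alpha_i >= 0 after its update.
def train_fixed_bias(X, y, kernel, eta=0.1, epochs=200):
    n = len(X)
    alpha = np.zeros(n)
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    for _ in range(epochs):
        for i in range(n):
            alpha[i] += eta * (1.0 - y[i] * np.sum(alpha * y * K[:, i]))
            alpha[i] = max(alpha[i], 0.0)
    return alpha

kernel = lambda x, z: (np.dot(x, z) + 1.0) ** 2   # polynomial kernel, d = 2
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1.0, 1.0, 1.0, -1.0])              # XOR labels
alpha = train_fixed_bias(X, y, kernel)
print(np.sign([np.sum(alpha * y * [kernel(xi, x) for xi in X]) for x in X]))
```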



More about SVMs

– For large data sets, Chunking and Sequential Minimal Optimization (SMO) are used to reduce memory requirements.
– In Chunking, apply the SVM training algorithm to a subset of the data and discard all points except the support vectors. Add points violating the constraints to the support vectors to form a new chunk, and iterate until all points are considered (see the sketch below).
– In SMO, modify two $\alpha_i$ at a time to increase the value of the objective function without violating the constraints.
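A rough sketch of the chunking idea (sklearn's SVC stands in as the inner solver, an assumption for illustration; real chunking re-solves the dual QP directly):

```python
import numpy as np
from sklearn.svm import SVC

def chunked_svm(X, y, chunk_size=200):
    work = np.arange(min(chunk_size, len(X)))        # initial chunk
    rest = np.arange(len(work), len(X))              # not yet considered
    while True:
        clf = SVC(kernel='linear', C=1.0).fit(X[work], y[work])
        keep = work[clf.support_]                    # discard all but the SVs
        if len(rest) == 0:
            return clf
        margins = y[rest] * clf.decision_function(X[rest])
        violators = rest[margins < 1.0]              # constraint-violating points
        if len(violators) == 0:
            return clf                               # all points are consistent
        added = violators[:chunk_size]
        work = np.concatenate([keep, added])         # new chunk: SVs + violators
        rest = np.setdiff1d(rest, added)
```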



Multiclass SVMs

– Approaches: One-against-All, One-against-One, etc.
– One-against-All: use $N$ SVMs, one corresponding to each of the $N$ classes.
– One-against-One: use $N(N-1)/2$ SVMs, one corresponding to each pair of classes.

SVMs can be used for regression also.
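A minimal sketch of One-against-All built from binary SVMs (sklearn's SVC is assumed as the binary classifier; the class name is illustrative):

```python
import numpy as np
from sklearn.svm import SVC

class OneVsAllSVM:
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # one binary SVM per class: class k versus everything else
        self.models_ = [SVC(kernel='rbf').fit(X, np.where(y == k, 1, -1))
                        for k in self.classes_]
        return self

    def predict(self, X):
        # pick the class whose SVM is most confident
        scores = np.column_stack([m.decision_function(X) for m in self.models_])
        return self.classes_[np.argmax(scores, axis=1)]
```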



Application of SVMs

– Text Categorization
– Intrusion Detection
– Image Recognition
– Bioinformatics
– Handwritten Character Recognition



Application of SVMs (Contd.)

For example, in Text Categorization each dimension of the feature space is represented by a stem word:
– "compute" is a stem word with respect to computers, computation, etc.
– Stop words like and, or, the, of, etc. are not considered.
– So a document is represented as a point depending upon which stem words are present and in what number.
– The order of words is insignificant in this case.
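A toy sketch of this representation (the crude suffix-stripping stemmer and the stop-word list below are illustrative assumptions):

```python
import re
from collections import Counter

STOP_WORDS = {"and", "or", "the", "of", "a", "is"}

def crude_stem(word):
    # hypothetical stemmer: strip a few common suffixes
    for suffix in ("ation", "ers", "er", "ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def bag_of_stems(document):
    words = re.findall(r"[a-z]+", document.lower())
    return Counter(crude_stem(w) for w in words if w not in STOP_WORDS)

print(bag_of_stems("Computers and computation: the joy of computing"))
# word order is ignored; only which stems occur, and how often, matters
```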



Real Life Example

US Postal Service: Digit Recognition Problem

– Training data: 7291 examples
– Test data: 2007 examples
– Input space dimensionality: 256 (16 x 16 pixel images)



Classifier                         Raw Error
Human performance                  2.5 %
Decision tree                      16.2 %
Best two-layer neural network      5.9 %
5-layer neural network             5.1 %

SVM Kernel          Number of SVs   Raw Error
Polynomial          274             4.0 %
RBF                 291             4.1 %
Neural network      254             4.2 %


Conclusion

– SVM performance is sensitive to the choice of kernel
– The global optimum can be achieved
– Training runs in polynomial time
– A potential alternative to neural networks, as SVMs performed better in many classification tasks
– We expect that SVMs will extend their scope of application to more diverse fields




Proposal for the project

– Porting some non-trivial application to an SVM tool and analyzing the results,
OR
– Comparing Neural Networks and SVMs using tools like SNNS and SVMLight.



Q&A
