Support Vector Machines

Saurabh Joshi (03305R02), Nitin Agrawal (03305019), Vaibhav Gupta (03305903)
(sbjoshi, nitina, [email protected])

Guide: Prof. P. Bhattacharya

Oct. 2004


Motivation

- Getting stuck at local minima is a problem
- Considering only the training error may result in overfitting
- Training can be time-consuming
- Finding the optimal number of hidden units is a problem



Perceptron Training Algorithm



Let w_0 = 0, b_0 = 0, k = 0
Repeat
    for i = 1 to l
        if y_i (⟨w_k, x_i⟩ + b_k) ≤ 0 then
            w_{k+1} = w_k + η y_i x_i
            b_{k+1} = b_k + η y_i R²
            k = k + 1
Until some stopping criterion met

where R = max_{1≤i≤l} ‖x_i‖ and η is the learning rate.

So the final weight vector is w = Σ_i α_i y_i x_i, where α_i is proportional to the number of mistakes made on point x_i.
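A minimal runnable sketch of this primal update in Python; the toy data set and helper names are illustrative, not taken from the slides:

```python
# Primal perceptron: sweep over the data, updating (w, b) on each mistake.
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def perceptron(X, y, eta=1.0, max_epochs=100):
    w = [0.0] * len(X[0])
    b = 0.0
    R2 = max(dot(x, x) for x in X)          # R^2 = max_i ||x_i||^2
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) <= 0:  # mistake on (x_i, y_i)
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi * R2
                mistakes += 1
        if mistakes == 0:                   # stopping criterion: all separated
            break
    return w, b

# Linearly separable toy problem.
X = [(2.0, 1.0), (1.0, 2.0), (-2.0, -1.0), (-1.0, -2.0)]
y = [1, 1, -1, -1]
w, b = perceptron(X, y)
```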


Dual Representation

The decision function can be written as
    f(x) = sgn(⟨w, x⟩ + b) = sgn( Σ_j α_j y_j ⟨x_j, x⟩ + b )

The update rule can now be written as: if
    y_i ( Σ_j α_j y_j ⟨x_j, x_i⟩ + b ) ≤ 0
then α_i ← α_i + η.

α_i represents the amount of information provided by point x_i.
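The same algorithm in its dual form, keeping a count α_i per training point instead of an explicit weight vector; the toy data is illustrative:

```python
# Dual perceptron: the weight vector is held implicitly as
# w = sum_i alpha_i * y_i * x_i, and a mistake on point i bumps alpha_i.
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def dual_perceptron(X, y, eta=1.0, max_epochs=100):
    alpha = [0.0] * len(X)
    b = 0.0
    R2 = max(dot(x, x) for x in X)
    for _ in range(max_epochs):
        mistakes = 0
        for i, (xi, yi) in enumerate(zip(X, y)):
            g = sum(a * yj * dot(xj, xi) for a, yj, xj in zip(alpha, y, X)) + b
            if yi * g <= 0:
                alpha[i] += eta      # alpha_i counts (eta-weighted) mistakes on x_i
                b += eta * yi * R2
                mistakes += 1
        if mistakes == 0:
            break
    return alpha, b

X = [(2.0, 1.0), (1.0, 2.0), (-2.0, -1.0), (-1.0, -2.0)]
y = [1, 1, -1, -1]
alpha, b = dual_perceptron(X, y)
# Recover the primal weight vector: w = sum_i alpha_i y_i x_i
w = [sum(a * yi * xi[k] for a, yi, xi in zip(alpha, y, X)) for k in range(2)]
```

Note that training and prediction touch the data only through dot products ⟨x_j, x_i⟩, which is what later makes the kernel substitution possible.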


Margin

[Figure: two linear separators for the same data — (a) a hyperplane with a small margin, (b) one with a large margin.]

Better generalization is expected in case of fig (b).


Maximum Margin Classifier



The equation of a hyperplane is
    ⟨w, x⟩ + b = 0

The geometric distance of a point x_i from the hyperplane is
    y_i (⟨w, x_i⟩ + b) / ‖w‖

Rescaling (w, b) to (λw, λb) leaves the hyperplane unchanged but would increase the functional margin, so we fix
    y_i (⟨w, x_i⟩ + b) = 1   for the closest point.

The geometric margin is then 1/‖w‖, so maximizing the margin amounts to minimizing ‖w‖.
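The geometric-distance formula can be checked numerically; the toy numbers below are chosen so the answer is round:

```python
import math

# Signed geometric distance of a labelled point from the hyperplane <w, x> + b = 0:
#     y * (<w, x> + b) / ||w||
def geometric_distance(w, b, x, y):
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w

# For w = (3, 4), b = 0 we have ||w|| = 5, so the point (5, 0) with y = +1
# lies at distance (3*5 + 4*0 + 0)/5 = 3 from the hyperplane.
d = geometric_distance([3.0, 4.0], 0.0, [5.0, 0.0], +1)
```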


Maximum Margin Classifier (Contd.)

Quadratic Programming

Minimize
    (1/2) ⟨w, w⟩
subject to
    y_i (⟨w, x_i⟩ + b) ≥ 1    for i = 1, …, l

The optimal separating hyperplane is the one with the maximum margin. There is a unique optimal hyperplane. The greater the margin, the better the generalization.


Primal and Dual Lagrangian

The primal Lagrangian is
    L(w, b, α) = (1/2) ⟨w, w⟩ − Σ_i α_i [ y_i (⟨w, x_i⟩ + b) − 1 ],    α_i ≥ 0

Setting the derivatives to zero at the optimum:
    ∂L/∂w = 0  ⇒  w = Σ_i α_i y_i x_i
    ∂L/∂b = 0  ⇒  Σ_i α_i y_i = 0

Dual Problem

Maximize
    W(α) = Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j ⟨x_i, x_j⟩
subject to
    α_i ≥ 0,    Σ_i α_i y_i = 0
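For a two-point problem the dual can be solved directly, which makes the relationship between α, w, and b concrete. Everything below (the data, the crude grid search) is an illustrative sketch, not a method from the slides:

```python
# Two points: x1 = (1, 0) with y1 = +1 and x2 = (-1, 0) with y2 = -1.
# The constraint sum_i alpha_i y_i = 0 forces alpha_1 = alpha_2 = a, so the
# dual objective reduces to W(a) = 2a - 2a^2, maximized at a = 1/2.
X = [(1.0, 0.0), (-1.0, 0.0)]
y = [1, -1]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def W(a):
    alpha = [a, a]
    lin = sum(alpha)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
               for i in range(2) for j in range(2))
    return lin - 0.5 * quad

# Crude grid search over the one remaining degree of freedom a >= 0.
best_a = max((k / 1000.0 for k in range(2001)), key=W)

# Recover the primal solution: w = sum_i alpha_i y_i x_i, and the bias from
# any support vector x_s via b = y_s - <w, x_s>.
alpha = [best_a, best_a]
w = [sum(alpha[i] * y[i] * X[i][k] for i in range(2)) for k in range(2)]
b = y[0] - dot(w, X[0])
```

The resulting margin 1/‖w‖ = 1 matches the geometry: both points sit at distance 1 from the separating plane x₁ = 0.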


At the solution the KKT complementarity conditions hold:
    α_i [ y_i (⟨w, x_i⟩ + b) − 1 ] = 0
so α_i > 0 only when y_i (⟨w, x_i⟩ + b) = 1.

Points with α_i > 0 are called support vectors. Support vectors are the only points necessary for the decision function; in a sense they support the decision surface.

The decision function is now
    f(x) = sgn( Σ_{i ∈ SV} α_i y_i ⟨x_i, x⟩ + b )

The bias can be calculated as
    b = y_s − Σ_{i ∈ SV} α_i y_i ⟨x_i, x_s⟩    for any support vector x_s


Soft Margin Classifier

Non-linearly separable data - Linear Decision Surface







 













Minimize
    (1/2) ⟨w, w⟩ + C Σ_i ξ_i
subject to
    y_i (⟨w, x_i⟩ + b) ≥ 1 − ξ_i,    ξ_i ≥ 0,    for i = 1, …, l

ξ_i indicates the amount of constraint violation by point x_i. C indicates how much we penalize violation of the constraints, and so controls the trade-off between generalization error and training error.
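The objective can be evaluated by reading each ξ_i off as the hinge loss max(0, 1 − y_i(⟨w, x_i⟩ + b)); the numbers below are illustrative:

```python
# Soft-margin objective: (1/2)<w, w> + C * sum_i xi_i, where the slack of
# point i is xi_i = max(0, 1 - y_i * (<w, x_i> + b)).
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def soft_margin_objective(w, b, X, y, C):
    slacks = [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]
    return 0.5 * dot(w, w) + C * sum(slacks), slacks

# One point on the right side with functional margin >= 1 (zero slack),
# and one margin violator.
X = [(2.0, 0.0), (0.5, 0.0)]
y = [1, 1]
obj, slacks = soft_margin_objective([1.0, 0.0], 0.0, X, y, C=1.0)
```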


Contd.

[Figure: soft-margin separating hyperplane, showing the weight vector w, a slack ξ for a violating point, and the margin.]


Contd.

The corresponding dual is:

Maximize
    W(α) = Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j ⟨x_i, x_j⟩
subject to
    0 ≤ α_i ≤ C,    Σ_i α_i y_i = 0

At the optimum point we again have w = Σ_i α_i y_i x_i. The decision function is the same as in the linearly separable case, with the increased constraint that each multiplier is bounded above by C.


Non-linear Decision Surfaces

Map the input space to a high-dimensional feature space. Introduce a linear decision surface in the high-dimensional space.

Mapping:
    φ : Rⁿ → R^N,    x ↦ φ(x)


XOR example

[Figure: the four XOR points plotted in the mapped space with axes x1, x2, x3.]

With the mapping φ(x1, x2) = (x1, x2, x1·x2), the XOR problem, which is not linearly separable in the input space, becomes separable in the feature space: the equation of a hyperbola (x1·x2 = c) is changed to the equation of a plane (x3 = c).
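The construction can be verified directly; the particular separating plane below is one illustrative choice:

```python
# XOR: the classes are not linearly separable in the plane, but after adding
# the product feature x3 = x1 * x2 a single plane separates them.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, +1, +1, -1]          # XOR: +1 iff exactly one coordinate is 1

def phi(x):
    return (x[0], x[1], x[0] * x[1])

# One separating plane in feature space: x1 + x2 - 2*x3 - 0.5 = 0.
def decide(x):
    z = phi(x)
    return 1 if z[0] + z[1] - 2 * z[2] - 0.5 > 0 else -1

preds = [decide(p) for p in points]
```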


Kernel

For solving the dual problem we only need to calculate dot products of vectors in the feature space. A kernel function does this implicitly, so the method is insensitive to the dimension of the feature space.

For example, for x, z ∈ R²,
    K(x, z) = ⟨x, z⟩² = ⟨φ(x), φ(z)⟩    with    φ(x) = (x1², √2 x1 x2, x2²)

In general
    K(x, z) = (⟨x, z⟩ + c)^d    for c ≥ 0 and degree d
as the polynomial kernel function.

Several powerful kernel functions exist:
– Gaussian RBF kernel: K(x, z) = exp(−‖x − z‖² / 2σ²)
– Two-layer perceptron: K(x, z) = tanh(κ ⟨x, z⟩ + θ)

The choice of kernel is made by the user. The decision function is given by
    f(x) = sgn( Σ_i α_i y_i K(x_i, x) + b )
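The R² polynomial-kernel identity can be checked numerically: (⟨x, z⟩)² equals the dot product of the explicit feature vectors φ(x) = (x1², √2·x1·x2, x2²):

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def poly_kernel(x, z):
    return dot(x, z) ** 2                      # implicit feature-space dot product

def phi(x):
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x, z = (1.0, 2.0), (3.0, 4.0)
implicit = poly_kernel(x, z)                   # (1*3 + 2*4)^2 = 121
explicit = dot(phi(x), phi(z))                 # same value, computed in R^3
```

The kernel evaluates two numbers and squares, whereas the explicit map would grow combinatorially with the degree and dimension; that is the point of computing the dot product implicitly.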


Some snapshots

[Figures: snapshots of an SVM classification demo; no textual content to recover.]

Training Algorithm

Training algorithm for fixed bias b and learning rates η_i:

Given training set S = {(x_1, y_1), …, (x_l, y_l)}
α ← 0
Repeat
    for i = 1 to l
        α_i ← α_i + η_i ( 1 − y_i Σ_j α_j y_j K(x_j, x_i) )
        if α_i < 0 then α_i ← 0
    end for
Until some stopping criterion satisfied
return α
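A runnable sketch of this fixed-bias loop (a kernel-adatron-style update); the toy data, learning rate, and epoch count are illustrative choices, and the data is symmetric about the origin so that a fixed bias of zero is adequate:

```python
# Fixed-bias kernel training: nudge each alpha_i toward the point where
# y_i * g(x_i) = 1, clipping at zero to keep alpha_i >= 0.
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def linear_kernel(x, z):
    return dot(x, z)

def train_fixed_bias(X, y, K, eta=0.1, epochs=200):
    l = len(X)
    alpha = [0.0] * l
    for _ in range(epochs):
        for i in range(l):
            g = sum(alpha[j] * y[j] * K(X[j], X[i]) for j in range(l))
            alpha[i] = max(0.0, alpha[i] + eta * (1.0 - y[i] * g))
    return alpha

X = [(1.0, 1.0), (2.0, 1.0), (-1.0, -1.0), (-2.0, -1.0)]
y = [1, 1, -1, -1]
alpha = train_fixed_bias(X, y, linear_kernel)

def decide(x):
    g = sum(a * yi * linear_kernel(xi, x) for a, yi, xi in zip(alpha, y, X))
    return 1 if g > 0 else -1
```

Swapping `linear_kernel` for any other kernel function changes the decision surface without touching the training loop.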


More about SVMs

For large data sets, Chunking and Sequential Minimal Optimization (SMO) are used to reduce memory requirements.

In Chunking, apply the SVM training algorithm on a subset of the data and discard all but the support vectors. Add points violating the constraints to the support vectors to form a new chunk, and iterate until all points are considered.

In SMO, modify two multipliers α_i, α_j at a time so as to increase the value of the objective function without violating the constraints.


Multiclass SVMs

Approaches: One-against-All, One-against-One, etc.

One-against-All: for k classes, use k SVMs, one corresponding to each class.
One-against-One: use k(k−1)/2 SVMs, one corresponding to each pair of classes.

SVMs can be used for regression also.
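A sketch of the one-against-all scheme, using simple perceptrons in place of SVMs so the example stays self-contained; the data and training loop are illustrative:

```python
# One-against-all: train k binary classifiers (class c vs. the rest) and
# predict the class whose classifier gives the largest score.
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def train_binary(X, y, eta=1.0, epochs=100):
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) <= 0:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b

def train_one_vs_all(X, labels, classes):
    models = {}
    for c in classes:
        yc = [1 if lab == c else -1 for lab in labels]  # class c vs. the rest
        models[c] = train_binary(X, yc)
    return models

def predict(models, x):
    return max(models, key=lambda c: dot(models[c][0], x) + models[c][1])

# Three well-separated clusters, one point per class for brevity.
X = [(10.0, 0.0), (0.0, 10.0), (-10.0, -10.0)]
labels = ["a", "b", "c"]
models = train_one_vs_all(X, labels, ["a", "b", "c"])
preds = [predict(models, x) for x in X]
```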


Application of SVMs

- Text Categorization
- Intrusion Detection
- Image Recognition
- Bioinformatics
- Handwritten Character Recognition


Application of SVMs (contd.)

For example, in Text Categorization each dimension of the feature space is represented by a stem word. "Compute" is a stem word with respect to "computers", "computation", etc. Stop words like "and", "or", "the", "of" are not considered. So a document is represented as a point depending upon which stem words are present and in what number; the order of words is insignificant in this representation.
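A toy bag-of-words sketch of that representation; the tiny stop-word list and suffix-stripping "stemmer" below are crude stand-ins for real components:

```python
# Represent a document as stem-word counts: drop stop words, crudely map
# words to stems by stripping a few common suffixes, and ignore word order.
STOP_WORDS = {"and", "or", "the", "of", "a", "to"}

def crude_stem(word):
    for suffix in ("ation", "ers", "er", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def bag_of_words(text):
    counts = {}
    for word in text.lower().split():
        if word in STOP_WORDS:
            continue
        stem = crude_stem(word)
        counts[stem] = counts.get(stem, 0) + 1
    return counts

doc = "the computers and the computation of computers"
vec = bag_of_words(doc)   # "computers" and "computation" share one stem
```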


Real Life Example

US Postal Service: Digit Recognition Problem

Training data: 7291 examples
Test data: 2007 examples
Input space dimensionality: 256 (16 × 16 pixel images)


Classifier                        Raw Error
Human performance                 2.5 %
Decision tree                     16.2 %
Best two-layer neural network     5.9 %
5-layer neural network            5.1 %

SVM with kernel     Number of SVs    Raw Error
Polynomial          274              4.0 %
RBF                 291              4.1 %
Neural network      254              4.2 %


Conclusion

- SVM performance is sensitive to the choice of kernel.
- The global optimum can be achieved.
- Training takes polynomial time.
- SVMs are a potential alternative to neural networks, having performed better in many classification tasks.
- We expect that SVMs will extend their scope of application to more diverse fields.


References

[1] http://cortex.informatik.tu-ilmenau.de/ koenig/monist/applets/html/AppletSVM.
[2] Christopher J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121–167, 1998.
[3] Nello Cristianini and John Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, 2000.
[4] Edgar E. Osuna, Robert Freund, and Federico Girosi. Support vector machines: Training and applications. Technical report, MIT AI Lab, 1997.
[5] Simon Haykin. Neural Networks. Pearson Education, 2003.
[6] Vladimir N. Vapnik. Statistical Learning Theory. A Wiley-Interscience Publication, 1998.


Proposal for the project

Porting some non-trivial application to an SVM tool and analyzing the results,

OR

Comparing neural networks and SVMs using tools such as SNNS and SVMlight.


Q&A

