
DESIGN OF SUPPORT VECTOR MACHINES WITH TIME FREQUENCY KERNELS FOR CLASSIFICATION OF EEG SIGNALS

Anurag Kumar* Aurobinda Routray** Ashok Kumar Pradhan*** Bibhukalyan Prasad Nayak****

* Final Year B.Tech. Student in Electrical Engineering, IIT Kharagpur, West Bengal 721 302, India, Email: [email protected]
** Associate Professor, Department of Electrical Engineering, IIT Kharagpur, West Bengal 721 302, India, Email: [email protected]
*** Assistant Professor, Department of Electrical Engineering, IIT Kharagpur, West Bengal 721 302, India, Email: [email protected]
**** Research Scholar, Division of Bioengineering, NUS Singapore, [email protected]

Abstract: The paper presents a classification method for EEG signals using Support Vector Machines (SVM) with Time-Frequency Kernels. Because of their non-stationary nature, EEG signals do not exhibit unique characteristics in the frequency domain. Therefore, Time-Frequency transformations have been suggested to extract the common features for a particular mental task performed by different subjects. The Short-Time Fourier Transform (STFT) and Wigner-Ville type of Time-Frequency Kernels have been chosen for transforming the input data space into the feature space. Experimental results show that SVM classifiers using such feature vectors are very effective for classification of the EEG signals. The data obtained from ten different subjects, each performing three different mental tasks, have been used for testing this method. The major contribution of this paper is in testing the different Time-Frequency Kernels belonging to Cohen's class. A comparative assessment of the classification performance with the conventional Gaussian Kernels in the Time as well as the Frequency domain has also been performed.

Keywords: EEG Classification, support-vector machine, Time-Frequency Kernels

I. INTRODUCTION

Significant research is currently being pursued on alternative methods of communication between humans and computers. Brain-Computer Interface (BCI) systems are one such research area, since they require very little physical activity. The classification of EEG signals plays an important role in BCI systems. Most BCI systems make use of mental tasks that lead to distinguishable electroencephalogram (EEG) signals of different classes. However, EEG data are very noisy and contain different types of artifacts. Moreover, they consist of mixtures of signals from several brain sources and noise sources, which makes the classification problem even more difficult.

Several attempts have been made to classify EEG signals. Methods based on Support Vector Machines (SVM), as proposed by Vapnik [1], have attracted widespread attention. The literature reports some applications of SVMs to the EEG signal classification problem. In [2] a recursive feature extraction algorithm based on [3] has been used to identify the features of an EEG signal; subsequently an SVM has been designed for the classification of these features. In [4] the SVM is applied in the Fourier and Time-Frequency domain for the EEG current source classification task. In [6] Principal Component Analysis has been used to extract features, which have then been fed into a Hidden Markov Model (HMM) for training.

However, linear classification has a very simple model, and things can go terribly wrong if the underlying assumptions do not hold, e.g. in the presence of outliers or strong noise, which are situations typically encountered in BCI data analysis. Kernel based learning, on the other hand, maintains the beneficial properties of linear classification (as linear classification is done in the feature space) while making the overall classification nonlinear in the input space (since the feature space and the input space are non-linearly related).

In this paper, we have used Support Vector Machines (SVM) for classification of EEG signals. The SVM (an effective machine learning method proposed by Vapnik [1] for classification) is a two-class classifier whose goal is to find a hyperplane such that the maximum number of points of the same class lie on the same side, while maximizing the distance of either class from the hyperplane. We have shown that SVMs with Time-Frequency based Kernels give a better classification result than SVMs with the conventional Gaussian Kernels in the Time or Frequency domain. The Short-Time Fourier Transform (STFT) and Wigner type of Time-Frequency Kernels have been chosen for transforming the input data space to the feature space.

II. SUPPORT VECTOR CLASSIFICATION

Let $T_N$ be a set of $N$ labeled data points in an M-dimensional hyperspace:

$$T_N = \big( (x_1, d_1), \ldots, (x_N, d_N) \big) \in (X \times D)^N$$

in which $x_i \in X$, where $X$ is the input data space, and $d_i \in D$, where $D = \{-1, 1\}$ is the label space.

The problem is to design a function $\psi$ such that

$$\psi : X \to D$$

predicts $d$ from the input $x$.

Under normal circumstances $X$ cannot be partitioned by a linear decision boundary. However, $X$ can be transformed into a feature space of equal or higher dimension in which the classes become linearly separable (Cover's Theorem) [15]. The problem of finding a nonlinear decision boundary in $X$ is thus transformed into the problem of finding the optimal hyperplane separating the two classes. The hyperplane in this transformed domain (called the feature space, Fig.1) can be parameterized by the vector $(w, b)$ as:

$$\sum_{j=1}^{P} w_j \varphi_j(x) + b = 0 \tag{1}$$

The mapping $\varphi(\cdot)$ need not be computed explicitly; instead, an inner product Kernel [15] of the form

$$\langle \varphi(x_i), \varphi(x_j) \rangle = K(x_i, x_j) \tag{2}$$

can be used for finding the optimal hyperplane. In this paper we use Cohen's group of Time-Frequency Kernels as given in [16]. These are discussed in the following section.

Generally, the patterns in the input space are not perfectly separable by linear functions (hyperplanes) even after transformation. We therefore use soft boundaries and construct an optimal hyperplane that also minimizes the classification error. The primal optimization problem can thus be set up as [15]:

Given a training set $T_N$, find the optimum values of the weight vector $w$ and bias $b$ such that they satisfy the constraints

$$d_i \left( w^T x_i + b \right) \ge 1 - \zeta_i, \quad \text{for } i = 1, 2, \ldots, N$$

$$\zeta_i \ge 0, \quad \text{for all } i$$

and such that the weight vector $w$ and the slack variables $\zeta_i$ minimize the cost functional

$$\Phi(w, \zeta) = \frac{1}{2} w^T w + C \sum_{i=1}^{N} \zeta_i$$

where $C$ is a user-specified positive parameter.

The dual of this optimization problem is formulated as

$$\text{maximize } Q(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j d_i d_j x_i^T x_j \tag{3}$$

subject to

$$\sum_{i=1}^{N} \alpha_i d_i = 0$$

$$0 \le \alpha_i \le C, \quad \text{for } i = 1, 2, \ldots, N$$

The optimum weights can be calculated from the optimized values of the dual variables $\alpha$:

$$w_0 = \sum_{i=1}^{N} \alpha_{0,i}\, d_i\, \varphi(x_i) \tag{4}$$

where $w_0$ is the optimum value of the weight variables, the first element being the optimum value of the bias, and $\alpha_0$ is the optimum value of the dual variables.

The decision function can be written in the feature space as

$$g(x) = \sum_{i=1}^{N} \alpha_i d_i K(x_i, x) + b \tag{5}$$

[Fig.1 The transformation $\varphi(\cdot)$ from the Input Data Space ($x_i$) to the Feature Space ($\varphi(x_i)$); $x$ is in class 1 if $g(x) > 0$ and in class 2 if $g(x) < 0$]

III. KERNEL DESIGN

For transforming the input data into a feature space, various Time-Frequency Kernels are chosen.

• Gaussian Kernel in the Time domain
• Gaussian Kernel in the Frequency domain
• Cohen's Group Time-Frequency Kernels [16]
  o Discrete Short-Time Fourier Transform
  o Discrete Wigner-Ville Frequency Distribution

It can be shown that these Kernels satisfy Mercer's conditions as given in [15].

Gaussian Kernel in Time domain

Given two signals $s$ and $s'$ the Kernel is given as

$$K(s, s') = \exp\left[ -\frac{1}{2\sigma^2} \left\lVert Ns - Ns' \right\rVert^2 \right] \tag{6}$$

where $Ns$ is the normalized signal over the chosen window

$$Ns = \frac{s}{\sum_{k=1}^{M} s_k} \quad \text{and} \quad Ns' = \frac{s'}{\sum_{k=1}^{M} s'_k}$$

$M$ is the window size or the length of the signal.
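The time-domain Gaussian Kernel of Eq. (6) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' MATLAB code: the function names `normalize`, `gaussian_time_kernel` and `gram_matrix` are hypothetical, the default value of `sigma` is arbitrary, and the check for a zero sample sum is an added assumption, since the normalization in Eq. (6) is undefined in that case. The Gram matrix holds the values $K(s_i, s_j)$ that take the place of the inner products $x_i^T x_j$ in the dual problem of Eq. (3).

```python
import math


def normalize(s):
    """Ns = s / sum_k s_k, the normalization used in Eq. (6)."""
    total = sum(s)
    if total == 0:
        # Assumption: the paper does not say how a zero sum is handled.
        raise ValueError("signal sums to zero; normalization is undefined")
    return [v / total for v in s]


def gaussian_time_kernel(s1, s2, sigma=1.0):
    """K(s, s') = exp(-||Ns - Ns'||^2 / (2 sigma^2)), Eq. (6)."""
    n1, n2 = normalize(s1), normalize(s2)
    dist2 = sum((a - b) ** 2 for a, b in zip(n1, n2))
    return math.exp(-dist2 / (2.0 * sigma ** 2))


def gram_matrix(signals, sigma=1.0):
    """Matrix of K(s_i, s_j) over all pairs of training signals."""
    return [[gaussian_time_kernel(a, b, sigma) for b in signals]
            for a in signals]
```

Because both signals are divided by their own sample sum, the Kernel is insensitive to a common scale factor: `gaussian_time_kernel(s, [2 * v for v in s])` evaluates to 1 up to rounding.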

Gaussian Kernel in Frequency domain (Fourier)

Given two signals $s$ and $s'$ the Kernel is given as

$$K(s, s') = \exp\left[ -\frac{1}{2\sigma^2} \left\lVert NS(f) - NS'(f) \right\rVert^2 \right] \tag{7}$$

where $NS(f)$ is the normalized frequency spectrum given as

$$NS(f) = \frac{S(f)}{\sum_{k=1}^{M} S(f_k)} \quad \text{and} \quad NS'(f) = \frac{S'(f)}{\sum_{k=1}^{M} S'(f_k)}$$

and $S(f)$ is the Discrete Fourier Transform of the signal given by

$$S(f_n) = \frac{2}{M} \sum_{k=0}^{M-1} s(k)\, e^{-j \frac{2\pi k n}{M}} \tag{8}$$

and

$$S'(f_n) = \frac{2}{M} \sum_{k=0}^{M-1} s'(k)\, e^{-j \frac{2\pi k n}{M}} \tag{9}$$

$$n = 1, 2, \ldots, M$$

Cohen's Group Time-Frequency Kernels (TFR)

Given two signals $s$ and $s'$ the Kernel is given as

$$K(s, s') = \exp\left[ -\frac{1}{2\sigma^2} \sum_{m=1}^{M} \sum_{n=1}^{N} \big( NS(m, n) - NS'(m, n) \big)^2 \right] \tag{10}$$

where the notation $NS(m, n)$ emphasizes the normalization of the TFR

$$NS(m, n) = \frac{S(m, n)}{\sum_{m=1}^{M} \sum_{n=1}^{N} S(m, n)}$$

$m \to$ the time index, $n \to$ the frequency index

The Discrete Short Time Fourier Transform

$$S(m, n) = \sum_{k=0}^{N-1} s(m - k)\, h(m)\, e^{-j \frac{2\pi n (m - k)}{N}} \tag{11}$$

where $h(\cdot)$ is the window function. A Hamming window has been chosen here. The coefficients of this window can be computed from the following equation:

$$h(m + 1) = 0.54 - 0.46 \cos\left( 2\pi \frac{m}{N - 1} \right), \quad m = 0, 1, 2, \ldots, N - 1$$

The Discrete Wigner-Ville Frequency Distribution

$$S(m, n) = \sum_{k=-k_n}^{k_n} s\left( \left\lfloor m + \tfrac{k}{2} \right\rfloor \right) s^{*}\left( \left\lfloor m - \tfrac{k}{2} \right\rfloor \right) e^{-j \frac{2\pi n k}{2N}} \tag{12}$$

$$k_n = \min\{ 2m,\; 2N - 1 - 2m \}$$

where $\lfloor l \rfloor$ denotes the greatest integer less than or equal to $l$.

Here, the frequency axis is given as $f_n = \dfrac{n f_s}{2N}$, i.e. the frequency axis is sampled at twice the usual rate, and the time axis is given as $t_m = m t_s$, $m = 0, 1, \ldots, N - 1$. An even number of samples ($N$) is chosen for the purpose.

V. RESULTS AND DISCUSSIONS

The EEG data for this paper have been obtained from the database mentioned in [13]. There are 10 subjects carrying out three different mental activities, i.e.

• Sitting Idle
• Doing a Multiplication
• Composing a Song

There are six electrodes for recording the EEG. The signals from different electrodes are taken to test the accuracy of the classification with each of the above Kernels. For each electrode there are 2500 samples of the signal, recorded at a 250 Hz sampling frequency, for a subject carrying out any of the above tasks. These samples are passed on to the modular SVM structure for classification of the above mental tasks, as shown in Fig.2.

[Fig.2 The modular Support Vector Machine (blocks: EEG samples, $\varphi(\cdot)$, SVM "Idle or Active" with Yes/No output, "Multiplying or Composing" with Yes/No output)]
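The Time-Frequency Kernel of Eq. (10), the decision function of Eq. (5) and the modular structure of Fig.2 can be combined as in the Python sketch below. It rests on several assumptions that are not stated in the paper: each TFR is supplied as a precomputed matrix (list of rows) of real, non-negative values such as magnitudes; the two blocks of Fig.2 are cascaded, so the second SVM is consulted only when the first one reports an active state; and a positive output of each stage means "idle" and "multiplication" respectively. All function names are hypothetical.

```python
import math


def normalize_tfr(S):
    """NS(m, n) = S(m, n) / sum over all (m, n) of S(m, n)."""
    total = sum(sum(row) for row in S)
    if total == 0:
        # Assumption: the paper does not cover an all-zero TFR.
        raise ValueError("TFR sums to zero; normalization is undefined")
    return [[v / total for v in row] for row in S]


def tfr_kernel(S1, S2, sigma=1.0):
    """Eq. (10): Gaussian Kernel on two normalized TFR matrices."""
    N1, N2 = normalize_tfr(S1), normalize_tfr(S2)
    dist2 = sum((a - b) ** 2
                for row1, row2 in zip(N1, N2)
                for a, b in zip(row1, row2))
    return math.exp(-dist2 / (2.0 * sigma ** 2))


def decision(x, support, alphas, labels, bias, kernel):
    """Eq. (5): g(x) = sum_i alpha_i d_i K(x_i, x) + b."""
    return sum(a * d * kernel(xi, x)
               for xi, a, d in zip(support, alphas, labels)) + bias


def modular_classify(x, stage1, stage2):
    """Assumed cascade for Fig.2: stage1 is the Idle/Active SVM,
    stage2 is the Multiplying/Composing SVM (sign conventions assumed)."""
    if stage1(x) > 0:
        return "idle"
    return "multiplication" if stage2(x) > 0 else "composing"
```

In use, `stage1` and `stage2` would each wrap `decision` with the support vectors, dual variables and bias obtained from training the corresponding binary SVM.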

The recorded signals have been divided into two groups, i.e. training and testing. The SVM classifier is designed using the training set and tested with the test cases. It was found that the SVM exhibited very small error for the training set. For both STFT and Wigner-Ville T-F Kernels it was 100% accurate for the training set. The results for the test sets are shown in Table 1.

Table 1 Test results for EEG signals corresponding to different mental tasks (Multiplication/Idle)

Kernel      Gaussian    Fourier    STFT    Wigner
Accuracy    50%         50%        80%     85%

It was found that the classification is most accurate at electrode 6 (Fig.3). Table 1 lists the accuracy of classification with the corresponding Kernels. The Wigner-Ville Frequency Distribution type Kernel exhibits the best accuracy while classifying the mechanical tasks such as Multiplication and Idle.

It is also seen that the Wigner-Ville T-F Kernel exhibited an accuracy of 70% in each of the binary classifications between the Multiplication/Rotation and Rotation/Idle cases. In each case electrode 6 gave the most distinct profiles. For the Multiplication/Idle binary class, electrode 4 ranked next, with an accuracy of 55% with the STFT and 70% with the Wigner-Ville Kernel.

VI. CONCLUSION

In this paper the EEG signals have been classified using Support Vector Machines with Time-Frequency Kernels. It has been found that the Wigner-Ville type Time-Frequency Kernels demonstrated better accuracy than the other types of Kernels such as STFT, Gaussian and Fourier. All the programs have been coded and executed on the MATLAB platform.

VII. REFERENCES

[1] Vapnik V., The nature of statistical learning theory, New York, Springer-Verlag, 1995.
[2] Thomas Navin Lal, Michael Schröder, Thilo Hinterberger, Jason Weston, Martin Bogdan, Niels Birbaumer, and Bernhard Schölkopf, 'Support Vector Channel Selection in BCI', IEEE Transactions on Biomedical Engineering, Vol. 51, No. 6, June 2004, pp. 1003.
[3] I. Guyon, A. Elisseeff, 'An introduction to variable and feature selection', J. Machine Learning Res., Vol. 3, pp. 1157-1182, 2003.
[4] Wen-Yn Huang, Xue-Qi Shen, Qing Wu, 'Classify the number of EEG current sources using support vector machines', Proceedings of International Conference on Machine Learning and Cybernetics, 2002, Vol. 4, Nov. 2002, pp. 1793-1795.
[5] Wen-Yn Huang, Xue-Qi Shen, Qing Wu, 'Classify the number of EEG current sources using support vector machines', Proceedings of International Conference on Machine Learning and Cybernetics, 2002, Vol. 4, Nov. 2002, pp. 1793-1795.
[6] Garcia, G.N., Ebrahimi, T., Vesin, J.-M., 'Support vector EEG classification in the Fourier and time-frequency correlation domains', Proceedings of First IEEE EMBS Conference on Neural Engineering, March 2003, pp. 591-594.
[7] Garrett, D., Peterson, D.A., Anderson, C.W., Thaut, M.H., 'Comparison of linear, nonlinear, and feature selection methods for EEG signal classification', IEEE Transactions on Neural Systems and Rehabilitation Engineering, Vol. 11, Issue 2, June 2003, pp. 141-144.
[8] Lee, H., Choi, S., 'PCA-based linear dynamical systems for multichannel EEG classification', Proceedings of the 9th ICONIP '02, Vol. 2, Nov. 2002, pp. 745-749.
[9] Klaus-Robert Müller, Charles W. Anderson, and Gary E. Birch, 'Linear and Nonlinear Methods for Brain-Computer Interfaces', IEEE Trans. on Neural Systems and Rehabilitation Engineering, Vol. 11, No. 2, June 2003, pp. 165.
[10] Steve Gunn, Matlab Support Vector Machine Toolbox, available at http://www.isis.ecs.soton.ac.uk/resources/svminfo/
[11] Pfurtscheller, G., Neuper, C., Schlogl, A., Lugger, K., 'Separability of EEG signals recorded during right and left motor imagery using adaptive autoregressive parameters', IEEE Trans. on Neural Systems and Rehabilitation, Vol. 6, Issue 3, Sept. 1998, pp. 316-325.
[12] Garrett, D., Peterson, D.A., Anderson, C.W., Thaut, M.H., 'Comparison of linear, nonlinear, and feature selection methods for EEG signal classification', IEEE Trans. on Neural Systems and Rehabilitation Engineering, Vol. 11, Issue 2, June 2003, pp. 141-144.
[13] http://sccn.ucsd.edu/eeglab
[14] Arnaud Delorme, Scott Makeig, 'EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis', Journal of Neuroscience Methods 134 (2004), pp. 9-21.
[15] S. Haykin, 'Neural Networks' (2nd Ed.), Prentice Hall, 1999.
[16] L. Cohen, 'Time Frequency Analysis', Prentice-Hall, NJ, 1995.

Fig.3 Similarities and Differences at Electrode 6 for three different subjects