Frequency And Ordering Based Similarity Measure For Host Based Intrusion Detection

Sanjay Rawat 1,2, V P Gulati 2, and Arun K Pujari 1

1 AI Lab, Dept. of Computer and Information Sciences, University of Hyderabad, Hyderabad-500046, INDIA
  [email protected]
2 IDRBT, Castle Hills, Road No. 1, Masab Tank, Hyderabad-500057, INDIA
  [email protected], [email protected]

Abstract. This paper discusses a new similarity measure for anomaly-based intrusion detection schemes that use sequences of system calls. With the increasing frequency of new attacks, it is getting difficult to update the signature database of a misuse-based intrusion detection system (IDS). Anomaly-based IDS therefore has a very important role to play, but its high rate of false positives remains a cause for concern. Our work defines a similarity measure that considers the number of common system calls, the frequencies of system calls, and the ordering of system calls made by the processes to calculate the similarity between processes. We propose the use of the Kendall Tau distance to calculate similarity in terms of the ordering of system calls in a process. The k nearest neighbor (kNN) classifier is used to categorize a process as either normal or abnormal. The experimental results, obtained on the 1998 DARPA data, are very promising and show that the proposed scheme yields a high detection rate and a low rate of false positives.

Keywords: Intrusion Detection, Anomaly, System Calls, kNN Classifier.

1 Introduction

Intrusions are attempts at compromising the confidentiality and integrity, or bypassing the security mechanisms, of a computer or network. Intrusion detection is the process of monitoring the events in a computer or network and analyzing them for signs of intrusions. With the rapid growth of attacks on computers, intrusion detection systems (IDS), which are software or hardware products that automate this monitoring and analysis process [2], have become a critical component of security architecture. According to Axelsson [1], an IDS consists of an Audit Collection/Storage Unit, a Processing Unit and an Alarm/Response Unit. While the Audit Collection/Storage Unit collects the data that is to be analyzed for signs of intrusion, the Processing Unit analyzes the collected data using various techniques to zero in on intrusions. The Alarm/Response Unit triggers an alarm on detecting an intrusion and may execute defensive action too. Based on the various ways of managing these units, different types of IDS have been proposed in the literature. On the basis of audit collection there are two types of IDS: network-based systems [16], which collect data directly from the monitored network in the form of packets, and host-based systems [2], which collect data from the host being protected. On the basis of the Processing Unit, IDS can also be classified into two types: misuse-based systems and anomaly-based systems. While the first keeps signatures of known attacks in a database and compares new instances against it to find attacks, the second learns the normal behavior of the monitored system and then looks for any deviation from it as a sign of intrusion.

In the present work, we propose a novel intrusion detection process for a host-based system. This work draws inspiration from a recent proposal by Liao and Vemuri [14] on an anomaly-based intrusion detection system. We observe that any normal execution of a process follows a pattern, and hence the normal behavior of a process can be profiled by a set of predictable sequences of system calls. Any deviation from this sequence of system calls is termed an intrusion in the framework of anomaly-based IDS. The problem of intrusion detection thus boils down to detecting anomalous sequences of system calls, which are measurably different from the normal behavior. We propose a scheme in which we measure the similarity between processes using a similarity measure that considers three main factors: the occurrence of individual system calls, the frequency of each system call in the process, and the position of each call in the process. This proposal adequately considers frequency as well as the ordering of system calls to determine anomalous processes. Adopting Liao's method, we make use of the k nearest neighbor scheme with our new similarity measure, thus gracefully extending the result of Liao and Vemuri [14], which does not consider the ordering of the system calls. We show that the similarity measure proposed by Liao et al may lead to inaccurate conclusions for intrusion detection and that our proposed similarity measure overcomes this flaw. The major contributions of our work are the following.

– We introduce a novel similarity measure based on frequency and occurrence.
– After determining the anomalies using these measures, we make use of ordering information to detect intrusions, based on a similarity function on sequences. We corroborate our claims of a better IDS by experimental analysis.

The rest of the paper is organized as follows. Section 2 gives a brief survey of related anomaly-based schemes to explain the different approaches. Section 3 describes the scheme proposed by Liao and Vemuri [14]. Section 4 presents some background and definitions that are used in the construction of our proposed scheme. Section 5 describes the proposed scheme in detail. Experimental results are shown in section 6. We conclude our work in section 7.

2 Related Work

Anomaly-based IDS have the capability to identify new attacks, as any attack is likely to differ from normal activity. However, such systems have a very high rate of false positives [2]. Hence, a lot of research is being done in the area of anomaly-based intrusion detection [1]. The pioneering work in the field of anomaly detection by Denning [5] describes a model for detecting computer abuse by monitoring a system's audit records. In this approach, profiles of subjects (users) are learnt, and statistical methods (means and standard deviations) are used to calculate deviations from normal behavior. Lane et al [12] propose another approach that captures users' behavior. A database of sequences of UNIX commands that a user normally issues is maintained for each user. Any new command sequence is compared with this database using a similarity measure. Though the scheme gives good results, it is rather difficult to profile all the users, especially in big organizations. Moreover, since the behavioral pattern of new users is not very stable, such models may give a high rate of false positives. Another approach, initiated by Forrest et al [8][9][11], captures the normal behavior of processes, as programs show stable behavior over a period of time under normal execution. In this approach, short sequences of system calls are used to profile a process. A similar approach is followed by Lee et al [13], but they make use of a rule learner, RIPPER, to form the rules for classification. Artificial neural networks have also been used for anomaly detection [10] due to their ability to learn behavior and generalize from this learning. In this approach, Ghosh et al use the Leaky Bucket algorithm to capture temporal locality. A new scheme based on the kNN classifier has been proposed by Liao and Vemuri [14][15], in which each process is treated as a document and each system call as a word in that document. A process is converted into a vector, and cosine similarity is used to calculate the similarity among processes. The scheme proposed in this paper also follows a similar approach by using a kNN classifier for the classification of processes. The following section describes the scheme based on the kNN classifier [14]. We also identify some cases in which Liao's scheme produces wrong conclusions.

3 Scheme Based On k-NN Classifier

An approach based on the kNN classifier is proposed by Liao and Vemuri [14], in which the frequencies of system calls used by a program (process), instead of their temporal ordering, are used to define the program's behavior. The paper draws an analogy between text categorization and intrusion detection, such that each system call is treated as a word and the set of system calls generated by a process as a document. The processes under normal execution (hereafter called normal processes) are collected from the DARPA data [4] and converted into vectors consisting of the frequencies of the system calls made during normal execution. The DARPA data provides processes in BSM (Basic Security Module) format, in which each session is labeled as normal or abnormal. The complete method of collecting normal processes is described in section 6 (Experimental Results).

From all the normal processes, a matrix A = [a_ij] is formed, where a_ij denotes the frequency of the i-th system call in the j-th process. In order to categorize a new process P as either normal or abnormal, the process P is first converted into a vector. The kNN classifier then compares it with all the processes A_j in A to determine the k nearest neighbors, by calculating the cosine similarity CS(P, A_j) given by equation 1:

CS(P, A_j) = \frac{P \cdot A_j}{\|P\| \, \|A_j\|}    (1)

where \|X\| = \sqrt{X \cdot X}. The average similarity value of the k nearest neighbors is calculated and a threshold is set. When the average similarity value is above the threshold, process P is considered normal, and otherwise abnormal.

Since the similarity measure given by equation 1 considers only the frequencies of the system calls appearing in the processes, we observe the following two cases in which it may produce wrong results while calculating similarity. Consider the following two processes P1 and P2:

P1 = open close close close close access access access access
P2 = open ioctl mmap pipe access login su su audit audit

The similarity measures of the new process P (given below) to each of P1 and P2 using equation 1 are

P = open close ioctl mmap pipe pipe access access login chmod
CS(P, P1) = 0.6048, CS(P, P2) = 0.5714

We observe that there are only three common system calls out of eight between P and P1 and six common system calls out of eight between P and P2. Intuitively, P2 is more similar to P than P1 is, but the similarity measures indicate the contrary. This is due to the frequent occurrence of close and access in P1, and the absence of close in P2. The above example makes it amply clear that, while calculating the similarity score, no weight is accorded to processes having a greater number of common system calls. We believe that if we include a factor that depends on the number of common calls, such results can be avoided.

As noted earlier, during normal execution a process follows a sequence of system calls, and any significant change in the order of appearance of system calls is considered an intrusion. But the following example demonstrates that the scheme proposed by Liao and Vemuri [14] does not capture deviations in the ordering of system calls while calculating the similarity score using equation 1:

P1 = rename login open ioctl su chmod close
P2 = open close su ioctl chmod login rename
P = open close su ioctl chmod pipe pipe
CS(P, P1) = 0.629 and CS(P, P2) = 0.629

It may be observed that in P and P2, up to the fifth position, the ordering of all the system calls is the same, while in P and P1, except for ioctl, all the system calls are at different positions. Thus, in this case, the ordering of system calls is not being taken into consideration while calculating the similarity between processes, although it is a very important factor, especially in intrusion detection. In our scheme, we define a similarity measure that depends not only on the frequencies of system calls but also on the number of shared system calls and the ordering of system calls. In the following section, we define some preliminary results that are used in the construction of our similarity measure.
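As an aside, the first example above can be reproduced with a few lines of code. The following is a minimal sketch (ours, not from [14]) of equation 1 over system-call frequency vectors:

```python
# Minimal sketch (not the authors' code): cosine similarity of equation 1
# over system-call frequency vectors, reproducing the first example above.
import math
from collections import Counter

def cosine_similarity(p, q):
    """CS(P, Q) = P . Q / (||P|| ||Q||) on frequency vectors of system calls."""
    fp, fq = Counter(p), Counter(q)
    dot = sum(fp[c] * fq[c] for c in fp)
    norm = math.sqrt(sum(v * v for v in fp.values())) * \
           math.sqrt(sum(v * v for v in fq.values()))
    return dot / norm

P1 = "open close close close close access access access access".split()
P2 = "open ioctl mmap pipe access login su su audit audit".split()
P  = "open close ioctl mmap pipe pipe access access login chmod".split()

print(round(cosine_similarity(P, P1), 4))  # 0.6048
print(round(cosine_similarity(P, P2), 4))  # 0.5714
```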

4 Preliminary Results

Let S (say, Card(S) = m) be the set of system calls made by all the processes under normal execution. From all the normal processes a matrix A = [a_ij] is formed, where a_ij denotes the frequency of the i-th system call in the j-th process. We also form a matrix B = [b_ij], where b_ij = 1 if the i-th system call is present in the j-th process, and 0 otherwise. Thus each process Pb_j ∈ {0, 1}^m can be represented as a column of B. For example, let S = {access, audit, chdir, close, creat, exit, fork, ioctl} and let two normal processes be P1 = access close ioctl access exit and P2 = ioctl audit chdir chdir access. Then we have:

With the system calls taken in the order listed in S (access, audit, chdir, close, creat, exit, fork, ioctl), the columns of A correspond to P1 and P2 and the columns of B to Pb_1 and Pb_2:

A = \begin{bmatrix} 2 & 1 \\ 0 & 1 \\ 0 & 2 \\ 1 & 0 \\ 0 & 0 \\ 1 & 0 \\ 0 & 0 \\ 1 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 0 & 1 \\ 1 & 0 \\ 0 & 0 \\ 1 & 0 \\ 0 & 0 \\ 1 & 1 \end{bmatrix}
We now define similarity measures, which we use in our scheme to calculate the similarity between processes.
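For concreteness, a minimal Python sketch (ours, not from the paper) of how A and B can be built from the two example processes above:

```python
# Minimal sketch (ours): building the frequency matrix A and the binary
# matrix B from normal processes, with rows ordered as in S.
import numpy as np

S  = ["access", "audit", "chdir", "close", "creat", "exit", "fork", "ioctl"]
P1 = "access close ioctl access exit".split()
P2 = "ioctl audit chdir chdir access".split()

index = {call: i for i, call in enumerate(S)}

def frequency_vector(process):
    """Column of A: the i-th entry is the frequency of the i-th system call."""
    v = np.zeros(len(S), dtype=int)
    for call in process:
        v[index[call]] += 1
    return v

A = np.column_stack([frequency_vector(p) for p in (P1, P2)])
B = (A > 0).astype(int)   # b_ij = 1 iff the i-th call occurs in the j-th process
```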

4.1 Binary Similarity Measure

We define a similarity score µ(Pb_i, Pb_j) between any two processes Pb_i and Pb_j as follows:

\mu(Pb_i, Pb_j) = \frac{\sum_{n=1}^{m} (Pb_i \wedge Pb_j)_n}{\sum_{n=1}^{m} (Pb_i \vee Pb_j)_n}    (2)

It may be noticed that 0 ≤ µ ≤ 1. The value of µ increases when there are more shared system calls between the two processes (due to the numerator), and the value of µ decreases when the number of system calls not shared by both processes exceeds the number of shared ones (due to the denominator).
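A minimal sketch (ours) of equation 2, using the columns Pb_1 and Pb_2 of the example matrix B above:

```python
# Minimal sketch (ours): the binary similarity score of equation 2,
# i.e. shared system calls divided by system calls present in either process.
import numpy as np

def binary_similarity(pb_i, pb_j):
    """mu(Pb_i, Pb_j) = sum(Pb_i AND Pb_j) / sum(Pb_i OR Pb_j)."""
    return np.logical_and(pb_i, pb_j).sum() / np.logical_or(pb_i, pb_j).sum()

Pb1 = np.array([1, 0, 0, 1, 0, 1, 0, 1])   # column of B for P1
Pb2 = np.array([1, 1, 1, 0, 0, 0, 0, 1])   # column of B for P2
print(binary_similarity(Pb1, Pb2))          # 2 shared / 6 distinct = 0.333...
```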

4.2 Frequency Similarity Measure

Another similarity score, the cosine similarity measure λ(P_i, P_j) between the processes P_i and P_j, where P_i and P_j are obtained from A, is defined as follows [3]:

\lambda(P_i, P_j) = \frac{P_i \cdot P_j}{\|P_i\| \, \|P_j\|}    (3)

It may be noted that equation 3 represents the same similarity measure as used by Liao and Vemuri [14]. We define a new similarity measurement Sim(P_i, P_j) as follows:

Sim(P_i, P_j) = \mu(Pb_i, Pb_j) \cdot \lambda(P_i, P_j)    (4)

The motive behind multiplying µ and λ is that λ(P_i, P_j) measures the similarity based on frequency, while µ(Pb_i, Pb_j) acts as a weight associated with P_i and P_j. In other words, µ(Pb_i, Pb_j) tunes the similarity score λ(P_i, P_j) according to the number of similar and dissimilar system calls between the two processes. Therefore, the similarity measure Sim(P_i, P_j) takes both the frequency and the number of shared system calls into consideration while calculating the similarity between two processes.
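Putting equations 2-4 together, a minimal sketch (ours) of Sim on the frequency columns of A from the example in this section:

```python
# Minimal sketch (ours): the combined measure Sim of equation 4 on two
# frequency vectors (columns of A); the binary vectors are derived on the fly.
import numpy as np

def lam(p_i, p_j):
    """Cosine similarity of equation 3."""
    return p_i @ p_j / (np.linalg.norm(p_i) * np.linalg.norm(p_j))

def sim(p_i, p_j):
    """Sim of equation 4: cosine similarity weighted by the shared-call ratio."""
    pb_i, pb_j = p_i > 0, p_j > 0
    mu = np.logical_and(pb_i, pb_j).sum() / np.logical_or(pb_i, pb_j).sum()
    return mu * lam(p_i, p_j)

P1 = np.array([2, 0, 0, 1, 0, 1, 0, 1])   # column of A for P1
P2 = np.array([1, 1, 2, 0, 0, 0, 0, 1])   # column of A for P2
print(sim(P1, P2))                          # (1/3) * (3/7) = 1/7, about 0.143
```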

4.3 Kendall Tau Distance

As mentioned earlier, the order of system calls occurring in a process is very important while classifying a process as normal or abnormal. We make use of the Kendall Tau distance [6] to determine similarity in terms of ordering. Given a set S of system calls, a process (an ordered set of system calls) t with respect to S is an ordering of a subset M of S, i.e. t = [x_1 ≥ x_2 ≥ ... ≥ x_d], with each system call x_i ∈ M, where x_i ≥ x_j implies that in the process t the system call x_i appears before the system call x_j. If a system call i ∈ S is present in a process l, then l(i) denotes the position of i in l. The Kendall Tau distance counts the number of pair-wise disagreements between two processes t and m as follows:

K(t, m) = \frac{|\{(i, j) : i < j,\ t(i) < t(j),\ m(i) > m(j)\ \text{or}\ i > j,\ t(i) > t(j),\ m(i) < m(j)\}|}{\binom{|M|}{2}}    (5)

In order to calculate the Kendall Tau distance between two processes P_i and P_j, P_i is first converted into P'_i, which contains only the first occurrence of each system call from P_i. It can be observed that, in practice, it is difficult to find a set M such that each process has system calls only from M. We therefore modify equation 5 to calculate the Kendall distance as follows:

K(P_i, P_j) = \frac{|\{(r, s) : Condition(W)\ \text{or}\ Condition(W')\}|}{|P'_i| \, |P'_i|}    (6)

where Condition(W) = {r < s, P'_i(r) < P'_i(s), P'_j(r) > P'_j(s)} and Condition(W') = {r > s, P'_i(r) > P'_i(s), P'_j(r) < P'_j(s)}. All the (r, s) pairs for comparison are obtained from the set of system calls S. In the next section, we present the proposed scheme using the results derived above.
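A minimal sketch (ours) of equation 6, under one plausible reading in which the pairs (r, s) range over the calls common to both reduced processes, since positions are undefined otherwise:

```python
# Minimal sketch (ours) of the modified Kendall Tau distance of equation 6.
# Processes are lists of system-call names; pair-wise ordering disagreements
# are counted over calls that occur in both reduced processes.
from itertools import combinations

def first_occurrences(process):
    """Reduce P to P': keep only the first occurrence of each system call."""
    seen, reduced = set(), []
    for call in process:
        if call not in seen:
            seen.add(call)
            reduced.append(call)
    return reduced

def kendall_distance(p_i, p_j):
    pi, pj = first_occurrences(p_i), first_occurrences(p_j)
    pos_i = {call: k for k, call in enumerate(pi)}
    pos_j = {call: k for k, call in enumerate(pj)}
    common = [c for c in pi if c in pos_j]
    disagreements = sum(
        1 for r, s in combinations(common, 2)
        if (pos_i[r] - pos_i[s]) * (pos_j[r] - pos_j[s]) < 0
    )
    return disagreements / (len(pi) * len(pi))   # normalisation as in eq. 6

P1 = "rename login open ioctl su chmod close".split()
P  = "open close su ioctl chmod pipe pipe".split()
print(kendall_distance(P, P1))
```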

5 Proposed Scheme

As discussed above, the matrices A = [a_ij] and B = [b_ij] are constructed using the normal processes and the set S. If a new process P contains a system call that is not in S, it is classified as abnormal; otherwise it is first converted into a vector, and the binary equivalent Pb of this vector is also calculated. The similarity score λ(P, P_j) is calculated for every normal vector P_j using equation 3; if λ(P, P_j) = 1, P is classified as normal. Otherwise, using equations 2 and 4, the values of µ(P, P_j) and Sim(P, P_j) are calculated. The values of Sim(P, P_j) are sorted in descending order and the k nearest neighbors (the first k highest values) are chosen. We calculate the average value (Avg_Sim) of the k nearest neighbors. For each of the k nearest neighbors, we calculate the Kendall Tau distance K(P, P_j) using equation 6, and then calculate the average value (Avg_Dist) of these k Kendall distances. The kNN classifier categorizes the new process P as either normal or abnormal according to the rule given below: if Avg_Sim > Sim_Threshold and Avg_Dist < Dist_Threshold, classify P as normal; otherwise P is abnormal. Here Sim_Threshold and Dist_Threshold are predefined threshold values for the similarity measure and the Kendall distance, respectively. The pseudo code for the proposed scheme is provided in Figure 1.

6 Experimental Results

We use BSM audit logs from the 1998 DARPA data [4] for training and testing our algorithm. After analyzing the whole training data, we extract the 50 unique system calls that appear in it; all 50 system calls are shown in Table 1.

Given a set of processes and system calls S, form the matrices A = [aij] and B = [bij]
for each process P in the test data do
    if P has some system call which does not belong to S then
        P is abnormal; exit.
    else
        for each process Aj in the training data A do
            calculate Sim(P, Aj);
            if Sim(P, Aj) equals 1.0 then
                P is normal; exit.
        end do
        find the first k highest values of Sim(P, Aj);
        calculate Avg_Sim for the k nearest neighbors so obtained;
        for each of the k nearest neighbors, calculate the Kendall Distance;
        calculate Avg_Dist for the k nearest neighbors;
        if Avg_Sim is greater than Sim_Threshold and Avg_Dist is less than Dist_Threshold then
            P is normal;
        else
            P is abnormal;
end do

Fig. 1. Pseudo code of the proposed scheme
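Read alongside Figure 1, the decision procedure can be sketched as follows (our illustration, reusing with obvious adjustments the helper functions from the sketches in section 4; k, Sim_Threshold and Dist_Threshold are the tunable parameters of the scheme):

```python
# Illustrative sketch (ours) of the classification rule of Figure 1.
# `frequency_vector`, `sim` and `kendall_distance` are the helpers sketched
# earlier; `training` is a list of (frequency_vector, call_list) pairs
# built from the normal processes.

def classify(process, training, S, k, sim_threshold, dist_threshold):
    # any system call never seen in training marks the process as abnormal
    if any(call not in S for call in process):
        return "abnormal"
    p_vec = frequency_vector(process)
    scores = []
    for train_vec, train_calls in training:
        s = sim(p_vec, train_vec)
        if s == 1.0:                              # identical normal profile
            return "normal"
        scores.append((s, train_calls))
    # k nearest neighbours under the combined similarity of equation 4
    neighbours = sorted(scores, key=lambda x: x[0], reverse=True)[:k]
    avg_sim = sum(s for s, _ in neighbours) / k
    avg_dist = sum(kendall_distance(process, calls) for _, calls in neighbours) / k
    if avg_sim > sim_threshold and avg_dist < dist_threshold:
        return "normal"
    return "abnormal"
```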

For each day of data, a separate BSM file is provided, together with a 'BSM List File'. Each line of this file contains information about one session, such as time, service, source IP and destination IP. A '0' at the end of the line shows that the session is normal, while a '1' declares the session intrusive. Any process associated with a normal session is considered a normal process. All the intrusive sessions are labeled with the names of the attacks launched during the sessions. We make use of the BSM commands auditreduce and praudit and a couple of scripts to extract the data used by our algorithm.
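The exact field layout of the DARPA list files is not reproduced here; a hedged sketch of the session filtering step, relying only on the trailing 0/1 label described above, might look like this:

```python
# Hedged sketch (ours): selecting normal sessions from a BSM list file,
# using only the trailing 0/1 label described in the text; other fields
# (time, service, source/destination IP) are kept as opaque strings.
def normal_sessions(list_file_path):
    with open(list_file_path) as fh:
        for line in fh:
            fields = line.split()
            if fields and fields[-1] == "0":   # '0' marks a normal session
                yield fields
```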

access, audit, auditon, chdir, chmod, chown, close, creat, execve, exit, fchdir, fchown, fcntl, fork, fork1, getaudit, getmsg, ioctl, kill, link, login, logout, lstat, memcntl, mkdir, mmap, munmap, nice, open, pathconf, pipe, putmsg, readlink, rename, rmdir, setaudit, setegid, seteuid, setgid, setgroups, setpgrp, setrlimit, setuid, stat, statvfs, su, sysinfo, unlink, utime, vfork

Table 1. List of 50 unique system calls

On analyzing the whole BSM logs (list files) carefully, we locate five days that are free of any type of attack: Tuesday of the third week, Thursday of the fifth week, and Monday, Tuesday and Wednesday of the seventh week. We choose the first four days for our training data and the fifth for testing on normal data to determine the false positive rate. A total of 1937 normal sessions are reported in the four days of data. We carefully extract the processes occurring during these days, and our training data set consists of 769 unique processes. There are 412 normal sessions on the fifth day, from which we extract 4443 normal processes; these form our normal testing data. In order to test the detection capability of our method, we incorporate 40 intrusive sessions into our testing data. These sessions cover almost all types of attack launched on the victim Solaris machine (in the simulated DARPA setup) during the seven weeks of training data and two weeks of testing data that can be detected using BSM logs. An intrusive session is said to be detected if any of the processes associated with this session is classified as abnormal. Thus the detection rate is defined as the number of intrusive sessions detected divided by the total number of intrusive sessions. We perform the experiments with different values of k: 5, 10 and 15. Tables 2 and 3 show the results for k = 5 and k = 10 respectively. We do not show the results for k = 15, as we do not find any significant difference from the case k = 10.

Threshold (Sim, Dist)   False Positive Rate   Detection Rate
0.90, 0.15              0.092                 1.00
0.80, 0.15              0.042                 0.90
0.79, 0.15              0.040                 0.90
0.78, 0.20              0.039                 0.87
0.75, 0.25              0.019                 0.85
0.70, 0.25              0.007                 0.85
0.63, 0.25              0.005                 0.80
0.60, 0.25              0.002                 0.75

Table 2. False Positive Rate vs Detection Rate for k = 5

The first column in each table provides the threshold values for the experiments. The choice of threshold values is made on a trial and error basis: we first calculate the similarity values of a few normal processes and, according to the observed values, fix a range for the thresholds; different values within that range are then chosen to calculate the results. Entries in the second column give the false positive rate, which is equal to the number of normal processes detected as abnormal divided by the total number of normal processes. The third column gives the detection rate as defined above. We also repeat the experiment with Liao's scheme, because in our experimental setup we choose different numbers of training and testing processes from those reported in [14]; this makes the comparison meaningful. The results for Liao's scheme are shown in Table 4. Though our scheme performs better than Liao's scheme for both k = 5 and k = 10, we show the comparison only between the ROC curves for k = 5 and Liao's scheme, in Figure 2.

Threshold (Sim, Dist)   False Positive Rate   Detection Rate
0.80, 0.06              0.147                 1.00
0.80, 0.20              0.089                 0.90
0.79, 0.20              0.087                 0.90
0.78, 0.20              0.065                 0.87
0.75, 0.25              0.061                 0.87
0.70, 0.25              0.040                 0.85
0.68, 0.25              0.039                 0.82
0.63, 0.25              0.019                 0.80
0.60, 0.25              0.015                 0.72
0.58, 0.25              0.010                 0.70
0.56, 0.35              0.006                 0.70
0.50, 0.35              0.003                 0.68

Table 3. False Positive Rate vs Detection Rate for k = 10

Threshold   False Positive Rate   Detection Rate
0.992       0.211                 1.00
0.990       0.181                 0.95
0.985       0.129                 0.95
0.980       0.058                 0.92
0.970       0.054                 0.83
0.950       0.035                 0.73
0.930       0.008                 0.70
0.900       0.007                 0.68
0.780       0.002                 0.68

Table 4. False Positive Rate vs Detection Rate for the Liao and Vemuri scheme

This ROC curve plots detection rate against false positive rate; each threshold value yields one (false positive rate, detection rate) point. The proposed scheme reaches a 100% detection rate with a false positive rate as low as 9% (Table 2), whereas Liao's scheme reaches a 100% detection rate only at a 21% false positive rate (Table 4). It can be seen in Figure 3 that the detection rate is high for both k = 5 and k = 10, but at k = 5 we get a much lower false positive rate than at k = 10. This may be because there is high variation among the normal processes in the data; aggregation over k = 10 neighbors suppresses this variation, and the kNN classifier may then produce wrong results. In our setup, all the experiments have been performed on a PC running Windows 2000 Professional with an Intel Pentium III processor and 256 MB RAM. As our similarity measure is more complex than the one used in Liao's scheme [14], the total time taken by our algorithm to classify a process as normal or abnormal is higher than for Liao's scheme. In order to classify 4443 processes, our scheme takes around 820 seconds, whereas Liao's scheme takes around 235 seconds.

Fig. 2. ROC curves (detection rate vs. false positive rate) for the proposed scheme and Liao's scheme

Fig. 3. ROC curves (detection rate vs. false positive rate) for the proposed scheme at k=5 and k=10

7 Conclusions and Future Work

All anomaly-based intrusion detection systems work on the assumption that normal activities differ substantially from abnormal activities (intrusions). In the case of IDS models that learn program behavior, these differences may show up in the frequencies of system calls or in the ordering of system calls used by processes under normal and abnormal execution. Our scheme considers all these factors while classifying a new process as normal or abnormal. The use of a similarity score on the binary forms of the processes gives weight to processes that share more system calls. By using the Kendall Tau distance, two processes can be compared on the basis of the ordering of the system calls present in them. By combining these techniques in our scheme, we arrive at very promising results. In the present study, we use the BSM audit logs from the 1998 DARPA data set. In future work, we intend to test our scheme on real online data. Presently, all system calls are treated equally, but there may be some system calls whose very presence in a process is suspicious. We are trying to identify such patterns for a better intrusion detection system.

References

1. Axelsson S.: Research in Intrusion Detection Systems: A Survey. Technical Report No. 98-17, Dept. of Computer Engineering, Chalmers University of Technology, Göteborg, Sweden (1999)
2. Bace R., Mell P.: NIST Special Publication on Intrusion Detection Systems. SP 800-31, NIST, Gaithersburg, MD (2001)
3. Chan Z., Zhu B.: Some Formal Analysis of Rocchio's Similarity-Based Relevance Feedback Algorithm. Technical Report CS-00-22, Dept. of Computer Science, University of Texas-Pan American, Edinburg, TX (2000)
4. DARPA 1998 Data, MIT Lincoln Laboratory, http://www.ll.mit.edu/IST/ideval/data/data_index.html
5. Denning D. E.: An Intrusion-Detection Model. In: Proceedings of the 1986 IEEE Symposium on Security and Privacy (SSP '86). IEEE Computer Society Press (1990) 118-133
6. Dwork C., Kumar R., Naor M., Sivakumar D.: Rank Aggregation Methods for the Web. In: Proceedings of the Tenth International World Wide Web Conference (2001) 613-622
7. Eskin E., Arnold A., Prerau M., Portnoy L., Stolfo S.: A Geometric Framework for Unsupervised Anomaly Detection: Detecting Intrusions in Unlabeled Data. In: Barbara D., Jajodia S. (eds.) Applications of Data Mining in Computer Security. Kluwer Academic Publishers (2002) 77-102
8. Forrest S., Hofmeyr S. A., Somayaji A., Longstaff T. A.: A Sense of Self for Unix Processes. In: Proceedings of the 1996 IEEE Symposium on Research in Security and Privacy, Los Alamitos, CA. IEEE Computer Society Press (1996) 120-128
9. Forrest S., Hofmeyr S. A., Somayaji A.: Computer Immunology. Communications of the ACM 40(10) (1997) 88-96
10. Ghosh A. K., Schwartzbard A.: A Study in Using Neural Networks for Anomaly and Misuse Detection. In: Proceedings of the 8th USENIX Security Symposium, Aug. 23-26, Washington DC, USA (1999) 141-151
11. Hofmeyr S. A., Forrest S., Somayaji A.: Intrusion Detection Using Sequences of System Calls. Journal of Computer Security 6 (1998) 151-180
12. Lane T., Brodley C. E.: An Application of Machine Learning to Anomaly Detection. In: Proceedings of the 20th National Information Systems Security Conference, Baltimore, MD (1997) 366-377
13. Lee W., Stolfo S., Chan P.: Learning Patterns from Unix Process Execution Traces for Intrusion Detection. In: Proceedings of the AAAI97 Workshop on AI Methods in Fraud and Risk Management. AAAI Press (1997) 50-56
14. Liao Y., Vemuri V. R.: Use of K-Nearest Neighbor Classifier for Intrusion Detection. Computers & Security 21(5) (2002a) 439-448
15. Liao Y., Vemuri V. R.: Using Text Categorization Techniques for Intrusion Detection. In: Proceedings of USENIX Security 2002, San Francisco (2002b) 51-59
16. Mukherjee B., Heberlein L. T., Levitt K. N.: Network Intrusion Detection. IEEE Network 8(3) (1994) 26-41
