IJRIT International Journal of Research in Information Technology, Volume 3, Issue 1, January 2015, Pg. 179-187

International Journal of Research in Information Technology (IJRIT)

www.ijrit.com

ISSN 2001-5569

A Framework for Real Time Detection of Malicious Content Distribution in Twitter

K. Mohan Kumar1, S. Saritha2, B. Srikanth3

1 Student, Department of CSE & JNTUK, Gandhiji Institute of Science & Technology, Bhimavaram (V), Near Jaggayyapet, Krishna Dist. - 521 178, Andhra Pradesh, India [email protected]

2 HOD, Department of CSE & JNTUK, Gandhiji Institute of Science & Technology, Bhimavaram (V), Near Jaggayyapet, Krishna Dist. - 521 178, Andhra Pradesh, India [email protected]

3 Department of CSE & JNTUK, Gandhiji Institute of Science & Technology, Bhimavaram (V), Near Jaggayyapet, Krishna Dist. - 521 178, Andhra Pradesh, India [email protected]

______________________________________________

ABSTRACT

Virtual communities exist over the Internet, and the trend is strongest on social networking sites. Twitter is one of the most widely used of these sites. Adversaries can use it for ill-intended activities such as malware distribution, phishing, and spam. Twitter has its own detection schemes based on certain features. Conventional techniques for detecting the suspicious URLs that form part of malicious content distribution make use of features such as dynamic behavior, URL redirection, and HTML content. Recently Lee et al. proposed a scheme for detecting suspicious content distribution through Twitter streams. Their methodology collects tweets from the public timeline and builds a classifier to detect malicious ones. They also built a tool for this purpose. In this paper we explore the concepts of detecting suspicious content over Twitter streams. We built a prototype application that demonstrates the proof of concept. The empirical results are encouraging.

INDEX TERMS -- Suspicious content detection, classification, Twitter, conditional redirection

1. Introduction

Twitter is one of the best-known online social networking (OSN) sites that provide a platform for virtual communities. When a user posts a tweet, it goes to all of that user's followers, which makes Twitter a tool for mass communication. Attackers exploit this feature to share malicious content. The malicious content might be spam, a phishing attack, or a malicious URL that harms the user who clicks it. Twitter has several detection schemes intended to prevent such attacks. However, as earlier work has shown, the conventional detection schemes are not able to protect users from malicious content. The most common attacks through OSNs include malware distribution, scams, phishing, and spam. Attackers launch such attacks for monetary and other gains. These attacks must be prevented in order to keep the communications of communities over an OSN secure. Several existing solutions try to detect malicious URLs, including Wepawet [1], HoneyMonkey [2], and Capture-HPC [3]. In web applications, the use of JavaScript and Flash scripts also opens up possibilities for malicious attacks, as they are vulnerable to exploitation.

Recently, a mechanism for detecting suspicious URLs was proposed in [4]. It explores the correlation of URL redirect chains instead of studying each URL in isolation, and it uses frequently shared URLs for malicious content detection. Many features are used to detect suspicious URLs. The proposed architecture is shown in Figure 2. In this paper we built a prototype application that demonstrates the practical aspects of the approach presented in [4]. We built a web-based interface to demonstrate the proof of concept. Our empirical study showed that the application is able to show the difference between benign and malicious content. Many experiments were made in the presence of malicious attacks, and the results show that malicious content has higher feature statistics than benign content.

The remainder of the paper is structured as follows. Section 2 presents a review of the literature. Section 3 describes the proposed system. Section 4 describes the prototype application. Section 5 presents the empirical results, and Section 6 concludes the paper.

2. Related Work

Many techniques for detecting spam are found in the literature [5], [6]. Other solutions include honey profiles [7], [8], blacklisting of URLs for security reasons [9], [10], and tools for reporting spam [11]. Suspicious URL detection techniques are explored in [12], [13], and [14]. Dynamic detection systems that make use of virtual machines have been explored by many researchers [1], [2], [3], [15]. To detect drive-by downloads, ARROW was proposed in [16]; it also makes use of correlated URLs and uses HTTP traces in its experiments. HTTP traces act as direct chains between malicious landing pages and malware binaries, and these chains can be used to detect malicious URLs. Such problems occur in information sharing services [17]. A technical report released by Google indicates that Google is looking to enhance its present malware detection schemes with up-to-date schemes [18]. In this paper we built a simulation mechanism in which the prototype application demonstrates the proof of concept with empirical studies.

3. Proposed Solution

We built a web-based application that demonstrates the concept of real-time detection of suspicious content over Twitter streams. We used the mechanisms proposed in [4] for malicious content detection in tweets. The functionality of the proposed system is presented in Figure 1, which is the use case model of the system.

[Use case diagram. Labels: User; Followers; Register; login; posts tweet; view all tweets; View user tweets; mytweets; delete tweets; delete user tweets; view users; follow users; follow and unfollow the users; Following user; view followers; view followers tweets; data collection urls; redirect chain urls; domain wise urls; crawlers browsers urls; normal browsers urls; detect attacker urls (WarningBird); blocked urls; view blocked url posted users; block user accounts who posted blocked urls]

Figure 1 – Use case model of the system

The architecture of the system is presented in Figure 2. It has several layers, including data collection, feature extraction, and training. After the training phase, it performs classification in order to detect suspicious URLs.

Figure 2 – Architectural overview of the system [4]

As Figure 2 shows, the data collection layer has a tweet queue into which tweets are pushed from time to time. Tweets containing URLs are collected from the Twitter stream, their URL redirections are crawled in order to identify the tweets to be studied, and those tweets are pushed into the queue. The feature extraction layer takes the tweets from the queue and finds the entry point URLs. Feature vectors are then constructed. The training phase takes the feature vectors as input and builds a classifier that is capable of identifying malicious URLs. From the user's point of view, Figure 3 shows the flow of the proposed application.
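The data collection and feature extraction layers described above can be illustrated with a minimal Python sketch. It is not the prototype's code (the prototype was built with Servlets, JSP, and JDBC). The redirect table `REDIRECTS`, the tweet fields, the sample tweets, and the rule that the entry point is the most frequently shared URL in a chain are all assumptions made for illustration; the feature names follow those reported in Section 5.

```python
from collections import Counter, deque

# Hypothetical redirect table standing in for a real crawler: each URL maps
# to the next URL in its redirect chain (None marks a landing page).
REDIRECTS = {
    "http://short.example/a": "http://hub.example/r",
    "http://short.example/b": "http://hub.example/r",
    "http://hub.example/r": "http://landing.example/x",
    "http://landing.example/x": None,
    "http://news.example/story": None,
}

# Made-up tweets; the field names are assumptions, not Twitter API fields.
SAMPLE_TWEETS = [
    {"user": "u1", "url": "http://short.example/a", "source": "web"},
    {"user": "u2", "url": "http://short.example/b", "source": "bot-app"},
    {"user": "u3", "url": "http://news.example/story", "source": "web"},
    {"user": "u4", "url": None, "source": "web"},
]


def crawl_chain(url, redirects=REDIRECTS, max_hops=10):
    """Follow redirects from url and return the chain of visited URLs."""
    chain = [url]
    while len(chain) <= max_hops:
        nxt = redirects.get(chain[-1])
        if nxt is None or nxt in chain:  # landing page reached, or a loop
            break
        chain.append(nxt)
    return chain


def collect(tweets):
    """Data collection: queue every tweet that has a URL, with its chain."""
    queue = deque()
    for tweet in tweets:
        if tweet.get("url"):
            queue.append({**tweet, "chain": crawl_chain(tweet["url"])})
    return queue


def extract_features(queue):
    """Feature extraction: group tweets by entry point URL, build vectors."""
    # Count in how many redirect chains each URL appears.
    counts = Counter(u for item in queue for u in set(item["chain"]))
    groups = {}
    for item in queue:
        # Assumed rule: the entry point is the most frequently shared URL in
        # the chain; ties go to the URL that appears earliest in the chain.
        entry = max(item["chain"], key=lambda u: counts[u])
        groups.setdefault(entry, []).append(item)
    vectors = {}
    for entry, items in groups.items():
        vectors[entry] = {
            "length": max(len(i["chain"]) for i in items),
            "frequency": len(items),
            "init_urls": len({i["chain"][0] for i in items}),
            "source_apps": len({i["source"] for i in items}),
        }
    return vectors


if __name__ == "__main__":
    for entry, vector in extract_features(collect(SAMPLE_TWEETS)).items():
        print(entry, vector)
```

In this sketch the two shortened URLs share the same intermediate hop, so they are grouped under one entry point and contribute to a single feature vector, which mirrors the idea of correlating redirect chains instead of inspecting each URL separately.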

Figure 3 – Flow of data with respect to processes

The flow diagram in Figure 3 illustrates the sequence of events that leads to the detection of malicious URLs.

4. Prototype Application

We built a prototype application that demonstrates the proof of concept for malicious URL detection in tweets. The environment used to build the application is a PC with 4 GB of RAM and a Core 2 Duo processor, running the Windows 7 operating system. The application is built using Servlets, JSP, and JDBC. The prototype demonstrates both the tweeting functionality and the identification of malicious URLs.

Figure 4 – UI with tweets

As Figure 4 shows, the proposed application supports tweeting. The tweets received on the user's wall are displayed, and they can be deleted if they are not required. However, the aim of the system is to detect malicious tweets that contain suspicious URLs.

Figure 5 – UI showing following users in social network

As Figure 5 shows, the user can view the users being followed and choose to unfollow any of them. However, the focus of this study is on tweets posted with URLs that might lead to suspicious content.

Figure 6 – UI with detection functionality

As Figure 6 shows, the application is able to detect tweets that contain URLs of a malicious nature. The classifier described in the previous section labels such tweets so that the system can detect them with ease.
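The classification step can be sketched as follows. The paper does not state which classifier the prototype uses, so this example assumes a plain logistic regression trained by batch gradient descent, written in pure Python. The two features (a normalized chain length and a normalized frequency), the toy samples, and the function names `train` and `is_suspicious` are hypothetical and serve only to show how feature vectors could be turned into a malicious or benign decision.

```python
import math

# Made-up training data: [normalized chain length, normalized frequency].
TOY_SAMPLES = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7],
               [0.1, 0.2], [0.2, 0.1], [0.3, 0.2]]
TOY_LABELS = [1, 1, 1, 0, 0, 0]  # 1 = malicious, 0 = benign


def sigmoid(z):
    # Clamp z to avoid overflow in math.exp for extreme scores.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def score(x, weights, bias):
    """Linear score w.x + b for one feature vector."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train(samples, labels, epochs=500, lr=0.5):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = sigmoid(score(x, weights, bias)) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        # Average the gradients over the batch before stepping.
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias


def is_suspicious(x, weights, bias, threshold=0.5):
    """Flag a feature vector when its predicted probability is high enough."""
    return sigmoid(score(x, weights, bias)) >= threshold


if __name__ == "__main__":
    w, b = train(TOY_SAMPLES, TOY_LABELS)
    print(is_suspicious([0.85, 0.9], w, b), is_suspicious([0.15, 0.1], w, b))
```

A linear model is used here only because it is short and easy to inspect; any classifier that accepts numeric feature vectors could take its place.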

5. Experimental Results

Experiments were carried out on both malicious and benign behavior. The results show how the following features vary by month: length, frequency, number of initial URLs, account creation date, follower-friend, and number of source applications. The average values of these features and their variations are presented in the following graphs.
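The per-month averages plotted in the graphs can be computed with a simple aggregation. The following Python sketch assumes each observation is a record with hypothetical `label`, `month`, and feature fields; the sample values are made up and are not the paper's data.

```python
from collections import defaultdict
from statistics import mean

# Months in the order used on the horizontal axis of the graphs.
MONTHS = ["May", "June", "July", "Aug", "Sep", "Oct", "Nov"]

# Made-up observations for illustration only.
SAMPLE_RECORDS = [
    {"label": "malicious", "month": "May", "length": 3},
    {"label": "malicious", "month": "May", "length": 4},
    {"label": "benign", "month": "May", "length": 1},
    {"label": "benign", "month": "May", "length": 2},
    {"label": "malicious", "month": "June", "length": 2},
]


def monthly_averages(records, feature):
    """Return {label: {month: average value}} for one feature."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[(rec["label"], rec["month"])].append(rec[feature])
    result = defaultdict(dict)
    for (label, month), values in buckets.items():
        result[label][month] = mean(values)
    # Keep only months that have data, in chart order.
    return {
        label: {m: by_month[m] for m in MONTHS if m in by_month}
        for label, by_month in result.items()
    }


if __name__ == "__main__":
    print(monthly_averages(SAMPLE_RECORDS, "length"))
```

The same function can be called once per feature (for example `"frequency"` or `"init_urls"`, if the records carry those fields) to obtain the two series shown in each graph.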

[Chart: horizontal axis Month (May to Nov); vertical axis Length (0 to 4); series: malicious, benign]

Figure 7 – Variations of average values of length

Figure 7 plots the benign and malicious cases. The horizontal axis represents the month, while the vertical axis represents length. The average length is higher for malicious content than for benign content.

[Chart: horizontal axis Month (May to Nov); vertical axis Frequency (0 to 1.2); series: malicious, benign]

Figure 8 – Variations of average values of frequency

Figure 8 plots the benign and malicious cases. The horizontal axis represents the month, while the vertical axis represents frequency. The average frequency is higher for malicious content than for benign content.

[Chart: horizontal axis Month (May to Nov); vertical axis # init URLs (0 to 0.9); series: malicious, benign]

Figure 9 – Variations of average values of # init URLs

Figure 9 plots the benign and malicious cases. The horizontal axis represents the month, while the vertical axis represents the number of initial URLs (# init URLs). The average number of initial URLs is higher for malicious content than for benign content.

[Chart: horizontal axis Month (May to Nov); vertical axis Account creation date (0 to 16); series: malicious, benign]

Figure 10 – Variations of average values of account creation date

Figure 10 plots the benign and malicious cases. The horizontal axis represents the month, while the vertical axis represents the account creation date feature. The average value of this feature is higher for malicious content than for benign content.

[Chart: horizontal axis Month (May to Nov); vertical axis Follower-Friend (0 to 0.2); series: malicious, benign]

Figure 11 – Variations of average values of follower-friend

Figure 11 plots the benign and malicious cases. The horizontal axis represents the month, while the vertical axis represents the follower-friend feature. The average value of this feature is higher for malicious content than for benign content.

[Chart: horizontal axis Month (May to Nov); vertical axis # Source App (0 to 0.7); series: malicious, benign]

Figure 12 – Variations of average values of # source app

Figure 12 plots the benign and malicious cases. The horizontal axis represents the month, while the vertical axis represents the number of source applications (# source app). The average number of source applications is higher for malicious content than for benign content.

6. Conclusion and Future Work

In this paper we studied the problem of detecting suspicious URLs on OSN user walls. Since Twitter is one of the best-known OSNs, our study focused on it. The conventional detection mechanisms are inadequate for detecting malicious content in tweets. The handling of operations such as URL redirection and conditional redirection can be improved further, as explored in [4]. In this paper we implemented those concepts and mechanisms in a user-friendly fashion. We built a prototype application that demonstrates the proof of concept. We followed an architecture with multiple layers, where each layer takes care of certain functionality. Finally, the tweets with malicious content are classified, and the results show that the prototype application is useful. The empirical results show the difference between malicious and benign content as classified in the experiments. In the future we will improve the system with more robust classifiers that exploit additional rules or policies.

REFERENCES

[1] M. Cova, C. Kruegel, and G. Vigna, “Detection and analysis of drive-by-download attacks and malicious JavaScript code,” in Proc. WWW, 2010.
[2] Y.-M. Wang, D. Beck, X. Jiang, R. Roussev, C. Verbowski, S. Chen, and S. King, “Automated web patrol with Strider HoneyMonkeys: Finding web sites that exploit browser vulnerabilities,” in Proc. NDSS, 2006.
[3] Capture-HPC, https://projects.honeynet.org/capture-hpc.
[4] S. Lee and J. Kim, “WarningBird: A near real-time detection system for suspicious URLs in Twitter stream,” IEEE Transactions on Dependable and Secure Computing, vol. X, no. Y, January 2012, pp. 1-14.
[5] F. Benevenuto, G. Magno, T. Rodrigues, and V. Almeida, “Detecting spammers on Twitter,” in Proc. CEAS, 2010.
[6] A. Wang, “Don’t follow me: Spam detecting in Twitter,” in Proc. SECRYPT, 2010.
[7] K. Lee, J. Caverlee, and S. Webb, “Uncovering social spammers: Social honeypots + machine learning,” in Proc. ACM SIGIR, 2010.
[8] G. Stringhini, C. Kruegel, and G. Vigna, “Detecting spammers on social networks,” in Proc. ACSAC, 2010.
[9] C. Yang, R. Harkreader, and G. Gu, “Die free or live hard? Empirical evaluation and new design for fighting evolving Twitter spammers,” in Proc. RAID, 2011.
[10] C. Grier, K. Thomas, V. Paxson, and M. Zhang, “@spam: The underground on 140 characters or less,” in Proc. ACM CCS, 2010.
[11] J. Song, S. Lee, and J. Kim, “Spam filtering in Twitter using sender receiver relationship,” in Proc. RAID, 2011.
[12] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, “Identifying suspicious URLs: An application of large-scale online learning,” in Proc. ICML, 2009.
[13] J. Ma, L. K. Saul, S. Savage, and G. M. Voelker, “Beyond blacklists: Learning to detect malicious web sites from suspicious URLs,” in Proc. ACM KDD, 2009.
[14] D. K. McGrath and M. Gupta, “Behind phishing: An examination of phisher modi operandi,” in Proc. USENIX LEET, 2008.
[15] C. Whittaker, B. Ryner, and M. Nazif, “Large-scale automatic classification of phishing pages,” in Proc. NDSS, 2010.
[16] J. Zhang, C. Seifert, J. W. Stokes, and W. Lee, “ARROW: Generating signatures to detect drive-by downloads,” in Proc. WWW, 2011.
[17] H. Kwak, C. Lee, H. Park, and S. Moon, “What is Twitter, a social network or a news media?” in Proc. WWW, 2010.
[18] M. A. Rajab, L. Ballard, N. Jagpal, P. Mavrommatis, D. Nojiri, N. Provos, and L. Schmidt, “Trends in circumventing web-malware detection,” Google, Tech. Rep., 2011.

AUTHORS:

K. Mohan Kumar is a student at Gandhiji Institute of Science and Technology, Jaggayyapet, AP, India. He received a B.Tech degree in Computer Science and Engineering and an M.Tech degree in Computer Science and Engineering. His main research interests include Cloud Computing, Databases, and DWH.

S. Saritha is working as HOD of the Computer Science and Engineering department at Gandhiji Institute of Science and Technology, Jaggayyapet, AP, India. She received a B.Tech degree in Computer Science and Engineering and an M.Tech degree in Computer Science and Engineering. Her main research interests include Cloud Computing and DWH.

B. Srikanth is working as an Associate Professor at Gandhiji Institute of Science and Technology, Jaggayyapet, AP, India. He received a B.Tech degree in Computer Science and Engineering and an M.Tech degree in Computer Science and Engineering. His main research interests include Cloud Computing and Networking.
