Combining Visualization and Interaction for Scalable Detection of Anomalies in Network Data

Robert F. Erbacher
Department of Computer Science, UMC 4205, Utah State University, Logan, UT 84322
Phone: 435-797-3291, Fax: 435-797-3265, E-Mail: [email protected]

Karen A. Forcht
Department of Business Education, North Carolina A&T State University, Merrick Hall Room 324, 1601 East Market Street, Greensboro, NC 27411
Phone: 336-334-7657 ext. 7025, Fax: 336-256-2276, E-Mail: [email protected]

Abstract— This paper examines the application of visualization to the identification and analysis of sophisticated network attacks. Given the size and chaotic nature of the data that must be analyzed in order to identify such attacks, novel integrations of visualization and interaction are required. Essentially, the design of the visualization technique had to be performed hand in hand with the design of the interaction techniques, to ensure that when clusters of activity are identified and need analysis, the user is able to interact with those clusters. This differs from most visualization work, which does not allow for such direct manipulation and thus greatly limits the usability of many techniques for this type of data. This paper discusses the developed visualization techniques and presents real-world data examples in which both injected and actual attacks are identified. This identification required examining, and removing from consideration, activity deemed to be innocuous.
Keywords: Visualization, Network Data, Visual Analytics, Computer Security, Sophisticated Attacks

I. INTRODUCTION
Intrinsically, identifying unexpected events (anomalies) within a large database requires the integration of effective visualization and interaction techniques. Typically, researchers have focused on designing visualization techniques to be as effective as possible for a given dataset or range of datasets. Only after the visualization techniques have been designed are interaction techniques designed and integrated, in order to make more effective use of the visualization. This is most commonly done through a detail view or probing. This typical process fails with modern datasets, which may be terabytes or petabytes in size.

Such datasets will include not only large numbers of data elements but also large numbers of parameters needing representation. With data at this scale there are two fundamental issues:
• Visualization techniques simply cannot represent even a reasonable fraction of the data elements relative to the size of the data. Research into clustering, scalability, large displays, etc. is geared towards reducing this bottleneck. Data collection, however, is growing at a far faster rate than representation and analysis capability.
• Direct manipulation of data elements at this volume becomes difficult with visualization techniques designed independently of interaction. Selecting, probing, or filtering data elements can be tedious.
The goal of this work has been to develop integrated visualization and interaction techniques, designed from the ground up to be effective with large-scale databases. While the target of this work has been the detection of sophisticated network scans, the approach and specific techniques are subject to generalization.
A. Process Model Example
Consider, for instance, the need to monitor network data for attacks at a primary router of an organization. Data on such a primary router can arrive at 1 Gb/s or higher with current hardware and a single interface. Obviously, the goal is to analyze the data and identify attacks, especially sophisticated attacks, within the arriving data. This process entails analyzing available data, removing innocuous data from consideration, and identifying the details of identified attacks. Given the rate at which attacks are occurring [2], there is a need to analyze this volume of data continuously and rapidly. With respect to the problem domain, visually representing even a subset of such large amounts of data simultaneously is not feasible; this would lead to enormous occlusion long before the entire data set could be represented.

The visualization environment, therefore, must allow for subsets of the data to be rapidly displayed, analyzed, and resolved through direct manipulation. Resolution will necessarily incorporate filtering of resolved data elements, thus allowing the user to focus on the remaining or newly arriving elements. Resolution may simply be the determination that the data is innocuous, a naïve attack of no concern, or a sophisticated attack requiring immediate measures.
II. APPLICATION DOMAIN
Attacks on computer networks generally consist of five stages: reconnaissance, probing/scanning, attack, compromise/digging in, and migration. Given the sensitivity of many of today's networked computer systems, any form of successful compromise is unacceptable and is considered too late in the protection and defense of such systems. For example, a compromise of 911 facilities, power plant control systems, etc. would be disastrous. Thus, detection and response must take place before a compromise can be completed. This requires that detection occur at the earlier stages, particularly during the scanning and attack stages.

The reconnaissance stage generally takes place without the attacker accessing the target network. The attacker attempts to gain insight into the network organization and implementation in order to reduce the possibility of detection during the actual attack. Therefore, detection and prevention during the probing/scanning stage (or the earliest part of an attack) are the only viable means of thwarting an attack. This work focuses on the identification and analysis of network scans. This differs significantly from prior work on applying visualization to network security, which has focused on situational awareness [10][11], large-scale events [6][13], small-scale port analysis [3][9], IDS alarms [1], host processes [5], task-based approaches [17], etc. Instead, building on previous work [4], the technique presented here focuses on detecting the low and slow scanning portion of an attack, the most sophisticated and challenging of such scans.

A. Scanning
In terms of scanning, the research described in this paper is concerned with the sophisticated scans often indicative of a more capable attacker. Sophisticated scans are troublesome as they cannot feasibly be detected algorithmically without large numbers of false positives and negatives. This is in contrast with naïve scans, which are easily detected and automatically blocked by most firewalls, both hardware and software based. Thus, not only are naïve scans indicative of an unsophisticated attacker, but they also should not provide any useful information to the attacker, assuming the network has been configured with a modicum of protection. Consequently, security analysts, network analysts, system administrators, etc. are not concerned with naïve scans and do not wish to be notified of them. This lack of desire for notification also results from the sheer number of such naïve scans occurring on a daily basis. Thus, the focus of this work is on visualization and interaction techniques directed at the identification and analysis of sophisticated network scans. Particular emphasis is placed on low and slow scans, though examples of other sophisticated scans are also shown. Given the infeasibility of purely algorithmic techniques, this focus has proven especially well suited for visualization techniques; i.e., there are currently no general solutions that allow detection and blocking of sophisticated scans. This paper describes the developed visualization technique and associated analysis capabilities integrated into the extensible and scalable visualization environment developed at Utah State University, termed AdviseAid (Analysis and Detection Visualization Environment for Attack and Intrusion Defense). This paper further shows examples of the effectiveness of the developed capabilities with real, unadulterated network traffic data, which is typical of what would need to be analyzed in a live setting.

All network scans have similar characteristics. Such scans are the result of a program sending a number of network packets to a target machine; the program then derives characteristics of that machine from the responses received to the transmitted packets. In effect, the scanning program sends a connection request to ports that are known to provide typical network-based services. At the very least, a successful scan will provide the attacker with information as to which services are running on the target machine, perhaps even the version of the service daemon. More extensive scanning allows the attacker to identify what operating system, and perhaps what version of the operating system, is being run. This information can be used by an attacker to determine if known vulnerabilities exist on the target machine. This can greatly simplify the attack process, as success may be achieved with fewer attack attempts. Nmap is the most commonly used tool for performing such a network scan; in fact, it is the tool used as the basis for the research described here.
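
To make the probing mechanism concrete, the following is a minimal sketch, not the authors' tooling, of the kind of TCP connect probe a scanner such as nmap automates; the port list, timeout, and target address (a TEST-NET documentation address) are illustrative assumptions.

import socket

# Illustrative set of well-known service ports a scanner might probe.
COMMON_PORTS = [21, 22, 23, 25, 80, 110, 143, 443, 3306]

def probe(target: str, port: int, timeout: float = 1.0) -> str:
    """Attempt a TCP connection and classify the port from the response."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((target, port))      # handshake completed: a service is listening
        return "open"
    except ConnectionRefusedError:     # RST received: the port is closed
        return "closed"
    except socket.timeout:             # no answer: likely filtered by a firewall
        return "filtered"
    finally:
        s.close()

if __name__ == "__main__":
    for p in COMMON_PORTS:
        print(p, probe("192.0.2.10", p))

The pattern of open, closed, and filtered responses is exactly the information the text above describes the attacker harvesting.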

1) Naïve Scanning
Naïve scanning generally sends the probing packets in rapid succession, allowing the scan to complete very quickly. Additionally, these scanning packets are generally conspicuous or merely camouflaged in trivial ways. The speed with which such scans are performed is what makes them so easy to detect: firewalls merely need to examine port accesses occurring in a short window of time and block accesses from an identified remote host that attempts to access multiple ports.
B. Sophisticated Scanning
In contrast with naïve scans, a sophisticated scan takes deliberate measures to avoid detection. Typical techniques to avoid detection include:

• Fragmentation – By fragmenting packets, especially the packet header, the firewall will not be able to determine whether the packets are destined for different ports or originate from the same host [15].
• Low and slow scans – Low and slow scans reduce the rate at which packets are transmitted [7]. Nmap provides automatic timing options to temporally distribute a scan such that a packet is transmitted every 0.4, 15, or 300 seconds. The research presented here focused on the 300 second interval; this timing is most indicative of the greatest threat and validates the scalability of the visualization techniques and associated analysis techniques. Detecting such a scan would require the firewall's inspection window to be impractically large: automated algorithms would need to relate every packet (namely its source IP, destination IP, and target port) generated within a 300 second span in order to identify a progression in the scan (see the detection sketch following this list).
• Distributed scans – A distributed scan distributes the scan among multiple attack machines [7]. Since the accesses to multiple ports do not come from a single machine, the scan avoids detection.
• Application of non-standard protocols – By using TCP SYN packets, FIN packets, etc., rather than typical TCP connect packets, it can be more difficult for the target system to respond correctly without providing the attacker with information. For example, based on RFC 793 [19], a closed port (a port not providing a service) must respond with a RST packet while an open port must ignore the packet. Thus, any response, or lack thereof, provides the attacker with information as to the nature of the port: closed vs. open or filtered.
• Focused scans – The reason naïve scans are so easily detected is their volume of activity. A focused scan reduces this by sending only a limited number of scan packets. Thus, instead of trying to identify all systems on the network and the services they have available through one massive scan, a focused scan takes information garnered elsewhere and targets only a few specific machines, or one specific port, in order to identify targeted service vulnerabilities. This reduced activity makes such scans more difficult to detect.
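
To illustrate why the window matters, the following is a sketch of a conventional threshold detector; the 60 second window and 10-port threshold are assumed values for illustration, not figures from the paper. Probes arriving every 0.4 seconds trip the threshold almost immediately, while probes spaced 300 seconds apart never leave more than one packet in any window.

from collections import defaultdict, deque

WINDOW_SECONDS = 60      # assumed firewall inspection window
PORT_THRESHOLD = 10      # assumed number of distinct ports that triggers an alert

class WindowedScanDetector:
    """Flag a source that touches many distinct local ports within one window."""
    def __init__(self):
        self.history = defaultdict(deque)   # source IP -> deque of (timestamp, port)

    def observe(self, ts: float, src_ip: str, dst_port: int) -> bool:
        q = self.history[src_ip]
        q.append((ts, dst_port))
        while q and ts - q[0][0] > WINDOW_SECONDS:   # expire packets older than the window
            q.popleft()
        return len({port for _, port in q}) >= PORT_THRESHOLD

det = WindowedScanDetector()
# A naive scan, one probe every 0.4 s, is flagged within seconds.
print(any(det.observe(0.4 * i, "203.0.113.5", 1000 + i) for i in range(20)))    # True
# A low and slow scan, one probe every 300 s, never accumulates enough evidence.
det = WindowedScanDetector()
print(any(det.observe(300.0 * i, "203.0.113.5", 1000 + i) for i in range(20)))  # False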

This is by no means a complete list of techniques that may be applicable. Of these techniques, fragmentation and non-standard protocols can be prevented through the use of state-based firewalls at the network perimeter; however, such firewalls are resource intensive. While the research presented in this paper focuses on low and slow scans, the developed visualization and analysis capabilities are also applicable to distributed and focused scans, examples of which are shown.
III. THE VISUALIZATION TECHNIQUE
The visualization design is exemplified in Fig. 1a. The idea behind this technique is to represent network activity as efficiently and concisely as possible. This involves handling as much activity as possible and beginning to resolve some of the many scalability issues inherent in IDS (Intrusion Detection System) data. Additionally, this technique is intrinsically designed to be effective for interaction with large-scale data, as exemplified by network traffic data. The visualization technique has characteristics similar to other techniques such as Flowtag [12], which uses two axes of a parallel coordinate plot for the presentation of relationships. The technique presented in this paper allows for the representation of more parameters, the temporal localization of activity, reduced occlusion, more efficient use of screen real estate, etc.


(a) (b) Fig. 1: Left: Basic diagram of the network activity monitoring visualization technique. Right: Basic visualization technique showing network activity from a raw pcap (packet capture) file. Notice that the full range of remote IP addresses is duplicated on the top and bottom edges of the display. Likewise, local ports are duplicated along the left and right edges of the display.
The developed visualization technique in its default form begins by representing the local IP (Internet Protocol) address of a connection around the radius of the internal circles.

The technique then represents remote IP addresses along the top and bottom edges of the window. This redundancy aids in reducing clutter and line crossings: the top edge is used if the local IP address appears in the top semicircle and the bottom edge is used if the local IP address appears in the bottom semicircle. Similarly, port numbers are represented on the left and right edges of the window. If the local IP address appears in the right semicircle then the right edge of the screen is used, and if the local IP address appears in the left semicircle then the left edge of the screen is used. Essentially, this technique uses the inherent rectilinear nature of the display to provide segregation between the port numbers and remote IP addresses. The outer rings represent the most current data while each inner ring represents data m*n time units older, where m is the ring number and n is a user-configurable parameter.

Each of these mappings is user configurable. Thus, the user interface allows the user to quickly and dynamically change the data parameters mapped to each of the visual attributes; the behavior of the visualization itself remains intrinsically the same. By default, ten rings are presented; the number of rings is also user configurable. The multiple rings thus provide for the persistence of the display, allowing extensive periods of time to be represented simultaneously. The use of multiple rings allows for the differentiation of activity within and between temporal periods. In other words, this technique can differentiate the most recent activity, long-term activity, and old activity. This allows for more extensive interpretation and analysis of the identified activity than if the visualization had only a single ring. An example of this visualization technique in action is shown in Fig. 1b. This image shows an example using actual network traffic data acquired from the Air Force Research Lab in Rome, NY. As can be seen from this display, the rings associated with older data have lower intensity to reduce their impact while maintaining the information for the analyst. Line crossings, other than those crossing ring boundaries, are significantly limited, not only in number but also in visual impact. A future goal will be to reduce these even further. Analysis of this display shows that there is consistent activity from several remote hosts. This display is limited in that it does not indicate whether the activity exhibited by the remote hosts is related. The duration of time represented is also still limited.
A. Improving Persistence
The first enhancement critical to the visualization technique is geared towards handling the temporal scale of the data. In particular, the time units of the rings described previously are intrinsically linear. While the duration of a time unit is adjustable, making the time unit too long will inhibit the analysis of current activity and the ability to temporally segregate and analyze the data.

The temporal association of this data is too critical to be cast aside so readily. Thus, this research has begun to explore other scales; namely, the scale can be changed from its default linear scale to an exponential one. The impact of this exponential scale is exhibited in Fig. 2a. Clearly, this will allow the representation of far more data, and far longer durations of time, than would otherwise be possible. This enhancement will also have the advantage of collating information over far longer periods of time. It is this enhancement that will provide analysts with the ability to identify low and slow scans, as the inner rings will accumulate the activity, making the activity far more readily observable with far less effort than would otherwise be required.
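
The ring mapping just described can be sketched as follows. This is an illustrative reconstruction rather than the AdviseAid implementation, and the parameter values (a 30 second base time unit, ten rings, an exponent increment of 0.25) as well as the exact form of the exponential boundaries are assumptions based on the description above and the labels in Fig. 2a.

import math

def ring_index(age_seconds: float, n: float = 30.0, rings: int = 10,
               exponential: bool = False, increment: float = 0.25) -> int:
    """Map a packet's age to a ring index (0 = outermost, most recent).

    Linear mode: ring m holds data up to (m + 1) * n seconds old.
    Exponential mode (one plausible reading of the paper's scale): ring m
    holds data up to n ** (1 + increment * m) seconds old.
    """
    for m in range(rings):
        limit = n ** (1 + increment * m) if exponential else (m + 1) * n
        if age_seconds < limit:
            return m
    return rings - 1   # anything older accumulates in the innermost ring

def angle_for(local_ip_hash: int) -> float:
    """Place a local IP address at a fixed angle around the rings (radians)."""
    return (local_ip_hash % 360) * math.pi / 180.0

# With a 30 s time unit, a 10-minute-old packet is already clamped to the
# innermost linear ring (index 9) but only reaches ring 4 on the exponential scale.
print(ring_index(600.0), ring_index(600.0, exponential=True))

The point of the sketch is the persistence argument made above: under the linear scale, old activity is pushed onto the innermost ring quickly, whereas the exponential scale lets the inner rings cover ever larger spans of time and therefore accumulate a low and slow scan.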


(a) (b) Fig. 2: Left: Basic diagram of the network activity monitoring visualization technique with an exponential time scale and an exponential increment of 0.25. Right: The basic visualization technique showing network activity from a raw pcap file, modified to provide an exponential time scale for each of the rings, as opposed to the default linear scale shown in Fig. 1.
The same data shown in Fig. 1b is represented again in Fig. 2b, this time using the exponential scale.

The exponential scale clearly accumulates the data. Furthermore, it can clearly be seen that there are two port scans occurring against two distinct local hosts in rapid succession. The fact that the two overlapping scans appear in different rings maintains the temporal relationship that would otherwise be lost. It is also clear that the port scan on the left side of the screen (port scan #1) is far more active at the displayed time point, given that it encompasses two rings with far more ports being hit. Since these scans were not visible in the first representation, this example has in essence validated the effectiveness of the exponential representation in aiding resolution of the persistence problem. In terms of the quality of the visualization, the port scans stand out quite clearly from the background activity and noise. The visualization works well to bring out the anomalous activity perceptually. By creating such a strong solid region, the user's attention is quickly drawn to the activity, making identification nearly instantaneous, such that it can be analyzed and resolved appropriately. Examining such data using traditional techniques would take much longer. Thus, the exponential technique will be effective even for the low and slow scans indicative of a sophisticated attacker, of which an analyst or systems administrator would want to be aware.
B. Enhancing Fidelity
The visualization techniques provide, in essence, temporal information through the ring-based metaphor in conjunction with either the linear or exponential scale. Switching between these modes provides the user with the ability to control the persistence vs. fidelity tradeoff. In order to further improve the fidelity of the visualization, the developed test environment allows the user to modify two additional characteristics of the visualization. First, the exponent increment can be specified. Rather than limiting the increment to integer values, the increment can be set to any positive fractional value; for example, the examples shown here use an exponential increment of 0.25.

Second, the number of rings can be increased or decreased. Increasing the number of rings allows the exponent increment to be decreased while still preserving the accumulation of temporal information. In addition, it allows the user, with either a linear or exponential scale, to more finely examine the temporal relationships between elements. Combined, these two capabilities provide the analyst with the ability to tightly control how the display responds to temporal activity and improve the analyst's ability to comprehend and interpret the temporally based activity. Since the exponential time scale and the number of rings can be adjusted on the fly, a system administrator or analyst can adjust the associated parameters instantly depending upon current needs. More often than not, the analyst will leave the settings to allow for the display of long time durations in order to detect long-running attacks. Shorter durations will be used intermittently to analyze more short-term attacks. Ultimately, analysts will know their network and what types of attacks it receives, and thus will have an instinctual comprehension of the time durations of greatest use to them.
IV. ANALYTIC CAPABILITIES
Given the visualization in Fig. 2b, several port scans have clearly been identified. However, additional capabilities are required to garner meaning from this anomalous activity. At the most basic level, selection and probing capabilities are provided, as exhibited in Fig. 3a. Merely clicking on a connection selects and highlights that connection and provides detailed specifics of that connection. Multiple selections are allowed, with each selection being given a unique random color to allow that selection to be followed independently. Details are provided on the screen as to the nature of the most recently selected connection: local IP address, remote IP address, and local port.


(a) (b) Fig. 3: Basic visualization technique showing network activity from a raw pcap file. Three selections are presented with the associated linkages re-colored light green, dark green, and blue. The most recent selection also provides probing detail. The interface component (right) shows the detailed packet header contents for the selected packets (left). The entire packet header is provided, along with the highlight color of the selected packet. Prior selections can be examined through the tabbed menu.
Additionally, the full details of the packet are provided in the main control window, Fig. 3b. In this display, each selected packet is shown relative to the time at which the packet was generated. The color of the selection is provided for reference. All header information is provided with respect to the packet in this display. Multiple ways of removing a selection are provided, including a remove button for each individual packet as well as a "Clear All Highlights" menu item. In addition to the ability to select individual packets, the developed test environment provides the ability to select multiple packets, Fig. 4a.

Selecting multiple packets is performed by dragging a bounding rectangle over the desired connections. This aids analysis by focusing attention on a select subset of activity. This is particularly helpful in reducing the impact of noise from background activity.
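
A minimal sketch of this kind of rubber-band selection is given below; the Connection record and its screen-coordinate fields are hypothetical stand-ins for however the environment actually stores drawn packets.

from dataclasses import dataclass

@dataclass
class Connection:
    # Hypothetical record for a drawn connection: the screen position of its
    # endpoint plus the packet fields later summarized in the selection dialog.
    x: float
    y: float
    remote_ip: str
    local_ip: str
    local_port: int

def select_in_rectangle(connections, x0, y0, x1, y1):
    """Return every connection whose drawn endpoint falls inside the dragged box."""
    xmin, xmax = sorted((x0, x1))
    ymin, ymax = sorted((y0, y1))
    return [c for c in connections
            if xmin <= c.x <= xmax and ymin <= c.y <= ymax]

conns = [Connection(120, 80, "198.51.100.7", "10.0.0.5", 3306),
         Connection(400, 300, "203.0.113.9", "10.0.0.6", 22)]
print(select_in_rectangle(conns, 100, 60, 200, 120))   # only the first connection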

(a) (b) Fig. 4: A cluster of anomalous activity is selected by dragging a bounding rectangle over the packets. The fan effect applied to the local IP addresses indicates a focused scan. The identical ports hit on all machines indicate that a specific service is being sought out, with duplication of ports on both the left and right edges of the display. The details of the selected packets are presented in the interface dialog on the right. This interface dialog shows a summary of the selected packets for analysis, filtering, and detailed display.
When multiple connections are selected in this fashion, a distinct dialog box is displayed to summarize the information relevant to analysis, Fig. 4b. Information present in this display includes the timestamp, remote IP (RIP), remote port (RPort), local IP (LIP), and local port (LPort). Interface components allow individual elements to be removed or displayed in full detail. This summary is sufficient to identify the overall meaning behind the activity within the selection.

This will be explained in detail in the examples section.
V. FILTERING AND NOISE REMOVAL
The data sets examined in this paper are complete and unfiltered. They were captured through the use of Snort with no pre-filtering applied; in essence, the network traffic data was collected in its entirety. Thus, there is an enormous amount of information in these datasets which is not of value and in which analysts and system administrators have no interest. Typically, some data will be removed by careful tuning of Snort rule sets. Additionally, analysts will remove further data elements through pre-filtering and analytical filtering, removing data elements deemed not valuable during the analysis process. Given the volume and noise inherent to network traffic data, the use of such complete data sets more accurately models the real-world scenario analysts must deal with, even when some form of pre-filtering is applied. The extent of noise present in typical data sets can be seen in Fig. 4a. This is typical of network traffic data, and visual analysis capabilities designed for such raw data must be able to handle such noise. To this end, the environment provides extensive filtering capabilities to aid more rapid analysis. At the most basic level, the user can filter a connection from the visualization display by right clicking on it, Fig. 5. This allows the user to immediately create a filter through direct manipulation based on the remote IP address, the local IP address, the local port, or any combination thereof. A combination filter requires that a connection match all of its elements for the connection to be hidden; a minimal sketch of this matching rule is given below. Removing a filter at a later time will re-enable display of those hidden connections instantaneously.
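
The combination-filter rule just described can be expressed as a simple predicate. This is an illustrative sketch, not the environment's code; the field names are assumptions, and a field left as None means the filter does not constrain it.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Filter:
    # None means the field is unconstrained; a connection is hidden only if it
    # matches every field the filter actually specifies.
    remote_ip: Optional[str] = None
    local_ip: Optional[str] = None
    local_port: Optional[int] = None

    def hides(self, remote_ip: str, local_ip: str, local_port: int) -> bool:
        return ((self.remote_ip is None or self.remote_ip == remote_ip) and
                (self.local_ip is None or self.local_ip == local_ip) and
                (self.local_port is None or self.local_port == local_port))

def visible(conn, filters):
    """A connection is displayed only if no active filter hides it."""
    return not any(f.hides(*conn) for f in filters)

# Example: hide all web traffic (no local web server) and Samba traffic to one host.
filters = [Filter(local_port=80), Filter(local_ip="10.0.0.5", local_port=445)]
print(visible(("198.51.100.7", "10.0.0.5", 445), filters))   # False: hidden
print(visible(("198.51.100.7", "10.0.0.6", 445), filters))   # True: displayed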

Fig. 5: A popup window activated with the right mouse button allows instant access to filtering options for a selected packet.
Alternatively, the user can specify filters using the checkboxes on the right side of the multiple selection dialog box, Fig. 4b. This allows the user to examine a set of activity, identify the meaning of the activity, and, with minimal interference to the analytical process, hide undesirable elements from the display. It is this capability that allows the analyst to examine the display, determine that anomalous activity is a naïve scan, and remove it from the display without further ado. Finally, a separate interface is provided to enter known lists of filters, Fig. 6a. This list will ultimately be savable and restorable, allowing the user to specify a set list of filters for a network environment that can then be easily reused. Such filters would be based on the known configuration of the network. For example, if no web servers are installed in the local network, then accesses to port 80 are of no relevance. Thus, knowledge of the local network configuration is critical to the setup of filters. Additionally, known good or trusted connections or connection types can be removed from the display, greatly improving the analytic process.

(a) (b) Fig. 6: The dialog on the left allows the user to enter filter rules. This relies on knowledge of the network environment and of what would be considered anomalous or threatening. Examples of active filter rules are shown in the dialog on the right. The filter rules may be adjusted within this dialog.
The current list of active filters may be displayed and edited at any time, Fig. 6b. This display shows a typical set of filters the authors used during analysis of their data. No web server was available, so accesses to port 80 were not of importance. Additionally, as all of the systems in the lab were using Samba, there were enormous volumes of irrelevant activity on ports 137-139 and 445. After applying a realistic set of filters, much of the noise from Fig. 4 is greatly reduced, Fig. 7. This results in a much clearer, more easily comprehensible display. Any technology designed to analyze raw network traffic data will need to handle noise similar to that in Fig. 4; clearly, the developed capabilities provide for this in a successful fashion.

Fig. 7: The less cluttered display after many elements deemed to be unnecessary have been filtered. This includes port 80 (web) traffic, for which no service is provided locally, and Samba activity, for which there is a large volume of local activity.
The extensive filtering, and the ability to enable it through direct manipulation, are critical to allowing online analysis of network traffic data. Without these interaction capabilities, sophisticated attacks cannot be discovered in the morass of data.
VI. SCANNING EXAMPLES AND ANALYSIS
Examples of several scans have been shown previously, in Figs. 2 and 4. In order to analyze such a scan, a number of packets are selected from the anomalous activity, Fig. 8. As can be seen from this figure, the target (local) ports appear sequential, all of the packets have the same local IP address and remote IP address, and the timestamps indicate a very narrow period of time between packets. Most packets are received within 2/1000 of a second of the previous packet. This is a classic example of a naïve port scan; it is this narrow window between packets that allows such scans to be so easily detected algorithmically.

Fig. 8: This scan analysis shows the resolution of an anomalous cluster of activity (selected and highlighted in yellow). This example clearly demonstrates a naïve port scan that can be filtered from the display without further ado.
A. Uncovering a Hidden Attack
A second scan was exhibited in Fig. 4a. This scan is easily perceived due to the fan-like nature of the network activity. The fan in this case is centered on the local IP addresses, in contrast with the previous scan, which was centered on the local ports. The results of the selection are provided in Fig. 4b. This scan differs in that it sends only one or two packets to each target system, and all systems in the local network appear to be targeted by the scan. Again, all of the packets originate from the same remote host and within a very narrow period of time. Another difference is that all of the packets seem to be targeting the same port, namely port 3306. A quick lookup indicates that port 3306 is used by MySQL. Clearly, this is a virus or a worm attempting to identify systems with an active MySQL service, as there are many known vulnerabilities in this service. Identification of this scan allows the administrator to either ignore the warning, if no such services are running, or harden any systems that may have such a service running.

It is this rapid identification and analysis, through application of not just the visualization but also the interaction techniques, that uncovers the hidden activity. While the examples of low and slow scans were injected into the dataset, this is a real attack identified in the dataset that bypassed the firewall due to its low volume and changing target system. Without the combined visualization and interaction, this type of activity could not have been identified. Essentially all of the developed capabilities were required for its identification, including visualization, filtering, selection, and the detail view.
B. Uncovering a Low and Slow Scan
The principal focus of this work was to be able to detect low and slow scans. Correspondingly, a data set was created by collecting real network traffic data and running nmap against a single host using its 'Paranoid' timing option, which forces the scan packets to be sent at a rate of one every five minutes. This data set was previously exemplified in Fig. 4a. One anomalous cluster of activity, resulting in a scan for MySQL services, was just discussed. An additional anomalous cluster of activity appears in the lower right corner of the visualization. This is highlighted in Fig. 9a.

(a) (b)

Fig. 9: The display on the left shows the selection of an anomalous cluster of activity (again selected and highlighted in yellow). The analysis details are shown on the right. The ability to back trace the scan is of particular value in the analysis. The consistency of the remote IP address and local IP address and the complete set of local ports accessed clearly indicate this is a port scan. The timestamps, with five minute intervals between packets, clearly identify this as a sophisticated, low and slow, scan.
The analysis of this cluster is shown in Fig. 9b. Clearly, this resembles the first scanning example with one significant difference: the timestamps differ by far more than 2/1000 of a second, with approximately five minutes between packets. This is the low and slow scan initiated as a test for this research. The rings encompassed by the scan make it clear visually that the scan began a significant time ago and is continuing. Thus, many scans can be identified and analyzed visually, and the interaction provides full analysis and resolution of identified anomalies extremely rapidly. Also of note in this example is that the selection allows us to back trace to the originating system. Should this originating system be within the local network, an analyst would assume it had been compromised, and the visualization would provide a roadmap for identifying what other systems may be affected. Again, uncovering the sophisticated scan required all of the incorporated capabilities, as with the MySQL scan. However, identification of this scan also required correlation of activity over a large volume of data. Given that only one packet is being sent every five minutes, and that it takes a number of packets before the activity is identifiable, an enormous volume of data will be collected before the scan can be undeniably detected. When examining the activity through the visualization, much of it may appear as noise. The last two scan examples analyzed two clusters of activity; however, many more clusters of activity appear in this example data set.

Fig. 10 shows an example of such a cluster. In this case, however, a wide range of hosts is connecting to a wide range of ports over a significant period of time. The tight grouping of hosts or ports that is typically seen with a port scan, as exemplified by the examples in this section, does not appear in this cluster. This activity must therefore be considered innocuous. This highlights the focus of the developed technology: given the typically noisy nature of network traffic data, the technique presented in this paper requires just a few clicks by the user to fully analyze the data at hand and apply an appropriate response.
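
The distinction drawn in these examples, tight grouping along one axis versus a diffuse spread, can also be summarized numerically. The following is an illustrative sketch rather than a capability of AdviseAid; the thresholds mirroring the paper's reasoning (ten or more ports, a 600 second span, five or more local hosts) are assumptions.

def characterize(packets):
    """packets: list of (timestamp, remote_ip, local_ip, local_port) tuples."""
    remotes = {p[1] for p in packets}
    locals_ = {p[2] for p in packets}
    ports = {p[3] for p in packets}
    span = max(p[0] for p in packets) - min(p[0] for p in packets)

    # Many local ports from one remote host to one local host: a port scan,
    # and a long span makes it low and slow rather than naive.
    if len(remotes) == 1 and len(locals_) == 1 and len(ports) >= 10:
        label = "port scan (low and slow)" if span > 600 else "port scan (naive)"
    # One port across many local hosts: a focused scan for a specific service.
    elif len(remotes) == 1 and len(ports) == 1 and len(locals_) >= 5:
        label = "focused service scan"
    # A diffuse spread along every axis: likely innocuous background activity.
    else:
        label = "no obvious clustering; likely innocuous"
    return label, len(remotes), len(locals_), len(ports), span

# One remote host probing sequential ports on one target every 300 seconds.
pkts = [(300.0 * i, "203.0.113.5", "10.0.0.5", 1000 + i) for i in range(12)]
print(characterize(pkts))   # ('port scan (low and slow)', 1, 1, 12, 3300.0)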

Fig. 10: A cluster of activity, selected to aid analysis. The range of IPs, along with the range of ports, without significant clustering along any axis, identifies this as innocuous activity.
VII. ACHIEVING INFINITE SCALABILITY
An additional reason for the lack of effective algorithmic techniques for identifying low and slow scans is the ability of attackers to quickly change the rate at which the scanning is done. For example, while this research focused on collecting a data set in which the scan packets were sent every five minutes, as this is the most challenging scan nmap provides default options for, an attacker could easily increase the delay between packets to further avoid detection.

In fact, Green et al. [7] indicate that the most dedicated attackers will initiate scans in which only two scan packets are sent per day. The unique design of the described visualization technique allows it to handle such increases; at most, the exponential increment or the number of rings would need to be increased. The final obstacle to handling an infinite range of scalability is the resource and compute issue. Here, the unique visualization design offers a solution: the goal will be to allow the innermost ring to be generated using off-screen renderers. This can be done through accumulation buffers, pbuffers, or render-to-texture. Thus, the resource and compute consumption of the data elements represented within this innermost ring becomes fixed, allowing the system to handle any volume of data over any period of time. In order to reduce clutter in this innermost ring, the off-screen renderer can be updated whenever the user interacts with the environment and identifies new filters. At this point, performance is less of an issue due to the relatively slow speed of interaction as compared to the speed with which the off-screen renderer can re-compute its portion of the display. Additionally, rules can be created to specify how frequently to remove elements from the innermost ring that have aged beyond its timing parameters. By clearing the innermost ring only infrequently, excessive compute delays associated with the removal of very old packets are avoided. It is the unique ring design of the visualization technique that allows for this infinite scalability, a factor not applicable to most visualization techniques.
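
The fixed-cost idea behind the innermost ring can be illustrated independently of any rendering API: instead of retaining every aged packet, the oldest ring keeps only bounded per-bucket counts, so memory and redraw cost stay constant no matter how long the capture runs. This is a conceptual sketch under an assumed bucketing (remote /24 network and local port), not the off-screen rendering implementation itself.

from collections import Counter

class InnermostRingAccumulator:
    """Bounded summary of packets that have aged past the outer rings."""
    def __init__(self, max_buckets: int = 10000):
        self.counts = Counter()
        self.max_buckets = max_buckets

    def absorb(self, remote_ip: str, local_port: int) -> None:
        # Count activity per (remote /24, local port) bucket instead of per packet.
        bucket = (".".join(remote_ip.split(".")[:3]) + ".0/24", local_port)
        self.counts[bucket] += 1
        if len(self.counts) > self.max_buckets:
            # Keep only the busiest buckets when the cap is exceeded.
            self.counts = Counter(dict(self.counts.most_common(self.max_buckets)))

    def hottest(self, k: int = 3):
        return self.counts.most_common(k)

ring = InnermostRingAccumulator()
for i in range(1000):
    ring.absorb("203.0.113.5", 1000 + (i % 20))
print(ring.hottest())   # top buckets, e.g. [(('203.0.113.0/24', 1000), 50), ...]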

VIII. ANALYSIS
Ultimately, the environment provides the capabilities needed to handle large volumes of network traffic data similar to what a system administrator would realistically have to deal with. While some information may be filtered, there will still be an enormous amount of data needing analysis. Automated (algorithmic) analysis can be used to identify naïve port scans, and such information can be readily removed from the data in an automated fashion. The remainder of the data requires analysis by a system administrator to determine the meaning and impact of the activity. Thus, the environment provides many capabilities to quickly analyze portions of the data currently displayed and remove them from the display (filtering) when resolved, readily handling large volumes of data. Since the visualization environment does not modify the original data (only the display of the information is filtered), the original data can be retrieved should it be deemed necessary for a future analysis. The rapidity with which the environment can handle large volumes of data allows the techniques to be applied to large volumes of stored data. For example, should an administrator need to examine data from a weekend, the environment has been shown to be able to parse and display three days' worth of data in under an hour. Thus, the limitation will be the time required to perform the analysis. Currently, algorithmic techniques have such high false positive and false negative rates that, while they can provide guidance to the administrator, they cannot be left to their own devices. Consequently, future research will incorporate the results of such automated techniques as an additional input source. This will provide additional information to assist the analyst, who can assign appropriate confidence measures to it when analyzing the data. Given the rapidly changing paradigms of attackers, it is unlikely the situation will improve any time soon. It is up to the administrator or analyst to carefully examine network activity to determine whether it is innocuous or not.

Thus, administrators must spend inordinate amounts of time examining and analyzing activity for potential threats. System administrators will more often than not miss sophisticated attacks due to the time involved. The discussed capabilities are designed to reduce this problem.
IX. CONCLUSIONS
This research has shown that analysts can easily identify different types of scans, including sophisticated scans such as low and slow scans. The analysis capabilities of the environment complement the identification capabilities of the visualization techniques to derive meaning from identified anomalies very quickly. Thus, the meaning of all activity can be resolved through just a few clicks of the mouse. This is critical, as typical network traffic data is so noisy, with so much suspect activity, that visualization alone is insufficient. But visualization combined with effective analysis capabilities makes the environment extremely effective and valuable. The unique visualization design also allows for infinite scalability, a capability sorely needed given the ever increasing sophistication of attackers. The visualization technique provides benefits over other techniques such as parallel coordinates [8] by more distinctly representing and segregating each axis and by placing the axis of primary analyst concern (local IP addresses) at the center of the user's focus. The effectiveness and efficiency of the visualization and interaction capabilities allow for rapid analysis and resolution of network activity. This type of rapid direct manipulation is not feasible with most visualization techniques. For instance, pixel-based [13][14] or graph-based techniques do not allow for such efficient selection of and interaction with related visual elements. While a data set was not created to verify the ability of the environment to detect distributed scans, the ease with which low and slow scans are identified, in conjunction with the visualization's ability to back trace such scans, suggests that distributed scans will also be identifiable with this technique.

In essence, the fan effect will be centered on the remote host, in contrast with the given examples, which focus the fan effect on either the local IP address or the port. In this fashion, the typical fan-like pattern focused on the local ports will still be identifiable; however, selection of these connections for analysis will exhibit a clean distribution of connections over a range of distinct remote hosts. The ease of identification of such sophisticated attacks is clearly not feasible algorithmically and has not been shown through other visualization techniques, especially at this level of scalability. The visualization technique thus provides a truly unique capability.
The ease with which attacks and scans can be identified through this visualization technique, and likely other visualization techniques, also addresses educational and training needs [2]. This is applicable in two fashions. First, new analysts should be able to pick up the visualization technique very quickly and better comprehend what is occurring on the network and how attacks appear. Second, teaching students about the importance of computer security can be made easier by showing them the attacks that are occurring and the rapidity with which the attacks occur.
X. FUTURE RESEARCH
While this paper showed the feasibility of the developed capabilities with a real-world data set in which scan packets are distributed to occur once every five minutes, future research will revolve around collecting a data set in which scan packets occur once every hour. Not only will this improve validation of the scalability of the environment, but it will also test the environment with beyond-nominal timing requirements for a low and slow scan. Future research will continue to generate test data sets to validate ever widening temporal scales.

Future research will likely result in support for other data sources to allow others to use the environment with their data. In association with this support, additional fusion capabilities must be incorporated. Additionally, as additional data sources are included, it will become necessary to provide facilities to reduce the dimensionality of the dataset, for example through Self-Organizing Maps [16]. Future research must also add support for additional visualization techniques within a coordinated and collaborative view framework [18]. The software architecture was designed around an extensible paradigm, which should allow both the addition of new visualization techniques and new data sources to progress quickly and smoothly. The collaborative capability would allow experienced analysts to work with inexperienced analysts without necessarily needing to be present locally. Finally, future research must look at the insider threat arena. In such a scenario, a zombie PC, infected with a virus, can be used to scan the network from the wrong side of the firewall. This allows a passive scan to be performed.
XI. ACKNOWLEDGEMENTS
Portions of this work were supported under AFRL's summer faculty research program. A portion of the presented data was provided by AFRL during the term of the summer faculty research program as well. The evaluation and feedback provided by numerous affiliates of AFRL are also appreciated.
REFERENCES
[1] Abdullah, K., Lee, C., Conti, G., Copeland, J.A., and Stasko, J., "IDS RainStorm: Visualizing IDS Alarms," in Proceedings of Visualization for Computer Security '05, 2005, pp. 1-10.

[2] Anderson J.E. and Schwager P.H., “Security in the Information Systems Curriculum: Identification & Status of Relevant Issues,” Journal of Computer Information Systems, 32:3, 2002, pp. 16-24. [3] Erbacher R.F. and Garber M., “Real-Time Interactive Visual Port Monitoring and Analysis,” Proceedings of the International Conference on Security and Management, June 2005, pp. 228-234. [4] Erbacher, R.F., Christensen, K., and Sundberg, A., “Designing Visualization Capabilities for IDS Challenges,” Proceedings of the 2005 VizSec Workshop, Minneapolis, MN, October 2005. [5] Fink, G.A., Muessig, P., and North, C., “Visual Correlation of Host Processes and Network Traffic,” in Proceedings of Visualization for Computer Security ’05, 2005, pp. 11-20. [6] Fischer, F., Mansmann, F., Keim, D.A., Pietzko, S., and Waldvogel, M., “Large-scale Network Monitoring for Visual Analysis of Attacks,” Proceedings of the 5th International Workshop on Visualization for Computer Security, Lecture Notes in Computer Science, Vol. 5210, 2008, pp. 111-118. [7] Green, J., Marchette, D., Northcutt, S., and Ralph, B., “Analysis Techniques for Detecting Coordinated Attacks and Probes,” Proceedings of the Workshop on Intrusion Detection and Network Monitoring, Santa Clara, CA, April 9-12, 1999, pp. 1-9. [8] Inselberg, A., “The plane with parallel coordinates,” The Visual Computer, Vol. 1, pp. 69-91, 1985. [9] Irwin, B., and van Riel, J.P., “An Interactive Attack Graph Cascade and Reachability Display,” Proceedings of the 5th International Workshop on Visualization for Computer Security, Springer, 2008, pp. 221-236.

[10] Lakkaraju, K., Lee, A.J., and Yurcik, W., "NVisionIP: NetFlow Visualizations of System State for Security Situational Awareness," in Proceedings of the CCS Workshop on Visualization and Data Mining for Computer Security, ACM Conference on Computer and Communications Security, October 29, 2004.
[11] Livnat, Y., Agutter, J., Moon, S., Erbacher, R.F., and Foresti, S., "A Visualization Paradigm for Network Intrusion Detection," Proceedings of the IEEE Systems, Man and Cybernetics Information Assurance Workshop, June 2005, pp. 92-99.
[12] Lee, C.P., and Copeland, J.A., "Flowtag: A Collaborative Attack-Analysis, Reporting, and Sharing Tool for Security Researchers," Proceedings of the 3rd International Workshop on Visualization for Computer Security, Alexandria, Virginia, 2006, pp. 103-108.
[13] McPherson, J., Ma, K., Krystosek, P., Bartoletti, T., and Christensen, M., "PortVis: A Tool for Port-Based Detection of Security Events," Proceedings of the CCS Workshop on Visualization and Data Mining for Computer Security, ACM Conference on Computer and Communications Security, October 29, 2004.
[14] Oberheide, J., Karir, M., and Blazakis, D., "VAST: Visualizing Autonomous System Topology," Proceedings of the 3rd International Workshop on Visualization for Computer Security, Alexandria, Virginia, 2006, pp. 71-80.
[15] Ptacek, T.H., and Newsham, T.N., "Insertion, Evasion, and Denial of Service: Eluding Network Intrusion Detection," Technical Report, Secure Networks, Inc., January 1998.
[16] Sedbrook, T.A., "Visualizing Changing Requirements with Self-Organizing Maps," Journal of Computer Information Systems, Vol. 45, No. 2, 2005, pp. 63-72.

[17] Suo, X., Zhu, Y., and Owen, S., "A Task Centered Framework for Computer Security Data Visualization," Proceedings of the 5th International Workshop on Visualization for Computer Security, Lecture Notes in Computer Science, Vol. 5210, 2008, pp. 87-94.
[18] Walsh, K.R., and Pawlowski, S.D., "Collaboration and Visualization: Integrative Opportunities," Journal of Computer Information Systems, Vol. 44, No. 2, 2004, pp. 58-64.
[19] "RFC 793: Transmission Control Protocol," http://www.faqs.org/rfcs/rfc793.html
