Anomaly Detection
One of the major themes of my research has been to investigate how previously unknown attacks against applications can be detected. To this end, my work has demonstrated how anomaly-based techniques can be used to detect attacks against the global routing infrastructure, web applications, system calls, and relational databases. To understand why anomaly detection is useful, however, it's instructive to first compare it to misuse-based detection.
Misuse-based Detection
Figure: Misuse detector with an x86 NOP-sled signature.
Figure: Evading an x86 NOP-sled signature with an alternate instruction sequence.
Traditional intrusion detection systems (IDS) have mainly focused on misuse detection. Misuse-based detection focuses on specifying malicious behavior in the form of signatures, and producing alerts if any event stream matches one or more signatures. All other event sequences are considered benign.
Misuse-based detection has been the predominant model used by IDS developers because it exhibits low false positive rates. This is because it's relatively easy to specify signatures that are very specific to the manifestations of a known attack given a sample of the attack. That is, IDS developers can craft signatures that will only match specific features of an attack, and test these signatures to ensure that benign behavior will not trigger alerts.
On the other hand, misuse detection has some serious drawbacks. In particular, misuse signatures generalize poorly to new attacks. Attack signatures are usually specific to a particular vulnerability, such that even if a very similar attack is used against a different vulnerability, a misuse-based IDS will be useless until the signature database has been updated to include a signature for it. This is exacerbated by the historical tendency of IDS developers to create signatures that are specific to attacks rather than vulnerabilities, so that an attack can be slightly mutated to evade the signature and still successfully exploit the vulnerability.
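The evasion problem can be illustrated with a minimal sketch. The signature below (a run of 16 or more x86 NOP bytes), the function name, and the placeholder payload bytes are all assumptions made for illustration, not a signature taken from any real IDS.

```python
import re

# Hypothetical signature: 16 or more consecutive x86 NOP bytes (0x90).
NOP_SLED_SIGNATURE = re.compile(rb"\x90{16,}")


def misuse_alert(payload: bytes) -> bool:
    """Return True if the payload matches the NOP-sled signature."""
    return NOP_SLED_SIGNATURE.search(payload) is not None


# Placeholder bytes standing in for the attack payload proper.
payload_body = b"\xcc" * 8

# A classic sled built from NOP instructions matches the signature.
classic_sled = b"\x90" * 32 + payload_body

# In 32-bit x86, 0x40 encodes `inc eax`, a one-byte instruction that can
# pad a sled just as well when the value of eax does not matter.
mutated_sled = b"\x40" * 32 + payload_body

print(misuse_alert(classic_sled))  # signature matches
print(misuse_alert(mutated_sled))  # same attack shape, no match
```

The mutated payload serves the same purpose as the original, yet the detector stays silent because the signature describes one manifestation of the attack rather than the vulnerability being exploited.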
Anomaly-based Detection
Figure: Anomaly detector training a character distribution model.
Figure: Anomaly detector alerting on an anomalous character distribution.
Anomaly detection takes the inverse approach to detecting attacks. Instead of specifying known malicious behavior, anomaly detection involves specifying models that capture the benign behavior of the application or system to be protected. As long as event sequences match the models of good behavior, no alerts are raised. If, however, a sequence of events does not match any models, the events are considered anomalous and, with high likelihood, evidence of malicious behavior.
Anomaly detection therefore has the immediate benefit of being able to detect previously unknown attacks. This is because the anomaly model of detection does not depend on how an attack manifests itself; the only consideration is whether the observed events appear normal.
Anomaly detection is not without its limitations, however. The main criticism of anomaly detection systems is their relatively high false positive rate when compared to misuse-based detectors, which can be directly attributed to the difficulty of correctly specifying normal behavior. To understand why this occurs, it's important to understand how anomaly detectors operate.
Most anomaly detectors use some form of machine learning to automatically generate their models, and operate in two phases. First, during a training phase, an anomaly detector observes normal events and uses a learning algorithm to build a statistical model from these events (e.g., a Markov chain). Then, during a detection phase, new events are evaluated against these models to determine how likely they are to be normal.
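The two phases can be sketched with a simple character distribution model, used here in place of a Markov chain to keep the example short. The class name, the smoothing constant, the sample values, and the threshold margin are all assumptions chosen for illustration.

```python
import math
from collections import Counter


class CharDistributionModel:
    """Hypothetical model of per-character frequencies in normal input."""

    def __init__(self, smoothing=1.0, alphabet_size=256):
        self.counts = Counter()
        self.total = 0
        self.smoothing = smoothing
        self.alphabet_size = alphabet_size

    def train(self, samples):
        # Training phase: count the characters seen in normal events.
        for sample in samples:
            self.counts.update(sample)
            self.total += len(sample)

    def score(self, event):
        # Average log-likelihood per character, with additive smoothing
        # so that unseen characters get a small nonzero probability.
        if not event:
            return 0.0
        denom = self.total + self.smoothing * self.alphabet_size
        log_likelihood = sum(
            math.log((self.counts[ch] + self.smoothing) / denom)
            for ch in event
        )
        return log_likelihood / len(event)

    def is_anomalous(self, event, threshold):
        # Detection phase: alert when the event is too unlikely.
        return self.score(event) < threshold


training = ["alice", "bob", "carol", "dave", "eve"]
model = CharDistributionModel()
model.train(training)

# Assumed threshold: the worst training score minus an arbitrary margin.
threshold = min(model.score(s) for s in training) - 0.5

attack = "' OR '1'='1"
print(model.is_anomalous("clara", threshold))
print(model.is_anomalous(attack, threshold))
```

In this sketch the benign value "clara" scores close to the training data, while the injection string consists almost entirely of characters never seen during training and falls below the threshold.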
These models, however, are naturally only an approximation of the full range of normal behavior. Therefore, if the models are not sophisticated enough, or if some aspect of normal behavior was not observed during the training of the anomaly detector, the generated models can be an incomplete specification of benign behavior, leading to false positives.
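A small sketch shows how an incomplete model produces a false positive. The length-range model and the sample usernames are hypothetical and deliberately simplistic.

```python
class LengthModel:
    """Hypothetical model: the range of input lengths seen in training."""

    def __init__(self):
        self.shortest = None
        self.longest = None

    def train(self, samples):
        lengths = [len(s) for s in samples]
        self.shortest, self.longest = min(lengths), max(lengths)

    def is_anomalous(self, event):
        # Anything outside the observed length range raises an alert.
        return not (self.shortest <= len(event) <= self.longest)


model = LengthModel()
# Training happened to observe only short usernames (3 to 5 characters).
model.train(["alice", "bob", "carol", "dave"])

# A legitimate but longer username was never observed during training.
benign_but_unseen = "bartholomew_fitzgerald"
false_positive = model.is_anomalous(benign_but_unseen)
print(false_positive)
```

The long username is benign, but because nothing like it appeared in the training data, the model treats it as anomalous.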
An often-encountered barrier to deploying anomaly detection is the lack of suitable training data. As we have seen, the training data may not cover the totality of normal behavior that a monitored system can exhibit. One approach to addressing this is for developers to synthetically generate a comprehensive training data set. For instance, if a web application were to be monitored by an anomaly detector, the developers of the application could use a testing framework to exercise the application and record the resulting network traffic. This is not often done, however, due to the overhead of completely testing the application. (As an aside, another approach is to substitute similar anomaly models from other detectors if they are deemed suitable.)
Consequently, training data is often collected by sampling real events. To continue the web application example, the traffic generated by real users interacting with the application might be recorded and used to train an anomaly detector. A significant drawback to this approach is that the data may contain attacks. If this occurs, then the models generated during the training phase may recognize attacks observed during the detection phase as normal, leading to false negatives.
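The effect of contaminated training data can be sketched as follows. The character-set model, the function names, and the request strings are hypothetical and chosen only to make the false negative visible.

```python
def train_charset(samples):
    """Hypothetical model: the set of characters seen during training."""
    seen = set()
    for sample in samples:
        seen.update(sample)
    return seen


def is_anomalous(model, event):
    # Alert if the event contains any character absent from training.
    return any(ch not in model for ch in event)


clean_traffic = ["id=42", "id=7", "id=1308"]
attack = "id=1 OR 1=1"

# Trained on clean traffic, the model flags the injection attempt.
clean_model = train_charset(clean_traffic)
print(is_anomalous(clean_model, attack))

# If recorded traffic already contains a similar attack, the model learns
# its characters as normal and later misses the same kind of attack.
contaminated_traffic = clean_traffic + ["id=9 OR 2=2"]
contaminated_model = train_charset(contaminated_traffic)
print(is_anomalous(contaminated_model, attack))
```

A single attack in the recorded traffic is enough here to turn a detected event into a false negative.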
Despite the advantages of anomaly-based detection, these issues have prevented its widespread use in the real world. In general, reducing the false positive rate of anomaly detection is an open research problem.