Publications Academic publication list.
Conference
author = {Theodoor Scholte and William Robertson and Davide Balzarotti and Engin Kirda},
title = {{Preventing Input Validation Vulnerabilities in Web Applications through Automated Type Analysis}},
booktitle = {{Proceedings of the IEEE Computer Software and Applications Conference}},
month = {07},
year = {2012},
address = {{Izmir, TR}},
}
Web applications have become an integral part of the daily lives of millions of users. Unfortunately, web applications are also frequently targeted by attackers, and critical vulnerabilities such as XSS and SQL injection are still common. As a consequence, much effort in the past decade has been spent on mitigating web application vulnerabilities.
Current techniques focus mainly on sanitization: either on automated sanitization, the detection of missing sanitizers, the correctness of sanitizers, or the correct placement of sanitizers. However, these techniques are either not able to prevent new forms of input validation vulnerabilities such as HTTP Parameter Pollution, come with large runtime overhead, lack precision, or require significant modifications to the client and/or server infrastructure.
In this paper, we present IPAAS, a novel technique for preventing the exploitation of XSS and SQL injection vulnerabilities based on automated data type detection of input parameters. IPAAS automatically and transparently augments otherwise insecure web application development environments with input validators that result in significant and tangible security improvements for real systems. We implemented IPAAS for PHP and evaluated it on five real-world web applications with known XSS and SQL injection vulnerabilities. Our evaluation demonstrates that IPAAS would have prevented 83% of XSS vulnerabilities and 65% of SQL injection vulnerabilities while incurring no developer burden.
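The idea of deriving input validators from the data types of parameters can be illustrated with a minimal sketch. This is not the IPAAS implementation (which targets PHP); the type names, the regular expressions, and the functions `infer_type`, `learn_validators`, and `is_valid` are all hypothetical and chosen only for illustration. The sketch assumes a set of benign training requests is available and picks, for each parameter, the most restrictive type that accepts every observed value.

```python
import re

# Hypothetical type list, ordered from most to least restrictive.
# Names and patterns are illustrative assumptions, not the types used by IPAAS.
TYPES = [
    ("boolean", re.compile(r"(?:0|1|true|false)", re.IGNORECASE)),
    ("integer", re.compile(r"[+-]?\d+")),
    ("word", re.compile(r"[A-Za-z0-9_]+")),
    ("free_text", re.compile(r".*", re.DOTALL)),
]


def infer_type(samples):
    """Return the most restrictive type that matches every observed sample."""
    for name, pattern in TYPES:
        if all(pattern.fullmatch(sample) for sample in samples):
            return name
    return "free_text"


def learn_validators(training_requests):
    """Map each parameter name to a type inferred from its benign values."""
    observed = {}
    for request in training_requests:
        for param, value in request.items():
            observed.setdefault(param, []).append(value)
    return {param: infer_type(values) for param, values in observed.items()}


def is_valid(validators, request):
    """Reject a request if any known parameter violates its learned type."""
    patterns = dict(TYPES)
    for param, value in request.items():
        type_name = validators.get(param)
        if type_name and not patterns[type_name].fullmatch(value):
            return False
    return True
```

With validators learned from requests such as `{"id": "1", "user": "alice"}`, a later value like `7 OR 1=1` for `id` no longer matches the inferred integer type and is rejected before it reaches the application. Parameters that legitimately carry free text would not be protected by a scheme this simple.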
author = {Theodoor Scholte and William Robertson and Davide Balzarotti and Engin Kirda},
title = {{An Empirical Analysis of Input Validation Mechanisms in Web Applications and Languages}},
booktitle = {{Proceedings of the ACM Symposium on Applied Computing}},
month = {03},
year = {2012},
address = {{Trento, IT}},
}
Web applications have become an integral part of the daily lives of millions of users. Unfortunately, web applications are also frequently targeted by attackers, and attacks such as XSS and SQL injection are still common. In this paper, we present an empirical study of more than 7,000 input validation vulnerabilities with the aim of gaining deeper insights into how these common web vulnerabilities can be prevented. In particular, we focus on the relationship between the specific programming language used to develop web applications and the vulnerabilities that are commonly reported. Our findings suggest that most SQL injection and a significant number of XSS vulnerabilities can be prevented using straightforward validation mechanisms based on common data types. We elaborate on these common data types, and discuss how support could be provided in web application frameworks.
author = {William Robertson and Federico Maggi and Christopher Kruegel and Giovanni Vigna},
title = {{Effective Anomaly Detection with Scarce Training Data}},
booktitle = {{Proceedings of the Network and Distributed System Security Symposium (NDSS)}},
month = {02},
year = {2010},
address = {{San Diego, CA USA}},
}
Learning-based anomaly detection has proven to be an effective black-box technique for detecting unknown attacks. However, the effectiveness of this technique crucially depends upon both the quality and the completeness of the training data. Unfortunately, in most cases, the traffic to the system (e.g., a web application or daemon process) protected by an anomaly detector is not uniformly distributed. Therefore, some components (e.g., authentication, payments, or content publishing) might not be exercised enough to train an anomaly detection system in a reasonable time frame. This is of particular importance in real-world settings, where anomaly detection systems are deployed with little or no manual configuration, and they are expected to automatically learn the normal behavior of a system to detect or block attacks.
In this work, we first demonstrate that the features utilized to train a learning-based detector can be semantically grouped, and that features of the same group tend to induce similar models. Therefore, we propose addressing local training data deficiencies by exploiting clustering techniques to construct a knowledge base of well-trained models that can be utilized in case of undertraining. Our approach, which is independent of the particular type of anomaly detector employed, is validated using the realistic case of a learning-based system protecting a pool of web servers running several web applications such as blogs, forums, or Web services. We run our experiments on a real-world data set containing over 58 million HTTP requests to more than 36,000 distinct web application components. The results show that by using the proposed solution, it is possible to achieve effective attack detection even with scarce training data.
author = {Federico Maggi and William Robertson and Christopher Kruegel and Giovanni Vigna},
title = {{Protecting a Moving Target: Addressing Web Application Concept Drift}},
booktitle = {{Proceedings of the International Symposium on Recent Advances in Intrusion Detection (RAID)}},
month = {09},
year = {2009},
address = {{Saint-Malo, Brittany FR}},
}
Because of the ad hoc nature of web applications, intrusion detection systems that leverage machine learning techniques are particularly well-suited for protecting websites. The reason is that these systems are able to characterize the applications' normal behavior in an automated fashion. However, anomaly-based detectors for web applications suffer from false positives that are generated whenever the applications being protected change. These false positives need to be analyzed by the security officer who then has to interact with the web application developers to confirm that the reported alerts were indeed erroneous detections.
In this paper, we propose a novel technique for the automatic detection of changes in web applications, which allows for the selective retraining of the affected anomaly detection models. Our technique identifies changes in both the interface of the components of a web application and its navigational structure. By correctly identifying legitimate changes in web applications, we can reduce false positives and allow for the automated retraining of the anomaly models.
We have evaluated our approach by analyzing a number of real-world applications. Our analysis shows that web applications indeed change substantially over time, and that our technique is able to effectively detect changes and automatically adapt the anomaly detection models to the new structure of the changed web applications.
author = {William Robertson and Giovanni Vigna},
title = {{Static Enforcement of Web Application Integrity Through Strong Typing}},
booktitle = {{Proceedings of the USENIX Security Symposium}},
month = {08},
year = {2009},
address = {{Montreal, Quebec CA}},
}
Security vulnerabilities continue to plague web applications, allowing attackers to access sensitive data and co-opt legitimate web sites as a hosting ground for malware. Accordingly, researchers have focused on various approaches to detecting and preventing common classes of security vulnerabilities in web applications, including anomaly-based detection mechanisms, static and dynamic analyses of server-side web application code, and client-side security policy enforcement.
This paper presents a different approach to web application security. In this work, we present a web application framework that leverages existing work on strong type systems to statically enforce a separation between the structure and content of both web documents and database queries generated by a web application, and show how this approach can automatically prevent the introduction of both cross-site scripting and SQL injection vulnerabilities. We present an evaluation of the framework, and demonstrate both the coverage and correctness of our sanitization functions. Finally, experimental results suggest that web applications developed using this framework perform competitively with applications developed using traditional frameworks.
author = {Davide Balzarotti and Greg Banks and Marco Cova and Viktoria Felmetsger and William Robertson and Fredrik Valeur and Giovanni Vigna and Richard Kemmerer},
title = {{Are Your Votes Really Counted? Testing the Security of Real-world Voting Systems}},
booktitle = {{Proceedings of the International Symposium on Software Testing and Analysis (ISSTA)}},
month = {07},
year = {2008},
address = {{Seattle, WA USA}},
}
Electronic voting systems play a critical role in today's democratic societies, as they are responsible for recording and counting the votes of citizens. Unfortunately, there is an alarming number of reports describing the malfunctioning of these systems, suggesting that their quality is not up to the task. Recently, there has been a focus on the security testing of voting systems to determine if they can be compromised in order to control the results of an election. We have participated in two large-scale projects, sponsored by the Secretaries of State of California and Ohio, whose respective goals were to perform the security testing of the electronic voting systems used in those two states. The testing process identified major flaws in all the systems analyzed, and resulted in substantial changes in the voting procedures of both states. In this paper, we describe the testing methodology that we used in testing two real-world electronic voting systems, the findings of our analysis, and the lessons we learned.
author = {Davide Balzarotti and William Robertson and Christopher Kruegel and Giovanni Vigna},
title = {{Improving Signature Testing Through Dynamic Data Flow Analysis}},
booktitle = {{Proceedings of the Annual Computer Security Applications Conference (ACSAC)}},
month = {12},
year = {2007},
address = {{Miami Beach, FL USA}},
}
The effectiveness and precision of network-based intrusion detection signatures can be evaluated either by direct analysis of the signatures (if they are available) or by using black-box testing (if the system is closed-source). Recently, several techniques have been proposed to generate test cases by automatically deriving variations (or mutations) of attacks. Even though these techniques have been useful in identifying “blind spots” in the signatures of closed-source, network-based intrusion detection systems, the generation of test cases is performed in a random, unguided fashion. The reason is that there is no information available about the signatures to be tested. As a result, identifying a test case that is able to evade detection is difficult.
In this paper, we propose a novel approach to drive the generation of test cases by using the information gathered by analyzing the dynamic behavior of the intrusion detection system. Our approach applies dynamic data flow analysis techniques to the intrusion detection system to identify which parts of a network stream are used to detect an attack and how these parts are matched by a signature. The result of our analysis is a set of constraints that is used to guide the black-box testing process, so that the mutations are applied to only those parts of the attack that are relevant for detection. By doing this, we are able to perform a more focused generation of the test cases and improve the process of identifying an attack variation that evades detection.
author = {Darren Mutz and William Robertson and Giovanni Vigna and Richard Kemmerer},
title = {{Exploiting Execution Context for the Detection of Anomalous System Calls}},
booktitle = {{Proceedings of the International Symposium on Recent Advances in Intrusion Detection (RAID)}},
month = {09},
year = {2007},
address = {{Gold Coast, Queensland AUS}},
}
Attacks against privileged applications can be detected by analyzing the stream of system calls issued during process execution. In the last few years, several approaches have been proposed to detect anomalous system calls. These approaches are mostly based on modeling acceptable system call sequences. Unfortunately, the techniques proposed so far are either vulnerable to certain evasion attacks or are too expensive to be practical. This paper presents a novel approach to the analysis of system calls that uses a composition of dynamic analysis and learning techniques to characterize anomalous system call invocations in terms of both the invocation context and the parameters passed to the system calls. Our technique provides a more precise detection model with respect to solutions proposed previously, and, in addition, it is able to detect data modification attacks, which cannot be detected using only system call sequence analysis.
author = {William Robertson and Giovanni Vigna and Christopher Kruegel and Richard Kemmerer},
title = {{Using Generalization and Characterization Techniques in the Anomaly-based Detection of Web Attacks}},
booktitle = {{Proceedings of the Network and Distributed System Security Symposium (NDSS)}},
month = {02},
year = {2006},
address = {{San Diego, CA USA}},
}
The custom, ad hoc nature of web applications makes learning-based anomaly detection systems a suitable approach to provide early warning about the exploitation of novel vulnerabilities. However, anomaly-based systems are known for producing a large number of false positives and for providing poor or non-existent information about the type of attack that is associated with an anomaly.
This paper presents a novel approach to anomaly-based detection of web-based attacks. The approach uses an anomaly generalization technique that automatically translates suspicious web requests into anomaly signatures. These signatures are then used to group recurrent or similar anomalous requests so that an administrator can easily deal with a large number of similar alerts.
In addition, the approach uses a heuristics-based technique to infer the type of attacks that generated the anomalies. This enables the prioritization of the attacks and provides better information to the administrator. Our approach has been implemented and evaluated experimentally on real-world data gathered from web servers at two universities.
author = {Christopher Kruegel and Engin Kirda and Darren Mutz and William Robertson and Giovanni Vigna},
title = {{Polymorphic Worm Detection Using Structural Information of Executables}},
booktitle = {{Proceedings of the International Symposium on Recent Advances in Intrusion Detection (RAID)}},
month = {09},
year = {2005},
address = {{Seattle, WA USA}},
}
Network worms are malicious programs that spread automatically across networks by exploiting vulnerabilities that affect a large number of hosts. Because of the speed at which worms spread to large computer populations, countermeasures based on human reaction time are not feasible. Therefore, recent research has focused on devising new techniques to detect and contain network worms without the need of human supervision. In particular, a number of approaches have been proposed to automatically derive signatures to detect network worms by analyzing a number of worm-related network streams. Most of these techniques, however, assume that the worm code does not change during the infection process. Unfortunately, worms can be polymorphic. That is, they can mutate as they spread across the network. To detect these types of worms, it is necessary to devise new techniques that are able to identify similarities between different mutations of a worm.
This paper presents a novel technique based on the structural analysis of binary code that allows one to identify structural similarities between different worm mutations. The approach is based on the analysis of a worm's control flow graph and introduces an original graph coloring technique that supports a more precise characterization of the worm's structure. The technique has been used as a basis to implement a worm detection system that is resilient to many of the mechanisms used to evade approaches based on instruction sequences only.
author = {Christopher Kruegel and Engin Kirda and Darren Mutz and William Robertson and Giovanni Vigna},
title = {{Automating Mimicry Attacks Using Static Binary Analysis}},
booktitle = {{Proceedings of the USENIX Security Symposium}},
month = {07},
year = {2005},
address = {{Baltimore, MD USA}},
}
Intrusion detection systems that monitor sequences of system calls have recently become more sophisticated in defining legitimate application behavior. In particular, additional information, such as the value of the program counter and the configuration of the program's call stack at each system call, has been used to achieve better characterization of program behavior. While there is common agreement that this additional information complicates the task for the attacker, it is less clear to which extent an intruder is constrained.
In this paper, we present a novel technique to evade the extended detection features of state-of-the-art intrusion detection systems and reduce the task of the intruder to a traditional mimicry attack. Given a legitimate sequence of system calls, our technique allows the attacker to execute each system call in the correct execution context by obtaining and relinquishing control of the application's execution flow through manipulation of code pointers.
We have developed a static analysis tool for Intel x86 binaries that uses symbolic execution to automatically identify instructions that can be used to redirect control flow and to compute the necessary modifications to the environment of the process. We used our tool to successfully exploit three vulnerable programs and evade detection by existing state-of-the-art system call monitors. In addition, we analyzed three real-world applications to verify the general applicability of our techniques.
author = {Darren Mutz and Christopher Kruegel and William Robertson and Giovanni Vigna and Richard Kemmerer},
title = {{Reverse Engineering of Network Signatures}},
booktitle = {{Proceedings of the Annual Asia Pacific Information Technology Security Conference (AusCERT)}},
month = {05},
year = {2005},
address = {{Gold Coast, Queensland AUS}},
}
Network-based intrusion detection systems analyze network traffic looking for evidence of attacks. The analysis is usually performed using signatures, which are rules that describe what traffic should be considered as malicious. If the signatures are known, it is possible to either craft an attack to avoid detection or to send synthetic traffic that will match the signature to over-stimulate the network sensor causing a denial of service attack. To prevent these attacks, commercial systems usually do not publish their signature sets and their analysis algorithms. This paper describes a reverse engineering process and a reverse engineering tool that are used to analyze the way signatures are matched by network-based intrusion detection systems. The results of the analysis are used to either generate variations of attacks that evade detection or produce non-malicious traffic that over-stimulates the sensor. This shows that security through obscurity does not work. That is, keeping the signatures secret does not necessarily increase the resistance of a system to evasion and over-stimulation attacks.
author = {Christopher Kruegel and William Robertson and Giovanni Vigna},
title = {{Detecting Kernel-Level Rootkits Through Binary Analysis}},
booktitle = {{Proceedings of the Annual Computer Security Applications Conference (ACSAC)}},
month = {12},
year = {2004},
address = {{Tucson, AZ USA}},
}
A rootkit is a collection of tools used by intruders to keep the legitimate users and administrators of a compromised machine unaware of their presence. Originally, rootkits mainly included modified versions of system auditing programs (e.g., ps or netstat on a Unix system). However, for operating systems that support loadable kernel modules (e.g., Linux and Solaris), a new type of rootkit has recently emerged. These rootkits are implemented as kernel modules, and they do not require modification of user space binaries to conceal malicious activity. Instead, the rootkit operates within the kernel, modifying critical data structures such as the system call table or the list of currently-loaded kernel modules.
This paper presents a technique that exploits binary analysis to ascertain, at load time, if a module's behavior resembles the behavior of a rootkit. Through this method, it is possible to provide additional protection against this type of malicious modification of the kernel. Our technique relies on an abstract model of module behavior that is not affected by small changes in the binary image of the module. Therefore, the technique is resistant to attempts to conceal the malicious nature of a kernel module.
author = {Giovanni Vigna and Davide Balzarotti and William Robertson},
title = {{Testing Network-based Intrusion Detection Signatures Using Mutant Exploits}},
booktitle = {{Proceedings of the ACM Conference on Computer and Communications Security (CCS)}},
month = {10},
year = {2004},
address = {{Washington DC USA}},
}
Misuse-based intrusion detection systems rely on models of attacks to identify the manifestation of intrusive behavior. Therefore, the ability of these systems to reliably detect attacks is strongly affected by the quality of their models, which are often called “signatures.” A perfect model would be able to detect all the instances of an attack without making mistakes, that is, it would produce a 100% detection rate with 0 false alarms. Unfortunately, writing good models (or good signatures) is hard. Attacks that exploit a specific vulnerability may do so in completely different ways, and writing models that take into account all possible variations is very difficult. For this reason, it would be beneficial to have testing tools that are able to evaluate the “goodness” of detection signatures.
This work describes a technique to test and evaluate misuse detection models in the case of network-based intrusion detection systems. The testing technique is based on a mechanism that generates a large number of variations of an exploit by applying mutant operators to an exploit template. These mutant exploits are then run against a victim host protected by a network-based intrusion detection system. The results of the systems in detecting these variations provide a quantitative basis for the evaluation of the quality of the corresponding detection model.
author = {Christopher Kruegel and William Robertson and Fredrik Valeur and Giovanni Vigna},
title = {{Static Disassembly of Obfuscated Binaries}},
booktitle = {{Proceedings of the USENIX Security Symposium}},
month = {08},
year = {2004},
address = {{San Diego, CA USA}},
}
Disassembly is the process of recovering a symbolic representation of a program's machine code instructions from its binary representation. Recently, a number of techniques have been proposed that attempt to foil the disassembly process. These techniques are very effective against state-of-the-art disassemblers, preventing a substantial fraction of a binary program from being disassembled correctly. This could allow an attacker to hide malicious code from static analysis tools that depend on correct disassembler output (such as virus scanners).
The paper presents novel binary analysis techniques that substantially improve the success of the disassembly process when confronted with obfuscated binaries. Based on control flow graph information and statistical methods, a large fraction of the program's instructions can be correctly identified. An evaluation of the accuracy and the performance of our tool is provided, along with a comparison to several state-of-the-art disassemblers.
author = {Christopher Kruegel and Darren Mutz and William Robertson and Fredrik Valeur},
title = {{Bayesian Event Classification for Intrusion Detection}},
booktitle = {{Proceedings of the Annual Computer Security Applications Conference (ACSAC)}},
month = {12},
year = {2003},
address = {{Las Vegas, NV USA}},
}
Intrusion detection systems (IDSs) attempt to identify attacks by comparing collected data to predefined signatures known to be malicious (misuse-based IDSs) or to a model of legal behavior (anomaly-based IDSs). Anomaly-based approaches have the advantage of being able to detect previously unknown attacks, but they suffer from the difficulty of building robust models of acceptable behavior which may result in a large number of false alarms. Almost all current anomaly-based intrusion detection systems classify an input event as normal or anomalous by analyzing its features, utilizing a number of different models. A decision for an input event is made by aggregating the results of all employed models.
We have identified two reasons for the large number of false alarms, caused by incorrect classification of events in current systems. One is the simplistic aggregation of model outputs in the decision phase. Often, only the sum of the model results is calculated and compared to a threshold. The other reason is the lack of integration of additional information into the decision process. This additional information can be related to the models, such as the confidence in a model's output, or can be extracted from external sources. To mitigate these shortcomings, we propose an event classification scheme that is based on Bayesian networks. Bayesian networks improve the aggregation of different model outputs and allow one to seamlessly incorporate additional information. Experimental results show that the accuracy of the event classification process is significantly improved using our proposed approach.
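The contrast between summing model outputs and combining them probabilistically can be sketched as follows. This is a simplified illustration, not the Bayesian network from the paper: it assumes the models are conditionally independent (a naive Bayes structure), and the function names, the per-model likelihoods, and the prior are hypothetical values chosen for the example.

```python
def sum_threshold(scores, threshold):
    """Baseline aggregation: flag the event if the summed scores exceed a threshold."""
    return sum(scores) > threshold


def attack_posterior(flags, likelihoods, prior_attack):
    """Combine per-model flags with Bayes' rule under an independence assumption.

    flags:        list of booleans, True if the model reported an anomaly.
    likelihoods:  list of (P(flag | attack), P(flag | normal)) pairs, one per model.
                  These encode how much each model's output can be trusted.
    prior_attack: prior probability that an arbitrary event is an attack.
    """
    p_attack = prior_attack
    p_normal = 1.0 - prior_attack
    for flagged, (p_flag_attack, p_flag_normal) in zip(flags, likelihoods):
        p_attack *= p_flag_attack if flagged else 1.0 - p_flag_attack
        p_normal *= p_flag_normal if flagged else 1.0 - p_flag_normal
    return p_attack / (p_attack + p_normal)


# Hypothetical models: the first is reliable, the second is barely informative.
LIKELIHOODS = [(0.9, 0.1), (0.6, 0.5)]
```

Under the summed scheme, an alert from either model counts the same. With the probabilistic combination, an alert from the reliable model alone raises the posterior far more than an alert from the weak model alone, which is one way confidence in a model's output can enter the decision.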
author = {Giovanni Vigna and William Robertson and Vishal Kher and Richard Kemmerer},
title = {{A Stateful Intrusion Detection System for World-Wide Web Servers}},
booktitle = {{Proceedings of the Annual Computer Security Applications Conference (ACSAC)}},
month = {12},
year = {2003},
address = {{Las Vegas, NV USA}},
}
Web servers are ubiquitous, remotely accessible, and often misconfigured. In addition, custom web-based applications may introduce vulnerabilities that are overlooked even by the most security-conscious server administrators. Consequently, web servers are a popular target for hackers. To mitigate the security exposure associated with web servers, intrusion detection systems are deployed to analyze and screen incoming requests. The goal is to perform early detection of malicious activity and possibly prevent more serious damage to the protected site. Even though intrusion detection is critical for the security of web servers, the intrusion detection systems available today only perform very simple analyses and are often vulnerable to simple evasion techniques. In addition, most systems do not provide sophisticated attack languages that allow a system administrator to specify custom, complex attack scenarios to be detected.
This paper presents WebSTAT, an intrusion detection system that analyzes web requests looking for evidence of malicious behavior. The system is novel in several ways. First of all, it provides a sophisticated language to describe multi-step attacks in terms of states and transitions. In addition, the modular nature of the system supports the integrated analysis of network traffic sent to the server host, operating system-level audit data produced by the server host, and the access logs produced by the web server. By correlating different streams of events, it is possible to achieve more effective detection of web-based attacks.
author = {William Robertson and Christopher Kruegel and Darren Mutz and Fredrik Valeur},
title = {{Run-time Detection of Heap-based Overflows}},
booktitle = {{Proceedings of the USENIX Large Installations Systems Administration Conference (LISA)}},
month = {10},
year = {2003},
address = {{San Diego, CA USA}},
}
Buffer overflows belong to the most common class of attacks on today's Internet. Although stack-based variants are still by far more frequent and well-understood, heap-based overflows have recently gained more attention. Several real-world exploits have been published that corrupt heap management information and allow arbitrary code execution with the privileges of the victim process.
This paper presents a technique that protects heap management information and allows for run-time detection of heap-based overflows. We discuss the structure of these attacks and our proposed detection scheme that has been implemented as a patch to the GNU libc. We report the results of our experiments, which demonstrate the detection effectiveness and performance impact of our approach. In addition, we discuss different mechanisms to deploy the memory protection.
author = {Christopher Kruegel and Darren Mutz and William Robertson and Fredrik Valeur},
title = {{Topology-based Detection of Anomalous BGP Messages}},
booktitle = {{Proceedings of the International Symposium on Recent Advances in Intrusion Detection (RAID)}},
month = {09},
year = {2003},
address = {{Pittsburgh, PA USA}},
}
The Border Gateway Protocol (BGP) is a fundamental component of the current Internet infrastructure. Due to the inherent trust relationship between peers, control of a BGP router could enable an attacker to redirect traffic allowing man-in-the-middle attacks or to launch a large-scale denial of service. It is known that BGP has weaknesses that are fundamental to the protocol design. Many solutions to these weaknesses have been proposed, but most require resource intensive cryptographic operations and modifications to the existing protocol and router software. For this reason, none of them have been widely adopted. However, the threat necessitates an effective, immediate solution.
We propose a system that is capable of detecting malicious inter-domain routing update messages through passive monitoring of BGP traffic. This approach requires no protocol modifications and utilizes existing monitoring infrastructure. The technique relies on a model of the autonomous system connectivity to verify that route advertisements are consistent with the network topology. By identifying anomalous update messages, we prevent routers from accepting invalid routes. Utilizing data provided by the Route Views project, we demonstrate the ability of our system to distinguish between legitimate and potentially malicious traffic.
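One simple way to check that a route advertisement is consistent with a model of AS connectivity is sketched below. This is an assumption-laden simplification rather than the system described in the paper: the model here is just the set of AS adjacencies seen in previously observed AS paths, and the functions `build_topology` and `unknown_links` are hypothetical. The AS numbers come from the range reserved for documentation.

```python
def build_topology(as_paths):
    """Collect undirected AS adjacencies from previously observed AS paths."""
    edges = set()
    for path in as_paths:
        for left, right in zip(path, path[1:]):
            if left != right:  # ignore AS-path prepending (repeated AS numbers)
                edges.add(frozenset((left, right)))
    return edges


def unknown_links(topology, as_path):
    """Return the adjacencies in an advertised path that the model has never seen."""
    return [
        (left, right)
        for left, right in zip(as_path, as_path[1:])
        if left != right and frozenset((left, right)) not in topology
    ]
```

An update whose path contains an adjacency absent from the model would be reported for closer inspection. A real deployment would also have to cope with legitimate topology changes, which this sketch does not attempt.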
Journal
author = {Davide Balzarotti and Marco Cova and Viktoria Felmetsger and Richard Kemmerer and William Robertson and Fredrik Valeur and Giovanni Vigna},
title = {{An Experience in Testing the Security of a Real-World Electronic Voting System}},
journal = {{IEEE Transactions on Software Engineering}},
volume = {36},
number = {4},
month = {07},
year = {2010},
}
Voting is the process through which a democratic society determines its government. Therefore, voting systems are as important as other well-known critical systems, such as air traffic control systems or nuclear plant monitors. Unfortunately, voting systems have a history of failures that seems to indicate that their quality is not up to the task. Because of the alarming frequency and impact of the malfunctions of voting systems, in recent years a number of vulnerability analysis exercises have been carried out against voting systems to determine if they can be compromised in order to control the results of an election. We have participated in two such large-scale projects, sponsored by the Secretaries of State of California and Ohio, whose goals were to perform the security testing of the electronic voting systems used in their respective states. As the result of the testing process, we identified major vulnerabilities in all the systems analyzed. We then took advantage of a combination of these vulnerabilities to generate a series of attacks that would spread across the voting systems and would “steal” votes by combining voting record tampering with social engineering approaches. As a response to the two large-scale security evaluations, the Secretaries of State of California and Ohio recommended changes to improve the security of the voting process. In this paper, we describe the methodology that we used in testing the two real-world electronic voting systems we evaluated, the findings of our analysis, our attacks, and the lessons we learned.
author = {Giovanni Vigna and Fredrik Valeur and Davide Balzarotti and William Robertson and Christopher Kruegel and Engin Kirda},
title = {{Reducing Errors in the Anomaly-based Detection of Web-based Attacks Through the Combined Analysis of Web Requests and SQL Queries}},
journal = {{Journal of Computer Security}},
volume = {17},
issue = {3},
month = {05},
year = {2009},
}
Web-based applications have become a popular means of exposing functionality to large numbers of users by leveraging the services provided by web servers and databases. The wide proliferation of custom-developed web-based applications suggests that anomaly detection could be a suitable approach for providing early warning and real-time blocking of application-level exploits. Therefore, a number of research prototypes and commercial products that learn the normal usage patterns of web applications have been developed. Anomaly detection techniques, however, are prone to both false positives and false negatives. As a result, if anomalous web requests are simply blocked, it is likely that some legitimate requests would be denied, resulting in decreased availability. On the other hand, if malicious requests are allowed to access a web application's data stored in a back-end database, security-critical information could be leaked to an attacker.
To ameliorate this situation, we propose a system composed of a web-based anomaly detection system, a reverse HTTP proxy, and a database anomaly detection system. Serially composing a web-based anomaly detector and a SQL query anomaly detector increases the detection rate of our system. To address a potential increase in the false positive rate, we leverage an anomaly-driven reverse HTTP proxy to serve anomalous-but-benign requests that do not require access to sensitive information.
We developed a prototype of our approach and evaluated its applicability with respect to several existing web-based applications, showing that our approach is feasible and effective in reducing both false positives and false negatives.
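The serial composition described in this abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the prototype's actual design: the function `handle`, the three outcome labels, the choice to block on an anomalous SQL query, and the toy detectors (a length threshold and a substring check standing in for learned anomaly models) are all hypothetical.

```python
from typing import Callable, List


def handle(request: str,
           web_anomalous: Callable[[str], bool],
           sql_anomalous: Callable[[str], bool],
           queries_for: Callable[[str], List[str]]) -> str:
    """Return the path a request takes: 'restricted', 'blocked', or 'full'."""
    # Stage 1: the web-request detector drives the reverse proxy. Anomalous
    # requests are still served, but by a backend without sensitive data.
    if web_anomalous(request):
        return "restricted"
    # Stage 2: requests that look normal reach the full application, and the
    # SQL queries they trigger are checked by the second detector.
    for query in queries_for(request):
        if sql_anomalous(query):
            return "blocked"  # assumption: anomalous queries are denied
    return "full"


# Toy stand-ins for trained anomaly models (assumptions for illustration only).
def web_model(request: str) -> bool:
    return len(request) > 40


def sql_model(query: str) -> bool:
    return " OR " in query


def app_queries(request: str) -> List[str]:
    # Hypothetical application that builds one query from the request parameter.
    param = request.split("=", 1)[1]
    return ["SELECT * FROM items WHERE id = " + param]


normal = handle("/item?id=7", web_model, sql_model, app_queries)
injected = handle("/item?id=7 OR 1=1", web_model, sql_model, app_queries)
oversized = handle("/search?q=" + "a" * 60, web_model, sql_model, app_queries)
print(normal, injected, oversized)  # full blocked restricted
```

The second request passes the web-level check but is caught at the SQL level, which is the detection gain from composing the two detectors; the third request is anomalous at the web level yet is still served from the restricted backend instead of being dropped.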
author = {Christopher Kruegel and William Robertson and Giovanni Vigna},
title = {{A Multi-Model Approach to the Detection of Web-based Attacks}},
journal = {{Journal of Computer Networks}},
volume = {48},
issue = {5},
month = {07},
year = {2005},
}
author = {Christopher Kruegel and Giovanni Vigna and William Robertson},
title = {{Using Alert Verification to Identify Successful Intrusion Attempts}},
journal = {{Journal of Practice in Information Processing and Communication (PIK)}},
volume = {27},
issue = {4},
month = {08},
year = {2004},
}
Intrusion detection systems monitor protected networks and attempt to identify evidence of malicious activity. When an attack is detected, an alert is produced, and, possibly, a countermeasure is executed. A perfect intrusion detection system would be able to identify all the attacks without raising any false alarms. In addition, a countermeasure would be executed only when an attack is actually successful. Unfortunately, false alarms are commonplace in intrusion detection systems, and perfectly benign events are interpreted as malicious. In addition, non-relevant alerts are also common. These are alerts associated with attacks that were not successful. Such alerts should be tagged appropriately so that their priority can be lowered.
The process of identifying alerts associated with successful attacks is called alert verification. This paper describes the different issues involved in alert verification and presents a tool that performs real-time verification of attacks detected by an intrusion detection system. The experimental evaluation of the tool shows that verification can dramatically reduce both false and non-relevant alerts.
Workshop
author = {Christopher Kruegel and William Robertson},
title = {{Alert Verification: Determining the Success of Intrusion Attempts}},
booktitle = {{Proceedings of the Workshop on the Detection of Intrusions and Malware \& Vulnerability Assessment (DIMVA)}},
month = {07},
year = {2004},
address = {{Dortmund, North Rhine-Westphalia GER}},
}
Dissertation
author = {William Robertson},
title = {{Detecting and Preventing Attacks Against Web Applications}},
school = {{UC Santa Barbara}},
month = {06},
year = {2009},
}
The World Wide Web has evolved from a system for serving an interconnected set of static documents to what is now a powerful, versatile, and largely democratic platform for application delivery and information dissemination. Unfortunately, with the web’s explosive growth in power and popularity has come a concomitant increase in both the number and impact of web application-related security incidents. The magnitude of the problem has prompted much interest within the security community towards researching mechanisms that can mitigate this threat. To this end, intrusion detection systems have been proposed as a potential means of identifying and preventing the successful exploitation of web application vulnerabilities. The current state of the art, however, has failed to deliver on the promise of intrusion detection. Misuse-based detection systems are unable to generalize to previously unknown attacks for which no signatures exist. In the context of the web, this is especially problematic in light of the wide proliferation of unique, custom-written web applications. On the other hand, anomaly-based intrusion detection systems seem well suited to detecting attacks against web applications. Existing anomaly detection techniques, however, have heretofore proven infeasible due to several factors: unacceptably high false positive rates, susceptibility to evasion, an inability to adapt to changes in monitored applications, and a lack of explanatory power.
In this dissertation, I present WEBANOMALY, an advanced black-box anomaly detection system that accurately detects attacks against web applications with low performance overhead. WEBANOMALY addresses several of the aforementioned fundamental challenges to anomaly detection using a combination of novel techniques. In particular, the relatively high rate of false positives and the lack of explanatory power are ameliorated using anomaly signatures, a technique for clustering related anomalies and classifying the type of attack they represent. The problem of local training data scarcity is addressed through the use of global knowledge bases of well-trained profiles collected from other web applications. Changes in web application behavior over time, known as concept drift, are addressed by treating the web application itself as an oracle of legitimate change. Finally, a novel framework for developing web applications that are secure by construction against many common classes of attacks is presented.