The ease of development and ubiquity of the browser platform have led to a modern renaissance over the last decade in how applications are developed and deployed to users. For these reasons, web applications have become the de facto means of delivering software to users. Unfortunately, web applications have also become a prime target for exploitation, due to a proliferation of vulnerabilities and the large client base of these applications. Much of my research has focused on two related lines of investigation. First, I have studied how existing web applications can be protected, using both misuse detection and anomaly detection techniques. Second, I have studied how new web applications can be developed under frameworks that are secure by default against a range of popular attacks. In the following, I outline my research in the areas of anomaly detection and secure web application development frameworks.

Web Application Anomaly Detection

  • Example hidden Markov model for an HTTP request parameter.

Anomaly detection has been an intense area of research for quite some time, mainly due to its promise of detecting previously unknown attacks against monitored applications and systems. Anomaly detection is particularly well-suited to the context of web applications, due to the often custom nature of web application deployments. Whereas misuse-based techniques would require application developers to specify a custom set of detection signatures for each application to be developed, anomaly detection can bring to bear machine learning techniques that automatically learn the normal behavior of a web application, and therefore treat any anomalous behavior as evidence of malicious activity.

One effective approach to web application anomaly detection is to learn the normal behavior of inputs to web applications; that is, to characterize various statistical features of HTTP request parameters that are normally sent to web applications. In this work, I applied a set of novel anomaly models to HTTP GET and POST parameters, allowing the resulting detection system to automatically detect zero-day attacks including cross-site scripting (XSS), cross-site request forgery (CSRF), and SQL injection. This work later became the basis for a commercial web application anomaly detector developed by WebWise Security, a startup I founded with my thesis advisors during my Ph.D.
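
To make the idea of learning the normal behavior of a request parameter concrete, the following is a minimal sketch of one very simple model: a length model that learns the mean and variance of a parameter's length from attack-free training values and scores new values with a Chebyshev-style bound. It is an illustration only and is not necessarily one of the models used in this work; the names LengthModel, train, normality, and isAnomalous, the sample values, and the 0.05 threshold are all assumptions made for the example.

```haskell
module Main where

-- | Learned profile of one request parameter: mean and variance of its length.
-- (Hypothetical type, introduced only for this sketch.)
data LengthModel = LengthModel
  { lmMean     :: Double
  , lmVariance :: Double
  } deriving Show

-- | Learn a length model from attack-free training values.
-- Assumes the training list is non-empty.
train :: [String] -> LengthModel
train values = LengthModel mean variance
  where
    lengths  = map (fromIntegral . length) values
    n        = fromIntegral (length values)
    mean     = sum lengths / n
    variance = sum [ (l - mean) ^ (2 :: Int) | l <- lengths ] / n

-- | Chebyshev-style score in [0, 1]: an upper bound on the probability of
-- observing a length at least this far from the learned mean.
-- Values close to 0 are highly unusual for this parameter.
normality :: LengthModel -> String -> Double
normality (LengthModel mean variance) value
  | dist == 0 = 1
  | otherwise = min 1 (variance / (dist * dist))
  where
    len  = fromIntegral (length value)
    dist = abs (len - mean)

-- | Flag a value whose normality score falls below the given threshold.
isAnomalous :: Double -> LengthModel -> String -> Bool
isAnomalous threshold model value = normality model value < threshold

main :: IO ()
main = do
  -- Illustrative training data for a hypothetical "username" parameter.
  let model = train ["alice", "bob", "carol", "dave", "eve"]
  -- A slightly longer but plausible name is not flagged.
  print (isAnomalous 0.05 model "mallory")
  -- A long injection attempt is flagged.
  print (isAnomalous 0.05 model "' OR '1'='1' UNION SELECT password FROM users --")
```

A single length model would miss short attacks and would flag legitimately long inputs, which is one reason a practical detector combines several models per parameter and must pay close attention to false positives.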

Later work along this line of inquiry has mainly focused on reducing the false positive rate of anomaly detection, as this is the main barrier to successful deployment of anomaly detection systems. One approach introduced the idea of anomaly signatures, where anomaly models could be generalized and the types of attacks detected by the models characterized using a set of simple heuristics to improve the reporting and false positive rates of anomaly detectors. Another approach involved the incorporation of anomaly models for SQL queries invoked by a web application in response to user requests. Since the normal behavior of a web application can change over time (a phenomenon known as concept drift in the machine learning community), another paper reported on how anomaly models can respond to legitimate change in web application behavior without exposing the detection system to model poisoning attacks. Finally, I investigated the problem of training data scarcity for web applications by studying how similar anomaly models from other web applications could be used to supplement the set of models for an undertrained anomaly detector.

Secure Web Application Development Frameworks

  • Injection containment due to framework-enforced separation
    of control information and user data.

Anomaly detection can be effective for detecting attacks against mature web applications where the cost of redesigning and reimplementing the application with security in mind would be prohibitive. In the case where a new web application is to be developed, however, it is desirable to provide developers with a way to write applications that are not vulnerable to the major classes of web application attacks such as XSS or SQL injection. An observation that can be made about XSS and SQL injection attacks in particular is that they are often the result of string interpolation vulnerabilities, where an attacker is able to inject control data into a web page (e.g., <script> tags) or a SQL query (e.g., replacing a WHERE clause).

Therefore, one approach to securing new web applications against injection vulnerabilities is to enforce a separation between trusted control information and untrusted user data in web pages and SQL queries. In particular, I proposed using a language type system to automatically prevent XSS and SQL injection in applications using a secure web framework. In this system, application developers reason about web pages and SQL queries in terms of Haskell algebraic data types instead of as strings. This allows the framework to ensure that unwanted HTML tags and SQL keywords cannot be injected into the resulting web pages and SQL queries generated by the application, lifting the burden of worrying about such vulnerabilities from the developer.
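
The separation of control information and user data can be sketched with a few small Haskell types. This is a simplified illustration under stated assumptions, not the framework's actual API: the types Html, Value, Condition, and Query, and the functions render and compile, are hypothetical names introduced here. Untrusted strings can only enter a page through the Text constructor, which is always escaped, and can only enter a query as a bound value, never as SQL text.

```haskell
module Main where

import Data.List (intercalate)

-- | A tiny page type (hypothetical). User data can only appear as 'Text'.
data Html
  = Text String            -- untrusted data; always escaped when rendered
  | Element String [Html]  -- tag name chosen by the developer, plus children

-- | Escape the characters that could otherwise introduce markup.
escapeHtml :: String -> String
escapeHtml = concatMap escapeChar
  where
    escapeChar '<'  = "&lt;"
    escapeChar '>'  = "&gt;"
    escapeChar '&'  = "&amp;"
    escapeChar '"'  = "&quot;"
    escapeChar '\'' = "&#39;"
    escapeChar c    = [c]

-- | Render a page; the only path from user data to output goes through escaping.
render :: Html -> String
render (Text s) = escapeHtml s
render (Element tag children) =
  "<" ++ tag ++ ">" ++ concatMap render children ++ "</" ++ tag ++ ">"

-- | A tiny query type (hypothetical). User data can only appear as a 'Value'.
data Value = StringValue String | IntValue Int
  deriving Show

data Condition = Equals String Value          -- column = value

data Query = Select [String] String Condition -- columns, table, WHERE condition

-- | Compile to a parameterized statement: SQL text with a placeholder,
-- plus the list of values to bind separately.
compile :: Query -> (String, [Value])
compile (Select columns table (Equals column value)) =
  ( "SELECT " ++ intercalate ", " columns
      ++ " FROM " ++ table
      ++ " WHERE " ++ column ++ " = ?"
  , [value]
  )

main :: IO ()
main = do
  let userInput = "<script>alert(1)</script>"
  -- The script tag is rendered as inert text rather than markup.
  putStrLn (render (Element "p" [Text "Hello, ", Text userInput]))
  -- The attacker-controlled string stays in the bound values, not the SQL text.
  print (compile (Select ["id", "name"] "users"
                   (Equals "name" (StringValue "x' OR '1'='1"))))
```

In this sketch, tag, table, and column names are still plain strings supplied by the developer; a fuller design would presumably constrain those with dedicated types as well, so that no string-typed path into control information remains.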