. Third, sophisticated tools for automating the process of fingerprinting the user’s browser, obfuscating the exploit code, and delivering it to the victim are easily obtainable (e.g., NeoSploit and LuckySploit [15]). The mix of widespread, vulnerable targets and effective attack mechanisms has made drive-by downloads the technique of choice to compromise large numbers of end-user machines. In 2007, Provos et al. [28] found more than three million URLs that launched drive-by-download attacks. Even more troubling, malicious URLs are found both on rogue web sites, which are set up explicitly for the purpose of attacking unsuspecting users, and on legitimate web sites, which have been compromised or modified to serve the malicious content (high-profile examples include the Department of Homeland Security and the BusinessWeek news outlet [10, 11]).
A number of approaches have been proposed to detect malicious web pages. Traditional anti-virus tools use static signatures
to match patterns that are commonly found in malicious scripts [2].
Unfortunately, the effectiveness of syntactic signatures is thwarted
by the use of sophisticated obfuscation techniques that often hide
the exploit code contained in malicious pages. Another approach
is based on low-interaction honeyclients, which simulate a regular browser and rely on specifications to match the behavior, rather
than the syntactic features, of malicious scripts (for example, invoking a method of an ActiveX control vulnerable to buffer overflows
with a parameter longer than a certain length) [14, 23]. A problem with low-interaction honeyclients is that they are limited by the
coverage of their specification database; that is, attacks for which a
specification is not available cannot be detected. Finally, the state-of-the-art in malicious JavaScript detection is represented by high-interaction honeyclients. These tools consist of full-featured web
browsers typically running in a virtual machine. They work by
monitoring all modifications to the system environment, such as
files created or deleted, and processes launched [21, 28, 37, 39]. If
any unexpected modification occurs, it is considered the manifestation of an attack, and the corresponding page is flagged as malicious. Unfortunately, high-interaction honeyclients also have limitations. In particular, an attack can be detected only if the vulnerable component (e.g., an ActiveX control or a browser plugin) targeted by the exploit is installed and correctly activated on the detection system. Since there exist potentially hundreds of such vulnerable components, each working only under specific combinations of operating system and browser versions, the setup and configuration of a high-interaction honeyclient is difficult and at risk of being incomplete. As a consequence, a significant fraction of attacks may
go undetected. (Indeed, Seifert, the lead developer of a popular
high-interaction honeyclient, says, “high-interaction client honeypots have a tendency to fail at identifying malicious web pages,
producing false negatives that are rooted in the detection mechanism” [32].)
In this paper, we propose a novel approach to the automatic detection and analysis of malicious web pages. To this end, we visit web
pages with an instrumented browser and record events that occur
during the interpretation of HTML elements and the execution of
JavaScript code. For each event (e.g., the instantiation of an ActiveX control via JavaScript code or the retrieval of an external resource via an iframe tag), we extract one or more features whose
values are evaluated using anomaly detection techniques. Anomalous features allow us to identify malicious content even in the case
of previously-unseen attacks. Our features are comprehensive and
model many properties that capture intrinsic characteristics of attacks. Moreover, our system provides additional details about the
attack. For example, it identifies the exploits that are used and produces the deobfuscated version of the code, both of which help explain how the attack was executed and support further analysis.
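As an illustration of how event-derived features can be evaluated against a model of normal behavior, the sketch below scores a page against a simple per-feature statistical profile; the feature names, numbers, and model are hypothetical and much coarser than those actually used by our system.

```python
# Illustrative sketch only: hypothetical features and a toy per-feature model.
from dataclasses import dataclass

@dataclass
class FeatureModel:
    mean: float   # average value observed on known-benign pages
    std: float    # standard deviation on known-benign pages

    def anomaly(self, value: float) -> float:
        """Deviation of an observed value from the benign profile."""
        return abs(value - self.mean) / (self.std or 1.0)

# Profile learned from benign pages for a few example features derived from
# recorded events (e.g., ActiveX instantiations, strings passed to eval).
benign_profile = {
    "activex_instantiations": FeatureModel(mean=0.2, std=0.5),
    "max_eval_string_length": FeatureModel(mean=120.0, std=300.0),
    "shellcode_like_strings": FeatureModel(mean=0.0, std=0.1),
}

def is_anomalous(page_features: dict[str, float], threshold: float = 3.0) -> bool:
    """Flag the page if any feature deviates strongly from the benign profile."""
    return any(
        benign_profile[name].anomaly(value) > threshold
        for name, value in page_features.items()
    )

# A page that instantiates many ActiveX controls and evals a very long,
# dynamically built string falls far outside the benign profile.
print(is_anomalous({"activex_instantiations": 9,
                    "max_eval_string_length": 48000,
                    "shellcode_like_strings": 2}))   # True
```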
We implemented our approach in a tool called JSAND (JavaScript
Anomaly-based aNalysis and Detection), and validated it on over
140,000 web pages. In our experiments, we found that our tool
performed significantly better than existing approaches, detecting
more attacks and raising a low number of false positives. We also
made JSAND available as part of an online service called Wepawet
(at http://wepawet.cs.ucsb.edu), where users can submit URLs and files for automated analysis and receive detailed reports about the type of observed attacks and the targeted vulnerabilities. This service has been operational since November 2008 and analyzes about 1,000 URLs per day submitted by users across the world.
In summary, our main contributions include:
• A novel approach that has the ability to detect previously-unseen drive-by downloads by using machine learning and anomaly detection.
• The identification of a set of ten features that characterize intrinsic events of a drive-by download and allow our system to robustly identify web pages containing malicious code.
• An analysis technique that automatically produces the deobfuscated version of malicious JavaScript code, characterizes the exploits contained in the code, and generates exploit signatures for signature-based tools (a minimal sketch of this idea is shown below).
• An online service that offers public access to our tool.
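As a rough illustration of how the deobfuscated code can be captured, the sketch below assumes an instrumented JavaScript interpreter that lets the analyzer observe the arguments of eval-like sinks before they execute; the hook names and interface are hypothetical, not our system’s actual API.

```python
# Purely illustrative: hypothetical hooks exposed by an instrumented
# JavaScript interpreter for eval-like sinks.

deobfuscated_layers: list[str] = []

def on_eval(code: str) -> str:
    """Hook invoked (hypothetically) before eval() runs its argument."""
    deobfuscated_layers.append(code)   # record each decoded layer of code
    return code                        # hand it back for normal execution

def on_document_write(markup: str) -> str:
    """Hook for document.write(), another common unpacking sink."""
    deobfuscated_layers.append(markup)
    return markup

# After the page has been loaded, deobfuscated_layers contains the code that
# the packer produced at run time; the innermost layers typically expose the
# plain exploit logic and can be used to derive signatures.
```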
