The 1999 DARPA IDS data set was collected at MIT Lincoln Labs to evaluate intrusion
detection systems. All network traffic, including the entire payload of each packet, was
recorded in tcpdump format and provided for evaluation. The data set also includes audit
logs, daily file system dumps, and BSM (Solaris system call) logs. The data consists of
three weeks of training data and two weeks of test data. In the training data there are two
weeks of attack-free data and one week of data with labeled attacks.
This dataset has been used in many research efforts and results of tests against this data
have been reported in many publications. Although there are problems due to the nature
of the simulation environment that created the data, it still remains a useful set of data to
compare techniques. The top results were reported by [39].
In our experiment on payload anomaly detection we used only the inside network traffic
data, which was captured between the router and the victims. Because most public applications on the Internet use TCP (web, email, telnet, and ftp), and to reduce the complexity
of the experiment, we examined only the inbound TCP traffic to ports 0-1023 of the
hosts 172.016.xxx.xxx. This address range contains most of the victims, and ports 0-1023 cover
the majority of the network services. For the DARPA 99 data, we conducted experiments using each packet as the data unit and, separately, each connection as the data unit. We used tcptrace
to reconstruct the TCP connections from the network packets in the tcpdump files. We also
experimented with the idea of “truncated payload”, both for each packet and each connection.
For truncated packets, we tried the first N bytes and the tail N bytes separately, where N is
a parameter. Using truncated payload saves considerable computation time and space. We
report the results for each of these models.
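The traffic selection and payload truncation described above can be illustrated with a minimal sketch. It is not the code used in the experiments: the `Packet` record, the `in_scope` and `truncate` helpers, and the reading of 172.016.xxx.xxx as the network 172.16.0.0/16 are all assumptions made for illustration, and the packets are assumed to have been parsed from the tcpdump files already.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

# Assumption: "172.016.xxx.xxx" in the text is read as 172.16.0.0/16.
VICTIM_NET = ip_network("172.16.0.0/16")
MAX_PORT = 1023  # ports 0-1023, as stated in the text


@dataclass
class Packet:
    """Hypothetical, already-parsed TCP packet record."""
    dst_ip: str
    dst_port: int
    payload: bytes


def in_scope(pkt: Packet) -> bool:
    """Keep inbound TCP packets to victim hosts on ports 0-1023
    that carry a non-empty payload."""
    return (
        ip_address(pkt.dst_ip) in VICTIM_NET
        and 0 <= pkt.dst_port <= MAX_PORT
        and len(pkt.payload) > 0
    )


def truncate(payload: bytes, n: int, tail: bool = False) -> bytes:
    """Return the first n bytes of the payload, or the last n if tail=True.
    Payloads shorter than n are returned unchanged."""
    if n <= 0:
        raise ValueError("n must be positive")
    return payload[-n:] if tail else payload[:n]
```

With this sketch, `truncate(pkt.payload, n)` and `truncate(pkt.payload, n, tail=True)` give the "first N bytes" and "tail N bytes" variants; the same function could be applied to a reassembled connection payload.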
We trained the payload distribution model on the DARPA dataset using week 1 (5 days,
attack free) and week 3 (7 days, attack free), then evaluated the detector on weeks 4 and 5,
which contain 201 instances of 58 different attacks, 177 of which are visible in the inside
tcpdump data. Because we restrict the victims’ IP and port range, there are 14 others we
ignore in this test.
In this experiment, we focus on TCP traffic only, so the attacks using UDP, ICMP,
ARP (address resolution protocol) and IP only cannot be detected. They include: smurf
(ICMP echo-reply flood), ping-of-death (over-sized ping packets), UDPstorm, arppoison
(corrupts ARP cache entries of the victim), selfping, ipsweep, and teardrop (mis-fragmented
UDP packets). Also, because our payload model is computed from only the payload part of
the network packet, those attacks that do not contain any payload are impossible to detect
with the proposed anomaly detector. Thus, there are in total 97 attacks to be detected by
our payload model in weeks 4 and 5 evaluation data.
After filtering, there are in total 2,444,591 packets and 49,556 connections with nonzero-length payloads to evaluate. We build a model for each payload length observed in the
training data, for each port from 0 to 1023 and for every host machine. The smoothing factor is set to 0.001, which gives the best result for this dataset (see the discussion in Section
3.2). This helps avoid over-fitting and reduces the false positive rate. Also, because the
DARPA99 data contains too few training examples, we apply clustering to the
models as described previously. Clustering the models of neighboring length bins means
that similar models can provide more training data for a model whose training data is too
sparse, thus making it less sensitive and more accurate. But there is also the risk that the
