As the human population grew, so did the volume of data recorded about it. Businesses and other fields, such as medicine, needed to analyze this data to understand people's requirements and improve their services. Statistics was one way to analyze the available data and obtain results. With the growing amount of data and the advent of computing in many fields, it became possible to extract useful information from this data using sophisticated mathematical and statistical models. This extraction of useful information from large, high-dimensional databases came to be known as "Data Mining".
Data mining is the analysis of observational datasets to find unsuspected relationships and to summarize large amounts of data in novel ways that are both understandable and useful to the data owner for proactive decision making. It has become practical thanks to advances in computer science and machine learning. Data mining algorithms can automatically sift through data at the level of individual records to discover patterns, relationships, factors, clusters, associations, profiles, and predictions that were previously hidden. Combined with routine reporting, data mining can support decisions and raise alerts when action is required. Data mining is widely used in many fields: in business for customer relationship management and marketing; in medicine for laboratory research, clinical trials, and pharmacology; in forecasting of weather and traffic; in aviation for pilot assistance; and in research in areas such as astrophysics, medicine, business, and security.
To apply these techniques to information security, we needed datasets. We used a dataset that is common in information security research: the network intrusion dataset from the KDD archive, popularly referred to as the KDD Cup 99 dataset. We work with the 10% subset of the original dataset, roughly 500,000 rows, in which each record is described by 41 attributes.
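To make the shape of such a record concrete, the following minimal sketch splits one record into its 41 attributes and a class label. It assumes the commonly distributed comma-separated layout, with the label as the final field followed by a trailing period. The function names and the sample record are hypothetical and serve only as illustration; the sample values are not taken from the dataset.

```python
NUM_ATTRIBUTES = 41  # attribute count stated in the text


def parse_record(line):
    """Split one comma-separated record into (attributes, label).

    Assumes 41 attribute fields followed by one label field,
    where the label may carry a trailing period (e.g. "normal.").
    """
    fields = line.strip().split(",")
    if len(fields) != NUM_ATTRIBUTES + 1:
        raise ValueError(
            f"expected {NUM_ATTRIBUTES + 1} fields, got {len(fields)}"
        )
    attributes = fields[:NUM_ATTRIBUTES]
    label = fields[NUM_ATTRIBUTES].rstrip(".")
    return attributes, label


def is_attack(label):
    """Treat every label other than "normal" as an intrusion."""
    return label != "normal"


# Illustrative record only: 6 leading fields plus 35 zero-valued
# placeholders give 41 attributes, followed by the label.
SAMPLE = "0,tcp,http,SF,181,5450," + ",".join(["0"] * 35) + ",normal."

if __name__ == "__main__":
    attributes, label = parse_record(SAMPLE)
    print(len(attributes), label, is_attack(label))
```

Collapsing the labels into normal versus attack is only one possible framing; keeping the individual labels would suit multi-class analysis.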