Anomaly Detection / Outlier Detection in Security Applications
by Aleksandar Lazarevic
Despite the enormous amount of data being collected in many scientific and commercial applications, particular events of interest are still quite rare. These rare events, very often called outliers or anomalies, are defined as events that occur very infrequently (their frequency ranges from 5% to less than 0.01% depending on the application). Detection of anomalies (outliers or rare events) has recently gained a lot of attention in many security domains, ranging from video surveillance and security systems to intrusion detection and fraudulent transactions. For example, in video surveillance applications, video trajectories that represent suspicious and/or unlawful activities (e.g. identification of traffic violators on the road, detection of suspicious activities in the vicinity of objects) represent only a small portion of all video trajectories. Similarly, in the network intrusion detection domain, cyber attacks typically account for a very small fraction of the total network traffic. Although anomalies are by definition infrequent, in each of these examples their importance is quite high compared to other events, which makes detecting them extremely important.
Many anomaly detection algorithms have been proposed in the literature. They differ in the information used for analysis and in the techniques employed to detect deviations from normal behavior. In this section, we provide a classification of anomaly detection techniques, based on the employed technique, into the following five groups: (i) statistical methods; (ii) rule based methods; (iii) distance based methods; (iv) profiling methods; and (v) model based approaches. Although anomaly detection algorithms are quite diverse in nature, and thus may fit into more than one proposed category, this classification attempts to find the most suitable category for each described anomaly detection algorithm.
Statistical Methods. Statistical methods monitor the user or system behavior by measuring certain variables over time (e.g. login and logout time of each session in the intrusion detection domain). The basic models keep averages of these variables and flag a measurement when it exceeds a threshold derived from the standard deviation of the variable. More advanced statistical models also compare profiles of long-term and short-term user activities.
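The basic statistical model described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the sample session durations, and the choice of k = 3 standard deviations as the threshold are assumptions made for the example and are not taken from any particular system.

```python
from statistics import mean, stdev

def fit_baseline(history):
    """Summarize past measurements by their mean and sample standard deviation."""
    return mean(history), stdev(history)

def is_anomalous(value, baseline, k=3.0):
    """Flag a value that lies more than k standard deviations from the mean.

    k = 3.0 is an assumed threshold chosen for illustration.
    """
    mu, sigma = baseline
    if sigma == 0:
        # All past values were identical, so any different value deviates.
        return value != mu
    return abs(value - mu) > k * sigma

# Hypothetical session durations in minutes for one user.
history = [30.0, 32.0, 29.0, 31.0, 28.0, 33.0, 30.0, 31.0]
baseline = fit_baseline(history)

print(is_anomalous(31.0, baseline))   # a typical session
print(is_anomalous(240.0, baseline))  # an unusually long session
```

In practice the baseline would be updated as new measurements arrive, and a separate baseline would be kept for each monitored variable.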
Distance based Methods. Distance based approaches attempt to overcome the limitations of statistical outlier detection approaches by detecting outliers from the distances computed among points. Several distance based outlier detection algorithms have been recently proposed for detecting anomalies in network traffic. These techniques are based on computing the full dimensional distances of points from one another using all the available features, and on computing the densities of local neighborhoods.
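As a rough illustration of the distance based idea, the sketch below scores each point by its distance to its k-th nearest neighbor, computed over all features. It covers only the full dimensional distance part of the description; density based scoring of local neighborhoods is not shown. The function name, the value k = 2, and the sample points are assumptions made for the example.

```python
import math

def knn_distance_scores(points, k=2):
    """Score each point by the Euclidean distance to its k-th nearest neighbor.

    A larger score means the point is farther from its neighbors.
    Requires len(points) > k.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, using all features.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Hypothetical two-feature records: four close together, one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
scores = knn_distance_scores(points, k=2)

# The index of the point with the highest score is the strongest outlier candidate.
print(scores.index(max(scores)))
```

This brute-force version compares every pair of points, so its cost grows quadratically with the number of records.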
Rule based systems. Rule based systems used in anomaly detection characterize normal behavior of users, networks and/or computer systems by a set of rules.
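A minimal sketch of this idea follows, assuming each rule is a predicate that holds for normal events, so that an event violating any rule is reported. The rule names, the event fields (dst_port, hour, failed_logins), and the thresholds are hypothetical and chosen only for illustration.

```python
# Each rule describes normal behavior: it returns True for an acceptable event.
# Rule names, event fields, and limits below are illustrative assumptions.
NORMAL_RULES = {
    "known_port": lambda e: e["dst_port"] in {22, 80, 443},
    "business_hours": lambda e: 8 <= e["hour"] < 18,
    "few_failed_logins": lambda e: e["failed_logins"] <= 3,
}

def violated_rules(event, rules=NORMAL_RULES):
    """Return the names of the normal-behavior rules that the event breaks."""
    return [name for name, rule in rules.items() if not rule(event)]

normal_event = {"dst_port": 443, "hour": 10, "failed_logins": 0}
odd_event = {"dst_port": 6667, "hour": 3, "failed_logins": 0}

print(violated_rules(normal_event))
print(violated_rules(odd_event))
```

Reporting which rules were violated, rather than a single yes/no answer, makes it easier for an analyst to see why an event was flagged.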
Profiling Methods. In profiling methods, profiles of normal behavior are built for different types of network traffic, users, programs, etc., and deviations from them are considered intrusions. Profiling methods vary greatly, ranging from different data mining techniques to various heuristic-based approaches. In this section, we provide an overview of several distinguished profiling methods for anomaly detection.
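One simple way to picture a profile is as the set of short event sequences observed during normal operation, with a new trace scored by the fraction of its sequences that the profile has never seen. The sketch below is one possible heuristic under that assumption, not a description of any specific published method; the function names, the window length n = 3, and the sample traces are hypothetical.

```python
def windows(trace, n=3):
    """Return all length-n sliding windows of a trace, in order."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]

def build_profile(normal_traces, n=3):
    """Collect every window seen in traces of normal behavior."""
    profile = set()
    for trace in normal_traces:
        profile.update(windows(trace, n))
    return profile

def mismatch_rate(trace, profile, n=3):
    """Fraction of the trace's windows that do not appear in the profile."""
    grams = windows(trace, n)
    if not grams:
        return 0.0  # trace too short to judge
    unseen = sum(1 for g in grams if g not in profile)
    return unseen / len(grams)

# Hypothetical traces of operations performed by a program.
normal_traces = [
    ["open", "read", "write", "close"],
    ["open", "read", "read", "close"],
]
profile = build_profile(normal_traces)

print(mismatch_rate(["open", "read", "write", "close"], profile))
print(mismatch_rate(["open", "exec", "socket", "close"], profile))
```

A trace would then be reported when its mismatch rate exceeds a threshold chosen for the application.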
Model based approaches. Many researchers have used different types of models to characterize the normal behavior of the monitored system. In the model-based approaches, anomalies are detected as deviations from the model that represents the normal behavior. Very often, researchers have used data mining based predictive models such as replicator neural networks or unsupervised support vector machines.
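The principle of scoring deviations from a model of normal behavior can be sketched with a much simpler model than those named above. The example below deliberately swaps in a least-squares line in place of a replicator neural network or support vector machine: it fits the line to normal data and scores a new observation by how far it falls from the model's prediction, measured in units of the residual spread on the training data. The function names, the features (session duration and bytes transferred), and the data are assumptions made for the example.

```python
from statistics import mean, pstdev

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by least squares on normal data.

    Returns (slope, intercept, resid_std), where resid_std is the standard
    deviation of the training residuals.
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return slope, intercept, pstdev(residuals)

def deviation(model, x, y):
    """Absolute prediction error, in units of the training residual spread."""
    slope, intercept, resid_std = model
    err = abs(y - (slope * x + intercept))
    if resid_std == 0:
        # The model fit the training data exactly; any error is a deviation.
        return 0.0 if err == 0 else float("inf")
    return err / resid_std

# Hypothetical normal sessions: duration (minutes) vs. megabytes transferred.
durations = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
megabytes = [10.2, 19.8, 30.1, 40.3, 49.7, 60.1]
model = fit_line(durations, megabytes)

print(deviation(model, 3.0, 30.0) > 3.0)   # consistent with the model
print(deviation(model, 3.0, 300.0) > 3.0)  # far from what the model predicts
```

The same pattern applies to richer models: train on data assumed to be normal, then treat a large prediction or reconstruction error as the anomaly score.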
Anomaly Detection in Computer Security, University of New Mexico
Protocol Anomaly Detection for Network-based Intrusion Detection, SANS Institute
IBM Proventia Network Anomaly Detection System (ADS)
The State of Anomaly Detection, Security Focus
The Mazu Network Behavior Analysis (NBA) system, Mazu Networks
My Publications in Anomaly/Outlier Detection for Security Applications
Kumar, V., Srivastava, J., Lazarevic, A. (Editors): “Managing Cyber Threats: Issues, Approaches and Challenges”, Springer, May 2005.
Book chapters:
1. Lazarevic, A., Data Mining for Intrusion Detection, Encyclopedia of Data Warehousing and Mining, Idea Group, June 2005.
2. Lazarevic, A., Srivastava, J., Kumar, V: A Survey of Intrusion Detection techniques, book “Managing Cyber Threats: Issues, Approaches and Challenges”, Kluwer Academic Publishers, May 2005
3. Ertoz, L., Eilertson, E., Lazarevic, A., Tan, P., Srivastava, J., Kumar, V., Dokas, P.: The MINDS – Minnesota Intrusion Detection System, book “Next Generation Data Mining”, 2004.
1. Lazarevic, A., Pokrajac, D., Latecki, L., “Incremental Local Outlier Detection for Data Streams”, Proc. IEEE Symposium on Computational Intelligence and Data Mining, Honolulu, HI, April 2007.
2. Latecki, L., Lazarevic, A., Pokrajac, D., Anomaly Detection with Kernel Density Functions, International Conference on Machine Learning and Data Mining – MLDM 2007, Leipzig, Germany, July 2007
3. Lazarevic, A., Kumar, V.: “Feature Bagging for Outlier Detection”, Proc. ACM-SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, August 2005.
4. Lazarevic, A., Ertoz, L., Ozgur, A., Srivastava, J., Kumar, V.: "Evaluation of Outlier Detection Schemes for Detecting Network Intrusions", Proc. Third SIAM International Conference on Data Mining, San Francisco, CA, May 2003.