Sequential anomaly detection in a batch with growing number of tests: Application to network intrusion detection

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Citations (Scopus)

Abstract

For high (N)-dimensional feature spaces, we consider detection of an unknown, anomalous class of samples amongst a batch of collected samples (of size T), under the null hypothesis that all samples follow the same probability law. Since the features which will best identify the anomalies are a priori unknown, several common detection strategies are: 1) evaluating atypicality of a sample (its p-value) based on the null distribution defined on the full N-dimensional feature space; 2) considering a (combinatoric) set of low order distributions, e.g. all singletons and all feature pairs, with detections made based on the smallest p-value yielded over all such low order tests. The first approach relies on accurate estimation of the joint distribution, while the second may suffer from increased false alarm rates as N and T grow. Alternatively, inspired by greedy feature selection commonly used in supervised learning, we propose a novel sequential anomaly detection procedure with a growing number of tests. Here, new tests are (greedily) included only when they are needed, i.e., when their use (on currently undetected samples) will yield greater aggregate statistical significance of (multiple testing corrected) detections than obtainable using the existing test cadre. Our approach thus aims to maximize aggregate statistical significance of all detections made up until a finite horizon. Our method is evaluated, along with supervised methods, for a network intrusion domain, detecting Zeus bot (intrusion) packet flows embedded amongst (normal) Web flows. It is shown that judicious feature representation is essential for discriminating Zeus from Web.
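As a rough illustration of strategy 2 above (smallest p-value over low order tests, with a multiple testing correction), the following Python sketch uses only single-feature tests. It is not the procedure proposed in the paper: the Gaussian null fitted from the batch itself, the Bonferroni correction over all feature-sample pairs, the threshold `alpha`, and the function names are all assumptions made for this example.

```python
import math
import statistics


def two_sided_gaussian_p(x, mean, std):
    """Two-sided p-value of x under an assumed Gaussian null N(mean, std^2)."""
    z = abs(x - mean) / std
    return math.erfc(z / math.sqrt(2.0))


def min_p_detections(batch, alpha=0.05):
    """Flag samples whose smallest single-feature p-value survives correction.

    batch: list of T samples, each a list of N feature values.
    Returns a list of (sample_index, feature_index, corrected_p_value).
    """
    T = len(batch)
    N = len(batch[0])
    # Assumption: the null for each feature is a Gaussian fitted to the batch.
    columns = [[sample[j] for sample in batch] for j in range(N)]
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    # Constant features carry no information and are skipped.
    usable = [j for j in range(N) if stds[j] > 0.0]
    # Bonferroni correction over every (feature, sample) test performed.
    n_tests = len(usable) * T
    detections = []
    for i, sample in enumerate(batch):
        candidates = [
            (two_sided_gaussian_p(sample[j], means[j], stds[j]), j) for j in usable
        ]
        if not candidates:
            continue
        p, j = min(candidates)
        corrected = min(1.0, p * n_tests)
        if corrected <= alpha:
            detections.append((i, j, corrected))
    return detections


# Toy batch: 40 samples, 2 features, with one planted anomaly in sample 7.
batch = [[float(i % 5 - 2), float(i % 3 - 1)] for i in range(40)]
batch[7][1] = 30.0

detections = min_p_detections(batch)
for index, feature, corrected_p in detections:
    print(f"sample {index} flagged on feature {feature}, corrected p = {corrected_p:.2e}")
```

Because the correction factor in this sketch is the number of features times the batch size, the per-test threshold tightens as N and T grow, which is one way to see the trade-off between false alarms and detection power that the abstract attributes to this strategy.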

Original language: English (US)
Title of host publication: 2012 IEEE International Workshop on Machine Learning for Signal Processing - Proceedings of MLSP 2012
DOIs: https://doi.org/10.1109/MLSP.2012.6349793
State: Published - Dec 12 2012
Event: 2012 22nd IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2012 - Santander, Spain
Duration: Sep 23 2012 → Sep 26 2012

Publication series

Name: IEEE International Workshop on Machine Learning for Signal Processing, MLSP
ISSN (Print): 2161-0363
ISSN (Electronic): 2161-0371

Other

Other: 2012 22nd IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2012
Country: Spain
City: Santander
Period: 9/23/12 → 9/26/12

Fingerprint

Intrusion detection
Supervised learning
Feature extraction
Testing

All Science Journal Classification (ASJC) codes

  • Human-Computer Interaction
  • Signal Processing

Cite this

Miller, D. J., Kocak, F., & Kesidis, G. (2012). Sequential anomaly detection in a batch with growing number of tests: Application to network intrusion detection. In 2012 IEEE International Workshop on Machine Learning for Signal Processing - Proceedings of MLSP 2012 [6349793] (IEEE International Workshop on Machine Learning for Signal Processing, MLSP). https://doi.org/10.1109/MLSP.2012.6349793
Miller, David J.; Kocak, Fatih; Kesidis, George. / Sequential anomaly detection in a batch with growing number of tests: Application to network intrusion detection. 2012 IEEE International Workshop on Machine Learning for Signal Processing - Proceedings of MLSP 2012. 2012. (IEEE International Workshop on Machine Learning for Signal Processing, MLSP).
@inproceedings{8f36f3e7b5c144b7b90fd78b7bffc515,
title = "Sequential anomaly detection in a batch with growing number of tests: Application to network intrusion detection",
abstract = "For high (N)-dimensional feature spaces, we consider detection of an unknown, anomalous class of samples amongst a batch of collected samples (of size T), under the null hypothesis that all samples follow the same probability law. Since the features which will best identify the anomalies are a priori unknown, several common detection strategies are: 1) evaluating atypicality of a sample (its p-value) based on the null distribution defined on the full N-dimensional feature space; 2) considering a (combinatoric) set of low order distributions, e.g. all singletons and all feature pairs, with detections made based on the smallest p-value yielded over all such low order tests. The first approach relies on accurate estimation of the joint distribution, while the second may suffer from increased false alarm rates as N and T grow. Alternatively, inspired by greedy feature selection commonly used in supervised learning, we propose a novel sequential anomaly detection procedure with a growing number of tests. Here, new tests are (greedily) included only when they are needed, i.e., when their use (on currently undetected samples) will yield greater aggregate statistical significance of (multiple testing corrected) detections than obtainable using the existing test cadre. Our approach thus aims to maximize aggregate statistical significance of all detections made up until a finite horizon. Our method is evaluated, along with supervised methods, for a network intrusion domain, detecting Zeus bot (intrusion) packet flows embedded amongst (normal) Web flows. It is shown that judicious feature representation is essential for discriminating Zeus from Web.",
author = "Miller, {David J.} and Fatih Kocak and George Kesidis",
year = "2012",
month = "12",
day = "12",
doi = "10.1109/MLSP.2012.6349793",
language = "English (US)",
isbn = "9781467310260",
series = "IEEE International Workshop on Machine Learning for Signal Processing, MLSP",
booktitle = "2012 IEEE International Workshop on Machine Learning for Signal Processing - Proceedings of MLSP 2012",
}

Miller, DJ, Kocak, F & Kesidis, G 2012, Sequential anomaly detection in a batch with growing number of tests: Application to network intrusion detection. in 2012 IEEE International Workshop on Machine Learning for Signal Processing - Proceedings of MLSP 2012., 6349793, IEEE International Workshop on Machine Learning for Signal Processing, MLSP, 2012 22nd IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2012, Santander, Spain, 9/23/12. https://doi.org/10.1109/MLSP.2012.6349793

Sequential anomaly detection in a batch with growing number of tests: Application to network intrusion detection. / Miller, David J.; Kocak, Fatih; Kesidis, George.

2012 IEEE International Workshop on Machine Learning for Signal Processing - Proceedings of MLSP 2012. 2012. 6349793 (IEEE International Workshop on Machine Learning for Signal Processing, MLSP).

TY - GEN

T1 - Sequential anomaly detection in a batch with growing number of tests

T2 - Application to network intrusion detection

AU - Miller, David J.

AU - Kocak, Fatih

AU - Kesidis, George

PY - 2012/12/12

Y1 - 2012/12/12

AB - For high (N)-dimensional feature spaces, we consider detection of an unknown, anomalous class of samples amongst a batch of collected samples (of size T), under the null hypothesis that all samples follow the same probability law. Since the features which will best identify the anomalies are a priori unknown, several common detection strategies are: 1) evaluating atypicality of a sample (its p-value) based on the null distribution defined on the full N-dimensional feature space; 2) considering a (combinatoric) set of low order distributions, e.g. all singletons and all feature pairs, with detections made based on the smallest p-value yielded over all such low order tests. The first approach relies on accurate estimation of the joint distribution, while the second may suffer from increased false alarm rates as N and T grow. Alternatively, inspired by greedy feature selection commonly used in supervised learning, we propose a novel sequential anomaly detection procedure with a growing number of tests. Here, new tests are (greedily) included only when they are needed, i.e., when their use (on currently undetected samples) will yield greater aggregate statistical significance of (multiple testing corrected) detections than obtainable using the existing test cadre. Our approach thus aims to maximize aggregate statistical significance of all detections made up until a finite horizon. Our method is evaluated, along with supervised methods, for a network intrusion domain, detecting Zeus bot (intrusion) packet flows embedded amongst (normal) Web flows. It is shown that judicious feature representation is essential for discriminating Zeus from Web.

UR - http://www.scopus.com/inward/record.url?scp=84870676776&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84870676776&partnerID=8YFLogxK

U2 - 10.1109/MLSP.2012.6349793

DO - 10.1109/MLSP.2012.6349793

M3 - Conference contribution

AN - SCOPUS:84870676776

SN - 9781467310260

T3 - IEEE International Workshop on Machine Learning for Signal Processing, MLSP

BT - 2012 IEEE International Workshop on Machine Learning for Signal Processing - Proceedings of MLSP 2012

ER -

Miller DJ, Kocak F, Kesidis G. Sequential anomaly detection in a batch with growing number of tests: Application to network intrusion detection. In 2012 IEEE International Workshop on Machine Learning for Signal Processing - Proceedings of MLSP 2012. 2012. 6349793. (IEEE International Workshop on Machine Learning for Signal Processing, MLSP). https://doi.org/10.1109/MLSP.2012.6349793