Detecting anomalous latent classes in a batch of network traffic flows

Research output: Contribution to conference › Paper

4 Citations (Scopus)

Abstract

We focus on detecting samples from anomalous latent classes 'buried' within a collected batch of known ('normal') class samples. In our setting, each sample has a large number of features. We posit, and observe, that careful 'feature selection' within unsupervised anomaly detection may be needed to achieve the most accurate results. Our approach effectively selects features (tests) even though no labeled anomalous examples are available as a basis for standard (supervised) feature selection. We form pairwise feature tests based on bivariate Gaussian mixture null models, with one test for every pair of features. The mixtures are estimated using known class samples (the null 'training set'). We then obtain p-values for the test batch samples under the null hypothesis and calculate approximate joint p-values for candidate anomalous clusters, each defined by a (sample subset, test subset) pair. Our approach sequentially detects the most significant clusters of samples, here in a networking context. Using ROC curves, we compare our 'p-value clustering algorithm' with alternative p-value based methods and with the one-class SVM. All the competing methods make sample-wise detections, i.e., they do not jointly detect anomalous clusters. The anomalous class was either an HTTP bot (Zeus) or peer-to-peer (P2P) traffic. Our p-value clustering approach gives promising results for detecting Zeus bot and P2P traffic amongst Web traffic.
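
As a rough illustration of the pairwise testing step described in the abstract, the sketch below fits one null model per feature pair on known-class samples and then scores each batch sample with a p-value for every pair. It is not the authors' implementation. As a stated simplification, each null model here is a single bivariate Gaussian (a one-component stand-in for the bivariate Gaussian mixtures the abstract describes), which allows a closed-form p-value from the chi-square distribution with 2 degrees of freedom. The function names (`fit_bivariate_gaussian`, `pair_p_value`, `pairwise_p_values`) and the synthetic data are hypothetical, and the joint cluster p-values and sequential cluster detection are not attempted.

```python
import itertools
import math
import random


def fit_bivariate_gaussian(points):
    """Estimate the mean and covariance of 2-D points (null model for one feature pair)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def pair_p_value(point, model):
    """P-value of a 2-D point under the null: P(chi2 with 2 dof >= d^2) = exp(-d^2 / 2).

    d^2 is the squared Mahalanobis distance; the 2x2 covariance is inverted by hand.
    The result is approximate because the parameters are estimated, not known.
    """
    (mx, my), (sxx, sxy, syy) = model
    dx, dy = point[0] - mx, point[1] - my
    det = sxx * syy - sxy ** 2
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * d2)


def pairwise_p_values(train, batch):
    """Return {(i, j): [p-value per batch sample]} for every feature pair i < j."""
    n_features = len(train[0])
    table = {}
    for i, j in itertools.combinations(range(n_features), 2):
        # Fit the null model on known-class ("training") samples only.
        model = fit_bivariate_gaussian([(x[i], x[j]) for x in train])
        table[(i, j)] = [pair_p_value((x[i], x[j]), model) for x in batch]
    return table


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-ins for flow features; real traffic features are assumed away here.
    train = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(500)]
    batch = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(5)]
    batch.append([0.0, 6.0, 6.0, 0.0])  # planted outlier in features 1 and 2
    for pair, pvals in sorted(pairwise_p_values(train, batch).items()):
        print(pair, ["%.3g" % p for p in pvals])
```

In this toy setup the planted sample receives very small p-values only on pairs involving features 1 or 2, which hints at why selecting the right subset of tests matters when the number of features is high.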

Original language: English (US)
DOI: https://doi.org/10.1109/CISS.2014.6814181
State: Published - Jan 1 2014
Event: 2014 48th Annual Conference on Information Sciences and Systems, CISS 2014 - Princeton, NJ, United States
Duration: Mar 19 2014 to Mar 21 2014

Fingerprint

Feature extraction
HTTP
Clustering algorithms

All Science Journal Classification (ASJC) codes

  • Information Systems

Cite this

Kocak, F., Miller, D. J., & Kesidis, G. (2014). Detecting anomalous latent classes in a batch of network traffic flows. Paper presented at 2014 48th Annual Conference on Information Sciences and Systems, CISS 2014, Princeton, NJ, United States. https://doi.org/10.1109/CISS.2014.6814181