A flow classifier with tamper-resistant features and an evaluation of its portability to new domains

Research output: Contribution to journal (Article)

8 Citations (Scopus)

Abstract

Flow classification by application type is motivated by on-line anomaly detection, off-line network planning, and on-line enforcement of terms-of-use policies by public ISPs or by administrators of private-enterprise networks. Both signature matching and a variety of feature-based pattern recognition methods have been applied to address this problem. In this paper, we propose a TCP flow classifier that uses neither protocol-specific packet-header information (including port numbers) nor packet-payload information. Techniques based on the former are readily evadable, while detailed yet scalable inspection of packet payloads is difficult to achieve, may violate privacy laws, and is defeated by data encryption. Our classifier is tested on two contemporary, publicly available datasets recorded in similar networking contexts. We consider the often-encountered scenario in which ground-truth labels, which are needed for supervised classifier training, are unavailable for the domain where flows must be classified. In this case, one must "port over" a classifier trained on one domain to make decisions on another. We address the problem of reconciling differing class definitions between the two domains. Our results also show that differences between domains in the class-conditional feature distributions, which are to be expected in practice, can substantially reduce classification accuracy on the new domain. Finally, we propose and evaluate a hypothesis-testing approach that detects port spoofing by exploiting confusion-matrix statistics.
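The abstract mentions a hypothesis test that uses confusion-matrix statistics to detect port spoofing but gives no details. The Python below is a hedged sketch of one way such a test could work; it is not the authors' procedure. It assumes that flows seen on a port which genuinely carry the application conventionally tied to that port should receive predicted labels distributed like that application's row of a confusion matrix estimated on labeled data. It then runs a Monte Carlo chi-square goodness-of-fit test against that row. The function names (`row_distribution`, `chi_square`, `spoofing_p_value`), the pseudo-count, the significance level, and the example matrix are all illustrative assumptions.

```python
"""Hypothetical sketch: flag possible port spoofing with a goodness-of-fit test.

Assumption (not the paper's procedure): flows observed on a port that really
carry the application conventionally tied to that port should receive
predicted labels distributed like that application's row of a confusion
matrix estimated on labeled data.
"""
import random
from collections import Counter


def row_distribution(confusion, true_class, pseudo_count=0.5):
    """Estimate P(predicted = j | true = true_class) from one confusion-matrix row.

    The pseudo-count (an assumption of this sketch) keeps every probability
    positive so the chi-square statistic below stays finite.
    """
    row = [count + pseudo_count for count in confusion[true_class]]
    total = sum(row)
    return [count / total for count in row]


def chi_square(counts, probs):
    """Pearson chi-square statistic of observed counts against probabilities."""
    n = sum(counts)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, probs))


def spoofing_p_value(predicted_labels, confusion, claimed_class,
                     replicates=2000, seed=0):
    """Monte Carlo p-value for H0: these flows truly belong to claimed_class.

    predicted_labels holds the classifier's outputs (class indices 0..k-1) for
    flows seen on the port under test; a small p-value suggests that some
    other application is using the port.
    """
    if not predicted_labels:
        raise ValueError("need at least one classified flow")
    k = len(confusion)
    probs = row_distribution(confusion, claimed_class)
    tally = Counter(predicted_labels)
    observed = [tally.get(j, 0) for j in range(k)]
    n = len(predicted_labels)
    stat = chi_square(observed, probs)

    # Simulate label counts under H0: multinomial with the claimed class's row.
    rng = random.Random(seed)
    classes = list(range(k))
    exceed = 0
    for _ in range(replicates):
        draw = Counter(rng.choices(classes, weights=probs, k=n))
        simulated = [draw.get(j, 0) for j in range(k)]
        if chi_square(simulated, probs) >= stat:
            exceed += 1
    # The add-one correction keeps the Monte Carlo p-value strictly positive.
    return (exceed + 1) / (replicates + 1)


if __name__ == "__main__":
    # Hypothetical 3-class confusion matrix (rows: true class, columns:
    # predicted class); class 0 is the application expected on the port.
    confusion = [[90, 7, 3], [5, 85, 10], [2, 8, 90]]
    honest = [0] * 90 + [1] * 7 + [2] * 3    # predictions resemble row 0
    spoofed = [0] * 5 + [1] * 10 + [2] * 85  # predictions resemble row 2
    print(spoofing_p_value(honest, confusion, claimed_class=0) < 0.01)   # prints False
    print(spoofing_p_value(spoofed, confusion, claimed_class=0) < 0.01)  # prints True
```

The Monte Carlo p-value avoids the need for a chi-square CDF and remains valid when some expected counts are small. A closed-form chi-square test, or a likelihood-ratio test that compares the claimed class against every alternative row, would be reasonable substitutes under the same assumptions.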

Original language: English (US)
Article number: 5963163
Pages (from-to): 1449-1460
Number of pages: 12
Journal: IEEE Journal on Selected Areas in Communications
Volume: 29
Issue number: 7
DOI: 10.1109/JSAC.2011.110810
State: Published (Aug 1, 2011)

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Electrical and Electronic Engineering

Cite this

@article{5e0801ed1c0749769cd7872a2cc8a5bc,
title = "A flow classifier with tamper-resistant features and an evaluation of its portability to new domains",
abstract = "Flow classification by application type is motivated by on-line anomaly detection, off-line network planning, and on-line enforcement of terms-of-use policies by public ISPs or by administrators of private-enterprise networks. Both signature matching and a variety of feature-based pattern recognition methods have been applied to address this problem. In this paper, we propose a TCP flow classifier that employs neither packet header information that is protocol-specific (including port numbers) nor packet-payload information. Techniques based on the former are readily evadable, while detailed yet scalable inspection of packet payloads is difficult to achieve, may violate privacy laws, and is defeated by data encryption. Our classifier is tested on two contemporary publicly available datasets recorded in similar networking contexts. We consider the often encountered scenario where ground-truth labels, necessary for supervised classifier training, are unavailable for a domain where flow classification needs to be applied. In this case, one must {"}port over{"} a classifier trained on one domain to make decisions on another. We address issues in reconciling differences in class definitions between the two domains. We also demonstrate by our results that domain differences in the class-conditional feature distributions, which will exist in practice, can lead to substantial losses in classification accuracy on the new domain. Finally, we also propose and evaluate a hypothesis testing approach to detect port spoofing by exploiting confusion matrix statistics.",
author = "Zou, Guixi and Kesidis, George and Miller, David J.",
year = "2011",
month = aug,
day = "1",
doi = "10.1109/JSAC.2011.110810",
language = "English (US)",
volume = "29",
pages = "1449--1460",
journal = "IEEE Journal on Selected Areas in Communications",
issn = "0733-8716",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "7",

}

TY - JOUR

T1 - A flow classifier with tamper-resistant features and an evaluation of its portability to new domains

AU - Zou, Guixi

AU - Kesidis, George

AU - Miller, David J.

PY - 2011/8/1

Y1 - 2011/8/1

AB - Flow classification by application type is motivated by on-line anomaly detection, off-line network planning, and on-line enforcement of terms-of-use policies by public ISPs or by administrators of private-enterprise networks. Both signature matching and a variety of feature-based pattern recognition methods have been applied to address this problem. In this paper, we propose a TCP flow classifier that employs neither packet header information that is protocol-specific (including port numbers) nor packet-payload information. Techniques based on the former are readily evadable, while detailed yet scalable inspection of packet payloads is difficult to achieve, may violate privacy laws, and is defeated by data encryption. Our classifier is tested on two contemporary publicly available datasets recorded in similar networking contexts. We consider the often encountered scenario where ground-truth labels, necessary for supervised classifier training, are unavailable for a domain where flow classification needs to be applied. In this case, one must "port over" a classifier trained on one domain to make decisions on another. We address issues in reconciling differences in class definitions between the two domains. We also demonstrate by our results that domain differences in the class-conditional feature distributions, which will exist in practice, can lead to substantial losses in classification accuracy on the new domain. Finally, we also propose and evaluate a hypothesis testing approach to detect port spoofing by exploiting confusion matrix statistics.

UR - http://www.scopus.com/inward/record.url?scp=80051494033&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80051494033&partnerID=8YFLogxK

U2 - 10.1109/JSAC.2011.110810

DO - 10.1109/JSAC.2011.110810

M3 - Article

AN - SCOPUS:80051494033

VL - 29

SP - 1449

EP - 1460

JO - IEEE Journal on Selected Areas in Communications

JF - IEEE Journal on Selected Areas in Communications

SN - 0733-8716

IS - 7

M1 - 5963163

ER -