Multivariate stream data classification using simple text classifiers

Sungbo Seo, Jaewoo Kang, Dongwon Lee, Keun Ho Ryu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

1 Citation (Scopus)

Abstract

We introduce a classification framework for continuous multivariate stream data. The approach works in two steps. In the preprocessing step, it takes a sliding window of multivariate stream data as input and discretizes the data in the window into a string of symbols that characterize the signal changes. In the classification step, it applies a simple text classification algorithm to the discretized data in the window. We evaluated both supervised and unsupervised classification algorithms: Naïve Bayes and SVM for the supervised case, and Jaccard, TF-IDF, Jaro, and Jaro-Winkler for the unsupervised case. In our experiments, SVM and TF-IDF outperformed the other classification methods. In particular, classification accuracy improved when the correlation between attributes was considered along with the n-gram tokens of symbols.
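The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the discretization scheme, so the three-symbol alphabet (U, D, S), the threshold `eps`, the per-attribute token prefix, and all function names here are assumptions made for illustration. The classification step is shown as a nearest-neighbour lookup using Jaccard similarity over n-gram tokens, one of the measures the abstract lists; the attribute-correlation feature is not attempted.

```python
def discretize(window, eps=0.05):
    """Turn each attribute's series into a string of change symbols.

    Assumed alphabet: U (rise), D (fall), S (steady, |change| <= eps).
    `window` is a list with one list of floats per attribute.
    """
    strings = []
    for series in window:
        symbols = []
        for prev, cur in zip(series, series[1:]):
            delta = cur - prev
            if delta > eps:
                symbols.append("U")
            elif delta < -eps:
                symbols.append("D")
            else:
                symbols.append("S")
        strings.append("".join(symbols))
    return strings


def ngrams(text, n=2):
    """Return the character n-grams of a symbol string, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tokens(window, n=2, eps=0.05):
    """Tokenize a window: n-grams per attribute, prefixed by attribute index."""
    toks = []
    for idx, symbol_string in enumerate(discretize(window, eps)):
        toks.extend(f"a{idx}:{gram}" for gram in ngrams(symbol_string, n))
    return toks


def jaccard(a, b):
    """Jaccard similarity between two token lists, treated as sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def classify(window, labeled, n=2):
    """Return the label of the labeled window most similar to `window`.

    `labeled` is a list of (window, label) pairs.
    """
    query = tokens(window, n)
    best = max(labeled, key=lambda item: jaccard(query, tokens(item[0], n)))
    return best[1]


if __name__ == "__main__":
    rising = [[0.0, 1.0, 2.0, 3.0], [5.0, 5.0, 5.0, 5.0]]
    falling = [[3.0, 2.0, 1.0, 0.0], [5.0, 5.0, 5.0, 5.0]]
    query = [[1.0, 1.5, 2.5, 3.0], [4.0, 4.0, 4.01, 4.0]]
    print(classify(query, [(rising, "rising"), (falling, "falling")]))
```

Prefixing each n-gram with its attribute index keeps tokens from different attributes distinct, so a rise in one signal is not confused with a rise in another. Swapping `jaccard` for a TF-IDF cosine score or a trained classifier would change only the `classify` function.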

Original language: English (US)
Title of host publication: Database and Expert Systems Applications - 17th International Conference, DEXA 2006, Proceedings
Publisher: Springer Verlag
Pages: 420-429
Number of pages: 10
ISBN (Print): 3540378715, 9783540378716
State: Published - Jan 1 2006
Event: 17th International Conference on Database and Expert Systems Applications, DEXA 2006 - Krakow, Poland
Duration: Sep 4 2006 to Sep 8 2006

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 4080 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 17th International Conference on Database and Expert Systems Applications, DEXA 2006
Country: Poland
City: Krakow
Period: 9/4/06 to 9/8/06

Fingerprint

Data Classification
Classifiers
Classifier
TF-IDF
Classification Algorithm
Data Streams
Unsupervised Classification
N-gram
Text Classification
Supervised Classification
Sliding Window
Bayes
Preprocessing
Strings
Classify
Attribute
Text
Experiment
Experiments
Model

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Seo, S., Kang, J., Lee, D., & Ryu, K. H. (2006). Multivariate stream data classification using simple text classifiers. In Database and Expert Systems Applications - 17th International Conference, DEXA 2006, Proceedings (pp. 420-429). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 4080 LNCS). Springer Verlag.
Seo, Sungbo ; Kang, Jaewoo ; Lee, Dongwon ; Ryu, Keun Ho. / Multivariate stream data classification using simple text classifiers. Database and Expert Systems Applications - 17th International Conference, DEXA 2006, Proceedings. Springer Verlag, 2006. pp. 420-429 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{1d23cde56fff41e0b4f051a2f93c3c4a,
title = "Multivariate stream data classification using simple text classifiers",
abstract = "We introduce a classification framework for continuous multivariate stream data. The proposed approach works in two steps. In the preprocessing step, it takes as input a sliding window of multivariate stream data and discretizes the data in the window into a string of symbols that characterize the signal changes. In the classification step, it uses a simple text classification algorithm to classify the discretized data in the window. We evaluated both supervised and unsupervised classification algorithms. For supervised, we tested Na{\"i}ve Bayes Model and SVM, and for unsupervised, we tested Jaccard, TFIDF, Jaro and Jaro Winkler. In our experiments, SVM and TFIDF outperformed the other classification methods. In particular, we observed that classification accuracy is improved when the correlation of attributes is also considered along with the n-gram tokens of symbols.",
author = "Sungbo Seo and Jaewoo Kang and Dongwon Lee and Ryu, {Keun Ho}",
year = "2006",
month = "1",
day = "1",
language = "English (US)",
isbn = "3540378715",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "420--429",
booktitle = "Database and Expert Systems Applications - 17th International Conference, DEXA 2006, Proceedings",
address = "Germany",

}

Seo, S, Kang, J, Lee, D & Ryu, KH 2006, Multivariate stream data classification using simple text classifiers. in Database and Expert Systems Applications - 17th International Conference, DEXA 2006, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 4080 LNCS, Springer Verlag, pp. 420-429, 17th International Conference on Database and Expert Systems Applications, DEXA 2006, Krakow, Poland, 9/4/06.

TY - GEN

T1 - Multivariate stream data classification using simple text classifiers

AU - Seo, Sungbo

AU - Kang, Jaewoo

AU - Lee, Dongwon

AU - Ryu, Keun Ho

PY - 2006/1/1

Y1 - 2006/1/1

N2 - We introduce a classification framework for continuous multivariate stream data. The proposed approach works in two steps. In the preprocessing step, it takes as input a sliding window of multivariate stream data and discretizes the data in the window into a string of symbols that characterize the signal changes. In the classification step, it uses a simple text classification algorithm to classify the discretized data in the window. We evaluated both supervised and unsupervised classification algorithms. For supervised, we tested Naïve Bayes Model and SVM, and for unsupervised, we tested Jaccard, TFIDF, Jaro and Jaro Winkler. In our experiments, SVM and TFIDF outperformed the other classification methods. In particular, we observed that classification accuracy is improved when the correlation of attributes is also considered along with the n-gram tokens of symbols.

UR - http://www.scopus.com/inward/record.url?scp=33749408767&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33749408767&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:33749408767

SN - 3540378715

SN - 9783540378716

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 420

EP - 429

BT - Database and Expert Systems Applications - 17th International Conference, DEXA 2006, Proceedings

PB - Springer Verlag

ER -

Seo S, Kang J, Lee D, Ryu KH. Multivariate stream data classification using simple text classifiers. In Database and Expert Systems Applications - 17th International Conference, DEXA 2006, Proceedings. Springer Verlag. 2006. p. 420-429. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).