Learning classifiers from distributional data

Harris T. Lin, Sanghack Lee, Ngot Bui, Vasant Honavar

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

Many big data applications give rise to distributional data wherein objects or individuals are naturally represented as K-tuples of bags of feature values where feature values in each bag are sampled from a feature and object specific distribution. We formulate and solve the problem of learning classifiers from distributional data. We consider three classes of methods for learning distributional classifiers: (i) those that rely on aggregation to encode distributional data into tuples of attribute values, i.e., instances that can be handled by traditional supervised machine learning algorithms, (ii) those that are based on generative models of distributional data, and (iii) the discriminative counterparts of the generative models considered in (ii) above. We compare the performance of the different algorithms on real-world as well as synthetic distributional data sets. The results of our experiments demonstrate that classifiers that take advantage of the information available in the distributional instance representation outperform or match the performance of those that fail to fully exploit such information.
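The aggregation approach in (i) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy instance, the choice of aggregates (mean/min/max for numeric bags, mode for nominal bags), and the names `aggregate_numeric`, `aggregate_nominal`, and `encode` are all assumptions made for the example. It shows how a K-tuple of bags might be flattened into a fixed-length tuple of attribute values that a traditional supervised learner can consume.

```python
from collections import Counter
from statistics import mean

# Hypothetical toy instance with K = 2 features; each feature holds a bag
# (multiset) of observed values rather than a single value.
instance = (
    [3.0, 5.0, 4.0],        # bag of values for a numeric feature
    ["a", "b", "a", "a"],   # bag of values for a nominal feature
)
kinds = ("numeric", "nominal")  # assumed per-feature type labels


def aggregate_numeric(bag):
    """Encode a numeric bag as (mean, min, max); an empty bag yields Nones."""
    if not bag:
        return (None, None, None)
    return (mean(bag), min(bag), max(bag))


def aggregate_nominal(bag):
    """Encode a nominal bag by its most frequent value; empty bag yields None."""
    if not bag:
        return (None,)
    return (Counter(bag).most_common(1)[0][0],)


def encode(instance, kinds):
    """Flatten a K-tuple of bags into one fixed-length tuple of attributes."""
    encoded = []
    for bag, kind in zip(instance, kinds):
        if kind == "numeric":
            encoded.extend(aggregate_numeric(bag))
        else:
            encoded.extend(aggregate_nominal(bag))
    return tuple(encoded)


print(encode(instance, kinds))  # (4.0, 3.0, 5.0, 'a')
```

Any such encoding discards part of each bag (here, the bag sizes and the full shape of each distribution), which is the kind of information the generative and discriminative approaches in (ii) and (iii) are meant to retain.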

Original language: English (US)
Title of host publication: Proceedings - 2013 IEEE International Congress on Big Data, BigData 2013
Pages: 302-309
Number of pages: 8
DOI: https://doi.org/10.1109/BigData.Congress.2013.47
State: Published - Oct 28 2013
Event: 2013 IEEE International Congress on Big Data, BigData 2013 - Santa Clara, CA, United States
Duration: Jun 27 2013 to Jul 2 2013

Publication series

Name: Proceedings - 2013 IEEE International Congress on Big Data, BigData 2013

Other

Other: 2013 IEEE International Congress on Big Data, BigData 2013
Country: United States
City: Santa Clara, CA
Period: 6/27/13 to 7/2/13

Fingerprint

Classifiers
Learning algorithms
Learning systems
Agglomeration
Experiments
Big data

All Science Journal Classification (ASJC) codes

  • Computer Science Applications

Cite this

Lin, H. T., Lee, S., Bui, N., & Honavar, V. (2013). Learning classifiers from distributional data. In Proceedings - 2013 IEEE International Congress on Big Data, BigData 2013 (pp. 302-309). [6597151] (Proceedings - 2013 IEEE International Congress on Big Data, BigData 2013). https://doi.org/10.1109/BigData.Congress.2013.47
Lin, Harris T. ; Lee, Sanghack ; Bui, Ngot ; Honavar, Vasant. / Learning classifiers from distributional data. Proceedings - 2013 IEEE International Congress on Big Data, BigData 2013. 2013. pp. 302-309 (Proceedings - 2013 IEEE International Congress on Big Data, BigData 2013).
@inproceedings{8720a0fe63314a729a0bedf0c434f361,
title = "Learning classifiers from distributional data",
abstract = "Many big data applications give rise to distributional data wherein objects or individuals are naturally represented as K-tuples of bags of feature values where feature values in each bag are sampled from a feature and object specific distribution. We formulate and solve the problem of learning classifiers from distributional data. We consider three classes of methods for learning distributional classifiers: (i) those that rely on aggregation to encode distributional data into tuples of attribute values, i.e., instances that can be handled by traditional supervised machine learning algorithms, (ii) those that are based on generative models of distributional data, and (iii) the discriminative counterparts of the generative models considered in (ii) above. We compare the performance of the different algorithms on real-world as well as synthetic distributional data sets. The results of our experiments demonstrate that classifiers that take advantage of the information available in the distributional instance representation outperform or match the performance of those that fail to fully exploit such information.",
author = "Lin, {Harris T.} and Sanghack Lee and Ngot Bui and Vasant Honavar",
year = "2013",
month = "10",
day = "28",
doi = "10.1109/BigData.Congress.2013.47",
language = "English (US)",
isbn = "9780768550060",
series = "Proceedings - 2013 IEEE International Congress on Big Data, BigData 2013",
pages = "302--309",
booktitle = "Proceedings - 2013 IEEE International Congress on Big Data, BigData 2013"
}

Lin, HT, Lee, S, Bui, N & Honavar, V 2013, Learning classifiers from distributional data. in Proceedings - 2013 IEEE International Congress on Big Data, BigData 2013., 6597151, Proceedings - 2013 IEEE International Congress on Big Data, BigData 2013, pp. 302-309, 2013 IEEE International Congress on Big Data, BigData 2013, Santa Clara, CA, United States, 6/27/13. https://doi.org/10.1109/BigData.Congress.2013.47

Learning classifiers from distributional data. / Lin, Harris T.; Lee, Sanghack; Bui, Ngot; Honavar, Vasant.

Proceedings - 2013 IEEE International Congress on Big Data, BigData 2013. 2013. p. 302-309 6597151 (Proceedings - 2013 IEEE International Congress on Big Data, BigData 2013).

TY - GEN

T1 - Learning classifiers from distributional data

AU - Lin, Harris T.

AU - Lee, Sanghack

AU - Bui, Ngot

AU - Honavar, Vasant

PY - 2013/10/28

Y1 - 2013/10/28

N2 - Many big data applications give rise to distributional data wherein objects or individuals are naturally represented as K-tuples of bags of feature values where feature values in each bag are sampled from a feature and object specific distribution. We formulate and solve the problem of learning classifiers from distributional data. We consider three classes of methods for learning distributional classifiers: (i) those that rely on aggregation to encode distributional data into tuples of attribute values, i.e., instances that can be handled by traditional supervised machine learning algorithms, (ii) those that are based on generative models of distributional data, and (iii) the discriminative counterparts of the generative models considered in (ii) above. We compare the performance of the different algorithms on real-world as well as synthetic distributional data sets. The results of our experiments demonstrate that classifiers that take advantage of the information available in the distributional instance representation outperform or match the performance of those that fail to fully exploit such information.

UR - http://www.scopus.com/inward/record.url?scp=84886077520&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84886077520&partnerID=8YFLogxK

U2 - 10.1109/BigData.Congress.2013.47

DO - 10.1109/BigData.Congress.2013.47

M3 - Conference contribution

AN - SCOPUS:84886077520

SN - 9780768550060

T3 - Proceedings - 2013 IEEE International Congress on Big Data, BigData 2013

SP - 302

EP - 309

BT - Proceedings - 2013 IEEE International Congress on Big Data, BigData 2013

ER -

Lin HT, Lee S, Bui N, Honavar V. Learning classifiers from distributional data. In Proceedings - 2013 IEEE International Congress on Big Data, BigData 2013. 2013. p. 302-309. 6597151. (Proceedings - 2013 IEEE International Congress on Big Data, BigData 2013). https://doi.org/10.1109/BigData.Congress.2013.47