Discovering health-related knowledge in social media using ensembles of heterogeneous features

Suppawong Tuarob, Conrad S. Tucker, Marcel Salathe, Nilam Ram

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

20 Citations (Scopus)

Abstract

Social media is emerging as a powerful source of communication, information dissemination, and mining. Its colloquial and ubiquitous nature makes it easy for users to express their opinions and preferences in a seamless, dynamic manner. Epidemic surveillance systems that use social media to detect the emergence of diseases have been proposed in the literature. These systems mostly employ traditional document classification techniques that represent a document as a bag of N-grams. However, such techniques are not optimal for social media, where sparsity and noise are the norm. The authors address the limitations of traditional N-gram-based methods and propose using features that represent different semantic aspects of the data, in combination with ensemble machine learning techniques, to identify health-related messages in a heterogeneous pool of social media data. The results reveal a significant improvement in identifying health-related social media content, which can be critical during the emergence of a novel, unknown disease epidemic.
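
The general idea of combining an N-gram view with a second, semantically different view can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the features or learners used, so the toy messages, the symptom lexicon, the scoring rule in lexicon_prob, and the choice of averaging two scores (soft voting) are all assumptions made here for illustration.

```python
import math
from collections import Counter

# Toy labelled messages, invented for illustration (1 = health-related).
TRAIN = [
    ("i have a fever and a sore throat", 1),
    ("feeling sick with the flu today", 1),
    ("my headache will not go away", 1),
    ("bieber fever at the concert tonight", 0),
    ("this new phone is sick", 0),
    ("traffic is giving everyone a bad day", 0),
]

# Hypothetical symptom lexicon standing in for a non-N-gram feature source.
LEXICON = {"fever", "flu", "headache", "cough", "throat", "sick"}


def ngrams(text, n_max=2):
    """Return the bag of word 1..n_max-grams of a message."""
    words = text.lower().split()
    return Counter(
        " ".join(words[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(words) - n + 1)
    )


class NgramNaiveBayes:
    """Multinomial naive Bayes over N-gram counts with add-one smoothing."""

    def fit(self, data):
        self.counts = {0: Counter(), 1: Counter()}
        self.docs = Counter()
        for text, label in data:
            self.counts[label].update(ngrams(text))
            self.docs[label] += 1
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def prob(self, text):
        """Return the estimated P(health-related | text)."""
        log_p = {}
        for label in (0, 1):
            total = sum(self.counts[label].values()) + len(self.vocab)
            log_p[label] = math.log(self.docs[label] / sum(self.docs.values()))
            for gram, k in ngrams(text).items():
                if gram in self.vocab:  # N-grams never seen in training are skipped
                    log_p[label] += k * math.log((self.counts[label][gram] + 1) / total)
        top = max(log_p.values())
        weights = {c: math.exp(v - top) for c, v in log_p.items()}
        return weights[1] / (weights[0] + weights[1])


def lexicon_prob(text):
    """Score in [0, 1) that grows with the number of lexicon words present."""
    hits = sum(word in LEXICON for word in text.lower().split())
    return hits / (hits + 1)


def ensemble_prob(model, text):
    """Soft vote: average the N-gram score and the lexicon score."""
    return (model.prob(text) + lexicon_prob(text)) / 2


model = NgramNaiveBayes().fit(TRAIN)
for message in ("i have a cough and a headache", "the concert tonight was great"):
    print(f"{ensemble_prob(model, message):.2f}  {message}")
```

In this sketch the word "cough" never appears in the training messages, so the N-gram model cannot use it, while the lexicon scorer still counts it. That is the kind of sparsity problem a second feature view is meant to soften.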

Original language: English (US)
Title of host publication: CIKM 2013 - Proceedings of the 22nd ACM International Conference on Information and Knowledge Management
Pages: 1685-1690
Number of pages: 6
DOIs: https://doi.org/10.1145/2505515.2505629
State: Published - Dec 11 2013
Event: 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013 - San Francisco, CA, United States
Duration: Oct 27 2013 to Nov 1 2013

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Other

Other: 22nd ACM International Conference on Information and Knowledge Management, CIKM 2013
Country: United States
City: San Francisco, CA
Period: 10/27/13 to 11/1/13

Fingerprint

Health
Social media
Machine learning
Communication
Dissemination
Surveillance
Document classification

All Science Journal Classification (ASJC) codes

  • Decision Sciences (all)
  • Business, Management and Accounting (all)

Cite this

Tuarob, S., Tucker, C. S., Salathe, M., & Ram, N. (2013). Discovering health-related knowledge in social media using ensembles of heterogeneous features. In CIKM 2013 - Proceedings of the 22nd ACM International Conference on Information and Knowledge Management (pp. 1685-1690). (International Conference on Information and Knowledge Management, Proceedings). https://doi.org/10.1145/2505515.2505629
@inproceedings{d2de8b2f4fe24fa9af76bf75dfc2dca2,
title = "Discovering health-related knowledge in social media using ensembles of heterogeneous features",
author = "Suppawong Tuarob and Tucker, {Conrad S.} and Marcel Salathe and Nilam Ram",
year = "2013",
month = "12",
day = "11",
doi = "10.1145/2505515.2505629",
language = "English (US)",
isbn = "9781450322638",
series = "International Conference on Information and Knowledge Management, Proceedings",
pages = "1685--1690",
booktitle = "CIKM 2013 - Proceedings of the 22nd ACM International Conference on Information and Knowledge Management",
}

TY  - GEN
T1  - Discovering health-related knowledge in social media using ensembles of heterogeneous features
AU  - Tuarob, Suppawong
AU  - Tucker, Conrad S.
AU  - Salathe, Marcel
AU  - Ram, Nilam
PY  - 2013/12/11
Y1  - 2013/12/11
UR  - http://www.scopus.com/inward/record.url?scp=84889601386&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84889601386&partnerID=8YFLogxK
U2  - 10.1145/2505515.2505629
DO  - 10.1145/2505515.2505629
M3  - Conference contribution
AN  - SCOPUS:84889601386
SN  - 9781450322638
T3  - International Conference on Information and Knowledge Management, Proceedings
SP  - 1685
EP  - 1690
BT  - CIKM 2013 - Proceedings of the 22nd ACM International Conference on Information and Knowledge Management
ER  -
