Evaluating the representativeness in the geographic distribution of Twitter user population

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Twitter data are becoming a Big Data stream and have drawn multidisciplinary interest for studying population characteristics and social problems that traditional surveys cannot measure well. However, the use of Twitter data has met strong resistance because of concerns about how well the users represent the population, since little is known about the users' demographic characteristics. It is therefore critical to evaluate the extent to which Twitter users represent the population across different demographic groups. This study evaluates that representativeness and examines the geographic distribution of the Twitter user population and its correspondence to the real population. Based on estimates of Twitter user demographics for the contiguous U.S. in 2014, preliminary results reveal both over- and under-representation of certain demographic groups relative to the real population at the county level. A representation index is used to assess the representativeness of Twitter samples geographically, which may help future studies identify the determinants of bias.
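
The abstract does not define the representation index. As a rough illustration only, one common way to quantify over- and under-representation is a ratio of shares: a group's share among Twitter users in a county divided by that group's share of the county's census population. The sketch below assumes this ratio form; the function name, group labels, and counts are hypothetical and are not taken from the paper.

```python
def representation_index(twitter_counts, census_counts):
    """Ratio of each group's share among Twitter users to its share of the population.

    Assumed form (not from the paper): values above 1 suggest over-representation,
    values below 1 suggest under-representation.
    """
    twitter_total = sum(twitter_counts.values())
    census_total = sum(census_counts.values())
    if twitter_total == 0 or census_total == 0:
        raise ValueError("both samples must contain at least one person")

    index = {}
    for group, census_n in census_counts.items():
        if census_n == 0:
            continue  # the share ratio is undefined for a group absent from the census
        twitter_share = twitter_counts.get(group, 0) / twitter_total
        census_share = census_n / census_total
        index[group] = twitter_share / census_share
    return index


# Hypothetical counts for a single county (illustrative only, not real data).
county_twitter = {"age_18_29": 60, "age_30_49": 30, "age_50_plus": 10}
county_census = {"age_18_29": 200, "age_30_49": 400, "age_50_plus": 400}
county_index = representation_index(county_twitter, county_census)
```

Under this assumed definition, the youngest group in the made-up county would come out over-represented and the two older groups under-represented. Computing the index county by county is what would allow the values to be mapped and compared geographically.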

Original language: English (US)
Title of host publication: Proceedings of the 12th Workshop on Geographic Information Retrieval, GIR 2018
Editors: Christopher B. Jones, Ross S. Purves
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450360340
DOI: https://doi.org/10.1145/3281354.3281360
State: Published - Nov 6 2018
Event: 12th Workshop on Geographic Information Retrieval, GIR 2018 - Seattle, United States
Duration: Nov 6 2018 → …

Publication series

Name: Proceedings of the 12th Workshop on Geographic Information Retrieval, GIR 2018

Conference

Conference: 12th Workshop on Geographic Information Retrieval, GIR 2018
Country: United States
City: Seattle
Period: 11/6/18 → …

Fingerprint

twitter
population characteristics
social problem
Social Problems
distribution
Big data
Group
determinants
trend

All Science Journal Classification (ASJC) codes

  • Geography, Planning and Development
  • Information Systems
  • Computer Networks and Communications

Cite this

Yin, J., Chi, G., & Van Hook, J. L. (2018). Evaluating the representativeness in the geographic distribution of Twitter user population. In C. B. Jones, & R. S. Purves (Eds.), Proceedings of the 12th Workshop on Geographic Information Retrieval, GIR 2018 [6] (Proceedings of the 12th Workshop on Geographic Information Retrieval, GIR 2018). Association for Computing Machinery, Inc. https://doi.org/10.1145/3281354.3281360
@inproceedings{59d24919a18b4f7fb9958521edbf5304,
title = "Evaluating the representativeness in the geographic distribution of {Twitter} user population",
abstract = "Twitter data are becoming a Big Data stream and have drawn multidisciplinary interests to study population characteristics and social problems that cannot be measured well by traditional surveys. However, the use of Twitter data has been strongly resisted because of concerns about the representativeness of the population as we know little about the demographic characters of the users. It is critical to evaluate the extent to which Twitter users represent the population across different demographic groups. This study evaluates the representativeness and examines the geographic distributions of Twitter user population and its correspondence to the real population. By estimating Twitter user demographics for the contiguous U.S. in 2014, the preliminary results revealed both over- and under-representation of certain demographic groups against the real population at county-level. A representation index is used to assess the representativeness of Twitter samples geographically, which may help further studies to identify the determinants of biases.",
author = "Junjun Yin and Guangqing Chi and {Van Hook}, {Jennifer Lynne}",
year = "2018",
month = "11",
day = "6",
doi = "10.1145/3281354.3281360",
language = "English (US)",
series = "Proceedings of the 12th Workshop on Geographic Information Retrieval, GIR 2018",
publisher = "Association for Computing Machinery, Inc",
editor = "Jones, {Christopher B.} and Purves, {Ross S.}",
booktitle = "Proceedings of the 12th Workshop on Geographic Information Retrieval, GIR 2018",

}
