Detecting demographic bias in automatically generated personas

Joni Salminen, Bernard James Jansen, Jung Soongyo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

We investigate the existence of demographic bias in automatically generated personas by producing personas from YouTube Analytics data. Despite the intended objectivity of the methodology, we find elements of bias in the data-driven personas. The bias is highest when making an exact-match comparison and decreases when comparing at the age or gender level. The bias also decreases as the number of generated personas increases; for example, a smaller number of personas resulted in the underrepresentation of female personas. This suggests that a higher number of personas gives a more balanced representation of the user population, while a smaller number increases bias. Researchers and practitioners developing data-driven personas should consider the possibility of algorithmic bias, even unintentional, in their personas by comparing the personas against the underlying raw data.
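The comparison the abstract describes — checking the demographic distribution of a persona set against the underlying audience data, at exact-match, gender-only, and age-only granularity — can be sketched as follows. This is an illustrative reconstruction, not the authors' actual method: the segment labels, persona set, and the use of total variation distance as the bias measure are all assumptions for the example.

```python
from collections import Counter

def distribution(items):
    """Normalize a list of labels into a probability distribution."""
    counts = Counter(items)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def bias_score(personas, audience):
    """Total variation distance between the persona demographic
    distribution and the raw audience distribution (0 = no bias)."""
    p, q = distribution(personas), distribution(audience)
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys)

# Hypothetical demographic segments as (gender, age bracket) pairs.
audience = [("F", "18-24")] * 30 + [("M", "18-24")] * 25 + \
           [("F", "25-34")] * 25 + [("M", "25-34")] * 20
personas = [("M", "18-24"), ("F", "25-34"), ("M", "25-34")]

# Bias at three levels of granularity: exact match, gender only, age only.
exact  = bias_score(personas, audience)
gender = bias_score([g for g, _ in personas], [g for g, _ in audience])
age    = bias_score([a for _, a in personas], [a for _, a in audience])
print(f"exact={exact:.2f} gender={gender:.2f} age={age:.2f}")
# → exact=0.30 gender=0.22 age=0.22
```

With this toy data the sketch reproduces the qualitative pattern reported in the abstract: exact-match bias is highest, coarser comparisons show less bias, and a small persona set underrepresents female audience members.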

Original language: English (US)
Title of host publication: CHI EA 2019 - Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450359719
DOI: 10.1145/3290607.3313034
State: Published - May 2, 2019
Event: 2019 CHI Conference on Human Factors in Computing Systems, CHI EA 2019 - Glasgow, United Kingdom
Duration: May 4, 2019 - May 9, 2019

Publication series

Name: Conference on Human Factors in Computing Systems - Proceedings

Conference

Conference: 2019 CHI Conference on Human Factors in Computing Systems, CHI EA 2019
Country: United Kingdom
City: Glasgow
Period: 5/4/19 - 5/9/19

All Science Journal Classification (ASJC) codes

  • Software
  • Human-Computer Interaction
  • Computer Graphics and Computer-Aided Design

Cite this

Salminen, J., Jansen, B. J., & Soongyo, J. (2019). Detecting demographic bias in automatically generated personas. In CHI EA 2019 - Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems [3313034] (Conference on Human Factors in Computing Systems - Proceedings). Association for Computing Machinery. https://doi.org/10.1145/3290607.3313034

