TY - GEN
T1 - A general methodology to quantify biases in natural language data
AU - Chen, Jiawei
AU - Xu, Anbang
AU - Liu, Zhe
AU - Guo, Yufan
AU - Liu, Xiaotong
AU - Tong, Yingbei
AU - Akkiraju, Rama
AU - Carroll, John M.
N1 - Publisher Copyright: © 2020 Owner/Author.
PY - 2020/4/25
Y1 - 2020/4/25
N2 - Biases in data, such as gender and racial stereotypes, are propagated through intelligent systems and amplified at end-user applications. Existing studies detect and quantify biases based on pre-defined attributes. However, in practice, it is difficult to gather a comprehensive list of sensitive concepts for various categories of biases. We propose a general methodology to quantify dataset biases by measuring the difference between its data distribution and that of a reference dataset using Maximum Mean Discrepancy. For the case of natural language data, we show that lexicon-based features quantify explicit stereotypes, while deep learning-based features further capture implicit stereotypes represented by complex semantics. Our method provides a more flexible way to detect potential biases.
AB - Biases in data, such as gender and racial stereotypes, are propagated through intelligent systems and amplified at end-user applications. Existing studies detect and quantify biases based on pre-defined attributes. However, in practice, it is difficult to gather a comprehensive list of sensitive concepts for various categories of biases. We propose a general methodology to quantify dataset biases by measuring the difference between its data distribution and that of a reference dataset using Maximum Mean Discrepancy. For the case of natural language data, we show that lexicon-based features quantify explicit stereotypes, while deep learning-based features further capture implicit stereotypes represented by complex semantics. Our method provides a more flexible way to detect potential biases.
UR - http://www.scopus.com/inward/record.url?scp=85090237160&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85090237160&partnerID=8YFLogxK
U2 - 10.1145/3334480.3382949
DO - 10.1145/3334480.3382949
M3 - Conference contribution
AN - SCOPUS:85090237160
T3 - Conference on Human Factors in Computing Systems - Proceedings
BT - CHI EA 2020 - Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems
PB - Association for Computing Machinery
T2 - 2020 ACM CHI Conference on Human Factors in Computing Systems, CHI EA 2020
Y2 - 25 April 2020 through 30 April 2020
ER -