Network structure and biased variance estimation in respondent driven sampling

Ashton Michael Verdery, Ted Mouw, Shawn Bauldry, Peter J. Mucha

Research output: Contribution to journal (Article)

14 Citations (Scopus)

Abstract

This paper explores bias in the estimation of sampling variance in Respondent Driven Sampling (RDS). Prior methodological work on RDS has focused on its problematic assumptions and the biases and inefficiencies of its estimators of the population mean. Nonetheless, researchers have given only slight attention to the topic of estimating sampling variance in RDS, despite the importance of variance estimation for the construction of confidence intervals and hypothesis tests. In this paper, we show that the estimators of RDS sampling variance rely on a critical assumption that the network is First Order Markov (FOM) with respect to the dependent variable of interest. We demonstrate, through intuitive examples, mathematical generalizations, and computational experiments that current RDS variance estimators will always underestimate the population sampling variance of RDS in empirical networks that do not conform to the FOM assumption. Analysis of 215 observed university and school networks from Facebook and Add Health indicates that the FOM assumption is violated in every empirical network we analyze, and that these violations lead to substantially biased RDS estimators of sampling variance. We propose and test two alternative variance estimators that show some promise for reducing biases, but which also illustrate the limits of estimating sampling variance with only partial information on the underlying population social network.

Original language: English (US)
Article number: e0145296
Journal: PLoS One
Volume: 10
Issue number: 12
DOI: 10.1371/journal.pone.0145296
State: Published - Dec 1 2015

All Science Journal Classification (ASJC) codes

  • Biochemistry, Genetics and Molecular Biology(all)
  • Agricultural and Biological Sciences(all)
  • General

Cite this

Verdery, Ashton Michael; Mouw, Ted; Bauldry, Shawn; Mucha, Peter J. / Network structure and biased variance estimation in respondent driven sampling. In: PLoS One. 2015; Vol. 10, No. 12.
@article{341b530c4c3d467a919d54a04a30c9d4,
title = "Network structure and biased variance estimation in respondent driven sampling",
author = "Verdery, {Ashton Michael} and Ted Mouw and Shawn Bauldry and Mucha, {Peter J.}",
year = "2015",
month = "12",
day = "1",
doi = "10.1371/journal.pone.0145296",
language = "English (US)",
volume = "10",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "12",
}

TY - JOUR
T1 - Network structure and biased variance estimation in respondent driven sampling
AU - Verdery, Ashton Michael
AU - Mouw, Ted
AU - Bauldry, Shawn
AU - Mucha, Peter J.
PY - 2015/12/1
Y1 - 2015/12/1
UR - http://www.scopus.com/inward/record.url?scp=84957596687&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84957596687&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0145296
DO - 10.1371/journal.pone.0145296
M3 - Article
C2 - 26679927
AN - SCOPUS:84957596687
VL - 10
JO - PLoS One
JF - PLoS One
SN - 1932-6203
IS - 12
M1 - e0145296
ER -