Multivariate Regression with Respondent-Driven Sampling Data

Project: Research project

Project Details


Abstract

Many subpopulations of special interest to public health, such as sex workers, are hard to survey, either because they are rare, so that a large number of screening interviews would be needed to generate a sufficient sample size, or because they are stigmatized and unlikely to trust researchers with personal information. Respondent-driven sampling (RDS) is one of the most effective means of sampling such subpopulations: it asks and incentivizes subpopulation members to recruit other members through their personal social networks, and then weights the resulting sample to correct for biases induced by the sampling design. Under certain conditions, the resulting inferences about univariate statistics are generalizable to the subpopulation of interest. Hundreds of studies have been conducted using RDS, backed by over $166 million of federal funding. The basic methodology of RDS has been the subject of several extensions, evaluations, and criticisms, but prior statistical developments have largely focused on improving estimators for univariate statistics (e.g., prevalence of a risk factor). We propose to extend prior methodological work on statistical estimation in RDS to develop accurate and efficient tools that will allow researchers to estimate the parameters of multivariate regression models, which will enhance understanding of hard-to-survey subpopulations. The current practice of multivariate RDS estimation is ad hoc, with researchers applying over 10 distinct approaches throughout the literature but offering little or no justification for the approach they chose. RDS methodologists have yet to establish best practices or evaluate the performance of these different approaches. We propose to perform this evaluation.
By doing so, this project will enable future RDS studies to address multivariate research questions about hard-to-survey subpopulations, and it will add substantial value to the hundreds of RDS studies that have already been funded and conducted. The proposed project has two components that will give researchers (and the public health community) both guidance on conducting multivariate analyses with RDS data and the tools to conduct these analyses. The first component consists of a series of simulation studies that evaluate the performance of the most popular multivariate RDS estimators. The simulation studies will be designed to explore the performance of the estimators across a range of theoretically ideal and more realistic RDS sampling scenarios, as well as a diversity of network types. The second component involves the development and dissemination of software in two commonly used statistical packages (R and Stata) that implements the best-performing multivariate estimators identified in the simulation studies. The data collected in RDS studies has vast untapped potential to contribute to the understanding of specific risk factors in hard-to-survey populations, and the multivariate tools we will develop as part of this proposal will help to unlock this potential.
Effective start/end date: 7/1/16 to 6/30/18


  • National Institutes of Health: $78,600.00
  • National Institutes of Health: $78,599.00

