Enrichment sampling for a multi-site patient survey using electronic health records and census data

Nathaniel D. Mercaldo, Kyle B. Brothers, David S. Carrell, Ellen W. Clayton, John J. Connolly, Ingrid A. Holm, Carol R. Horowitz, Gail P. Jarvik, Terrie E. Kitchner, Rongling Li, Catherine A. McCarty, Jennifer B. McCormick, Valerie D. McManus, Melanie F. Myers, Joshua J. Pankratz, Martha J. Shrubsole, Maureen E. Smith, Sarah C. Stallings, Janet L. Williams, Jonathan S. Schildcrout

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

Objective: We describe a stratified sampling design that combines electronic health records (EHRs) and United States Census (USC) data to construct the sampling frame and an algorithm to enrich the sample with individuals belonging to rarer strata. Materials and Methods: This design was developed for a multi-site survey that sought to examine patient concerns about and barriers to participating in research studies, especially among under-studied populations (eg, minorities, low educational attainment). We defined sampling strata by cross-tabulating several sociodemographic variables obtained from EHR and augmented with census-block-level USC data. We oversampled rarer and historically underrepresented subpopulations. Results: The sampling strategy, which included USC-supplemented EHR data, led to a far more diverse sample than would have been expected under random sampling (eg, 3-, 8-, 7-, and 12-fold increase in African Americans, Asians, Hispanics and those with less than a high school degree, respectively). We observed that our EHR data tended to misclassify minority races more often than majority races, and that non-majority races, Latino ethnicity, younger adult age, lower education, and urban/suburban living were each associated with lower response rates to the mailed surveys.
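
The design summarized in the abstract (cross-tabulated strata, with rarer strata oversampled) can be illustrated with a minimal sketch. The abstract does not give the authors' enrichment algorithm, so the allocation rule below (equal allocation across strata, capped at stratum size, with leftover draws redistributed) is an assumption chosen only for illustration. The function names and the record fields are hypothetical as well.

```python
import random
from collections import defaultdict


def build_strata(frame, keys):
    """Group sampling-frame records by the cross-tabulation of `keys`."""
    strata = defaultdict(list)
    for record in frame:
        strata[tuple(record[k] for k in keys)].append(record)
    return dict(strata)


def enriched_allocation(strata, n_total):
    """Assumed rule: split draws equally across strata, capped at stratum size.

    Draws that a small stratum cannot absorb are redistributed among the
    strata that still have unsampled members. Relative to proportional
    allocation, this oversamples the rarer strata.
    """
    alloc = {s: 0 for s in strata}
    remaining = min(n_total, sum(len(members) for members in strata.values()))
    open_strata = list(strata)
    while remaining > 0 and open_strata:
        share = max(1, remaining // len(open_strata))
        for s in list(open_strata):
            if remaining == 0:
                break
            take = min(share, len(strata[s]) - alloc[s], remaining)
            alloc[s] += take
            remaining -= take
            if alloc[s] == len(strata[s]):
                open_strata.remove(s)  # stratum exhausted
    return alloc


def draw_sample(strata, alloc, seed=0):
    """Simple random sample within each stratum, with design weights N_s / n_s."""
    rng = random.Random(seed)
    sample = []
    for s, members in strata.items():
        n_s = alloc[s]
        if n_s == 0:
            continue
        weight = len(members) / n_s
        for record in rng.sample(members, n_s):
            sample.append({**record, "stratum": s, "weight": weight})
    return sample
```

With a hypothetical frame of 90 records in one stratum and 10 in another, a request for 40 draws allocates 30 and 10 respectively, whereas proportional allocation would be expected to give the smaller stratum about 4. The design weights let weighted estimates recover the frame totals despite the unequal sampling fractions.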

Original language: English (US)
Pages (from-to): 219-227
Number of pages: 9
Journal: Journal of the American Medical Informatics Association
Volume: 26
Issue number: 3
DOI: https://doi.org/10.1093/jamia/ocy164
State: Published - Jan 1 2019

Fingerprint

Electronic Health Records
Censuses
Hispanic Americans
African Americans
Young Adult
Education
Surveys and Questionnaires
Research
Population

All Science Journal Classification (ASJC) codes

  • Health Informatics

Cite this

Mercaldo, N. D., Brothers, K. B., Carrell, D. S., Clayton, E. W., Connolly, J. J., Holm, I. A., ... Schildcrout, J. S. (2019). Enrichment sampling for a multi-site patient survey using electronic health records and census data. Journal of the American Medical Informatics Association, 26(3), 219-227. https://doi.org/10.1093/jamia/ocy164
Mercaldo, Nathaniel D. ; Brothers, Kyle B. ; Carrell, David S. ; Clayton, Ellen W. ; Connolly, John J. ; Holm, Ingrid A. ; Horowitz, Carol R. ; Jarvik, Gail P. ; Kitchner, Terrie E. ; Li, Rongling ; McCarty, Catherine A. ; McCormick, Jennifer B. ; McManus, Valerie D. ; Myers, Melanie F. ; Pankratz, Joshua J. ; Shrubsole, Martha J. ; Smith, Maureen E. ; Stallings, Sarah C. ; Williams, Janet L. ; Schildcrout, Jonathan S. / Enrichment sampling for a multi-site patient survey using electronic health records and census data. In: Journal of the American Medical Informatics Association. 2019 ; Vol. 26, No. 3. pp. 219-227.
@article{6d0dae63c6b84a12a970c2400ef33d63,
title = "Enrichment sampling for a multi-site patient survey using electronic health records and census data",
abstract = "Objective: We describe a stratified sampling design that combines electronic health records (EHRs) and United States Census (USC) data to construct the sampling frame and an algorithm to enrich the sample with individuals belonging to rarer strata. Materials and Methods: This design was developed for a multi-site survey that sought to examine patient concerns about and barriers to participating in research studies, especially among under-studied populations (eg, minorities, low educational attainment). We defined sampling strata by cross-tabulating several sociodemographic variables obtained from EHR and augmented with census-block-level USC data. We oversampled rarer and historically underrepresented subpopulations. Results: The sampling strategy, which included USC-supplemented EHR data, led to a far more diverse sample than would have been expected under random sampling (eg, 3-, 8-, 7-, and 12-fold increase in African Americans, Asians, Hispanics and those with less than a high school degree, respectively). We observed that our EHR data tended to misclassify minority races more often than majority races, and that non-majority races, Latino ethnicity, younger adult age, lower education, and urban/suburban living were each associated with lower response rates to the mailed surveys.",
author = "Mercaldo, {Nathaniel D.} and Brothers, {Kyle B.} and Carrell, {David S.} and Clayton, {Ellen W.} and Connolly, {John J.} and Holm, {Ingrid A.} and Horowitz, {Carol R.} and Jarvik, {Gail P.} and Kitchner, {Terrie E.} and Li, Rongling and McCarty, {Catherine A.} and McCormick, {Jennifer B.} and McManus, {Valerie D.} and Myers, {Melanie F.} and Pankratz, {Joshua J.} and Shrubsole, {Martha J.} and Smith, {Maureen E.} and Stallings, {Sarah C.} and Williams, {Janet L.} and Schildcrout, {Jonathan S.}",
year = "2019",
month = "1",
day = "1",
doi = "10.1093/jamia/ocy164",
language = "English (US)",
volume = "26",
pages = "219--227",
journal = "Journal of the American Medical Informatics Association : JAMIA",
issn = "1067-5027",
publisher = "Oxford University Press",
number = "3",

}

Mercaldo, ND, Brothers, KB, Carrell, DS, Clayton, EW, Connolly, JJ, Holm, IA, Horowitz, CR, Jarvik, GP, Kitchner, TE, Li, R, McCarty, CA, McCormick, JB, McManus, VD, Myers, MF, Pankratz, JJ, Shrubsole, MJ, Smith, ME, Stallings, SC, Williams, JL & Schildcrout, JS 2019, 'Enrichment sampling for a multi-site patient survey using electronic health records and census data', Journal of the American Medical Informatics Association, vol. 26, no. 3, pp. 219-227. https://doi.org/10.1093/jamia/ocy164

Enrichment sampling for a multi-site patient survey using electronic health records and census data. / Mercaldo, Nathaniel D.; Brothers, Kyle B.; Carrell, David S.; Clayton, Ellen W.; Connolly, John J.; Holm, Ingrid A.; Horowitz, Carol R.; Jarvik, Gail P.; Kitchner, Terrie E.; Li, Rongling; McCarty, Catherine A.; McCormick, Jennifer B.; McManus, Valerie D.; Myers, Melanie F.; Pankratz, Joshua J.; Shrubsole, Martha J.; Smith, Maureen E.; Stallings, Sarah C.; Williams, Janet L.; Schildcrout, Jonathan S.

In: Journal of the American Medical Informatics Association, Vol. 26, No. 3, 01.01.2019, p. 219-227.

TY - JOUR

T1 - Enrichment sampling for a multi-site patient survey using electronic health records and census data

AU - Mercaldo, Nathaniel D.

AU - Brothers, Kyle B.

AU - Carrell, David S.

AU - Clayton, Ellen W.

AU - Connolly, John J.

AU - Holm, Ingrid A.

AU - Horowitz, Carol R.

AU - Jarvik, Gail P.

AU - Kitchner, Terrie E.

AU - Li, Rongling

AU - McCarty, Catherine A.

AU - McCormick, Jennifer B.

AU - McManus, Valerie D.

AU - Myers, Melanie F.

AU - Pankratz, Joshua J.

AU - Shrubsole, Martha J.

AU - Smith, Maureen E.

AU - Stallings, Sarah C.

AU - Williams, Janet L.

AU - Schildcrout, Jonathan S.

PY - 2019/1/1

Y1 - 2019/1/1

AB - Objective: We describe a stratified sampling design that combines electronic health records (EHRs) and United States Census (USC) data to construct the sampling frame and an algorithm to enrich the sample with individuals belonging to rarer strata. Materials and Methods: This design was developed for a multi-site survey that sought to examine patient concerns about and barriers to participating in research studies, especially among under-studied populations (eg, minorities, low educational attainment). We defined sampling strata by cross-tabulating several sociodemographic variables obtained from EHR and augmented with census-block-level USC data. We oversampled rarer and historically underrepresented subpopulations. Results: The sampling strategy, which included USC-supplemented EHR data, led to a far more diverse sample than would have been expected under random sampling (eg, 3-, 8-, 7-, and 12-fold increase in African Americans, Asians, Hispanics and those with less than a high school degree, respectively). We observed that our EHR data tended to misclassify minority races more often than majority races, and that non-majority races, Latino ethnicity, younger adult age, lower education, and urban/suburban living were each associated with lower response rates to the mailed surveys.

UR - http://www.scopus.com/inward/record.url?scp=85060814318&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85060814318&partnerID=8YFLogxK

U2 - 10.1093/jamia/ocy164

DO - 10.1093/jamia/ocy164

M3 - Article

C2 - 30590688

AN - SCOPUS:85060814318

VL - 26

SP - 219

EP - 227

JO - Journal of the American Medical Informatics Association : JAMIA

JF - Journal of the American Medical Informatics Association : JAMIA

SN - 1067-5027

IS - 3

ER -