TY - JOUR
T1 - Genetic association analysis under complex survey sampling
T2 - The Hispanic Community Health Study/Study of Latinos
AU - Lin, Dan Yu
AU - Tao, Ran
AU - Kalsbeek, William D.
AU - Zeng, Donglin
AU - Gonzalez, Franklyn
AU - Fernández-Rhodes, Lindsay
AU - Graff, Mariaelisa
AU - Koch, Gary G.
AU - North, Kari E.
AU - Heiss, Gerardo
N1 - Funding Information:
This work was supported by NIH awards R01CA082659 (D.-Y.L., R.T., D.Z.), R37GM047845 (D.-Y.L., D.Z.), and U01HG004803 (D.-Y.L., R.T., L.F.-R., M.G., K.E.N., G.H.). The authors thank the staff and participants of the HCHS/SOL for their important contributions. The HCHS/SOL was carried out as a collaborative study supported by contracts from the National Heart, Lung, and Blood Institute (NHLBI) to the University of North Carolina (N01-HC65233), University of Miami (N01-HC65234), Albert Einstein College of Medicine (N01-HC65235), Northwestern University (N01-HC65236), and San Diego State University (N01-HC65237). The following Institutes/Centers/Offices contribute to the HCHS/SOL through a transfer of funds to the NHLBI: National Institute on Minority Health and Health Disparities, National Institute on Deafness and Other Communication Disorders, National Institute of Dental and Craniofacial Research, National Institute of Diabetes and Digestive and Kidney Diseases, National Institute of Neurological Disorders and Stroke, and NIH Institution-Office of Dietary Supplements.
Publisher Copyright:
© 2014 The American Society of Human Genetics.
PY - 2014/12/4
Y1 - 2014/12/4
N2 - The cohort design allows investigators to explore the genetic basis of a variety of diseases and traits in a single study while avoiding major weaknesses of the case-control design. Most cohort studies employ multistage cluster sampling with unequal probabilities to conveniently select participants with desired characteristics, and participants from different clusters might be genetically related. Analysis that ignores the complex sampling design can yield biased estimation of the genetic association and inflation of the type I error. Herein, we develop weighted estimators that reflect unequal selection probabilities and differential nonresponse rates, and we derive variance estimators that properly account for the sampling design and the potential relatedness of participants in different sampling units. We compare, both analytically and numerically, the performance of the proposed weighted estimators with unweighted estimators that disregard the sampling design. We demonstrate the usefulness of the proposed methods through analysis of MetaboChip data in the Hispanic Community Health Study/Study of Latinos, which is the largest health study of the Hispanic/Latino population in the United States aimed at identifying risk factors for various diseases and determining the role of genes and environment in the occurrence of diseases. We provide guidelines on the use of weighted and unweighted estimators, as well as the relevant software.
UR - http://www.scopus.com/inward/record.url?scp=84919625332&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84919625332&partnerID=8YFLogxK
U2 - 10.1016/j.ajhg.2014.11.005
DO - 10.1016/j.ajhg.2014.11.005
M3 - Article
C2 - 25480034
AN - SCOPUS:84919625332
SN - 0002-9297
VL - 95
SP - 675
EP - 688
JO - American Journal of Human Genetics
JF - American Journal of Human Genetics
IS - 6
ER -