Abstract
We propose methods to estimate the distribution functions of multiple populations from mixture data, in which each observation is known to belong to a specific population only with a certain probability. The problem is motivated by kin-cohort studies that collect phenotype data in families for diseases such as Huntington’s disease (HD) or breast cancer. Relatives in these studies are not genotyped; hence, only their probabilities of carrying a known causal mutation (e.g., a BRCA1 gene mutation or the HD gene mutation) can be derived. In addition, phenotype observations from the same family may be correlated because of shared lifestyle or other genes associated with the disease, and the observations are subject to censoring. Our estimator does not assume any parametric form for the distributions and does not require modeling the correlation structure. It estimates the distributions by using optimal base estimators and then optimally combining them. The optimality implies both estimation consistency and minimum estimation variance. Simulations and a real data analysis of an HD study are performed to illustrate the improved efficiency of the proposed methods. MSC 2010 subject classifications: Primary 62G08; secondary 62N01.
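To make the mixture-data setting concrete, the sketch below shows a deliberately simplified least-squares estimator. It is not the estimator proposed in the paper: it assumes independent, uncensored observations, so it ignores both familial correlation and censoring. It relies only on the identity E[I(X_i <= t)] = q_i^T F(t), where q_i is the known vector of population-membership probabilities for observation i. The function names, the simulated exponential populations, and the probability values are all illustrative assumptions.

```python
import numpy as np


def ols_mixture_cdf(x, q, t_grid):
    """Least-squares estimate of component CDFs from mixture data.

    x      : (n,) observed values (assumed uncensored and independent)
    q      : (n, K) known membership probabilities, each row summing to 1
    t_grid : (T,) points at which to estimate the K distribution functions
    Returns a (T, K) array whose column k estimates F_k on t_grid.
    """
    x = np.asarray(x, dtype=float)
    q = np.asarray(q, dtype=float)
    t_grid = np.asarray(t_grid, dtype=float)
    # Indicator matrix I(x_i <= t_j), shape (n, T).
    ind = (x[:, None] <= t_grid[None, :]).astype(float)
    # E[I(x_i <= t)] = q_i^T F(t): regress the indicators on q for all t at once.
    coef, *_ = np.linalg.lstsq(q, ind, rcond=None)  # shape (K, T)
    # Clip to [0, 1]; monotonicity in t is NOT enforced in this sketch.
    return np.clip(coef.T, 0.0, 1.0)


def simulate_kin_cohort(n, rng):
    """Hypothetical toy data: carriers ~ Exp(mean 1), non-carriers ~ Exp(mean 3)."""
    p = rng.choice([0.0, 0.5, 1.0], size=n)  # assumed carrier probabilities
    carrier = rng.random(n) < p              # latent, unobserved status
    x = np.where(carrier, rng.exponential(1.0, n), rng.exponential(3.0, n))
    q = np.column_stack([p, 1.0 - p])
    return x, q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, q = simulate_kin_cohort(5000, rng)
    t_grid = np.array([0.5, 1.0, 2.0])
    print(ols_mixture_cdf(x, q, t_grid))
```

In this toy setting each row of the result can be compared with the true values 1 - exp(-t) and 1 - exp(-t/3). Handling censoring and within-family correlation, and choosing weights that minimize variance, is where the paper's method goes beyond this sketch.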
Original language | English (US) |
---|---|
Pages (from-to) | 1928-1948 |
Number of pages | 21 |
Journal | Electronic Journal of Statistics |
Volume | 11 |
Issue number | 1 |
DOIs | 10.1214/17-EJS1274 |
State | Published - Jan 1 2017 |
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Statistics, Probability and Uncertainty
Cite this
Nonparametric distribution estimation in the presence of familial correlation and censoring. / Xu, Kun; Ma, Yanyuan; Wang, Yuanjia.
In: Electronic Journal of Statistics, Vol. 11, No. 1, 01.01.2017, p. 1928-1948.
Research output: Contribution to journal › Article
TY - JOUR
T1 - Nonparametric distribution estimation in the presence of familial correlation and censoring
AU - Xu, Kun
AU - Ma, Yanyuan
AU - Wang, Yuanjia
PY - 2017/1/1
Y1 - 2017/1/1
UR - http://www.scopus.com/inward/record.url?scp=85018450578&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85018450578&partnerID=8YFLogxK
U2 - 10.1214/17-EJS1274
DO - 10.1214/17-EJS1274
M3 - Article
AN - SCOPUS:85018450578
VL - 11
SP - 1928
EP - 1948
JO - Electronic Journal of Statistics
JF - Electronic Journal of Statistics
SN - 1935-7524
IS - 1
ER -