Nonparametric distribution estimation in the presence of familial correlation and censoring

Kun Xu, Yanyuan Ma, Yuanjia Wang

Research output: Contribution to journal › Article

Abstract

We propose methods to estimate the distribution functions of multiple populations from mixture data, in which each observation is only known to belong to a specific population with a certain probability. The problem is motivated by kin-cohort studies that collect phenotype data in families for diseases such as Huntington’s disease (HD) or breast cancer. Relatives in these studies are not genotyped; hence only their probabilities of carrying a known causal mutation (e.g., BRCA1 gene mutation or HD gene mutation) can be derived. In addition, phenotype observations from the same family may be correlated due to shared lifestyle or other genes associated with the disease, and the observations are subject to censoring. Our estimator does not assume any parametric form for the distributions and does not require modeling of the correlation structure. It estimates the distributions by constructing optimal base estimators and then combining them optimally. The optimality implies both estimation consistency and minimum estimation variance. Simulations and a real data analysis of an HD study illustrate the improved efficiency of the proposed methods.

MSC 2010 subject classifications: Primary 62G08; secondary 62N01.
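
As a rough illustration of the mixture structure described in the abstract, the sketch below recovers two population distribution functions at a single time point from groups whose members belong to population 1 with a known probability q. It is not the authors' estimator: it assumes no censoring and independent observations, and it uses plain weighted least squares on the relation H(t) = q F1(t) + (1 - q) F2(t). The function names and the toy data are hypothetical.

```python
def empirical_cdf(sample, t):
    """Fraction of observations in `sample` that are <= t."""
    return sum(1 for x in sample if x <= t) / len(sample)


def estimate_two_population_cdfs(groups, t):
    """Least-squares estimate of (F1(t), F2(t)) from mixture groups.

    `groups` is a list of (q, sample) pairs, where q is the known
    probability that a group member belongs to population 1, so the
    group's distribution is H(t) = q * F1(t) + (1 - q) * F2(t).
    Assumption: no censoring and no within-family correlation.
    """
    # Build the 2x2 normal equations, weighting each group by its size.
    a11 = a12 = a22 = b1 = b2 = 0.0
    for q, sample in groups:
        w = len(sample)
        h = empirical_cdf(sample, t)
        a11 += w * q * q
        a12 += w * q * (1 - q)
        a22 += w * (1 - q) * (1 - q)
        b1 += w * q * h
        b2 += w * (1 - q) * h
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("need at least two distinct mixing probabilities")
    # Solve the 2x2 system by Cramer's rule.
    f1 = (a22 * b1 - a12 * b2) / det
    f2 = (a11 * b2 - a12 * b1) / det
    # Unconstrained least squares can leave [0, 1], so clip the result.
    return min(max(f1, 0.0), 1.0), min(max(f2, 0.0), 1.0)


if __name__ == "__main__":
    # Toy data: population 1 has events at time 0, population 2 at time 2.
    toy_groups = [
        (1.0, [0, 0, 0, 0]),
        (0.5, [0, 0, 2, 2]),
        (0.25, [0, 2, 2, 2]),
    ]
    print(estimate_two_population_cdfs(toy_groups, t=1))
```

Handling censoring and familial correlation, and choosing the combination weights optimally, is exactly what the paper's method adds on top of this naive version.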

Original language: English (US)
Pages (from-to): 1928-1948
Number of pages: 21
Journal: Electronic Journal of Statistics
Volume: 11
Issue number: 1
DOI: 10.1214/17-EJS1274
State: Published - Jan 1 2017

Fingerprint

Censoring
Mutation
Gene
Phenotype
Estimator
Cohort Study
Variance Estimation
Correlation Structure
Breast Cancer
Estimate
Optimality
Data analysis
Distribution Function
Imply
Modeling
Simulation
Family
Observation

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

@article{a03d9b00782d4b128c986f8c0048658e,
title = "Nonparametric distribution estimation in the presence of familial correlation and censoring",
abstract = "We propose methods to estimate the distribution functions of multiple populations from mixture data, in which each observation is only known to belong to a specific population with a certain probability. The problem is motivated by kin-cohort studies that collect phenotype data in families for diseases such as Huntington’s disease (HD) or breast cancer. Relatives in these studies are not genotyped; hence only their probabilities of carrying a known causal mutation (e.g., BRCA1 gene mutation or HD gene mutation) can be derived. In addition, phenotype observations from the same family may be correlated due to shared lifestyle or other genes associated with the disease, and the observations are subject to censoring. Our estimator does not assume any parametric form for the distributions and does not require modeling of the correlation structure. It estimates the distributions by constructing optimal base estimators and then combining them optimally. The optimality implies both estimation consistency and minimum estimation variance. Simulations and a real data analysis of an HD study illustrate the improved efficiency of the proposed methods. MSC 2010 subject classifications: Primary 62G08; secondary 62N01.",
author = "Kun Xu and Yanyuan Ma and Yuanjia Wang",
year = "2017",
month = "1",
day = "1",
doi = "10.1214/17-EJS1274",
language = "English (US)",
volume = "11",
pages = "1928--1948",
journal = "Electronic Journal of Statistics",
issn = "1935-7524",
publisher = "Institute of Mathematical Statistics",
number = "1",

}

Nonparametric distribution estimation in the presence of familial correlation and censoring. / Xu, Kun; Ma, Yanyuan; Wang, Yuanjia.

In: Electronic Journal of Statistics, Vol. 11, No. 1, 01.01.2017, p. 1928-1948.
