Estimating disease onset distribution functions in mutation carriers with censored mixture data

Yanyuan Ma, Yuanjia Wang

Research output: Contribution to journal › Article › peer-review

8 Scopus citations


We consider non-parametric estimation of disease onset distribution functions in multiple populations by using censored data with unknown population identifiers. The problem is motivated by studies that aim to estimate the age-specific disease risk distribution in deleterious mutation carriers for genetic counselling and for the design of therapeutic intervention trials to modify disease progression (i.e. to slow down the development of symptoms and to delay the onset of disease). In some of these studies, the distribution of disease risk in participants assumes a mixture form. Although the population identifiers are missing, study design and scientific knowledge allow calculation of the probability of a subject belonging to each population. We propose a general family of weighted least squares estimators and show that existing consistent non-parametric methods belong to this family. We identify a computationally effortless estimator in the family, study its asymptotic properties and show its significant gain in efficiency compared with the existing estimators in the literature. The application to a large genetic epidemiological study of Huntington's disease reveals information on the age-at-onset distribution of Huntington's disease which sheds light on some clinical hypotheses.
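The least squares idea in the abstract can be illustrated with a small sketch. It is not the authors' estimator: it assumes no censoring and equal weights, and the function names (`ls_mixture_cdf`, `simulate`), the exponential onset distributions and the membership probabilities {0, 0.5, 1} are all hypothetical choices made for illustration. The sketch relies only on the fact that, when subject i belongs to population k with known probability q_ik, the indicator 1{T_i <= t} has expectation sum_k q_ik F_k(t), so regressing the indicators on the known probabilities recovers F_k(t) at each t.

```python
import numpy as np


def ls_mixture_cdf(times, q, grid):
    """Unweighted least squares estimate of population CDFs (uncensored data).

    times : (n,) onset ages, assumed fully observed (an illustrative simplification)
    q     : (n, p) known membership probabilities, each row summing to 1
    grid  : (m,) ages at which to estimate the CDFs
    Returns an (m, p) array whose column k estimates F_k on the grid.
    """
    times = np.asarray(times, dtype=float)
    q = np.asarray(q, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # y[i, j] = 1 if subject i had onset by age grid[j], else 0
    y = (times[:, None] <= grid[None, :]).astype(float)
    # Solve min_F ||y - q F||^2 for all grid points at once
    coef, *_ = np.linalg.lstsq(q, y, rcond=None)  # shape (p, m)
    return coef.T


def simulate(n=4000, seed=0):
    """Hypothetical two-population data: carriers and non-carriers."""
    rng = np.random.default_rng(seed)
    # Probability of being a carrier, known from the study design (assumed values)
    q1 = rng.choice([0.0, 0.5, 1.0], size=n)
    carrier = rng.random(n) < q1  # true label, hidden from the estimator
    # Arbitrary onset distributions: exponential with means 40 and 120
    times = np.where(carrier, rng.exponential(40.0, n), rng.exponential(120.0, n))
    q = np.column_stack([q1, 1.0 - q1])
    return times, q


if __name__ == "__main__":
    times, q = simulate()
    grid = np.array([30.0, 50.0, 70.0])
    print(ls_mixture_cdf(times, q, grid))
```

This sketch does not constrain the estimates to be monotone in age or to lie in [0, 1], and handling censored onset ages would require additional machinery that is not shown here.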

Original language: English (US)
Pages (from-to): 1-23
Number of pages: 23
Journal: Journal of the Royal Statistical Society. Series C: Applied Statistics
Issue number: 1
State: Published - Jan 2014

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

