Estimating disease onset distribution functions in mutation carriers with censored mixture data

Yanyuan Ma, Yuanjia Wang

Research output: Contribution to journal (Article)

5 Citations (Scopus)

Abstract

We consider non-parametric estimation of disease onset distribution functions in multiple populations by using censored data with unknown population identifiers. The problem is motivated from studies aiming at estimating the age-specific disease risk distribution in deleterious mutation carriers for genetic counselling and design of therapeutic intervention trials to modify disease progression (i.e. to slow down the development of symptoms and to delay the onset of disease). In some of these studies, the distribution of disease risk in participants assumes a mixture form. Although the population identifiers are missing, study design and scientific knowledge allow calculation of the probability of a subject belonging to each population. We propose a general family of weighted least squares estimators and show that existing consistent non-parametric methods belong to this family. We identify a computationally effortless estimator in the family, study its asymptotic properties and show its significant gain in efficiency compared with the existing estimators in the literature. The application to a large genetic epidemiological study of Huntington's disease reveals information on the age-at-onset distribution of Huntington's disease which sheds light on some clinical hypotheses.
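To make the mixture structure in the abstract concrete, the sketch below shows the simplest least squares idea for two populations. It is an illustration under stated assumptions, not the authors' estimator: it assumes every onset age is fully observed (no censoring), uses equal weights rather than the weighted family the paper proposes, and the function name `ols_mixture_cdf` and its arguments are hypothetical. Each subject's indicator that onset occurred by age t is regressed on that subject's known membership probabilities, and the two fitted coefficients estimate the population distribution functions at t.

```python
def ols_mixture_cdf(times, q1, t):
    """Unweighted least squares estimate of (F1(t), F2(t)) for two populations.

    times : fully observed onset ages (no censoring is an assumption here)
    q1    : known probability that each subject belongs to population 1
    t     : age at which to evaluate the two distribution functions
    """
    # Normal equations (Q'Q) F = Q'Y, where row i of Q is (q1_i, 1 - q1_i)
    # and Y_i = 1 if times_i <= t, else 0.
    a = b = d = r1 = r2 = 0.0
    for time, p in zip(times, q1):
        y = 1.0 if time <= t else 0.0
        a += p * p
        b += p * (1.0 - p)
        d += (1.0 - p) * (1.0 - p)
        r1 += p * y
        r2 += (1.0 - p) * y
    det = a * d - b * b
    if det == 0.0:
        raise ValueError("membership probabilities do not identify both populations")
    # Solve the 2x2 system in closed form.
    f1 = (d * r1 - b * r2) / det
    f2 = (a * r2 - b * r1) / det
    return f1, f2


# Sanity check: with known membership (probabilities 0 or 1) the estimate
# reduces to the empirical distribution function within each group.
print(ols_mixture_cdf([1, 2, 3, 4], [1, 1, 0, 0], 1.5))
```

Because nothing constrains the fitted values, an estimate from this sketch can fall outside [0, 1] or fail to be monotone in t when the membership probabilities are far from 0 and 1. Handling censoring requires additional machinery that this sketch does not attempt.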

Original language: English (US)
Pages (from-to): 1-23
Number of pages: 23
Journal: Journal of the Royal Statistical Society. Series C: Applied Statistics
Volume: 63
Issue number: 1
DOIs: 10.1111/rssc.12025
State: Published - Jan 1 2014

Fingerprint

Distribution Function
Mutation
Weighted Least Squares Estimator
Estimator
Nonparametric Methods
Censored Data
Nonparametric Estimation
Progression
Asymptotic Properties
Unknown
Family

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

@article{d23c6c3af40a4c9bb7b55433c12b75ba,
title = "Estimating disease onset distribution functions in mutation carriers with censored mixture data",
abstract = "We consider non-parametric estimation of disease onset distribution functions in multiple populations by using censored data with unknown population identifiers. The problem is motivated from studies aiming at estimating the age-specific disease risk distribution in deleterious mutation carriers for genetic counselling and design of therapeutic intervention trials to modify disease progression (i.e. to slow down the development of symptoms and to delay the onset of disease). In some of these studies, the distribution of disease risk in participants assumes a mixture form. Although the population identifiers are missing, study design and scientific knowledge allow calculation of the probability of a subject belonging to each population. We propose a general family of weighted least squares estimators and show that existing consistent non-parametric methods belong to this family. We identify a computationally effortless estimator in the family, study its asymptotic properties and show its significant gain in efficiency compared with the existing estimators in the literature. The application to a large genetic epidemiological study of Huntington's disease reveals information on the age-at-onset distribution of Huntington's disease which sheds light on some clinical hypotheses.",
author = "Yanyuan Ma and Yuanjia Wang",
year = "2014",
month = "1",
day = "1",
doi = "10.1111/rssc.12025",
language = "English (US)",
volume = "63",
pages = "1--23",
journal = "Journal of the Royal Statistical Society. Series C: Applied Statistics",
issn = "0035-9254",
publisher = "Wiley-Blackwell",
number = "1",
}

TY - JOUR

T1 - Estimating disease onset distribution functions in mutation carriers with censored mixture data

AU - Ma, Yanyuan

AU - Wang, Yuanjia

PY - 2014/1/1

Y1 - 2014/1/1

N2 - We consider non-parametric estimation of disease onset distribution functions in multiple populations by using censored data with unknown population identifiers. The problem is motivated from studies aiming at estimating the age-specific disease risk distribution in deleterious mutation carriers for genetic counselling and design of therapeutic intervention trials to modify disease progression (i.e. to slow down the development of symptoms and to delay the onset of disease). In some of these studies, the distribution of disease risk in participants assumes a mixture form. Although the population identifiers are missing, study design and scientific knowledge allow calculation of the probability of a subject belonging to each population. We propose a general family of weighted least squares estimators and show that existing consistent non-parametric methods belong to this family. We identify a computationally effortless estimator in the family, study its asymptotic properties and show its significant gain in efficiency compared with the existing estimators in the literature. The application to a large genetic epidemiological study of Huntington's disease reveals information on the age-at-onset distribution of Huntington's disease which sheds light on some clinical hypotheses.

UR - http://www.scopus.com/inward/record.url?scp=84890869454&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84890869454&partnerID=8YFLogxK

U2 - 10.1111/rssc.12025

DO - 10.1111/rssc.12025

M3 - Article

AN - SCOPUS:84890869454

VL - 63

SP - 1

EP - 23

JO - Journal of the Royal Statistical Society. Series C: Applied Statistics

JF - Journal of the Royal Statistical Society. Series C: Applied Statistics

SN - 0035-9254

IS - 1

ER -