Replicating sequencing-based association studies of rare variants

Dajiang Liu, Suzanne M. Leal

Research output: Chapter in Book/Report/Conference proceeding › Chapter

1 Citation (Scopus)

Abstract

Large-scale sequence-based association analysis is a powerful approach to identify rare variants involved in complex trait etiologies. Confirmation of significant findings in stage 1 through replication in an independent stage 2 sample is necessary to avoid reporting spurious results. For gene-based mapping of rare variants, where rare variants within a region are analyzed in aggregate, three replication strategies are possible: (1) variant-based replication, wherein only variants from nucleotide sites uncovered in stage 1 within the gene region are genotyped and followed up; (2) sequence-based replication, wherein the gene region is sequenced in the replication sample and both known and novel variants are tested; and (3) exome-array-based replication, wherein the gene region identified in the stage 1 sample is followed up using exome arrays in the stage 2 sample. The efficiency of the three strategies depends on the proportion of causative variants discovered in stage 1, sequencing and genotyping errors, trait-specific genetic architecture, and the number of variants within the identified gene region that are available for genotyping on the exome array. With rigorous population genetic and phenotypic models, it is demonstrated that sequence-based replication is consistently more powerful than variant- and exome-array-based replication, although the power gain can be small. For variant-based replication, if the stage 1 sample consists of several thousand individuals, a large fraction of causative variant sites can be observed, and even for smaller stage 1 studies, a large proportion of the locus population attributable risk can be explained by the uncovered variants. Exome-array-based replication can have comparable power to the other two approaches if the coding variants driving the association are well represented on the array. As a consequence, although sequence-based replication is usually more powerful and is also valuable for identifying novel, potentially causal variants, both variant- and exome-array-based replication can be viable and cost-effective approaches for replicating rare variant associations.
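The point about stage 1 sample size can be illustrated with a short calculation. The sketch below is not the authors' population genetic model; it only assumes that the 2n chromosomes in a sample of n individuals are drawn independently, so a variant with minor allele frequency p is seen at least once with probability 1 - (1 - p)^(2n). The function names and the example allele frequencies are hypothetical, and the frequency-weighted fraction is only a crude stand-in for the share of locus risk captured, valid under the added assumption that all variants have equal effect sizes.

```python
from typing import Sequence


def prob_site_observed(maf: float, n_individuals: int) -> float:
    """Probability that a variant site is seen at least once in stage 1.

    Assumes 2 * n_individuals independently sampled chromosomes
    (an illustrative simplification, not the chapter's model).
    """
    if not 0.0 <= maf <= 1.0:
        raise ValueError("maf must be between 0 and 1")
    if n_individuals < 0:
        raise ValueError("n_individuals must be non-negative")
    return 1.0 - (1.0 - maf) ** (2 * n_individuals)


def expected_fraction_of_sites(mafs: Sequence[float], n_individuals: int) -> float:
    """Expected fraction of variant sites uncovered in stage 1."""
    return sum(prob_site_observed(p, n_individuals) for p in mafs) / len(mafs)


def expected_fraction_of_frequency(mafs: Sequence[float], n_individuals: int) -> float:
    """Expected share of aggregate allele frequency carried by uncovered sites.

    Crude proxy for the share of locus risk captured, assuming equal
    effect sizes across variants (a hypothetical simplification).
    """
    total = sum(mafs)
    captured = sum(p * prob_site_observed(p, n_individuals) for p in mafs)
    return captured / total


# Hypothetical minor allele frequencies for causative sites in one gene.
example_mafs = [0.0001, 0.0002, 0.0005, 0.001, 0.005]

for n in (250, 1000, 5000):
    sites = expected_fraction_of_sites(example_mafs, n)
    freq = expected_fraction_of_frequency(example_mafs, n)
    print(f"n={n}: sites uncovered ~ {sites:.2f}, frequency captured ~ {freq:.2f}")
```

Under these assumptions the frequency-weighted fraction is never lower than the plain fraction of sites, because the sites most likely to be missed are also the rarest. This mirrors the abstract's observation that even smaller stage 1 studies can capture much of the locus-level signal.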

Original language: English (US)
Title of host publication: Assessing Rare Variation in Complex Traits
Subtitle of host publication: Design and Analysis of Genetic Studies
Publisher: Springer New York
Pages: 201-213
Number of pages: 13
ISBN (Electronic): 9781493928248
ISBN (Print): 9781493928231
DOI: https://doi.org/10.1007/978-1-4939-2824-8_14
State: Published - Jan 1 2015

Fingerprint

Exome
Genes
Chromosome Mapping
Genetic Models
Population Genetics
Nucleotides
Costs and Cost Analysis
Costs
Population

All Science Journal Classification (ASJC) codes

  • Medicine (all)
  • Biochemistry, Genetics and Molecular Biology (all)

Cite this

Liu, D., & Leal, S. M. (2015). Replicating sequencing-based association studies of rare variants. In Assessing Rare Variation in Complex Traits: Design and Analysis of Genetic Studies (pp. 201-213). Springer New York. https://doi.org/10.1007/978-1-4939-2824-8_14