Seqchip: A powerful method to integrate sequence and genotype data for the detection of rare variant associations

Dajiang J. Liu, Suzanne M. Leal

Research output: Contribution to journal, Article

4 Citations (Scopus)

Abstract

Motivation: Next-generation sequencing greatly increases the capacity to detect rare-variant complex-trait associations. However, it is still expensive to sequence a large number of samples and therefore often small datasets are used. Given cost constraints, a potentially more powerful two-step strategy is to sequence a subset of the sample to discover variants, and genotype the identified variants in the remaining sample. If only cases are sequenced, directly combining sequence and genotype data will lead to inflated type-I errors in rare-variant association analysis. Although several methods have been developed to correct for the bias, they are either underpowered or theoretically invalid. We proposed a new method SEQCHIP to integrate genotype and sequence data, which can be used with most existing rare-variant tests.

Results: It is demonstrated using both simulated and real datasets that the SEQCHIP method has controlled type-I errors, and is substantially more powerful than all other currently available methods.
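The bias described in the abstract can be illustrated with a small null simulation. The sketch below is not the SEQCHIP method and is not taken from the paper; it only shows, under simplifying assumptions, why naive pooling is biased when variants are discovered in sequenced cases only. The function names (`null_replicate`, `mean_excess`), the sample sizes, the number of sites, and the minor allele frequency are all hypothetical choices made for illustration.

```python
import random


def null_replicate(rng, n_cases=100, n_controls=100, n_sequenced=25,
                   n_sites=20, maf=0.01):
    """Simulate one data set with NO true association.

    Returns (case_alleles, control_alleles): minor-allele counts at the
    sites that were discovered by sequencing a subset of cases only.
    All parameter values are illustrative assumptions.
    """
    def genotypes(n):
        # Each person carries 0, 1 or 2 minor alleles per site,
        # drawn with the same frequency in cases and controls.
        return [[(rng.random() < maf) + (rng.random() < maf)
                 for _ in range(n_sites)] for _ in range(n)]

    cases, controls = genotypes(n_cases), genotypes(n_controls)

    # Step 1: variant discovery in the sequenced cases only.
    discovered = [j for j in range(n_sites)
                  if any(cases[i][j] for i in range(n_sequenced))]

    # Step 2: everyone is scored at the discovered sites and pooled naively.
    case_alleles = sum(person[j] for person in cases for j in discovered)
    control_alleles = sum(person[j] for person in controls for j in discovered)
    return case_alleles, control_alleles


def mean_excess(n_replicates=200, seed=1):
    """Average (case - control) allele count over null replicates."""
    rng = random.Random(seed)
    diffs = [c - k for c, k in
             (null_replicate(rng) for _ in range(n_replicates))]
    return sum(diffs) / len(diffs)
```

Under these assumptions `mean_excess()` should come out positive even though cases and controls share the same allele frequencies: a site is only scored if at least one sequenced case carries it, so the case count is enriched by construction. Any test that compares the pooled counts directly would therefore reject the null too often.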

Original language: English (US)
Article number: bts263
Pages (from-to): 1745-1751
Number of pages: 7
Journal: Bioinformatics
Volume: 28
Issue number: 13
DOI: 10.1093/bioinformatics/bts263
State: Published - Jul 1 2012

Fingerprint

Genotype
Integrate
Type I error
Sequencing
Costs
Costs and Cost Analysis
Subset
Datasets

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

@article{2d9f47d9e2b54d1d95c9936c7898cd17,
title = "Seqchip: A powerful method to integrate sequence and genotype data for the detection of rare variant associations",
abstract = "Motivation: Next-generation sequencing greatly increases the capacity to detect rare-variant complex-trait associations. However, it is still expensive to sequence a large number of samples and therefore often small datasets are used. Given cost constraints, a potentially more powerful two-step strategy is to sequence a subset of the sample to discover variants, and genotype the identified variants in the remaining sample. If only cases are sequenced, directly combining sequence and genotype data will lead to inflated type-I errors in rare-variant association analysis. Although several methods have been developed to correct for the bias, they are either underpowered or theoretically invalid. We proposed a new method SEQCHIP to integrate genotype and sequence data, which can be used with most existing rare-variant tests. Results: It is demonstrated using both simulated and real datasets that the SEQCHIP method has controlled type-I errors, and is substantially more powerful than all other currently available methods.",
author = "Liu, {Dajiang J.} and Leal, {Suzanne M.}",
year = "2012",
month = "7",
day = "1",
doi = "10.1093/bioinformatics/bts263",
language = "English (US)",
volume = "28",
pages = "1745--1751",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "13",
}

Seqchip: A powerful method to integrate sequence and genotype data for the detection of rare variant associations. / Liu, Dajiang J.; Leal, Suzanne M.

In: Bioinformatics, Vol. 28, No. 13, bts263, 01.07.2012, p. 1745-1751.

TY - JOUR

T1 - Seqchip

T2 - A powerful method to integrate sequence and genotype data for the detection of rare variant associations

AU - Liu, Dajiang J.

AU - Leal, Suzanne M.

PY - 2012/7/1

Y1 - 2012/7/1

N2 - Motivation: Next-generation sequencing greatly increases the capacity to detect rare-variant complex-trait associations. However, it is still expensive to sequence a large number of samples and therefore often small datasets are used. Given cost constraints, a potentially more powerful two-step strategy is to sequence a subset of the sample to discover variants, and genotype the identified variants in the remaining sample. If only cases are sequenced, directly combining sequence and genotype data will lead to inflated type-I errors in rare-variant association analysis. Although several methods have been developed to correct for the bias, they are either underpowered or theoretically invalid. We proposed a new method SEQCHIP to integrate genotype and sequence data, which can be used with most existing rare-variant tests. Results: It is demonstrated using both simulated and real datasets that the SEQCHIP method has controlled type-I errors, and is substantially more powerful than all other currently available methods.

UR - http://www.scopus.com/inward/record.url?scp=84863995321&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84863995321&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/bts263

DO - 10.1093/bioinformatics/bts263

M3 - Article

C2 - 22556370

AN - SCOPUS:84863995321

VL - 28

SP - 1745

EP - 1751

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 13

M1 - bts263

ER -