Population analysis of large copy number variants and hotspots of human genetic disease

Andy Itsara, Gregory M. Cooper, Carl Baker, Santhosh Girirajan, Jun Li, Devin Absher, Ronald M. Krauss, Richard M. Myers, Paul M. Ridker, Daniel I. Chasman, Heather Mefford, Phyllis Ying, Deborah A. Nickerson, Eva E. Eichler

Research output: Contribution to journal (Article)

393 Citations (Scopus)

Abstract

Copy number variants (CNVs) contribute to human genetic and phenotypic diversity. However, the distribution of larger CNVs in the general population remains largely unexplored. We identify large variants in ∼2500 individuals by using Illumina SNP data, with an emphasis on "hotspots" prone to recurrent mutations. We find variants larger than 500 kb in 5%-10% of individuals and variants greater than 1 Mb in 1%-2%. In contrast to previous studies, we find limited evidence for stratification of CNVs in geographically distinct human populations. Importantly, our sample size permits a robust distinction between truly rare and polymorphic but low-frequency copy number variation. We find that a significant fraction of individual CNVs larger than 100 kb are rare and that both gene density and size are strongly anticorrelated with allele frequency. Thus, although large CNVs commonly exist in normal individuals, which suggests that size alone cannot be used as a predictor of pathogenicity, such variation is generally deleterious. Considering these observations, we combine our data with published CNVs from more than 12,000 individuals contrasting control and neurological disease collections. This analysis identifies known disease loci and highlights additional CNVs (e.g., 3q29, 16p12, and 15q25.2) for further investigation. This study provides one of the first analyses of large, rare (0.1%-1%) CNVs in the general population, with insights relevant to future analyses of genetic disease.

Original language: English (US)
Pages (from-to): 148-161
Number of pages: 14
Journal: American Journal of Human Genetics
Volume: 84
Issue number: 2
DOI: https://doi.org/10.1016/j.ajhg.2008.12.014
State: Published - Aug 8 2008

Fingerprint

Inborn Genetic Diseases
Medical Genetics
Population
Gene Frequency
Sample Size
Single Nucleotide Polymorphism
Virulence
Mutation
Genes

All Science Journal Classification (ASJC) codes

  • Genetics
  • Genetics(clinical)

Cite this

Itsara, Andy ; Cooper, Gregory M. ; Baker, Carl ; Girirajan, Santhosh ; Li, Jun ; Absher, Devin ; Krauss, Ronald M. ; Myers, Richard M. ; Ridker, Paul M. ; Chasman, Daniel I. ; Mefford, Heather ; Ying, Phyllis ; Nickerson, Deborah A. ; Eichler, Eva E. / Population analysis of large copy number variants and hotspots of human genetic disease. In: American Journal of Human Genetics. 2008 ; Vol. 84, No. 2. pp. 148-161.
@article{549f093282ed4bdb90d479b015e1f1ce,
title = "Population analysis of large copy number variants and hotspots of human genetic disease",
abstract = "Copy number variants (CNVs) contribute to human genetic and phenotypic diversity. However, the distribution of larger CNVs in the general population remains largely unexplored. We identify large variants in ∼2500 individuals by using Illumina SNP data, with an emphasis on {"}hotspots{"} prone to recurrent mutations. We find variants larger than 500 kb in 5{\%}-10{\%} of individuals and variants greater than 1 Mb in 1{\%}-2{\%}. In contrast to previous studies, we find limited evidence for stratification of CNVs in geographically distinct human populations. Importantly, our sample size permits a robust distinction between truly rare and polymorphic but low-frequency copy number variation. We find that a significant fraction of individual CNVs larger than 100 kb are rare and that both gene density and size are strongly anticorrelated with allele frequency. Thus, although large CNVs commonly exist in normal individuals, which suggests that size alone cannot be used as a predictor of pathogenicity, such variation is generally deleterious. Considering these observations, we combine our data with published CNVs from more than 12,000 individuals contrasting control and neurological disease collections. This analysis identifies known disease loci and highlights additional CNVs (e.g., 3q29, 16p12, and 15q25.2) for further investigation. This study provides one of the first analyses of large, rare (0.1{\%}-1{\%}) CNVs in the general population, with insights relevant to future analyses of genetic disease.",
author = "Andy Itsara and Cooper, {Gregory M.} and Carl Baker and Santhosh Girirajan and Jun Li and Devin Absher and Krauss, {Ronald M.} and Myers, {Richard M.} and Ridker, {Paul M.} and Chasman, {Daniel I.} and Heather Mefford and Phyllis Ying and Nickerson, {Deborah A.} and Eichler, {Eva E.}",
year = "2008",
month = "8",
day = "8",
doi = "10.1016/j.ajhg.2008.12.014",
language = "English (US)",
volume = "84",
pages = "148--161",
journal = "American Journal of Human Genetics",
issn = "0002-9297",
publisher = "Cell Press",
number = "2"
}

Itsara, A, Cooper, GM, Baker, C, Girirajan, S, Li, J, Absher, D, Krauss, RM, Myers, RM, Ridker, PM, Chasman, DI, Mefford, H, Ying, P, Nickerson, DA & Eichler, EE 2008, 'Population analysis of large copy number variants and hotspots of human genetic disease', American Journal of Human Genetics, vol. 84, no. 2, pp. 148-161. https://doi.org/10.1016/j.ajhg.2008.12.014

Population analysis of large copy number variants and hotspots of human genetic disease. / Itsara, Andy; Cooper, Gregory M.; Baker, Carl; Girirajan, Santhosh; Li, Jun; Absher, Devin; Krauss, Ronald M.; Myers, Richard M.; Ridker, Paul M.; Chasman, Daniel I.; Mefford, Heather; Ying, Phyllis; Nickerson, Deborah A.; Eichler, Eva E.

In: American Journal of Human Genetics, Vol. 84, No. 2, 08.08.2008, p. 148-161.

TY - JOUR

T1 - Population analysis of large copy number variants and hotspots of human genetic disease

AU - Itsara, Andy

AU - Cooper, Gregory M.

AU - Baker, Carl

AU - Girirajan, Santhosh

AU - Li, Jun

AU - Absher, Devin

AU - Krauss, Ronald M.

AU - Myers, Richard M.

AU - Ridker, Paul M.

AU - Chasman, Daniel I.

AU - Mefford, Heather

AU - Ying, Phyllis

AU - Nickerson, Deborah A.

AU - Eichler, Eva E.

PY - 2008/8/8

Y1 - 2008/8/8

AB - Copy number variants (CNVs) contribute to human genetic and phenotypic diversity. However, the distribution of larger CNVs in the general population remains largely unexplored. We identify large variants in ∼2500 individuals by using Illumina SNP data, with an emphasis on "hotspots" prone to recurrent mutations. We find variants larger than 500 kb in 5%-10% of individuals and variants greater than 1 Mb in 1%-2%. In contrast to previous studies, we find limited evidence for stratification of CNVs in geographically distinct human populations. Importantly, our sample size permits a robust distinction between truly rare and polymorphic but low-frequency copy number variation. We find that a significant fraction of individual CNVs larger than 100 kb are rare and that both gene density and size are strongly anticorrelated with allele frequency. Thus, although large CNVs commonly exist in normal individuals, which suggests that size alone cannot be used as a predictor of pathogenicity, such variation is generally deleterious. Considering these observations, we combine our data with published CNVs from more than 12,000 individuals contrasting control and neurological disease collections. This analysis identifies known disease loci and highlights additional CNVs (e.g., 3q29, 16p12, and 15q25.2) for further investigation. This study provides one of the first analyses of large, rare (0.1%-1%) CNVs in the general population, with insights relevant to future analyses of genetic disease.

UR - http://www.scopus.com/inward/record.url?scp=62649088108&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=62649088108&partnerID=8YFLogxK

U2 - 10.1016/j.ajhg.2008.12.014

DO - 10.1016/j.ajhg.2008.12.014

M3 - Article

C2 - 19166990

AN - SCOPUS:62649088108

VL - 84

SP - 148

EP - 161

JO - American Journal of Human Genetics

JF - American Journal of Human Genetics

SN - 0002-9297

IS - 2

ER -