A comparison of cataloged variation between International HapMap Consortium and 1000 Genomes Project data

Carrie C. Buchanan, Eric S. Torstenson, William S. Bush, Marylyn Deriggi Ritchie

Research output: Contribution to journal › Article

46 Citations (Scopus)

Abstract

Background: Since publication of the human genome in 2003, geneticists have been interested in risk variant associations to resolve the etiology of traits and complex diseases. The International HapMap Consortium undertook an effort to catalog all common variation across the genome (variants with a minor allele frequency (MAF) of at least 5% in one or more ethnic groups). HapMap along with advances in genotyping technology led to genome-wide association studies which have identified common variants associated with many traits and diseases. In 2008 the 1000 Genomes Project aimed to sequence 2500 individuals and identify rare variants and 99% of variants with a MAF of <1%.

Methods: To determine whether the 1000 Genomes Project includes all the variants in HapMap, we examined the overlap between single nucleotide polymorphisms (SNPs) genotyped in the two resources using merged phase II/III HapMap data and low coverage pilot data from 1000 Genomes.

Results: Comparison of the two data sets showed that approximately 72% of HapMap SNPs were also found in 1000 Genomes Project pilot data. After filtering out HapMap variants with a MAF of <5% (separately for each population), 99% of HapMap SNPs were found in 1000 Genomes data.

Conclusions: Not all variants cataloged in HapMap are also cataloged in 1000 Genomes. This could affect decisions about which resource to use for SNP queries, rare variant validation, or imputation. Both the HapMap and 1000 Genomes Project databases are useful resources for human genetics, but it is important to understand the assumptions made and filtering strategies employed by these projects.
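The overlap comparison described in the Methods and Results can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `overlap_fraction` and `filter_by_maf`, the rsID-keyed dictionary, and the toy data are all assumptions made for illustration, and the sketch handles a single population, whereas the study applied the MAF filter separately for each population.

```python
def overlap_fraction(hapmap_ids, kg_ids):
    """Fraction of HapMap SNP identifiers also present in the 1000 Genomes set."""
    hapmap_ids = set(hapmap_ids)
    if not hapmap_ids:
        return 0.0
    return len(hapmap_ids & set(kg_ids)) / len(hapmap_ids)


def filter_by_maf(maf_by_snp, threshold=0.05):
    """Keep SNPs whose minor allele frequency is at least `threshold`."""
    return {snp for snp, maf in maf_by_snp.items() if maf >= threshold}


# Toy data, invented for illustration only (hypothetical SNP IDs and MAFs).
hapmap_maf = {"rs1": 0.30, "rs2": 0.02, "rs3": 0.12, "rs4": 0.01}
kg_snps = {"rs1", "rs3", "rs5"}

# Overlap before any filtering: 2 of the 4 HapMap SNPs are shared.
raw_overlap = overlap_fraction(hapmap_maf, kg_snps)

# Overlap after dropping HapMap SNPs with MAF < 5%: both remaining SNPs are shared.
common_overlap = overlap_fraction(filter_by_maf(hapmap_maf), kg_snps)

print(raw_overlap, common_overlap)
```

In this toy case the overlap rises once rare HapMap variants are excluded, which mirrors the direction of the reported change from roughly 72% to 99%; the toy values themselves carry no empirical meaning.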

Original language: English (US)
Pages (from-to): 289-294
Number of pages: 6
Journal: Journal of the American Medical Informatics Association
Volume: 19
Issue number: 2
DOI: 10.1136/amiajnl-2011-000652
State: Published - Mar 1 2012

Fingerprint

HapMap Project
Genome
Single Nucleotide Polymorphism
Gene Frequency
Genome-Wide Association Study
Medical Genetics
Human Genome
Ethnic Groups
Publications
Databases
Technology

All Science Journal Classification (ASJC) codes

  • Health Informatics

Cite this

Buchanan, Carrie C.; Torstenson, Eric S.; Bush, William S.; Ritchie, Marylyn Deriggi. / A comparison of cataloged variation between International HapMap Consortium and 1000 Genomes Project data. In: Journal of the American Medical Informatics Association. 2012; Vol. 19, No. 2. pp. 289-294.
@article{4e3bd12468ee45bcb2265c1b3b22ecf0,
title = "A comparison of cataloged variation between International HapMap Consortium and 1000 Genomes Project data",
abstract = "Background: Since publication of the human genome in 2003, geneticists have been interested in risk variant associations to resolve the etiology of traits and complex diseases. The International HapMap Consortium undertook an effort to catalog all common variation across the genome (variants with a minor allele frequency (MAF) of at least 5{\%} in one or more ethnic groups). HapMap along with advances in genotyping technology led to genome-wide association studies which have identified common variants associated with many traits and diseases. In 2008 the 1000 Genomes Project aimed to sequence 2500 individuals and identify rare variants and 99{\%} of variants with a MAF of <1{\%}. Methods: To determine whether the 1000 Genomes Project includes all the variants in HapMap, we examined the overlap between single nucleotide polymorphisms (SNPs) genotyped in the two resources using merged phase II/III HapMap data and low coverage pilot data from 1000 Genomes. Results: Comparison of the two data sets showed that approximately 72{\%} of HapMap SNPs were also found in 1000 Genomes Project pilot data. After filtering out HapMap variants with a MAF of <5{\%} (separately for each population), 99{\%} of HapMap SNPs were found in 1000 Genomes data. Conclusions: Not all variants cataloged in HapMap are also cataloged in 1000 Genomes. This could affect decisions about which resource to use for SNP queries, rare variant validation, or imputation. Both the HapMap and 1000 Genomes Project databases are useful resources for human genetics, but it is important to understand the assumptions made and filtering strategies employed by these projects.",
author = "Buchanan, {Carrie C.} and Torstenson, {Eric S.} and Bush, {William S.} and Ritchie, {Marylyn Deriggi}",
year = "2012",
month = "3",
day = "1",
doi = "10.1136/amiajnl-2011-000652",
language = "English (US)",
volume = "19",
pages = "289--294",
journal = "Journal of the American Medical Informatics Association : JAMIA",
issn = "1067-5027",
publisher = "Oxford University Press",
number = "2",

}

TY - JOUR

T1 - A comparison of cataloged variation between International HapMap Consortium and 1000 Genomes Project data

AU - Buchanan, Carrie C.

AU - Torstenson, Eric S.

AU - Bush, William S.

AU - Ritchie, Marylyn Deriggi

PY - 2012/3/1

Y1 - 2012/3/1

AB - Background: Since publication of the human genome in 2003, geneticists have been interested in risk variant associations to resolve the etiology of traits and complex diseases. The International HapMap Consortium undertook an effort to catalog all common variation across the genome (variants with a minor allele frequency (MAF) of at least 5% in one or more ethnic groups). HapMap along with advances in genotyping technology led to genome-wide association studies which have identified common variants associated with many traits and diseases. In 2008 the 1000 Genomes Project aimed to sequence 2500 individuals and identify rare variants and 99% of variants with a MAF of <1%. Methods: To determine whether the 1000 Genomes Project includes all the variants in HapMap, we examined the overlap between single nucleotide polymorphisms (SNPs) genotyped in the two resources using merged phase II/III HapMap data and low coverage pilot data from 1000 Genomes. Results: Comparison of the two data sets showed that approximately 72% of HapMap SNPs were also found in 1000 Genomes Project pilot data. After filtering out HapMap variants with a MAF of <5% (separately for each population), 99% of HapMap SNPs were found in 1000 Genomes data. Conclusions: Not all variants cataloged in HapMap are also cataloged in 1000 Genomes. This could affect decisions about which resource to use for SNP queries, rare variant validation, or imputation. Both the HapMap and 1000 Genomes Project databases are useful resources for human genetics, but it is important to understand the assumptions made and filtering strategies employed by these projects.

UR - http://www.scopus.com/inward/record.url?scp=84857156825&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84857156825&partnerID=8YFLogxK

U2 - 10.1136/amiajnl-2011-000652

DO - 10.1136/amiajnl-2011-000652

M3 - Article

VL - 19

SP - 289

EP - 294

JO - Journal of the American Medical Informatics Association : JAMIA

JF - Journal of the American Medical Informatics Association : JAMIA

SN - 1067-5027

IS - 2

ER -