Low Frequency Variants, Collapsed Based on Biological Knowledge, Uncover Complexity of Population Stratification in 1000 Genomes Project Data

Carrie B. Moore, John R. Wallace, Daniel J. Wolfe, Alex T. Frase, Sarah A. Pendergrass, Kenneth M. Weiss, Marylyn D. Ritchie

Research output: Contribution to journal (Article)

25 Citations (Scopus)

Abstract

Analyses investigating low frequency variants have the potential to explain additional genetic heritability of many complex human traits. However, natural differences in rare variant frequencies between human populations strongly confound genetic analyses. We applied a novel collapsing method to identify biological features that differ in low frequency variant burden among thirteen populations sequenced by the 1000 Genomes Project. Our flexible collapsing tool uses expert biological knowledge from multiple publicly available databases to direct feature selection. Variants were collapsed according to biologically defined features, such as evolutionarily conserved regions, regulatory regions, genes, and pathways. We conducted an extensive comparison of low frequency variant burden (MAF < 0.03) between populations in the 1000 Genomes Project Phase I data. On average, 26.87% of gene bins, 35.47% of intergenic bins, 42.85% of pathway bins, 14.86% of ORegAnno regulatory bins, and 5.97% of evolutionarily conserved regions showed statistically significant differences in low frequency variant burden across populations. The proportion of bins with significant differences in low frequency burden depends on the ancestral similarity of the two populations compared and on the type of feature tested. Even closely related populations had notable differences in low frequency burden, although fewer than populations from different continents. Furthermore, conserved or functionally relevant regions had fewer significant differences in low frequency burden than regions under less evolutionary constraint. This degree of low frequency variant differentiation across diverse populations and feature elements highlights the critical importance of considering population stratification in the new era of DNA sequencing and low frequency variant genomic analyses.
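To make the idea of "collapsing" concrete, the Python sketch below filters variants to MAF < 0.03, sums each individual's minor alleles into every feature bin the variant falls in, and reports the mean burden per population. It is a hypothetical illustration under stated assumptions, not the authors' tool: the function names, the 0/1/2 alternate-allele genotype coding, computing MAF over the pooled sample, and the variant-to-bin mapping are all assumptions, and the significance test used to compare populations is not reproduced.

```python
from collections import defaultdict

# Low-frequency cutoff quoted in the abstract (MAF < 0.03).
MAF_THRESHOLD = 0.03


def minor_allele_frequency(genotypes):
    """MAF of one variant; genotypes are alt-allele counts 0/1/2 (assumed coding)."""
    alt = sum(genotypes) / (2 * len(genotypes))
    return min(alt, 1 - alt)


def collapse_low_frequency(variants, features, populations, threshold=MAF_THRESHOLD):
    """Hypothetical collapsing sketch: per-bin minor-allele burden per population.

    variants:    {variant_id: [genotype per individual]}
    features:    {variant_id: [bin names]}; a variant may fall in several bins
                 (e.g. a gene and a pathway containing that gene)
    populations: [population label per individual], same order as genotypes
    Returns {bin: {population: mean burden per individual}}.
    MAF is computed over the pooled sample here, which is an assumption.
    """
    n = len(populations)
    burden = defaultdict(lambda: [0] * n)
    for vid, genotypes in variants.items():
        maf = minor_allele_frequency(genotypes)
        if maf == 0 or maf >= threshold:
            continue  # skip monomorphic and common variants
        alt_is_minor = sum(genotypes) <= n  # alt frequency <= 0.5
        for bin_name in features.get(vid, []):
            row = burden[bin_name]
            for i, g in enumerate(genotypes):
                # Count copies of the minor allele carried by individual i.
                row[i] += g if alt_is_minor else 2 - g

    summary = {}
    for bin_name, row in burden.items():
        by_pop = defaultdict(list)
        for pop, value in zip(populations, row):
            by_pop[pop].append(value)
        summary[bin_name] = {pop: sum(v) / len(v) for pop, v in by_pop.items()}
    return summary


if __name__ == "__main__":
    # Toy data: 40 individuals (80 alleles), two hypothetical populations.
    pops = ["POP_A"] * 20 + ["POP_B"] * 20
    v1 = [0] * 40
    v1[0] = 1  # one minor allele: MAF 0.0125
    v2 = [0] * 40
    v2[1] = v2[20] = 1  # two minor alleles: MAF 0.025
    v3 = [1] * 40  # common variant: MAF 0.5, excluded
    variants = {"v1": v1, "v2": v2, "v3": v3}
    features = {"v1": ["GENE1"], "v2": ["GENE1", "PATHWAY1"], "v3": ["GENE1"]}
    print(collapse_low_frequency(variants, features, pops))
```

A real comparison would then test, bin by bin, whether the burden distributions differ between two populations; which test to use is left open here because the abstract does not specify it.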

Original language: English (US)
Article number: e1003959
Journal: PLoS Genetics
Volume: 9
Issue number: 12
DOI: https://doi.org/10.1371/journal.pgen.1003959
State: Published - Dec 1 2013

All Science Journal Classification (ASJC) codes

  • Ecology, Evolution, Behavior and Systematics
  • Molecular Biology
  • Genetics
  • Genetics (clinical)
  • Cancer Research

Cite this

Moore, C. B., Wallace, J. R., Wolfe, D. J., Frase, A. T., Pendergrass, S. A., Weiss, K. M., & Ritchie, M. D. (2013). Low Frequency Variants, Collapsed Based on Biological Knowledge, Uncover Complexity of Population Stratification in 1000 Genomes Project Data. PLoS Genetics, 9(12), e1003959. https://doi.org/10.1371/journal.pgen.1003959
Moore, Carrie B.; Wallace, John R.; Wolfe, Daniel J.; Frase, Alex T.; Pendergrass, Sarah A.; Weiss, Kenneth M.; Ritchie, Marylyn D. / Low Frequency Variants, Collapsed Based on Biological Knowledge, Uncover Complexity of Population Stratification in 1000 Genomes Project Data. In: PLoS Genetics. 2013; Vol. 9, No. 12.
@article{72544fad054849a39993bf2e7375bc25,
title = "Low Frequency Variants, Collapsed Based on Biological Knowledge, Uncover Complexity of Population Stratification in 1000 Genomes Project Data",
author = "Moore, {Carrie B.} and Wallace, {John R.} and Wolfe, {Daniel J.} and Frase, {Alex T.} and Pendergrass, {Sarah A.} and Weiss, {Kenneth M.} and Ritchie, {Marylyn D.}",
year = "2013",
month = "12",
day = "1",
doi = "10.1371/journal.pgen.1003959",
language = "English (US)",
volume = "9",
journal = "PLoS Genetics",
issn = "1553-7390",
publisher = "Public Library of Science",
number = "12",
pages = "e1003959",
}

Low Frequency Variants, Collapsed Based on Biological Knowledge, Uncover Complexity of Population Stratification in 1000 Genomes Project Data. / Moore, Carrie B.; Wallace, John R.; Wolfe, Daniel J.; Frase, Alex T.; Pendergrass, Sarah A.; Weiss, Kenneth M.; Ritchie, Marylyn D.

In: PLoS Genetics, Vol. 9, No. 12, e1003959, 01.12.2013.

TY - JOUR

T1 - Low Frequency Variants, Collapsed Based on Biological Knowledge, Uncover Complexity of Population Stratification in 1000 Genomes Project Data

AU - Moore, Carrie B.

AU - Wallace, John R.

AU - Wolfe, Daniel J.

AU - Frase, Alex T.

AU - Pendergrass, Sarah A.

AU - Weiss, Kenneth M.

AU - Ritchie, Marylyn D.

PY - 2013/12/1

Y1 - 2013/12/1

UR - http://www.scopus.com/inward/record.url?scp=84892707535&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84892707535&partnerID=8YFLogxK

U2 - 10.1371/journal.pgen.1003959

DO - 10.1371/journal.pgen.1003959

M3 - Article

C2 - 24385916

AN - SCOPUS:84892707535

VL - 9

JO - PLoS Genetics

JF - PLoS Genetics

SN - 1553-7390

IS - 12

M1 - e1003959

ER -