De novo inference of stratification and local admixture in sequencing studies

Yu Zhang

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Analysis of population structure and local genomic ancestry has become increasingly important in population and disease genetics. With the advance of next-generation sequencing technologies, complete sets of genetic variants in individuals' genomes are quickly generated, providing unprecedented opportunities for learning population evolutionary histories and identifying local genetic signatures at SNP resolution. The success of those studies critically relies on accurate and powerful computational tools that can fully utilize the sequencing information. Although many algorithms have been developed for population structure inference and admixture mapping, many of them work only for independent SNPs in genotype or haplotype format and require a large panel of reference individuals. In this paper, we propose a novel probabilistic method for detecting population structure and local admixture. The method accepts sequencing data, genotype data, and haplotype data as input. It characterizes the dependence among genetic variants via haplotype segmentation, so that all variants detected in a sequencing study can be fully utilized for inference. It further uses an infinite-state Bayesian Markov model to perform de novo stratification and admixture inference. Using datasets simulated from HapMap II and 1000 Genomes, we show that our method outperforms several existing algorithms, particularly when limited or no reference individuals are available. Our method is applicable not only to human studies but also to studies of other species of interest, for which little reference information is available.

Software availability: http://stat.psu.edu/~yuzhang/software/dbm.tar.
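The abstract does not spell out the prior behind the "infinite-state" model, so the following is only an illustrative sketch, not the author's method. It assumes a Chinese restaurant process (a common nonparametric prior, chosen here as an assumption) to show how the number of population states can be left open rather than fixed in advance. The function name `crp_assignments` and the parameter `alpha` are hypothetical.

```python
import random
from collections import Counter


def crp_assignments(n_items, alpha, seed=0):
    """Sample state labels for n_items from a Chinese restaurant process.

    Illustrative assumption only: the source does not say which prior
    the infinite-state model uses. alpha > 0 controls how readily a
    new state is opened.
    """
    rng = random.Random(seed)
    labels = []
    counts = []  # counts[k] = number of items already assigned to state k
    for _ in range(n_items):
        # An existing state k is chosen with weight counts[k];
        # a brand-new state is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new state
        else:
            counts[k] += 1
        labels.append(k)
    return labels


if __name__ == "__main__":
    labels = crp_assignments(n_items=50, alpha=1.0)
    # Number of items per state; the number of states is not fixed in advance.
    print(Counter(labels))
```

Under this sketch, larger values of `alpha` tend to produce more states, which is one way a model can discover strata de novo instead of requiring the number of populations as input.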

Original language: English (US)
Article number: S17
Journal: BMC Bioinformatics
Volume: 14
Issue number: SUPPL.5
DOI: 10.1186/1471-2105-14-S5-S17
State: Published - Apr 10 2013

Fingerprint

Stratification
Sequencing
Population Structure
Genes
Haplotype
Haplotypes
Genotype
Availability
Genome
Population
Single Nucleotide Polymorphism
Software
Probabilistic Methods
Bayesian Model
Markov Model
Population Genetics
Signature
Segmentation

All Science Journal Classification (ASJC) codes

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

@article{8460065cf5c542608416b6ac3de6fc56,
title = "De novo inference of stratification and local admixture in sequencing studies",
abstract = "Analysis of population structure and local genomic ancestry has become increasingly important in population and disease genetics. With the advance of next-generation sequencing technologies, complete sets of genetic variants in individuals' genomes are quickly generated, providing unprecedented opportunities for learning population evolutionary histories and identifying local genetic signatures at SNP resolution. The success of those studies critically relies on accurate and powerful computational tools that can fully utilize the sequencing information. Although many algorithms have been developed for population structure inference and admixture mapping, many of them work only for independent SNPs in genotype or haplotype format and require a large panel of reference individuals. In this paper, we propose a novel probabilistic method for detecting population structure and local admixture. The method accepts sequencing data, genotype data, and haplotype data as input. It characterizes the dependence among genetic variants via haplotype segmentation, so that all variants detected in a sequencing study can be fully utilized for inference. It further uses an infinite-state Bayesian Markov model to perform de novo stratification and admixture inference. Using datasets simulated from HapMap II and 1000 Genomes, we show that our method outperforms several existing algorithms, particularly when limited or no reference individuals are available. Our method is applicable not only to human studies but also to studies of other species of interest, for which little reference information is available. Software availability: http://stat.psu.edu/~yuzhang/software/dbm.tar.",
author = "Yu Zhang",
year = "2013",
month = "4",
day = "10",
doi = "10.1186/1471-2105-14-S5-S17",
language = "English (US)",
volume = "14",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "SUPPL.5",

}

De novo inference of stratification and local admixture in sequencing studies. / Zhang, Yu.

In: BMC bioinformatics, Vol. 14, No. SUPPL.5, S17, 10.04.2013.

TY - JOUR

T1 - De novo inference of stratification and local admixture in sequencing studies

AU - Zhang, Yu

PY - 2013/4/10

Y1 - 2013/4/10

AB - Analysis of population structure and local genomic ancestry has become increasingly important in population and disease genetics. With the advance of next-generation sequencing technologies, complete sets of genetic variants in individuals' genomes are quickly generated, providing unprecedented opportunities for learning population evolutionary histories and identifying local genetic signatures at SNP resolution. The success of those studies critically relies on accurate and powerful computational tools that can fully utilize the sequencing information. Although many algorithms have been developed for population structure inference and admixture mapping, many of them work only for independent SNPs in genotype or haplotype format and require a large panel of reference individuals. In this paper, we propose a novel probabilistic method for detecting population structure and local admixture. The method accepts sequencing data, genotype data, and haplotype data as input. It characterizes the dependence among genetic variants via haplotype segmentation, so that all variants detected in a sequencing study can be fully utilized for inference. It further uses an infinite-state Bayesian Markov model to perform de novo stratification and admixture inference. Using datasets simulated from HapMap II and 1000 Genomes, we show that our method outperforms several existing algorithms, particularly when limited or no reference individuals are available. Our method is applicable not only to human studies but also to studies of other species of interest, for which little reference information is available. Software availability: http://stat.psu.edu/~yuzhang/software/dbm.tar.

UR - http://www.scopus.com/inward/record.url?scp=84876130606&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84876130606&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-14-S5-S17

DO - 10.1186/1471-2105-14-S5-S17

M3 - Article

C2 - 23734678

AN - SCOPUS:84876130606

VL - 14

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - SUPPL.5

M1 - S17

ER -