TY - JOUR
T1 - Weighted likelihood inference of genomic autozygosity patterns in dense genotype data
AU - Blant, Alexandra
AU - Kwong, Michelle
AU - Szpiech, Zachary A.
AU - Pemberton, Trevor J.
N1 - Funding Information:
This research was supported by an institutional start-up fund from the University of Manitoba (T.J.P.), the University of Manitoba Graduate Enhancement of Tri-Council Stipends (GETS) program (A.B.), and a Natural Sciences and Engineering Research Council of Canada (NSERC) Postgraduate Scholarship (A.B.). Partial support to Z.A.S. was provided by the National Human Genome Research Institute of the National Institutes of Health under award number R01HG007644 (awarded to Ryan D. Hernandez, University of California – San Francisco).
Publisher Copyright:
© 2017 The Author(s).
PY - 2017/12/1
Y1 - 2017/12/1
N2 - Background: Genomic regions of autozygosity (ROA) arise when an individual is homozygous for haplotypes inherited identical-by-descent from ancestors shared by both parents. Over the past decade, they have gained importance for understanding evolutionary history and the genetic basis of complex diseases and traits. However, methods to infer ROA in dense genotype data have not evolved in step with advances in genome technology that now enable us to rapidly create large high-resolution genotype datasets, limiting our ability to investigate their constituent ROA patterns. Methods: We report a weighted likelihood approach for inferring ROA in dense genotype data that accounts for autocorrelation among genotyped positions and the possibilities of unobserved mutation and recombination events, and variability in the confidence of individual genotype calls in whole genome sequence (WGS) data. Results: Forward-time genetic simulations under two demographic scenarios that reflect situations where inbreeding and its effect on fitness are of interest suggest this approach is better powered than existing state-of-the-art methods to infer ROA at marker densities consistent with WGS and popular microarray genotyping platforms used in human and non-human studies. Moreover, we present evidence that suggests this approach is able to distinguish ROA arising via consanguinity from ROA arising via endogamy. Using subsets of The 1000 Genomes Project Phase 3 data we show that, relative to WGS, intermediate and long ROA are captured robustly with popular microarray platforms, while detection of short ROA is more variable and improves with marker density. Worldwide ROA patterns inferred from WGS data are found to accord well with those previously reported on the basis of microarray genotype data. Finally, we highlight the potential of this approach to detect genomic regions enriched for autozygosity signals in one group relative to another based upon comparisons of per-individual autozygosity likelihoods instead of inferred ROA frequencies. Conclusions: This weighted likelihood ROA inference approach can assist population- and disease-geneticists working with a wide variety of data types and species to explore ROA patterns and to identify genomic regions with differential ROA signals among groups, thereby advancing our understanding of evolutionary history and the role of recessive variation in phenotypic variation and disease.
UR - http://www.scopus.com/inward/record.url?scp=85036591621&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85036591621&partnerID=8YFLogxK
U2 - 10.1186/s12864-017-4312-3
DO - 10.1186/s12864-017-4312-3
M3 - Article
C2 - 29191164
AN - SCOPUS:85036591621
SN - 1471-2164
VL - 18
JO - BMC Genomics
JF - BMC Genomics
IS - 1
M1 - 928
ER -