TY - JOUR
T1 - Non-B DNA
T2 - A major contributor to small- and large-scale variation in nucleotide substitution frequencies across the genome
AU - Guiblet, Wilfried M.
AU - Cremona, Marzia A.
AU - Harris, Robert S.
AU - Chen, Di
AU - Eckert, Kristin A.
AU - Chiaromonte, Francesca
AU - Huang, Yi Fei
AU - Makova, Kateryna D.
N1 - Funding Information:
National Institutes of Health [R01GM136684 and R01CA23715]; Clinical and Translational Sciences Institute; Institute of Computational and Data Sciences; Huck Institutes of the Life Sciences; Eberly College of Science of the Pennsylvania State University; Pennsylvania Department of Health using Tobacco Settlement and CURE Funds (the department specifically disclaims any responsibility for any analyses or conclusions); CBIOS Predoctoral Training Program awarded to Penn State by the National Institutes of Health (W.M.G. is a trainee). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Funding for open access charge: National Institutes of Health.
Publisher Copyright:
© The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.
PY - 2021/2/22
Y1 - 2021/2/22
AB - Approximately 13% of the human genome can fold into non-canonical (non-B) DNA structures (e.g. G-quadruplexes, Z-DNA, etc.), which have been implicated in vital cellular processes. Non-B DNA also hinders replication, increasing errors and facilitating mutagenesis, yet its contribution to genome-wide variation in mutation rates remains unexplored. Here, we conducted a comprehensive analysis of nucleotide substitution frequencies at non-B DNA loci within noncoding, non-repetitive genome regions, their ±2 kb flanking regions, and 1-Megabase windows, using human-orangutan divergence and human single-nucleotide polymorphisms. Functional data analysis at single-base resolution demonstrated that substitution frequencies are usually elevated at non-B DNA, with patterns specific to each non-B DNA type. Mirror, direct and inverted repeats have higher substitution frequencies in spacers than in repeat arms, whereas G-quadruplexes, particularly stable ones, have higher substitution frequencies in loops than in stems. Several non-B DNA types also affect substitution frequencies in their flanking regions. Finally, non-B DNA explains more variation than any other predictor in multiple regression models for diversity or divergence at 1-Megabase scale. Thus, non-B DNA substantially contributes to variation in substitution frequencies at small and large scales. Our results highlight the role of non-B DNA in germline mutagenesis with implications to evolution and genetic diseases.
UR - http://www.scopus.com/inward/record.url?scp=85102221473&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85102221473&partnerID=8YFLogxK
U2 - 10.1093/nar/gkaa1269
DO - 10.1093/nar/gkaa1269
M3 - Article
C2 - 33450015
AN - SCOPUS:85102221473
SN - 0305-1048
VL - 49
SP - 1497
EP - 1516
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - 3
ER -