TY - JOUR
T1 - Localizing and classifying adaptive targets with trend filtered regression
AU - Mughal, Mehreen R.
AU - DeGiorgio, Michael
N1 - Funding Information:
We thank Jonathan Terhorst for providing the SMC++ demographic history estimates from Terhorst et al. (2017), Daniel Schrider and Andrew Kern for their help with discoal, and Alexandre Harris for his assistance in preparing scripts for demographic simulations. We also thank Andrew Kern and two anonymous reviewers for their constructive feedback that helped strengthen this manuscript. This research was funded by National Institutes of Health grant R35GM128590, the Alfred P. Sloan Foundation, Pennsylvania State University startup funds, an NIGMS-funded training grant on Computation, Bioinformatics, and Statistics (Predoctoral Training Program T32GM102057), and the NASA Pennsylvania Space Grant Graduate Fellowship. Portions of this research were conducted with Advanced CyberInfrastructure computational resources provided by the Institute for CyberScience at Pennsylvania State University.
Publisher Copyright:
© The Author(s) 2018.
PY - 2019/2/1
Y1 - 2019/2/1
AB - Identifying genomic locations of natural selection from sequence data is an ongoing challenge in population genetics. Current methods utilizing information combined from several summary statistics typically assume no correlation of summary statistics regardless of the genomic location from which they are calculated. However, due to linkage disequilibrium, summary statistics calculated at nearby genomic positions are highly correlated. We introduce an approach termed Trendsetter that accounts for the similarity of statistics calculated from adjacent genomic regions through trend filtering, while reducing the effects of multicollinearity through regularization. Our penalized regression framework has high power to detect sweeps, is capable of classifying sweep regions as either hard or soft, and can be applied to other selection scenarios as well. We find that Trendsetter is robust to both extensive missing data and strong background selection, and has comparable power to similar current approaches. Moreover, the model learned by Trendsetter can be viewed as a set of curves modeling the spatial distribution of summary statistics in the genome. Application to human genomic data revealed positively selected regions previously discovered such as LCT in Europeans and EDAR in East Asians. We also identified a number of novel candidates and show that populations with greater relatedness share more sweep signals.
UR - http://www.scopus.com/inward/record.url?scp=85061484054&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85061484054&partnerID=8YFLogxK
U2 - 10.1093/molbev/msy205
DO - 10.1093/molbev/msy205
M3 - Article
C2 - 30398642
AN - SCOPUS:85061484054
VL - 36
SP - 252
EP - 270
JO - Molecular Biology and Evolution
JF - Molecular Biology and Evolution
SN - 0737-4038
IS - 2
ER -