The Bayesian lasso for genome-wide association studies

Jiahan Li, Kiranmoy Das, Guifang Fu, Runze Li, Rongling Wu

Research output: Contribution to journal › Article

118 Citations (Scopus)

Abstract

Motivation: Despite their success in identifying genes that affect complex diseases or traits, current genome-wide association studies (GWASs) based on single-SNP analysis are too simple to give a comprehensive picture of the genetic architecture of phenotypes. A simultaneous analysis of a large number of SNPs, although statistically challenging, especially with a small number of samples, is crucial for genetic modeling.

Method: We propose a two-stage procedure for multi-SNP modeling and analysis in GWASs, by first producing a 'preconditioned' response variable using a supervised principal component analysis and then formulating a Bayesian lasso to select a subset of significant SNPs. The Bayesian lasso is implemented with a hierarchical model, in which scale mixtures of normals are used as prior distributions for the genetic effects and exponential priors are placed on their variances, and is then fitted using a Markov chain Monte Carlo (MCMC) algorithm. Our approach obviates the choice of the lasso parameter by imposing a diffuse hyperprior on it and estimating it along with the other parameters, and it is particularly powerful for selecting the most relevant SNPs for GWASs, where the number of predictors exceeds the number of observations.

Results: The new approach was examined through a simulation study. By using the approach to analyze a real dataset from the Framingham Heart Study, we detected several significant genes that are associated with body mass index (BMI). Our findings support previous results about BMI-related SNPs and also provide new insights into the genetic control of this trait.
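The hierarchical model described in the Method paragraph can be written out as a short sketch. The abstract does not give the exact parameterization, so the form below is an assumption: it follows the widely used Bayesian lasso hierarchy of Park and Casella, and the symbols (y for the preconditioned response, X for the SNP design matrix, beta_j for genetic effects, tau_j^2 for their variance scales, lambda for the lasso parameter, and r, delta for the gamma hyperparameters) are illustrative rather than taken from the paper.

```latex
% Assumed hierarchy (Park-and-Casella-style); the paper's exact priors may differ.
\begin{align*}
\mathbf{y} \mid \mu, \boldsymbol{\beta}, \sigma^2
  &\sim \mathcal{N}_n\!\left(\mu \mathbf{1}_n + X\boldsymbol{\beta},\ \sigma^2 I_n\right), \\
\beta_j \mid \sigma^2, \tau_j^2
  &\sim \mathcal{N}\!\left(0,\ \sigma^2 \tau_j^2\right), \qquad j = 1, \dots, p, \\
\tau_j^2 \mid \lambda
  &\sim \mathrm{Exp}\!\left(\text{rate} = \lambda^2 / 2\right), \\
\lambda^2 &\sim \mathrm{Gamma}(r, \delta), \qquad \pi(\sigma^2) \propto 1/\sigma^2 .
\end{align*}

% Integrating out tau_j^2 recovers the Laplace (lasso) prior on each effect:
\begin{equation*}
\pi(\beta_j \mid \sigma^2, \lambda)
  = \int_0^\infty \mathcal{N}\!\left(\beta_j;\ 0,\ \sigma^2 \tau_j^2\right)
    \frac{\lambda^2}{2}\, e^{-\lambda^2 \tau_j^2 / 2}\, d\tau_j^2
  = \frac{\lambda}{2\sigma}\, e^{-\lambda \lvert \beta_j \rvert / \sigma} .
\end{equation*}
```

Under this formulation, small values of r and delta give one possible diffuse hyperprior on the lasso parameter, so lambda is sampled within the MCMC run together with the other parameters instead of being tuned separately.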

Original language: English (US)
Article number: btq688
Pages (from-to): 516-523
Number of pages: 8
Journal: Bioinformatics
Volume: 27
Issue number: 4
DOIs: 10.1093/bioinformatics/btq688
State: Published - Feb 1 2011

Fingerprint

Lasso
Genome-Wide Association Study
Single Nucleotide Polymorphism
Genome
Genes
Gene
Scale Mixture
Principal Component Analysis
Two-stage Procedure
Markov Chain Monte Carlo Algorithms
Body Mass Index
Hierarchical Model
Prior distribution
Modeling
Phenotype
Predictors
Markov Chains
Exceed
Markov processes
Simulation Study

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Li, Jiahan ; Das, Kiranmoy ; Fu, Guifang ; Li, Runze ; Wu, Rongling. / The Bayesian lasso for genome-wide association studies. In: Bioinformatics. 2011 ; Vol. 27, No. 4. pp. 516-523.
@article{317e1218cdf54031be979d0af1e10728,
title = "The Bayesian lasso for genome-wide association studies",
abstract = "Motivation: Despite their success in identifying genes that affect complex diseases or traits, current genome-wide association studies (GWASs) based on single-SNP analysis are too simple to give a comprehensive picture of the genetic architecture of phenotypes. A simultaneous analysis of a large number of SNPs, although statistically challenging, especially with a small number of samples, is crucial for genetic modeling. Method: We propose a two-stage procedure for multi-SNP modeling and analysis in GWASs, by first producing a 'preconditioned' response variable using a supervised principal component analysis and then formulating a Bayesian lasso to select a subset of significant SNPs. The Bayesian lasso is implemented with a hierarchical model, in which scale mixtures of normals are used as prior distributions for the genetic effects and exponential priors are placed on their variances, and is then fitted using a Markov chain Monte Carlo (MCMC) algorithm. Our approach obviates the choice of the lasso parameter by imposing a diffuse hyperprior on it and estimating it along with the other parameters, and it is particularly powerful for selecting the most relevant SNPs for GWASs, where the number of predictors exceeds the number of observations. Results: The new approach was examined through a simulation study. By using the approach to analyze a real dataset from the Framingham Heart Study, we detected several significant genes that are associated with body mass index (BMI). Our findings support previous results about BMI-related SNPs and also provide new insights into the genetic control of this trait.",
author = "Jiahan Li and Kiranmoy Das and Guifang Fu and Runze Li and Rongling Wu",
year = "2011",
month = "2",
day = "1",
doi = "10.1093/bioinformatics/btq688",
language = "English (US)",
volume = "27",
pages = "516--523",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "4",

}

The Bayesian lasso for genome-wide association studies. / Li, Jiahan; Das, Kiranmoy; Fu, Guifang; Li, Runze; Wu, Rongling.

In: Bioinformatics, Vol. 27, No. 4, btq688, 01.02.2011, p. 516-523.

TY  - JOUR
T1  - The Bayesian lasso for genome-wide association studies
AU  - Li, Jiahan
AU  - Das, Kiranmoy
AU  - Fu, Guifang
AU  - Li, Runze
AU  - Wu, Rongling
PY  - 2011/2/1
Y1  - 2011/2/1
N2  - Motivation: Despite their success in identifying genes that affect complex diseases or traits, current genome-wide association studies (GWASs) based on single-SNP analysis are too simple to give a comprehensive picture of the genetic architecture of phenotypes. A simultaneous analysis of a large number of SNPs, although statistically challenging, especially with a small number of samples, is crucial for genetic modeling. Method: We propose a two-stage procedure for multi-SNP modeling and analysis in GWASs, by first producing a 'preconditioned' response variable using a supervised principal component analysis and then formulating a Bayesian lasso to select a subset of significant SNPs. The Bayesian lasso is implemented with a hierarchical model, in which scale mixtures of normals are used as prior distributions for the genetic effects and exponential priors are placed on their variances, and is then fitted using a Markov chain Monte Carlo (MCMC) algorithm. Our approach obviates the choice of the lasso parameter by imposing a diffuse hyperprior on it and estimating it along with the other parameters, and it is particularly powerful for selecting the most relevant SNPs for GWASs, where the number of predictors exceeds the number of observations. Results: The new approach was examined through a simulation study. By using the approach to analyze a real dataset from the Framingham Heart Study, we detected several significant genes that are associated with body mass index (BMI). Our findings support previous results about BMI-related SNPs and also provide new insights into the genetic control of this trait.
UR  - http://www.scopus.com/inward/record.url?scp=79951530319&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=79951530319&partnerID=8YFLogxK
U2  - 10.1093/bioinformatics/btq688
DO  - 10.1093/bioinformatics/btq688
M3  - Article
C2  - 21156729
AN  - SCOPUS:79951530319
VL  - 27
SP  - 516
EP  - 523
JO  - Bioinformatics
JF  - Bioinformatics
SN  - 1367-4803
IS  - 4
M1  - btq688
ER  -