Analyzing metabolomics data for association with genotypes using two-component gaussian mixture distributions

Jason Westra, Nicholas Hartman, Bethany Lake, Gregory C. Shearer, Nathan Tintle

Research output: Contribution to journal › Conference article

Abstract

Standard approaches to evaluating the impact of single nucleotide polymorphisms (SNPs) on quantitative phenotypes use linear models. However, these normal-based approaches may not optimally model phenotypes that are better represented by Gaussian mixture distributions (e.g., some metabolomics data). We develop a likelihood ratio test on the mixing proportions of two-component Gaussian mixture distributions and consider more restrictive models to increase power in light of a priori biological knowledge. Data were simulated to validate the improved power of the likelihood ratio test and the restricted likelihood ratio test over a linear model and a log-transformed linear model. Then, using real data from the Framingham Heart Study, we analyzed 20,315 SNPs on chromosome 11, demonstrating that the proposed likelihood ratio test identifies SNPs well known to participate in the desaturation of certain fatty acids. Our study both validates the approach of increasing power by using a likelihood ratio test that leverages Gaussian mixture models and creates a model with improved sensitivity and interpretability.
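
The abstract does not spell out the estimation procedure, so the following is only a minimal sketch of what a likelihood ratio test on mixing proportions could look like, not the authors' implementation. It assumes that the two component means and standard deviations are shared across genotypes, that only the mixing proportion varies by genotype under the alternative, that the model is fit by a basic EM algorithm, and that the statistic is referred to a chi-square distribution with (number of genotype groups - 1) degrees of freedom. All function names (`fit_mixture`, `lrt_mixing_proportions`) and the toy data are hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_mixture(y, groups, n_iter=200):
    """EM fit of a two-component Gaussian mixture (illustrative only).

    Assumption: means and SDs are shared by all observations, while each
    distinct label in `groups` has its own proportion of the high component.
    Returns (log-likelihood at the final estimates, mixing proportions).
    """
    n = len(y)
    labels = sorted(set(groups))
    ys = sorted(y)
    half = n // 2
    # Start the two means at the averages of the lower and upper halves.
    mu = [sum(ys[:half]) / half, sum(ys[half:]) / (n - half)]
    mean_all = sum(y) / n
    sd_all = math.sqrt(sum((v - mean_all) ** 2 for v in y) / n)
    sigma = [sd_all, sd_all]
    pi = {g: 0.5 for g in labels}  # P(high component | group)

    def densities(v, g):
        low = (1.0 - pi[g]) * normal_pdf(v, mu[0], sigma[0])
        high = pi[g] * normal_pdf(v, mu[1], sigma[1])
        return low, high

    for _ in range(n_iter):
        # E-step: posterior probability that each point is in the high component.
        resp = []
        for v, g in zip(y, groups):
            low, high = densities(v, g)
            resp.append(high / max(low + high, 1e-300))
        # M-step: one mixing proportion per group, shared means and SDs.
        for g in labels:
            r_g = [r for r, gi in zip(resp, groups) if gi == g]
            pi[g] = sum(r_g) / len(r_g)
        w_high = max(sum(resp), 1e-12)
        w_low = max(n - sum(resp), 1e-12)
        mu[0] = sum((1.0 - r) * v for r, v in zip(resp, y)) / w_low
        mu[1] = sum(r * v for r, v in zip(resp, y)) / w_high
        var_low = sum((1.0 - r) * (v - mu[0]) ** 2 for r, v in zip(resp, y)) / w_low
        var_high = sum(r * (v - mu[1]) ** 2 for r, v in zip(resp, y)) / w_high
        sigma = [max(math.sqrt(var_low), 1e-6), max(math.sqrt(var_high), 1e-6)]

    loglik = sum(math.log(max(sum(densities(v, g)), 1e-300))
                 for v, g in zip(y, groups))
    return loglik, pi


def lrt_mixing_proportions(y, genotypes):
    """Compare one shared mixing proportion (null) with one per genotype."""
    ll_null, _ = fit_mixture(y, [0] * len(y))
    ll_alt, pi_alt = fit_mixture(y, genotypes)
    # EM may stop at a local optimum, so clip tiny negative differences at 0.
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    df = len(set(genotypes)) - 1
    return stat, df, pi_alt


# Toy data simulated for illustration; these are not values from the study.
rng = random.Random(0)
p_high = {0: 0.1, 1: 0.5, 2: 0.9}
genotypes = [g for g in (0, 1, 2) for _ in range(100)]
y = [rng.gauss(4.0, 1.0) if rng.random() < p_high[g] else rng.gauss(0.0, 1.0)
     for g in genotypes]

stat, df, pi_hat = lrt_mixing_proportions(y, genotypes)
# Chi-square upper tail; this closed form is valid for df = 2 only.
p_value = math.exp(-stat / 2.0)
print(stat, df, p_value, pi_hat)
```

A restricted variant in the spirit of the abstract's "more restrictive models" might, for example, constrain the mixing proportion to change monotonically with allele count; that constraint is a guess at one possible restriction and is not implemented here.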

Original language: English (US)
Pages (from-to): 496-506
Number of pages: 11
Journal: Pacific Symposium on Biocomputing
Volume: 0
Issue number: 212669
DOI: 10.1142/9789813235533_0045
State: Published - Jan 1 2018
Event: 23rd Pacific Symposium on Biocomputing, PSB 2018 - Kohala Coast, United States
Duration: Jan 3 2018 - Jan 7 2018

Fingerprint

Metabolomics
Normal Distribution
Single Nucleotide Polymorphism
Linear Models
Genotype
Phenotype
Chromosomes, Human, Pair 11
Fatty Acids
Chromosomes
Nucleotides
Polymorphism

All Science Journal Classification (ASJC) codes

  • Medicine(all)

Cite this

Westra, Jason ; Hartman, Nicholas ; Lake, Bethany ; Shearer, Gregory C. ; Tintle, Nathan. / Analyzing metabolomics data for association with genotypes using two-component gaussian mixture distributions. In: Pacific Symposium on Biocomputing. 2018 ; Vol. 0, No. 212669. pp. 496-506.
@article{6af154bfeb3641f196ccdfdb6c24cc05,
title = "Analyzing metabolomics data for association with genotypes using two-component gaussian mixture distributions",
author = "Jason Westra and Nicholas Hartman and Bethany Lake and Shearer, {Gregory C.} and Nathan Tintle",
year = "2018",
month = "1",
day = "1",
doi = "10.1142/9789813235533_0045",
language = "English (US)",
volume = "0",
pages = "496--506",
journal = "Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing",
issn = "2335-6936",
number = "212669",
}

TY - JOUR
T1 - Analyzing metabolomics data for association with genotypes using two-component gaussian mixture distributions
AU - Westra, Jason
AU - Hartman, Nicholas
AU - Lake, Bethany
AU - Shearer, Gregory C.
AU - Tintle, Nathan
PY - 2018/1/1
Y1 - 2018/1/1
UR - http://www.scopus.com/inward/record.url?scp=85048483905&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85048483905&partnerID=8YFLogxK
U2 - 10.1142/9789813235533_0045
DO - 10.1142/9789813235533_0045
M3 - Conference article
C2 - 29218908
AN - SCOPUS:85048483905
VL - 0
SP - 496
EP - 506
JO - Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
JF - Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
SN - 2335-6936
IS - 212669
ER -