Sufficient direction factor model and its application to gene expression quantitative trait loci discovery

F. Jiang, Y. Ma, Y. Wei

Research output: Contribution to journal › Article

Abstract

Rapid improvement in technology has made it relatively cheap to collect genetic data; however, statistical analysis of existing data is still much cheaper. Thus, secondary analysis of single-nucleotide polymorphism (SNP) data, i.e., reanalysing existing data in an effort to extract more information, is an attractive and cost-effective alternative to collecting new data. We study the relationship between gene expression and SNPs through a combination of factor analysis and dimension reduction estimation. To take advantage of the flexibility in traditional factor models where the latent factors are not required to be normal, we recommend using semiparametric sufficient dimension reduction methods in the joint estimation of the combined model. The resulting estimator is flexible and has superior performance relative to the existing estimator, which relies on additional assumptions on the latent factors. We quantify the asymptotic performance of the proposed parameter estimator and perform inference by assessing the estimation variability and by constructing confidence intervals. The new results enable us to identify, for the first time, statistically significant SNPs concerning gene-SNP relations in lung tissue from genotype-tissue expression data.

Original language: English (US)
Article number: asz010
Pages (from-to): 417-432
Number of pages: 16
Journal: Biometrika
Volume: 106
Issue number: 2
DOI: 10.1093/biomet/asz010
State: Published - Jan 1 2019

Fingerprint

Quantitative Trait Loci
Factor Models
Gene expression
Single Nucleotide Polymorphism
Sufficient
Statistical Data Interpretation
Tissue
Factor analysis
Polymorphism
Estimator
confidence interval
statistical analysis
Genes
lungs
Sufficient Dimension Reduction
genetic polymorphism
Statistical Factor Analysis

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Mathematics(all)
  • Agricultural and Biological Sciences (miscellaneous)
  • Agricultural and Biological Sciences(all)
  • Statistics, Probability and Uncertainty
  • Applied Mathematics

Cite this

@article{a7a1cec6131e4df6a1f6010fb71ee28d,
title = "Sufficient direction factor model and its application to gene expression quantitative trait loci discovery",
abstract = "Rapid improvement in technology has made it relatively cheap to collect genetic data; however, statistical analysis of existing data is still much cheaper. Thus, secondary analysis of single-nucleotide polymorphism (SNP) data, i.e., reanalysing existing data in an effort to extract more information, is an attractive and cost-effective alternative to collecting new data. We study the relationship between gene expression and SNPs through a combination of factor analysis and dimension reduction estimation. To take advantage of the flexibility in traditional factor models where the latent factors are not required to be normal, we recommend using semiparametric sufficient dimension reduction methods in the joint estimation of the combined model. The resulting estimator is flexible and has superior performance relative to the existing estimator, which relies on additional assumptions on the latent factors. We quantify the asymptotic performance of the proposed parameter estimator and perform inference by assessing the estimation variability and by constructing confidence intervals. The new results enable us to identify, for the first time, statistically significant SNPs concerning gene-SNP relations in lung tissue from genotype-tissue expression data.",
author = "F. Jiang and Y. Ma and Y. Wei",
year = "2019",
month = "1",
day = "1",
doi = "10.1093/biomet/asz010",
language = "English (US)",
volume = "106",
pages = "417--432",
journal = "Biometrika",
issn = "0006-3444",
publisher = "Oxford University Press",
number = "2",

}

Sufficient direction factor model and its application to gene expression quantitative trait loci discovery. / Jiang, F.; Ma, Y.; Wei, Y.

In: Biometrika, Vol. 106, No. 2, asz010, 01.01.2019, p. 417-432.

TY - JOUR

T1 - Sufficient direction factor model and its application to gene expression quantitative trait loci discovery

AU - Jiang, F.

AU - Ma, Y.

AU - Wei, Y.

PY - 2019/1/1

Y1 - 2019/1/1

AB - Rapid improvement in technology has made it relatively cheap to collect genetic data; however, statistical analysis of existing data is still much cheaper. Thus, secondary analysis of single-nucleotide polymorphism (SNP) data, i.e., reanalysing existing data in an effort to extract more information, is an attractive and cost-effective alternative to collecting new data. We study the relationship between gene expression and SNPs through a combination of factor analysis and dimension reduction estimation. To take advantage of the flexibility in traditional factor models where the latent factors are not required to be normal, we recommend using semiparametric sufficient dimension reduction methods in the joint estimation of the combined model. The resulting estimator is flexible and has superior performance relative to the existing estimator, which relies on additional assumptions on the latent factors. We quantify the asymptotic performance of the proposed parameter estimator and perform inference by assessing the estimation variability and by constructing confidence intervals. The new results enable us to identify, for the first time, statistically significant SNPs concerning gene-SNP relations in lung tissue from genotype-tissue expression data.

UR - http://www.scopus.com/inward/record.url?scp=85071051904&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85071051904&partnerID=8YFLogxK

U2 - 10.1093/biomet/asz010

DO - 10.1093/biomet/asz010

M3 - Article

C2 - 31097835

AN - SCOPUS:85071051904

VL - 106

SP - 417

EP - 432

JO - Biometrika

JF - Biometrika

SN - 0006-3444

IS - 2

M1 - asz010

ER -