Structured Ordinary Least Squares: A Sufficient Dimension Reduction approach for regressions with partitioned predictors and heterogeneous units

Research output: Contribution to journalArticle

Abstract

In many scientific and engineering fields, advanced experimental and computing technologies are producing data that are not just high dimensional, but also internally structured. For instance, statistical units may have heterogeneous origins from distinct studies or subpopulations, and features may be naturally partitioned based on experimental platforms generating them, or on information available about their roles in a given phenomenon. In a regression analysis, exploiting this known structure in the predictor dimension reduction stage that precedes modeling can be an effective way to integrate diverse data. To pursue this, we propose a novel Sufficient Dimension Reduction (SDR) approach that we call structured Ordinary Least Squares (sOLS). This combines ideas from existing SDR literature to merge reductions performed within groups of samples and/or predictors. In particular, it leads to a version of OLS for grouped predictors that requires far less computation than recently proposed groupwise SDR procedures, and provides an informal yet effective variable selection tool in these settings. We demonstrate the performance of sOLS by simulation and present a first application to genomic data. The R package “sSDR,” publicly available on CRAN, includes all procedures necessary to implement the sOLS approach.

Original languageEnglish (US)
Pages (from-to)529-539
Number of pages11
JournalBiometrics
Volume73
Issue number2
DOIs
StatePublished - Jun 2017

Fingerprint

Sufficient Dimension Reduction
Ordinary Least Squares
Least-Squares Analysis
least squares
Predictors
Regression
Unit
Dimension Reduction
Variable Selection
Regression Analysis
Genomics
High-dimensional
Integrate
Technology
Engineering
Distinct
engineering
regression analysis
Necessary
Regression analysis

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Biochemistry, Genetics and Molecular Biology(all)
  • Immunology and Microbiology(all)
  • Agricultural and Biological Sciences(all)
  • Applied Mathematics

Cite this

@article{bcd1b15603914d0d918f39f0298c86ef,
title = "Structured Ordinary Least Squares: A Sufficient Dimension Reduction approach for regressions with partitioned predictors and heterogeneous units",
abstract = "In many scientific and engineering fields, advanced experimental and computing technologies are producing data that are not just high dimensional, but also internally structured. For instance, statistical units may have heterogeneous origins from distinct studies or subpopulations, and features may be naturally partitioned based on experimental platforms generating them, or on information available about their roles in a given phenomenon. In a regression analysis, exploiting this known structure in the predictor dimension reduction stage that precedes modeling can be an effective way to integrate diverse data. To pursue this, we propose a novel Sufficient Dimension Reduction (SDR) approach that we call structured Ordinary Least Squares (sOLS). This combines ideas from existing SDR literature to merge reductions performed within groups of samples and/or predictors. In particular, it leads to a version of OLS for grouped predictors that requires far less computation than recently proposed groupwise SDR procedures, and provides an informal yet effective variable selection tool in these settings. We demonstrate the performance of sOLS by simulation and present a first application to genomic data. The R package “sSDR,” publicly available on CRAN, includes all procedures necessary to implement the sOLS approach.",
author = "Yang Liu and Francesca Chiaromonte and Bing Li",
year = "2017",
month = "6",
doi = "10.1111/biom.12579",
language = "English (US)",
volume = "73",
pages = "529--539",
journal = "Biometrics",
issn = "0006-341X",
publisher = "Wiley-Blackwell",
number = "2",

}

TY - JOUR

T1 - Structured Ordinary Least Squares

T2 - A Sufficient Dimension Reduction approach for regressions with partitioned predictors and heterogeneous units

AU - Liu, Yang

AU - Chiaromonte, Francesca

AU - Li, Bing

PY - 2017/6

Y1 - 2017/6

N2 - In many scientific and engineering fields, advanced experimental and computing technologies are producing data that are not just high dimensional, but also internally structured. For instance, statistical units may have heterogeneous origins from distinct studies or subpopulations, and features may be naturally partitioned based on experimental platforms generating them, or on information available about their roles in a given phenomenon. In a regression analysis, exploiting this known structure in the predictor dimension reduction stage that precedes modeling can be an effective way to integrate diverse data. To pursue this, we propose a novel Sufficient Dimension Reduction (SDR) approach that we call structured Ordinary Least Squares (sOLS). This combines ideas from existing SDR literature to merge reductions performed within groups of samples and/or predictors. In particular, it leads to a version of OLS for grouped predictors that requires far less computation than recently proposed groupwise SDR procedures, and provides an informal yet effective variable selection tool in these settings. We demonstrate the performance of sOLS by simulation and present a first application to genomic data. The R package “sSDR,” publicly available on CRAN, includes all procedures necessary to implement the sOLS approach.

AB - In many scientific and engineering fields, advanced experimental and computing technologies are producing data that are not just high dimensional, but also internally structured. For instance, statistical units may have heterogeneous origins from distinct studies or subpopulations, and features may be naturally partitioned based on experimental platforms generating them, or on information available about their roles in a given phenomenon. In a regression analysis, exploiting this known structure in the predictor dimension reduction stage that precedes modeling can be an effective way to integrate diverse data. To pursue this, we propose a novel Sufficient Dimension Reduction (SDR) approach that we call structured Ordinary Least Squares (sOLS). This combines ideas from existing SDR literature to merge reductions performed within groups of samples and/or predictors. In particular, it leads to a version of OLS for grouped predictors that requires far less computation than recently proposed groupwise SDR procedures, and provides an informal yet effective variable selection tool in these settings. We demonstrate the performance of sOLS by simulation and present a first application to genomic data. The R package “sSDR,” publicly available on CRAN, includes all procedures necessary to implement the sOLS approach.

UR - http://www.scopus.com/inward/record.url?scp=84992520992&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84992520992&partnerID=8YFLogxK

U2 - 10.1111/biom.12579

DO - 10.1111/biom.12579

M3 - Article

C2 - 27649087

AN - SCOPUS:84992520992

VL - 73

SP - 529

EP - 539

JO - Biometrics

JF - Biometrics

SN - 0006-341X

IS - 2

ER -