Variable selection via additive conditional independence

Kuang Yao Lee, Bing Li, Hongyu Zhao

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

We propose a non-parametric variable selection method which does not rely on any regression model or predictor distribution. The method is based on a new statistical relationship, called additive conditional independence, that has been introduced recently for graphical models. Unlike most existing variable selection methods, which target the mean of the response, the method proposed targets a set of attributes of the response, such as its mean, variance or entire distribution. In addition, the additive nature of this approach offers non-parametric flexibility without employing multi-dimensional kernels. As a result it retains high accuracy for high dimensional predictors. We establish estimation consistency, convergence rate and variable selection consistency of the method proposed. Through simulation comparisons we demonstrate that the method proposed performs better than existing methods when the predictor affects several attributes of the response, and it performs competently in the classical setting where the predictors affect the mean only. We apply the new method to a data set concerning how gene expression levels affect the weight of mice.

Original language: English (US)
Pages (from-to): 1037-1055
Number of pages: 19
Journal: Journal of the Royal Statistical Society. Series B: Statistical Methodology
Volume: 78
Issue number: 5
DOI: 10.1111/rssb.12150
State: Published - Nov 1 2016

Fingerprint

Conditional Independence
Variable Selection
Predictors
Attribute
Target
Graphical Models
Gene Expression
Convergence Rate
Mouse
Regression Model
High Accuracy
High-dimensional
Flexibility
Entire
kernel

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

@article{f9b5fb5548a04af78d644bfe9c478f87,
title = "Variable selection via additive conditional independence",
author = "Lee, Kuang Yao and Li, Bing and Zhao, Hongyu",
year = "2016",
month = "11",
day = "1",
doi = "10.1111/rssb.12150",
language = "English (US)",
volume = "78",
pages = "1037--1055",
journal = "Journal of the Royal Statistical Society. Series B: Statistical Methodology",
issn = "1369-7412",
publisher = "Wiley-Blackwell",
number = "5"
}

Variable selection via additive conditional independence. / Lee, Kuang Yao; Li, Bing; Zhao, Hongyu.

In: Journal of the Royal Statistical Society. Series B: Statistical Methodology, Vol. 78, No. 5, 01.11.2016, p. 1037-1055.

TY - JOUR

T1 - Variable selection via additive conditional independence

AU - Lee, Kuang Yao

AU - Li, Bing

AU - Zhao, Hongyu

PY - 2016/11/1

Y1 - 2016/11/1

UR - http://www.scopus.com/inward/record.url?scp=84958817789&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84958817789&partnerID=8YFLogxK

U2 - 10.1111/rssb.12150

DO - 10.1111/rssb.12150

M3 - Article

VL - 78

SP - 1037

EP - 1055

JO - Journal of the Royal Statistical Society. Series B: Statistical Methodology

JF - Journal of the Royal Statistical Society. Series B: Statistical Methodology

SN - 1369-7412

IS - 5

ER -