Asymptotically distribution free tests in heteroscedastic unbalanced high dimensional ANOVA

Haiyan Wang, Michael G. Akritas

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

In this paper, we develop the asymptotic theory for hypothesis testing in high-dimensional analysis of variance (HANOVA) when the distributions are completely unspecified. Most results in the literature have been restricted to observations of no more than two-way designs for continuous data. Here we formulate the local alternatives in terms of departures from the null distribution so that the responses can be either continuous or categorical. The asymptotic theory is presented for testing of main factor and interaction effects of up to order three in unbalanced designs with heteroscedastic variances and an arbitrary number of factors. The test statistics are based on quadratic forms whose asymptotic theory is derived under non-classical settings where the number of variables is large while the number of replications may be limited. Simulation results show that the present test statistics perform well in both continuous and discrete HANOVA in type I error accuracy, power performance, and computing time. The proposed test is illustrated with a gene expression data analysis of Arabidopsis thaliana in response to multiple abiotic stresses.

Original language: English (US)
Pages (from-to): 1341-1377
Number of pages: 37
Journal: Statistica Sinica
Volume: 21
Issue number: 3
DOI: 10.5705/ss.2009.061
State: Published - Jul 1 2011

Fingerprint

Distribution-free Test
Asymptotic Theory
High-dimensional
Dimensional Analysis
Analysis of variance
Test Statistic
Unbalanced Designs
Arabidopsis
Local Alternatives
Interaction Effects
Type I error
Null Distribution
Hypothesis Testing
Gene Expression Data
Quadratic form
Categorical
Replication
Data analysis
Testing
Computing

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

@article{a9edae7e156b45dbb520fab7faa43812,
title = "Asymptotically distribution free tests in heteroscedastic unbalanced high dimensional {ANOVA}",
abstract = "In this paper, we develop the asymptotic theory for hypothesis testing in high-dimensional analysis of variance (HANOVA) when the distributions are completely unspecified. Most results in the literature have been restricted to observations of no more than two-way designs for continuous data. Here we formulate the local alternatives in terms of departures from the null distribution so that the responses can be either continuous or categorical. The asymptotic theory is presented for testing of main factor and interaction effects of up to order three in unbalanced designs with heteroscedastic variances and an arbitrary number of factors. The test statistics are based on quadratic forms whose asymptotic theory is derived under non-classical settings where the number of variables is large while the number of replications may be limited. Simulation results show that the present test statistics perform well in both continuous and discrete HANOVA in type I error accuracy, power performance, and computing time. The proposed test is illustrated with a gene expression data analysis of Arabidopsis thaliana in response to multiple abiotic stresses.",
author = "Wang, Haiyan and Akritas, {Michael G.}",
year = "2011",
month = "7",
day = "1",
doi = "10.5705/ss.2009.061",
language = "English (US)",
volume = "21",
pages = "1341--1377",
journal = "Statistica Sinica",
issn = "1017-0405",
publisher = "Institute of Statistical Science",
number = "3",

}

Asymptotically distribution free tests in heteroscedastic unbalanced high dimensional ANOVA. / Wang, Haiyan; Akritas, Michael G.

In: Statistica Sinica, Vol. 21, No. 3, 01.07.2011, p. 1341-1377.

TY - JOUR

T1 - Asymptotically distribution free tests in heteroscedastic unbalanced high dimensional ANOVA

AU - Wang, Haiyan

AU - Akritas, Michael G.

PY - 2011/7/1

Y1 - 2011/7/1

N2 - In this paper, we develop the asymptotic theory for hypothesis testing in high-dimensional analysis of variance (HANOVA) when the distributions are completely unspecified. Most results in the literature have been restricted to observations of no more than two-way designs for continuous data. Here we formulate the local alternatives in terms of departures from the null distribution so that the responses can be either continuous or categorical. The asymptotic theory is presented for testing of main factor and interaction effects of up to order three in unbalanced designs with heteroscedastic variances and an arbitrary number of factors. The test statistics are based on quadratic forms whose asymptotic theory is derived under non-classical settings where the number of variables is large while the number of replications may be limited. Simulation results show that the present test statistics perform well in both continuous and discrete HANOVA in type I error accuracy, power performance, and computing time. The proposed test is illustrated with a gene expression data analysis of Arabidopsis thaliana in response to multiple abiotic stresses.

AB - In this paper, we develop the asymptotic theory for hypothesis testing in high-dimensional analysis of variance (HANOVA) when the distributions are completely unspecified. Most results in the literature have been restricted to observations of no more than two-way designs for continuous data. Here we formulate the local alternatives in terms of departures from the null distribution so that the responses can be either continuous or categorical. The asymptotic theory is presented for testing of main factor and interaction effects of up to order three in unbalanced designs with heteroscedastic variances and an arbitrary number of factors. The test statistics are based on quadratic forms whose asymptotic theory is derived under non-classical settings where the number of variables is large while the number of replications may be limited. Simulation results show that the present test statistics perform well in both continuous and discrete HANOVA in type I error accuracy, power performance, and computing time. The proposed test is illustrated with a gene expression data analysis of Arabidopsis thaliana in response to multiple abiotic stresses.

UR - http://www.scopus.com/inward/record.url?scp=79958247017&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79958247017&partnerID=8YFLogxK

U2 - 10.5705/ss.2009.061

DO - 10.5705/ss.2009.061

M3 - Article

AN - SCOPUS:79958247017

VL - 21

SP - 1341

EP - 1377

JO - Statistica Sinica

JF - Statistica Sinica

SN - 1017-0405

IS - 3

ER -