An Optimal Semiparametric Method for Two-group Classification

Seungchul Baek, Osamu Komori, Yanyuan Ma

Research output: Contribution to journal › Article

Abstract

In the classical discriminant analysis, when two multivariate normal distributions with equal variance–covariance matrices are assumed for two groups, the classical linear discriminant function is optimal with respect to maximizing the standardized difference between the means of two groups. However, for a typical case-control study, the distributional assumption for the case group often needs to be relaxed in practice. Komori et al. (Generalized t-statistic for two-group classification. Biometrics 2015, 71: 404–416) proposed the generalized t-statistic to obtain a linear discriminant function, which allows for heterogeneity of case group. Their procedure has an optimality property in the class of consideration. We perform a further study of the problem and show that additional improvement is achievable. The approach we propose does not require a parametric distributional assumption on the case group. We further show that the new estimator is efficient, in that no further improvement is possible to construct the linear discriminant function more efficiently. We conduct simulation studies and real data examples to illustrate the finite sample performance and the gain that it produces in comparison with existing methods.
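The classical baseline named in the abstract's first sentence can be sketched in a few lines. This is only an illustration of the textbook linear discriminant direction, proportional to the inverse pooled covariance times the mean difference, which maximizes the standardized difference between group means under the equal-covariance assumption. It is not the semiparametric estimator proposed in the paper, nor the generalized t-statistic of Komori et al. The function names and the toy two-dimensional data are hypothetical and chosen only for the example.

```python
# Minimal sketch (assumption: 2-D features, equal covariance in both groups).
# All names and data below are hypothetical illustrations, not from the paper.

def mean_vec(rows):
    """Column-wise mean of a list of equal-length tuples."""
    return [sum(col) / len(col) for col in zip(*rows)]

def pooled_cov_2d(x0, x1):
    """Pooled 2x2 sample covariance of two groups."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (x0, x1):
        m = mean_vec(rows)
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    dof = len(x0) + len(x1) - 2
    return [[s[i][j] / dof for j in range(2)] for i in range(2)]

def fisher_direction_2d(x0, x1):
    """Direction S^{-1} (mean1 - mean0) using the closed-form 2x2 inverse."""
    m0, m1 = mean_vec(x0), mean_vec(x1)
    s = pooled_cov_2d(x0, x1)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    d = [m1[0] - m0[0], m1[1] - m0[1]]
    return [(s[1][1] * d[0] - s[0][1] * d[1]) / det,
            (-s[1][0] * d[0] + s[0][0] * d[1]) / det]

def standardized_difference(beta, x0, x1):
    """(beta' (mean1 - mean0)) / sqrt(beta' S beta) for the pooled S."""
    m0, m1 = mean_vec(x0), mean_vec(x1)
    s = pooled_cov_2d(x0, x1)
    num = sum(b * (a1 - a0) for b, a0, a1 in zip(beta, m0, m1))
    var = sum(beta[i] * s[i][j] * beta[j] for i in range(2) for j in range(2))
    return num / var ** 0.5

# Toy data: four controls and four cases (made up for the example).
controls = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.5), (1.5, 1.0)]
cases = [(2.0, 1.0), (3.0, 2.5), (2.5, 2.0), (3.5, 1.5)]

beta = fisher_direction_2d(controls, cases)
print("direction:", beta)
print("standardized difference:", standardized_difference(beta, controls, cases))
```

The criterion is invariant to rescaling of the direction, so only the orientation of `beta` matters. When the case group is heterogeneous, the equal-covariance assumption behind this sketch no longer holds, which is the setting the paper addresses.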

Original language: English (US)
Pages (from-to): 806-846
Number of pages: 41
Journal: Scandinavian Journal of Statistics
Volume: 45
Issue number: 3
DOI: 10.1111/sjos.12323
State: Published - Sep 1 2018

Fingerprint

Semiparametric Methods
Group Classification
Linear Discriminant Function
Statistic
Variance-covariance Matrix
Multivariate Normal Distribution
Case-control Study
Discriminant Analysis
Biometrics
Optimality
Discriminant
Simulation Study
Estimator
Statistics

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

Baek, Seungchul; Komori, Osamu; Ma, Yanyuan. An Optimal Semiparametric Method for Two-group Classification. In: Scandinavian Journal of Statistics. 2018; Vol. 45, No. 3, pp. 806-846.
@article{36f0bf33808a4003bba0ec37afd880e8,
title = "An Optimal Semiparametric Method for Two-group Classification",
author = "Seungchul Baek and Osamu Komori and Yanyuan Ma",
year = "2018",
month = "9",
day = "1",
doi = "10.1111/sjos.12323",
language = "English (US)",
volume = "45",
pages = "806--846",
journal = "Scandinavian Journal of Statistics",
issn = "0303-6898",
publisher = "Wiley-Blackwell",
number = "3",
}

TY - JOUR
T1 - An Optimal Semiparametric Method for Two-group Classification
AU - Baek, Seungchul
AU - Komori, Osamu
AU - Ma, Yanyuan
PY - 2018/9/1
Y1 - 2018/9/1
UR - http://www.scopus.com/inward/record.url?scp=85045910120&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85045910120&partnerID=8YFLogxK
U2 - 10.1111/sjos.12323
DO - 10.1111/sjos.12323
M3 - Article
AN - SCOPUS:85045910120
VL - 45
SP - 806
EP - 846
JO - Scandinavian Journal of Statistics
JF - Scandinavian Journal of Statistics
SN - 0303-6898
IS - 3
ER -