Covariate selection for multilevel models with missing data

Miguel Marino, Orfeu M. Buxton, Yi Li

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Missing covariate data hamper variable selection in multilevel regression settings. Current variable selection techniques for multiply-imputed data commonly address missingness in the predictors through list-wise deletion and stepwise-selection methods that are problematic. Moreover, most variable selection methods are developed for independent linear regression models and do not accommodate multilevel mixed effects regression models with incomplete covariate data. We develop a novel methodology that is able to perform covariate selection across multiply-imputed data for multilevel random effects models when missing data are present. Specifically, we propose to stack the multiply-imputed data sets from a multiple imputation procedure and to apply a group variable selection procedure through group lasso regularization to assess the overall impact of each predictor on the outcome across the imputed data sets. Simulations confirm the advantageous performance of the proposed method compared with the competing methods. We applied the method to reanalyse the Healthy Directions–Small Business cancer prevention study, which evaluated a behavioural intervention programme targeting multiple risk-related behaviours in a working-class, multi-ethnic population.
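The stacking idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it rests on several assumptions. The M imputed data sets are assumed to be stacked block-diagonally, so each predictor gets one coefficient per imputed data set, and those M coefficients form one group. A plain linear model is fitted by proximal gradient descent, and the multilevel random effects are omitted for brevity. The function names `stack_imputations` and `group_lasso`, the penalty value `lam`, and the toy imputation step are all hypothetical.

```python
import numpy as np

def stack_imputations(X_list, y_list):
    """Stack M imputed data sets block-diagonally (an assumed layout).

    Column m*p + j holds predictor j from imputed data set m, so each
    predictor j owns M coefficients that share the group label j.
    """
    M = len(X_list)
    p = X_list[0].shape[1]
    n_total = sum(X.shape[0] for X in X_list)
    X = np.zeros((n_total, M * p))
    row = 0
    for m, Xm in enumerate(X_list):
        n = Xm.shape[0]
        X[row:row + n, m * p:(m + 1) * p] = Xm
        row += n
    y = np.concatenate(y_list)
    groups = np.tile(np.arange(p), M)
    return X, y, groups

def group_lasso(X, y, groups, lam, n_iter=2000):
    """Minimise (1/2n)||y - Xb||^2 + lam * sum_g ||b_g||_2 by proximal gradient."""
    n = X.shape[0]
    beta = np.zeros(X.shape[1])
    # Step size 1/L, where L = sigma_max(X)^2 / n is the gradient's Lipschitz constant.
    step = n / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        z = beta - step * grad
        # Group soft-thresholding: shrink each group's norm, or zero the whole group.
        for g in np.unique(groups):
            idx = groups == g
            norm = np.linalg.norm(z[idx])
            scale = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
            z[idx] = scale * z[idx]
        beta = z
    return beta

# Toy data (entirely synthetic): only predictors 0 and 1 affect the outcome.
rng = np.random.default_rng(0)
n, p, M = 200, 5, 3
X_full = rng.normal(size=(n, p))
y = 2.0 * X_full[:, 0] - 1.5 * X_full[:, 1] + rng.normal(scale=0.5, size=n)

# Pretend predictor 3 is missing in about 20% of rows and create M crude
# imputations by drawing from a normal fitted to the observed values.
missing = rng.random(n) < 0.2
X_list, y_list = [], []
for _ in range(M):
    Xm = X_full.copy()
    observed = X_full[~missing, 3]
    Xm[missing, 3] = rng.normal(observed.mean(), observed.std(), size=missing.sum())
    X_list.append(Xm)
    y_list.append(y)

X, y_stacked, groups = stack_imputations(X_list, y_list)
beta = group_lasso(X, y_stacked, groups, lam=0.3)

# A predictor is selected when its group of M coefficients is not entirely zero.
selected = [j for j in range(p) if np.linalg.norm(beta[groups == j]) > 0]
print("selected predictors:", selected)
```

Because the penalty acts on the norm of each predictor's M coefficients, a predictor is kept or dropped in all imputed data sets at once, which is the sense in which the group penalty assesses its overall impact. In practice the penalty would be tuned (for example by an information criterion or cross-validation) and a proper imputation model would replace the crude draws used here.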

Original language: English (US)
Pages (from-to): 31-46
Number of pages: 16
Journal: Stat
Volume: 6
Issue number: 1
DOIs: 10.1002/sta4.133
State: Published - Jan 1 2017

Fingerprint

Multilevel Models
Missing Data
Covariates
Variable Selection
Multiplication
Predictors
Missing Covariates
Mixed Effects
Multiple Imputation
Lasso
Random Effects Model
Selection Procedures
Multi-class
Linear Regression Model
Deletion
Regression Model
Regularization
Cancer

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

Marino, Miguel ; Buxton, Orfeu M. ; Li, Yi. / Covariate selection for multilevel models with missing data. In: Stat. 2017 ; Vol. 6, No. 1. pp. 31-46.
@article{49bfc194157847ec941596ec9e1c86a5,
title = "Covariate selection for multilevel models with missing data",
author = "Marino, Miguel and Buxton, {Orfeu M.} and Li, Yi",
year = "2017",
month = "1",
day = "1",
doi = "10.1002/sta4.133",
language = "English (US)",
volume = "6",
pages = "31--46",
journal = "Stat",
issn = "2049-1573",
publisher = "Wiley-Blackwell",
number = "1",

}

TY - JOUR
T1 - Covariate selection for multilevel models with missing data
AU - Marino, Miguel
AU - Buxton, Orfeu M.
AU - Li, Yi
PY - 2017/1/1
Y1 - 2017/1/1
UR - http://www.scopus.com/inward/record.url?scp=85008430132&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85008430132&partnerID=8YFLogxK
U2 - 10.1002/sta4.133
DO - 10.1002/sta4.133
M3 - Article
C2 - 28239457
AN - SCOPUS:85008430132
VL - 6
SP - 31
EP - 46
JO - Stat
JF - Stat
SN - 2049-1573
IS - 1
ER -