Longitudinal data are common in practice, but outcomes or time-dependent risk factors are often missing, which makes the data highly unbalanced and complex. Missing data may arise from various missing patterns or mechanisms, and handling them properly to obtain unbiased and valid inference remains a significant challenge. Here, we propose a novel semiparametric framework for analyzing longitudinal data in which both responses and covariates are missing at random and intermittently, a general and widely encountered situation in observational studies. Within this framework, we consider multiple robust estimation procedures based on innovative calibrated propensity scores, which offer additional protection against misspecification of the missing data mechanisms and show more satisfactory numerical performance. We also develop a corresponding robust information criterion for consistent variable selection in the proposed model, using empirical likelihood-based methods. The proposed methods are evaluated both theoretically and through extensive simulation studies in a variety of settings, and they show competitive properties and advantages over existing approaches. We illustrate the utility of our approach by analyzing data from the HIV Epidemiology Research Study.
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Biochemistry, Genetics and Molecular Biology(all)
- Immunology and Microbiology(all)
- Agricultural and Biological Sciences(all)
- Applied Mathematics