TY - GEN

T1 - Dynamical Gaussian Process Latent Variable Model for Representation Learning from Longitudinal Data

AU - Le, Thanh

AU - Honavar, Vasant

N1 - Funding Information:
This work was funded in part by the NIH NCATS through grant UL1 TR002014 and by the NSF through grants 1640834 and 1636795, the Penn State Center for Big Data Analytics and Discovery Informatics, the Edward Frymoyer Endowed Professorship in Information Sciences and Technology at Pennsylvania State University, and the Sudha Murty Distinguished Visiting Chair in Neurocomputing and Data Science funded by the Pratiksha Trust at the Indian Institute of Science (both held by Vasant Honavar). The content is solely the responsibility of the authors and does not necessarily represent the official views of the sponsors.

PY - 2020/10/19

Y1 - 2020/10/19

AB - Many real-world applications involve longitudinal data, consisting of observations of several variables, where different subsets of variables are sampled at irregularly spaced time points. We introduce the Longitudinal Gaussian Process Latent Variable Model (L-GPLVM), a variant of the Gaussian Process Latent Variable Model, for learning compact representations of such data. L-GPLVM overcomes a key limitation of the Dynamic Gaussian Process Latent Variable Model and its variants, which rely on the assumption that the data are fully observed over all of the sampled time points. We describe an effective approach to learning the parameters of L-GPLVM from sparse observations, by coupling the dynamical model with a Multitask Gaussian Process model for sampling of the missing observations at each step of the gradient-based optimization of the variational lower bound. We further show the advantage of using the Sparse Process Convolution framework to learn the latent representation of sparsely and irregularly sampled longitudinal data with minimal computational overhead relative to a standard Latent Variable Model. Experiments with synthetic data, as well as with variants of MOCAP data with varying degrees of sparsity of observations, show that L-GPLVM substantially and consistently outperforms the state-of-the-art alternatives in recovering the missing observations, even when the available data exhibit a high degree of sparsity. The compact representations of irregularly sampled and sparse longitudinal data can be used to perform a variety of machine learning tasks, including clustering, classification, and regression.

UR - http://www.scopus.com/inward/record.url?scp=85096955201&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85096955201&partnerID=8YFLogxK

U2 - 10.1145/3412815.3416894

DO - 10.1145/3412815.3416894

M3 - Conference contribution

AN - SCOPUS:85096955201

T3 - FODS 2020 - Proceedings of the 2020 ACM-IMS Foundations of Data Science Conference

SP - 183

EP - 188

BT - FODS 2020 - Proceedings of the 2020 ACM-IMS Foundations of Data Science Conference

PB - Association for Computing Machinery, Inc

T2 - 2020 ACM-IMS Foundations of Data Science Conference, FODS 2020

Y2 - 19 October 2020 through 20 October 2020

ER -