Background: Clustering genes by their periodic transcriptional profiles can shed light on a variety of biological processes, but the technique relies critically on robust modeling of the longitudinal covariance structure. Methodology: We propose a statistical method for functional clustering of periodic gene expression that models the covariance matrix of serial measurements through a general autoregressive moving-average process of order (p,q), the so-called ARMA(p,q). Within a mixture model framework, we derive an EM algorithm to estimate the proportion of each gene cluster, the Fourier series parameters that define gene-specific differences in periodic expression trajectories, and the ARMA parameters that model the covariance structure. The orders p and q of the ARMA process that provide the best fit are identified by model selection criteria. Conclusions: Using simulated data, we show that, whenever the data call for it, a sophisticated covariance structure such as ARMA is crucial for obtaining unbiased estimates of the mean structure parameters and for increasing the precision of estimation. We applied the method to recently published time-course gene expression data in yeast, where it effectively identified interesting periodic clusters. The new approach will provide a powerful tool for understanding biological functions on a genomic scale.
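To make the two model components concrete, the following is a minimal sketch, not the authors' implementation. It assumes the simplest non-trivial case, ARMA(1,1), with equally spaced time points, and a Fourier mean of the generic form a0 + Σ_k [a_k cos(2πkt/T) + b_k sin(2πkt/T)]; the abstract does not give the exact parameterization, so the function names (`fourier_mean`, `arma11_covariance`) and argument names are hypothetical.

```python
import numpy as np


def fourier_mean(t, a0, a, b, period):
    """Periodic mean curve from a truncated Fourier series (assumed form).

    mu(t) = a0 + sum_k [a[k-1] * cos(2*pi*k*t/period) + b[k-1] * sin(2*pi*k*t/period)]
    """
    t = np.asarray(t, dtype=float)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    k = np.arange(1, len(a) + 1)                      # harmonic indices 1..K
    angles = 2.0 * np.pi * np.outer(t, k) / period    # shape (len(t), K)
    return a0 + np.cos(angles) @ a + np.sin(angles) @ b


def arma11_covariance(n, phi, theta, sigma2=1.0):
    """Covariance matrix of n equally spaced observations from a stationary
    ARMA(1,1) process x_t = phi * x_{t-1} + e_t + theta * e_{t-1},
    where e_t is white noise with variance sigma2.
    """
    if abs(phi) >= 1.0:
        raise ValueError("stationarity requires |phi| < 1")
    gamma = np.empty(n)
    # Standard closed-form autocovariances for ARMA(1,1).
    gamma[0] = sigma2 * (1.0 + 2.0 * phi * theta + theta**2) / (1.0 - phi**2)
    if n > 1:
        gamma[1] = sigma2 * (1.0 + phi * theta) * (phi + theta) / (1.0 - phi**2)
    for lag in range(2, n):
        gamma[lag] = phi * gamma[lag - 1]             # geometric decay beyond lag 1
    # Toeplitz structure: entry (i, j) depends only on |i - j|.
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return gamma[lags]


if __name__ == "__main__":
    times = np.arange(8)
    mu = fourier_mean(times, a0=1.0, a=[0.5], b=[-0.3], period=8.0)
    sigma = arma11_covariance(len(times), phi=0.6, theta=0.2)
    print(mu.shape, sigma.shape)
```

In a mixture model of the kind described, each cluster would pair one such mean curve with a covariance matrix of this type inside a multivariate normal density; higher orders p and q would need a general autocovariance routine rather than the ARMA(1,1) closed form used here.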