Metabolomics-based approaches are increasingly used to interrogate the chemical basis for phenotypic differences in biological systems. Successful metabolomics studies employ multivariate data analysis to compare large, highly complex datasets. Principal component analysis (PCA), a primary tool for unsupervised statistical analysis, relies on selecting at most three components from a larger model to represent sample similarity visually. Using only three principal components limits the comprehensiveness of the model and can mask discrimination between samples. We have developed a new statistical metric, the composite score (CS). This univariate statistic incorporates multiple principal components to calculate a correlation matrix, enabling quantitative comparison of the similarity between samples within one dataset based on their measured metabolome profiles. Composite score values were tabulated for profiles of complex extracts of dietary supplements from the plant Hydrastis canadensis (goldenseal) as a case study. Several outliers were unambiguously identified, and a PCA composite score network was developed to provide a graphical representation of the composite score matrix. Comparison with visualization by PCA score plots or by dendrograms from hierarchical clustering analysis (HCA) demonstrates the utility of the composite score as a tool for metabolomics studies that seek to quantify similarity among samples. An R script for calculating the composite score has been made available.
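The abstract's central claim is that a single pairwise statistic built from many principal components can reveal differences that a three-component view hides. The sketch below illustrates that idea only. It is not the published composite score formula and not the authors' R script. As a hypothetical stand-in, it computes a distance in which each component is weighted by its share of explained variance, rather than a correlation matrix. It assumes the PCA scores and explained-variance fractions have already been computed elsewhere. All names (`weighted_distance_matrix`, `SCORES`, `EXPLAINED`) and the example data are invented for illustration.

```python
import math
from typing import List, Optional, Sequence


def weighted_distance_matrix(
    scores: Sequence[Sequence[float]],
    explained: Sequence[float],
    n_components: Optional[int] = None,
) -> List[List[float]]:
    """Pairwise sample distances using the first n_components PCs.

    Each component's squared score difference is weighted by that
    component's share of the explained variance among the retained
    components. Hypothetical illustration only; this is NOT the
    published composite score formula.
    """
    k = len(explained) if n_components is None else n_components
    total = sum(explained[:k])
    weights = [v / total for v in explained[:k]]
    n = len(scores)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum(
                w * (scores[i][a] - scores[j][a]) ** 2
                for a, w in enumerate(weights)
            ))
            dist[i][j] = dist[j][i] = d  # the matrix is symmetric
    return dist


# Hypothetical PCA scores: rows are samples, columns are PC1..PC5.
# Samples 0 and 1 coincide on PC1-PC3 but differ strongly on PC4.
SCORES = [
    [1.0, 0.5, -0.2, 0.0, 0.1],
    [1.0, 0.5, -0.2, 3.0, 0.1],
    [-2.0, 0.1, 0.3, 0.1, -0.1],
]
# Hypothetical explained-variance fractions for PC1..PC5.
EXPLAINED = [0.40, 0.20, 0.10, 0.08, 0.02]

d3 = weighted_distance_matrix(SCORES, EXPLAINED, n_components=3)
d5 = weighted_distance_matrix(SCORES, EXPLAINED)

print(f"PC1-3 only:  d(0,1) = {d3[0][1]:.3f}")  # prints 0.000
print(f"PC1-5:       d(0,1) = {d5[0][1]:.3f}")  # prints 0.949
```

With only three components, samples 0 and 1 appear identical. Once PC4 is included, they separate. This is the kind of masked discrimination the abstract attributes to three-component score plots. Weighting by explained variance keeps minor components from dominating, but the choice of weighting here is an assumption of this sketch.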