TY - JOUR
T1 - Composite score analysis for unsupervised comparison and network visualization of metabolomics data
AU - Kellogg, Joshua J.
AU - Kvalheim, Olav M.
AU - Cech, Nadja B.
PY - 2020/01/25
Y1 - 2020/01/25
N2 - Metabolomics-based approaches are becoming increasingly popular to interrogate the chemical basis for phenotypic differences in biological systems. Successful metabolomics studies employ multivariate data analysis to compare large and highly complex datasets. A primary tool for unsupervised statistical analyses, principal component analysis (PCA), relies on the selection of a subset of at most three components from a larger model to visually represent similarity. The use of only three principal components limits the comprehensiveness of the model and can mask discrimination between samples. We have developed a new statistical metric, the composite score (CS), as a univariate statistic that incorporates multiple principal components to calculate a correlation matrix that enables quantitative comparisons of similarity between samples within one dataset based upon measured metabolome profiles. Composite score values were tabulated using profiles of complex extracts of dietary supplements from the plant Hydrastis canadensis (goldenseal) as a case study. Several outliers were unambiguously identified, and a PCA composite score network was developed to provide a graphical representation of the composite score matrix. Comparison with visualization using PCA score plots or dendrograms from hierarchical clustering analysis (HCA) demonstrates the utility of the composite score as a tool for metabolomics studies that seek to quantify similarity among samples. An R script for the calculation of the composite score has been made available.
AB - Metabolomics-based approaches are becoming increasingly popular to interrogate the chemical basis for phenotypic differences in biological systems. Successful metabolomics studies employ multivariate data analysis to compare large and highly complex datasets. A primary tool for unsupervised statistical analyses, principal component analysis (PCA), relies on the selection of a subset of at most three components from a larger model to visually represent similarity. The use of only three principal components limits the comprehensiveness of the model and can mask discrimination between samples. We have developed a new statistical metric, the composite score (CS), as a univariate statistic that incorporates multiple principal components to calculate a correlation matrix that enables quantitative comparisons of similarity between samples within one dataset based upon measured metabolome profiles. Composite score values were tabulated using profiles of complex extracts of dietary supplements from the plant Hydrastis canadensis (goldenseal) as a case study. Several outliers were unambiguously identified, and a PCA composite score network was developed to provide a graphical representation of the composite score matrix. Comparison with visualization using PCA score plots or dendrograms from hierarchical clustering analysis (HCA) demonstrates the utility of the composite score as a tool for metabolomics studies that seek to quantify similarity among samples. An R script for the calculation of the composite score has been made available.
UR - http://www.scopus.com/inward/record.url?scp=85075331843&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85075331843&partnerID=8YFLogxK
U2 - 10.1016/j.aca.2019.10.029
DO - 10.1016/j.aca.2019.10.029
M3 - Article
C2 - 31864629
AN - SCOPUS:85075331843
VL - 1095
SP - 38
EP - 47
JO - Analytica Chimica Acta
JF - Analytica Chimica Acta
SN - 0003-2670
ER -