Data integration in multi-dimensional data sets

Informational asymmetry in the valid correlation of subdivided samples

Qing T. Zeng, Juan Pablo Pratt, Jane Pak, Eun Young Kim, Dino Ravnic, Harold Huss, Steven J. Mentzer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Background: Flow cytometry is the only currently available high-throughput technology that can measure multiple physical and molecular characteristics of individual cells. It is common in flow cytometry to measure a relatively large number of characteristics or features by performing separate experiments on subdivided samples. Correlating data from multiple experiments using certain shared features (e.g. cell size) could provide useful information on the combination pattern of the non-shared features. Such correlations, however, are not always reliable. Methods: We developed a method to assess the correlation reliability by estimating the percentage of cells that can be unambiguously correlated between two samples. This method was evaluated using 81 pairs of subdivided samples of microspheres (artificial cells) with known molecular characteristics. Results: A strong correlation (R=0.85) was found between the estimated and actual percentages of unambiguous correlation. Conclusion: The correlation reliability measure we developed can be used to support data integration of experiments on subdivided samples.
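
The abstract does not spell out how the percentage of unambiguously correlated cells is estimated. Purely as an illustration, the Python sketch below uses a toy definition that is an assumption and not the authors' method: a cell in one subdivided sample counts as unambiguously correlated when every cell of the other sample that falls in the same bin of the shared feature carries a single value of the non-shared feature. The function name `unambiguous_fraction`, the `bin_width` parameter, the marker labels, and the data are all hypothetical.

```python
from collections import defaultdict


def unambiguous_fraction(sample_a, sample_b, bin_width=1.0):
    """Fraction of cells in sample_a whose shared-feature bin maps to
    exactly one non-shared label in sample_b (toy definition, assumed)."""
    # Collect the set of non-shared labels seen in each shared-feature bin of B.
    labels_by_bin = defaultdict(set)
    for shared, label in sample_b:
        labels_by_bin[int(shared // bin_width)].add(label)

    if not sample_a:
        return 0.0

    # A cell in A is unambiguous if its bin holds exactly one label in B.
    hits = sum(
        1
        for shared, _ in sample_a
        if len(labels_by_bin.get(int(shared // bin_width), ())) == 1
    )
    return hits / len(sample_a)


# Hypothetical cells: (shared feature such as size, non-shared marker label).
sample_a = [(2.1, "FITC+"), (2.4, "FITC-"), (5.2, "FITC+"), (9.7, "FITC-")]
sample_b = [(2.3, "PE+"), (2.8, "PE-"), (5.5, "PE+"), (9.1, "PE-")]

print(unambiguous_fraction(sample_a, sample_b))  # 0.5
```

In this toy data, the two small cells of the first sample share a size bin that contains both marker states in the second sample, so only the two larger cells can be matched unambiguously.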

Original language: English (US)
Title of host publication: Biological and Medical Data Analysis - 7th International Symposium, ISBMDA 2006, Proceedings
Pages: 423-432
Number of pages: 10
Volume: 4345 LNBI
State: Published - 2006
Event: 7th International Symposium on Biological and Medical Data Analysis, ISBMDA 2006 - Thessaloniki, Greece
Duration: Dec 7 2006 to Dec 8 2006

Fingerprint

Multidimensional Data
Data integration
Asymmetry
Flow cytometry
Valid
Artificial Cells
Experiments
Percentage
Microspheres
Cell
Cell Size
Experiment
Throughput
Technology
High Throughput
Datasets

All Science Journal Classification (ASJC) codes

  • Computer Science(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Theoretical Computer Science

Cite this

Zeng, Q. T., Pratt, J. P., Pak, J., Kim, E. Y., Ravnic, D., Huss, H., & Mentzer, S. J. (2006). Data integration in multi-dimensional data sets: Informational asymmetry in the valid correlation of subdivided samples. In Biological and Medical Data Analysis - 7th International Symposium, ISBMDA 2006, Proceedings (Vol. 4345 LNBI, pp. 423-432)
@inproceedings{4c3db1cfb9424dbfbca220e1ce27b250,
title = "Data integration in multi-dimensional data sets: Informational asymmetry in the valid correlation of subdivided samples",
author = "Zeng, {Qing T.} and Pratt, {Juan Pablo} and Jane Pak and Kim, {Eun Young} and Dino Ravnic and Harold Huss and Mentzer, {Steven J.}",
year = "2006",
language = "English (US)",
isbn = "3540680632",
volume = "4345 LNBI",
pages = "423--432",
booktitle = "Biological and Medical Data Analysis - 7th International Symposium, ISBMDA 2006, Proceedings",

}

TY - GEN

T1 - Data integration in multi-dimensional data sets

T2 - Informational asymmetry in the valid correlation of subdivided samples

AU - Zeng, Qing T.

AU - Pratt, Juan Pablo

AU - Pak, Jane

AU - Kim, Eun Young

AU - Ravnic, Dino

AU - Huss, Harold

AU - Mentzer, Steven J.

PY - 2006

Y1 - 2006

UR - http://www.scopus.com/inward/record.url?scp=34547477876&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34547477876&partnerID=8YFLogxK

M3 - Conference contribution

SN - 3540680632

SN - 9783540680635

VL - 4345 LNBI

SP - 423

EP - 432

BT - Biological and Medical Data Analysis - 7th International Symposium, ISBMDA 2006, Proceedings

ER -
