Schema Matching and Data Integration with Consistent Naming on Protein Crystallization Screens

Midusha Shrestha, Truong X. Tran, Bidhan Bhattarai, Marc L. Pusey, Ramazan S. Aygun

Research output: Contribution to journal, Article, peer-review

1 Scopus citation

Abstract

The data representations and naming conventions that different companies use in commercial screen files make the automated analysis of crystallization experiments difficult and time-consuming. To reduce the human effort this requires, we present an approach that computationally matches the elements of two schemas using linguistic schema matching methods and then transforms the input screen format into another format with naming defined by the user. We tested the approach on a number of commercial screens from different companies; the experiments showed an overall schema matching accuracy of 97 percent, which is significantly better than the other two matchers we tested. Our tool enables mapping a screen file in one format to another format preferred by the expert, using their preferred chemical names.
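As a rough illustration of the idea in the abstract, the sketch below matches field names from two screen schemas by string similarity and then rewrites a record into the target schema with user-preferred chemical names. It is not the authors' matcher: the abstract does not specify the algorithm, so Python's standard `difflib.SequenceMatcher` stands in for the linguistic matching step, and the field names, the threshold of 0.6, and the functions `match_schemas` and `transform` are all hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase a field name and replace punctuation with single spaces."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def similarity(a, b):
    """Character-level similarity of two normalized names, in [0, 1]."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_schemas(source_fields, target_fields, threshold=0.6):
    """Map each source field to its most similar target field.

    Fields whose best score falls below the (assumed) threshold stay unmapped.
    """
    mapping = {}
    for src in source_fields:
        best = max(target_fields, key=lambda tgt: similarity(src, tgt))
        if similarity(src, best) >= threshold:
            mapping[src] = best
    return mapping


def transform(record, mapping, preferred_names):
    """Rewrite one record into the target schema with preferred chemical names."""
    out = {}
    for field, value in record.items():
        if field in mapping:
            out[mapping[field]] = preferred_names.get(value, value)
    return out


# Hypothetical vendor schema, target schema, and naming preferences.
source = ["Salt_Name", "Salt Concentration (M)", "pH", "Well"]
target = ["salt name", "salt concentration", "ph"]
mapping = match_schemas(source, target)

record = {"Salt_Name": "NaCl", "Salt Concentration (M)": 0.2, "pH": 7.5, "Well": "A1"}
converted = transform(record, mapping, {"NaCl": "sodium chloride"})
```

Here `Well` has no sufficiently similar counterpart in the target schema, so it is left out of the mapping and dropped from the converted record. A real matcher would likely add chemical synonym handling and resolve conflicts when two source fields compete for the same target field.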

Original language: English (US)
Article number: 8700291
Pages (from-to): 2074-2085
Number of pages: 12
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume: 17
Issue number: 6
State: Published - Nov 1 2020

All Science Journal Classification (ASJC) codes

  • Biotechnology
  • Genetics
  • Applied Mathematics
