Data management for large-scale scientific computations in high performance distributed systems

A. Choudhary, M. Kandemir, H. Nagesh, J. No, X. Shen, V. Taylor, S. More, R. Thakur

Research output: Contribution to journal › Conference article

10 Citations (Scopus)

Abstract

With the increasing number of scientific applications manipulating huge amounts of data, effective data management is an increasingly important problem. Unfortunately, so far the solutions to this data management problem either require deep understanding of specific storage architectures and file layouts (as in high-performance file systems) or produce unsatisfactory I/O performance in exchange for ease-of-use and portability (as in relational DBMSs). In this paper we present a new environment which is built around an active meta-data management system (MDMS). The key components of our three-tiered architecture are user application, the MDMS, and a hierarchical storage system (HSS). Our environment overcomes the performance problems of pure database-oriented solutions, while maintaining their advantages in terms of ease-of-use and portability. The high levels of performance are achieved by the MDMS, with the aid of user-specified directives. Our environment supports a simple, easy-to-use yet powerful user interface, leaving the task of choosing appropriate I/O techniques to the MDMS. We discuss the importance of an active MDMS and show how the three components, namely application, the MDMS, and the HSS, fit together. We also report performance numbers from our initial implementation and illustrate that significant improvements are made possible without undue programming effort.

Original language: English (US)
Pages (from-to): 263-272
Number of pages: 10
Journal: IEEE International Symposium on High Performance Distributed Computing, Proceedings
State: Published - Dec 1 1999
Event: Proceedings of the 1999 8th IEEE International Symposium on High Performance Distributed Computing - HPDC-8 - Redondo Beach, CA, USA
Duration: Aug 3 1999 to Aug 6 1999

Fingerprint

Information management
Metadata
User interfaces

All Science Journal Classification (ASJC) codes

  • Software
  • Computer Networks and Communications

Cite this

@article{a5dcdca51df14ae8b07b4c364d2f1717,
title = "Data management for large-scale scientific computations in high performance distributed systems",
abstract = "With the increasing number of scientific applications manipulating huge amounts of data, effective data management is an increasingly important problem. Unfortunately, so far the solutions to this data management problem either require deep understanding of specific storage architectures and file layouts (as in high-performance file systems) or produce unsatisfactory I/O performance in exchange for ease-of-use and portability (as in relational DBMSs). In this paper we present a new environment which is built around an active meta-data management system (MDMS). The key components of our three-tiered architecture are user application, the MDMS, and a hierarchical storage system (HSS). Our environment overcomes the performance problems of pure database-oriented solutions, while maintaining their advantages in terms of ease-of-use and portability. The high levels of performance are achieved by the MDMS, with the aid of user-specified directives. Our environment supports a simple, easy-to-use yet powerful user interface, leaving the task of choosing appropriate I/O techniques to the MDMS. We discuss the importance of an active MDMS and show how the three components, namely application, the MDMS, and the HSS, fit together. We also report performance numbers from our initial implementation and illustrate that significant improvements are made possible without undue programming effort.",
author = "A. Choudhary and M. Kandemir and H. Nagesh and J. No and X. Shen and V. Taylor and S. More and R. Thakur",
year = "1999",
month = "12",
day = "1",
language = "English (US)",
pages = "263--272",
journal = "IEEE International Symposium on High Performance Distributed Computing, Proceedings",
issn = "1082-8907",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

Data management for large-scale scientific computations in high performance distributed systems. / Choudhary, A.; Kandemir, M.; Nagesh, H.; No, J.; Shen, X.; Taylor, V.; More, S.; Thakur, R.

In: IEEE International Symposium on High Performance Distributed Computing, Proceedings, 01.12.1999, p. 263-272.

TY - JOUR

T1 - Data management for large-scale scientific computations in high performance distributed systems

AU - Choudhary, A.

AU - Kandemir, M.

AU - Nagesh, H.

AU - No, J.

AU - Shen, X.

AU - Taylor, V.

AU - More, S.

AU - Thakur, R.

PY - 1999/12/1

Y1 - 1999/12/1

N2 - With the increasing number of scientific applications manipulating huge amounts of data, effective data management is an increasingly important problem. Unfortunately, so far the solutions to this data management problem either require deep understanding of specific storage architectures and file layouts (as in high-performance file systems) or produce unsatisfactory I/O performance in exchange for ease-of-use and portability (as in relational DBMSs). In this paper we present a new environment which is built around an active meta-data management system (MDMS). The key components of our three-tiered architecture are user application, the MDMS, and a hierarchical storage system (HSS). Our environment overcomes the performance problems of pure database-oriented solutions, while maintaining their advantages in terms of ease-of-use and portability. The high levels of performance are achieved by the MDMS, with the aid of user-specified directives. Our environment supports a simple, easy-to-use yet powerful user interface, leaving the task of choosing appropriate I/O techniques to the MDMS. We discuss the importance of an active MDMS and show how the three components, namely application, the MDMS, and the HSS, fit together. We also report performance numbers from our initial implementation and illustrate that significant improvements are made possible without undue programming effort.

UR - http://www.scopus.com/inward/record.url?scp=0033365259&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0033365259&partnerID=8YFLogxK

M3 - Conference article

AN - SCOPUS:0033365259

SP - 263

EP - 272

JO - IEEE International Symposium on High Performance Distributed Computing, Proceedings

JF - IEEE International Symposium on High Performance Distributed Computing, Proceedings

SN - 1082-8907

ER -