The prototype discussed in this article retrieves Essential Terrestrial Variable (ETV) web services and uses data-model workflows to transform ETV data for hydrological models in a distributed computing environment. The ETV workflow is a service layer to 100's of terabytes of national datasets bundled for fast data access in support of watershed modeling using the United States Geological Survey (USGS) Hydrological Unit Code (HUC) level-12 scale. The ETV data has been proposed as the Essential Terrestrial Data necessary to construct watershed models anywhere in the continental USA (Leonard and Duffy, 2013). Here, we present the hardware and software system designs to support the ETV, data-model, and model workflows using High Performance Computing (HPC) and service-oriented architecture.This infrastructure design is an important contribution to both how and where the workflows operate. We describe details of how these workflow services operate in a distributed manner for modeling CONUS HUC-12 catchments using the Penn State Integrated Hydrological Model (PIHM) as an example. The prototype is evaluated by generating data-model workflows for every CONUS HUC-12 and creating a repository of workflow provenance for every HUC-12 (~100km2) for use by researchers as a strategy to begin a new hydrological model study. The concept of provenance for data-model workflows developed here assures reproducibility of model simulations (e.g. reanalysis) from ETV datasets without storing model results which we have shown will require many petabytes of storage.
All Science Journal Classification (ASJC) codes
- Environmental Engineering
- Ecological Modeling