TY - JOUR
T1 - Deep learning approaches for improving prediction of daily stream temperature in data-scarce, unmonitored, and dammed basins
AU - Rahmani, Farshid
AU - Shen, Chaopeng
AU - Oliver, Samantha
AU - Lawson, Kathryn
AU - Appling, Alison
N1 - Funding Information:
FR was supported by the Pennsylvania Water Resources Research Center graduate internship G19AC00425. Funding for the internship and AA and SO was provided by the Integrated Water Prediction Program at the U.S. Geological Survey. CS was supported by the Office of Biological and Environmental Research of the U.S. Department of Energy under contract DE-SC0016605. KL was supported by National Science Foundation Award OAC #1940190. Data sources have been cited in the paper, and all model inputs, outputs and code are archived in a data release (Rahmani, Shen, et al., 2021). The LSTM code for modelling streamflow is available at https://github.com/mhpi/hydroDL. CS and KL have financial interests in HydroSapient, Inc., a company that could potentially benefit from the results of this research. This interest has been reviewed by the University in accordance with its Individual Conflict of Interest policy, for the purpose of maintaining the objectivity and the integrity of research at The Pennsylvania State University. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
Publisher Copyright:
© 2021 John Wiley & Sons Ltd. This article has been contributed to by US Government employees and their work is in the public domain in the USA.
PY - 2021/11
Y1 - 2021/11
N2 - Basin-centric long short-term memory (LSTM) network models have recently been shown to be an exceptionally powerful tool for stream temperature (Ts) temporal prediction (training in one period and predicting in another period at the same sites). However, spatial extrapolation is a well-known challenge to modelling Ts and it is uncertain how an LSTM-based daily Ts model will perform in unmonitored or dammed basins. Here we compiled a new benchmark dataset consisting of >400 basins across the contiguous United States in different data availability groups (DAG, meaning the daily sampling frequency) with and without major dams, and studied how to assemble suitable training datasets for predictions in basins with or without temperature monitoring. For prediction in unmonitored basins (PUB), LSTM produced a root-mean-square error (RMSE) of 1.129°C and an R2 of 0.983. While these metrics declined from LSTM's temporal prediction performance, they far surpassed traditional models' PUB values, and were competitive with traditional models' temporal prediction on calibrated sites. Even for unmonitored basins with major reservoirs, we obtained a median RMSE of 1.202°C and an R2 of 0.984. For temporal prediction, the most suitable training set was the matching DAG that the basin could be grouped into (for example, the 60% DAG was most suitable for a basin with 61% data availability). However, for PUB, a training dataset including all basins with data was consistently preferred. An input-selection ensemble moderately mitigated attribute overfitting. Our results indicate there are influential latent processes not sufficiently described by the inputs (e.g., geology, wetland covers), but temporal fluctuations can still be predicted well, and LSTM appears to be a highly accurate Ts modelling tool even for spatial extrapolation.
AB - Basin-centric long short-term memory (LSTM) network models have recently been shown to be an exceptionally powerful tool for stream temperature (Ts) temporal prediction (training in one period and predicting in another period at the same sites). However, spatial extrapolation is a well-known challenge to modelling Ts and it is uncertain how an LSTM-based daily Ts model will perform in unmonitored or dammed basins. Here we compiled a new benchmark dataset consisting of >400 basins across the contiguous United States in different data availability groups (DAG, meaning the daily sampling frequency) with and without major dams, and studied how to assemble suitable training datasets for predictions in basins with or without temperature monitoring. For prediction in unmonitored basins (PUB), LSTM produced a root-mean-square error (RMSE) of 1.129°C and an R2 of 0.983. While these metrics declined from LSTM's temporal prediction performance, they far surpassed traditional models' PUB values, and were competitive with traditional models' temporal prediction on calibrated sites. Even for unmonitored basins with major reservoirs, we obtained a median RMSE of 1.202°C and an R2 of 0.984. For temporal prediction, the most suitable training set was the matching DAG that the basin could be grouped into (for example, the 60% DAG was most suitable for a basin with 61% data availability). However, for PUB, a training dataset including all basins with data was consistently preferred. An input-selection ensemble moderately mitigated attribute overfitting. Our results indicate there are influential latent processes not sufficiently described by the inputs (e.g., geology, wetland covers), but temporal fluctuations can still be predicted well, and LSTM appears to be a highly accurate Ts modelling tool even for spatial extrapolation.
UR - http://www.scopus.com/inward/record.url?scp=85121122215&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85121122215&partnerID=8YFLogxK
U2 - 10.1002/hyp.14400
DO - 10.1002/hyp.14400
M3 - Article
AN - SCOPUS:85121122215
SN - 0885-6087
VL - 35
JO - Hydrological Processes
JF - Hydrological Processes
IS - 11
M1 - e14400
ER -