Examining groundwater-flood and soil moisture-flood relationships across scales using national-scale data mining, deep learning and knowledge distillation

Project: Research project

Project Details


In many parts of the United States, it has been shown that groundwater levels and soil moisture, which quantifies the wetness of the soil, are connected via the mechanism of flood production. When groundwater is close to the surface, water cannot infiltrate into the ground and is forced to run off quickly to rivers, creating higher flood risk. However, the relationship between groundwater and floods has been found to be highly diverse and difficult to predict. Depending on terrain, groundwater depth, and many other factors, floods precede groundwater rise in some cases, while groundwater rise precedes floods in others. Previous research in selected experimental watersheds has not resulted in a comprehensive and transferable understanding of the controlling processes. This project will take a big-data, machine learning approach to enhance our understanding of this relationship, allowing us to heuristically exploit previously under-utilized groundwater data for predicting floods and reducing damages. By learning patterns from national-scale groundwater and streamflow data, the machine learning algorithms will create plausible groundwater-flood relationships. Taking advantage of the big hydrologic data from available satellite missions, this project will also create shared undergraduate course modules to enhance students' ability to work with big data and increase their awareness of global water issues.

This research advances hydrologic science by answering the following overarching question: at catchment scales, do groundwater levels in the catchment provide predictive power for flood threshold functions and baseflow? We will address this question in multiple small steps. We will identify the kinds of groundwater-rainfall-runoff (GW-P-Q) relations that can be found over the continental United States. These relations are quantified by the correlations between water table depths and flood thresholds (and baseflow) at different lags and time scales. We will seek the factors that dictate the type of GW-P-Q relations and determine whether these relations are stable across seasons and years. We will employ two approaches: a human-directed classification analysis, and a knowledge distillation scheme based on deep learning (DL), a rapidly advancing group of techniques supporting the recent surge in artificial intelligence. In the first approach, we will use classification and regression trees to identify factors that could explain the GW-P-Q relations. In the DL-based approach, we will train continental-scale time series DL models using all available data to forecast discharge. This approach addresses a limitation of classification trees, in which not enough data are available for branch nodes. Through a novel knowledge distillation procedure, we will transfer the knowledge gained in the deep network to more interpretable formats, including explicit mathematical formulas. Results from the study will provide a comprehensive understanding of GW-P-Q relations in which regional patterns and physical controls emerge. Besides new knowledge, a significant by-product is the trained DL models. They can be used as a flood forecasting tool that integrates recent soil moisture and groundwater observations, which have not been exploited until now. The educational activity will mesh with the research activity by engaging undergraduate students in handling, visualizing and interpreting big hydrologic data.
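
The lagged correlations mentioned above can be illustrated with a minimal sketch. It is not the project's actual method or code: the function names (`pearson`, `lagged_correlation`, `best_lag`), the sign convention for the lag, and the toy series are all assumptions made for illustration. The sketch correlates a groundwater series with a discharge series at a range of lags and reports the lag with the strongest correlation, which indicates whether groundwater tends to lead discharge or the reverse.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def lagged_correlation(gw, q, max_lag):
    """Correlate gw[t] with q[t + lag] for every lag in [-max_lag, max_lag].

    Assumed convention (hypothetical, chosen for this sketch):
    a positive lag means groundwater leads discharge,
    a negative lag means discharge leads groundwater.
    """
    n = len(gw)
    if len(q) != n:
        raise ValueError("series must have the same length")
    if max_lag < 0 or max_lag > n - 2:
        raise ValueError("max_lag must leave at least 2 overlapping points")
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = gw[: n - lag], q[lag:]
        else:
            a, b = gw[-lag:], q[: n + lag]
        result[lag] = pearson(a, b)
    return result


def best_lag(gw, q, max_lag):
    """Return the lag with the largest absolute correlation, ignoring NaN values."""
    corr = lagged_correlation(gw, q, max_lag)
    valid = {k: v for k, v in corr.items() if v == v}  # NaN != NaN
    if not valid:
        raise ValueError("no valid correlation at any lag")
    return max(valid, key=lambda k: abs(valid[k]))


if __name__ == "__main__":
    # Toy data, not real observations: discharge repeats groundwater two steps later.
    gw = [2, 5, 1, 7, 3, 8, 2, 6, 4, 9, 1, 5]
    q = [0, 0] + gw[:-2]
    for lag, r in lagged_correlation(gw, q, 3).items():
        print(lag, round(r, 3))
    print("strongest lag:", best_lag(gw, q, 3))
```

In practice the same calculation would be repeated per catchment and per time scale (for example daily versus monthly aggregates), and the resulting lag and correlation strength could then serve as the quantity that a classification tree tries to explain from catchment attributes.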

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Effective start/end date: 7/1/18 to 12/31/21


  • National Science Foundation: $249,862.00

