Vast quantities of health, environmental, and behavioral data are being generated today, yet they remain locked in digital silos. For example, data from health care providers, such as hospitals, provide a dynamic view of the health of individuals and populations from birth to death. At the same time, government institutions and industry have released troves of economic, environmental, and behavioral datasets into the public domain, such as indicators of income and poverty, adverse exposures (e.g., air pollution), and ecological factors (e.g., climate). How are economic, environmental, and behavioral factors linked with health? This project will bring together numerous large environmental and clinical data streams to enable the scientific community to address this question. Breaking down current data silos will have broad scientific impact. First, this effort will foster new routes of biomedical investigation for the big data community. Second, the project will enable discoveries that have behavioral, economic, environmental, and public health relevance.
This project aims to assemble a first-ever data warehouse containing numerous health/clinical, environmental, behavioral, and economic data streams, ultimately enabling the discovery of causal relationships across these data sources. First, the team will integrate numerous health data streams by leveraging the Observational Health Data Sciences and Informatics (OHDSI, www.ohdsi.org) network, a virtual data repository that contains millions of longitudinal patient measurements, such as drugs and disease diagnoses. Second, the team will build a centralized data warehouse that contains important environmental, behavioral, and economic data from across the United States, such as the Environmental Protection Agency's AirData on air pollution, United States Census data on income and occupation, and National Oceanic and Atmospheric Administration (NOAA) climate and weather data. Third, the team will disseminate emerging computational methods for causal inference and machine learning to enable researchers to find causal links between environmental, economic, behavioral, and clinical factors. The team will leverage its broad collaborative network of academic big data researchers, federal agencies (e.g., EPA, NOAA), and hospitals (e.g., Partners HealthCare) to integrate these data and to disseminate cutting-edge machine learning tools. Lastly, the project will create training resources (e.g., interactive how-to guides), coordinate cross-institution student internships, and lead a hands-on workshop to demonstrate use of the integrated data warehouse. The ultimate goal of the project is to facilitate community-led, collaborative causal discovery by disseminating integrated, open big data and analytics tools.
Effective start/end date: 1/1/17 → 12/31/21
- National Science Foundation: $95,367.00