XPS: FULL: DSD: End-to-end Acceleration of Genomic Workflows on Emerging Heterogeneous Supercomputers

Project: Research project

Project Details


The proposed research harnesses parallelism to accelerate the pervasive bioinformatics workflow of detecting genetic variations. This workflow determines the genetic variants present in an individual, given DNA sequencing data. The variant detection workflow is an integral part of current genomic data analysis, and several studies have linked genetic variants to diseases. Typical instances of this workflow currently take several hours to multiple days to complete with state-of-the-art software, and current algorithms and software are unable to exploit and benefit from even modest levels of hardware parallelism. Most prior approaches to parallelization and performance tuning of genomic data analysis pipelines have targeted computation, I/O, or network data transfer bottlenecks in isolation, and consequently, are limited in the overall performance improvement they can achieve. This project targets end-to-end acceleration methodologies and uses emerging heterogeneous supercomputers to reduce workflow time-to-completion.

The project focuses on holistic methodologies to accelerate multiple components within the genetic variant detection workflow. It explores lightweight data reorganizations at multiple granularities to enhance locality, and it investigates co-tuning of compute, communication, and I/O tasks, locality-aware load balancing, and coordinated resource partitioning to exploit high-performance computing platforms. A key goal of the proposed research is to design domain-specific optimizations targeting the massive parallelism and scalability potential of current heterogeneous supercomputers, so that the developed techniques can be easily transferred and applied to dedicated academic clusters and commercial computational environments. Outreach efforts target undergraduate students through recruiting workshops to attract them to interdisciplinary graduate programs. Curriculum development activities emphasize cross-layer parallelism.

For further information, see project web site at


Effective start/end date: 9/1/14 to 2/29/20


  • National Science Foundation: $849,984.00

