Abstract: The state of big data analytics in HIV/AIDS research is critically lacking. The decreasing cost of sequencing has stimulated the development of novel software tools and analysis frameworks. The bulk of these efforts has been driven by truly expansive (and well-funded) collaborative projects such as 1000 Genomes, ENCODE, modENCODE, GTEx, the Human Microbiome Project, The Cancer Genome Atlas, and others. While these projects hardened many aspects of next generation sequencing (NGS) data analysis and manipulation and established standards for data representation (e.g., BAM, VCF, and CRAM formats), they faced a set of challenges markedly distinct from those faced by HIV researchers: long, stable genomes with few mutations (human) versus short, variable genomes with many mutations (HIV). Consequently, the development of HIV-specific NGS tools and applications has largely been the domain of individual labs independently designing sensible but ad hoc and disaggregated solutions to common problems. The result is a fragmented field, largely without accepted standards, with gaps between available solutions and the needs of end users. The current practice of writing "full-stack" custom in-house solutions for NGS analyses is not scalable, is not maintainable, largely fails to leverage developments from other domains of NGS data analysis, and hampers the adoption of this transformative technology in HIV research. The specific aims of this proposal address practical aspects of HIV/AIDS-related NGS analysis by assembling proven and newly developed tools and modules into a series of "data to answer" workflows and creating a publicly available, accessible turnkey solution suitable for a large proportion of HIV/AIDS researchers who need to perform routine and bespoke analyses of NGS data.
- Effective start/end date: 6/26/17 → 5/31/18
- National Institute of Allergy and Infectious Diseases: $716,247.00