Streaming breakpoint graph analytics for accelerating and parallelizing the computation of DCJ median of three genomes

Zhaoming Yin, Jijun Tang, Stephen W. Schaeffer, David A. Bader

Research output: Contribution to journal, Conference article, peer-review

2 Scopus citations


The problem of finding the median of three genomes is the key step in building the most parsimonious phylogenetic trees from genome rearrangement data. The median problem under the Double-Cut-and-Join (DCJ) distance is NP-hard, and the best exact algorithm uses a branch-and-bound best-first search strategy to explore sub-graph patterns in the Multiple BreakPoint Graph (MBG). In this paper, by taking advantage of the "streaming" property of the MBG, we introduce a "footprint-based" data structure that reduces the space requirement of a single search node from O(v^2) to O(v) and minimizes the redundant computation in counting cycles/paths to update bounds, which dramatically decreases the workload of a single search node. An additional branching heuristic is introduced to help reduce the search space. Finally, a multi-threaded shared-memory parallel algorithm with two load-balancing strategies brings additional benefit by distributing search work efficiently among processors. We conduct extensive experiments on simulated datasets, and our results show significant improvement on all of them. We also test our DCJ median algorithm with GASTS, a state-of-the-art software package for phylogenetic tree construction. On the real high-resolution Drosophila data set, our exact algorithm runs as fast as the heuristic algorithm and helps construct a better phylogenetic tree.
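As a rough illustration of the cycle counting mentioned in the abstract, the Python sketch below scores a candidate median against three genomes. It is not the paper's footprint-based structure or its branch-and-bound search. It assumes circular genomes with identical gene content, for which the DCJ distance equals the number of genes minus the number of alternating cycles in the breakpoint graph of the two genomes. All function names and the signed-integer genome encoding are hypothetical choices made for this example.

```python
# Illustrative sketch only: DCJ distance for circular genomes via cycle counting.
# A genome is a list of circular chromosomes; each chromosome is a list of
# signed integers (sign = gene orientation). Encoding and names are assumptions.

def tail(gene):
    return 2 * abs(gene)

def head(gene):
    return 2 * abs(gene) + 1

def adjacencies(genome):
    """Map each gene extremity to the extremity it is joined to."""
    partner = {}
    for chrom in genome:
        for i, a in enumerate(chrom):
            b = chrom[(i + 1) % len(chrom)]  # wrap around: circular chromosome
            right_of_a = head(a) if a > 0 else tail(a)
            left_of_b = tail(b) if b > 0 else head(b)
            partner[right_of_a] = left_of_b
            partner[left_of_b] = right_of_a
    return partner

def count_cycles(p1, p2):
    """Count cycles that alternate between adjacencies of p1 and p2."""
    visited = set()
    cycles = 0
    for start in p1:
        if start in visited:
            continue
        cycles += 1
        x = start
        while True:
            visited.add(x)
            y = p1[x]          # follow an adjacency of the first genome
            visited.add(y)
            x = p2[y]          # then an adjacency of the second genome
            if x == start:
                break
    return cycles

def dcj_distance(g1, g2):
    """DCJ distance for circular genomes on the same genes: n - cycles."""
    n = sum(len(chrom) for chrom in g1)
    return n - count_cycles(adjacencies(g1), adjacencies(g2))

def median_score(candidate, genomes):
    """Sum of DCJ distances from a candidate median to the given genomes."""
    return sum(dcj_distance(candidate, g) for g in genomes)

if __name__ == "__main__":
    a = [[1, 2, 3]]
    b = [[1, -2, 3]]   # gene 2 inverted relative to a
    print(dcj_distance(a, b))
    print(median_score(a, [a, a, b]))
```

An exact median solver would search over candidate genomes to minimize `median_score`; the sketch only evaluates one given candidate, which is the inner computation that bound updates rely on.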

Original language: English (US)
Pages (from-to): 561-570
Number of pages: 10
Journal: Procedia Computer Science
State: Published - 2013
Event: 13th Annual International Conference on Computational Science, ICCS 2013 - Barcelona, Spain
Duration: Jun 5 2013 to Jun 7 2013

All Science Journal Classification (ASJC) codes

  • Computer Science(all)

