TY - GEN

T1 - On the DCJ median problem

AU - Shao, Mingfu

AU - Moret, Bernard M.E.

PY - 2014

Y1 - 2014

N2 - As many whole genomes are sequenced, comparative genomics is moving from pairwise comparisons to multiway comparisons framed within a phylogenetic tree. A central problem in this process is the inference of data for internal nodes of the tree from data given at the leaves. When phrased as an optimization problem, this problem reduces to computing a median of three genomes under the operations (evolutionary changes) of interest. We focus on the universal rearrangement operation known as double-cut-and-join (DCJ) and present three contributions to the DCJ median problem. First, we describe a new strategy to find so-called adequate subgraphs in the multiple breakpoint graph, using a seed genome. We show how to compute adequate subgraphs w.r.t. this seed genome using a network flow formulation. Second, we prove that the upper bound of the median distance computed from the triangle inequality is tight. Finally, we study the question of whether the median distance can reach its lower and upper bounds. We derive a necessary and sufficient condition for the median distance to reach its lower bound and a necessary condition for it to reach its upper bound and design algorithms to test for these conditions.

AB - As many whole genomes are sequenced, comparative genomics is moving from pairwise comparisons to multiway comparisons framed within a phylogenetic tree. A central problem in this process is the inference of data for internal nodes of the tree from data given at the leaves. When phrased as an optimization problem, this problem reduces to computing a median of three genomes under the operations (evolutionary changes) of interest. We focus on the universal rearrangement operation known as double-cut-and-join (DCJ) and present three contributions to the DCJ median problem. First, we describe a new strategy to find so-called adequate subgraphs in the multiple breakpoint graph, using a seed genome. We show how to compute adequate subgraphs w.r.t. this seed genome using a network flow formulation. Second, we prove that the upper bound of the median distance computed from the triangle inequality is tight. Finally, we study the question of whether the median distance can reach its lower and upper bounds. We derive a necessary and sufficient condition for the median distance to reach its lower bound and a necessary condition for it to reach its upper bound and design algorithms to test for these conditions.

UR - http://www.scopus.com/inward/record.url?scp=84958534648&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84958534648&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-07566-2_28

DO - 10.1007/978-3-319-07566-2_28

M3 - Conference contribution

AN - SCOPUS:84958534648

SN - 9783319075655

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 273

EP - 282

BT - Combinatorial Pattern Matching - 25th Annual Symposium, CPM 2014, Proceedings

PB - Springer Verlag

T2 - 25th Annual Symposium on Combinatorial Pattern Matching, CPM 2014

Y2 - 16 June 2014 through 18 June 2014

ER -