An exact algorithm to compute the DCJ distance for genomes with duplicate genes

Mingfu Shao, Yu Lin, Bernard Moret

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

18 Scopus citations

Abstract

Computing the edit distance between two genomes is a basic problem in the study of genome evolution. The double-cut-and-join (DCJ) model has formed the basis for most algorithmic research on rearrangements over the last few years. The edit distance under the DCJ model can be computed in linear time for genomes without duplicate genes, while the problem becomes NP-hard in the presence of duplicate genes. In this paper, we propose an ILP (integer linear programming) formulation to compute the DCJ distance between two genomes with duplicate genes. We also provide an efficient preprocessing approach to simplify the ILP formulation while preserving optimality. Comparison on simulated genomes demonstrates that our method outperforms MSOAR in computing the edit distance, especially when the genomes contain long duplicated segments. We also apply our method to assign orthologous gene pairs among human, mouse and rat genomes, where once again our method outperforms MSOAR.
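To illustrate the linear-time case mentioned in the abstract, here is a minimal sketch of the standard cycle-counting formula for genomes without duplicate genes, d = n - c, where n is the number of genes and c is the number of cycles formed by alternating between the adjacencies of the two genomes. It is not the ILP formulation proposed in the paper. It assumes that all chromosomes are circular, that both genomes contain exactly the same genes, and that each gene occurs once. The function names and the signed-integer encoding of genes are illustrative choices, not taken from the paper.

```python
def extremities(gene):
    """Return the (left, right) extremities of a signed gene as read along the chromosome."""
    g = abs(gene)
    # A forward gene is read tail -> head; a reversed gene is read head -> tail.
    return ((g, "t"), (g, "h")) if gene > 0 else ((g, "h"), (g, "t"))


def adjacency_map(genome):
    """Map each gene extremity to the extremity it is adjacent to.

    `genome` is a list of circular chromosomes, each a list of signed integers.
    """
    adj = {}
    total_genes = 0
    for chrom in genome:
        total_genes += len(chrom)
        for i, gene in enumerate(chrom):
            nxt = chrom[(i + 1) % len(chrom)]  # wrap around: chromosome is circular
            right = extremities(gene)[1]
            left = extremities(nxt)[0]
            adj[right] = left
            adj[left] = right
    if len(adj) != 2 * total_genes:
        # Colliding extremities mean some gene appeared more than once.
        raise ValueError("duplicate genes are not supported by this sketch")
    return adj


def dcj_distance_circular(genome_a, genome_b):
    """DCJ distance n - c for circular genomes with identical, duplicate-free gene content."""
    adj_a = adjacency_map(genome_a)
    adj_b = adjacency_map(genome_b)
    if adj_a.keys() != adj_b.keys():
        raise ValueError("genomes must have the same gene content")
    n_genes = len(adj_a) // 2
    seen = set()
    cycles = 0
    for start in adj_a:
        if start in seen:
            continue
        cycles += 1
        x = start
        # Walk the cycle, alternating an adjacency of A with an adjacency of B.
        while x not in seen:
            seen.add(x)
            y = adj_a[x]
            seen.add(y)
            x = adj_b[y]
    return n_genes - cycles


if __name__ == "__main__":
    # One inversion of gene 2 separates these two circular genomes.
    print(dcj_distance_circular([[1, 2, 3]], [[1, -2, 3]]))
```

Each extremity is visited once, so the running time is linear in the number of genes. With duplicate genes, one must also choose which copies correspond to each other before this count applies, and that choice is the source of the NP-hardness the abstract refers to.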

Original language: English (US)
Title of host publication: Research in Computational Molecular Biology - 18th Annual International Conference, RECOMB 2014, Proceedings
Publisher: Springer Verlag
Pages: 280-292
Number of pages: 13
ISBN (Print): 9783319052687
DOIs: https://doi.org/10.1007/978-3-319-05269-4_22
State: Published - Jan 1 2014
Event: 18th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2014 - Pittsburgh, PA, United States
Duration: Apr 2 2014 → Apr 5 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8394 LNBI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 18th Annual International Conference on Research in Computational Molecular Biology, RECOMB 2014
Country: United States
City: Pittsburgh, PA
Period: 4/2/14 → 4/5/14

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Shao, M., Lin, Y., & Moret, B. (2014). An exact algorithm to compute the DCJ distance for genomes with duplicate genes. In Research in Computational Molecular Biology - 18th Annual International Conference, RECOMB 2014, Proceedings (pp. 280-292). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8394 LNBI). Springer Verlag. https://doi.org/10.1007/978-3-319-05269-4_22