An exact algorithm to compute the double-cut-and-join distance for genomes with duplicate genes

Mingfu Shao, Yu Lin, Bernard M.E. Moret

Research output: Contribution to journal › Article

9 Citations (Scopus)

Abstract

Computing the edit distance between two genomes is a basic problem in the study of genome evolution. The double-cut-and-join (DCJ) model has formed the basis for most algorithmic research on rearrangements over the last few years. The edit distance under the DCJ model can be computed in linear time for genomes without duplicate genes, while the problem becomes NP-hard in the presence of duplicate genes. In this article, we propose an integer linear programming (ILP) formulation to compute the DCJ distance between two genomes with duplicate genes. We also provide an efficient preprocessing approach to simplify the ILP formulation while preserving optimality. Comparison on simulated genomes demonstrates that our method outperforms MSOAR in computing the edit distance, especially when the genomes contain long duplicated segments. We also apply our method to assign orthologous gene pairs among human, mouse, and rat genomes, where once again our method outperforms MSOAR.
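
The linear-time case mentioned in the abstract (genomes without duplicate genes) can be illustrated with a short sketch. This is not the ILP formulation proposed in the article. It assumes, for simplicity, that both genomes consist only of circular chromosomes over the same set of genes, each gene occurring exactly once, in which case the standard result is that the DCJ distance equals n - c, where n is the number of genes and c is the number of cycles in the adjacency graph. Linear chromosomes would also require counting paths and are not handled here. The function names and the signed-integer gene encoding are illustrative choices, not taken from the article.

```python
def extremities(gene):
    """Return (left, right) extremities of a signed gene as read along the chromosome."""
    g = abs(gene)
    # A gene read forward shows its tail first; read in reverse, its head first.
    return ((g, "t"), (g, "h")) if gene > 0 else ((g, "h"), (g, "t"))


def adjacency_partners(genome):
    """Map each gene extremity to the extremity it is adjacent to.

    `genome` is a list of circular chromosomes, each a list of signed integers.
    Assumption: every gene occurs exactly once (no duplicates).
    """
    partner = {}
    total_genes = 0
    for chrom in genome:
        total_genes += len(chrom)
        for i, gene in enumerate(chrom):
            nxt = chrom[(i + 1) % len(chrom)]  # wrap around: chromosome is circular
            right = extremities(gene)[1]
            left = extremities(nxt)[0]
            partner[right] = left
            partner[left] = right
    if len(partner) != 2 * total_genes:
        raise ValueError("each gene must occur exactly once")
    return partner


def dcj_distance_circular(genome_a, genome_b):
    """DCJ distance n - c for two circular genomes with identical, unique genes."""
    pa = adjacency_partners(genome_a)
    pb = adjacency_partners(genome_b)
    if pa.keys() != pb.keys():
        raise ValueError("genomes must have the same gene content")
    n = len(pa) // 2
    seen = set()
    cycles = 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        x = start
        # Alternate between an adjacency of A and an adjacency of B
        # until the walk returns to its starting extremity.
        while x not in seen:
            seen.add(x)
            y = pa[x]
            seen.add(y)
            x = pb[y]
    return n - cycles


if __name__ == "__main__":
    print(dcj_distance_circular([[1, 2, 3]], [[1, -2, 3]]))
```

Each extremity is visited once, so the running time is linear in the number of genes. With duplicate genes, the correspondence between gene copies is no longer fixed, which is the source of the hardness that the article's ILP addresses.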

Original language: English (US)
Pages (from-to): 425-435
Number of pages: 11
Journal: Journal of Computational Biology
Volume: 22
Issue number: 5
DOI: 10.1089/cmb.2014.0096
State: Published - May 1 2015

Fingerprint

Duplicate Genes
Exact Algorithms
Join
Genome
Genes
Gene
Edit Distance
Linear Programming
Integer Linear Programming
Formulation
Computing
NP-hard Problems
Rearrangement
Preprocessing
Assign
Linear Time
Mouse
Optimality
Simplify

All Science Journal Classification (ASJC) codes

  • Modeling and Simulation
  • Molecular Biology
  • Genetics
  • Computational Mathematics
  • Computational Theory and Mathematics

Cite this

@article{8c026e92ec8d4ceaaef9a58c0237b02e,
title = "An exact algorithm to compute the double-cut-and-join distance for genomes with duplicate genes",
author = "Mingfu Shao and Yu Lin and Moret, {Bernard M.E.}",
year = "2015",
month = "5",
day = "1",
doi = "10.1089/cmb.2014.0096",
language = "English (US)",
volume = "22",
pages = "425--435",
journal = "Journal of Computational Biology",
issn = "1066-5277",
publisher = "Mary Ann Liebert Inc.",
number = "5"
}

An exact algorithm to compute the double-cut-and-join distance for genomes with duplicate genes. / Shao, Mingfu; Lin, Yu; Moret, Bernard M.E.

In: Journal of Computational Biology, Vol. 22, No. 5, 01.05.2015, p. 425-435.

TY  - JOUR
T1  - An exact algorithm to compute the double-cut-and-join distance for genomes with duplicate genes
AU  - Shao, Mingfu
AU  - Lin, Yu
AU  - Moret, Bernard M.E.
PY  - 2015/5/1
Y1  - 2015/5/1
UR  - http://www.scopus.com/inward/record.url?scp=84929658202&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84929658202&partnerID=8YFLogxK
U2  - 10.1089/cmb.2014.0096
DO  - 10.1089/cmb.2014.0096
M3  - Article
C2  - 25517208
AN  - SCOPUS:84929658202
VL  - 22
SP  - 425
EP  - 435
JO  - Journal of Computational Biology
JF  - Journal of Computational Biology
SN  - 1066-5277
IS  - 5
ER  -