Theory and A heuristic for the minimum path flow decomposition problem

Mingfu Shao, Carl Kingsford

Research output: Contribution to journal › Article

Abstract

Motivated by multiple genome assembly problems and other applications, we study the following minimum path flow decomposition problem: Given a directed acyclic graph G=(V,E) with source s and sink t and a flow f, compute a set of s-t paths P and assign a weight w(p) to each $p \in P$ such that $f(e) = \sum_{p \in P:\, e \in p} w(p)$ for all $e \in E$, and |P| is minimized. We develop some fundamental theory for this problem, upon which we design an efficient heuristic. Specifically, we prove that the gap between the optimal number of paths and a known upper bound is determined by the nontrivial equations within the flow values. This result gives rise to the framework of our heuristic: to iteratively reduce the gap by identifying such equations. We also define an operation on certain independent substructures of the graph, and prove that this operation does not affect optimality but can transform the graph into one with a desired property that facilitates reducing the gap. We apply and test our algorithm on both simulated random instances and perfect splice graph instances, and also compare it with the existing state-of-the-art algorithm for flow decomposition. The results illustrate that our algorithm can achieve very high accuracy on these instances, and also that our algorithm significantly improves on the previous algorithms. An implementation of our algorithm is freely available at https://github.com/Kingsford-Group/catfish.
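To make the problem statement concrete, the following is a minimal Python sketch, not the authors' heuristic (their implementation is the catfish repository linked above). It checks the defining constraint that the path weights sum to f(e) on every edge, and runs a simple greedy-width baseline that repeatedly peels off the s-t path with the largest bottleneck. The function names, the toy graph, and the reading of the "known upper bound" as the classical |E| - |V| + 2 are assumptions made for illustration.

```python
from collections import defaultdict


def check_decomposition(flow, paths):
    """Return True if the weighted paths reproduce the flow on every edge."""
    total = defaultdict(int)
    for path, weight in paths:
        for edge in zip(path, path[1:]):
            total[edge] += weight
    edges = set(flow) | set(total)
    return all(flow.get(e, 0) == total.get(e, 0) for e in edges)


def topological_order(edges):
    """Kahn's algorithm on the DAG given as an iterable of (u, v) edges."""
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))
    queue = sorted(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        u = queue.pop()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return order


def greedy_width(flow, source, sink):
    """Baseline heuristic (an assumption here, not the paper's method):
    repeatedly remove the s-t path with the largest bottleneck flow."""
    order = topological_order(flow)
    residual = dict(flow)
    paths = []
    while True:
        best = {source: float("inf")}  # widest bottleneck reaching each node
        pred = {}
        for u in order:
            if u not in best:
                continue
            for (a, b), r in residual.items():
                if a == u and r > 0:
                    width = min(best[u], r)
                    if width > best.get(b, 0):
                        best[b] = width
                        pred[b] = u
        if sink not in best:
            break
        # Walk predecessors back from the sink to recover the path.
        path = [sink]
        while path[-1] != source:
            path.append(pred[path[-1]])
        path.reverse()
        weight = best[sink]
        for edge in zip(path, path[1:]):
            residual[edge] -= weight
        paths.append((path, weight))
    return paths


# Toy instance (hypothetical): flow conservation holds at nodes a and b.
flow = {("s", "a"): 4, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 3, ("b", "t"): 3}
paths = greedy_width(flow, "s", "t")
num_nodes = len({n for edge in flow for n in edge})
upper_bound = len(flow) - num_nodes + 2  # assumed classical bound |E| - |V| + 2

print(paths)
print(check_decomposition(flow, paths), len(paths), upper_bound)
```

On this toy instance each of the three s-t paths is forced by an edge that only it uses, so no decomposition can use fewer than three paths. The gap the abstract refers to only opens up on instances where the flow values satisfy additional equations.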

Original language: English (US)
Article number: 8126870
Pages (from-to): 658-670
Number of pages: 13
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume: 16
Issue number: 2
DOI: 10.1109/TCBB.2017.2779509
State: Published - Mar 1 2019

Fingerprint

Problem Decomposition
Heuristics
Decomposition
Path
Catfishes
Perfect Graphs
Directed Acyclic Graph
Substructure
Graph in graph theory
Assign
Optimality
High Accuracy
Genome
Genes
Transform
Upper bound
Weights and Measures
Decompose

All Science Journal Classification (ASJC) codes

  • Biotechnology
  • Genetics
  • Applied Mathematics

Cite this

@article{3217cc9c2cde49aeb0c4bcf70398d0d1,
title = "Theory and A heuristic for the minimum path flow decomposition problem",
abstract = "Motivated by multiple genome assembly problems and other applications, we study the following minimum path flow decomposition problem: Given a directed acyclic graph G=(V,E) with source s and sink t and a flow f, compute a set of s-t paths P and assign a weight w(p) to each $p \in P$ such that $f(e) = \sum_{p \in P:\, e \in p} w(p)$ for all $e \in E$, and |P| is minimized. We develop some fundamental theory for this problem, upon which we design an efficient heuristic. Specifically, we prove that the gap between the optimal number of paths and a known upper bound is determined by the nontrivial equations within the flow values. This result gives rise to the framework of our heuristic: to iteratively reduce the gap by identifying such equations. We also define an operation on certain independent substructures of the graph, and prove that this operation does not affect optimality but can transform the graph into one with a desired property that facilitates reducing the gap. We apply and test our algorithm on both simulated random instances and perfect splice graph instances, and also compare it with the existing state-of-the-art algorithm for flow decomposition. The results illustrate that our algorithm can achieve very high accuracy on these instances, and also that our algorithm significantly improves on the previous algorithms. An implementation of our algorithm is freely available at https://github.com/Kingsford-Group/catfish.",
author = "Mingfu Shao and Carl Kingsford",
year = "2019",
month = "3",
day = "1",
doi = "10.1109/TCBB.2017.2779509",
language = "English (US)",
volume = "16",
pages = "658--670",
journal = "IEEE/ACM Transactions on Computational Biology and Bioinformatics",
issn = "1545-5963",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "2",

}

Theory and A heuristic for the minimum path flow decomposition problem. / Shao, Mingfu; Kingsford, Carl.

In: IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 16, No. 2, 8126870, 01.03.2019, p. 658-670.

TY - JOUR

T1 - Theory and A heuristic for the minimum path flow decomposition problem

AU - Shao, Mingfu

AU - Kingsford, Carl

PY - 2019/3/1

Y1 - 2019/3/1

AB - Motivated by multiple genome assembly problems and other applications, we study the following minimum path flow decomposition problem: Given a directed acyclic graph G=(V,E) with source s and sink t and a flow f, compute a set of s-t paths P and assign a weight w(p) to each $p \in P$ such that $f(e) = \sum_{p \in P:\, e \in p} w(p)$ for all $e \in E$, and |P| is minimized. We develop some fundamental theory for this problem, upon which we design an efficient heuristic. Specifically, we prove that the gap between the optimal number of paths and a known upper bound is determined by the nontrivial equations within the flow values. This result gives rise to the framework of our heuristic: to iteratively reduce the gap by identifying such equations. We also define an operation on certain independent substructures of the graph, and prove that this operation does not affect optimality but can transform the graph into one with a desired property that facilitates reducing the gap. We apply and test our algorithm on both simulated random instances and perfect splice graph instances, and also compare it with the existing state-of-the-art algorithm for flow decomposition. The results illustrate that our algorithm can achieve very high accuracy on these instances, and also that our algorithm significantly improves on the previous algorithms. An implementation of our algorithm is freely available at https://github.com/Kingsford-Group/catfish.

UR - http://www.scopus.com/inward/record.url?scp=85037588476&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85037588476&partnerID=8YFLogxK

U2 - 10.1109/TCBB.2017.2779509

DO - 10.1109/TCBB.2017.2779509

M3 - Article

C2 - 29990201

AN - SCOPUS:85037588476

VL - 16

SP - 658

EP - 670

JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics

JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics

SN - 1545-5963

IS - 2

M1 - 8126870

ER -