TY - JOUR
T1 - Theory and a heuristic for the minimum path flow decomposition problem
AU - Shao, Mingfu
AU - Kingsford, Carl
N1 - Funding Information:
The authors thank Meiyue Shao and Guillaume Marçais for their helpful discussions and suggestions. This research is funded in part by the Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative through Grant GBMF4554 to Carl Kingsford, by the US National Science Foundation (CCF-1256087, CCF-1319998) and by the US National Institutes of Health (R21HG006913, R01HG007104). C.K. received support as an Alfred P. Sloan Research Fellow.
Publisher Copyright:
© 2017 IEEE.
PY - 2019/3/1
Y1 - 2019/3/1
N2 - Motivated by multiple genome assembly problems and other applications, we study the following minimum path flow decomposition problem: Given a directed acyclic graph G=(V,E) with source s and sink t and a flow f, compute a set of s-t paths P and assign a weight w(p) to each $p \in P$ such that $f(e) = \sum_{p \in P : e \in p} w(p)$ for every $e \in E$, and |P| is minimized. We develop some fundamental theory for this problem, upon which we design an efficient heuristic. Specifically, we prove that the gap between the optimal number of paths and a known upper bound is determined by the nontrivial equations within the flow values. This result gives rise to the framework of our heuristic: to iteratively reduce the gap by identifying such equations. We also define an operation on certain independent substructures of the graph, and prove that this operation does not affect optimality but can transform the graph into one with a desired property that facilitates reducing the gap. We apply and test our algorithm on both simulated random instances and perfect splice graph instances, and also compare it with the existing state-of-the-art algorithm for flow decomposition. The results illustrate that our algorithm can achieve very high accuracy on these instances, and also that our algorithm significantly improves on the previous algorithms. An implementation of our algorithm is freely available at https://github.com/Kingsford-Group/catfish.
AB - Motivated by multiple genome assembly problems and other applications, we study the following minimum path flow decomposition problem: Given a directed acyclic graph G=(V,E) with source s and sink t and a flow f, compute a set of s-t paths P and assign a weight w(p) to each $p \in P$ such that $f(e) = \sum_{p \in P : e \in p} w(p)$ for every $e \in E$, and |P| is minimized. We develop some fundamental theory for this problem, upon which we design an efficient heuristic. Specifically, we prove that the gap between the optimal number of paths and a known upper bound is determined by the nontrivial equations within the flow values. This result gives rise to the framework of our heuristic: to iteratively reduce the gap by identifying such equations. We also define an operation on certain independent substructures of the graph, and prove that this operation does not affect optimality but can transform the graph into one with a desired property that facilitates reducing the gap. We apply and test our algorithm on both simulated random instances and perfect splice graph instances, and also compare it with the existing state-of-the-art algorithm for flow decomposition. The results illustrate that our algorithm can achieve very high accuracy on these instances, and also that our algorithm significantly improves on the previous algorithms. An implementation of our algorithm is freely available at https://github.com/Kingsford-Group/catfish.
UR - http://www.scopus.com/inward/record.url?scp=85037588476&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85037588476&partnerID=8YFLogxK
U2 - 10.1109/TCBB.2017.2779509
DO - 10.1109/TCBB.2017.2779509
M3 - Article
C2 - 29990201
AN - SCOPUS:85037588476
VL - 16
SP - 658
EP - 670
JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics
SN - 1545-5963
IS - 2
M1 - 8126870
ER -