TY - JOUR

T1 - Mining dual networks

T2 - Models, algorithms, and applications

AU - Wu, Yubao

AU - Zhu, Xiaofeng

AU - Li, Li

AU - Fan, Wei

AU - Jin, Ruoming

AU - Zhang, Xiang

N1 - Funding Information:
This work was partially supported by the National Science Foundation grants IIS-1218036, IIS-1162374, IIS-0953950, the National Basic Research Program of China (No. 2014CB340401), the NIH grant R01 HG003054, the NIH/NIGMS grant R01 GM103309, and the OSC (Ohio Supercomputer Center) grant PGS0218.

PY - 2016/5

Y1 - 2016/5

N2 - Finding the densest subgraph in a single graph is a fundamental problem that has been extensively studied. In many emerging applications, there exist dual networks. For example, in genetics, it is important to use protein interactions to interpret genetic interactions. In this application, one network represents physical interactions among nodes, for example, protein-protein interactions, and another network represents conceptual interactions, for example, genetic interactions. Edges in the conceptual network are usually derived from a correlation measure or statistical test that measures the strength of the interaction. Two nodes with strong conceptual interaction may not have direct physical interaction. In this article, we propose a novel dual-network model and investigate the problem of finding the densest connected subgraph (DCS), which has the largest density in the conceptual network and is also connected in the physical network. Density in the conceptual network represents the average strength of the measured interacting signals among the set of nodes. Connectivity in the physical network shows how they interact physically. Such patterns cannot be identified using the existing algorithms for a single network. We show that even though finding the densest subgraph in a single network is polynomial-time solvable, the DCS problem is NP-hard. We develop a two-step approach to solve the DCS problem. In the first step, we effectively prune the dual networks while guaranteeing that the optimal solution is contained in the remaining networks. For the second step, we develop two efficient greedy methods based on different search strategies to find the DCS. Different variations of the DCS problem are also studied. We perform extensive experiments on a variety of real and synthetic dual networks to evaluate the effectiveness and efficiency of the developed methods.

AB - Finding the densest subgraph in a single graph is a fundamental problem that has been extensively studied. In many emerging applications, there exist dual networks. For example, in genetics, it is important to use protein interactions to interpret genetic interactions. In this application, one network represents physical interactions among nodes, for example, protein-protein interactions, and another network represents conceptual interactions, for example, genetic interactions. Edges in the conceptual network are usually derived from a correlation measure or statistical test that measures the strength of the interaction. Two nodes with strong conceptual interaction may not have direct physical interaction. In this article, we propose a novel dual-network model and investigate the problem of finding the densest connected subgraph (DCS), which has the largest density in the conceptual network and is also connected in the physical network. Density in the conceptual network represents the average strength of the measured interacting signals among the set of nodes. Connectivity in the physical network shows how they interact physically. Such patterns cannot be identified using the existing algorithms for a single network. We show that even though finding the densest subgraph in a single network is polynomial-time solvable, the DCS problem is NP-hard. We develop a two-step approach to solve the DCS problem. In the first step, we effectively prune the dual networks while guaranteeing that the optimal solution is contained in the remaining networks. For the second step, we develop two efficient greedy methods based on different search strategies to find the DCS. Different variations of the DCS problem are also studied. We perform extensive experiments on a variety of real and synthetic dual networks to evaluate the effectiveness and efficiency of the developed methods.

UR - http://www.scopus.com/inward/record.url?scp=84973482184&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84973482184&partnerID=8YFLogxK

U2 - 10.1145/2785970

DO - 10.1145/2785970

M3 - Article

AN - SCOPUS:84973482184

VL - 10

JO - ACM Transactions on Knowledge Discovery from Data

JF - ACM Transactions on Knowledge Discovery from Data

SN - 1556-4681

IS - 4

M1 - 40

ER -