TY - JOUR
T1 - Remember where you came from
T2 - 43rd International Conference on Very Large Data Bases, VLDB 2017
AU - Wu, Yubao
AU - Bian, Yuchen
AU - Zhang, Xiang
N1 - Funding Information:
This work was partially supported by the National Science Foundation grants IIS-1162374, CAREER, and the NIH grant R01GM115833.
Publisher Copyright:
© 2016. VLDB Endowment.
PY - 2016
Y1 - 2016
N2 - Measuring the proximity between different nodes is a fundamental problem in graph analysis. Random walk based proximity measures have been shown to be effective and widely used. Most existing random walk measures are based on the first-order Markov model, i.e., they assume that the next step of the random surfer only depends on the current node. However, this assumption neither holds in many real-life applications nor captures the clustering structure in the graph. To address the limitation of the existing first-order measures, in this paper, we study the second-order random walk measures, which take the previously visited node into consideration. While the existing first-order measures are built on node-to-node transition probabilities, in the second-order random walk, we need to consider the edge-to-edge transition probabilities. Using incidence matrices, we develop simple and elegant matrix representations for the second-order proximity measures. A desirable property of the developed measures is that they degenerate to their original first-order forms when the effect of the previous step is zero. We further develop Monte Carlo methods to efficiently compute the second-order measures and provide theoretical performance guarantees. Experimental results show that in a variety of applications, the second-order measures can dramatically improve the performance compared to their first-order counterparts.
AB - Measuring the proximity between different nodes is a fundamental problem in graph analysis. Random walk based proximity measures have been shown to be effective and widely used. Most existing random walk measures are based on the first-order Markov model, i.e., they assume that the next step of the random surfer only depends on the current node. However, this assumption neither holds in many real-life applications nor captures the clustering structure in the graph. To address the limitation of the existing first-order measures, in this paper, we study the second-order random walk measures, which take the previously visited node into consideration. While the existing first-order measures are built on node-to-node transition probabilities, in the second-order random walk, we need to consider the edge-to-edge transition probabilities. Using incidence matrices, we develop simple and elegant matrix representations for the second-order proximity measures. A desirable property of the developed measures is that they degenerate to their original first-order forms when the effect of the previous step is zero. We further develop Monte Carlo methods to efficiently compute the second-order measures and provide theoretical performance guarantees. Experimental results show that in a variety of applications, the second-order measures can dramatically improve the performance compared to their first-order counterparts.
UR - http://www.scopus.com/inward/record.url?scp=85020385238&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85020385238&partnerID=8YFLogxK
U2 - 10.14778/3015270.3015272
DO - 10.14778/3015270.3015272
M3 - Conference article
AN - SCOPUS:85020385238
SN - 2150-8097
VL - 10
SP - 13
EP - 24
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 1
Y2 - 28 August 2017 through 1 September 2017
ER -