Fast inference for the latent space network model using a case-control approximate likelihood

Adrian E. Raftery, Xiaoyue Niu, Peter D. Hoff, Ka Yee Yeung

Research output: Contribution to journal › Article

37 Citations (Scopus)

Abstract

Network models are widely used in social sciences and genome sciences. The latent space model proposed by Hoff et al. (2002), and extended by Handcock et al. (2007) to incorporate clustering, provides a visually interpretable model-based spatial representation of relational data and takes account of several intrinsic network properties. Due to the structure of the likelihood function of the latent space model, the computational cost is of order O(N^2), where N is the number of nodes. This makes it infeasible for large networks. In this article, we propose an approximation of the log-likelihood function. We adapt the case-control idea from epidemiology and construct a case-control log-likelihood, which is an unbiased estimator of the full log-likelihood. Replacing the full likelihood by the case-control likelihood in the Markov chain Monte Carlo estimation of the latent space model reduces the computational time from O(N^2) to O(N), making it feasible for large networks. We evaluate its performance using simulated and real data. We fit the model to a large protein-protein interaction dataset using the case-control likelihood and use the model-fitted link probabilities to identify false positive links. Supplemental materials are available online.
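The case-control idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a logistic latent distance model, logit P(y_ij = 1) = beta - ||z_i - z_j||, and draws controls (non-links) uniformly at random without replacement, which may differ from the sampling scheme used in the article. All function names (`dyad_terms`, `full_loglik_node`, `case_control_loglik_node`, `simulate_network`) are hypothetical. For one node, every link (case) is kept, a small sample of non-links (controls) is drawn, and the control terms are reweighted by the ratio of non-links to sampled controls, which keeps the estimator unbiased for that node's full log-likelihood.

```python
import numpy as np


def dyad_terms(z, beta, i, js, y):
    """Log-likelihood contributions of dyads (i, j) for j in js.

    Assumed model (illustrative): logit P(y_ij = 1) = beta - ||z_i - z_j||.
    """
    eta = beta - np.linalg.norm(z[js] - z[i], axis=1)
    return y * eta - np.logaddexp(0.0, eta)  # y*eta - log(1 + exp(eta))


def full_loglik_node(z, beta, adj, i):
    """Full log-likelihood of node i: one term per other node, so O(N)
    per node and O(N^2) over all nodes."""
    others = np.delete(np.arange(len(z)), i)
    return dyad_terms(z, beta, i, others, adj[i, others]).sum()


def case_control_loglik_node(z, beta, adj, i, n_controls, rng):
    """Case-control estimate of node i's log-likelihood.

    Keeps all links (cases) and a uniform sample of at most n_controls
    non-links (controls), reweighted by (#non-links / #sampled).
    Requires n_controls >= 1. The dense adjacency lookup here is still
    O(N); a real implementation would keep neighbor lists so that only
    the cases and sampled controls are touched.
    """
    others = np.delete(np.arange(len(z)), i)
    links = others[adj[i, others] == 1]
    nonlinks = others[adj[i, others] == 0]
    value = dyad_terms(z, beta, i, links, 1.0).sum()
    if len(nonlinks) > 0:
        m = min(n_controls, len(nonlinks))
        controls = rng.choice(nonlinks, size=m, replace=False)
        weight = len(nonlinks) / m
        value += weight * dyad_terms(z, beta, i, controls, 0.0).sum()
    return value


def simulate_network(n, beta, rng):
    """Draw 2-D latent positions and a symmetric binary network from the
    assumed model (toy data for the demo only)."""
    z = rng.normal(size=(n, 2))
    dist = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=2)
    prob = 1.0 / (1.0 + np.exp(dist - beta))
    upper = np.triu(rng.random((n, n)) < prob, k=1)
    adj = (upper | upper.T).astype(int)
    return z, adj


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z, adj = simulate_network(200, 0.0, rng)
    full = full_loglik_node(z, 0.0, adj, 0)
    draws = [case_control_loglik_node(z, 0.0, adj, 0, 20, rng)
             for _ in range(2000)]
    print("full log-likelihood for node 0:", full)
    print("mean of case-control estimates:", np.mean(draws))
```

Averaging many case-control estimates should come close to the full value, which is what unbiasedness means here. Inside a Markov chain Monte Carlo sampler only a single estimate per evaluation would be computed, and that is where the saving comes from.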

Original language: English (US)
Pages (from-to): 901-919
Number of pages: 19
Journal: Journal of Computational and Graphical Statistics
Volume: 21
Issue number: 4
DOIs: https://doi.org/10.1080/10618600.2012.679240
State: Published - 2012

Fingerprint

Case-control
Network Model
Likelihood
Likelihood Function
Model
Unbiased estimator
Epidemiology
Protein-protein Interaction
Social Sciences
Markov Chain Monte Carlo
False Positive
Computational Cost
Genome
Inference
Clustering
Model-based
Evaluate
Approximation
Vertex of a graph

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Discrete Mathematics and Combinatorics
  • Statistics, Probability and Uncertainty

Cite this

@article{c16ad913bc384983a7ffda59ed69df55,
title = "Fast inference for the latent space network model using a case-control approximate likelihood",
abstract = "Network models are widely used in social sciences and genome sciences. The latent space model proposed by Hoff et al. (2002), and extended by Handcock et al. (2007) to incorporate clustering, provides a visually interpretable model-based spatial representation of relational data and takes account of several intrinsic network properties. Due to the structure of the likelihood function of the latent space model, the computational cost is of order $O(N^2)$, where $N$ is the number of nodes. This makes it infeasible for large networks. In this article, we propose an approximation of the log-likelihood function. We adapt the case-control idea from epidemiology and construct a case-control log-likelihood, which is an unbiased estimator of the full log-likelihood. Replacing the full likelihood by the case-control likelihood in the Markov chain Monte Carlo estimation of the latent space model reduces the computational time from $O(N^2)$ to $O(N)$, making it feasible for large networks. We evaluate its performance using simulated and real data. We fit the model to a large protein-protein interaction dataset using the case-control likelihood and use the model-fitted link probabilities to identify false positive links. Supplemental materials are available online.",
author = "Raftery, {Adrian E.} and Niu, Xiaoyue and Hoff, {Peter D.} and Yeung, {Ka Yee}",
year = "2012",
doi = "10.1080/10618600.2012.679240",
language = "English (US)",
volume = "21",
pages = "901--919",
journal = "Journal of Computational and Graphical Statistics",
issn = "1061-8600",
publisher = "American Statistical Association",
number = "4",

}

Fast inference for the latent space network model using a case-control approximate likelihood. / Raftery, Adrian E.; Niu, Xiaoyue; Hoff, Peter D.; Yeung, Ka Yee.

In: Journal of Computational and Graphical Statistics, Vol. 21, No. 4, 2012, p. 901-919.

TY - JOUR

T1 - Fast inference for the latent space network model using a case-control approximate likelihood

AU - Raftery, Adrian E.

AU - Niu, Xiaoyue

AU - Hoff, Peter D.

AU - Yeung, Ka Yee

PY - 2012

Y1 - 2012

N2 - Network models are widely used in social sciences and genome sciences. The latent space model proposed by Hoff et al. (2002), and extended by Handcock et al. (2007) to incorporate clustering, provides a visually interpretable model-based spatial representation of relational data and takes account of several intrinsic network properties. Due to the structure of the likelihood function of the latent space model, the computational cost is of order O(N^2), where N is the number of nodes. This makes it infeasible for large networks. In this article, we propose an approximation of the log-likelihood function. We adapt the case-control idea from epidemiology and construct a case-control log-likelihood, which is an unbiased estimator of the full log-likelihood. Replacing the full likelihood by the case-control likelihood in the Markov chain Monte Carlo estimation of the latent space model reduces the computational time from O(N^2) to O(N), making it feasible for large networks. We evaluate its performance using simulated and real data. We fit the model to a large protein-protein interaction dataset using the case-control likelihood and use the model-fitted link probabilities to identify false positive links. Supplemental materials are available online.

AB - Network models are widely used in social sciences and genome sciences. The latent space model proposed by Hoff et al. (2002), and extended by Handcock et al. (2007) to incorporate clustering, provides a visually interpretable model-based spatial representation of relational data and takes account of several intrinsic network properties. Due to the structure of the likelihood function of the latent space model, the computational cost is of order O(N^2), where N is the number of nodes. This makes it infeasible for large networks. In this article, we propose an approximation of the log-likelihood function. We adapt the case-control idea from epidemiology and construct a case-control log-likelihood, which is an unbiased estimator of the full log-likelihood. Replacing the full likelihood by the case-control likelihood in the Markov chain Monte Carlo estimation of the latent space model reduces the computational time from O(N^2) to O(N), making it feasible for large networks. We evaluate its performance using simulated and real data. We fit the model to a large protein-protein interaction dataset using the case-control likelihood and use the model-fitted link probabilities to identify false positive links. Supplemental materials are available online.

UR - http://www.scopus.com/inward/record.url?scp=84866401750&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84866401750&partnerID=8YFLogxK

U2 - 10.1080/10618600.2012.679240

DO - 10.1080/10618600.2012.679240

M3 - Article

C2 - 27570438

AN - SCOPUS:84866401750

VL - 21

SP - 901

EP - 919

JO - Journal of Computational and Graphical Statistics

JF - Journal of Computational and Graphical Statistics

SN - 1061-8600

IS - 4

ER -