Adaptive algorithms for diagnosing large-scale failures in computer networks

Srikar Tati, Bong Jun Ko, Guohong Cao, Ananthram Swami, Thomas F. La Porta

Research output: Contribution to journal (Article)

2 Scopus citations

Abstract

We propose a greedy algorithm, Cluster-MAX-COVERAGE (CMC), to efficiently diagnose large-scale clustered failures. We primarily address the challenge of determining faults from incomplete symptoms. CMC makes novel use of both positive and negative symptoms to quickly output a hypothesis list with few false negatives and false positives. To determine failures with 100 percent accuracy, CMC requires reports from about half as many nodes as existing algorithms. Moreover, CMC achieves this gain significantly faster (sometimes by two orders of magnitude) than an algorithm that matches its accuracy. When fewer positive and negative symptoms are available at a reporting node, CMC performs much better than existing algorithms. We also propose an adaptive algorithm, Adaptive-MAX-COVERAGE (AMC), that performs efficiently during both independent and clustered failures. During a series of failures that includes both independent and clustered failures, AMC reduces the number of false negatives and false positives.
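The abstract does not give the details of CMC, so the following is only a minimal sketch of a generic greedy max-coverage diagnosis step, not the authors' algorithm. It assumes that positive symptoms are paths observed to fail, that negative symptoms are paths observed to work, and that a component lying on any working path cannot have failed. The function name `greedy_max_coverage`, the `explains` mapping, and the sample data are all hypothetical.

```python
def greedy_max_coverage(explains, positive, negative):
    """Return (hypothesis, unexplained) for a set of observed symptoms.

    explains: dict mapping a component name to the set of paths that
              would fail if that component failed (assumed input format).
    positive: set of paths observed to fail.
    negative: set of paths observed to work.
    """
    # Assumption: a component on any working path is healthy, so it is
    # removed from the candidate set before the greedy selection starts.
    candidates = {
        comp: paths & positive
        for comp, paths in explains.items()
        if not (paths & negative)
    }
    uncovered = set(positive)
    hypothesis = []
    while uncovered and candidates:
        # Pick the candidate explaining the most still-unexplained failures;
        # iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(candidates), key=lambda c: len(candidates[c] & uncovered))
        if not candidates[best] & uncovered:
            break  # no remaining candidate explains anything new
        hypothesis.append(best)
        uncovered -= candidates[best]
        del candidates[best]
    return hypothesis, uncovered


# Hypothetical example: four components, five monitored paths.
explains = {
    "A": {"p1", "p2"},
    "B": {"p2", "p3"},
    "C": {"p3", "p4"},
    "D": {"p1", "p5"},
}
hypothesis, unexplained = greedy_max_coverage(
    explains, positive={"p1", "p2", "p3"}, negative={"p4", "p5"}
)
print(hypothesis, unexplained)  # ['A', 'B'] set()
```

In this toy input, the negative symptoms `p4` and `p5` rule out components `C` and `D` before any greedy choice is made, which illustrates why working-path reports can shrink the hypothesis list even when few failure reports are available.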

Original language: English (US)
Article number: 6767126
Pages (from-to): 646-656
Number of pages: 11
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 26
Issue number: 3
Publication status: Published - Mar 1 2015

All Science Journal Classification (ASJC) codes

  • Signal Processing
  • Hardware and Architecture
  • Computational Theory and Mathematics
