Adaptive algorithms for diagnosing large-scale failures in computer networks

Srikar Tati, Bong Jun Ko, Guohong Cao, Ananthram Swami, Thomas F. La Porta

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Citations (Scopus)

Abstract

In this paper, we propose an algorithm to efficiently diagnose large-scale clustered failures. The algorithm, Cluster-MAX-COVERAGE (CMC), is based on a greedy approach. We address the challenge of determining faults with incomplete symptoms. CMC makes novel use of both positive and negative symptoms to quickly output a hypothesis list with a low number of false negatives and false positives. CMC requires reports from about half as many nodes as other existing algorithms to determine failures with 100% accuracy. Moreover, CMC accomplishes this gain significantly faster (sometimes by two orders of magnitude) than an algorithm that matches its accuracy. Furthermore, we propose an adaptive algorithm called Adaptive-MAX-COVERAGE (AMC) that performs efficiently during both kinds of failures, i.e., independent and clustered. During a series of failures that includes both independent and clustered failures, AMC results in a reduced number of false negatives and false positives.
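
The abstract does not give the details of CMC, so the following is only a minimal sketch of the general idea it names: a greedy maximum-coverage diagnosis that uses both failure and non-failure observations. It is not the authors' algorithm. The function name greedy_diagnose, the path/link model, and the rule that every link on a working path is treated as healthy are all assumptions made for illustration.

```python
def greedy_diagnose(paths, failed, working):
    """Greedy max-coverage sketch (illustrative, not the paper's CMC).

    paths   : dict mapping a path id to the set of links it traverses
    failed  : ids of paths observed as failed
    working : ids of paths observed as working
    Returns (hypothesis, unexplained): the chosen links, in order, and
    any failed paths that no remaining candidate link can explain.
    """
    # Assumption: a working path exonerates every link it traverses.
    healthy = set().union(*(paths[p] for p in working))
    # Candidate faults: links on failed paths that were not exonerated.
    candidates = set().union(*(paths[p] for p in failed)) - healthy
    unexplained = set(failed)
    hypothesis = []

    def coverage(link):
        # Number of still-unexplained failed paths that use this link.
        return sum(1 for p in unexplained if link in paths[p])

    while unexplained and candidates:
        # Greedy step: take the link explaining the most failures.
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(candidates), key=coverage)
        if coverage(best) == 0:
            break
        hypothesis.append(best)
        unexplained -= {p for p in unexplained if best in paths[p]}
        candidates.discard(best)
    return hypothesis, unexplained


# Hypothetical four-path topology used only for this example.
paths = {
    "p1": {"a", "b"},
    "p2": {"b", "c"},
    "p3": {"c", "d"},
    "p4": {"a", "d"},
}
print(greedy_diagnose(paths, ["p1"], ["p4"]))  # (['b'], set())
print(greedy_diagnose(paths, ["p1"], []))      # (['a'], set())
```

In this toy example the observation that p4 works rules out link a, so the hypothesis changes from a to b. This shows one way observations of working paths can reduce false positives when only part of the symptom set is available.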

Original language: English (US)
Title of host publication: 2012 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2012
DOIs: https://doi.org/10.1109/DSN.2012.6263917
State: Published - Oct 1 2012
Event: 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2012 - Boston, MA, United States
Duration: Jun 25 2012 → Jun 28 2012

Publication series

Name: Proceedings of the International Conference on Dependable Systems and Networks

Other

Other: 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2012
Country: United States
City: Boston, MA
Period: 6/25/12 → 6/28/12

Fingerprint

Adaptive algorithms
Computer networks

All Science Journal Classification (ASJC) codes

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Cite this

Tati, S., Ko, B. J., Cao, G., Swami, A., & La Porta, T. F. (2012). Adaptive algorithms for diagnosing large-scale failures in computer networks. In 2012 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2012 [6263917] (Proceedings of the International Conference on Dependable Systems and Networks). https://doi.org/10.1109/DSN.2012.6263917
Tati, Srikar ; Ko, Bong Jun ; Cao, Guohong ; Swami, Ananthram ; La Porta, Thomas F. / Adaptive algorithms for diagnosing large-scale failures in computer networks. 2012 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2012. 2012. (Proceedings of the International Conference on Dependable Systems and Networks).
@inproceedings{2dfa491c273e4207bc3613474072b16a,
title = "Adaptive algorithms for diagnosing large-scale failures in computer networks",
abstract = "In this paper, we propose an algorithm to efficiently diagnose large-scale clustered failures. The algorithm, Cluster-MAX-COVERAGE (CMC), is based on a greedy approach. We address the challenge of determining faults with incomplete symptoms. CMC makes novel use of both positive and negative symptoms to quickly output a hypothesis list with a low number of false negatives and false positives. CMC requires reports from about half as many nodes as other existing algorithms to determine failures with 100{\%} accuracy. Moreover, CMC accomplishes this gain significantly faster (sometimes by two orders of magnitude) than an algorithm that matches its accuracy. Furthermore, we propose an adaptive algorithm called Adaptive-MAX-COVERAGE (AMC) that performs efficiently during both kinds of failures, i.e., independent and clustered. During a series of failures that includes both independent and clustered failures, AMC results in a reduced number of false negatives and false positives.",
author = "Srikar Tati and Ko, {Bong Jun} and Guohong Cao and Ananthram Swami and {La Porta}, {Thomas F.}",
year = "2012",
month = "10",
day = "1",
doi = "10.1109/DSN.2012.6263917",
language = "English (US)",
isbn = "9781467316248",
series = "Proceedings of the International Conference on Dependable Systems and Networks",
booktitle = "2012 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2012",

}

Tati, S, Ko, BJ, Cao, G, Swami, A & La Porta, TF 2012, Adaptive algorithms for diagnosing large-scale failures in computer networks. in 2012 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2012., 6263917, Proceedings of the International Conference on Dependable Systems and Networks, 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2012, Boston, MA, United States, 6/25/12. https://doi.org/10.1109/DSN.2012.6263917

Adaptive algorithms for diagnosing large-scale failures in computer networks. / Tati, Srikar; Ko, Bong Jun; Cao, Guohong; Swami, Ananthram; La Porta, Thomas F.

2012 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2012. 2012. 6263917 (Proceedings of the International Conference on Dependable Systems and Networks).

TY - GEN

T1 - Adaptive algorithms for diagnosing large-scale failures in computer networks

AU - Tati, Srikar

AU - Ko, Bong Jun

AU - Cao, Guohong

AU - Swami, Ananthram

AU - La Porta, Thomas F.

PY - 2012/10/1

Y1 - 2012/10/1

N2 - In this paper, we propose an algorithm to efficiently diagnose large-scale clustered failures. The algorithm, Cluster-MAX-COVERAGE (CMC), is based on a greedy approach. We address the challenge of determining faults with incomplete symptoms. CMC makes novel use of both positive and negative symptoms to quickly output a hypothesis list with a low number of false negatives and false positives. CMC requires reports from about half as many nodes as other existing algorithms to determine failures with 100% accuracy. Moreover, CMC accomplishes this gain significantly faster (sometimes by two orders of magnitude) than an algorithm that matches its accuracy. Furthermore, we propose an adaptive algorithm called Adaptive-MAX-COVERAGE (AMC) that performs efficiently during both kinds of failures, i.e., independent and clustered. During a series of failures that includes both independent and clustered failures, AMC results in a reduced number of false negatives and false positives.

AB - In this paper, we propose an algorithm to efficiently diagnose large-scale clustered failures. The algorithm, Cluster-MAX-COVERAGE (CMC), is based on a greedy approach. We address the challenge of determining faults with incomplete symptoms. CMC makes novel use of both positive and negative symptoms to quickly output a hypothesis list with a low number of false negatives and false positives. CMC requires reports from about half as many nodes as other existing algorithms to determine failures with 100% accuracy. Moreover, CMC accomplishes this gain significantly faster (sometimes by two orders of magnitude) than an algorithm that matches its accuracy. Furthermore, we propose an adaptive algorithm called Adaptive-MAX-COVERAGE (AMC) that performs efficiently during both kinds of failures, i.e., independent and clustered. During a series of failures that includes both independent and clustered failures, AMC results in a reduced number of false negatives and false positives.

UR - http://www.scopus.com/inward/record.url?scp=84866677413&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84866677413&partnerID=8YFLogxK

U2 - 10.1109/DSN.2012.6263917

DO - 10.1109/DSN.2012.6263917

M3 - Conference contribution

AN - SCOPUS:84866677413

SN - 9781467316248

T3 - Proceedings of the International Conference on Dependable Systems and Networks

BT - 2012 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2012

ER -

Tati S, Ko BJ, Cao G, Swami A, La Porta TF. Adaptive algorithms for diagnosing large-scale failures in computer networks. In 2012 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2012. 2012. 6263917. (Proceedings of the International Conference on Dependable Systems and Networks). https://doi.org/10.1109/DSN.2012.6263917