Monitoring continuous state violation in datacenters: Exploring the time dimension

Shicong Meng, Ting Wang, Ling Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Monitoring global states of an application deployed over distributed nodes becomes prevalent in today's datacenters. State monitoring requires not only correct monitoring results but also minimum communication cost for efficiency and scalability. Most existing work adopts an instantaneous state monitoring approach, which triggers state alerts whenever a constraint is violated. Such an approach, however, may cause frequent and unnecessary state alerts due to unpredictable monitored value bursts and momentary outliers that are common in large-scale Internet applications. These false alerts may further lead to expensive and problematic counter-measures. To address this issue, we introduce window-based state monitoring in this paper. Window-based state monitoring evaluates whether state violation is continuous within a time window, and thus, gains immunity to short-term value bursts and outliers. Furthermore, we find that exploring the monitoring time window at distributed nodes achieves significant communication savings over instantaneous monitoring. Based on this finding, we develop WISE, a system that efficiently performs WIndow-based StatE monitoring at datacenter-scale. WISE is highlighted with three sets of techniques. First, WISE uses distributed filtering time windows and intelligently avoids global information collecting to achieve communication efficiency, while guaranteeing monitoring correctness at the same time. Second, WISE provides a suite of performance tuning techniques to minimize communication cost based on a sophisticated cost model. Third, WISE also employs a set of novel performance optimization techniques. Extensive experiments over both real world and synthetic traces show that WISE achieves a 50%-90% reduction in communication cost compared with existing instantaneous monitoring approaches and simple alternative schemes.
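The difference between instantaneous and window-based alerting described in the abstract can be illustrated with a minimal Python sketch. This is not the WISE system and does not model its distributed filtering windows or cost model; it only assumes a single stream of already-aggregated global values, a hypothetical threshold, and a hypothetical window length measured in samples.

```python
from typing import List, Sequence


def instantaneous_alerts(values: Sequence[float], threshold: float) -> List[int]:
    """Return every time index at which the value exceeds the threshold."""
    return [t for t, v in enumerate(values) if v > threshold]


def window_alerts(values: Sequence[float], threshold: float, window: int) -> List[int]:
    """Return time indices at which the threshold has been exceeded
    for at least `window` consecutive samples (a continuous violation)."""
    alerts = []
    run = 0  # length of the current run of consecutive violations
    for t, v in enumerate(values):
        run = run + 1 if v > threshold else 0
        if run >= window:
            alerts.append(t)
    return alerts


# Hypothetical trace: one momentary burst at t=1, then a sustained violation.
trace = [3, 9, 3, 9, 9, 9, 9, 2]
print(instantaneous_alerts(trace, threshold=5))     # [1, 3, 4, 5, 6]
print(window_alerts(trace, threshold=5, window=3))  # [5, 6]
```

In this toy trace the isolated burst at index 1 triggers an instantaneous alert but no window-based alert, which is the immunity to short-term bursts that the abstract refers to.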

Original language: English (US)
Title of host publication: 26th IEEE International Conference on Data Engineering, ICDE 2010 - Conference Proceedings
Pages: 968-979
Number of pages: 12
DOI: https://doi.org/10.1109/ICDE.2010.5447923
State: Published - Jun 1 2010
Event: 26th IEEE International Conference on Data Engineering, ICDE 2010 - Long Beach, CA, United States
Duration: Mar 1 2010 to Mar 6 2010

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627

Other

Other: 26th IEEE International Conference on Data Engineering, ICDE 2010
Country: United States
City: Long Beach, CA
Period: 3/1/10 to 3/6/10

Fingerprint

Monitoring
Communication
Costs
Scalability
Tuning
Internet
Experiments

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Information Systems

Cite this

Meng, S., Wang, T., & Liu, L. (2010). Monitoring continuous state violation in datacenters: Exploring the time dimension. In 26th IEEE International Conference on Data Engineering, ICDE 2010 - Conference Proceedings (pp. 968-979). [5447923] (Proceedings - International Conference on Data Engineering). https://doi.org/10.1109/ICDE.2010.5447923
Meng, Shicong ; Wang, Ting ; Liu, Ling. / Monitoring continuous state violation in datacenters : Exploring the time dimension. 26th IEEE International Conference on Data Engineering, ICDE 2010 - Conference Proceedings. 2010. pp. 968-979 (Proceedings - International Conference on Data Engineering).
@inproceedings{2e7b1888b566468995bef8f8574471b6,
title = "Monitoring continuous state violation in datacenters: Exploring the time dimension",
abstract = "Monitoring global states of an application deployed over distributed nodes becomes prevalent in today's datacenters. State monitoring requires not only correct monitoring results but also minimum communication cost for efficiency and scalability. Most existing work adopts an instantaneous state monitoring approach, which triggers state alerts whenever a constraint is violated. Such an approach, however, may cause frequent and unnecessary state alerts due to unpredictable monitored value bursts and momentary outliers that are common in large-scale Internet applications. These false alerts may further lead to expensive and problematic counter-measures. To address this issue, we introduce window-based state monitoring in this paper. Window-based state monitoring evaluates whether state violation is continuous within a time window, and thus, gains immunity to short-term value bursts and outliers. Furthermore, we find that exploring the monitoring time window at distributed nodes achieves significant communication savings over instantaneous monitoring. Based on this finding, we develop WISE, a system that efficiently performs WIndow-based StatE monitoring at datacenter-scale. WISE is highlighted with three sets of techniques. First, WISE uses distributed filtering time windows and intelligently avoids global information collecting to achieve communication efficiency, while guaranteeing monitoring correctness at the same time. Second, WISE provides a suite of performance tuning techniques to minimize communication cost based on a sophisticated cost model. Third, WISE also employs a set of novel performance optimization techniques. Extensive experiments over both real world and synthetic traces show that WISE achieves a 50{\%}-90{\%} reduction in communication cost compared with existing instantaneous monitoring approaches and simple alternative schemes.",
author = "Shicong Meng and Ting Wang and Ling Liu",
year = "2010",
month = "6",
day = "1",
doi = "10.1109/ICDE.2010.5447923",
language = "English (US)",
isbn = "9781424454440",
series = "Proceedings - International Conference on Data Engineering",
pages = "968--979",
booktitle = "26th IEEE International Conference on Data Engineering, ICDE 2010 - Conference Proceedings",

}

Meng, S, Wang, T & Liu, L 2010, Monitoring continuous state violation in datacenters: Exploring the time dimension. in 26th IEEE International Conference on Data Engineering, ICDE 2010 - Conference Proceedings., 5447923, Proceedings - International Conference on Data Engineering, pp. 968-979, 26th IEEE International Conference on Data Engineering, ICDE 2010, Long Beach, CA, United States, 3/1/10. https://doi.org/10.1109/ICDE.2010.5447923

Monitoring continuous state violation in datacenters : Exploring the time dimension. / Meng, Shicong; Wang, Ting; Liu, Ling.

26th IEEE International Conference on Data Engineering, ICDE 2010 - Conference Proceedings. 2010. p. 968-979 5447923 (Proceedings - International Conference on Data Engineering).

TY - GEN

T1 - Monitoring continuous state violation in datacenters

T2 - Exploring the time dimension

AU - Meng, Shicong

AU - Wang, Ting

AU - Liu, Ling

PY - 2010/6/1

Y1 - 2010/6/1

N2 - Monitoring global states of an application deployed over distributed nodes becomes prevalent in today's datacenters. State monitoring requires not only correct monitoring results but also minimum communication cost for efficiency and scalability. Most existing work adopts an instantaneous state monitoring approach, which triggers state alerts whenever a constraint is violated. Such an approach, however, may cause frequent and unnecessary state alerts due to unpredictable monitored value bursts and momentary outliers that are common in large-scale Internet applications. These false alerts may further lead to expensive and problematic counter-measures. To address this issue, we introduce window-based state monitoring in this paper. Window-based state monitoring evaluates whether state violation is continuous within a time window, and thus, gains immunity to short-term value bursts and outliers. Furthermore, we find that exploring the monitoring time window at distributed nodes achieves significant communication savings over instantaneous monitoring. Based on this finding, we develop WISE, a system that efficiently performs WIndow-based StatE monitoring at datacenter-scale. WISE is highlighted with three sets of techniques. First, WISE uses distributed filtering time windows and intelligently avoids global information collecting to achieve communication efficiency, while guaranteeing monitoring correctness at the same time. Second, WISE provides a suite of performance tuning techniques to minimize communication cost based on a sophisticated cost model. Third, WISE also employs a set of novel performance optimization techniques. Extensive experiments over both real world and synthetic traces show that WISE achieves a 50%-90% reduction in communication cost compared with existing instantaneous monitoring approaches and simple alternative schemes.

UR - http://www.scopus.com/inward/record.url?scp=77952771893&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77952771893&partnerID=8YFLogxK

U2 - 10.1109/ICDE.2010.5447923

DO - 10.1109/ICDE.2010.5447923

M3 - Conference contribution

AN - SCOPUS:77952771893

SN - 9781424454440

T3 - Proceedings - International Conference on Data Engineering

SP - 968

EP - 979

BT - 26th IEEE International Conference on Data Engineering, ICDE 2010 - Conference Proceedings

ER -

Meng S, Wang T, Liu L. Monitoring continuous state violation in datacenters: Exploring the time dimension. In 26th IEEE International Conference on Data Engineering, ICDE 2010 - Conference Proceedings. 2010. p. 968-979. 5447923. (Proceedings - International Conference on Data Engineering). https://doi.org/10.1109/ICDE.2010.5447923