Ranking causal anomalies for system fault diagnosis via temporal and dynamical analysis on vanishing correlations

Wei Cheng, Jingchao Ni, Kai Zhang, Haifeng Chen, Guofei Jiang, Yu Shi, Xiang Zhang, Wei Wang

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Detecting system anomalies is an important problem in many fields such as security, fault management, and industrial optimization. Recently, invariant networks have been shown to be powerful in characterizing complex system behaviours. In an invariant network, a node represents a system component and an edge indicates a stable, significant interaction between two components. The structure and evolution of the invariant network, in particular the vanishing correlations, can shed important light on locating causal anomalies and performing diagnosis. However, existing approaches to detecting causal anomalies with the invariant network often use the percentage of vanishing correlations to rank possible causal components, which has several limitations: (1) fault propagation in the network is ignored, (2) the root causal anomalies may not always be the nodes with a high percentage of vanishing correlations, (3) temporal patterns of vanishing correlations are not exploited for robust detection, and (4) prior knowledge on anomalous nodes is not exploited for (semi-)supervised detection. To address these limitations, in this article we propose a network-diffusion-based framework to identify significant causal anomalies and rank them. Our approach can effectively model fault propagation over the entire invariant network and can perform joint inference on both the structural and the time-evolving broken invariance patterns. As a result, it can locate high-confidence anomalies that are truly responsible for the vanishing correlations and can compensate for unstructured measurement noise in the system. Moreover, when prior knowledge on the anomalous status of some nodes is available at certain time points, our approach is able to leverage it to further enhance the anomaly inference accuracy. When the prior knowledge is noisy, our approach also automatically learns reliable information and reduces the impact of the noise. By performing extensive experiments on synthetic datasets, bank information system datasets, and coal plant cyber-physical system datasets, we demonstrate the effectiveness of our approach.
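
The contrast drawn in the abstract, between ranking nodes by their percentage of vanishing correlations and letting evidence diffuse over the invariant network, can be illustrated with a small sketch. This is not the authors' algorithm: the diffusion step is a generic random walk with restart on a symmetrically normalised adjacency matrix, used here only as a stand-in. The function names, the toy four-node network, and the `restart` value are all assumptions made for illustration.

```python
import numpy as np


def broken_fraction_scores(invariant, broken):
    """Baseline: fraction of each node's invariant edges that have vanished."""
    invariant = np.asarray(invariant, dtype=float)
    broken = np.asarray(broken, dtype=float)
    degree = invariant.sum(axis=1)
    broken_degree = broken.sum(axis=1)
    # Nodes with no invariant edges get a score of 0 instead of a division error.
    return np.divide(broken_degree, degree,
                     out=np.zeros_like(degree), where=degree > 0)


def diffuse_scores(invariant, seed, restart=0.5, iters=1000, tol=1e-12):
    """Generic random walk with restart (an assumed stand-in, not the paper's model).

    Iterates r <- (1 - restart) * P r + restart * seed, where
    P = D^{-1/2} A D^{-1/2} is the symmetrically normalised adjacency matrix.
    """
    invariant = np.asarray(invariant, dtype=float)
    seed = np.asarray(seed, dtype=float)
    degree = invariant.sum(axis=1)
    inv_sqrt = np.zeros_like(degree)
    connected = degree > 0
    inv_sqrt[connected] = 1.0 / np.sqrt(degree[connected])
    P = inv_sqrt[:, None] * invariant * inv_sqrt[None, :]
    r = seed.copy()
    for _ in range(iters):
        nxt = (1.0 - restart) * (P @ r) + restart * seed
        if np.abs(nxt - r).max() < tol:
            return nxt
        r = nxt
    return r


# Hypothetical toy invariant network with edges 0-1, 1-2, 2-3, 1-3.
invariant = np.array([[0, 1, 0, 0],
                      [1, 0, 1, 1],
                      [0, 1, 0, 1],
                      [0, 1, 1, 0]], dtype=float)
# Vanishing correlations (broken invariants): edges 0-1 and 1-2.
broken = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 0],
                   [0, 0, 0, 0]], dtype=float)

baseline = broken_fraction_scores(invariant, broken)
diffused = diffuse_scores(invariant, seed=baseline)
print("baseline ranking:", np.argsort(-baseline))
print("diffused ranking:", np.argsort(-diffused))
```

In this toy network the baseline puts node 0 first because its only invariant is broken, even though node 1 touches both broken edges. The diffusion step lets scores flow between neighbours, which is the general idea behind modelling fault propagation. Seeding the walk with the baseline scores is itself a design assumption.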

Original language: English (US)
Article number: 3046946
Journal: ACM Transactions on Knowledge Discovery from Data
Volume: 11
Issue number: 4
DOIs: 10.1145/3046946
State: Published - Jun 1 2017

Fingerprint

Invariance
Failure analysis
Large scale systems
Information systems
Coal
Experiments
Cyber Physical System

All Science Journal Classification (ASJC) codes

  • Computer Science(all)

Cite this

Cheng, Wei ; Ni, Jingchao ; Zhang, Kai ; Chen, Haifeng ; Jiang, Guofei ; Shi, Yu ; Zhang, Xiang ; Wang, Wei. / Ranking causal anomalies for system fault diagnosis via temporal and dynamical analysis on vanishing correlations. In: ACM Transactions on Knowledge Discovery from Data. 2017 ; Vol. 11, No. 4.
@article{98c1398629d44af88589a777c7d2a5cd,
title = "Ranking causal anomalies for system fault diagnosis via temporal and dynamical analysis on vanishing correlations",
abstract = "Detecting system anomalies is an important problem in many fields such as security, fault management, and industrial optimization. Recently, invariant networks have been shown to be powerful in characterizing complex system behaviours. In an invariant network, a node represents a system component and an edge indicates a stable, significant interaction between two components. The structure and evolution of the invariant network, in particular the vanishing correlations, can shed important light on locating causal anomalies and performing diagnosis. However, existing approaches to detecting causal anomalies with the invariant network often use the percentage of vanishing correlations to rank possible causal components, which has several limitations: (1) fault propagation in the network is ignored, (2) the root causal anomalies may not always be the nodes with a high percentage of vanishing correlations, (3) temporal patterns of vanishing correlations are not exploited for robust detection, and (4) prior knowledge on anomalous nodes is not exploited for (semi-)supervised detection. To address these limitations, in this article we propose a network-diffusion-based framework to identify significant causal anomalies and rank them. Our approach can effectively model fault propagation over the entire invariant network and can perform joint inference on both the structural and the time-evolving broken invariance patterns. As a result, it can locate high-confidence anomalies that are truly responsible for the vanishing correlations and can compensate for unstructured measurement noise in the system. Moreover, when prior knowledge on the anomalous status of some nodes is available at certain time points, our approach is able to leverage it to further enhance the anomaly inference accuracy. When the prior knowledge is noisy, our approach also automatically learns reliable information and reduces the impact of the noise. By performing extensive experiments on synthetic datasets, bank information system datasets, and coal plant cyber-physical system datasets, we demonstrate the effectiveness of our approach.",
author = "Wei Cheng and Jingchao Ni and Kai Zhang and Haifeng Chen and Guofei Jiang and Yu Shi and Xiang Zhang and Wei Wang",
year = "2017",
month = "6",
day = "1",
doi = "10.1145/3046946",
language = "English (US)",
volume = "11",
journal = "ACM Transactions on Knowledge Discovery from Data",
issn = "1556-4681",
publisher = "Association for Computing Machinery (ACM)",
number = "4",

}

TY - JOUR

T1 - Ranking causal anomalies for system fault diagnosis via temporal and dynamical analysis on vanishing correlations

AU - Cheng, Wei

AU - Ni, Jingchao

AU - Zhang, Kai

AU - Chen, Haifeng

AU - Jiang, Guofei

AU - Shi, Yu

AU - Zhang, Xiang

AU - Wang, Wei

PY - 2017/6/1

Y1 - 2017/6/1

N2 - Detecting system anomalies is an important problem in many fields such as security, fault management, and industrial optimization. Recently, invariant networks have been shown to be powerful in characterizing complex system behaviours. In an invariant network, a node represents a system component and an edge indicates a stable, significant interaction between two components. The structure and evolution of the invariant network, in particular the vanishing correlations, can shed important light on locating causal anomalies and performing diagnosis. However, existing approaches to detecting causal anomalies with the invariant network often use the percentage of vanishing correlations to rank possible causal components, which has several limitations: (1) fault propagation in the network is ignored, (2) the root causal anomalies may not always be the nodes with a high percentage of vanishing correlations, (3) temporal patterns of vanishing correlations are not exploited for robust detection, and (4) prior knowledge on anomalous nodes is not exploited for (semi-)supervised detection. To address these limitations, in this article we propose a network-diffusion-based framework to identify significant causal anomalies and rank them. Our approach can effectively model fault propagation over the entire invariant network and can perform joint inference on both the structural and the time-evolving broken invariance patterns. As a result, it can locate high-confidence anomalies that are truly responsible for the vanishing correlations and can compensate for unstructured measurement noise in the system. Moreover, when prior knowledge on the anomalous status of some nodes is available at certain time points, our approach is able to leverage it to further enhance the anomaly inference accuracy. When the prior knowledge is noisy, our approach also automatically learns reliable information and reduces the impact of the noise. By performing extensive experiments on synthetic datasets, bank information system datasets, and coal plant cyber-physical system datasets, we demonstrate the effectiveness of our approach.

UR - http://www.scopus.com/inward/record.url?scp=85023164652&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85023164652&partnerID=8YFLogxK

U2 - 10.1145/3046946

DO - 10.1145/3046946

M3 - Article

AN - SCOPUS:85023164652

VL - 11

JO - ACM Transactions on Knowledge Discovery from Data

JF - ACM Transactions on Knowledge Discovery from Data

SN - 1556-4681

IS - 4

M1 - 3046946

ER -