Towards provenance-based anomaly detection in MapReduce

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Citations (Scopus)

Abstract

MapReduce enables parallel and distributed processing of vast amounts of data on a cluster of machines. However, this computing paradigm is subject to threats posed by malicious or cheating nodes and by compromised user-submitted code, either of which could tamper with data and computation, since users retain little control while the computation is carried out in a distributed fashion. In this paper, we focus on the analysis and detection of anomalies during MapReduce computation. Accordingly, we develop a computational provenance system that captures provenance data related to MapReduce computation within the MapReduce framework in Hadoop. In particular, we identify a set of invariants against aggregated provenance information, which are later analyzed to uncover anomalies indicating possible tampering with data and computation. We conduct a series of experiments to show the efficiency and effectiveness of our proposed provenance system.
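
The abstract does not list the invariants the authors use, so the following is only a minimal sketch of what checking an invariant against aggregated provenance could look like. Everything in it is an assumption made for illustration: the `TaskProvenance` record schema, the function name `check_record_conservation`, and the invariant itself (with no combiner configured, the records emitted by all map tasks of a job should equal the records consumed by its reduce tasks). It is not the paper's system.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TaskProvenance:
    """One provenance record per finished task (hypothetical schema)."""
    job_id: str
    task_id: str
    phase: str        # "map" or "reduce"
    records_in: int   # records read by the task
    records_out: int  # records emitted by the task


def check_record_conservation(records: List[TaskProvenance]) -> Dict[str, str]:
    """Return {job_id: message} for each job that violates the invariant.

    Assumed invariant (illustrative only): without a combiner, the total
    records emitted by map tasks equals the total consumed by reduce tasks.
    """
    map_out: Dict[str, int] = {}
    reduce_in: Dict[str, int] = {}

    # Aggregate per-task provenance into per-job totals.
    for r in records:
        if r.phase == "map":
            map_out[r.job_id] = map_out.get(r.job_id, 0) + r.records_out
        elif r.phase == "reduce":
            reduce_in[r.job_id] = reduce_in.get(r.job_id, 0) + r.records_in

    # Compare the two totals for every job seen in either phase.
    anomalies: Dict[str, str] = {}
    for job_id in sorted(set(map_out) | set(reduce_in)):
        emitted = map_out.get(job_id, 0)
        consumed = reduce_in.get(job_id, 0)
        if emitted != consumed:
            anomalies[job_id] = f"map emitted {emitted}, reduce consumed {consumed}"
    return anomalies


# Made-up sample data: job_ok balances, job_bad loses 20 records in transit.
sample_log = [
    TaskProvenance("job_ok", "m0", "map", 100, 250),
    TaskProvenance("job_ok", "m1", "map", 100, 150),
    TaskProvenance("job_ok", "r0", "reduce", 400, 40),
    TaskProvenance("job_bad", "m0", "map", 200, 300),
    TaskProvenance("job_bad", "r0", "reduce", 280, 30),
]

print(check_record_conservation(sample_log))
# {'job_bad': 'map emitted 300, reduce consumed 280'}
```

A mismatch flagged this way only indicates that something is inconsistent; deciding whether it reflects tampering, a failed task, or a legitimately configured combiner would need further provenance.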

Original language: English (US)
Title of host publication: Proceedings - 2015 IEEE/ACM 15th International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 647-656
Number of pages: 10
ISBN (Electronic): 9781479980062
DOI: https://doi.org/10.1109/CCGrid.2015.16
State: Published - Jul 7 2015
Event: 15th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015 - Shenzhen, China
Duration: May 4 2015 to May 7 2015

Publication series

Name: Proceedings - 2015 IEEE/ACM 15th International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015

Other

Other: 15th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015
Country: China
City: Shenzhen
Period: 5/4/15 to 5/7/15

Fingerprint

Data acquisition
Processing
Experiments

All Science Journal Classification (ASJC) codes

  • Computer Science (miscellaneous)
  • Computer Networks and Communications
  • Software

Cite this

Liao, C., & Squicciarini, A. (2015). Towards provenance-based anomaly detection in MapReduce. In Proceedings - 2015 IEEE/ACM 15th International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015 (pp. 647-656). [7152530] (Proceedings - 2015 IEEE/ACM 15th International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CCGrid.2015.16
@inproceedings{46b308d64d8c4a2a81cb7e0404f63d2f,
title = "Towards provenance-based anomaly detection in MapReduce",
author = "Cong Liao and Anna Squicciarini",
year = "2015",
month = "7",
day = "7",
doi = "10.1109/CCGrid.2015.16",
language = "English (US)",
series = "Proceedings - 2015 IEEE/ACM 15th International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "647--656",
booktitle = "Proceedings - 2015 IEEE/ACM 15th International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015",
address = "United States",

}

TY - GEN

T1 - Towards provenance-based anomaly detection in MapReduce

AU - Liao, Cong

AU - Squicciarini, Anna

PY - 2015/7/7

Y1 - 2015/7/7

UR - http://www.scopus.com/inward/record.url?scp=84941242786&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84941242786&partnerID=8YFLogxK

U2 - 10.1109/CCGrid.2015.16

DO - 10.1109/CCGrid.2015.16

M3 - Conference contribution

T3 - Proceedings - 2015 IEEE/ACM 15th International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015

SP - 647

EP - 656

BT - Proceedings - 2015 IEEE/ACM 15th International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2015

PB - Institute of Electrical and Electronics Engineers Inc.

ER -
