Process mining on noisy logs - Can log sanitization help to improve performance?

Hsin Jung Cheng, Akhil Kumar

Research output: Contribution to journal, Article

19 Citations (Scopus)

Abstract

Process mining techniques are designed to read process logs and extract process models from them. However, real-world logs are often noisy, and such logs produce bad, spaghetti-like process models. We propose a technique to sanitize noisy logs by first building a classifier on a subset of the log, and then applying the classifier rules to remove noisy traces from the log. The improvement in the quality of the resulting process models is evaluated on synthetic logs from benchmark models of increasing complexity, using both behavioral and structural recall and precision metrics. The results show that mined models produced from such preprocessed logs are superior on several evaluation metrics. They show better fidelity to the reference models, and are also more compact, with fewer elements. A nice feature of the rule-based approach is that it generalizes to any noise pattern, since the nature of noise varies from one log to another. The rules can also be explained and may be further modified manually. We also give results from experiments with a real dataset.
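
The sanitization idea in the abstract (learn rules from a labeled subset of the log, then drop the traces those rules flag) can be illustrated with a minimal sketch. This is not the classifier used in the paper, which the abstract does not specify. It swaps in a toy rule learner that treats a directly-follows activity pair as a noise rule when the pair appears only in traces labeled noisy. The function names, the activity labels, and the hand-labeled subset are all hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Trace = Tuple[str, ...]   # one case: an ordered sequence of activity names
Pair = Tuple[str, str]    # a directly-follows pair (x immediately before y)


def directly_follows(trace: Trace) -> Set[Pair]:
    """Return the set of adjacent activity pairs in a trace."""
    return set(zip(trace, trace[1:]))


def learn_noise_rules(labeled: Iterable[Tuple[Trace, bool]]) -> Set[Pair]:
    """Toy rule learner (an assumption, not the paper's classifier).

    A pair becomes a noise rule if it occurs in at least one trace
    labeled noisy and in no trace labeled clean.
    """
    noisy_pairs: Set[Pair] = set()
    clean_pairs: Set[Pair] = set()
    for trace, is_noisy in labeled:
        target = noisy_pairs if is_noisy else clean_pairs
        target.update(directly_follows(trace))
    return noisy_pairs - clean_pairs


def sanitize(log: Iterable[Trace], rules: Set[Pair]) -> List[Trace]:
    """Keep only the traces that match none of the noise rules."""
    return [t for t in log if not (directly_follows(t) & rules)]


# Hypothetical hand-labeled subset of the log: (trace, is_noisy).
labeled_subset = [
    (("a", "b", "c", "d"), False),
    (("a", "c", "b", "d"), False),
    (("a", "d", "b", "c"), True),   # "d" occurs too early
    (("a", "b", "b", "d"), True),   # "b" is duplicated
]

rules = learn_noise_rules(labeled_subset)

# Hypothetical full log to be sanitized before process discovery.
full_log = [
    ("a", "b", "c", "d"),
    ("a", "c", "b", "d"),
    ("a", "b", "b", "c", "d"),
    ("a", "d", "c", "b"),
]
clean_log = sanitize(full_log, rules)
```

Because each rule is an explicit activity pair, the rule set can be inspected and edited by hand before filtering, which mirrors the explainability point made in the abstract.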

Original language: English (US)
Pages (from-to): 138-149
Number of pages: 12
Journal: Decision Support Systems
Volume: 79
DOI: 10.1016/j.dss.2015.08.003
State: Published - Nov 12 2015

Fingerprint

Noise
Benchmarking
Quality Improvement
Classifiers
Process mining
Process model
Process Model
Classifier
Datasets
Experiments
Evaluation
Fidelity
Experiment
Reference model
Benchmark
Rule-based
Real World

All Science Journal Classification (ASJC) codes

  • Management Information Systems
  • Information Systems
  • Developmental and Educational Psychology
  • Arts and Humanities (miscellaneous)
  • Information Systems and Management

Cite this

@article{f5cfd412a50b4f88bb513a34e1538209,
title = "Process mining on noisy logs - Can log sanitization help to improve performance?",
abstract = "Process mining techniques are designed to read process logs and extract process models from them. However, real world logs are often noisy and such logs produce bad, spaghetti-like process models. We propose a technique to sanitize noisy logs by first building a classifier on a subset of the log, and applying the classifier rules to remove noisy traces from the log. The improvement in the quality of the resulting process models is evaluated on synthetic logs from benchmark models of increasing complexity on both behavioral and structural recall and precision metrics. The results show that mined models produced from such preprocessed logs are superior on several evaluation metrics. They show better fidelity to the reference models, and are also more compact with fewer elements. A nice feature of the rule based approach is that it generalizes to any noise pattern since the nature of noise varies from one log to another. The rules can also be explained and may be further modified manually. We also give results from experiments with a real dataset.",
author = "Cheng, {Hsin Jung} and Kumar, Akhil",
year = "2015",
month = "11",
day = "12",
doi = "10.1016/j.dss.2015.08.003",
language = "English (US)",
volume = "79",
pages = "138--149",
journal = "Decision Support Systems",
issn = "0167-9236",
publisher = "Elsevier",

}

Process mining on noisy logs - Can log sanitization help to improve performance? / Cheng, Hsin Jung; Kumar, Akhil.

In: Decision Support Systems, Vol. 79, 12.11.2015, p. 138-149.

TY - JOUR

T1 - Process mining on noisy logs - Can log sanitization help to improve performance?

AU - Cheng, Hsin Jung

AU - Kumar, Akhil

PY - 2015/11/12

Y1 - 2015/11/12

AB - Process mining techniques are designed to read process logs and extract process models from them. However, real world logs are often noisy and such logs produce bad, spaghetti-like process models. We propose a technique to sanitize noisy logs by first building a classifier on a subset of the log, and applying the classifier rules to remove noisy traces from the log. The improvement in the quality of the resulting process models is evaluated on synthetic logs from benchmark models of increasing complexity on both behavioral and structural recall and precision metrics. The results show that mined models produced from such preprocessed logs are superior on several evaluation metrics. They show better fidelity to the reference models, and are also more compact with fewer elements. A nice feature of the rule based approach is that it generalizes to any noise pattern since the nature of noise varies from one log to another. The rules can also be explained and may be further modified manually. We also give results from experiments with a real dataset.

UR - http://www.scopus.com/inward/record.url?scp=84941280475&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84941280475&partnerID=8YFLogxK

U2 - 10.1016/j.dss.2015.08.003

DO - 10.1016/j.dss.2015.08.003

M3 - Article

AN - SCOPUS:84941280475

VL - 79

SP - 138

EP - 149

JO - Decision Support Systems

JF - Decision Support Systems

SN - 0167-9236

ER -