Process mining on noisy logs - Can log sanitization help to improve performance?

Hsin Jung Cheng, Akhil Kumar

Research output: Contribution to journal › Article

23 Scopus citations

Abstract

Process mining techniques are designed to read process logs and extract process models from them. However, real-world logs are often noisy, and such logs produce poor, spaghetti-like process models. We propose a technique to sanitize noisy logs by first building a classifier on a subset of the log and then applying the classifier's rules to remove noisy traces from the log. The improvement in the quality of the resulting process models is evaluated on synthetic logs generated from benchmark models of increasing complexity, using both behavioral and structural recall and precision metrics. The results show that models mined from such preprocessed logs are superior on several evaluation metrics: they show better fidelity to the reference models and are also more compact, with fewer elements. A useful feature of the rule-based approach is that it generalizes to any noise pattern, which matters because the nature of noise varies from one log to another. The rules can also be explained and may be further modified manually. We also give results from experiments with a real dataset.
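The sanitization step described above (learn rules on a labeled subset, then drop traces that fire them) can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the paper: each trace is a tuple of activity names, a small subset of the log has already been labeled noisy or clean, and traces are described only by their directly-follows pairs. The learner is a deliberately simple one-condition rule miner standing in for whatever classifier the authors used, and the names `df_pairs`, `learn_noise_rules`, `sanitize`, `min_support`, and `min_confidence` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Set, Tuple

Trace = Tuple[str, ...]
Pair = Tuple[str, str]


def df_pairs(trace: Sequence[str]) -> Set[Pair]:
    """Directly-follows pairs (a, b): activity b occurs immediately after a."""
    return set(zip(trace, trace[1:]))


def learn_noise_rules(labeled: Iterable[Tuple[Trace, bool]],
                      min_support: int = 2,
                      min_confidence: float = 0.9) -> List[Pair]:
    """Hypothetical rule learner: 'trace contains pair (a, b) -> noisy'.

    A pair becomes a rule if it occurs in at least `min_support` noisy
    traces and at least `min_confidence` of the traces containing it
    are labeled noisy.
    """
    in_noisy: Counter = Counter()
    in_all: Counter = Counter()
    for trace, is_noisy in labeled:
        for pair in df_pairs(trace):
            in_all[pair] += 1
            if is_noisy:
                in_noisy[pair] += 1
    rules = [pair for pair, n in in_noisy.items()
             if n >= min_support and n / in_all[pair] >= min_confidence]
    return sorted(rules)


def sanitize(log: Iterable[Trace], rules: List[Pair]) -> List[Trace]:
    """Keep only the traces that fire none of the learned noise rules."""
    rule_set = set(rules)
    return [t for t in log if not (df_pairs(t) & rule_set)]


# Toy labeled subset (assumed data, for illustration only).
labeled = [
    (("a", "b", "c", "d"), False),
    (("a", "c", "b", "d"), False),
    (("a", "b", "c", "d"), False),
    (("a", "d", "b", "c"), True),
    (("a", "d", "c", "b", "d"), True),
]
rules = learn_noise_rules(labeled)
print(rules)  # [('a', 'd')]

full_log = [("a", "b", "c", "d"), ("a", "d", "b", "c"),
            ("a", "c", "b", "d"), ("a", "d", "c", "d")]
print(sanitize(full_log, rules))  # [('a', 'b', 'c', 'd'), ('a', 'c', 'b', 'd')]
```

Because each learned rule here is a single directly-follows pair, the rule list can be printed, inspected, and edited by hand before it is applied to the full log. The confidence threshold keeps pairs that are common in clean traces, such as `('b', 'c')`, from being treated as noise just because they also appear in a noisy trace.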

Original language: English (US)
Pages (from-to): 138-149
Number of pages: 12
Journal: Decision Support Systems
Volume: 79
State: Published (Nov 12 2015)


All Science Journal Classification (ASJC) codes

  • Management Information Systems
  • Information Systems
  • Developmental and Educational Psychology
  • Arts and Humanities (miscellaneous)
  • Information Systems and Management
