A multi-dimensional quality assessment of state-of-the-art process discovery algorithms using real-life event logs

2012 ◽  
Vol 37 (7) ◽  
pp. 654-676 ◽  
Author(s):  
Jochen De Weerdt ◽  
Manu De Backer ◽  
Jan Vanthienen ◽  
Bart Baesens


2022 ◽  
Vol 183 (3-4) ◽  
pp. 293-317
Author(s):  
Anna Kalenkova ◽  
Josep Carmona ◽  
Artem Polyvyanyy ◽  
Marcello La Rosa

State-of-the-art process discovery methods construct free-choice process models from event logs. Consequently, the constructed models do not take into account indirect dependencies between events. Whenever the input behaviour is not free-choice, these methods fail to provide a precise model. In this paper, we propose a novel approach for enhancing free-choice process models by adding non-free-choice constructs discovered a posteriori via region-based techniques. This allows us to benefit from the performance of existing process discovery methods and the accuracy of the employed fundamental synthesis techniques. We prove that the proposed approach preserves fitness with respect to the event log while improving precision when indirect dependencies exist. The approach has been implemented and tested on both synthetic and real-life datasets. The results show its effectiveness in repairing models discovered from event logs.
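The indirect dependencies mentioned above can be illustrated with a small self-contained sketch (this is not the paper's region-based synthesis; the function names `directly_follows` and `indirect_dependency` and the toy log are illustrative assumptions):

```python
from collections import defaultdict

def directly_follows(log):
    """Count directly-follows pairs (x, y) over all traces."""
    df = defaultdict(int)
    for trace in log:
        for x, y in zip(trace, trace[1:]):
            df[(x, y)] += 1
    return dict(df)

def indirect_dependency(log, a, b):
    """Fraction of traces containing `a` in which `b` also occurs
    somewhere after it -- a crude signal of a long-distance
    (non-free-choice) dependency."""
    with_a = [t for t in log if a in t]
    if not with_a:
        return 0.0
    hits = sum(1 for t in with_a if b in t[t.index(a) + 1:])
    return hits / len(with_a)

log = [["a", "c", "d"], ["a", "c", "d"], ["b", "c", "e"], ["b", "c", "e"]]
# Directly-follows sees both d and e after c, so a free-choice model
# would also allow the trace a, c, e -- yet the log shows that the
# choice between d and e is fully determined by the earlier a/b choice:
print(indirect_dependency(log, "a", "d"))  # 1.0
print(indirect_dependency(log, "a", "e"))  # 0.0
```

A model that ignores this long-distance dependency remains fitting but loses precision, which is exactly the gap the a-posteriori repair targets.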


Author(s):  
Stephan A. Fahrenkrog-Petersen ◽  
Niek Tax ◽  
Irene Teinemaa ◽  
Marlon Dumas ◽  
Massimiliano de Leoni ◽  
...  

Abstract Predictive process monitoring is a family of techniques to analyze events produced during the execution of a business process in order to predict the future state or the final outcome of running process instances. Existing techniques in this field are able to predict, at each step of a process instance, the likelihood that it will lead to an undesired outcome. These techniques, however, focus on generating predictions and do not prescribe when and how process workers should intervene to decrease the cost of undesired outcomes. This paper proposes a framework for prescriptive process monitoring, which extends predictive monitoring with the ability to generate alarms that trigger interventions to prevent an undesired outcome or mitigate its effect. The framework incorporates a parameterized cost model to assess the cost–benefit trade-off of generating alarms. We show how to optimize the generation of alarms given an event log of past process executions and a set of cost model parameters. The proposed approaches are empirically evaluated using a range of real-life event logs. The experimental results show that the net cost of undesired outcomes can be minimized by changing the threshold for generating alarms, as the process instance progresses. Moreover, introducing delays for triggering alarms, instead of triggering them as soon as the probability of an undesired outcome exceeds a threshold, leads to lower net costs.
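The cost–benefit trade-off behind such an alarm policy can be sketched in a few lines (a minimal illustration, not the paper's parameterized cost model; `net_cost`, the default costs, and the toy cases are assumptions):

```python
def net_cost(cases, threshold, c_in=10.0, c_out=100.0, mitigation=0.8):
    """Total cost of an alarm policy that intervenes whenever the
    predicted probability of an undesired outcome exceeds `threshold`.

    cases: list of (predicted_prob, undesired) pairs.
    c_in: cost of one intervention.
    c_out: cost of an unmitigated undesired outcome.
    mitigation: fraction of c_out avoided by a timely intervention.
    """
    total = 0.0
    for prob, undesired in cases:
        if prob > threshold:
            total += c_in                       # we raise an alarm
            if undesired:
                total += c_out * (1 - mitigation)
        elif undesired:
            total += c_out                      # missed undesired outcome
    return total

cases = [(0.9, True), (0.8, True), (0.7, False), (0.3, False), (0.2, False)]
# Sweeping the threshold reveals the cost-optimal policy for this log;
# both alarming on everything and never alarming are more expensive:
best = min(range(10), key=lambda t: net_cost(cases, t / 10))
print(best / 10)  # 0.7
```

The sweep mirrors the paper's point that the net cost is minimized by tuning the alarm threshold rather than firing as soon as any risk is detected.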


2014 ◽  
Vol 23 (01) ◽  
pp. 1440001 ◽  
Author(s):  
J. C. A. M. Buijs ◽  
B. F. van Dongen ◽  
W. M. P. van der Aalst

Process discovery algorithms typically aim at discovering process models from event logs that best describe the recorded behavior. Often, the quality of a process discovery algorithm is measured by quantifying to what extent the resulting model can reproduce the behavior in the log, i.e., replay fitness. At the same time, there are other measures that compare a model with recorded behavior in terms of the precision of the model and the extent to which the model generalizes the behavior in the log. Furthermore, many measures exist to express the complexity of a model irrespective of the log.

In this paper, we first discuss several quality dimensions related to process discovery. We further show that existing process discovery algorithms typically consider at most two out of the four main quality dimensions: replay fitness, precision, generalization and simplicity. Moreover, existing approaches cannot steer the discovery process based on user-defined weights for the four quality dimensions.

This paper presents the ETM algorithm which allows the user to seamlessly steer the discovery process based on preferences with respect to the four quality dimensions. We show that all dimensions are important for process discovery. However, it only makes sense to consider precision, generalization and simplicity if the replay fitness is acceptable.
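Steering discovery by user-defined weights can be sketched as a single weighted score over the four dimensions (an illustrative sketch, not the actual ETM fitness function; the function name and default weights are assumptions):

```python
def weighted_quality(fitness, precision, generalization, simplicity,
                     weights=(10.0, 1.0, 1.0, 1.0)):
    """Collapse the four discovery quality dimensions (each in [0, 1])
    into one score using user-defined weights. Weighting replay fitness
    heavily reflects the observation that the other dimensions only
    matter once fitness is acceptable."""
    dims = (fitness, precision, generalization, simplicity)
    return sum(w * d for w, d in zip(weights, dims)) / sum(weights)

# A perfectly fitting but imprecise model still outscores a balanced
# model with poor replay fitness under these weights:
print(weighted_quality(1.0, 0.5, 0.8, 0.9) >
      weighted_quality(0.6, 1.0, 1.0, 1.0))  # True
```

In ETM this kind of score guides a search over candidate models, so changing the weights directly changes which trade-off the discovery converges to.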


2019 ◽  
Vol 19 (6) ◽  
pp. 1307-1343
Author(s):  
Ario Santoso ◽  
Michael Felderer

Abstract Predictive analysis in business process monitoring aims at forecasting the future information of a running business process. The prediction is typically made based on a model extracted from historical process execution logs (event logs). In practice, different business domains might require different kinds of predictions. Hence, it is important to have a means for properly specifying the desired prediction tasks, and a mechanism to deal with these various prediction tasks. Although there have been many studies in this area, they mostly focus on a specific prediction task. This work introduces a language for specifying the desired prediction tasks that allows us to express various kinds of them, and presents a mechanism for automatically creating the corresponding prediction model based on the given specification. Unlike previous studies, instead of focusing on a particular prediction task, we present an approach that deals with various prediction tasks based on the given specification. We also provide an implementation of the approach, which is used to conduct experiments using real-life event logs.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Riyanarto Sarno ◽  
Kelly Rossa Sungkono ◽  
Muhammad Taufiqulsa’di ◽  
Hendra Darmawan ◽  
Achmad Fahmi ◽  
...  

Abstract Process discovery helps companies automatically discover their existing business processes from the vast stored event logs. Process discovery algorithms have developed rapidly to discover several types of relations, e.g., choice relations and non-free-choice relations with invisible tasks. Invisible tasks in non-free choice, introduced by the α$ method, are a type of relationship that combines non-free choice and invisible tasks. α$ proposed rules over the ordering relations of two activities for determining invisible tasks in non-free choice. Since the event log records sequences of activities, the rules of α$ check combinations of invisible tasks within non-free choice; these checking processes are time-consuming and result in the high computing time of α$. This research proposes the Graph-based Invisible Task (GIT) method to efficiently discover invisible tasks in non-free choice. The GIT method represents sequences of business activities as graphs and defines rules over the relationships of these graphs to discover invisible tasks in non-free choice. Analysing the graph relationships with the rules of GIT is more efficient than α$'s iterative checking of combined activities. This research measures the time efficiency of storing the event log and discovering a process model to evaluate the GIT algorithm. The graph database has the highest computing time for storing batch event logs; however, it has a low computing time for storing streaming event logs. Furthermore, on an event log with 99 traces, the GIT algorithm discovers a process model 42 times faster than α++ and 43 times faster than α$. The GIT algorithm can also handle 981 traces, whereas α++ and α$ handle at most 99 traces. 
Discovering a process model with the GIT algorithm also has lower time complexity than with α$: GIT runs in O(n³), whereas α$ runs in O(n⁴). These evaluation results show a significant improvement of the GIT method in terms of time efficiency.
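The graph view of a log that GIT builds on can be sketched as follows (a toy illustration, not GIT's actual rules; `build_graph`, `skip_candidates`, and the example log are assumptions). An activity that is sometimes skipped between two others is a classic hint of an invisible (silent) task:

```python
from collections import defaultdict

def build_graph(log):
    """Directed graph of activities with directly-follows edges."""
    g = defaultdict(set)
    for trace in log:
        for x, y in zip(trace, trace[1:]):
            g[x].add(y)
    return g

def skip_candidates(g):
    """Triples (x, y, z) where z follows x both directly and via y,
    i.e. y is sometimes skipped -- hinting at an invisible task
    between x and z."""
    hits = []
    for x, succs in g.items():
        for y in succs:
            for z in g.get(y, set()):
                if z in succs:
                    hits.append((x, y, z))
    return hits

log = [["a", "b", "c"], ["a", "c"]]
print(skip_candidates(build_graph(log)))  # [('a', 'b', 'c')]
```

Scanning such graph relationships once is what lets a graph-based method avoid the combinatorial re-checking of activity pairs that drives up the computing time of the rule-based approach.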


2019 ◽  
Vol 9 (11) ◽  
pp. 2368 ◽  
Author(s):  
Hyun Ahn ◽  
Dinh-Lam Pham ◽  
Kwanghoon Pio Kim

A work transference network is a type of enterprise social network centered on the interactions among the performers participating in workflow processes. The work transference networks hidden in workflow enactment histories can capture not only the structure of the enterprise social network among performers but also the degrees of relevancy and intensity between them. The purpose of this paper is to devise a framework that can discover and analyze work transference networks from workflow enactment event logs. The framework includes a series of conceptual definitions to formally describe the overall procedure of the network discovery. To support this conceptual framework, we implement a system that provides functionalities for the discovery, analysis and visualization steps. As a sanity check for the framework, we carry out a mining experiment on a dataset of real-life event logs using the implemented system. The experiment results show that the framework is valid in discovering transference networks correctly and providing primitive knowledge pertaining to the discovered networks. Finally, we expect that work transference network analytics will facilitate assessing workflow fidelity in human resource planning and its observed performance, and eventually enhance the workflow process from the organizational perspective.
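The core discovery step can be sketched as counting handovers of work between consecutive events of the same case (a minimal illustration of the general idea, not the paper's formal framework; `transference_network` and the example performer sequences are assumptions):

```python
from collections import Counter

def transference_network(cases):
    """Weighted directed edges (p, q): performer q executed an event
    directly after one executed by performer p within the same case."""
    edges = Counter()
    for performers in cases:
        for p, q in zip(performers, performers[1:]):
            if p != q:                  # ignore work kept by one performer
                edges[(p, q)] += 1
    return edges

# Two cases; repeated events by the same performer do not create edges:
cases = [["alice", "bob", "carol"], ["alice", "bob", "bob", "carol"]]
net = transference_network(cases)
print(net[("alice", "bob")], net[("bob", "carol")])  # 2 2
```

The resulting edge weights are what the framework then analyzes and visualizes as relevancy and intensity between performers.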


2019 ◽  
Vol 25 (5) ◽  
pp. 995-1019 ◽  
Author(s):  
Anna Kalenkova ◽  
Andrea Burattin ◽  
Massimiliano de Leoni ◽  
Wil van der Aalst ◽  
Alessandro Sperduti

Purpose The purpose of this paper is to demonstrate that process mining techniques can help to discover process models from event logs, using conventional high-level process modeling languages, such as Business Process Model and Notation (BPMN), leveraging their representational bias.

Design/methodology/approach The integrated discovery approach presented in this work aims to mine control, data and resource perspectives within one process diagram and, if possible, construct a hierarchy of subprocesses improving the model readability. The proposed approach is defined as a sequence of steps, performed to discover a model containing various perspectives and presenting a holistic view of a process. The approach was implemented within an open-source process mining framework called ProM and proved its applicability for the analysis of real-life event logs.

Findings This paper shows that the proposed integrated approach can be applied to real-life event logs of information systems from different domains. The multi-perspective process diagrams obtained within the approach are of good quality and better than models discovered using a technique that does not consider hierarchy. Moreover, due to the decomposition methods applied, the proposed approach can deal with large event logs, which cannot be handled by methods that do not use decomposition.

Originality/value The paper consolidates various process mining techniques, which were never integrated before, and presents a novel approach for the discovery of multi-perspective hierarchical BPMN models. This approach bridges the gap between well-known process mining techniques and a wide range of BPMN-compliant tools.


2019 ◽  
Vol 11 (2) ◽  
pp. 106-118
Author(s):  
Michal Halaška ◽  
Roman Šperka

Abstract The simulation and modelling paradigms have shifted significantly in recent years under the influence of the Industry 4.0 concept. There is a requirement for a much higher level of detail and a lower level of abstraction in the simulation of a modelled system that continuously develops. Consequently, higher demands are placed on the automated construction of process models; such a possibility is provided by automated process discovery techniques. Thus, the paper aims to benchmark automated process discovery techniques on realistic simulation models within a controlled environment, more specifically the logistics process of a manufacturing company. The study is based on a hybrid simulation of logistics in a manufacturing company implemented in the AnyLogic framework. The hybrid simulation is modelled in BPMN notation using BIMP, a business process modelling tool, to acquire data in the form of event logs. Next, five chosen automated process discovery techniques are applied to the event logs, and the results are evaluated. Based on the evaluation of the benchmark results received using the chosen discovery algorithms, it is evident that the discovery algorithms have a better overall performance on more extensive event logs, in terms of both fitness and precision. Nevertheless, the discovery techniques perform better in the case of smaller data sets with less complex process models. 
While discovery techniques typically have to address scalability issues due to the high amount of data present in the logs, they can, as demonstrated, also encounter issues of the opposite nature: in companies with long delivery cycles, long processing times and parallel production, which is common in the industrial sector, they have to address incompleteness and a lack of information in the datasets. Business process management is becoming essential for companies to stay competitive through efficiency. The issues encountered within the simulation model will be amplified through both vertical and horizontal integration of the supply chain within Industry 4.0. The impact of vertical integration on the BPMN model and the chosen case identifier is demonstrated: without the assumption of smart manufacturing, it would be impossible to use a single case identifier throughout the entire simulation, and the entire process would have to be divided into several subprocesses.


Author(s):  
Ying Huang ◽  
Liyun Zhong ◽  
Yan Chen

The aim of process discovery is to discover process models from the process execution data stored in event logs. In the era of "Big Data," one of the key challenges is to analyze the large amounts of collected data in meaningful and scalable ways. Most process discovery algorithms assume that all the data in an event log fully comply with the process execution specification. However, real event logs contain large amounts of noise and data from irrelevant infrequent behavior, and this infrequent behavior or noise has a negative influence on the process discovery procedure. This article presents a technique to remove infrequent behavior from event logs by calculating the minimum expectation of the process event log. The method was evaluated in detail, and the results showed that its application in existing process discovery algorithms significantly improves the quality of the discovered process models and that it scales well to large datasets.
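The general idea of pre-filtering infrequent behavior can be sketched with a plain frequency threshold (a simplified stand-in for the article's minimum-expectation calculation; `filter_infrequent`, the threshold, and the toy log are assumptions):

```python
from collections import Counter

def filter_infrequent(log, min_support=0.05):
    """Keep only trace variants whose relative frequency in the log
    is at least `min_support`; everything rarer is treated as noise."""
    variants = Counter(tuple(t) for t in log)
    n = len(log)
    kept = {v for v, c in variants.items() if c / n >= min_support}
    return [t for t in log if tuple(t) in kept]

# 19 occurrences of the regular variant plus one noisy trace:
log = 19 * [["a", "b", "c"]] + [["a", "x", "c"]]
clean = filter_infrequent(log, min_support=0.1)
print(len(clean))  # 19 -- the 5%-frequency noisy variant is removed
```

Feeding the filtered log into any discovery algorithm then keeps the noise from distorting the discovered model, which is the effect the evaluation measures.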

