Event Log Preprocessing for Process Mining: A Review

2021
Vol 11 (22)
pp. 10556
Author(s):
Heidy M. Marin-Castro
Edgar Tello-Leal

Process Mining allows organizations to obtain actual business process models from event logs (discovery), to compare the event log or the process model resulting from the discovery task with an existing reference model of the same process (conformance), and to detect issues in the executed process in order to improve it (enhancement). An essential element in all three process mining tasks (discovery, conformance, and enhancement) is data cleaning, which reduces the complexity inherent in real-world event data so that it can be easily interpreted, manipulated, and processed. Thus, new techniques and algorithms for event data preprocessing have attracted interest in the business process research community. In this paper, we conduct a systematic literature review and provide, for the first time, a survey of relevant approaches to event data preprocessing for business process mining tasks. The aim of this work is to construct a categorization of techniques and methods related to event data preprocessing and to identify the relevant challenges around these techniques. We present a quantitative and qualitative analysis of the most popular techniques for event log preprocessing. We also study and present findings on how a preprocessing technique can improve a process mining task, and we discuss the emerging challenges in the domain of data preprocessing in the context of process mining. The results of this study reveal that preprocessing techniques have a high impact on the performance of process mining tasks. The data cleaning requirements depend on the characteristics of the event logs (large volume, high variability in trace sizes, changes in the duration of activities). In this scenario, most of the surveyed works use more than a single preprocessing technique to improve the quality of the event log. Trace clustering and trace/event-level filtering were found to be the most commonly used preprocessing techniques due to their ease of implementation and because they adequately manage noise and incompleteness in event logs.
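As a concrete illustration of trace-level filtering, one of the preprocessing techniques highlighted in the survey, the minimal sketch below keeps only the trace variants whose relative frequency reaches a threshold. The log representation (a list of traces, each a sequence of activity labels) and the threshold value are illustrative assumptions, not taken from any surveyed work.

```python
from collections import Counter

def filter_infrequent_variants(log, min_share=0.05):
    """Keep only traces whose activity-sequence variant occurs in at
    least `min_share` of all traces (a simple trace-level filter)."""
    variants = Counter(tuple(trace) for trace in log)
    total = len(log)
    frequent = {v for v, n in variants.items() if n / total >= min_share}
    return [trace for trace in log if tuple(trace) in frequent]

# Hypothetical toy log: each trace is a sequence of activity labels.
log = [["a", "b", "c"]] * 90 + [["a", "c", "b"]] * 8 + [["a", "x", "c"]] * 2
cleaned = filter_infrequent_variants(log, min_share=0.05)
print(len(cleaned))  # 98: the 2 traces of the rare variant are dropped
```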

Author(s):  
Bruna Brandão
Flávia Santoro
Leonardo Azevedo

In business process models, elements can be scattered (repeated) across different processes, making it difficult to handle changes, analyze processes for improvement, or check crosscutting impacts. These scattered elements are called aspects. Similar to the aspect-oriented paradigm in programming languages, aspect handling in BPM aims to modularize the crosscutting concerns spread across the models. This process modularization facilitates the management of the process (reuse, maintenance, and understanding). Current approaches identify aspects manually, which results in subjectivity and a lack of systematization. This paper proposes a method to automatically identify aspects in a business process from its event logs. The method is based on mining techniques and aims to remove the subjectivity of identification performed by specialists. Initial results from a preliminary evaluation provide evidence that the method correctly identified the aspects present in the process model.
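As a rough, hypothetical illustration of the idea (not the authors' method), the sketch below flags activities that appear in the event logs of several different processes as candidate crosscutting aspects; the process names, activity labels, and log structure are all assumptions.

```python
from collections import defaultdict

def candidate_aspects(logs_by_process, min_processes=2):
    """Flag activities that appear in the logs of several different
    processes as candidate crosscutting aspects."""
    seen_in = defaultdict(set)
    for process_name, log in logs_by_process.items():
        for trace in log:
            for activity in trace:
                seen_in[activity].add(process_name)
    return {a: procs for a, procs in seen_in.items()
            if len(procs) >= min_processes}

# Hypothetical logs of two processes sharing a "notify customer" step.
logs = {
    "order":  [["receive order", "check stock", "notify customer"]],
    "refund": [["receive claim", "approve refund", "notify customer"]],
}
print(candidate_aspects(logs))  # {'notify customer': {'order', 'refund'}}
```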


2021
Vol 16
pp. 1-14
Author(s):  
Zineb Lamghari

Process discovery techniques aim at automatically generating a process model that accurately describes a Business Process (BP) based on event data. Existing discovery algorithms assume that the recorded events result only from an operational BP type, whereas the management community defines three BP types: Management, Support, and Operational. Each BP type is distinguished by different properties, such as the main business process objective, treated as domain knowledge. This highlights the lack of process discovery techniques that obtain process models according to the Management and Support business process types. In this paper, we demonstrate that business process types can guide the process discovery technique in generating process models. Special interest is given to the use of process mining to address this challenge.


Process models are an analytical illustration of an organization's activities. They are essential for mapping out the current business process of an organization, establishing a baseline for process enhancement, and constructing future processes in which the enhancements are incorporated. To achieve this, algorithms have been proposed in the field of process mining that build process models using the information recorded in event logs. However, for complex process configurations, these algorithms cannot correctly build complex process structures such as invisible tasks, non-free-choice constructs, and short loops. The ability of each discovery algorithm to discover these constructs differs. In this work, we propose a framework responsible for detecting, from event logs, the complex constructs existing in the data. By identifying the existing constructs, one can choose the process discovery techniques suitable for the event data in question. The proposed framework has been implemented as a ProM plugin. The evaluation results demonstrate that the constructs can be correctly identified.
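By way of illustration, the minimal sketch below detects one family of the constructs mentioned above, short loops (length-one and length-two loops), directly from traces; detecting invisible tasks and non-free-choice constructs requires richer analysis. The toy log is a hypothetical example, not the framework's actual plugin logic.

```python
def detect_short_loops(log):
    """Detect length-one loops (a, a) and length-two loops (a, b, a)
    from the raw traces of an event log."""
    length_one, length_two = set(), set()
    for trace in log:
        for i in range(len(trace) - 1):
            if trace[i] == trace[i + 1]:
                length_one.add(trace[i])
        for i in range(len(trace) - 2):
            if trace[i] == trace[i + 2] and trace[i] != trace[i + 1]:
                length_two.add((trace[i], trace[i + 1]))
    return length_one, length_two

# Hypothetical log exhibiting both kinds of short loop.
log = [["a", "b", "b", "c"], ["a", "b", "d", "b", "c"]]
print(detect_short_loops(log))  # ({'b'}, {('b', 'd')})
```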


Author(s):  
Bambang Jokonowo
Nenden Siti Fatonah
Emelia Akashah Patah Akhir

Background: A standard operating procedure (SOP) is a series of business activities for achieving organizational goals, with each activity recorded and stored, together with its location, in an information system (e.g., SCM, ERP, LMS, CRM). Such activity records are known as event data and are stored in a database known as an event log. Objective: Based on the event log, we calculate fitness to determine whether the business process SOP follows the actual business process. Methods: This study obtains the event log from a terminal operating system (TOS), which records the dwelling time at a container port. Conformance checking using the token-based replay method calculates fitness by comparing the event log with the process model. Results: Discovery with the Alpha algorithm found that the most traversed trace was (a, b, n, o, p). The fitness calculation returned 1.0 when the produced, missing, and remaining tokens were replayed for each of the other traces. Conclusion: Thus, if process mining produces a fitness of more than 0.80, the process model follows the actual business process. Keywords: Conformance Checking, Dwelling time, Event log, Fitness, Process Discovery, Process Mining
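For reference, token-based replay derives fitness from the produced (p), consumed (c), missing (m), and remaining (r) token counts as f = 1/2(1 - m/c) + 1/2(1 - r/p). The sketch below applies this standard formula to purely hypothetical token counts, not to the TOS log used in the study.

```python
def token_replay_fitness(produced, consumed, missing, remaining):
    """Standard token-based replay fitness:
    f = 0.5 * (1 - missing/consumed) + 0.5 * (1 - remaining/produced)."""
    return 0.5 * (1 - missing / consumed) + 0.5 * (1 - remaining / produced)

# A perfectly replayed trace has no missing and no remaining tokens.
print(token_replay_fitness(produced=12, consumed=12, missing=0, remaining=0))  # 1.0
# Deviating traces lose fitness in proportion to the token mismatch.
print(token_replay_fitness(produced=12, consumed=12, missing=2, remaining=1))  # 0.875
```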


2021
Vol 10 (9)
pp. 144-147
Author(s):
Huiling LI
Xuan SU
Shuaipeng ZHANG

Massive amounts of business process event logs are collected and stored by modern information systems. Model discovery aims to discover a process model from such event logs; however, most existing approaches still suffer from low efficiency when facing large-scale event logs. Event log sampling techniques provide an effective way to improve the efficiency of process discovery, but existing techniques still cannot guarantee the quality of the mined model. Therefore, a sampling approach based on a set coverage algorithm, named the set coverage sampling approach, is proposed. The proposed sampling approach has been implemented in the open-source process mining toolkit ProM. Furthermore, experiments on a real event log data set, covering conformance checking and time performance analysis, show that the proposed event log sampling approach can greatly improve the efficiency of log sampling while ensuring the quality of the mined model.
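As a hedged sketch of what a set-coverage-based sampler might look like (the paper's exact algorithm and its ProM implementation are not reproduced here), the code below greedily picks traces until every directly-follows pair observed in the full log is covered by the sample; the toy log and the coverage criterion are assumptions.

```python
def directly_follows(trace):
    """Directly-follows pairs of a single trace."""
    return {(trace[i], trace[i + 1]) for i in range(len(trace) - 1)}

def set_cover_sample(log):
    """Greedy set-coverage sampling: repeatedly pick the trace that adds
    the most not-yet-covered directly-follows pairs, until every pair
    observed in the full log is covered by the sample."""
    universe = set().union(*(directly_follows(t) for t in log))
    covered, sample, remaining = set(), [], list(log)
    while covered != universe:
        best = max(remaining, key=lambda t: len(directly_follows(t) - covered))
        sample.append(best)
        covered |= directly_follows(best)
        remaining.remove(best)
    return sample

# Hypothetical log: two variants share behaviour, so one trace per
# distinct directly-follows "footprint" is enough for the sample.
log = [["a", "b", "c"]] * 50 + [["a", "c", "b"]] * 50
print(len(set_cover_sample(log)))  # 2
```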


2020
Vol 10 (4)
pp. 1493
Author(s):  
Kwanghoon Pio Kim

In this paper, we propose an integrated approach for seamlessly and effectively providing the mining and analyzing functionalities needed to redesign very large-scale and massively parallel process models discovered from their enactment event logs. The integrated approach aims at analyzing not only their structural complexity and correctness but also their animation-based behavioral properness, and is concretized as a sophisticated analyzer. The core function of the analyzer is to discover a very large-scale and massively parallel process model from a process log dataset and to validate the structural complexity and the syntactical and behavioral properness of the discovered process model. Finally, this paper provides a detailed description of the system architecture with its functional integration of process mining and process analysis. More precisely, we devise a series of functional algorithms for extracting the structural constructs and for visualizing the behavioral properness of the discovered very large-scale and massively parallel process models. As experimental validation, we apply the proposed approach and analyzer to a couple of process enactment event log datasets available on the website of the 4TU.Centre for Research Data.


Author(s):  
Kwanghoon Kim

Process (or business process) management systems support defining, executing, monitoring, and managing the process models deployed in process-aware enterprises. Accordingly, the functional formation of such systems is made up of three subsystems: a modeling subsystem, an enacting subsystem, and a mining subsystem. In recent times, the mining subsystem has become essential. Many enterprises have successfully completed the introduction and application of process automation technology through the modeling and enacting subsystems. Now that the time has come to redesign and reengineer the deployed process models, it is important for the mining subsystem to cooperate with the analyzing subsystem; the essential cooperation capability is to provide seamless integration between the designing work of the modeling subsystem and the redesigning work of the mining subsystem. In other words, we need to seamlessly integrate the discovery functionality of the mining subsystem and the analyzing functionality of the modeling subsystem. This integrated approach is particularly suitable when the deployed process models discovered by the mining subsystem are complex and very large-scale. In this paper, we propose an integrated approach for seamlessly and effectively providing the mining and analyzing functionalities needed for the redesigning work on very large-scale and massively parallel process models discovered from their enactment event logs. The integrated approach aims at analyzing not only their structural complexity and correctness but also their animation-based behavioral properness, and is concretized as a sophisticated analyzer. The core function of the analyzer is to discover a very large-scale and massively parallel process model from a process log dataset and to validate the structural complexity and the syntactical and behavioral properness of the discovered process model. Finally, this paper provides a detailed description of the system architecture with its functional integration of process mining and process analysis. More precisely, we devise a series of functional algorithms for extracting the structural constructs and for visualizing the behavioral properness of the discovered very large-scale and massively parallel process models. As experimental validation, we apply the proposed approach and analyzer to a couple of process enactment event log datasets available on the website of the 4TU.Centre for Research Data.


2021
Author(s):
Ashok Kumar Saini
Ruchi Kamra
Utpal Shrivastava

Conformance Checking (CC) techniques enable us to quantify the deviation between modelled behavior and actual execution behavior. The majority of organizations have Process-Aware Information Systems that record insights into the system, and they have a process model showing how the process is intended to be executed. The key intention of Process Mining is to extract facts from the event log and use them for the analysis, ratification, improvement, and redesign of a process. Researchers have proposed various CC techniques for specific applications and process models. This paper presents a detailed study of the key concepts and contributions of Process Mining and of how it helps in achieving business goals. The current challenges and opportunities in Process Mining are also discussed. The survey covers CC techniques proposed by researchers, examined against key objectives such as quality parameters, perspective, algorithm types, tools, and achievements.


2021
Vol 2021
pp. 1-17
Author(s):
Li-li Wang
Xian-wen Fang
Esther Asare
Fang Huan

Infrequent behaviors of a business process are behaviors that occur only in exceptional cases; their occurrence frequency is low because the conditions they require are rarely fulfilled. Hence, a strong coupling relationship exists between infrequent behavior and the data flow. Furthermore, some infrequent behaviors may reveal very important information about the process. Thus, not all infrequent behaviors should be disregarded as noise, and identifying infrequent but correct behaviors in the event log is vital to process mining from the data flow perspective. Existing process mining approaches construct a process model from the frequent behaviors in the event log, mostly concentrating on the control flow only, without considering infrequent behavior and data flow information. In this paper, we focus on the data flow to extract infrequent but correct behaviors from logs. For an infrequent trace, frequent patterns and interactive behavior profiles are combined to find out which part of the behavior in the trace occurs with low frequency, and conditional dependency probability is used to analyze how strongly the data flow information influences the infrequent behavior. An approach for identifying effective infrequent behaviors based on frequent patterns under data awareness is proposed accordingly. Subsequently, an optimization approach for mining process models with infrequent behaviors, integrating data flow and control flow, is also presented. Experiments on synthetic and real-life event logs show that, compared with other approaches, the proposed approach can distinguish effective infrequent behaviors from noise. The proposed approaches greatly improve the fitness of the mined process model without significantly decreasing its precision.
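As a simplified, assumed reading of "conditional dependency probability" (not the authors' exact definition), the sketch below estimates how often an infrequent directly-follows pair occurs given each value of a case attribute; a pair that is rare overall but systematic under one attribute value is a candidate effective infrequent behavior. The log structure, attribute name, and activity labels are hypothetical.

```python
from collections import Counter

def conditional_dependency(log, pair, attribute):
    """Estimate P(pair occurs in trace | trace attribute value):
    the share of traces with each attribute value that contain the
    given directly-follows pair."""
    per_value, hits = Counter(), Counter()
    for trace, attrs in log:
        value = attrs[attribute]
        per_value[value] += 1
        follows = {(trace[i], trace[i + 1]) for i in range(len(trace) - 1)}
        if pair in follows:
            hits[value] += 1
    return {v: hits[v] / n for v, n in per_value.items()}

# Hypothetical log: the rare 'escalate' step only follows 'check'
# when the case amount is 'high', so it is likely correct behaviour.
log = [(["check", "approve"], {"amount": "low"})] * 95 + \
      [(["check", "escalate", "approve"], {"amount": "high"})] * 5
print(conditional_dependency(log, ("check", "escalate"), "amount"))
# {'low': 0.0, 'high': 1.0}
```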


2018
Vol 7 (4)
pp. 2446
Author(s):
Muktikanta Sahu
Rupjit Chakraborty
Gopal Krishna Nayak

Building process models from the data available in event logs is the primary objective of process discovery. The Alpha algorithm is one of the popular algorithms for deriving a process model from event logs in process mining. The steps involved in the Alpha algorithm are computationally demanding, and this problem is further compounded by exponentially growing event log data. In this work, we exploit task parallelism in the Alpha algorithm for process discovery using the MPI programming model. The proposed work relies on the distributed memory parallelism available in MPI for performance improvement. Independent and computationally intensive steps in the Alpha algorithm are identified and their task parallelism is exploited. The execution times of the serial and parallel implementations of the Alpha algorithm are measured and used to calculate the speedup achieved. The maximum and minimum speedups obtained are 3.97x and 3.88x respectively, with an average speedup of 3.94x.
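As an illustrative sketch of distributed-memory parallelism over an event log (a data-parallel simplification, not the paper's exact task decomposition of the Alpha algorithm), the mpi4py code below lets each rank count directly-follows pairs for its share of the traces and merges the partial counts at rank 0; the toy log and the round-robin split are assumptions.

```python
# Run with e.g. `mpiexec -n 4 python df_mpi.py`.
from collections import Counter
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Hypothetical log; in practice it would be read from an XES/CSV file.
log = [["a", "b", "c", "d"], ["a", "c", "b", "d"], ["a", "e", "d"]] * 1000

local = Counter()
for trace in log[rank::size]:          # round-robin split of the traces
    for i in range(len(trace) - 1):
        local[(trace[i], trace[i + 1])] += 1

partials = comm.gather(local, root=0)  # collect the partial counts
if rank == 0:
    total = Counter()
    for part in partials:
        total.update(part)
    # The merged directly-follows counts feed the Alpha algorithm's footprint.
    print(total.most_common(3))
```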

