Vector space models for trace clustering: a comparative study

Process mining explores event logs to offer valuable insights to business process managers. Some types of business processes are hard to mine, including unstructured and knowledge-intensive processes. Trace clustering is therefore usually applied to an event log to break it into sublogs that are more amenable to typical process mining tasks. However, applying clustering algorithms involves decisions, such as how traces are represented, that affect the quality of the results. In this paper, we compare four vector space models for trace clustering, using them with an agglomerative clustering algorithm on synthetic and real-world event logs. Our analyses suggest that the embeddings-based vector space model can properly handle trace clustering in unstructured processes.
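
As a rough illustration of the idea, the sketch below represents each trace as an activity-count vector and groups the vectors with average-linkage agglomerative clustering over cosine distance. The abstract does not name the four vector space models or the linkage used, so the representation, the linkage, the toy traces, and all function names here are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt

def vectorize(traces):
    """Map each trace (a list of activity names) to an activity-count vector.
    This bag-of-activities model is an assumed, simple representation."""
    vocab = sorted({activity for trace in traces for activity in trace})
    counts = [Counter(trace) for trace in traces]
    return [[c[activity] for activity in vocab] for c in counts]

def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return 1.0 - dot / norm if norm else 1.0

def agglomerative(vectors, n_clusters):
    """Average-linkage agglomerative clustering.
    Returns clusters as lists of trace indices."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        # Mean pairwise distance between the members of two clusters.
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters.
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        p, q = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters

# Hypothetical toy log: two approval-like traces and two repair-like traces.
traces = [
    ["register", "check", "approve"],
    ["register", "check", "check", "approve"],
    ["call", "diagnose", "repair"],
    ["call", "diagnose", "diagnose", "repair"],
]
sublogs = agglomerative(vectorize(traces), n_clusters=2)
print(sublogs)
```

Each returned cluster is a list of trace indices, which can be used to split the original log into sublogs.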

Process mining for knowledge-intensive business processes

Proceedings of the 15th International Conference on Knowledge Technologies and Data-driven Business - i-KNOW '15 ◽

10.1145/2809563.2809580 ◽

2015 ◽

Cited By ~ 3

Author(s):

Marian Benner-Wickner ◽

Tobias Brückmann ◽

Volker Gruhn ◽

Matthias Book

Keyword(s):

Business Processes ◽

Process Mining ◽

Knowledge Intensive

Design of an Unsupervised Machine Learning-Based Movie Recommender System

10.20944/preprints202001.0124.v1 ◽

2020 ◽

Author(s):

Debby Cintia Ganesha Putri ◽

Jenq-Shiou Leu ◽

Pavel Seda

Keyword(s):

Recommender System ◽

Clustering Algorithm ◽

System Development ◽

Clustering Algorithms ◽

Mean Shift ◽

Computational Time ◽

Agglomerative Clustering ◽

Method Performance ◽

Cluster Validity Indices ◽

Validity Indices

This research aims to determine the similarities in groups of people to build a film recommender system for users. Users often have difficulty in finding suitable movies due to the increasing amount of movie information. The recommender system is very useful for helping customers choose a preferred movie with the existing features. In this study, the recommender system development is established by using several algorithms to obtain groupings, such as the K-Means algorithm, BIRCH algorithm, mini-batch K-Means algorithm, mean-shift algorithm, affinity propagation algorithm, agglomerative clustering algorithm, and spectral clustering algorithm. We propose methods for optimizing K so that each cluster does not significantly increase variance. We are limited to using groupings based on Genre and Tags for movies. This research can discover better methods for evaluating clustering algorithms. To verify the quality of the recommender system, we adopted the mean square error (MSE), such as the Dunn Matrix and Cluster Validity Indices, and social network analysis (SNA), such as Degree Centrality, Closeness Centrality, and Betweenness Centrality. We also used Average Similarity, Computational Time, Association Rule with Apriori algorithm, and Clustering Performance Evaluation as evaluation measures to compare method performance of recommender systems using Silhouette Coefficient, Calinski-Harabasz Index, and Davies-Bouldin Index.

Database-Less Extraction of Event Logs from Redo Logs

Business Information Systems ◽

10.52825/bis.v1i.66 ◽

2021 ◽

pp. 73-82

Author(s):

Dorina Bano ◽

Tom Lichtenstein ◽

Finn Klessascheck ◽

Mathias Weske

Keyword(s):

Performance Analysis ◽

Real World ◽

Business Processes ◽

Process Mining ◽

Detailed Knowledge ◽

System Failure ◽

Conformance Checking ◽

Event Logs ◽

Event Log ◽

And Performance

Process mining is widely adopted in organizations to gain deep insights into running business processes. This can be achieved by applying different process mining techniques like discovery, conformance checking, and performance analysis. These techniques are applied to event logs, which need to be extracted from the organization’s databases beforehand. This not only implies access to databases, but also detailed knowledge about the database schema, which is often not available. In many real-world scenarios, however, process execution data is available as redo logs. Such logs are used to bring a database into a consistent state in case of a system failure. This paper proposes a semi-automatic approach to extract an event log from redo logs alone. It does not require access to the database or knowledge of the database schema. The feasibility of the proposed approach is evaluated on two synthetic redo logs.

Applied Sequence Clustering Techniques for Process Mining

Handbook of Research on Business Process Modeling ◽

10.4018/978-1-60566-288-6.ch022 ◽

2011 ◽

pp. 481-502 ◽

Cited By ~ 6

Author(s):

Diogo R. Ferreira

Keyword(s):

Business Processes ◽

Process Mining ◽

Behavioral Patterns ◽

Daily Work ◽

Sequence Clustering ◽

Event Logs ◽

Common Trend ◽

First Case ◽

Insight Into

This chapter introduces the principles of sequence clustering and presents two case studies where the technique is used to discover behavioral patterns in event logs. In the first case study, the goal is to understand the way members of a software team perform their daily work, and the application of sequence clustering reveals a set of behavioral patterns that are related to some of the main processes being carried out by that team. In the second case study, the goal is to analyze the event history recorded in a technical support database in order to determine whether the recorded behavior complies with a predefined issue handling process. In this case, the application of sequence clustering confirms that all behavioral patterns share a common trend that resembles the original process. Throughout the chapter, special attention is given to the need for data preprocessing in order to obtain results that provide insight into the typical behavior of business processes.

Generating Block-Structured Parallel Process Models by Demonstration

Applied Sciences ◽

10.3390/app11041876 ◽

2021 ◽

Vol 11 (4) ◽

pp. 1876

Author(s):

Julijana Lekić ◽

Dragan Milićev ◽

Dragan Stanković

Keyword(s):

Business Process ◽

Business Processes ◽

Process Mining ◽

Original Method ◽

Process Models ◽

Programming By Demonstration ◽

Event Logs ◽

Process Specification ◽

Common Purpose ◽

Block Structured

Programming by demonstration (PBD) is a technique which allows end users to create, modify, accommodate, and expand programs by demonstrating what the program is supposed to do. Although the ideal of common-purpose programming by demonstration or by examples has been rejected as practically unrealistic, this approach has found its application and shown potential when limited to specific narrow domains and ranges of applications. In this paper, an original method of applying the principles of programming by demonstration in the area of process mining (PM) to the interactive construction of block-structured parallel business process models is presented. A technique and tool that enable interactive process mining and incremental discovery of process models are described. The idea is based on the following principle: using a demonstrational user interface, a user demonstrates scenarios of execution of parallel business process activities, and the system gives a generalized process model specification. A modified process mining technique with the α|| algorithm applied to weakly complete event logs is used for creating parallel business process models using demonstration.
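
To give a feel for how demonstrated scenarios can be generalized into ordering constraints, the sketch below derives the ordering relations of the classic α-algorithm (directly-follows, causality, parallelism) from a small set of traces. It is not the α|| variant used in the paper, whose details the abstract does not give; the function name and the toy log are hypothetical.

```python
def footprint(log):
    """Derive classic alpha-algorithm ordering relations from a list of traces.

    follows:  a is directly followed by b in at least one trace
    causal:   a is followed by b, but b is never followed by a
    parallel: a is followed by b and b is followed by a
    """
    follows = {(a, b) for trace in log for a, b in zip(trace, trace[1:])}
    causal = {(a, b) for (a, b) in follows if (b, a) not in follows}
    parallel = {(a, b) for (a, b) in follows if (b, a) in follows}
    return follows, causal, parallel

# Hypothetical demonstrated scenarios: b and c occur in either order
# between a and d, which suggests they run in parallel.
log = [["a", "b", "c", "d"], ["a", "c", "b", "d"]]
follows, causal, parallel = footprint(log)
print(sorted(causal))
print(sorted(parallel))
```

In this toy log, the two demonstrations together are enough to mark b and c as parallel while a precedes both and d follows both.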

Metamodel of the Artifact-Centric Approach to Event Log Extraction from ERP Systems

International Journal of Decision Support System Technology ◽

10.4018/ijdsst.2016040102 ◽

2016 ◽

Vol 8 (2) ◽

pp. 18-28 ◽

Cited By ~ 5

Author(s):

Ana Pajić ◽

Dragana Bečejski-Vujaklija

Keyword(s):

Enterprise Resource Planning ◽

Business Processes ◽

Process Mining ◽

Resource Planning ◽

Erp Systems ◽

Huge Amount ◽

Erp System ◽

Event Logs ◽

Event Log ◽

Data Source

Enterprise Resource Planning (ERP) systems handle a huge amount of data related to the actual execution of business processes, and the goal is to discover from the transaction log a model of how the business processes are actually carried out. The authors' work captures the knowledge of existing approaches and tools for converting the data from transaction logs to event logs for process mining techniques. They conduct a detailed analysis of the artifact-centric approach concepts and describe its constructs by means of an ontological metamodel. The underlying logical and semantically rich structure of the approach is presented through the model definition. The paper specifies how concepts of the data source are mapped onto the concept of the event log. The Dynamics NAV ERP system is used as an example to illustrate the data-oriented structure of an ERP system.

A Cluster Analysis Method of Software Development Activities Based on Event Log

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813666191204144931 ◽

2019 ◽

Vol 13 ◽

Author(s):

MingJing Tang ◽

Tong Li ◽

Rui Zhu ◽

ZiFei Ma

Keyword(s):

Software Development ◽

Development Process ◽

Clustering Algorithm ◽

Process Mining ◽

Development Project ◽

Software Development Process ◽

Log Data ◽

Event Logs ◽

Event Log ◽

Process Event

Background: Event log data generated in the software development process contains historical information and future trends of software development activities. The mining and analysis of event log data help to identify and discover software development activities and provide effective support for software development process mining and modeling. Method: First, a deep learning model (Word2vec) was used for feature extraction and vectorization of software development process event logs. Then, the K-means clustering algorithm and the silhouette coefficient measure were used for clustering and for evaluating the clustering of the vectorized software development process event logs. Results: This paper obtained the mapping relationship between software development activities and events, and realized the identification and discovery of software development activities. Conclusion: A practical software development project (jEdit) is given to prove the feasibility, rationality and effectiveness of our proposed method. This work provides effective support for software development process mining and software development behavior guidance.
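
The evaluation step can be illustrated with a small, self-contained computation of the mean silhouette coefficient. The sketch assumes the events have already been vectorized (the Word2vec step is not reproduced) and that cluster labels are already available; the points, labels, and function name are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+

def silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance).
    Requires at least two distinct labels and not all points identical."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself adds nothing to the sum).
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(dist(point, q) for q in other) / len(other)
                for key, other in clusters.items() if key != label)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical 2-D vectors standing in for vectorized events.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = [0, 0, 1, 1]  # groups nearby points together
bad = [0, 1, 0, 1]   # mixes the two groups
print(silhouette(points, good), silhouette(points, bad))
```

Values close to 1 indicate compact, well-separated clusters, while negative values indicate points that sit closer to another cluster than to their own.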

Design of an Unsupervised Machine Learning-Based Movie Recommender System

Symmetry ◽

10.3390/sym12020185 ◽

2020 ◽

Vol 12 (2) ◽

pp. 185 ◽

Cited By ~ 3

Author(s):

Debby Cintia Ganesha Putri ◽

Jenq-Shiou Leu ◽

Pavel Seda

Keyword(s):

Recommender System ◽

Clustering Algorithm ◽

System Development ◽

Clustering Algorithms ◽

Mean Shift ◽

Computational Time ◽

Agglomerative Clustering ◽

Method Performance ◽

Cluster Validity Indices ◽

Validity Indices

This research aims to determine the similarities in groups of people to build a film recommender system for users. Users often have difficulty in finding suitable movies due to the increasing amount of movie information. The recommender system is very useful for helping customers choose a preferred movie with the existing features. In this study, the recommender system development is established by using several algorithms to obtain groupings, such as the K-Means algorithm, BIRCH algorithm, mini-batch K-Means algorithm, mean-shift algorithm, affinity propagation algorithm, agglomerative clustering algorithm, and spectral clustering algorithm. We propose methods for optimizing K so that each cluster does not significantly increase variance. We are limited to using groupings based on Genre and Tags for movies. This research can discover better methods for evaluating clustering algorithms. To verify the quality of the recommender system, we adopted the mean square error (MSE), such as the Dunn Matrix and Cluster Validity Indices, and social network analysis (SNA), such as Degree Centrality, Closeness Centrality, and Betweenness Centrality. We also used average similarity, computational time, association rule with Apriori algorithm, and clustering performance evaluation as evaluation measures to compare method performance of recommender systems using Silhouette Coefficient, Calinski-Harabasz Index, and Davies–Bouldin Index.

A new similarity measure for vector space models in text classification and information retrieval

Journal of Information Science ◽

10.1177/0165551520968055 ◽

2020 ◽

pp. 016555152096805

Author(s):

Mete Eminagaoglu

Keyword(s):

Information Retrieval ◽

Vector Space ◽

Similarity Measure ◽

Text Classification ◽

Pearson Correlation ◽

Clustering Algorithms ◽

Similarity Measures ◽

Manhattan Distance ◽

Vector Space Models ◽

Classification Information

There are various models, methodologies and algorithms that can be used today for document classification, information retrieval and other text mining applications and systems. Among them are vector space–based models, where distance metrics or similarity measures lie at the core of such models. The vector space–based model is one of the fast and simple alternatives for the processing of textual data; however, its accuracy, precision and reliability still need significant improvements. In this study, a new similarity measure is proposed, which can be effectively used for vector space models and related algorithms such as k-nearest neighbours (k-NN) and Rocchio as well as some clustering algorithms such as K-means. The proposed similarity measure is tested with some universal benchmark data sets in Turkish and English, and the results are compared with some other standard metrics such as Euclidean distance, Manhattan distance, Chebyshev distance, Canberra distance, Bray–Curtis dissimilarity, Pearson correlation coefficient and Cosine similarity. Some successful and promising results have been obtained, which show that this proposed similarity measure could be alternatively used within all suitable algorithms and models for information retrieval, document clustering and text classification.
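
For reference, the standard baseline metrics named above can be written in a few lines each. The sketch below uses their textbook definitions on plain lists of numbers; the proposed new similarity measure is not described in the abstract and is therefore not included. The example vectors are hypothetical term-count vectors.

```python
from math import sqrt

def euclidean(u, v):
    return sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def manhattan(u, v):
    return sum(abs(x - y) for x, y in zip(u, v))

def chebyshev(u, v):
    return max(abs(x - y) for x, y in zip(u, v))

def canberra(u, v):
    # Coordinates where both values are zero contribute nothing.
    return sum(abs(x - y) / (abs(x) + abs(y)) for x, y in zip(u, v) if x or y)

def bray_curtis(u, v):
    # Undefined (division by zero) when the denominator sums to zero.
    return sum(abs(x - y) for x, y in zip(u, v)) / sum(abs(x + y) for x, y in zip(u, v))

def cosine_similarity(u, v):
    # Undefined (division by zero) if either vector is all zeros.
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v)))

def pearson(u, v):
    # Undefined (division by zero) if either vector is constant.
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    du = [x - mean_u for x in u]
    dv = [y - mean_v for y in v]
    cov = sum(a * b for a, b in zip(du, dv))
    return cov / sqrt(sum(a * a for a in du) * sum(b * b for b in dv))

# Hypothetical term-count vectors for two short documents.
u, v = [1, 0, 2], [2, 1, 0]
for metric in (euclidean, manhattan, chebyshev, canberra,
               bray_curtis, cosine_similarity, pearson):
    print(metric.__name__, round(metric(u, v), 4))
```

Note that the first five functions are distances (larger means less similar), whereas cosine similarity and the Pearson coefficient are similarities (larger means more similar), so they need to be converted before being compared on a common scale.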

An operational support approach for Mining Unstructured Business Processes

Revista de Informática Teórica e Aplicada ◽

10.22456/2175-2745.106277 ◽

2021 ◽

Vol 28 (1) ◽

pp. 22-38

Author(s):

Zineb Lamghari ◽

Maryam Radgui ◽

Rajaa Saidi ◽

Moulay Driss Rahmani

Keyword(s):

Business Process ◽

Support System ◽

Business Processes ◽

Support Systems ◽

Process Mining ◽

Complex Structure ◽

Mining Activities ◽

Event Logs ◽

Operational Support System

The refined process mining framework contains a set of activities that use information extracted from event logs, discovered models and normative ones. Among these activities, we find those dealing with running events in a Structured Business Process (SBP) context, which are the Detect, the Predict and the Recommend activities. These three activities together form an operational support system that aims at detecting deviations, predicting events and recommending actions. In this regard, operational support systems perform well on SBPs, while supporting an Unstructured Business Process (UBP) remains a challenging task. This highlights the difficulty of predicting events and recommending actions for a UBP, because of its complex structure. In this context, simplification and structuring operations must be applied. Therefore, the intervention of other process mining activities is required for business process simplification and structuring. To this end, we present an operational support approach dealing with UBPs, using the refined process mining framework activities.
