Vector space models for trace clustering: a comparative study

2021 ◽  
Author(s):  
Mateus Alex dos Santos Luna ◽  
André Paulino Lima ◽  
Thaís Rodrigues Neubauer ◽  
Marcelo Fantinato ◽  
Sarajane Marques Peres

Process mining explores event logs to offer valuable insights to business process managers. Some types of business processes are hard to mine, including unstructured and knowledge-intensive processes. Trace clustering is therefore usually applied to an event log to break it into sublogs that are more amenable to typical process mining tasks. However, applying clustering algorithms involves decisions, such as how traces are represented, that affect the quality of the results. In this paper, we compare four vector space models for trace clustering, using them with an agglomerative clustering algorithm on synthetic and real-world event logs. Our analyses suggest that the embeddings-based vector space model can properly handle trace clustering in unstructured processes.
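As a rough illustration of the pipeline this abstract describes, the sketch below represents each trace as an activity-frequency vector and groups the vectors with average-linkage agglomerative clustering. The abstract does not name the four vector space models or the linkage criterion, so the bag-of-activities representation, the Euclidean distance, the average linkage, and the toy event log are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt

def trace_to_vector(trace, vocabulary):
    """Map a trace (list of activity names) to an activity-frequency vector."""
    counts = Counter(trace)
    return [counts[activity] for activity in vocabulary]

def euclidean(u, v):
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def agglomerative(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of trace indices."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index of first cluster, index of second cluster)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(euclidean(vectors[a], vectors[b])
                            for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

# Hypothetical toy event log: one list of activities per trace.
log = [
    ["register", "check", "approve"],
    ["register", "check", "check", "approve"],
    ["register", "reject"],
    ["register", "review", "reject"],
]
vocabulary = sorted({activity for trace in log for activity in trace})
vectors = [trace_to_vector(trace, vocabulary) for trace in log]
clusters = agglomerative(vectors, n_clusters=2)
print(clusters)
```

Each resulting cluster is a list of trace indices, which can be used to split the log into sublogs. An embeddings-based model would replace only `trace_to_vector`; the clustering step stays the same.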


2021 ◽  
pp. 73-82
Author(s):  
Dorina Bano ◽  
Tom Lichtenstein ◽  
Finn Klessascheck ◽  
Mathias Weske

Process mining is widely adopted in organizations to gain deep insights about running business processes. This can be achieved by applying different process mining techniques like discovery, conformance checking, and performance analysis. These techniques are applied to event logs, which need to be extracted from the organization’s databases beforehand. This not only implies access to databases, but also detailed knowledge about the database schema, which is often not available. In many real-world scenarios, however, process execution data is available as redo logs. Such logs are used to bring a database into a consistent state in case of a system failure. This paper proposes a semi-automatic approach to extract an event log from redo logs alone. It does not require access to the database or knowledge of the database schema. The feasibility of the proposed approach is evaluated on two synthetic redo logs.


Author(s):  
Diogo R. Ferreira

This chapter introduces the principles of sequence clustering and presents two case studies where the technique is used to discover behavioral patterns in event logs. In the first case study, the goal is to understand the way members of a software team perform their daily work, and the application of sequence clustering reveals a set of behavioral patterns that are related to some of the main processes being carried out by that team. In the second case study, the goal is to analyze the event history recorded in a technical support database in order to determine whether the recorded behavior complies with a predefined issue handling process. In this case, the application of sequence clustering confirms that all behavioral patterns share a common trend that resembles the original process. Throughout the chapter, special attention is given to the need for data preprocessing in order to obtain results that provide insight into the typical behavior of business processes.


2021 ◽  
Vol 11 (4) ◽  
pp. 1876
Author(s):  
Julijana Lekić ◽  
Dragan Milićev ◽  
Dragan Stanković

Programming by demonstration (PBD) is a technique which allows end users to create, modify, accommodate, and expand programs by demonstrating what the program is supposed to do. Although the ideal of general-purpose programming by demonstration or by examples has been rejected as practically unrealistic, this approach has found its application and shown potential when limited to specific narrow domains and ranges of applications. In this paper, an original method of applying the principles of programming by demonstration in the area of process mining (PM) to the interactive construction of block-structured parallel business process models is presented. A technique and tool that enable interactive process mining and incremental discovery of process models are described. The idea is based on the following principle: using a demonstrational user interface, a user demonstrates scenarios of execution of parallel business process activities, and the system gives a generalized process model specification. A modified process mining technique with the α|| algorithm applied to weakly complete event logs is used for creating parallel business process models using demonstration.


2016 ◽  
Vol 8 (2) ◽  
pp. 18-28 ◽  
Author(s):  
Ana Pajić ◽  
Dragana Bečejski-Vujaklija

Enterprise Resource Planning (ERP) systems handle a huge amount of data related to the actual execution of business processes, and the goal is to discover from the transaction log a model of how the business processes are actually carried out. The authors' work captures the knowledge of existing approaches and tools for converting the data from transaction logs to event logs for process mining techniques. They conduct a detailed analysis of the artifact-centric approach concepts and describe its constructs by the ontological metamodel. The underlying logical and semantically rich structure of the approach is presented through the model definition. The paper specifies how concepts of the data source are mapped onto the concept of the event log. The Dynamics NAV ERP system is used as an example to illustrate the data-oriented structure of an ERP system.


Author(s):  
MingJing Tang ◽  
Tong Li ◽  
Rui Zhu ◽  
ZiFei Ma

Background: Event log data generated in the software development process contains historical information and future trends of software development activities. The mining and analysis of event log data help to identify and discover software development activities and provide effective support for software development process mining and modeling. Method: First, a deep learning model (Word2vec) was used for feature extraction and vectorization of software development process event logs. Then, the K-means clustering algorithm and the silhouette coefficient measure were used for clustering the vectorized software development process event logs and evaluating the clustering effect. Results: This paper obtained the mapping relationship between software development activities and events, and realized the identification and discovery of software development activities. Conclusion: A practical software development project (jEdit) is given to demonstrate the feasibility, rationality and effectiveness of our proposed method. This work provides effective support for software development process mining and software development behavior guidance.
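The silhouette coefficient mentioned in this abstract can be sketched in a few lines. The example below computes the mean silhouette score for a given labeling of points; the two-dimensional points are hypothetical stand-ins for Word2vec event vectors, and the labels stand in for K-means output, since the abstract gives neither.

```python
from math import dist  # available in Python 3.8+

def mean_distance(point, others):
    return sum(dist(point, q) for q in others) / len(others)

def silhouette(points, labels):
    """Mean silhouette coefficient; assumes at least two distinct labels."""
    scores = []
    for i, p in enumerate(points):
        same = [q for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:  # singleton cluster: score is 0 by convention
            scores.append(0.0)
            continue
        a = mean_distance(p, same)  # cohesion: mean distance within own cluster
        b = min(                    # separation: nearest other cluster
            mean_distance(p, [q for j, q in enumerate(points) if labels[j] == c])
            for c in set(labels) if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical vectors standing in for embedded events.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
good = silhouette(points, [0, 0, 1, 1])  # compact, well-separated clusters
bad = silhouette(points, [0, 1, 0, 1])   # clusters that mix distant points
print(good, bad)
```

Scores close to 1 indicate compact, well-separated clusters, while scores near or below 0 indicate overlapping or misassigned points, which is why the measure is commonly used to compare candidate values of K.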


Symmetry ◽  
2020 ◽  
Vol 12 (2) ◽  
pp. 185 ◽  
Author(s):  
Debby Cintia Ganesha Putri ◽  
Jenq-Shiou Leu ◽  
Pavel Seda

This research aims to determine the similarities in groups of people to build a film recommender system for users. Users often have difficulty in finding suitable movies due to the increasing amount of movie information. The recommender system is very useful for helping customers choose a preferred movie with the existing features. In this study, the recommender system development is established by using several algorithms to obtain groupings, such as the K-Means algorithm, BIRCH algorithm, mini-batch K-Means algorithm, mean-shift algorithm, affinity propagation algorithm, agglomerative clustering algorithm, and spectral clustering algorithm. We propose methods for optimizing K so that each cluster does not significantly increase variance. We are limited to using groupings based on Genre and Tags for movies. This research can discover better methods for evaluating clustering algorithms. To verify the quality of the recommender system, we adopted the mean square error (MSE), such as the Dunn Matrix and Cluster Validity Indices, and social network analysis (SNA), such as Degree Centrality, Closeness Centrality, and Betweenness Centrality. We also used average similarity, computational time, association rule with Apriori algorithm, and clustering performance evaluation as evaluation measures to compare method performance of recommender systems using Silhouette Coefficient, Calinski-Harabasz Index, and Davies–Bouldin Index.


2020 ◽  
pp. 016555152096805
Author(s):  
Mete Eminagaoglu

There are various models, methodologies and algorithms that can be used today for document classification, information retrieval and other text mining applications and systems. Among them are vector space–based models, where distance metrics or similarity measures lie at the core of such models. The vector space–based model is one of the fast and simple alternatives for the processing of textual data; however, its accuracy, precision and reliability still need significant improvements. In this study, a new similarity measure is proposed, which can be effectively used for vector space models and related algorithms such as k-nearest neighbours (k-NN) and Rocchio as well as some clustering algorithms such as K-means. The proposed similarity measure is tested with some universal benchmark data sets in Turkish and English, and the results are compared with some other standard metrics such as Euclidean distance, Manhattan distance, Chebyshev distance, Canberra distance, Bray–Curtis dissimilarity, Pearson correlation coefficient and Cosine similarity. Some successful and promising results have been obtained, which show that this proposed similarity measure could be alternatively used within all suitable algorithms and models for information retrieval, document clustering and text classification.
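For reference, several of the standard baseline metrics listed in this abstract can be written directly from their textbook definitions, as in the sketch below. The proposed similarity measure itself is not specified in the abstract and is therefore not reproduced; the two term-count vectors are hypothetical.

```python
from math import sqrt

def euclidean(u, v):
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

def chebyshev(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

def canberra(u, v):
    # Terms where both components are zero contribute nothing (0/0 treated as 0).
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(u, v) if a != 0 or b != 0)

def bray_curtis(u, v):
    # Assumes the vectors are not both all-zero.
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(abs(a + b) for a, b in zip(u, v))

def cosine_similarity(u, v):
    # Assumes neither vector is all-zero.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Hypothetical term-count vectors for two documents.
u = [1, 2, 3]
v = [4, 6, 3]
for metric in (euclidean, manhattan, chebyshev, canberra, bray_curtis, cosine_similarity):
    print(metric.__name__, metric(u, v))
```

Note that cosine similarity grows as documents become more alike, whereas the other five are distances or dissimilarities that shrink, so a k-NN or K-means implementation has to account for the direction of the chosen measure.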


2021 ◽  
Vol 28 (1) ◽  
pp. 22-38
Author(s):  
Zineb Lamghari ◽  
Maryam Radgui ◽  
Rajaa Saidi ◽  
Moulay Driss Rahmani

The refined process mining framework contains a set of activities that use information extracted from event logs, discovered models and normative ones. Among these activities, we find those dealing with running events in a Structured Business Process (SBP) context, which are the Detect, the Predict and the Recommend activities. These three activities are designated as an operational support system that aims at detecting deviations, predicting events and recommending actions. In this regard, operational support systems perform well on SBPs, while operational support is still a challenging task for an Unstructured Business Process (UBP). This highlights the difficulty of predicting events and recommending actions for a UBP, because of its complex structure. In this context, simplification and structuring operations must be applied. Therefore, the intervention of other process mining activities is required for business process simplification and structuring. To this end, we present an operational support approach dealing with UBPs, using the refined process mining framework activities.

