Text-Aware Predictive Monitoring of Business Processes

Business Information Systems ◽

10.52825/bis.v1i.62 ◽

2021 ◽

pp. 221-232

Author(s):

Marco Pegoraro ◽

Merih Seran Uysal ◽

David Benedikt Georgi ◽

Wil M.P Van der Aalst

Keyword(s):

Natural Language ◽

Business Processes ◽

Short Term Memory ◽

Control Flow ◽

Language Models ◽

Historical Event ◽

Prediction Methods ◽

Event Data ◽

Text Documents ◽

Process Prediction

The real-time prediction of business processes using historical event data is an important capability of modern business process monitoring systems. Existing process prediction methods are able to also exploit the data perspective of recorded events, in addition to the control-flow perspective. However, while well-structured numerical or categorical attributes are considered in many prediction techniques, almost no technique is able to utilize text documents written in natural language, which can hold information critical to the prediction task. In this paper, we illustrate the design, implementation, and evaluation of a novel text-aware process prediction model based on Long Short-Term Memory (LSTM) neural networks and natural language models. The proposed model can take categorical, numerical and textual attributes in event data into account to predict the activity and timestamp of the next event, the outcome, and the cycle time of a running process instance. Experiments show that the text-aware model is able to outperform state-of-the-art process prediction methods on simulated and real-world event logs containing textual data.

Download Full-text

A Context-Aware Neural Embedding for Function-Level Vulnerability Detection

Algorithms ◽

10.3390/a14110335 ◽

2021 ◽

Vol 14 (11) ◽

pp. 335

Author(s):

Hongwei Wei ◽

Guanjun Lin ◽

Lin Li ◽

Heming Jia

Keyword(s):

Data Flow ◽

Short Term Memory ◽

Source Code ◽

Control Flow ◽

Language Models ◽

Software Systems ◽

Support Vector ◽

Context Aware ◽

Vulnerability Detection ◽

Feature Representations

Exploitable vulnerabilities in software systems are major security concerns. To date, machine learning (ML) based solutions have been proposed to automate and accelerate the detection of vulnerabilities. Most ML techniques aim to isolate a unit of source code, be it a line or a function, as being vulnerable. We argue that a code segment is vulnerable if it exists in certain semantic contexts, such as the control flow and data flow; therefore, it is important for the detection to be context aware. In this paper, we evaluate the performance of mainstream word embedding techniques in the scenario of software vulnerability detection. Based on the evaluation, we propose a supervised framework leveraging pre-trained context-aware embeddings from language models (ELMo) to capture deep contextual representations, further summarized by a bidirectional long short-term memory (Bi-LSTM) layer for learning long-range code dependency. The framework takes directly a source code function as an input and produces corresponding function embeddings, which can be treated as feature sets for conventional ML classifiers. Experimental results showed that the proposed framework yielded the best performance in its downstream detection tasks. Using the feature representations generated by our framework, random forest and support vector machine outperformed four baseline systems on our data sets, demonstrating that the framework incorporated with ELMo can effectively capture the vulnerable data flow patterns and facilitate the vulnerability detection task.

Download Full-text

A Hybrid Siamese Neural Network for Natural Language Inference in Cyber-Physical Systems

ACM Transactions on Internet Technology ◽

10.1145/3418208 ◽

2021 ◽

Vol 21 (2) ◽

pp. 1-25

Author(s):

Pin Ni ◽

Yuming Li ◽

Gangmin Li ◽

Victor Chang

Keyword(s):

Natural Language ◽

Language Processing ◽

Short Term Memory ◽

Physical World ◽

Heterogeneous Data ◽

Cyber Physical Systems ◽

Physical Systems ◽

Language Data ◽

Text Language ◽

Different Sources

Cyber-Physical Systems (CPS), as a multi-dimensional complex system that connects the physical world and the cyber world, has a strong demand for processing large amounts of heterogeneous data. These tasks also include Natural Language Inference (NLI) tasks based on text from different sources. However, the current research on natural language processing in CPS does not involve exploration in this field. Therefore, this study proposes a Siamese Network structure that combines Stacked Residual Long Short-Term Memory (bidirectional) with the Attention mechanism and Capsule Network for the NLI module in CPS, which is used to infer the relationship between text/language data from different sources. This model is mainly used to implement NLI tasks and conduct a detailed evaluation in three main NLI benchmarks as the basic semantic understanding module in CPS. Comparative experiments prove that the proposed method achieves competitive performance, has a certain generalization ability, and can balance the performance and the number of trained parameters.

Download Full-text

Analyzing Information Leakage of Updates to Natural Language Models

Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security ◽

10.1145/3372297.3417880 ◽

2020 ◽

Author(s):

Santiago Zanella-Béguelin ◽

Lukas Wutschitz ◽

Shruti Tople ◽

Victor Rühle ◽

Andrew Paverd ◽

...

Keyword(s):

Natural Language ◽

Information Leakage ◽

Language Models

Download Full-text

Learning the language of viral evolution and escape

Science ◽

10.1126/science.abd7331 ◽

2021 ◽

Vol 371 (6526) ◽

pp. 284-288 ◽

Cited By ~ 4

Author(s):

Brian Hie ◽

Ellen D. Zhong ◽

Bonnie Berger ◽

Bryan Bryson

Keyword(s):

Immune System ◽

Natural Language ◽

Vaccine Development ◽

Sequence Data ◽

Viral Evolution ◽

Machine Learning Algorithms ◽

Language Models ◽

Viral Escape ◽

Human Immune System ◽

Influenza Hemagglutinin

The ability for viruses to mutate and evade the human immune system and cause infection, called viral escape, remains an obstacle to antiviral and vaccine development. Understanding the complex rules that govern escape could inform therapeutic design. We modeled viral escape with machine learning algorithms originally developed for human natural language. We identified escape mutations as those that preserve viral infectivity but cause a virus to look different to the immune system, akin to word changes that preserve a sentence’s grammaticality but change its meaning. With this approach, language models of influenza hemagglutinin, HIV-1 envelope glycoprotein (HIV Env), and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) Spike viral proteins can accurately predict structural escape patterns using sequence data alone. Our study represents a promising conceptual bridge between natural language and viral evolution.

Download Full-text

lncLocator 2.0: a cell-line-specific subcellular localization predictor for long non-coding RNAs with interpretable deep learning

Bioinformatics ◽

10.1093/bioinformatics/btab127 ◽

2021 ◽

Author(s):

Yang Lin ◽

Xiaoyong Pan ◽

Hong-Bin Shen

Keyword(s):

Subcellular Localization ◽

Cell Line ◽

Cell Lines ◽

Short Term Memory ◽

Computational Method ◽

Language Models ◽

Supplementary Information ◽

Deep Model ◽

A Cell ◽

Non Coding Rnas

Abstract Motivation Long non-coding RNAs (lncRNAs) are generally expressed in a tissue-specific way, and subcellular localizations of lncRNAs depend on the tissues or cell lines that they are expressed. Previous computational methods for predicting subcellular localizations of lncRNAs do not take this characteristic into account, they train a unified machine learning model for pooled lncRNAs from all available cell lines. It is of importance to develop a cell-line-specific computational method to predict lncRNA locations in different cell lines. Results In this study, we present an updated cell-line-specific predictor lncLocator 2.0, which trains an end-to-end deep model per cell line, for predicting lncRNA subcellular localization from sequences.We first construct benchmark datasets of lncRNA subcellular localizations for 15 cell lines. Then we learn word embeddings using natural language models, and these learned embeddings are fed into convolutional neural network, long short-term memory and multilayer perceptron to classify subcellular localizations. lncLocator 2.0 achieves varying effectiveness for different cell lines and demonstrates the necessity of training cell-line-specific models. Furthermore, we adopt Integrated Gradients to explain the proposed model in lncLocator 2.0, and find some potential patterns that determine the subcellular localizations of lncRNAs, suggesting that the subcellular localization of lncRNAs is linked to some specific nucleotides. Availability The lncLocator 2.0 is available at www.csbio.sjtu.edu.cn/bioinf/lncLocator2 and the source code can be found at https://github.com/Yang-J-LIN/lncLocator2. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

An Evaluation of Patient Safety Event Report Categories Using Unsupervised Topic Modeling

Methods of Information in Medicine ◽

10.3414/me15-01-0010 ◽

2015 ◽

Vol 54 (04) ◽

pp. 338-345 ◽

Cited By ~ 10

Author(s):

A. Fong ◽

R. Ratwani

Keyword(s):

Patient Safety ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Topic Modeling ◽

Free Text ◽

Event Data ◽

Event Type ◽

Modeling Approach ◽

Safety Event

SummaryObjective: Patient safety event data repositories have the potential to dramatically improve safety if analyzed and leveraged appropriately. These safety event reports often consist of both structured data, such as general event type categories, and unstructured data, such as free text descriptions of the event. Analyzing these data, particularly the rich free text narratives, can be challenging, especially with tens of thousands of reports. To overcome the resource intensive manual review process of the free text descriptions, we demonstrate the effectiveness of using an unsupervised natural language processing approach.Methods: An unsupervised natural language processing technique, called topic modeling, was applied to a large repository of patient safety event data to identify topics, or themes, from the free text descriptions of the data. Entropy measures were used to evaluate and compare these topics to the general event type categories that were originally assigned by the event reporter.Results: Measures of entropy demonstrated that some topics generated from the un-supervised modeling approach aligned with the clinical general event type categories that were originally selected by the individual entering the report. Importantly, several new latent topics emerged that were not originally identified. The new topics provide additional insights into the patient safety event data that would not otherwise easily be detected.Conclusion: The topic modeling approach provides a method to identify topics or themes that may not be immediately apparent and has the potential to allow for automatic reclassification of events that are ambiguously classified by the event reporter.

Download Full-text

Efficient natural language classification algorithm for detecting duplicate unsupervised features

Informatics and Automation - Информатика и автоматизация ◽

10.15622/ia.2021.3.5 ◽

2021 ◽

Vol 20 (3) ◽

pp. 623-653

Author(s):

Saud Altaf ◽

Sofia Iqbal ◽

Muhammad Waseem Soomro

Keyword(s):

Natural Language ◽

Short Term Memory ◽

Short Term ◽

Vocabulary Size ◽

Language Understanding ◽

Inverse Document Frequency ◽

Classification Technique ◽

Document Frequency ◽

Text Features ◽

Long Short Term Memory

This paper focuses on capturing the meaning of Natural Language Understanding (NLU) text features to detect the duplicate unsupervised features. The NLU features are compared with lexical approaches to prove the suitable classification technique. The transfer-learning approach is utilized to train the extraction of features on the Semantic Textual Similarity (STS) task. All features are evaluated with two types of datasets that belong to Bosch bug and Wikipedia article reports. This study aims to structure the recent research efforts by comparing NLU concepts for featuring semantics of text and applying it to IR. The main contribution of this paper is a comparative study of semantic similarity measurements. The experimental results demonstrate the Term Frequency–Inverse Document Frequency (TF-IDF) feature results on both datasets with reasonable vocabulary size. It indicates that the Bidirectional Long Short Term Memory (BiLSTM) can learn the structure of a sentence to improve the classification.

Download Full-text

Predicting Temporal Exceptions in Concurrent Workflows

Models for Capitalizing on Web Engineering Advancements ◽

10.4018/978-1-4666-0023-2.ch011 ◽

2012 ◽

pp. 196-218 ◽

Cited By ~ 1

Author(s):

Iok-Fai Leong ◽

Yain-Whar Si ◽

Robert P. Biuk-Aghai

Keyword(s):

Business Processes ◽

Critical Path ◽

Workflow Management ◽

Business Environment ◽

Complex Problem ◽

Control Flow ◽

Prediction Algorithm ◽

Task Completion ◽

Ripple Effect ◽

Long Duration

Current Workflow Management Systems (WfMS) are capable of managing simultaneous workflows designed to support different business processes of an organization. These departmental workflows are considered to be interrelated since they are often executed concurrently and are required to share a limited number of resources. However, unexpected events from the business environment and lack of proper resources can cause delays in activities. Deadline violations caused by such delays are called temporal exceptions. Predicting temporal exceptions in concurrent workflows is a complex problem since any delay in a task can cause a ripple effect on the remaining tasks from the parent workflow as well as from the other interrelated workflows. In addition, different types of loops are often embedded in the workflows for representing iterative activities, and presence of such control flow patterns in workflows can further increase the difficulty in estimation of task completion time. In this chapter, the authors describe a critical path based approach for predicting temporal exceptions in concurrent workflows that are required to share limited resources. This approach allows predicting temporal exceptions in multiple attempts while workflows are being executed. The accuracy of the proposed prediction algorithm is analyzed based on a number of simulation scenarios. The result shows that the proposed algorithm is effective in predicting exceptions for instances where long duration tasks are scheduled (or executed) at the early phase of the workflow.

Download Full-text

A Detection Method for Abnormal Transactions in E-Commerce Based on Extended Data Flow Conformance Checking

Wireless Communications and Mobile Computing ◽

10.1155/2022/4434714 ◽

2022 ◽

Vol 2022 ◽

pp. 1-14

Author(s):

Yadi Wang ◽

Wangyang Yu ◽

Peng Teng ◽

Guanjun Liu ◽

Dongming Xiang

Keyword(s):

Anomaly Detection ◽

Business Processes ◽

Data Flow ◽

Detection Algorithm ◽

Control Flow ◽

Smart Devices ◽

Integration Model ◽

Whole Process ◽

Mobile Transaction ◽

And Control

With the development of smart devices and mobile communication technologies, e-commerce has spread over all aspects of life. Abnormal transaction detection is important in e-commerce since abnormal transactions can result in large losses. Additionally, integrating data flow and control flow is important in the research of process modeling and data analysis since it plays an important role in the correctness and security of business processes. This paper proposes a novel method of detecting abnormal transactions via an integration model of data and control flows. Our model, called Extended Data Petri net (DPNE), integrates the data interaction and behavior of the whole process from the user logging into the e-commerce platform to the end of the payment, which also covers the mobile transaction process. We analyse the structure of the model, design the anomaly detection algorithm of relevant data, and illustrate the rationality and effectiveness of the whole system model. Through a case study, it is proved that each part of the system can respond well, and the system can judge each activity of every mobile transaction. Finally, the anomaly detection results are obtained by some comprehensive analysis.

Download Full-text

Improving the performance of process discovery algorithms by instance selection

Computer Science and Information Systems ◽

10.2298/csis200127028s ◽

2020 ◽

Vol 17 (3) ◽

pp. 927-958

Author(s):

Mohammadreza Sani ◽

Sebastiaan van Zelst ◽

Aalst van der

Keyword(s):

Process Model ◽

Business Processes ◽

Process Models ◽

Instance Selection ◽

Event Data ◽

Process Discovery ◽

Selection Strategies ◽

Speed Up ◽

The Right ◽

Discovery Algorithms

Process discovery algorithms automatically discover process models based on event data that is captured during the execution of business processes. These algorithms tend to use all of the event data to discover a process model. When dealing with large event logs, it is no longer feasible using standard hardware in limited time. A straightforward approach to overcome this problem is to down-size the event data by means of sampling. However, little research has been conducted on selecting the right sample, given the available time and characteristics of event data. This paper evaluates various subset selection methods and evaluates their performance on real event data. The proposed methods have been implemented in both the ProM and the RapidProM platforms. Our experiments show that it is possible to considerably speed up discovery using instance selection strategies. Furthermore, results show that applying biased selection of the process instances compared to random sampling will result in simpler process models with higher quality.

Download Full-text