Introduction of Item Constraints to Discover Characteristic Sequential Patterns

Emerging Perspectives in Big Data Warehousing - Advances in Data Mining and Database Management ◽

10.4018/978-1-5225-5516-2.ch011 ◽

2019 ◽

pp. 279-292

Author(s):

Shigeaki Sakurai

Keyword(s):

Background Knowledge ◽

Structured Data ◽

Sequential Patterns ◽

Sequential Data ◽

Special Case ◽

Attribute Value

This chapter introduces a method that discovers characteristic sequential patterns from sequential data based on background knowledge. The sequential data is composed of sequences of items. This chapter focuses on sequential data based on tabular structured data; that is, each item is composed of an attribute and an attribute value. This chapter also focuses on item constraints as a way of describing the background knowledge. The constraints describe the combinations of items included in sequential patterns and can represent the interests of analysts. Therefore, analysts can easily discover sequential patterns that coincide with their interests as characteristic sequential patterns. In addition, this chapter focuses on a special case of item constraints, in which the constraint is placed on the last item of the sequential patterns. The discovered patterns can be used to analyze causes and reasons, and can predict the last item when the preceding subsequence is given. This chapter introduces the properties of item constraints on the last item.
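The following Python sketch illustrates the last-item special case under simplifying assumptions that are not taken from the chapter: each sequence element is a single (attribute, attribute value) item, support is the fraction of sequences containing a pattern, and patterns are enumerated by naive prefix growth. The helper names (`is_subsequence`, `support`, `last_item_constrained_patterns`) and the toy data are hypothetical; this is not the chapter's discovery algorithm.

```python
from typing import Dict, List, Sequence, Set, Tuple

Item = Tuple[str, str]  # (attribute, attribute value), as in tabular structured data


def is_subsequence(pattern: Sequence[Item], seq: Sequence[Item]) -> bool:
    """True if the items of pattern appear in seq in the same order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)  # `in` consumes the iterator up to the match


def support(pattern: Sequence[Item], db: List[Sequence[Item]]) -> float:
    """Fraction of sequences in db that contain pattern."""
    return sum(is_subsequence(pattern, s) for s in db) / len(db)


def last_item_constrained_patterns(
    db: List[Sequence[Item]], min_sup: float, last_items: Set[Item], max_len: int = 3
) -> Dict[Tuple[Item, ...], float]:
    """Frequent patterns whose final item belongs to last_items (hypothetical helper)."""
    items = sorted({item for seq in db for item in seq})
    results: Dict[Tuple[Item, ...], float] = {}
    frontier: List[List[Item]] = [[]]  # frequent prefixes of the current length
    for _ in range(max_len):
        next_frontier = []
        for prefix in frontier:
            for item in items:
                candidate = prefix + [item]
                sup = support(candidate, db)
                if sup < min_sup:
                    continue  # Apriori property: no extension of it can be frequent
                next_frontier.append(candidate)
                if item in last_items:  # the item constraint inspects only the last position
                    results[tuple(candidate)] = sup
        frontier = next_frontier
    return results
```

Because the constraint only inspects the final position, prefixes are still grown and pruned by support alone, and the constraint filters what is reported. Given an observed subsequence such as `(("temp", "high"),)`, the reported patterns that extend it suggest candidate last items, ranked by support.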

Discovery of Characteristic Sequential Patterns Based on Two Types of Constraints

International Journal of Extreme Automation and Connectivity in Healthcare ◽

10.4018/ijeach.2019010105 ◽

2019 ◽

Vol 1 (1) ◽

pp. 40-54

Author(s):

Shigeaki Sakurai

Keyword(s):

Pattern Discovery ◽

Evaluation Criteria ◽

Background Knowledge ◽

Structured Data ◽

Sequential Patterns ◽

Time Constraints ◽

Sequential Data ◽

Healthcare Data ◽

Discovery Method ◽

Attribute Value

This article proposes a method for discovering characteristic sequential patterns from sequential data by using background knowledge. In tabular structured data, each item is composed of an attribute and an attribute value. This article focuses on two types of constraints that describe background knowledge. The first is time constraints, which can flexibly describe time-related relationships between items. The second is item constraints, which select the items included in sequential patterns. Together, these constraints can represent background knowledge reflecting the interests of analysts, so analysts can easily discover sequential patterns that coincide with those interests as characteristic sequential patterns. Lastly, this article verifies the effect of the pattern discovery method based on both the evaluation criteria of sequential patterns and the background knowledge. The method can be applied to the analysis of healthcare data.
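A minimal sketch of how the two constraint types could be checked together, assuming (as one common formulation, not necessarily the article's) that a time constraint bounds the gap between consecutive matched items by `min_gap` and `max_gap`, and that an item constraint is a set of items a pattern must contain. Events are hypothetical `(timestamp, (attribute, value))` pairs sorted by time, and all function names are illustrative.

```python
from typing import List, Optional, Sequence, Set, Tuple

Item = Tuple[str, str]       # (attribute, attribute value)
Event = Tuple[float, Item]   # (timestamp, item); each event list is sorted by time


def occurs_with_gaps(pattern: Sequence[Item], events: List[Event],
                     min_gap: float, max_gap: float) -> bool:
    """True if pattern occurs in events with every consecutive gap in [min_gap, max_gap]."""
    def match(p_idx: int, start: int, prev_time: Optional[float]) -> bool:
        if p_idx == len(pattern):
            return True
        for j in range(start, len(events)):
            t, item = events[j]
            if prev_time is not None and t - prev_time > max_gap:
                break  # events are time-ordered, so every later gap is larger
            if item != pattern[p_idx]:
                continue
            if prev_time is not None and t - prev_time < min_gap:
                continue
            if match(p_idx + 1, j + 1, t):  # backtrack if the rest cannot be matched
                return True
        return False
    return match(0, 0, None)


def satisfies_item_constraint(pattern: Sequence[Item], required: Set[Item]) -> bool:
    """Item constraint: the pattern must contain every required item."""
    return required.issubset(pattern)


def constrained_support(pattern: Sequence[Item], db: List[List[Event]],
                        min_gap: float, max_gap: float,
                        required: Set[Item]) -> Optional[float]:
    """Support under both constraints, or None if the item constraint rejects the pattern."""
    if not satisfies_item_constraint(pattern, required):
        return None
    return sum(occurs_with_gaps(pattern, ev, min_gap, max_gap) for ev in db) / len(db)
```

Checking the item constraint first avoids scanning the database for patterns the analyst has already ruled out.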

Semi-Supervised Classification via Hypergraph Convolutional Extreme Learning Machine

Applied Sciences ◽

10.3390/app11093867 ◽

2021 ◽

Vol 11 (9) ◽

pp. 3867

Author(s):

Zhewei Liu ◽

Zijia Zhang ◽

Yaoming Cai ◽

Yilin Miao ◽

Zhikun Chen

Keyword(s):

Extreme Learning Machine ◽

Supervised Classification ◽

Structured Data ◽

Model Complex ◽

Euclidean Domain ◽

Noise Data ◽

Data Points ◽

Learning Machine ◽

Random Hypergraph ◽

Special Case

Extreme Learning Machine (ELM) is characterized by simplicity, generalization ability, and computational efficiency. However, previous ELMs fail to consider the inherent high-order relationships among data points, which makes them ineffective on structured data and poorly robust to noisy data. This paper presents a novel semi-supervised ELM, termed Hypergraph Convolutional ELM (HGCELM), which uses hypergraph convolution to extend ELM into the non-Euclidean domain. The method inherits all the advantages of ELM and consists of a random hypergraph convolutional layer followed by a hypergraph convolutional regression layer, enabling it to model complex intraclass variations. We show that the traditional ELM is a special case of the HGCELM model in the regular Euclidean domain. Extensive experimental results show that HGCELM remarkably outperforms eight competitive methods on 26 classification benchmarks.
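To make the special-case claim concrete, here is a small NumPy sketch. It assumes an HGNN-style normalized propagation matrix `Dv^(-1/2) H W De^(-1) H^T Dv^(-1/2)` with unit hyperedge weights by default, and a ridge-regularized least-squares output layer; the exact HGCELM formulation may differ, and the function names are hypothetical. With an identity incidence matrix (every point forms its own hyperedge) the propagation matrix is the identity, and the model reduces to a plain ELM with a random `tanh` hidden layer.

```python
import numpy as np


def hypergraph_operator(incidence: np.ndarray, edge_weights=None) -> np.ndarray:
    """HGNN-style propagation matrix Dv^-1/2 H W De^-1 H^T Dv^-1/2 (assumed form).

    Assumes every vertex belongs to at least one hyperedge (non-zero degree).
    """
    n_edges = incidence.shape[1]
    w = np.ones(n_edges) if edge_weights is None else np.asarray(edge_weights, dtype=float)
    dv = incidence @ w                    # weighted vertex degrees
    de = incidence.sum(axis=0)            # hyperedge degrees (vertices per edge)
    dv_inv_sqrt = np.diag(dv ** -0.5)
    return dv_inv_sqrt @ incidence @ np.diag(w / de) @ incidence.T @ dv_inv_sqrt


def hgc_elm_predict(X, Y, labeled, A, n_hidden=64, reg=1e-2, seed=0):
    """Transductive sketch: random convolutional hidden layer + ridge regression on labeled nodes."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random and never trained (the ELM idea)
    b = rng.standard_normal(n_hidden)
    H = np.tanh(A @ X @ W + b)                       # random hypergraph convolutional layer
    Z = A @ H                                        # convolve again before the regression layer
    Zl = Z[labeled]
    beta = np.linalg.solve(Zl.T @ Zl + reg * np.eye(n_hidden), Zl.T @ Y[labeled])
    return Z @ beta                                  # class scores for every node
```

Only the output weights `beta` are solved for; `W` and `b` stay random, which is where ELM's efficiency comes from. Propagation uses all nodes while the regression uses only the labeled rows, which is what makes the sketch semi-supervised.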

An Abstract System for Converting and Recovering Texts like Structured Information

10.21203/rs.3.rs-906361/v1 ◽

2021 ◽

Author(s):

Edgardo Samuel Barraza Verdesoto ◽

Richard de Jesus Gil Herrera ◽

Marlly Yaneth Rojas Ortiz

Keyword(s):

Structured Data ◽

Spanish Language ◽

Scientific Models ◽

Algebraic Structures ◽

Abstract System ◽

Language Generation ◽

Web Contents ◽

Structured Information ◽

The Brain ◽

Special Case

This paper introduces an abstract system for converting texts into structured information. The proposed architecture incorporates several strategies based on scientific models of how the brain records and recovers memories, as well as approaches that convert texts into structured data. The applications of this proposal are vast because, in general, information expressed as text, such as reports, emails, and web contents, is considered unstructured, and SQL-based repositories therefore cannot deal with this kind of data efficiently. The model on which this proposal is based divides a sentence into clusters of words, which in turn are transformed into members of a taxonomy of algebraic structures. These algebraic structures must satisfy the properties of Abelian groups. Methodologically, an incremental prototyping approach was applied to develop a satisfactory architecture that can be adapted to any language. A special case, the Spanish language, is studied. The developed abstract system is a framework for implementing applications that convert unstructured textual information into structured information, which can be useful in contexts such as Natural Language Generation, Data Mining, and the dynamic generation of theories, among others.

Data Pattern Tutor for AprioriAll and PrefixSpan

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch083 ◽

2011 ◽

pp. 531-537

Author(s):

Mohammed Alshalalfa

Keyword(s):

Data Mining ◽

Computer Literacy ◽

User Study ◽

Sequential Patterns ◽

Sequential Data ◽

Educational Value ◽

Data Mining Algorithms ◽

New Meanings ◽

Mining Works ◽

Mining Algorithms

Data mining can be described as data processing that uses sophisticated data search capabilities and statistical algorithms to discover patterns and correlations in large pre-existing databases (Agrawal & Srikant 1995; Zhao & Sourav 2003). From these patterns, new and important information can be obtained, leading to the discovery of new meanings that can then be translated into enhancements in many current fields. In this paper, we focus on the usability of sequential data mining algorithms. A user study we conducted indicates that many of these algorithms are difficult to comprehend. Our goal is to build an interface that acts as a “tutor” to help users better understand how data mining works. We consider two of the algorithms most commonly used by our students for discovering sequential patterns, namely AprioriAll and PrefixSpan. We hope the tool has educational value and can be used as a teaching aid for comprehending data mining algorithms. We concentrated our effort on making the user interface easy to use for naïve end users with minimal computer literacy; the interface is intended for beginners. This should help the tool reach a wider audience.
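For readers who want to see what PrefixSpan does mechanically, here is a simplified Python sketch. It assumes every sequence element is a single item (the original algorithm by Pei et al. also handles itemsets) and uses an absolute `min_count` threshold. It repeatedly counts items in a projected database of suffixes and recurses on each frequent item; the function name is illustrative.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def prefixspan(db: List[Sequence[str]], min_count: int) -> Dict[Tuple[str, ...], int]:
    """Simplified PrefixSpan for sequences of single items (no itemsets)."""
    patterns: Dict[Tuple[str, ...], int] = {}

    def mine(prefix: Tuple[str, ...], projected: List[List[str]]) -> None:
        # Count each item at most once per projected sequence (sequence-level support).
        counts = Counter(item for suffix in projected for item in set(suffix))
        for item, count in sorted(counts.items()):
            if count < min_count:
                continue
            new_prefix = prefix + (item,)
            patterns[new_prefix] = count
            # Project: keep what follows the first occurrence of item in each suffix.
            new_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(new_prefix, new_projected)

    mine((), [list(s) for s in db])
    return patterns
```

Unlike AprioriAll, no candidate patterns are generated and then tested; each pattern is grown only from items that actually occur in its projected database.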

Mining of Sequential Patterns using Directed Graphs

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k2242.0981119 ◽

2019 ◽

Vol 8 (11) ◽

pp. 4002-4007

Keyword(s):

Pattern Mining ◽

Directed Graphs ◽

Real Life ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Sequential Patterns ◽

Sequential Data ◽

Sequence Database ◽

Directed Paths ◽

Digraph Model

Sequential pattern mining is one of the important functionalities of data mining. It is used to analyze sequential databases and discover sequential patterns, focusing on extracting interesting subsequences from a set of sequences. Various factors, such as frequency of occurrence, length, and profit, are used to define the interestingness of a subsequence derived from the sequence database. Sequential pattern mining has abundant real-life applications, since data in many fields, such as bioinformatics, e-learning, market basket analysis, texts, and webpage click-stream analysis, is naturally encoded as sequences of symbols. A large variety of efficient algorithms, such as PrefixSpan, GSP, and FreeSpan, have been proposed in the past few years. In this paper, we propose a data model for organizing the sequential database, which consists of a directed graph DGS (cycles and multiple edges are allowed) and an organization of directed paths in DGS that represents the sequential data, for discovering sequential patterns from a sequence database. Efficient algorithms are proposed for constructing the digraph model (DGS), extracting all sequential patterns, and mining association rules. A number of theoretical parameters of the digraph model are also introduced, which lead to a better understanding of the problem.
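The abstract does not specify how DGS is built, so the following Python sketch is only a hypothetical reading of the idea: items become vertices, each sequence becomes a directed path of consecutive-item edges, repeated transitions are kept as parallel edges, and revisiting an item creates a cycle. Pattern support is then counted by walking each sequence's path. All names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]


def build_dgs(db: List[Sequence[str]]):
    """Turn each sequence into a directed path over item vertices.

    Returns (edge_occurrences, paths): edge_occurrences maps each directed edge to
    its (sequence id, position) occurrences, so repeated transitions act as
    parallel edges; paths maps each sequence id to its ordered list of edges.
    """
    edge_occurrences: Dict[Edge, List[Tuple[int, int]]] = defaultdict(list)
    paths: Dict[int, List[Edge]] = {}
    for sid, seq in enumerate(db):
        path = list(zip(seq, seq[1:]))  # consecutive items become directed edges
        for pos, edge in enumerate(path):
            edge_occurrences[edge].append((sid, pos))
        paths[sid] = path
    return edge_occurrences, paths


def edge_support(edge_occurrences, edge: Edge) -> int:
    """Number of distinct sequences whose path traverses the edge."""
    return len({sid for sid, _ in edge_occurrences.get(edge, [])})


def pattern_support(paths, pattern: Sequence[str]) -> int:
    """Number of paths that visit the pattern's items in order (gaps allowed)."""
    count = 0
    for path in paths.values():
        if not path:
            continue  # single-item sequences have no edges in this sketch
        vertices = iter([path[0][0]] + [v for _, v in path])
        if all(item in vertices for item in pattern):
            count += 1
    return count
```

For example, the sequence `a, b, a, b` yields the edge `(a, b)` twice (parallel edges) and the cycle `a -> b -> a`.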

A Review of Kernel Methods Based Approaches to Classification and Clustering of Sequential Patterns, Part I

Pattern Discovery Using Sequence Data Mining ◽

10.4018/978-1-61350-056-9.ch002 ◽

2012 ◽

pp. 24-50

Author(s):

Dileep A. D. ◽

Veena T. ◽

C. Chandra Sekhar

Keyword(s):

Pattern Classification ◽

Kernel Methods ◽

Pattern Analysis ◽

Kernel Functions ◽

Sequential Pattern ◽

Sequential Patterns ◽

Sequential Data ◽

Feature Vectors ◽

Varying Length ◽

Classification And Clustering

Sequential data mining involves analysis of sequential patterns of varying length. Sequential pattern analysis is important for pattern discovery from sequences of discrete symbols as in bioinformatics and text analysis, and from sequences or sets of continuous valued feature vectors as in processing of audio, speech, music, image, and video data. Pattern analysis techniques using kernel methods have been explored for static patterns as well as sequential patterns. The main issue in sequential pattern analysis using kernel methods is the design of a suitable kernel for sequential patterns of varying length. Kernel functions designed for sequential patterns are known as dynamic kernels. In this chapter, we present a brief description of kernel methods for pattern classification and clustering. Then we describe dynamic kernels for sequences of continuous feature vectors. We then present a review of approaches to sequential pattern classification and clustering using dynamic kernels.
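As a concrete instance of a dynamic kernel, the sketch below averages an RBF base kernel over all pairs of frames from two sequences of feature vectors, which yields a valid (positive semidefinite) kernel for sequences of any length, since it is an inner product of mean feature-space embeddings. It is one of the simplest such constructions and is not claimed to be among those reviewed in the chapter; `gamma` and the function names are assumptions.

```python
import numpy as np


def rbf_matrix(A: np.ndarray, B: np.ndarray, gamma: float) -> np.ndarray:
    """RBF base kernel between every frame of A (n x d) and every frame of B (m x d)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)


def mean_match_kernel(A: np.ndarray, B: np.ndarray, gamma: float = 0.5) -> float:
    """Average base-kernel value over all frame pairs; lengths of A and B may differ."""
    return float(rbf_matrix(A, B, gamma).mean())


def gram_matrix(seqs, gamma: float = 0.5) -> np.ndarray:
    """Kernel matrix over a list of variable-length sequences of feature vectors."""
    n = len(seqs)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = mean_match_kernel(seqs[i], seqs[j], gamma)
    return K
```

The resulting Gram matrix can be passed to any kernel classifier or clustering method that accepts a precomputed kernel. Averaging discards frame order, which is one reason more elaborate dynamic kernels exist.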

Sequential Pattern Mining from Sequential Data

Handbook of Research on Innovations in Database Technologies and Applications ◽

10.4018/978-1-60566-242-8.ch067 ◽

2009 ◽

pp. 622-631

Author(s):

Shigeaki Sakurai

Keyword(s):

Pattern Mining ◽

Pattern Discovery ◽

Sequential Pattern ◽

The Other ◽

Sequential Patterns ◽

Sequential Data ◽

Frequent Patterns ◽

New Knowledge ◽

Discovery Method ◽

Time Information

Owing to progress in computer and network environments, it is easy to collect data with time information, such as daily business reports, weblog data, and physiological information. In this context, methods for analyzing data with time information have been studied. This chapter focuses on a sequential pattern discovery method for discrete sequential data. The methods proposed by Pei et al. (2001), Srikant & Agrawal (1996), and Zaki (2001) efficiently discover frequent patterns as characteristic patterns. However, the discovered patterns do not always correspond to the interests of analysts, because they are common and are not a source of new knowledge for the analysts. This problem has also been pointed out in connection with the discovery of association rules. Blanchard et al. (2005), Brin et al. (1997), Silberschatz et al. (1996), and Suzuki et al. (2005) propose other criteria in order to discover other kinds of characteristic patterns. The patterns discovered by these criteria are not always frequent, but they are characteristic from the viewpoints of the respective criteria. The criteria may be applicable to the discovery of sequential patterns. However, because these criteria do not satisfy the Apriori property, it is difficult for methods based on them to discover patterns efficiently. On the other hand, methods that use the background knowledge of analysts have been proposed in order to discover sequential patterns corresponding to the interests of analysts (Garofalakis et al., 1999; Pei et al., 2002; Sakurai et al., 2008b; Yen, 2005).
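The Apriori-property point can be checked numerically. The Python sketch below uses support, which never increases when a pattern is extended, and a lift-style ratio (observed support divided by the product of single-item supports), used here only as a hypothetical stand-in for the alternative criteria cited above. On the toy data, `("a", "b")` has lift below 1 while its extension `("a", "b", "c")` has lift above 1, so a lift threshold cannot prune candidates the way a support threshold can.

```python
from math import prod
from typing import List, Sequence


def is_subsequence(pattern: Sequence[str], seq: Sequence[str]) -> bool:
    """True if the items of pattern appear in seq in order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)


def support(pattern: Sequence[str], db: List[Sequence[str]]) -> float:
    """Fraction of sequences containing pattern; anti-monotone under extension."""
    return sum(is_subsequence(pattern, s) for s in db) / len(db)


def lift(pattern: Sequence[str], db: List[Sequence[str]]) -> float:
    """Observed support over the support expected if items occurred independently
    (a hypothetical sequential generalization of association-rule lift)."""
    return support(pattern, db) / prod(support([item], db) for item in pattern)


db = [["a", "b", "c"], ["a"], ["b"], ["a", "x"], ["b", "x"]]
```

Here `support(["a", "b"], db)` and `support(["a", "b", "c"], db)` are both 0.2, yet the lift rises from about 0.56 to about 2.78, because the rare item `c` always follows `a, b` when it occurs.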

Discovery of various sequential patterns within top-k from sequential data

2014 Joint 7th International Conference on Soft Computing and Intelligent Systems (SCIS) and 15th International Symposium on Advanced Intelligent Systems (ISIS) ◽

10.1109/scis-isis.2014.7044644 ◽

2014 ◽

Cited By ~ 2

Author(s):

Shigeaki Sakurai ◽

Minoru Nishizawa

Keyword(s):

Sequential Patterns ◽

Sequential Data

High-throughput Phenotyping with Temporal Sequences

10.1101/590307 ◽

2019 ◽

Author(s):

Hossein Estiri ◽

Zachary H Strasser ◽

Shawn N. Murphy

Keyword(s):

High Throughput ◽

Classification Performance ◽

Sequential Patterns ◽

Sequential Data ◽

Discrete Events ◽

Hybrid Classes ◽

High Throughput Phenotyping ◽

Electronic Health ◽

Using Data ◽

Temporal Sequences

Objective: High-throughput electronic phenotyping algorithms can accelerate translational research using data from electronic health record (EHR) systems. The temporal information buried in EHRs is often underutilized in developing computational phenotypic definitions. The objective of this study is to develop a high-throughput phenotyping method that leverages temporal sequential patterns of discrete events from electronic health records.

Materials and Methods: We develop a representation mining algorithm to extract five classes of representations from EHR diagnosis and medication records: the aggregated vector of the records (AVR), the traditional immediate sequential patterns (SPM), the transitive sequential patterns (tSPM), and two hybrid classes, SPM+AVR and tSPM+AVR. A final small set of representations was selected from each class using the MSMR dimensionality reduction algorithm. Using EHR data on 10 phenotypes from the Mass General Brigham Biobank, we trained regularized logistic regression algorithms, which we validated using labeled data.

Results: Phenotyping with temporal sequences resulted in superior classification performance across all 10 phenotypes compared with the AVR representations conventionally used in electronic phenotyping. Although this study only utilizes diagnosis and medication records, the high-throughput algorithm's classification performance was superior or similar to that of previously published electronic phenotyping algorithms. We characterize and evaluate the top transitive sequences of diagnosis records paired with records of risk factors, symptoms, complications, medications, or vaccinations.

Discussion: The proposed high-throughput phenotyping approach enables seamless discovery of sequential record combinations that may be difficult to infer from raw EHR data. A transitive sequence can offer a more accurate characterization of the phenotype than its individual components. Additionally, the identified transitive sequences of a given phenotype reflect the actual lived experiences of patients with that particular disease.

Conclusion: Sequential data representations provide a precise mechanism for incorporating raw EHR records into downstream machine learning.
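The difference between the immediate (SPM) and transitive (tSPM) representations can be sketched as follows, under the assumption that immediate patterns pair each record with the next one in a patient's time-ordered history, while transitive patterns pair each record with every later record. The record codes and function names are hypothetical, and the MSMR selection and regression steps are omitted.

```python
from itertools import combinations
from typing import Callable, Dict, List, Set, Tuple

Pair = Tuple[str, str]


def immediate_pairs(records: List[str]) -> Set[Pair]:
    """(record, next record) pairs from one patient's time-ordered history."""
    return set(zip(records, records[1:]))


def transitive_pairs(records: List[str]) -> Set[Pair]:
    """(earlier record, later record) pairs, adjacent or not."""
    return set(combinations(records, 2))  # combinations preserves the input order


def featurize(
    patients: Dict[str, List[str]], extractor: Callable[[List[str]], Set[Pair]]
) -> Tuple[List[Pair], List[List[int]]]:
    """Binary patient-by-pattern matrix over a sorted pattern vocabulary."""
    feats = [extractor(records) for records in patients.values()]
    vocab = sorted(set().union(*feats))
    return vocab, [[int(p in fs) for p in vocab] for fs in feats]


patients = {"p1": ["dx1", "med1", "dx2"], "p2": ["dx1", "dx2"]}
```

Patients `p1` and `p2` share the feature `("dx1", "dx2")` only in the transitive representation, because in `p1` the two diagnoses are separated by a medication record.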
