Advances in Classification of Sequence Data

In recent years, advanced information systems have enabled the collection of increasingly large amounts of data that are sequential in nature. The interdisciplinary field of Knowledge Discovery in Databases (KDD) is well suited to analyzing such data. The most important step in the KDD process is data mining, which is concerned with the extraction of valid patterns. Recent research in data mining has focused on stream data mining, sequence data mining, web mining, text mining, visual mining, multimedia mining, and multi-relational data mining. Sequence data may be discrete or continuous. Most research on discrete sequence data has concentrated on discovering frequently occurring patterns; comparatively little work has been carried out on classifying discrete sequence data. This chapter introduces a data taxonomy and reviews the state of the art in sequence data classification. It demonstrates the usefulness of embedding partial subsequence information, extracted with a sliding window technique, into a traditional classifier such as kNN, which is tested with various vector-based distance/similarity metrics. Further, with the S3M similarity metric, the full subsequence information embedded in the data sequences is extracted. The experiments use the DARPA’98 IDS benchmark dataset, collected from the UCIML dataset repository. The chapter closes by pointing out application areas of sequence data and open issues in the sequence data classification problem.
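
The sliding-window idea in this abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: the window width, the cosine similarity, the function names, and the toy system-call traces are all assumptions chosen for illustration. Each sequence is turned into a count vector of its length-`width` windows, and kNN votes among the most similar training sequences.

```python
from collections import Counter
from math import sqrt


def window_counts(seq, width):
    """Count every contiguous subsequence of length `width` in `seq`."""
    return Counter(tuple(seq[i:i + width]) for i in range(len(seq) - width + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_predict(train, query, k=3, width=2):
    """Label `query` by majority vote among its k most similar training sequences."""
    q = window_counts(query, width)
    scored = sorted(
        ((cosine(window_counts(seq, width), q), label) for seq, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy traces, invented for this sketch (not DARPA'98 data).
train = [
    (["open", "read", "close"], "normal"),
    (["open", "read", "read", "close"], "normal"),
    (["open", "write", "exec"], "attack"),
    (["open", "exec", "write", "exec"], "attack"),
]
prediction = knn_predict(train, ["open", "read", "close", "open", "read"], k=1)
```

Because the windows only cover short contiguous runs, this representation captures partial subsequence information; any other vector-based metric could be swapped in for `cosine`.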


Approaches for Pattern Discovery Using Sequential Data Mining

Data Mining ◽ 10.4018/978-1-4666-2455-9.ch095 ◽ 2013 ◽ pp. 1835-1851

Author(s): Manish Gupta ◽ Jiawei Han

Keyword(s): Data Mining ◽ Pattern Mining ◽ Sequence Data ◽ Sequential Pattern Mining ◽ Sequential Pattern ◽ Sequential Data ◽ Stream Data ◽ Dual Representation ◽ Advantages And Disadvantages ◽ Growth Methods

In this chapter we first introduce sequence data. We then discuss the different approaches to mining patterns from sequence data that have been studied in the literature. Apriori-based methods and pattern-growth methods are the earliest and most influential methods for sequential pattern mining. There is also a vertical-format-based method, which works on a dual representation of the sequence database. Work has also been done on mining patterns with constraints, mining closed patterns, mining patterns from multi-dimensional databases, mining closed repetitive gapped subsequences, and other forms of sequential pattern mining. Some works also focus on mining incremental patterns and mining from stream data. We present at least one method of each of these types and discuss their advantages and disadvantages. We conclude with a summary of the work.
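
As a rough illustration of the pattern-growth family mentioned above, the following sketch grows frequent patterns one item at a time over projected suffixes, in the spirit of PrefixSpan. It is a simplified sketch, not the chapter's presentation: the function name is hypothetical, and each sequence element is assumed to be a single item rather than an itemset.

```python
def prefixspan(sequences, min_support):
    """Return {pattern: support} for all frequent sequential patterns.

    Simplification (an assumption of this sketch): each sequence element is a
    single item rather than an itemset.
    """
    results = {}

    def grow(prefix, projected):
        # Count, per item, how many projected suffixes contain it at least once.
        support = {}
        for suffix in projected:
            for item in set(suffix):
                support[item] = support.get(item, 0) + 1
        for item, count in sorted(support.items()):
            if count < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = count
            # Project: keep only what follows the first occurrence of `item`.
            next_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            grow(pattern, next_projected)

    grow((), [list(s) for s in sequences])
    return results


# Toy database of three sequences over the items a, b, c.
db = ["abcb", "acb", "bca"]
patterns = prefixspan(db, min_support=2)
```

Unlike an Apriori-style method, this sketch never generates candidate patterns that do not occur in the data; it only extends a prefix with items that actually appear in its projected database.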


Approaches for Pattern Discovery Using Sequential Data Mining

Pattern Discovery Using Sequence Data Mining ◽ 10.4018/978-1-61350-056-9.ch008 ◽ 2012 ◽ pp. 137-154 ◽ Cited By ~ 4

Author(s): Manish Gupta ◽ Jiawei Han

Keyword(s): Data Mining ◽ Pattern Mining ◽ Sequence Data ◽ Sequential Pattern Mining ◽ Sequential Pattern ◽ Sequential Data ◽ Stream Data ◽ Dual Representation ◽ Advantages And Disadvantages ◽ Growth Methods


Comparative Study of Different Classification Algorithms for Stream Data Mining Using MOA

International Journal of Computer Sciences and Engineering ◽ 10.26438/ijcse/v6i11.614616 ◽ 2018 ◽ Vol 6 (11) ◽ pp. 614-616

Author(s): Ashish P. Joshi ◽ Biraj V. Patel

Keyword(s): Data Mining ◽ Comparative Study ◽ Classification Algorithms ◽ Stream Data ◽ Stream Data Mining


A Clustering Algorithm in Stream Data Using Strong Coreset

Journal of Interconnection Networks ◽ 10.1142/s0219265921430118 ◽ 2021

Author(s): Manmohan Singh ◽ Rajendra Pamula ◽ Alok Kumar

Keyword(s): Data Mining ◽ Clustering Algorithm ◽ Local Optimum ◽ Reduction Algorithm ◽ Stream Data ◽ Stream Data Mining ◽ Clustering Approach ◽ Approximation Guarantee ◽ Competitive Algorithms ◽ Learning Data

Clustering has many applications in machine learning, data mining, data compression, and pattern recognition. Existing techniques such as Lloyd's algorithm (sometimes called k-means) suffer from convergence to a local optimum with no approximation guarantee. To overcome these shortcomings, this paper offers an efficient k-means clustering approach for stream data mining. The coreset is a popular and fundamental concept for k-means clustering in stream data. In each step, the reduction determines a coreset of the inputs and represents the error, where P represents the number of input points, according to the nested property of coresets. Hence, a small reduction in error makes the final coreset n times more accurate. This motivated the authors to propose a new coreset-reduction algorithm. The proposed algorithm was run on the Covertype, Spambase, Census 1990, Bigcross, and Tower datasets. It outperforms competing algorithms such as Streamkm[Formula: see text], BICO (BIRCH meets Coresets for k-means clustering), and BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies).
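
The local-optimum problem attributed to Lloyd's algorithm above can be seen in a small sketch. This is plain Lloyd iteration on one-dimensional data, not the paper's coreset-reduction algorithm; the point weights are included only because coresets are weighted point sets, and the function name and toy data are assumptions of this example.

```python
def lloyd(points, weights, centers, max_iter=100):
    """Weighted Lloyd iterations on 1-D points; returns (centers, cost)."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        groups = [[] for _ in centers]
        for x, w in zip(points, weights):
            j = min(range(len(centers)), key=lambda c: (x - centers[c]) ** 2)
            groups[j].append((x, w))
        # Update step: each center moves to the weighted mean of its group
        # (an empty group leaves its center where it was).
        updated = [
            sum(x * w for x, w in g) / sum(w for _, w in g) if g else c
            for g, c in zip(groups, centers)
        ]
        if updated == centers:
            break
        centers = updated
    cost = sum(
        w * min((x - c) ** 2 for c in centers) for x, w in zip(points, weights)
    )
    return centers, cost


# Three well-separated pairs of points, all with unit weight.
points = [0, 1, 10, 11, 20, 21]
weights = [1] * len(points)

good_centers, good_cost = lloyd(points, weights, [0.5, 10.5, 20.5])
bad_centers, bad_cost = lloyd(points, weights, [0, 1, 15])
```

With the second initialization the iteration converges to centers that split the first pair and merge the other two pairs, at a much higher cost than the first run. Nothing in the iteration itself detects or bounds this gap, which is the lack of approximation guarantee the abstract refers to.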


Analysis of Classification and Clustering based Novel Class Detection Techniques for Stream Data Mining

International Journal of Engineering Research and ◽ 10.17577/ijertv4is100160 ◽ 2015 ◽ Vol V4 (10)

Author(s): Kamini Tandel ◽ Jignasa N. Patel

Keyword(s): Data Mining ◽ Stream Data ◽ Detection Techniques ◽ Stream Data Mining ◽ Classification And Clustering


Stream Data Mining

Encyclopedia of GIS ◽ 10.1007/978-3-319-17885-1_101358 ◽ 2017 ◽ pp. 2212-2212

Keyword(s): Data Mining ◽ Stream Data ◽ Stream Data Mining


Evolving clustering algorithm based on mixture of typicalities for stream data mining

Future Generation Computer Systems ◽ 10.1016/j.future.2020.01.017 ◽ 2020 ◽ Vol 106 ◽ pp. 672-684 ◽ Cited By ~ 3

Author(s): José Maia ◽ Carlos Alberto Severiano ◽ Frederico Gadelha Guimarães ◽ Cristiano Leite de Castro ◽ André Paim Lemos ◽ ...

Keyword(s): Data Mining ◽ Clustering Algorithm ◽ Stream Data ◽ Stream Data Mining


SERA: Selectively recursive approach towards nonstationary imbalanced stream data mining

2009 International Joint Conference on Neural Networks ◽ 10.1109/ijcnn.2009.5178874 ◽ 2009 ◽ Cited By ~ 44

Author(s): Sheng Chen ◽ Haibo He

Keyword(s): Data Mining ◽ Stream Data ◽ Stream Data Mining


A New Similarity Metric for Sequential Data

International Journal of Data Warehousing and Mining ◽ 10.4018/jdwm.2010100102 ◽ 2010 ◽ Vol 6 (4) ◽ pp. 16-32 ◽ Cited By ~ 11

Author(s): Pradeep Kumar ◽ Bapi S. Raju ◽ P. Radha Krishna

Keyword(s): Data Mining ◽ Similarity Measure ◽ Web Mining ◽ Clustering Algorithms ◽ Sequential Data ◽ Similarity Metric ◽ Benchmark Datasets ◽ Similarity Preserving ◽ Sequential Nature ◽ Classification And Clustering

In many data mining applications, both classification and clustering algorithms require a distance/similarity measure. The central problem in similarity-based clustering and classification of sequential data is choosing an appropriate similarity metric. Existing metrics such as Euclidean, Jaccard, and Cosine do not explicitly exploit the sequential nature of the data. In this paper, the authors propose a similarity-preserving function called Sequence and Set Similarity Measure (S3M) that captures both the order of occurrence of items in sequences and the constituent items of sequences. The authors demonstrate the usefulness of the proposed measure for classification and clustering tasks. Experiments were conducted on two benchmark datasets, DARPA’98 and msnbc, for a classification task in intrusion detection and a clustering task in web mining, respectively. Results show the usefulness of the proposed measure.
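
One plausible reading of a measure that combines order and content, as described above, is a weighted sum of a longest-common-subsequence term and a Jaccard term. The sketch below follows that reading. The normalization by the longer sequence, the default weight `p`, and the function names are assumptions of this example, and the paper's exact definition may differ.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def s3m(a, b, p=0.5):
    """Weighted mix of an order-aware term and a set-overlap term; p in [0, 1]."""
    if not a or not b:
        return 0.0
    # Order-aware part: LCS length relative to the longer sequence (assumed).
    sequence_part = lcs_length(a, b) / max(len(a), len(b))
    # Content part: Jaccard similarity of the item sets.
    set_a, set_b = set(a), set(b)
    set_part = len(set_a & set_b) / len(set_a | set_b)
    return p * sequence_part + (1 - p) * set_part


same_order = s3m("abc", "abc")
reversed_order = s3m("abc", "cba")
disjoint = s3m("abc", "xyz")
```

The sequences `"abc"` and `"cba"` contain the same items, so a purely set-based metric such as Jaccard cannot tell them apart, whereas the order-aware term lowers their score here. Setting `p` closer to 1 puts more emphasis on order, and closer to 0 more emphasis on content.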


Dynamic Data Mining

Encyclopedia of Data Warehousing and Mining, Second Edition ◽ 10.4018/978-1-60566-010-3.ch112 ◽ 2011 ◽ pp. 722-728 ◽ Cited By ~ 3

Author(s): Richard Weber

Keyword(s): Data Mining ◽ Knowledge Discovery ◽ Credit Card ◽ Knowledge Discovery In Databases ◽ Stream Data ◽ Dynamic Data ◽ Change Over Time ◽ Professional Activities ◽ Feature Values ◽ Over Time

Since the First KDD Workshop in 1989, when “Knowledge Mining” was recognized as one of the top 5 topics in future database research (Piatetsky-Shapiro 1991), many scientists as well as users in industry and public organizations have considered data mining highly relevant to their professional activities. We have witnessed the development of advanced data mining techniques as well as the successful implementation of knowledge discovery systems in many companies and organizations worldwide. Most of these implementations are static in the sense that they do not explicitly contemplate a changing environment. However, since most analyzed phenomena change over time, the respective systems should be adapted to the new environment in order to provide useful and reliable analyses. Consider, for example, a system for credit card fraud detection: we may want to segment our customers, process the stream data generated by their transactions, and finally classify them according to their fraud probability, where fraud patterns change over time. If our segmentation should group together homogeneous customers using not only their current feature values but also their trajectories, things get even more difficult, since we have to cluster vectors of functions instead of vectors of real values. An example of such a trajectory could be the development of our customers’ number of transactions over the past six months, if such a development tells us more about their behavior than a single value such as the most recent number of transactions. It is in this kind of application that dynamic data mining comes into play. Since data mining is just one step of the iterative KDD (Knowledge Discovery in Databases) process (Han & Kamber, 2001), dynamic elements should also be considered during the other steps.
The entire process consists basically of activities that are performed before data mining (such as selection, pre-processing, and transformation of data (Famili et al., 1997)), the actual data mining part, and subsequent steps (such as interpretation and evaluation of results). In subsequent sections we will present the background of dynamic data mining by studying existing methodological approaches as well as applications already performed, and even patents and tools. Then we will provide the main focus of this chapter by presenting dynamic approaches for each step of the KDD process. Some methodological aspects of dynamic data mining will be presented in more detail. After envisioning future trends in dynamic data mining, we will conclude this chapter.
