Mint: MDL-based approach for Mining INTeresting Numerical Pattern Sets

Data Mining and Knowledge Discovery ◽

10.1007/s10618-021-00799-9 ◽

2021 ◽

Author(s):

Tatiana Makhalova ◽

Sergei O. Kuznetsov ◽

Amedeo Napoli

Keyword(s):

Data Mining ◽

Pattern Mining ◽

Research Area ◽

Subgroup Discovery ◽

MDL Principle

Pattern mining is well established in data mining research, especially for mining binary datasets. Surprisingly, there is much less work on numerical pattern mining, and this research area remains under-explored. In this paper we propose Mint, an efficient MDL-based algorithm for mining numerical datasets. The MDL principle is a robust and reliable framework widely used in pattern mining as well as in subgroup discovery. In Mint we reuse MDL to discover useful patterns and return a set of non-redundant overlapping patterns with well-defined boundaries that cover meaningful groups of objects. Mint is not alone in the category of numerical pattern miners based on MDL. In the experiments presented in the paper we show that Mint outperforms its competitors, including IPD, RealKrimp, and Slim.


Deep active reinforcement learning for privacy preserve data mining in 5G environments

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-219262 ◽

2021 ◽

pp. 1-8

Author(s):

Usman Ahmed ◽

Jerry Chun-Wei Lin ◽

Gautam Srivastava ◽

Hsing-Chung Chen

Keyword(s):

Data Mining ◽

Active Learning ◽

Private Information ◽

Pattern Mining ◽

Research Area ◽

High Dimensional ◽

Data Sets ◽

Sensitive Information ◽

Using Data ◽

Transactional Data

Finding frequent patterns identifies the most important patterns in data sets. Due to the huge and high-dimensional nature of transactional data, classical pattern mining techniques suffer from the limitations of dimensions and data annotations. In recent decades, privacy-preserving data mining (PPDM) has come to be considered an important research area. Information privacy is a tradeoff that must be considered when using data. For many years, PPDM has relied on methods that are mostly based on heuristics, and deletion operations were used to hide sensitive information. In this study, we used deep active learning to hide sensitive operations and protect private information. This paper combines entropy-based active learning with an attention-based approach to effectively detect sensitive patterns. The constructed models are then validated using high-dimensional transactional data with attention-based and active learning methods in a reinforcement environment. The results show that the proposed model can support and improve the decision boundaries by increasing the number of training instances through the use of a pooling technique and an entropy uncertainty measure. The proposed paradigm can achieve cleanup by hiding sensitive items and avoiding non-sensitive items. The model outperforms greedy, genetic, and particle swarm optimization approaches.
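
The abstract above mentions pool-based selection with an entropy uncertainty measure. As a rough illustration only, and not the authors' model, the sketch below ranks unlabeled pool items by the Shannon entropy of their predicted class probabilities and picks the most uncertain ones. The function names and the toy probabilities are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_most_uncertain(pool_probs, k):
    """Return the indices of the k pool items with the highest predictive entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical predicted class probabilities for four unlabeled items.
pool_probs = [[0.5, 0.5], [0.9, 0.1], [0.99, 0.01], [0.6, 0.4]]

# The two items whose predictions are closest to uniform are selected for labeling.
picked = select_most_uncertain(pool_probs, 2)
```

In an active learning loop, the selected items would be labeled and added to the training set before the model is retrained.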


A qualitative survey on frequent subgraph mining

Open Computer Science ◽

10.1515/comp-2018-0018 ◽

2018 ◽

Vol 8 (1) ◽

pp. 194-209 ◽

Cited By ~ 1

Author(s):

Büsra Güvenoglu ◽

Belgin Ergenç Bostanoglu

Keyword(s):

Data Mining ◽

Graph Mining ◽

Research Area ◽

Heterogeneous Data ◽

Graph Representation ◽

Frequent Subgraph Mining ◽

Subgraph Mining ◽

Frequent Subgraph ◽

Input Type ◽

Frequent Subgraphs

Data mining is a popular research area that has been studied by many researchers and focuses on finding unforeseen and important information in large databases. Graphs are one of the popular data structures used to represent large heterogeneous data in the field of data mining, so graph mining is one of the most popular subdivisions of data mining. Subgraphs that occur in a database more frequently than a user-defined threshold are called frequent subgraphs. Frequent subgraphs can give important information about the database. Using this information, data can be classified, clustered and indexed. The purpose of this survey is (i) to examine frequent subgraph mining algorithms in terms of the phases of the frequent subgraph discovery process, such as candidate generation and frequency calculation, (ii) to categorize the algorithms according to their general attributes, such as input type, dynamicity of graphs, result type, the algorithmic approach they are based on, algorithmic design and graph representation, and (iii) to discuss the performance of the algorithms in comparison to each other and the challenges they have recently faced.


A study on sequential pattern mining on chemical information

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.33.14828 ◽

2018 ◽

Vol 7 (3.3) ◽

pp. 532

Author(s):

S Sathya ◽

N Rajendran

Keyword(s):

Data Mining ◽

Chemical Bonding ◽

Sequential Analysis ◽

Pattern Mining ◽

Fundamental Problem ◽

Research Work ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Graph Representation ◽

Chemical Information

Data mining (DM) is used to extract useful and non-trivial information from the large amounts of data collected in many diverse fields. Data mining derives explanations through clustering visualization, association, and sequential analysis. Chemical compounds are well-defined structures compressed by a graph representation. Chemical bonding is the association of atoms into molecules, ions, crystals and other stable species which frame the common substances in chemical information. However, large-scale sequential data poses fundamental problems in data mining, such as higher classification time and bonding time, across many applications. In this work, chemical structured index bonding is used for sequential pattern mining. Our research work helps to evaluate the structural patterns of chemical bonding in chemical information data sets.


The comparative study of text documents clustering algorithms

Environment Conservation Journal ◽

10.36953/ecj.2015.se1614 ◽

2015 ◽

Vol 16 (SE) ◽

pp. 133-138

Author(s):

Mohammad Eiman Jamnezhad ◽

Reza Fattahi

Keyword(s):

Data Mining ◽

DNA Analysis ◽

Clustering Algorithms ◽

Research Area ◽

Large Set ◽

Text Documents ◽

Web Documents ◽

Significant Research ◽

The Comparative Study ◽

F Measure

Clustering is one of the most significant research areas in the field of data mining and is considered an important tool in the fast-developing era of information explosion. Clustering systems are used more and more often in text mining, especially for analyzing texts and extracting the knowledge they contain. Data are grouped into clusters in such a way that the data in the same group are similar and those in other groups are dissimilar. Clustering aims to maximize intra-class similarity and minimize inter-class similarity. It is useful for obtaining interesting patterns and structures from a large set of data, and it can be applied in many areas, namely DNA analysis, marketing studies, web documents, and classification. This paper aims to study and compare three text document clustering algorithms, namely k-means, k-medoids, and SOM, using the F-measure.
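
The abstract above compares clustering algorithms through the F-measure but does not say which variant is used. As an assumption for illustration, the sketch below implements one common class-weighted variant: each true class is matched to the cluster that gives it the best F-score, and the scores are averaged with weights proportional to class size. The function name and the toy labels are hypothetical.

```python
from collections import Counter

def clustering_f_measure(labels_true, labels_pred):
    """Class-weighted F-measure of a clustering against reference classes."""
    n = len(labels_true)
    class_sizes = Counter(labels_true)
    cluster_sizes = Counter(labels_pred)
    # Number of items shared by each (class, cluster) pair.
    overlap = Counter(zip(labels_true, labels_pred))

    total = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = overlap[(c, k)]
            if n_ck == 0:
                continue
            precision = n_ck / n_k
            recall = n_ck / n_c
            best = max(best, 2 * precision * recall / (precision + recall))
        # Weight the best F-score of each class by the class's share of the data.
        total += (n_c / n) * best
    return total

# Hypothetical example: four documents from two classes.
perfect = clustering_f_measure(["a", "a", "b", "b"], [1, 1, 0, 0])
merged = clustering_f_measure(["a", "a", "b", "b"], [0, 0, 0, 0])
```

A clustering that reproduces the classes exactly scores 1, while merging both classes into a single cluster lowers the score because precision drops for every class.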


Subgroup Discovery for Election Analysis: A Case Study in Descriptive Data Mining

Discovery Science - Lecture Notes in Computer Science ◽

10.1007/978-3-642-16184-1_5 ◽

2010 ◽

pp. 57-71 ◽

Cited By ~ 12

Author(s):

Henrik Grosskreutz ◽

Mario Boley ◽

Maike Krause-Traudes

Keyword(s):

Data Mining ◽

Subgroup Discovery ◽

Descriptive Data


Comparison of Linguistic Summaries and Fuzzy Functional Dependencies Related to Data Mining

Advances in Data Mining and Database Management - Biologically-Inspired Techniques for Knowledge Discovery and Data Mining ◽

10.4018/978-1-4666-6078-6.ch008 ◽

2014 ◽

pp. 174-203 ◽

Cited By ~ 4

Author(s):

Miroslav Hudec ◽

Miljan Vučetić ◽

Mirko Vujošević

Keyword(s):

Data Mining ◽

Fuzzy Logic ◽

Relational Databases ◽

Missing Values ◽

Expert Knowledge ◽

Real Data ◽

Research Area ◽

Functional Dependencies ◽

Useful Knowledge ◽

Important Research Area

Data mining methods based on fuzzy logic have been developed recently and have become an increasingly important research area. In this chapter, the authors examine possibilities for discovering potentially useful knowledge from relational databases by integrating fuzzy functional dependencies and linguistic summaries. Both methods use fuzzy logic tools for data analysis and for the acquisition and representation of expert knowledge. Fuzzy functional dependencies can detect whether a dependency exists between two examined attributes across the whole database. If a dependency exists only between parts of the examined attributes' domains, fuzzy functional dependencies cannot detect its character. Linguistic summaries are a convenient method for revealing this kind of dependency. Using fuzzy functional dependencies and linguistic summaries in a complementary way could mine valuable information from relational databases. Mining intensities of dependencies between database attributes could support decision making, reduce the number of attributes in databases, and estimate missing values. The proposed approach is evaluated with case studies using real data from the official statistics. Strengths and weaknesses of the described methods are discussed. At the end of the chapter, topics for further research activities are outlined.


Educational Data Mining

Engineering Education Trends in the Digital Era - Advances in Higher Education and Professional Development ◽

10.4018/978-1-7998-2562-3.ch004 ◽

2020 ◽

pp. 70-82

Author(s):

Aslıhan Tüfekci ◽

Esra Ayça Güzeldereli Yılmaz

Keyword(s):

Data Mining ◽

Learning Styles ◽

Educational Data Mining ◽

Research Area ◽

Point Of View ◽

Primary Sources ◽

Systematic Mapping Study ◽

Input Output ◽

Learning Practices ◽

Process Elements

The education-training process and all activities related to it have the power to direct the future of societies. From this point of view, the process should be analyzed frequently in terms of input, output, and other process elements. Educational data mining is a multidisciplinary research area that develops methods and techniques for discovering data derived from various information systems used in education. It contributes to the understanding of the learning styles of learners and enables data-driven decision making to develop existing learning practices and learning materials. The number of academic and technical studies on educational data mining is on the rise, and this has led to the need to systematically categorize the existing practices. This systematic mapping study was conducted to provide an overview of the current work on educational data mining, and its results are based on 153 primary sources including journal papers, articles published in magazines, conference and symposium papers, theses, and others.


Applications of Pattern Discovery Using Sequential Data Mining

Pattern Discovery Using Sequence Data Mining ◽

10.4018/978-1-61350-056-9.ch001 ◽

2012 ◽

pp. 1-23 ◽

Cited By ~ 8

Author(s):

Manish Gupta ◽

Jiawei Han

Keyword(s):

Data Mining ◽

Text Mining ◽

Intrusion Detection ◽

Pattern Mining ◽

Pattern Discovery ◽

Sequential Pattern Mining ◽

Web Usage Mining ◽

Sequential Pattern ◽

Sequential Data ◽

Mining Methods

Sequential pattern mining methods have been found to be applicable in a large number of domains. Sequential data is omnipresent. Sequential pattern mining methods have been used to analyze this data and identify patterns. Such patterns have been used to implement efficient systems that can recommend based on previously observed patterns, help in making predictions, improve usability of systems, detect events, and in general help in making strategic product decisions. In this chapter, we discuss the applications of sequential data mining in a variety of domains like healthcare, education, Web usage mining, text mining, bioinformatics, telecommunications, intrusion detection, et cetera. We conclude with a summary of the work.


Data Mining

Encyclopedia of Information Science and Technology, Second Edition ◽

10.4018/978-1-60566-026-4.ch147 ◽

2011 ◽

pp. 921-926

Author(s):

Sherry Y. Chen ◽

Xiaohui Liu

Keyword(s):

Data Mining ◽

Electronic Commerce ◽

New Technologies ◽

Research Area ◽

Future Research ◽

Data Mining Techniques ◽

Wide Range ◽

Degree Of Confidence ◽

Common Application ◽

Existing Data

There is an explosion in the amount of data that organizations generate, collect, and store. Organizations are gradually relying more on new technologies to access, analyze, summarize, and interpret information intelligently. Data mining, therefore, has become a research area with increased importance (Amaratunga & Cabrera, 2004). Data mining is the search for valuable information in large volumes of data (Hand, Mannila, & Smyth, 2001). It can discover hidden relationships, patterns, and interdependencies and generate rules to predict the correlations, which can help organizations make critical decisions faster or with a greater degree of confidence (Gargano & Ragged, 1999). There is a wide range of data mining techniques, which have been successfully used in many applications. This article is an attempt to provide an overview of existing data mining applications. The article begins by explaining the key tasks that data mining can achieve. It then moves on to discuss the application domains that data mining can support. The article identifies three common application domains: bioinformatics, electronic commerce, and search engines. For each domain, it describes how data mining can enhance the functions. Subsequently, the limitations of current research are addressed, followed by a discussion of directions for future research.


Clustering of Time Series Data

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch042 ◽

2011 ◽

pp. 258-263

Author(s):

Anne Denton

Keyword(s):

Data Mining ◽

Time Series ◽

Pattern Mining ◽

Time Series Data ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Series Data ◽

Science And Engineering ◽

Data Mining Algorithms ◽

Mining Algorithms

Time series data is of interest to most science and engineering disciplines, and analysis techniques have been developed for hundreds of years. There have, however, in recent years been new developments in data mining techniques, such as frequent pattern mining, that take a different perspective on data. Traditional techniques were not meant for such pattern-oriented approaches. There is, as a result, a significant need for research that extends traditional time-series analysis, in particular clustering, to the requirements of the new data mining algorithms.
