Cloud for Distributed Data Analysis Based on the Actor Model

This chapter aims to give a comprehensive view of the links between fuzzy logic and data mining. It is shown that knowledge extracted from simple data sets or huge databases can be represented by fuzzy rule-based expert systems. Both the performance and the interpretability of the mined fuzzy models are of major importance, and effort is required to keep the resulting rule bases small and comprehensible. Therefore, in recent years, soft-computing-based data mining algorithms have been developed for feature selection, feature extraction, model optimization, and model reduction (rule base simplification). Application of these techniques is illustrated using the wine data classification problem. The results illustrate that fuzzy tools can be applied in a synergistic manner through the nine steps of knowledge discovery.
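
As a rough illustration of how a single fuzzy rule can encode mined knowledge, the sketch below evaluates one rule with triangular membership functions and the minimum operator for AND. The feature names, the set boundaries, and the rule itself are hypothetical and are not taken from the chapter or from the wine data.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at x == b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets, given as (left foot, peak, right foot).
HIGH_ALCOHOL = (12.5, 13.5, 14.5)
LOW_ACID = (0.5, 1.5, 2.5)


def rule_strength(alcohol, acid):
    """Firing strength of: IF alcohol is HIGH AND acid is LOW THEN class 1.

    AND is modelled with the minimum, one common choice among several.
    """
    return min(triangular(alcohol, *HIGH_ALCOHOL), triangular(acid, *LOW_ACID))
```

A rule base kept small in this form stays readable: each rule is a sentence over named fuzzy sets, and its firing strength is a single number between 0 and 1.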

Efficient Tree Based Distributed Data Mining Algorithms for Mining Frequent Patterns

International Journal of Computer Applications ◽

10.5120/1447-1957 ◽

2010 ◽

Vol 10 (1) ◽

pp. 11-16

Author(s):

T. SathishKumar ◽

V. Kavitha ◽

Dr.T. Ravichandran

Keyword(s):

Data Mining ◽

Distributed Data Mining ◽

Distributed Data ◽

Frequent Patterns ◽

Data Mining Algorithms ◽

Mining Algorithms

A Comparison Study of Data Mining Algorithms for Blood Cancer Prediction

passer ◽

10.24271/psr.29 ◽

2019 ◽

Vol 3 (1) ◽

pp. 174-179

Author(s):

Noor Bahjat ◽

Snwr Jamak

Keyword(s):

Data Mining ◽

Machine Learning Algorithms ◽

Common Disease ◽

Data Sets ◽

Blood Cancer ◽

Cancer Data ◽

Data Mining Algorithms ◽

Detection And Diagnosis ◽

Mining Algorithms ◽

Early Detection And Diagnosis

Cancer is a common disease that threatens the life of one in every three people. This dangerous disease urgently requires early detection and diagnosis. Recent progress in data mining methods, such as classification, has demonstrated the need for machine learning algorithms that can be applied to large datasets. This paper mainly aims to use data mining techniques to classify cancer data sets into blood cancer and non-blood cancer, based on pre-defined information and on post-defined information obtained after blood tests and CT scans. The research was conducted using the WEKA data mining tool with 10-fold cross-validation to evaluate and compare different classification algorithms, extract meaningful information from the dataset, and accurately identify the most suitable and predictive model. The paper shows that the most suitable classifier, with the best ability to predict on the cancer dataset, is the multilayer perceptron, with an accuracy of 99.3967%.
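
The 10-fold cross-validation procedure mentioned above can be sketched in a few lines. This is a minimal illustration using only the Python standard library and a majority-class baseline; it is not WEKA, not the authors' pipeline, and the function names are assumptions made for the example.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def majority_class(train_labels):
    """Baseline 'classifier': always predict the most frequent training label."""
    return Counter(train_labels).most_common(1)[0][0]


def cross_validate(labels, k=10, seed=0):
    """Mean accuracy of the baseline over k folds (assumes len(labels) >= k).

    Each fold serves as the test set exactly once; the rest is training data.
    """
    folds = k_fold_indices(len(labels), k, seed)
    scores = []
    for test in folds:
        held_out = set(test)
        train = [labels[i] for i in range(len(labels)) if i not in held_out]
        guess = majority_class(train)
        scores.append(sum(labels[i] == guess for i in test) / len(test))
    return sum(scores) / len(scores)
```

Replacing `majority_class` with a real learner gives the usual comparison setup: every candidate algorithm is scored on the same folds, and the mean accuracies are compared.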

A Survey on Evolutionary Instance Selection and Generation

International Journal of Applied Metaheuristic Computing ◽

10.4018/jamc.2010102604 ◽

2010 ◽

Vol 1 (1) ◽

pp. 60-92 ◽

Cited By ~ 38

Author(s):

Joaquín Derrac ◽

Salvador García ◽

Francisco Herrera

Keyword(s):

Data Mining ◽

Evolutionary Algorithms ◽

Nearest Neighbor ◽

Instance Selection ◽

Data Sets ◽

Generation Process ◽

Data Mining Algorithms ◽

Instance Generation ◽

Nearest Neighbor Rule ◽

Mining Algorithms

The use of Evolutionary Algorithms to perform data reduction tasks has become an effective approach to improving the performance of data mining algorithms. Many proposals in the literature have shown that Evolutionary Algorithms obtain excellent results when applied as Instance Selection and Instance Generation procedures. The purpose of this paper is to present a survey of the application of Evolutionary Algorithms to the Instance Selection and Generation processes. It covers approaches applied to the enhancement of the nearest neighbor rule, as well as other approaches focused on improving the models extracted by some well-known data mining algorithms. Furthermore, some proposals developed to tackle two emerging problems in data mining, Scaling Up and Imbalanced Data Sets, are also reviewed.
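
To make the idea of evolutionary instance selection concrete, the sketch below scores one candidate solution: a binary mask over the training set, evaluated with the nearest neighbor rule. The weighted blend of accuracy and reduction is a common formulation but is an assumption here, as are the function names and the `alpha` parameter; the evolutionary search over masks is not shown.

```python
def nearest_label(point, instances):
    """1-NN rule: label of the closest stored instance (squared Euclidean)."""
    return min(
        instances,
        key=lambda inst: sum((p - q) ** 2 for p, q in zip(point, inst[0])),
    )[1]


def fitness(mask, data, alpha=0.5):
    """Score a binary chromosome over `data`, a list of (features, label).

    Blends 1-NN accuracy on the full set (using only the kept instances as
    the reference set) with the fraction of instances removed.
    """
    kept = [inst for inst, bit in zip(data, mask) if bit]
    if not kept:
        return 0.0  # an empty reference set cannot classify anything
    correct = sum(nearest_label(x, kept) == y for x, y in data)
    accuracy = correct / len(data)
    reduction = 1 - len(kept) / len(data)
    return alpha * accuracy + (1 - alpha) * reduction
```

An evolutionary algorithm would treat each mask as a chromosome and search for masks with high fitness, trading classification accuracy against the size of the retained subset.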

Decomposition of Data Mining Algorithms into Unified Functional Blocks

Mathematical Problems in Engineering ◽

10.1155/2016/8197349 ◽

2016 ◽

Vol 2016 ◽

pp. 1-11 ◽

Cited By ~ 4

Author(s):

Ivan Kholod ◽

Mikhail Kupriyanov ◽

Andrey Shorov

Keyword(s):

Data Mining ◽

Data Mining Algorithms ◽

Functional Blocks ◽

Mining Algorithms

The present paper describes a method for creating data mining algorithms from unified functional blocks. The method splits algorithms into independently functioning blocks, which must have unified interfaces and implement pure functions. It allows new data mining algorithms to be created from existing blocks, and existing algorithms to be improved by optimizing single blocks or the whole structure of the algorithm. This is possible because of a number of important properties inherent in pure functions and hence in functional blocks.
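
One way to picture such blocks is as pure functions that share a single signature and are chained into an algorithm. The sketch below is only an illustration of that idea: the `block(model, dataset) -> model` interface, the block names, and the `compose` helper are assumptions, not the interfaces defined in the paper.

```python
from functools import reduce

# Unified interface (an assumption of this sketch): every block is a pure
# function block(model, dataset) -> new model that never mutates its inputs.


def count_labels(model, dataset):
    """Block 1: add label frequencies to a copy of the model."""
    counts = dict(model.get("counts", {}))
    for _, label in dataset:
        counts[label] = counts.get(label, 0) + 1
    return {**model, "counts": counts}


def pick_majority(model, dataset):
    """Block 2: derive a prediction from the counts already in the model."""
    counts = model["counts"]
    return {**model, "prediction": max(counts, key=counts.get)}


def compose(*blocks):
    """Build an algorithm by threading the model through the blocks in order."""
    def algorithm(dataset, model=None):
        return reduce(lambda m, block: block(m, dataset), blocks, model or {})
    return algorithm


# A majority-class learner assembled from the two blocks above.
majority_learner = compose(count_labels, pick_majority)
```

Because each block depends only on its arguments, a block can be swapped, reordered, or run on a separate data partition without affecting the others.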

Data mining algorithms to compute mixed concepts with negative attributes: an application to breast cancer data analysis

Mathematical Methods in the Applied Sciences ◽

10.1002/mma.3814 ◽

2016 ◽

Vol 39 (16) ◽

pp. 4829-4845 ◽

Cited By ~ 12

Author(s):

Jose Manuel Rodríguez-Jiménez ◽

Pablo Cordero ◽

Manuel Enciso ◽

Angel Mora

Keyword(s):

Breast Cancer ◽

Data Mining ◽

Data Analysis ◽

Breast Cancer Data ◽

Cancer Data ◽

Data Mining Algorithms ◽

Mining Algorithms ◽

Negative Attributes

Trajectory Pattern Mining: Methods and Applications

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.490-491.1361 ◽

2014 ◽

Vol 490-491 ◽

pp. 1361-1367

Author(s):

Xin Huang ◽

Hui Juan Chen ◽

Mao Gong Zheng ◽

Ping Liu ◽

Jing Qian

Keyword(s):

Data Mining ◽

Social Media ◽

Pattern Mining ◽

Data Sets ◽

Trajectory Data ◽

Data Mining Algorithms ◽

Mining Methods ◽

Trajectory Pattern ◽

Mining Algorithms ◽

Trajectory Pattern Mining

With the advent of location-based social media and location-acquisition technologies, trajectory data are becoming more and more ubiquitous in the real world. A lot of data mining algorithms have been successfully applied to trajectory data sets. Trajectory pattern mining has received a lot of attention in recent years. In this paper, we review the most influential methods as well as typical applications within the context of trajectory pattern mining.

The Research of High Efficient Data Mining Algorithms for Massive Data Sets

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.556-562.3901 ◽

2014 ◽

Vol 556-562 ◽

pp. 3901-3904

Author(s):

Cui Xia Tao

Keyword(s):

Data Mining ◽

Data Sets ◽

Modern Information Technology ◽

Incremental Processing ◽

Data Mining Algorithms ◽

High Efficient ◽

Data Volume ◽

Efficient Data ◽

Valid Solution ◽

Mining Algorithms

Data mining means extracting implicit, potentially useful information and knowledge, unknown in advance, from a large quantity of incomplete, random data. With the rapid advancement of modern information technology, the volume of accumulated data is increasing sharply, often at terabyte scale. How to extract meaningful information from such large amounts of data has become a major problem that must be tackled. For mining data at this scale, distributed parallel processing and incremental processing are valid solutions.
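
The two strategies named above can be illustrated with item-support counting, where partial results are additive. The sketch below is a generic example, not the algorithm from the paper: partitions are counted independently (as separate workers could do in parallel), and a newly arrived batch is merged into existing counts without rescanning old data. All names are assumptions.

```python
from collections import Counter


def count_partition(transactions):
    """Local step: count item occurrences in one data partition."""
    counts = Counter()
    for items in transactions:
        counts.update(set(items))  # count each item once per transaction
    return counts


def merge(global_counts, local_counts):
    """Combine step: counts are additive, so partial results simply sum."""
    return global_counts + local_counts


def frequent_items(counts, n_transactions, min_support):
    """Items whose relative support reaches the threshold."""
    return {item for item, c in counts.items() if c / n_transactions >= min_support}
```

The same `merge` call serves both purposes: combining counts from partitions processed in parallel, and folding in a new batch incrementally.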

Workload Optimization by Horizontal Aggregation in SQL for Data Mining Analysis

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit217263 ◽

2021 ◽

pp. 304-309

Author(s):

Prasanna M. Rathod ◽

Prof. Dr. Anjali B. Raut

Keyword(s):

Data Mining ◽

Relational Algebra ◽

Data Migration ◽

Data Sets ◽

Data Set ◽

Application Performance ◽

Data Mining Algorithms ◽

Mining Project ◽

Pivot Methods ◽

Mining Algorithms

Preparing a data set for analysis is generally the most time-consuming task in a data mining project, requiring many complex SQL queries, joining tables, and aggregating columns. Existing SQL aggregations are limited for preparing data sets because they return one column per aggregated group. In general, significant manual effort is required to build data sets where a horizontal layout is required. We propose simple yet powerful methods to generate SQL code that returns aggregated columns in a horizontal tabular layout, returning a set of numbers instead of one number per row. This new class of functions is called horizontal aggregations. Horizontal aggregations build data sets with a horizontal denormalized layout (e.g., point-dimension, observation-variable, instance-feature), which is the standard layout required by most data mining algorithms. We propose three fundamental methods to evaluate horizontal aggregations: (1) CASE, exploiting the programming CASE construct; (2) SPJ, based on standard relational algebra operators (SPJ queries); (3) PIVOT, using the PIVOT operator, which is offered by some DBMSs. Experiments with large tables compare the proposed query evaluation methods. Our CASE method has similar speed to the PIVOT operator and is much faster than the SPJ method. In general, the CASE and PIVOT methods exhibit linear scalability, whereas the SPJ method does not. For query optimization, the distance computation and the nearest-cluster step in k-means are based on SQL. Workload balancing is the assignment of work to processors in a way that maximizes application performance. The process of load balancing can be generalized into four basic steps: 1. Monitoring processor load and state; 2. Exchanging workload and state information between processors; 3. Decision making; 4. Data migration. The decision phase is triggered when a load imbalance is detected, in order to calculate the optimal data redistribution. In the fourth and last phase, data migrates from overloaded processors to under-loaded ones.
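
The CASE method described above can be sketched as a small SQL generator run against an in-memory SQLite database. This is a minimal illustration, not the authors' code generator: the table, the column names, and the helper `horizontal_sum_sql` are hypothetical, and filling missing combinations with 0 is a choice made for this example.

```python
import sqlite3


def horizontal_sum_sql(table, group_col, pivot_col, value_col, pivot_values):
    """Build a CASE-based query with one output column per pivot value.

    Identifiers and values are interpolated directly into the SQL text,
    so they must come from trusted code, never from user input.
    """
    cases = ", ".join(
        f"SUM(CASE WHEN {pivot_col} = '{v}' THEN {value_col} ELSE 0 END) "
        f"AS {value_col}_{v}"
        for v in pivot_values
    )
    return (
        f"SELECT {group_col}, {cases} FROM {table} "
        f"GROUP BY {group_col} ORDER BY {group_col}"
    )


# Hypothetical vertical table: one row per (store, day) sale.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, day TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("s1", "mon", 10), ("s1", "tue", 5), ("s1", "mon", 2), ("s2", "tue", 7)],
)

query = horizontal_sum_sql("sales", "store", "day", "amount", ["mon", "tue"])
rows = conn.execute(query).fetchall()  # one row per store, one column per day
```

Each store ends up as a single row with one aggregated column per day, which is the instance-feature layout that most data mining algorithms expect as input.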

A Survey on Evolutionary Instance Selection and Generation

Modeling, Analysis, and Applications in Metaheuristic Computing ◽

10.4018/978-1-4666-0270-0.ch014 ◽

2012 ◽

pp. 233-266 ◽

Cited By ~ 1

Author(s):

Joaquín Derrac ◽

Salvador García ◽

Francisco Herrera

Keyword(s):

Data Mining ◽

Evolutionary Algorithms ◽

Nearest Neighbor ◽

Instance Selection ◽

Data Sets ◽

Generation Process ◽

Data Mining Algorithms ◽

Instance Generation ◽

Nearest Neighbor Rule ◽

Mining Algorithms

The use of Evolutionary Algorithms to perform data reduction tasks has become an effective approach to improving the performance of data mining algorithms. Many proposals in the literature have shown that Evolutionary Algorithms obtain excellent results when applied as Instance Selection and Instance Generation procedures. The purpose of this paper is to present a survey of the application of Evolutionary Algorithms to the Instance Selection and Generation processes. It covers approaches applied to the enhancement of the nearest neighbor rule, as well as other approaches focused on improving the models extracted by some well-known data mining algorithms. Furthermore, some proposals developed to tackle two emerging problems in data mining, Scaling Up and Imbalanced Data Sets, are also reviewed.
