Genetic Programming for Automatically Constructing Data Mining Algorithms

At present there is a wide range of data mining algorithms available to researchers and practitioners (Witten & Frank, 2005; Tan et al., 2006). Despite the great diversity of these algorithms, virtually all of them share one feature: they have been manually designed. As a result, current data mining algorithms in general incorporate human biases and preconceptions in their designs. This article proposes an alternative approach to the design of data mining algorithms, namely the automatic creation of data mining algorithms by means of Genetic Programming (GP) (Pappa & Freitas, 2006). In essence, GP is a type of Evolutionary Algorithm – i.e., a search algorithm inspired by the Darwinian process of natural selection – that evolves computer programs or executable structures. This approach opens new avenues for research, providing the means to design novel data mining algorithms that are less limited by human biases and preconceptions, and that therefore have the potential to discover new kinds of patterns (or knowledge) for the user. It also offers an interesting opportunity for the automatic creation of data mining algorithms tailored to the data being mined.


Applying data mining algorithms to real estate appraisals: a comparative study

International Journal of Housing Markets and Analysis ◽

10.1108/ijhma-07-2020-0080 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Thiago Cesar de Oliveira ◽

Lúcio de Medeiros ◽

Daniel Henrique Marco Detzel

Keyword(s):

Data Mining ◽

Real Estate ◽

Support Vector ◽

Predictive Capacity ◽

Content Type ◽

Data Mining Algorithms ◽

Wide Range ◽

Very Large Databases ◽

Mining Algorithms ◽

Statistical Results

Purpose: Real estate appraisals are becoming an increasingly important means of backing up financial operations based on the values of these kinds of assets. However, in very large databases, the predictive capacity of traditional methods such as multiple linear regression (MLR) is reduced. This paper aims to determine whether, in these cases, the application of data mining algorithms can achieve superior statistical results. First, real estate appraisal databases from five towns and cities in the State of Paraná, Brazil, were obtained from the Caixa Econômica Federal bank. Design/methodology/approach: After initial validations, additional databases were generated with real, transformed and nominal values, using both clean and raw data. Each was analyzed with a wide range of data mining algorithms (multilayer perceptron, support vector regression, K-star, M5Rules and random forest), either isolated or combined (regression by discretization – logistic, bagging and stacking), using 10-fold cross-validation in the Weka software. Findings: The results showed more varied incremental statistical results with the use of algorithms than those obtained by MLR, especially when combined algorithms were used. The largest increments were obtained in databases with a large amount of data and in those where only minor initial data cleaning was carried out. The paper also conducts a further analysis, including an algorithmic ranking based on the number of significant results obtained. Originality/value: The authors did not find similar studies conducted in Brazil.


Incremental Discovery of Fuzzy Functional Dependencies

Handbook of Research on Fuzzy Information Processing in Databases ◽

10.4018/978-1-59904-853-6.ch024 ◽

2011 ◽

pp. 615-633 ◽

Cited By ~ 1

Author(s):

Shyue-Liang Wang ◽

Ju-Wen Shen ◽

Tzung-Pei Hong

Keyword(s):

Data Mining ◽

Relational Databases ◽

Search Algorithm ◽

Current Data ◽

Research Interest ◽

Functional Dependencies ◽

Incremental Search ◽

Data Mining Techniques ◽

Analysis Technique ◽

Mining Algorithms

Mining functional dependencies (FDs) from databases has been identified as an important database analysis technique and has received considerable research interest in recent years. However, most current data mining techniques for determining functional dependencies deal only with crisp databases. Although various forms of fuzzy functional dependencies (FFDs) have been proposed for fuzzy databases, they emphasize conceptual viewpoints, and only a few mining algorithms have been given. In this research, we propose methods to validate and incrementally search for FFDs in similarity-based fuzzy relational databases. For a given pair of attributes, the validation of FFDs is based on fuzzy projection and fuzzy selection operations. In addition, FFDs are shown to be monotonic in the sense that r1 ⊆ r2 implies FDa(r1) ⊇ FDa(r2). An incremental search algorithm for FFDs based on this property is then presented. Experimental results showing the behavior of the search algorithm are discussed.
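
The monotonicity idea can be illustrated with ordinary (crisp) functional dependencies, where adding tuples to a relation can only invalidate dependencies and never create new ones. The sketch below is not the authors' fuzzy algorithm: it ignores similarity relations entirely, restricts itself to single-attribute dependencies, and uses hypothetical helper names (`holds`, `single_attribute_fds`, `incremental_update`) and a made-up toy relation.

```python
from itertools import permutations


def holds(rows, lhs, rhs):
    """Return True if the crisp dependency lhs -> rhs holds in `rows`."""
    seen = {}
    for row in rows:
        # setdefault records the first rhs value seen for each lhs value.
        if seen.setdefault(row[lhs], row[rhs]) != row[rhs]:
            return False
    return True


def single_attribute_fds(rows, attributes):
    """Return every dependency a -> b (single attributes) that holds."""
    return {(a, b) for a, b in permutations(attributes, 2) if holds(rows, a, b)}


def incremental_update(previous_fds, enlarged_rows):
    """Re-check only dependencies that held before the relation grew.

    Because dependencies can only be lost when tuples are added,
    nothing outside `previous_fds` needs to be tested again.
    """
    return {(a, b) for a, b in previous_fds if holds(enlarged_rows, a, b)}


# Hypothetical toy relation; r2 extends r1 by one tuple.
ATTRS = ("dept", "city", "grade")
r1 = [
    {"dept": "A", "city": "X", "grade": 1},
    {"dept": "B", "city": "Y", "grade": 1},
]
r2 = r1 + [{"dept": "A", "city": "X", "grade": 2}]

fds_r1 = single_attribute_fds(r1, ATTRS)
fds_r2 = incremental_update(fds_r1, r2)
```

Here `fds_r2` is obtained by re-testing only the four dependencies in `fds_r1` instead of all six candidate pairs, which is the saving an incremental search relies on.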


The Development of Data Mining System Based on Cloud Computing

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.926-930.2280 ◽

2014 ◽

Vol 926-930 ◽

pp. 2280-2283

Author(s):

Qiong Ren

Keyword(s):

Data Mining ◽

Cloud Computing ◽

Current Data ◽

Mining System ◽

Computing Platform ◽

Data Mining Algorithms ◽

Cloud Computing Platform ◽

Data Mining System ◽

Process Cost ◽

Mining Algorithms

As input data size increases, processing cost grows, and the explosive growth of Internet data has reached the point where a single machine can no longer handle it. This article introduces the concept and architecture of cloud computing, analyzes the current mainstream data mining algorithms, and develops a data mining system based on cloud computing, demonstrating the operational feasibility of data mining on a cloud computing platform and offering guidance for such systems.


PSO Optimized Nearest Neighbor Algorithm

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b3574.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 1508-1513

Keyword(s):

Data Mining ◽

Nearest Neighbor ◽

Heuristic Algorithms ◽

Optimization Methods ◽

Classification Model ◽

Natural Phenomenon ◽

Wide Applicability ◽

Data Mining Algorithms ◽

Wide Range ◽

Mining Algorithms

Data mining can be considered an important aspect of the information industry, and it has found wide applicability in almost every field that deals with data. Of the various techniques employed for data mining, classification is a very commonly used tool for knowledge discovery. Various alternative methods are available for creating a classification model, of which the most common and comprehensible is kNN. Although kNN has a number of shortcomings and limitations, these can be overcome with alterations to the basic kNN algorithm. Due to its wide applicability, kNN has been the focus of extensive research, and as a result many alternatives have been proposed, with varying degrees of success in improving performance. A major difficulty faced by data mining applications is the large number of dimensions, which renders most data mining algorithms inefficient. The problem can be solved to some extent by using dimensionality reduction methods such as PCA. Further improvements in the efficiency of classification-based mining algorithms can be achieved by using optimization methods. Meta-heuristic algorithms inspired by natural phenomena, such as particle swarm optimization, can be used very effectively for this purpose.
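
One way particle swarm optimization could be combined with kNN is sketched below: each particle encodes a vector of feature weights, and its fitness is the leave-one-out accuracy of a feature-weighted kNN classifier. This is an illustrative assumption, not the method of the paper above. The function names, the toy data generator, and the PSO constants (inertia 0.7, acceleration 1.5) are all hypothetical choices.

```python
import random


def knn_predict(train, query, k, weights):
    """Majority vote among the k nearest points (weighted squared distance)."""
    dists = sorted(
        (sum(w * (x - q) ** 2 for w, x, q in zip(weights, xs, query)), label)
        for xs, label in train
    )
    votes = {}
    for _, label in dists[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def loo_accuracy(data, k, weights):
    """Leave-one-out accuracy of weighted kNN on `data`."""
    hits = 0
    for i, (xs, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, xs, k, weights) == label
    return hits / len(data)


def pso_feature_weights(data, k=3, particles=8, iters=15, seed=0):
    """Search feature weights in [0, 1] with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(data[0][0])
    # The first particle starts at uniform weights, so the result can
    # never score worse than plain (unweighted) kNN on this fitness.
    pos = [[1.0] * dim] + [
        [rng.random() for _ in range(dim)] for _ in range(particles - 1)
    ]
    vel = [[0.0] * dim for _ in range(particles)]
    best_pos = [p[:] for p in pos]
    best_fit = [loo_accuracy(data, k, p) for p in pos]
    g = max(range(particles), key=best_fit.__getitem__)
    g_pos, g_fit = best_pos[g][:], best_fit[g]
    for _ in range(iters):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (best_pos[i][d] - pos[i][d])
                             + 1.5 * r2 * (g_pos[d] - pos[i][d]))
                # Clamp each weight to the [0, 1] search range.
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            fit = loo_accuracy(data, k, pos[i])
            if fit > best_fit[i]:
                best_fit[i], best_pos[i] = fit, pos[i][:]
                if fit > g_fit:
                    g_fit, g_pos = fit, pos[i][:]
    return g_pos, g_fit


def make_toy_data(n=30, seed=1):
    """Hypothetical data: feature 0 is informative, feature 1 is noise."""
    rng = random.Random(seed)
    return [
        ((i % 2 + rng.gauss(0, 0.3), rng.uniform(-5, 5)), i % 2)
        for i in range(n)
    ]


if __name__ == "__main__":
    data = make_toy_data()
    weights, accuracy = pso_feature_weights(data)
    print("weights:", weights, "leave-one-out accuracy:", accuracy)
```

Seeding one particle with uniform weights is a deliberate design choice: it guarantees the search result is at least as good as the unweighted baseline under the same fitness measure, although it says nothing about accuracy on unseen data.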


Data Mining Algorithms on Prediction of Cardiovascular Diseases

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.c6887.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 4846-4853

Keyword(s):

Data Mining ◽

Unstructured Data ◽

Healthcare Sector ◽

Market Segment ◽

Data Generation ◽

Data Mining Algorithms ◽

Wide Range ◽

Enormous Amount ◽

Mining Algorithms

In the age of data generation known as Big Data, where data is produced in enormous amounts, managing it has become a major challenge, and drawing information from the gathered data is equally important and challenging. Inferring relationships and predicting patterns from this structured and unstructured data is now an active area of research, and data mining techniques have evolved as a tool for generating results and drawing conclusions. These mining algorithms are applicable in almost every domain, such as understanding market segments, fraud detection, trend analysis, the healthcare sector, the education sector and many more. In view of this wide range of applicability, this paper presents a brief overview of data mining algorithms. The discussion covers different data mining algorithms, their mathematical modelling, their evaluation methods, and their limitations. To support the discussion, a case study is conducted on a cardiovascular disease dataset, and the performance measures of these mining techniques are compared.


New representations in genetic programming for feature construction in k-means clustering

10.26686/wgtn.13058759 ◽

2020 ◽

Author(s):

Andrew Lensen ◽

Bing Xue ◽

Mengjie Zhang

Keyword(s):

Data Mining ◽

Genetic Programming ◽

Performance Improvement ◽

Feature Construction ◽

Improve Performance ◽

Data Mining Algorithms ◽

Significant Performance ◽

International Publishing ◽

Mining Algorithms ◽

Insight Into

© Springer International Publishing AG 2017. k-means is one of the fundamental and most well-known algorithms in data mining. It has been widely used in clustering tasks, but suffers from a number of limitations on large or complex datasets. Genetic Programming (GP) has been used to improve performance of data mining algorithms by performing feature construction—the process of combining multiple attributes (features) of a dataset together to produce more powerful constructed features. In this paper, we propose novel representations for using GP to perform feature construction to improve the clustering performance of the k-means algorithm. Our experiments show significant performance improvement compared to k-means across a variety of difficult datasets. Several GP programs are also analysed to provide insight into how feature construction is able to improve clustering performance.


Application of Data Mining Algorithms in Determination of Voting Tendencies in Turkey

Advances in Wireless Technologies and Telecommunication - Recent Developments in Individual and Organizational Adoption of ICTs ◽

10.4018/978-1-7998-3045-0.ch008 ◽

2021 ◽

pp. 134-149

Author(s):

Ali Bayır ◽

Sevinç Gülseçen ◽

Gökhan Türkmen

Keyword(s):

Data Mining ◽

Current Data ◽

Data Set ◽

Data Mining Techniques ◽

Human Behaviors ◽

Political Sciences ◽

Data Mining Algorithms ◽

Number Of Factors ◽

Mining Algorithms

Political elections are influenced by a number of factors such as political tendencies, voters' perceptions, and preferences. The results of a political election could also be based on specific attributes of candidates: age, gender, occupation, education, etc. Although it is very difficult to understand all the factors that could have influenced the outcome of an election, many of the attributes mentioned above could be included in a data set, and by using current data mining techniques, undiscovered patterns can be revealed. Despite the unpredictability of the human behaviors and choices involved, data mining techniques could still help in predicting election outcomes. In this study, the results of the survey prepared by KONDA Research and Consultancy Company before the 2011 elections in Turkey were used as raw data. This study may help in understanding how data mining methods and techniques could be used in political sciences research. The study may also reveal whether voting tendencies in elections could be a factor for the outcome of the election.


New representations in genetic programming for feature construction in k-means clustering

10.26686/wgtn.13058759.v1 ◽

2020 ◽

Author(s):

Andrew Lensen ◽

Bing Xue ◽

Mengjie Zhang

Keyword(s):

Data Mining ◽

Genetic Programming ◽

Performance Improvement ◽

Feature Construction ◽

Improve Performance ◽

Data Mining Algorithms ◽

Significant Performance ◽

International Publishing ◽

Mining Algorithms ◽

Insight Into


Novel Adverse Events of Iloperidone: A Disproportionality Analysis in US Food and Drug Administration Adverse Event Reporting System (FAERS) Database

Current Drug Safety ◽

10.2174/1574886313666181026100000 ◽

2019 ◽

Vol 14 (1) ◽

pp. 21-26 ◽

Cited By ~ 2

Author(s):

Viswam Subeesh ◽

Eswaran Maheswari ◽

Hemendra Singh ◽

Thomas Elsa Beulah ◽

Ann Mary Swaroop

Keyword(s):

Data Mining ◽

Adverse Event ◽

Adverse Events ◽

Reporting System ◽

Adverse Event Reporting System ◽

Adverse Event Reporting ◽

Disproportionality Analysis ◽

Positive Signal ◽

Data Mining Algorithms ◽

Mining Algorithms

Background: A signal is defined as “reported information on a possible causal relationship between an adverse event and a drug, of which the relationship is unknown or incompletely documented previously”. Objective: To detect novel adverse events of iloperidone by disproportionality analysis in the FDA Adverse Event Reporting System (FAERS) database using Data Mining Algorithms (DMAs). Methodology: The US FAERS database contains 1028 iloperidone-associated Drug Event Combinations (DECs) reported from 2010 Q1 to 2016 Q3. We considered DECs for disproportionality analysis only if at least ten reports were present in the database for the given adverse event and the event had not been detected earlier (in clinical trials). Two data mining algorithms, namely Reporting Odds Ratio (ROR) and Information Component (IC), were applied retrospectively over the aforementioned time period. Values of ROR-1.96SE>1 and IC-2SD>0 were considered the threshold for a positive signal. Results: The mean age of the patients with iloperidone-associated events was found to be 44 years [95% CI: 36-51], although age was not mentioned in twenty-one reports. The data mining algorithms exhibited a positive signal for akathisia (ROR-1.96SE=43.15, IC-2SD=2.99), dyskinesia (21.24, 3.06), peripheral oedema (6.67, 1.08), priapism (425.7, 9.09) and sexual dysfunction (26.6, 1.5), as these values were well above the pre-set threshold. Conclusion: Five potential signals associated with iloperidone were generated by data mining in the FDA AERS database. The result requires further clinical surveillance to quantify and validate the possible risks of the adverse events reported for iloperidone.
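
The disproportionality measures named in the abstract can be sketched from a 2x2 contingency table of report counts. The code below assumes the standard textbook formulation: the ROR is a cross-product ratio, and its lower 95% confidence bound is computed on the log scale, which is the usual reading of the "ROR-1.96SE>1" criterion. The IC shown is only the simple observed-to-expected point estimate, without the Bayesian shrinkage or the standard deviation term that a full IC-2SD calculation needs. The function names and the example counts are hypothetical and are not taken from the study.

```python
import math


def reporting_odds_ratio(a, b, c, d):
    """Return (ROR, lower 95% bound) for a 2x2 table of report counts.

    a: target drug, target event      b: target drug, other events
    c: other drugs, target event      d: other drugs, other events
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    ror = (a * d) / (b * c)
    # Standard error of ln(ROR) for a 2x2 table.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ror) - 1.96 * se_log)
    return ror, lower


def information_component(a, b, c, d):
    """Point estimate log2(observed / expected), with no shrinkage."""
    n = a + b + c + d
    expected = (a + b) * (a + c) / n
    return math.log2(a / expected)


def is_ror_signal(a, b, c, d, min_reports=10):
    """Flag a drug-event pair when it has enough reports and the
    lower 95% bound of the ROR exceeds 1."""
    if a < min_reports:
        return False
    _, lower = reporting_odds_ratio(a, b, c, d)
    return lower > 1


# Hypothetical counts for illustration only.
ror, lower = reporting_odds_ratio(20, 80, 100, 9800)
ic = information_component(20, 80, 100, 9800)
```

With these made-up counts the ROR is 24.5; the `min_reports=10` default mirrors the ten-report minimum described in the methodology.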
