Genetic Selection Algorithm and Cloning for Data Mining with GMDH Method

Author(s):  
Marcel Jirina ◽  
Marcel Jirina


2013 ◽ 
Vol 2 (4) ◽  
pp. 33-46 ◽  
Author(s):  
P. K. Nizar Banu ◽  
H. Hannah Inbarani

As microarray databases grow in dimensionality and complexity, identifying the most informative genes becomes a challenging task. The difficulty stems largely from the huge number of genes combined with very few samples. Research in medical data mining addresses this problem by applying techniques from data mining and machine learning to microarray datasets. In this paper, Unsupervised Tolerance Rough Set based Quick Reduct (U-TRS-QR), a diverse feature selection algorithm that extends the existing equivalence-based rough sets to unsupervised learning, is proposed. Genes selected by the proposed method lead to considerably improved class predictions in extensive experiments on two gene expression datasets: Brain Tumor and Colon Cancer. The results indicate consistent improvement across 12 classifiers.
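The abstract does not spell out the U-TRS-QR procedure itself, so the following is only a minimal Python sketch of a tolerance-rough-set, quick-reduct style greedy search for unsupervised feature selection. It assumes features scaled to [0, 1], a hypothetical tolerance threshold `theta`, and a coarse discretisation of each left-out attribute standing in for decision classes; the function names and the dependency measure are illustrative, not the authors' formulation.

```python
import numpy as np

def tolerance_classes(X, subset, theta=0.1):
    """Tolerance class of each object: all objects whose values differ by at
    most `theta` on every attribute in `subset` (features assumed in [0, 1])."""
    diffs = np.abs(X[:, None, subset] - X[None, :, subset])   # (n, n, |subset|)
    tolerant = (diffs <= theta).all(axis=2)                    # (n, n) boolean
    return [np.flatnonzero(tolerant[i]) for i in range(X.shape[0])]

def dependency(X, subset, target_attr, theta=0.1, bins=3):
    """Unsupervised dependency of `target_attr` on `subset`: fraction of objects
    whose tolerance class is pure w.r.t. a coarse discretisation of the target."""
    labels = np.digitize(X[:, target_attr], np.linspace(0, 1, bins + 1)[1:-1])
    classes = tolerance_classes(X, subset, theta)
    pure = sum(1 for c in classes if len(np.unique(labels[c])) == 1)
    return pure / X.shape[0]

def utrs_quick_reduct(X, theta=0.1):
    """Greedy quick-reduct style search: repeatedly add the attribute that most
    increases the mean dependency of the remaining attributes on the subset."""
    n_attrs = X.shape[1]
    reduct, best_gamma = [], 0.0
    while True:
        candidate, candidate_gamma = None, best_gamma
        for a in range(n_attrs):
            if a in reduct:
                continue
            trial = reduct + [a]
            others = [b for b in range(n_attrs) if b not in trial] or [a]
            gamma = np.mean([dependency(X, trial, b, theta) for b in others])
            if gamma > candidate_gamma:
                candidate, candidate_gamma = a, gamma
        if candidate is None:            # no attribute improves the dependency
            return reduct
        reduct.append(candidate)
        best_gamma = candidate_gamma
        if best_gamma >= 1.0:
            return reduct

# Example on random data scaled to [0, 1]; a normalised expression matrix
# would be passed in the same way. A smaller theta gives finer tolerance classes.
rng = np.random.default_rng(0)
X = rng.random((60, 8))
print(utrs_quick_reduct(X, theta=0.05))
```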


2013 ◽  
Vol 22 (04) ◽  
pp. 1350027
Author(s):  
JAGANATHAN PALANICHAMY ◽  
KUPPUCHAMY RAMASAMY

Feature selection is essential in data mining and pattern recognition, especially for database classification. In recent years, several feature selection algorithms have been proposed to measure the relevance of various features to each class. A suitable feature selection algorithm normally maximizes the relevance and minimizes the redundancy of the selected features. The mutual information measure can successfully estimate the dependency of features on the entire sampling space, but it cannot exactly represent the redundancies among features. In this paper, a novel feature selection algorithm is proposed based on the maximum relevance and minimum redundancy criterion. Mutual information is used to measure the relevance of each feature to the class variable, and the redundancy is calculated by utilizing the relationship between candidate features, selected features, and the class variable. The effectiveness is tested on ten benchmark datasets from the UCI Machine Learning Repository. The experimental results show better performance when compared with some existing algorithms.
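The exact redundancy term used in the paper is not given in the abstract; the sketch below is a generic max-relevance/min-redundancy greedy selector based on histogram-estimated mutual information, intended only to illustrate the criterion. The helper names and the simple I(f; y) − mean I(f; s) score are assumptions, not the proposed algorithm.

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Histogram-based mutual information estimate between two 1-D variables."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def mrmr_select(X, y, k):
    """Greedy max-relevance, min-redundancy selection:
    at each step pick the feature maximising I(f; y) - mean_{s in S} I(f; s)."""
    n_features = X.shape[1]
    relevance = np.array([mutual_information(X[:, j], y) for j in range(n_features)])
    selected = [int(np.argmax(relevance))]        # start with the most relevant feature
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = np.mean([mutual_information(X[:, j], X[:, s]) for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected

# Example with synthetic data: feature 0 is informative, the rest are noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)
print(mrmr_select(X, y, k=3))
```

In practice the discretisation (here 10 histogram bins) strongly affects the mutual information estimates, so a different binning or estimator may change the selected subset.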


PLoS ONE ◽  
2017 ◽  
Vol 12 (11) ◽  
pp. e0187676 ◽  
Author(s):  
Peyman Tavallali ◽  
Marianne Razavi ◽  
Sean Brady

2013 ◽  
Vol 23 (5) ◽  
pp. 451-464
Author(s):  
Marcel Jiřina ◽  
Marcel Jiřina, jr.

Processes ◽  
2019 ◽  
Vol 7 (4) ◽  
pp. 222 ◽  
Author(s):  
Bodur ◽  
Atsa’am

This research developed and tested a filter algorithm that reduces the feature space in healthcare datasets. The algorithm binarizes the dataset, separately evaluates the risk ratio of each predictor with the response, and outputs ratios that represent the association between a predictor and the class attribute. The strength of the association translates to the importance rank of the corresponding predictor in determining the outcome. Using Random Forest and Logistic Regression classification, the performance of the developed algorithm was compared against the regsubsets and varImp functions, which are unsupervised methods of variable selection. Equally, the proposed algorithm was compared with the supervised Fisher score and Pearson's correlation feature selection methods. Different datasets were used for the experiment and, in the majority of cases, the predictors selected by the new algorithm outperformed those selected by the existing algorithms. The proposed filter algorithm is therefore a reliable alternative for variable ranking in data mining classification tasks with a dichotomous response.
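As a rough illustration of the idea described above (not the authors' exact procedure), the sketch below binarises each predictor at its median, computes the risk ratio of the outcome between the two predictor groups, and ranks predictors by how far the ratio departs from 1. The median split, the small `eps` smoothing term, and the log-scale ranking are assumptions made for the example.

```python
import numpy as np

def risk_ratio_ranking(X, y, eps=1e-9):
    """Rank predictors of a dichotomous response by risk ratio.

    Each predictor is binarised at its median; the risk ratio
    RR = P(y = 1 | x = 1) / P(y = 1 | x = 0) is computed, and predictors
    are ranked by how far RR departs from 1 (on the log scale)."""
    y = np.asarray(y).astype(int)
    scores = []
    for j in range(X.shape[1]):
        xb = (X[:, j] > np.median(X[:, j])).astype(int)    # binarise the predictor
        risk_exposed = (y[xb == 1].mean() if (xb == 1).any() else 0.0) + eps
        risk_unexposed = (y[xb == 0].mean() if (xb == 0).any() else 0.0) + eps
        rr = risk_exposed / risk_unexposed
        scores.append(abs(np.log(rr)))                      # distance of RR from 1
    ranking = np.argsort(scores)[::-1]                      # most associated predictors first
    return ranking, np.array(scores)

# Example: predictor 0 strongly drives the outcome, predictors 1 and 2 are noise.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))
y = (X[:, 0] > 0.3).astype(int)
ranking, scores = risk_ratio_ranking(X, y)
print(ranking, np.round(scores, 2))
```

Predictors whose risk ratio lies far from 1 in either direction are treated as strongly associated with the response, which matches the ranking-by-association idea described in the abstract.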


Webology ◽  
2021 ◽  
Vol 18 (SI02) ◽  
pp. 01-20
Author(s):  
S. Bharani Nayagi ◽  
T.S. Shiny Angel

Data mining is the extraction of correlated knowledge from enormous volumes of stored data. Discriminative knowledge in this process is carried by features, and the task of refining the data down to such features is known as feature selection. Feature selection produces a subset of features that retains the most information. Before data mining, feature selection is essential to trim down high-dimensional data; without it as a pre-processing step, classification requires prohibitive computation time and can become intractable. The foremost intention of this analysis is to provide a summary of the feature selection approaches adopted to evaluate extremely extensive feature sets.

