Application of CART-Based Variable Ranking for Faulty Variable Isolation in Tennessee Eastman Benchmark Process

Abstract: The aim of this study was to verify a newly proposed transformation of penalty points and rankings of showjumping horses for the purpose of genetic evaluation. Genomic information was also used in transforming the input data. Data from the Global Champions Tour showjumping competition were used. Penalty points were transformed to a normally distributed variable using the Blom formula (taking into account the height of obstacles, and the height of obstacles together with the single nucleotide polymorphism (SNP) effect); however, a non-normal distribution was obtained. The rankings of sport horses in competitions were transformed to a normal distribution using the Blom formula with the height of obstacles taken into account (normality tests: Kolmogorov-Smirnov (KS) D = 0.011, P > 0.150; Cramér-von Mises (CM) W-Sq = 0.039, P > 0.250; Anderson-Darling (AD) A-Sq = 0.638, P < 0.097). A better-distributed ranking variable was obtained when the Blom formula took both the height of obstacles and the SNP effect into account (KS D = 0.004, P > 0.150; CM W-Sq = 0.004, P > 0.250; AD A-Sq = 0.062, P > 0.250). The first model tested included all fixed effects without any combination of effects (R² = 0.54); ranking was transformed to a normal score by the Blom formula with the height of obstacles taken into account. In the second model, some effects were included as quadratic regressions (R² = 0.61), and ranking was transformed to a normal score as in the first model. In the last model, ranking was transformed to a normal score by the Blom formula taking into account both the height of obstacles and the SNP effect, with the same effects as in the second model (R² = 0.60).
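The Blom formula maps a rank r among n observations to the normal score Φ⁻¹((r − 3/8)/(n + 1/4)), where Φ⁻¹ is the inverse standard normal CDF. The minimal Python sketch below illustrates that mapping under stated assumptions: "taking the height of obstacles into account" is interpreted as ranking separately within each obstacle-height class, tied values receive their average rank, and the SNP adjustment used in the study is not reproduced. The function names `blom_scores` and `blom_by_group` are hypothetical.

```python
from collections import defaultdict
from statistics import NormalDist


def blom_scores(values):
    """Blom normal scores for a list of values; tied values share their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the block of values tied with values[order[i]].
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Blom: z = inverse-normal-CDF((r - 3/8) / (n + 1/4))
    return [std_normal.inv_cdf((r - 3 / 8) / (n + 1 / 4)) for r in ranks]


def blom_by_group(records):
    """records: list of (group_key, value) pairs.

    Assumption: scores are computed within each group (e.g. obstacle-height class),
    so every group is mapped onto the same normal scale independently.
    """
    groups = defaultdict(list)
    for idx, (key, _value) in enumerate(records):
        groups[key].append(idx)
    scores = [0.0] * len(records)
    for idxs in groups.values():
        group_scores = blom_scores([records[i][1] for i in idxs])
        for i, score in zip(idxs, group_scores):
            scores[i] = score
    return scores
```

For example, `blom_by_group([(140, 1), (140, 2), (140, 3), (150, 1), (150, 2)])` scores the placings within the 140 and 150 height classes separately. With finishing positions as input, first place receives the most negative score, so the sign may need to be flipped if higher scores should denote better performance.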

Filter Variable Selection Algorithm Using Risk Ratios for Dimensionality Reduction of Healthcare Data for Classification

Processes, 2019, Vol 7 (4), pp. 222
DOI: 10.3390/pr7040222
Cited by ~4

Author(s): Bodur, Atsa'am

Keyword(s): Data Mining, Variable Selection, Feature Space, Selection Methods, Selection Algorithm, Fisher Score, Healthcare Data, Classification Tasks, Risk Ratios, Variable Ranking

This research developed and tested a filter algorithm that reduces the feature space of healthcare datasets. The algorithm binarizes the dataset, separately evaluates the risk ratio of each predictor with respect to the response, and outputs ratios that represent the association between each predictor and the class attribute. The strength of this association determines the importance rank of the corresponding predictor in predicting the outcome. Using Random Forest and logistic regression classifiers, the performance of the developed algorithm was compared against the regsubsets and varImp functions, which are unsupervised methods of variable selection. Likewise, the proposed algorithm was compared with the supervised Fisher score and Pearson's correlation feature selection methods. Several datasets were used in the experiments and, in most cases, the predictors selected by the new algorithm outperformed those selected by the existing algorithms. The proposed filter algorithm is therefore a reliable alternative for variable ranking in data mining classification tasks with a dichotomous response.
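The binarize, risk-ratio, rank pipeline described above can be sketched roughly as follows. This is not the authors' implementation; several details are assumptions made only for illustration: continuous predictors are split at their median, a 0.5 continuity correction is added to every cell of the 2×2 table to avoid division by zero, and predictors are ranked by |ln RR| so that protective associations (RR < 1) count as strongly as harmful ones. The names `binarize`, `risk_ratio`, and `rank_predictors` are hypothetical.

```python
import math
from statistics import median


def binarize(column):
    """Hypothetical rule: keep 0/1 columns as-is, otherwise split at the median."""
    if set(column) <= {0, 1}:
        return list(column)
    m = median(column)
    return [1 if v > m else 0 for v in column]


def risk_ratio(x, y, correction=0.5):
    """RR = P(y=1 | x=1) / P(y=1 | x=0), with a continuity correction on every cell."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1) + correction
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0) + correction
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1) + correction
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0) + correction
    return (a / (a + b)) / (c / (c + d))


def rank_predictors(columns, y):
    """columns: dict of predictor name -> numeric list; y: list of 0/1 outcomes.

    Returns predictor names ordered from strongest to weakest association,
    plus the risk ratio of each predictor.
    """
    rr = {name: risk_ratio(binarize(col), y) for name, col in columns.items()}
    # Assumption: association strength = distance of RR from 1 on the log scale.
    ranked = sorted(rr, key=lambda name: abs(math.log(rr[name])), reverse=True)
    return ranked, rr
```

With `y = [1, 1, 1, 0, 0, 0]`, a predictor such as `[70, 65, 80, 30, 25, 40]` that separates the classes perfectly after the median split gets RR = 7 and is ranked ahead of a weakly associated binary column such as `[1, 0, 1, 0, 1, 0]` (RR ≈ 1.67). A filter of this kind scores each predictor independently, so it is cheap but ignores interactions between predictors.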
