PCA based Regression Decision Tree Classification for Somatic Mutations.

Analyzing cancer and normal data to predict the occurrence of somatic mutations plays an important role, yet several challenges persist in detecting somatic mutations, chiefly the complexity of handling large volumes of data while classifying with good accuracy. In many situations the data set contains redundant and less significant features, and these insignificant features need to be removed to improve classification performance. Feature selection techniques are useful for dimensionality reduction. PCA is one such technique for identifying significant attributes and is adopted in this paper. A novel technique, the PCA-based regression decision tree, is proposed for classifying somatic mutation data. The performance of this classification process for detecting somatic mutations is compared with existing algorithms, and satisfactory results are obtained with the proposed model.
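
The dimensionality reduction step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a tiny made-up numeric matrix, finds only the first principal component by power iteration, and uses hypothetical helper names (`center`, `covariance`, `first_component`, `project`). In a full pipeline the projected scores would then be passed to a decision tree.

```python
import math

def center(rows):
    """Subtract each column's mean so every feature has zero mean."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=200):
    """Approximate the leading eigenvector of cov by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d  # assumed start vector, not orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(centered, component):
    """Reduce each row to a single score along the given component."""
    return [sum(a * b for a, b in zip(row, component)) for row in centered]

# Made-up data: the second feature is exactly twice the first (fully redundant).
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
centered = center(rows)
pc1 = first_component(covariance(centered))
scores = project(centered, pc1)
```

Because the two made-up features are perfectly correlated, one component carries all of the variance, which is the situation in which dropping dimensions loses nothing.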

AN EFFICIENT MACHINE LEARNING MODEL FOR PREDICTION OF ACUTE MYOCARDIAL INFARCTION

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813666200325104317 ◽

2020 ◽

Vol 13 ◽

Author(s):

Dhilsath Fathima.M ◽

S. Justin Samuel ◽

R. Hari Haran

Keyword(s):

Machine Learning ◽

Myocardial Infarction ◽

Acute Myocardial Infarction ◽

Logistic Regression ◽

Decision Tree ◽

Learning Model ◽

Training Dataset ◽

Data Set ◽

Machine Learning Model ◽

Proposed Model

Aim: This work develops an improved and robust machine learning model for predicting myocardial infarction (MI), which could have substantial clinical impact. Objectives: This paper explains how to build a machine learning based computer-aided analysis system for early and accurate prediction of MI, using the Framingham Heart Study dataset for validation and evaluation. The proposed computer-aided analysis model will support medical professionals in predicting myocardial infarction proficiently. Methods: The proposed model uses mean imputation to fill in the missing values in the data set, then applies principal component analysis (PCA) to extract the optimal features and enhance the performance of the classifiers. After PCA, the reduced data are partitioned into a training dataset and a testing dataset: 70% of the data are used to train four widely used classifiers (support vector machine, k-nearest neighbor, logistic regression, and decision tree), and the remaining 30% are used to evaluate the output of the machine learning model with the following performance metrics: confusion matrix, classifier accuracy, precision, sensitivity, F1-score, and AUC-ROC curve. Results: The classifier outputs are evaluated using these performance measures. We observed that logistic regression provides higher accuracy than the k-NN, SVM, and decision tree classifiers, and that PCA performs well as a feature extraction method for enhancing the performance of the proposed model. From these analyses, we conclude that logistic regression has a good mean accuracy and standard deviation of accuracy compared with the other three algorithms. The AUC-ROC curves of the classifiers, analyzed in Figures 4 and 5, show that logistic regression achieves a good AUC-ROC score of around 70% compared with the k-NN and decision tree algorithms.
Conclusion: From the result analysis, we infer that the proposed machine learning model will act as an optimal decision-making system that predicts acute myocardial infarction at an earlier stage than existing machine learning based prediction models. It can predict the presence of acute myocardial infarction in a person from heart disease risk factors, helping to decide when to start lifestyle modification and medical treatment to prevent heart disease.
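
Three of the steps named in the Methods can be sketched in a few lines: mean imputation, the 70/30 partition, and the confusion-matrix metrics. This is a rough illustration under assumptions, not the paper's code: missing values are represented as `None`, the function names and the fixed shuffle seed are hypothetical, and the PCA and classifier training steps are omitted.

```python
import random

def mean_impute(rows):
    """Replace None entries with the mean of the observed values in that column."""
    means = []
    for col in zip(*rows):
        seen = [x for x in col if x is not None]
        means.append(sum(seen) / len(seen))
    return [[m if x is None else x for x, m in zip(row, means)] for row in rows]

def split_70_30(items, seed=0):
    """Shuffle a copy of items and cut it into 70% training and 30% testing parts."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts and derived scores for labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": (tp + tn) / len(pairs),
            "precision": precision, "sensitivity": sensitivity, "f1": f1}

# Made-up example values, not taken from the Framingham data.
filled = mean_impute([[1.0, None], [3.0, 4.0], [None, 6.0]])
train, test = split_70_30(list(range(10)))
scores = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0])
```

Imputing before the split, in the order the abstract lists the steps, lets test rows influence the column means; computing the means on the training rows only is a common alternative.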

Design of novel multi filter union feature selection framework for breast cancer dataset

Concurrent Engineering ◽

10.1177/1063293x211016046 ◽

2021 ◽

pp. 1063293X2110160

Author(s):

Dinesh Morkonda Gunasekaran ◽

Prabha Dhandayudam

Keyword(s):

Breast Cancer ◽

Feature Selection ◽

Care Center ◽

Feature Selection Method ◽

Selection Method ◽

Cancer Center ◽

Breast Cancer Dataset ◽

Data Set ◽

Health Care Center ◽

Cancer Data

Breast cancer is now commonly diagnosed in women. Feature selection is an important step in constructing a classification framework. We propose a multi filter union (MFU) feature selection method for breast cancer data sets. A union model based on the random forest algorithm and the logistic regression (LG) algorithm is used to select the important features in the dataset. The performance of the data analysis is evaluated using the optimal feature subset of the selected dataset. The experiments are carried out first on the Wisconsin diagnostic breast cancer data set and then on a real data set from a women's health care center. The results show that the proposed approach performs well and is efficient compared with existing feature selection algorithms.
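
The union idea can be sketched as follows. The abstract does not say how each filter ranks or thresholds features, so this sketch assumes each filter yields a score per feature and keeps its top `k`; the function names, feature names, and scores are all hypothetical.

```python
def top_k(scores, k):
    """Names of the k highest-scoring features (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return {name for name, _ in ranked[:k]}

def multi_filter_union(score_tables, k):
    """Union of the top-k features chosen independently by each filter."""
    selected = set()
    for scores in score_tables:
        selected |= top_k(scores, k)
    return sorted(selected)

# Hypothetical importance scores standing in for the two filters' outputs.
random_forest_scores = {"radius": 0.5, "texture": 0.1, "area": 0.3, "smoothness": 0.1}
logistic_scores = {"radius": 0.7, "texture": 0.6, "area": 0.1, "smoothness": 0.4}

features = multi_filter_union([random_forest_scores, logistic_scores], k=2)
```

A union keeps any feature that at least one filter considers important, so the result can be larger than `k` but never misses a feature that either filter ranked highly.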

A Survey on Feature Selection Techniques using Evolutionary Algorithms

Iraqi Journal of Science ◽

10.24996/ijs.2021.62.8.32 ◽

2021 ◽

pp. 2796-2812

Author(s):

Nishath Ansari

Keyword(s):

Genetic Algorithm ◽

Feature Selection ◽

Machine Learning Algorithms ◽

Nsga Ii ◽

Feature Selection Technique ◽

Heuristic Strategies ◽

Real World Applications ◽

Heuristic Strategy ◽

Feature Selection Techniques ◽

Straightforward Approach

Feature selection, a method of dimensionality reduction, consists of choosing appropriate feature subsets from the full set of features. This paper gives a detailed review of feature selection, its selection problems, and its evaluation techniques. The discussion starts with a straightforward approach that handles features and selection problems using meta-heuristic strategies, which help in obtaining the best feature subsets. The paper then discusses some nature-inspired models and the computations performed with them to handle feature selection in complex and massive data. It also discusses algorithms such as the genetic algorithm (GA), the Non-Dominated Sorting Genetic Algorithm (NSGA-II), Particle Swarm Optimization (PSO), and some other meta-heuristic strategies for these problems. A comparison of these algorithms has been performed; the results show that feature selection benefits machine learning algorithms by improving their performance. This paper also presents various real-world applications of feature selection.
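
As a rough illustration of how a genetic algorithm can search over feature subsets, the sketch below evolves bit masks in which a 1 means the feature is kept. It is a generic textbook-style GA, not a method taken from the survey: the tournament selection, one-point crossover, elitism, parameter defaults, and the toy fitness function are all assumptions. In practice the fitness would be a classifier's validation score on the selected features.

```python
import random

def genetic_feature_selection(n_features, fitness, pop_size=20, generations=30,
                              mutation_rate=0.1, seed=0):
    """Evolve 0/1 feature masks; returns the best mask and the best fitness per step."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    history = [fitness(best)]

    def tournament():
        # Pick two random masks and keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: the current best always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
        history.append(fitness(best))
    return best, history

# Toy fitness: reward three "relevant" features, charge a small cost per feature kept.
RELEVANT = {0, 2, 5}

def toy_fitness(mask):
    hits = sum(1 for i, g in enumerate(mask) if g and i in RELEVANT)
    return hits - 0.1 * sum(mask)

best_mask, history = genetic_feature_selection(8, toy_fitness)
```

Because the best mask is copied unchanged into each new generation, the recorded best fitness can never decrease from one generation to the next.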

A Classification Model for Multispectral Forest Datatype with the help of a Decision Tree and Wrapper Based Forward Feature Selection Technique

Lecture Notes in Networks and Systems - Advances in Distributed Computing and Machine Learning ◽

10.1007/978-981-16-4807-6_42 ◽

2022 ◽

pp. 444-456

Author(s):

Madhusmita Sahu ◽

Rasmita Dash

Keyword(s):

Feature Selection ◽

Decision Tree ◽

Classification Model ◽

Feature Selection Technique ◽

Selection Technique

Research on Duplicated Documentation Removal Model Based on Information Entropy and Decision Classification Techniques

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.998-999.1357 ◽

2014 ◽

Vol 998-999 ◽

pp. 1357-1361

Author(s):

Qing Song Tang ◽

Jian Ying He

Keyword(s):

Decision Tree ◽

Information Entropy ◽

Processing Method ◽

Electronic Document ◽

Decision Tree Classification ◽

Classification Technique ◽

Proposed Model ◽

Discrete Processing ◽

Type Size ◽

Removal Model

An electronic document is represented as a data table by applying characterization and discretization to attributes such as its type, size, and MD5 value. A duplicate removal model for documents is constructed using an information entropy based decision tree classification technique. Simple experiments show that the proposed model is feasible to a certain degree and can achieve duplicate document removal to a certain extent.
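
The entropy criterion behind such a tree can be sketched briefly. The abstract gives no table layout, so the rows, attribute names, and labels below are hypothetical; the two functions compute Shannon entropy and the information gain of splitting on one discretized attribute.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Drop in entropy of `target` after grouping rows by `attribute`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder

# Hypothetical discretized document table.
documents = [
    {"type": "pdf", "size": "large", "label": "duplicate"},
    {"type": "pdf", "size": "small", "label": "unique"},
    {"type": "doc", "size": "large", "label": "duplicate"},
    {"type": "doc", "size": "small", "label": "unique"},
]

gain_size = information_gain(documents, "size", "label")
gain_type = information_gain(documents, "type", "label")
```

In this made-up table the size bucket separates the labels completely while the file type tells nothing, so a tree built on information gain would split on size first.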

Hybrid bat-ant colony optimization algorithm for rule-based feature selection in health care

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v10i6.pp6655-6663 ◽

2020 ◽

Vol 10 (6) ◽

pp. 6655

Author(s):

Rafid Sagban ◽

Haydar A. Marhoon ◽

Raaid Alubady

Keyword(s):

Health Care ◽

Cervical Cancer ◽

Feature Selection ◽

Ant Colony Optimization ◽

Diagnostic Methods ◽

Ant Colony ◽

Ant Colony Optimization Algorithm ◽

Data Set ◽

Rule Based ◽

Cancer Data

Rule-based classification in the field of health care using artificial intelligence provides solutions to decision-making problems involving different domains. An important challenge is providing access to good and fast health facilities. Cervical cancer is one of the most frequent causes of death in females. The diagnostic methods for cervical cancer used in health centers are costly and time-consuming. In this paper, the bat algorithm for feature selection and an ant colony optimization-based classification algorithm were applied to a cervical cancer data set obtained from the repository of the University of California, Irvine, to analyze the disease based on optimal features. The proposed algorithm outperforms other methods in terms of comprehensibility and obtains better results in terms of classification accuracy.

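PENERAPAN DATA MINING MENGGUNAKAN ALGORITMA C4.5 TEHADAP PENGARUH PENJUALAN KOPI PADA PT. JPW INDONESIA [Application of Data Mining Using the C4.5 Algorithm to Influences on Coffee Sales at PT. JPW Indonesia]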

Jurnal Sistem Informasi dan Informatika (Simika) ◽

10.47080/simika.v3i1.836 ◽

2020 ◽

Vol 3 (1) ◽

pp. 40-54

Author(s):

Ikong Ifongki

Keyword(s):

Data Mining ◽

Decision Tree ◽

Decision Rules ◽

Large Data ◽

Added Value ◽

Data Set ◽

Use Of Data ◽

Decision Tree Classification ◽

C4.5 Algorithm

Data mining is a series of processes for exploring the added value of a data set in the form of knowledge that could not be found manually. Data mining techniques are expected to surface knowledge previously hidden in the data warehouse so that it becomes valuable information. The C4.5 algorithm is a widely used decision tree classification algorithm because it has key advantages over other algorithms: it produces decision trees that are easy to interpret, has an acceptable level of accuracy, and handles both discrete and numeric attributes efficiently. The output of the C4.5 algorithm is a decision tree, a structure that divides a large data set into smaller sets of records by applying a series of decision rules, so that the members of each resulting set become more similar to one another with each division. This case study examines what influences coffee sales by processing 106 records taken from 1,087 coffee sales records at PT. JPW Indonesia. The sampled data are processed both manually in Microsoft Excel and with RapidMiner software. The results of the C4.5 calculation show that the Quantity and Price attributes strongly affect coffee sales, so sales at PT. JPW Indonesia are still often unstable.
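
C4.5 chooses its splits by gain ratio, which divides information gain by the entropy of the split itself. The sketch below shows that calculation for discrete attributes only; the sales rows and attribute names are invented for illustration and are not the paper's data, and numeric thresholds and pruning are left out.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def gain_ratio(rows, attribute, target):
    """Information gain of `attribute` divided by its split information."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    weights = [len(g) / len(rows) for g in groups.values()]
    gain = entropy([row[target] for row in rows]) - sum(
        w * entropy(g) for w, g in zip(weights, groups.values()))
    split_info = -sum(w * math.log2(w) for w in weights)
    return gain / split_info if split_info > 0 else 0.0

# Invented rows: "id" is unique per record, "price" is a discretized attribute.
sales = [
    {"id": "t1", "price": "low", "sales": "up"},
    {"id": "t2", "price": "low", "sales": "up"},
    {"id": "t3", "price": "high", "sales": "down"},
    {"id": "t4", "price": "high", "sales": "down"},
]

ratio_price = gain_ratio(sales, "price", "sales")
ratio_id = gain_ratio(sales, "id", "sales")
```

Both attributes separate the classes perfectly here, but the record identifier does so only by splitting every row into its own branch, and the split information term halves its score accordingly.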

An Ensemble Voted Feature Selection Technique for Predictive Modeling of Malwares of Android

International Journal of Information System Modeling and Design ◽

10.4018/ijismd.2019040103 ◽

2019 ◽

Vol 10 (2) ◽

pp. 46-69

Author(s):

Abhishek Bhattacharya ◽

Radha Tamal Goswami ◽

Kuntal Mukherjee ◽

Nhu Gia Nguyen

Keyword(s):

Feature Selection ◽

Predictive Modeling ◽

Data Partitioning ◽

Coefficient Of Determination ◽

Feature Selection Technique ◽

Selection Technique ◽

Feature Selector ◽

The Impact ◽

Feature Selection Techniques ◽

Installation Time

Each Android application requests a set of permissions at installation time, and these can be used as features in permission-based identification of Android malware. Recently, ensemble feature selection techniques have received increasing attention over conventional techniques in different applications. In this work, a cluster-based voted ensemble feature selection technique combining five base wrapper approaches from R libraries is proposed for identifying the most prominent set of features in the predictive modeling of Android malware. The proposed method preserves both desirable properties of an ensemble feature selector: accuracy and diversity. Moreover, five different data partitioning ratios are considered, and their impact on the predictive model is measured using the coefficient of determination (r-square) and root mean square error. The proposed strategy yields significantly better outcomes in terms of the number of selected features and classification accuracy.
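
Two pieces of this abstract lend themselves to a short sketch: combining several selectors by vote, and the two evaluation measures. The abstract does not give the voting rule or the clustering step, so the simple vote threshold below is an assumption, and the permission names, selector outputs, and function names are hypothetical.

```python
import math
from collections import Counter

def voted_features(selections, min_votes):
    """Keep every feature chosen by at least min_votes of the base selectors."""
    votes = Counter(f for chosen in selections for f in set(chosen))
    return sorted(f for f, count in votes.items() if count >= min_votes)

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical outputs of five base selectors over Android permission features.
selections = [
    ["INTERNET", "SEND_SMS"],
    ["SEND_SMS", "READ_CONTACTS"],
    ["SEND_SMS", "INTERNET"],
    ["INTERNET"],
    ["READ_CONTACTS", "SEND_SMS"],
]
chosen = voted_features(selections, min_votes=3)
```

Raising `min_votes` trades a smaller feature set for the risk of dropping a feature that only some selectors can detect.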

Feature Selection Methods in QSAR Studies

Journal of AOAC International ◽

10.5740/jaoacint.sge_goodarzi ◽

2012 ◽

Vol 95 (3) ◽

pp. 636-651 ◽

Cited By ~ 51

Author(s):

Mohammad Goodarzi ◽

Bieke Dejaegher ◽

Yvan Vander Heyden

Keyword(s):

Feature Selection ◽

Biological Activity ◽

Learning Algorithm ◽

Qsar Model ◽

Model Complexity ◽

Feature Selection Problem ◽

Data Set ◽

Prediction Ability ◽

Qsar Studies ◽

Feature Selection Techniques

A quantitative structure-activity relationship (QSAR) relates quantitative chemical structure attributes (molecular descriptors) to a biological activity. QSAR studies have now become attractive in drug discovery and development because their application can save substantial time and human resources. Several parameters are important in the prediction ability of a QSAR model. On the one hand, different statistical methods may be applied to check the linear or nonlinear behavior of a data set. On the other hand, feature selection techniques are applied to decrease the model complexity, to decrease the overfitting/overtraining risk, and to select the most important descriptors from the often more than 1000 calculated. The selected descriptors are then linked to a biological activity of the corresponding compound by means of a mathematical model. Different modeling techniques can be applied, some of which explicitly require a feature selection. A QSAR model can be useful in the design of new compounds with improved potency in the class under study. Only molecules with a predicted interesting activity will be synthesized. In the feature selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus attention, while ignoring the rest. Up to now, many feature selection techniques, such as genetic algorithms, forward selection, backward elimination, stepwise regression, and simulated annealing, have been used extensively. Swarm intelligence optimizations, such as ant colony optimization and particle swarm optimization, which are feature selection techniques usually simulated based on animal and insect life behavior to find the shortest path between a food source and their nests, have recently also been applied in QSAR studies. This review paper provides an overview of different feature selection techniques applied in QSAR modeling.
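
Of the techniques listed, forward selection is the simplest to sketch: start from an empty set and repeatedly add the descriptor that improves a score the most, stopping when nothing helps. The version below is a generic greedy sketch, not code from the review; the descriptor names, their utilities, and the toy score with a per-descriptor penalty are hypothetical stand-ins for a model's validation score.

```python
def forward_selection(features, score, max_features=None):
    """Greedily add the feature that most improves score(subset) until none does."""
    selected = []
    best_score = score(selected)
    remaining = list(features)
    while remaining and (max_features is None or len(selected) < max_features):
        candidate, candidate_score = None, best_score
        for feature in remaining:
            trial = score(selected + [feature])
            if trial > candidate_score:
                candidate, candidate_score = feature, trial
        if candidate is None:  # no single addition improves the score
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score

# Hypothetical per-descriptor utilities; the penalty discourages large models.
UTILITY = {"logP": 0.5, "mol_weight": 0.3, "h_donors": 0.05}

def toy_score(subset):
    return sum(UTILITY[name] for name in subset) - 0.1 * len(subset)

descriptors, final_score = forward_selection(list(UTILITY), toy_score)
```

The search stops after two descriptors because adding the third costs more in penalty than it contributes, which mirrors the complexity and overfitting trade-off the abstract describes.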

Feature Selection Techniques to Choose the Best Features for Parkinsons Disease Predictions Based on Decision Tree

Journal of Physics Conference Series ◽

10.1088/1742-6596/1477/3/032008 ◽

2020 ◽

Vol 1477 ◽

pp. 032008

Author(s):

Yulianti ◽

A N Syapariyah ◽

A Saifudin ◽

T. Desyani

Keyword(s):

Feature Selection ◽

Decision Tree ◽

Parkinsons Disease ◽

Feature Selection Techniques
