A Classification Model for Multispectral Forest Datatype with the help of a Decision Tree and Wrapper Based Forward Feature Selection Technique

Author(s): Madhusmita Sahu, Rasmita Dash

Author(s): Norsyela Muhammad Noor Mathivanan, Nor Azura Md. Ghani, Roziah Mohd Janor

The curse of dimensionality and the empty space phenomenon have emerged as critical problems in text classification. One way of dealing with these problems is to apply a feature selection technique before building a classification model. This reduces the time complexity and can sometimes increase the classification accuracy. This study introduces a feature selection technique based on K-Means clustering to overcome the weaknesses of traditional feature selection techniques such as principal component analysis (PCA), which require a lot of time to transform all the input data. The proposed technique decides which features to retain based on the significance value of each feature within a cluster. The study found that K-Means clustering helps to increase the efficiency of the KNN model for a large data set, while the KNN model without a feature selection technique is suitable for a small data set. A comparison between K-Means clustering and PCA as feature selection techniques shows that the proposed technique is better than PCA, especially in terms of computation time. Hence, K-Means clustering is found to reduce the data dimensionality with less time complexity than PCA, without affecting the accuracy of the KNN model for high-frequency data.
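As an illustration of the idea described above, the following Python sketch clusters the feature columns with K-Means and keeps one representative feature per cluster before training a KNN classifier. The closest-to-centroid selection rule, the synthetic data, and all parameter values are assumptions made for this sketch; the paper's own significance-value criterion may differ.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Hypothetical data standing in for a text-classification feature matrix.
X, y = make_classification(n_samples=500, n_features=100, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Cluster the features (columns), not the samples.
n_keep = 10
km = KMeans(n_clusters=n_keep, n_init=10, random_state=0).fit(X_train.T)

# From each cluster, retain the feature closest to the cluster centroid
# (an assumed stand-in for the paper's significance value).
selected = []
for c in range(n_keep):
    members = np.where(km.labels_ == c)[0]
    dists = np.linalg.norm(X_train.T[members] - km.cluster_centers_[c], axis=1)
    selected.append(members[np.argmin(dists)])

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train[:, selected], y_train)
print("KNN accuracy on retained features:", knn.score(X_test[:, selected], y_test))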


2018, Vol. 9 (3), pp. 1-11
Author(s): Sanat Kumar Sahu, A. K. Shrivas

Feature selection plays a very important role in retrieving the relevant features from datasets and computationally improving the performance of a model. The objective of this study is to evaluate the most important features of a chronic kidney disease (CKD) dataset and diagnose the CKD problem. In this research work, the authors have used a genetic search with the Wrapper Subset Evaluator method for feature selection to increase the overall performance of the classification model. They have also used Bayes Network, Classification and Regression Tree (CART), Radial Basis Function Network (RBFN) and the J48 classifier for classification of CKD and non-CKD data. The proposed genetic search based feature selection technique (GSBFST) selects the best features from the CKD dataset, and the performance of the classifiers is compared with the proposed and existing genetic search feature selection techniques (FSTs). All classification models give better results with the proposed GSBFST than without any FST or with the existing genetic search FSTs.
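For readers unfamiliar with wrapper-based genetic search, the sketch below evolves binary feature masks whose fitness is the cross-validated accuracy of a classifier on the masked data. The authors used WEKA's GeneticSearch with WrapperSubsetEval; this hand-rolled Python GA with a DecisionTreeClassifier (a CART-style stand-in) and a public dataset in place of the CKD data is only an illustrative assumption, not their exact configuration.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_breast_cancer  # stand-in for the CKD dataset

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]

def fitness(mask):
    # Wrapper evaluation: cross-validated accuracy of the classifier
    # restricted to the features switched on in the mask.
    if not mask.any():
        return 0.0
    return cross_val_score(DecisionTreeClassifier(random_state=0),
                           X[:, mask], y, cv=5).mean()

pop = rng.random((20, n_features)) < 0.5          # random initial masks
for generation in range(15):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[::-1][:10]]  # keep the 10 fittest
    children = []
    while len(children) < len(pop) - len(parents):
        a, b = parents[rng.integers(10, size=2)]
        cut = rng.integers(1, n_features)          # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(n_features) < 0.02       # bit-flip mutation
        children.append(np.where(flip, ~child, child))
    pop = np.vstack([parents, np.array(children)])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected feature indices:", np.flatnonzero(best))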


2013, Vol. 22 (05), pp. 1360010
Author(s): Huanjing Wang, Taghi M. Khoshgoftaar, Qianhui (Althea) Liang

Software metrics (features or attributes) are collected during the software development cycle. Metric selection is one of the most important preprocessing steps in the process of building defect prediction models and may improve the final prediction result. However, the addition or removal of program modules (instances or samples) can alter the subsets chosen by a feature selection technique, rendering the previously selected feature sets invalid. Very limited research has been done considering both stability (or robustness) and defect prediction model performance together in the software engineering domain, despite the importance of both aspects when choosing a feature selection technique. In this paper, we test the stability and classification model performance of eighteen feature selection techniques as the magnitude of change to the datasets and the size of the selected feature subsets are varied. All experiments were conducted on sixteen datasets from three real-world software projects. The experimental results demonstrate that Gain Ratio shows the least stability, while two different versions of ReliefF show the most stability, followed by the PRC- and AUC-based threshold-based feature selection techniques. Results also show that the signal-to-noise ranker performed moderately in terms of robustness and was the best ranker in terms of model performance. Finally, we conclude that while stability and classification performance are correlated for some rankers, this is not true for others, and therefore performance according to one scheme (stability or model performance) cannot be used to predict performance according to the other.
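To make the notion of stability concrete, the sketch below ranks features on several randomly perturbed versions of a dataset, keeps the top-k subset each time, and averages the pairwise Jaccard similarity of those subsets. The mutual-information ranker, the Jaccard measure, and the 20% perturbation level are assumptions for illustration; the paper's eighteen rankers and its exact stability metric are not reproduced here.

import numpy as np
from itertools import combinations
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=50, random_state=0)
k = 10                      # size of the selected feature subset
subsets = []

for run in range(10):
    # Perturb the data by dropping a random 20% of the instances.
    keep = rng.choice(len(X), size=int(0.8 * len(X)), replace=False)
    scores = mutual_info_classif(X[keep], y[keep], random_state=0)
    subsets.append(set(np.argsort(scores)[::-1][:k]))

# Stability = average pairwise overlap of the selected subsets.
jaccards = [len(a & b) / len(a | b) for a, b in combinations(subsets, 2)]
print("mean pairwise Jaccard stability:", np.mean(jaccards))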


Microarray technology has been developed as one of the powerful tools that have attracted many researchers to analyze gene expression levels for a given organism. Gene expression data typically have a very large number of features (in the thousands) and a much smaller number of samples (in the hundreds). This characteristic makes the analysis of gene expression data difficult, so an efficient feature selection technique must be applied before any kind of analysis. Feature selection plays a vital role in the classification of gene expression data, and several feature selection techniques have been introduced in this field. Among them, Support Vector Machine with Recursive Feature Elimination (SVM-RFE) has proven to be one of the most promising. SVM-RFE ranks the genes (features) by training an SVM classification model, and key genes are selected in combination with the RFE procedure. Its main drawback is its huge time consumption. We introduce an efficient implementation of the linear SVM to overcome this problem and improve RFE with a variable step size; the combined method is then used to select informative genes. An effective resampling method is also proposed to preprocess the datasets, making the distribution of samples balanced and giving more reliable classification results. In this paper, we also study the applicability of common classifiers. Detailed experiments are conducted on four commonly used microarray gene expression datasets. The results show that the proposed method achieves comparable classification performance.
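The following sketch shows one way the variable-step idea can be realised: a linear SVM is retrained on the surviving genes, and at each iteration a fraction of the lowest-weighted genes is eliminated, so early iterations discard many genes and later ones only a few. The 20% elimination fraction, the LinearSVC solver, and the synthetic data are assumptions for illustration rather than the authors' implementation.

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.datasets import make_classification

# Hypothetical data with the typical microarray shape: few samples, many genes.
X, y = make_classification(n_samples=100, n_features=2000, n_informative=20,
                           random_state=0)
remaining = np.arange(X.shape[1])
target = 50                                   # number of genes to keep

while len(remaining) > target:
    svm = LinearSVC(C=1.0, max_iter=5000).fit(X[:, remaining], y)
    ranking = np.argsort(svm.coef_[0] ** 2)   # weakest genes first
    # Variable step: drop 20% of what is left, but never overshoot the target.
    step = max(1, min(int(0.2 * len(remaining)), len(remaining) - target))
    remaining = np.delete(remaining, ranking[:step])

print("selected gene indices:", remaining)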


Author(s): Hua Tang, Chunmei Zhang, Rong Chen, Po Huang, Chenggang Duan, ...

Author(s): Uttamarani Pati, Papia Ray, Arvind R. Singh

Abstract: Very short-term load forecasting (VSTLF) plays a pivotal role in helping utility operators make proper decisions regarding generation scheduling and the size of the spinning reserve, and in maintaining the equilibrium between the power generated by the utility and the load demand. However, developing an effective VSTLF model is challenging because the real-time data gathered are noisy and the load demand exhibits complex features that vary from time to time. A hybrid approach for VSTLF using an incomplete fuzzy decision system (IFDS) combined with a genetic algorithm (GA) based feature selection technique for hour-ahead load forecasting is proposed in this research work. The proposed work aims to determine the relevant load features and eliminate redundant features to form a less complex forecasting model. The proposed method considers the time of day, temperature, humidity, and dew point as inputs and generates the forecasted load as output. The input data and historical load data are collected from the Northern Regional Load Dispatch Centre (NRLDC), New Delhi, for December 2009, January 2010 and February 2010. To validate the efficacy of the proposed method, its performance is further compared with other conventional AI techniques such as ANN and ANFIS, which are integrated with the genetic algorithm-based feature selection technique to boost their performance. The accuracy of these techniques is tested through their mean absolute percentage error (MAPE) and normalized root mean square error (nRMSE) values. Compared to other conventional AI techniques and methods reported in previous studies, the proposed method is found to have acceptable accuracy for 1-hour-ahead electrical load forecasting.
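For reference, the two reported error metrics can be computed as in the short sketch below; the load values are hypothetical, and normalising nRMSE by the mean of the actual load is an assumption, since other normalisations (range or peak load) are also common.

import numpy as np

actual   = np.array([812.0, 845.0, 890.0, 927.0, 901.0])   # MW, hypothetical
forecast = np.array([805.0, 851.0, 902.0, 915.0, 893.0])   # MW, hypothetical

# Mean absolute percentage error and normalised root mean square error.
mape = 100.0 * np.mean(np.abs((actual - forecast) / actual))
nrmse = np.sqrt(np.mean((actual - forecast) ** 2)) / np.mean(actual)

print(f"MAPE  = {mape:.2f} %")
print(f"nRMSE = {nrmse:.4f}")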

