A Classification Model for Multispectral Forest Datatype with the help of a Decision Tree and Wrapper Based Forward Feature Selection Technique

Author(s): Madhusmita Sahu, Rasmita Dash

Author(s): Norsyela Muhammad Noor Mathivanan, Nor Azura Md. Ghani, Roziah Mohd Janor

The curse of dimensionality and the empty space phenomenon have emerged as critical problems in text classification. One way of dealing with these problems is to apply a feature selection technique before building a classification model. This reduces the time complexity and can sometimes increase the classification accuracy. This study introduces a feature selection technique based on K-Means clustering to overcome the weaknesses of traditional feature selection techniques such as principal component analysis (PCA), which require a lot of time to transform all the input data. The proposed technique decides which features to retain based on the significance value of each feature within a cluster. The study found that K-Means clustering helps to increase the efficiency of the KNN model for a large data set, while the KNN model without a feature selection technique is suitable for a small data set. A comparison between K-Means clustering and PCA as feature selection techniques shows that the proposed technique is better than PCA, especially in terms of computation time. Hence, K-Means clustering is found to reduce the data dimensionality with less time complexity than PCA, without affecting the accuracy of the KNN model for high-frequency data.
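As an illustration of the idea described above, the following Python sketch clusters the feature columns with K-Means and keeps one representative feature per cluster before training a KNN classifier. The closest-to-centroid selection rule, the synthetic data, and all parameter values are assumptions made for this sketch; the paper's own significance-value criterion may differ.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Hypothetical data standing in for a text-classification feature matrix.
X, y = make_classification(n_samples=500, n_features=100, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Cluster the features (columns), not the samples.
n_keep = 10
km = KMeans(n_clusters=n_keep, n_init=10, random_state=0).fit(X_train.T)

# From each cluster, retain the feature closest to the cluster centroid
# (an assumed stand-in for the paper's significance value).
selected = []
for c in range(n_keep):
    members = np.where(km.labels_ == c)[0]
    dists = np.linalg.norm(X_train.T[members] - km.cluster_centers_[c], axis=1)
    selected.append(members[np.argmin(dists)])

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train[:, selected], y_train)
print("KNN accuracy on retained features:", knn.score(X_test[:, selected], y_test))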


2018, Vol. 9 (3), pp. 1-11
Author(s): Sanat Kumar Sahu, A. K. Shrivas

Feature selection plays a very important role in retrieving the relevant features from datasets and computationally improving the performance of a model. The objective of this study is to evaluate the most important features of a chronic kidney disease (CKD) dataset and diagnose the CKD problem. In this research work, the authors have used a genetic search with the Wrapper Subset Evaluator method for feature selection to increase the overall performance of the classification model. They have also used Bayes Network, Classification and Regression Tree (CART), Radial Basis Function Network (RBFN) and the J48 classifier for classification of CKD and non-CKD data. The proposed genetic search based feature selection technique (GSBFST) selects the best features from the CKD dataset, and the performance of the classifiers is compared with the proposed and existing genetic search feature selection techniques (FSTs). All classification models give better results with the proposed GSBFST than without any FST or with the existing genetic search FSTs.
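For readers unfamiliar with wrapper-based genetic search, the sketch below evolves binary feature masks whose fitness is the cross-validated accuracy of a classifier on the masked data. The authors used WEKA's GeneticSearch with WrapperSubsetEval; this hand-rolled Python GA with a DecisionTreeClassifier (a CART-style stand-in) and a public dataset in place of the CKD data is only an illustrative assumption, not their exact configuration.

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_breast_cancer  # stand-in for the CKD dataset

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]

def fitness(mask):
    # Wrapper evaluation: cross-validated accuracy of the classifier
    # restricted to the features switched on in the mask.
    if not mask.any():
        return 0.0
    return cross_val_score(DecisionTreeClassifier(random_state=0),
                           X[:, mask], y, cv=5).mean()

pop = rng.random((20, n_features)) < 0.5          # random initial masks
for generation in range(15):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[::-1][:10]]  # keep the 10 fittest
    children = []
    while len(children) < len(pop) - len(parents):
        a, b = parents[rng.integers(10, size=2)]
        cut = rng.integers(1, n_features)          # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(n_features) < 0.02       # bit-flip mutation
        children.append(np.where(flip, ~child, child))
    pop = np.vstack([parents, np.array(children)])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected feature indices:", np.flatnonzero(best))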


2013, Vol. 22 (05), pp. 1360010
Author(s): Huanjing Wang, Taghi M. Khoshgoftaar, Qianhui (Althea) Liang

Software metrics (features or attributes) are collected during the software development cycle. Metric selection is one of the most important preprocessing steps in the process of building defect prediction models and may improve the final prediction result. However, the addition or removal of program modules (instances or samples) can alter the subsets chosen by a feature selection technique, rendering the previously selected feature sets invalid. Very limited research has been done considering both stability (or robustness) and defect prediction model performance together in the software engineering domain, despite the importance of both aspects when choosing a feature selection technique. In this paper, we test the stability and classification model performance of eighteen feature selection techniques as the magnitude of change to the datasets and the size of the selected feature subsets are varied. All experiments were conducted on sixteen datasets from three real-world software projects. The experimental results demonstrate that Gain Ratio shows the least stability, while two different versions of ReliefF show the most stability, followed by the PRC- and AUC-based threshold-based feature selection techniques. Results also show that the signal-to-noise ranker performed moderately in terms of robustness and was the best ranker in terms of model performance. Finally, we conclude that while stability and classification performance are correlated for some rankers, this is not true for others, and therefore performance according to one scheme (stability or model performance) cannot be used to predict performance according to the other.
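To make the notion of stability concrete, the sketch below ranks features on several randomly perturbed versions of a dataset, keeps the top-k subset each time, and averages the pairwise Jaccard similarity of those subsets. The mutual-information ranker, the Jaccard measure, and the 20% perturbation level are assumptions for illustration; the paper's eighteen rankers and its exact stability metric are not reproduced here.

import numpy as np
from itertools import combinations
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=50, random_state=0)
k = 10                      # size of the selected feature subset
subsets = []

for run in range(10):
    # Perturb the data by dropping a random 20% of the instances.
    keep = rng.choice(len(X), size=int(0.8 * len(X)), replace=False)
    scores = mutual_info_classif(X[keep], y[keep], random_state=0)
    subsets.append(set(np.argsort(scores)[::-1][:k]))

# Stability = average pairwise overlap of the selected subsets.
jaccards = [len(a & b) / len(a | b) for a, b in combinations(subsets, 2)]
print("mean pairwise Jaccard stability:", np.mean(jaccards))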


Microarray technology has been developed as one of the powerful tools that have attracted many researchers to analyze gene expression levels for a given organism. Gene expression data typically have a very large number of features (in the thousands) and a much smaller number of samples (in the hundreds). This characteristic makes the analysis of gene expression data difficult, so an efficient feature selection technique must be applied before any kind of analysis. Feature selection plays a vital role in the classification of gene expression data, and several feature selection techniques have been introduced in this field. Among them, Support Vector Machine with Recursive Feature Elimination (SVM-RFE) has proven to be one of the most promising. SVM-RFE ranks the genes (features) by training an SVM classification model, and key genes are selected in combination with the RFE procedure. Its main drawback is its huge time consumption. We introduce an efficient implementation of the linear SVM to overcome this problem and improve RFE with a variable step size; the combined method is then used to select informative genes. An effective resampling method is also proposed to preprocess the datasets, making the distribution of samples balanced and giving more reliable classification results. In this paper, we also study the applicability of common classifiers. Detailed experiments are conducted on four commonly used microarray gene expression datasets. The results show that the proposed method achieves comparable classification performance.
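The following sketch shows one way the variable-step idea can be realised: a linear SVM is retrained on the surviving genes, and at each iteration a fraction of the lowest-weighted genes is eliminated, so early iterations discard many genes and later ones only a few. The 20% elimination fraction, the LinearSVC solver, and the synthetic data are assumptions for illustration rather than the authors' implementation.

import numpy as np
from sklearn.svm import LinearSVC
from sklearn.datasets import make_classification

# Hypothetical data with the typical microarray shape: few samples, many genes.
X, y = make_classification(n_samples=100, n_features=2000, n_informative=20,
                           random_state=0)
remaining = np.arange(X.shape[1])
target = 50                                   # number of genes to keep

while len(remaining) > target:
    svm = LinearSVC(C=1.0, max_iter=5000).fit(X[:, remaining], y)
    ranking = np.argsort(svm.coef_[0] ** 2)   # weakest genes first
    # Variable step: drop 20% of what is left, but never overshoot the target.
    step = max(1, min(int(0.2 * len(remaining)), len(remaining) - target))
    remaining = np.delete(remaining, ranking[:step])

print("selected gene indices:", remaining)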


Author(s): Hua Tang, Chunmei Zhang, Rong Chen, Po Huang, Chenggang Duan, ...

Author(s): Uttamarani Pati, Papia Ray, Arvind R. Singh

Abstract: Very short-term load forecasting (VSTLF) plays a pivotal role in helping utility operators make proper decisions regarding generation scheduling and the size of the spinning reserve, and in maintaining the equilibrium between the power generated by the utility and the load demand. However, developing an effective VSTLF model is challenging because the real-time data gathered are noisy and the load demand exhibits complex features that vary from time to time. A hybrid approach for VSTLF using an incomplete fuzzy decision system (IFDS) combined with a genetic algorithm (GA) based feature selection technique for hour-ahead load forecasting is proposed in this research work. The proposed work aims to determine the relevant load features and eliminate redundant features to form a less complex forecasting model. The proposed method considers the time of day, temperature, humidity, and dew point as inputs and generates the forecasted load as output. The input data and historical load data are collected from the Northern Regional Load Dispatch Centre (NRLDC), New Delhi, for December 2009, January 2010 and February 2010. To validate the efficacy of the proposed method, its performance is further compared with other conventional AI techniques such as ANN and ANFIS, which are integrated with the genetic algorithm-based feature selection technique to boost their performance. The accuracy of these techniques is tested through their mean absolute percentage error (MAPE) and normalized root mean square error (nRMSE) values. Compared to other conventional AI techniques and methods reported in previous studies, the proposed method is found to have acceptable accuracy for 1-hour-ahead electrical load forecasting.
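For reference, the two reported error metrics can be computed as in the short sketch below; the load values are hypothetical, and normalising nRMSE by the mean of the actual load is an assumption, since other normalisations (range or peak load) are also common.

import numpy as np

actual   = np.array([812.0, 845.0, 890.0, 927.0, 901.0])   # MW, hypothetical
forecast = np.array([805.0, 851.0, 902.0, 915.0, 893.0])   # MW, hypothetical

# Mean absolute percentage error and normalised root mean square error.
mape = 100.0 * np.mean(np.abs((actual - forecast) / actual))
nrmse = np.sqrt(np.mean((actual - forecast) ** 2)) / np.mean(actual)

print(f"MAPE  = {mape:.2f} %")
print(f"nRMSE = {nrmse:.4f}")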

