Improved Stability of Feature Selection by Combining Instance and Feature Weighting

Author(s):  
Gabriel Prat ◽  
Lluís A. Belanche
2018 ◽  
Vol 49 (4) ◽  
pp. 1580-1596 ◽  
Author(s):  
Dalwinder Singh ◽  
Birmohan Singh

2021 ◽  
Vol 12 (1) ◽  
pp. 1
Author(s):  
Rian Sanjaya ◽  
Yessica Nataliani

Abstract. Comparison of Weighted Criteria and Selection Criteria for Employee Performance Grouping with Fuzzy C-Means. The development of information technology makes it easier for companies to do many things and affects company operations. One of the factors affecting a company's development is its employees. Employees' performance can be observed from their discipline, honesty, cooperation, and work quality. The purpose of this study is to group employees based on their performance using fuzzy c-means. Two kinds of clustering are explained in this paper: clustering with feature weighting and clustering with feature selection. Using feature weights of 25%, 30%, 25%, and 20% for work discipline, honesty, cooperation, and work quality, respectively, clustering with feature weighting gives an accuracy of 0.8462. Using feature selection, fuzzy c-means gives an accuracy of 1, where work discipline and honesty are the critical features in clustering. From this research, we therefore find that honesty is the most essential feature for clustering employees based on their performance.
Keywords: clustering, employees, fuzzy c-means, feature weighting, feature selection
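The criterion-weighted fuzzy c-means step can be sketched in NumPy as below. This is a minimal illustration, not the study's implementation: the toy data, iteration count, and the three per-feature weights are made-up assumptions (the study weighted four real criteria at 25%, 30%, 25%, and 20%).

```python
import numpy as np

def weighted_fcm(X, n_clusters, weights, m=2.0, n_iter=100, seed=0):
    """Fuzzy c-means where each feature contributes to the distance
    in proportion to a user-supplied weight."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, n_clusters))
    U /= U.sum(axis=1, keepdims=True)          # memberships sum to 1
    w = np.asarray(weights, dtype=float)
    for _ in range(n_iter):
        Um = U ** m
        # cluster centers as membership-weighted means
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # feature-weighted squared Euclidean distance to each center
        d2 = np.array([((X - c) ** 2 * w).sum(axis=1) for c in centers]).T
        d2 = np.fmax(d2, 1e-12)                # guard against division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U, centers

# toy data: two obvious groups in the first two features, third is constant
X = np.array([[1.0, 1.0, 5.0], [1.2, 0.9, 5.0],
              [8.0, 8.0, 5.0], [7.9, 8.1, 5.0]])
U, centers = weighted_fcm(X, n_clusters=2, weights=[0.4, 0.4, 0.2])
```

Taking `U.argmax(axis=1)` yields the hard cluster assignment for each employee record.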


2021 ◽  
Author(s):  
Shima Afzali Vahed Moghaddam

<p>The human visual system can efficiently cope with complex natural scenes containing various objects at different scales using the visual attention mechanism. Salient object detection (SOD) aims to simulate the capability of the human visual system in prioritizing objects for high-level processing. SOD is the process of identifying and localizing the most attention-grabbing object(s) of a scene and separating the whole extent of the object(s) from the scene. In SOD, significant research has been dedicated to designing and introducing new features to the domain. The existing saliency feature space suffers from several difficulties: high dimensionality, features that are not equally important, irrelevant features, and original features that are not informative enough. These difficulties can lead to various performance limitations. Feature manipulation is the process of improving the input feature space to enhance learning quality and performance. Evolutionary computation (EC) techniques have been employed in a wide range of tasks due to their powerful search abilities. Genetic programming (GP) and particle swarm optimization (PSO) are well-known EC techniques which have been used for feature manipulation. The overall goal of this thesis is to develop feature manipulation methods, including feature weighting, feature selection, and feature construction, using EC techniques to improve the input feature set for SOD.</p>
<p>This thesis proposes a feature weighting method utilizing PSO to explore the relative contribution of each saliency feature in the feature combination process. Saliency features refer to features extracted from different levels of an image (e.g., pixel, segmentation) to compute saliency values over the entire image. The experimental results show that different datasets favour different weights for the employed features. The results also reveal that, by considering the importance of each feature in the combination process, the proposed method achieves better performance than the competitive methods.</p>
<p>This thesis proposes a new bottom-up SOD method that detects salient objects by constructing two new informative saliency features and designing a new feature combination framework. The proposed method aims to develop features that identify different regions of the image, and it strikes a good balance between computational time and performance.</p>
<p>This thesis proposes a GP-based method to automatically construct foreground and background saliency features. The automatically constructed features do not require domain knowledge and are more informative than manually constructed features. The results show that GP is robust to changes in the input feature set (e.g., adding more features) and improves performance by introducing more informative features to the SOD domain.</p>
<p>This thesis proposes a GP-based SOD method which automatically produces saliency maps (2-D maps containing saliency values) for different types of images. This method applies feature selection and feature combination during the learning process: GP's built-in feature selection process selects informative features from the original set and combines them to produce the final saliency map. The results show that GP can explore a large search space and find a good way to combine different input features.</p>
<p>Finally, this thesis introduces GP for the first time to construct high-level saliency features from low-level features for SOD, aiming to improve performance, particularly on challenging and complex SOD tasks. The proposed method constructs fewer features that achieve better saliency performance than the original full feature set.</p>
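The PSO feature-weighting idea, learning one weight per saliency feature so that the weighted combination best matches a ground truth, can be sketched as follows. The feature "maps", fitness function, and PSO hyperparameters here are illustrative assumptions, not the thesis's exact setup.

```python
import numpy as np

rng = np.random.default_rng(1)
F = rng.random((3, 64))            # 3 flattened saliency feature "maps"
true_w = np.array([0.6, 0.3, 0.1])
target = true_w @ F                # stand-in for a ground-truth saliency map

def fitness(w):
    # lower is better: mean squared error of the weighted combination
    return np.mean((w @ F - target) ** 2)

n_particles, dim = 20, 3
pos = rng.random((n_particles, dim))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_val = np.array([fitness(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()

for _ in range(200):
    r1, r2 = rng.random((2, n_particles, dim))
    # standard velocity update: inertia + cognitive + social terms
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = pos + vel
    vals = np.array([fitness(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()].copy()
```

After the loop, `gbest` holds the learned combination weights for the three feature maps.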


Author(s):  
Xia Wu ◽  
Xueyuan Xu ◽  
Jianhong Liu ◽  
Hailing Wang ◽  
Bin Hu ◽  
...  

2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Maoxian Zhao ◽  
Yue Qin

To address the low optimization accuracy of the cuckoo search algorithm, a new search algorithm, the Elite Hybrid Binary Cuckoo Search (EHBCS) algorithm, is developed by adding feature weighting and an elite strategy. The EHBCS algorithm is designed for feature selection on a series of binary classification datasets, including low-dimensional and high-dimensional samples, using an SVM classifier. The experimental results show that the EHBCS algorithm achieves better classification performance than the binary genetic algorithm and binary particle swarm optimization. We further demonstrate its superiority in terms of standard deviation, sensitivity, specificity, precision, and F-measure.
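One common way binary cuckoo-search variants turn continuous search positions into feature subsets is a sigmoid transfer function. The sketch below shows only that binarization step; the Lévy flights and the elite strategy of EHBCS are omitted, and all values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def binarize(position):
    """Map a continuous cuckoo position to a 0/1 feature mask:
    sigmoid gives a selection probability per feature, then each
    feature is kept if a uniform draw falls below it."""
    prob = 1.0 / (1.0 + np.exp(-position))   # sigmoid transfer
    return (rng.random(position.shape) < prob).astype(int)

# strongly negative coordinates are almost never selected,
# strongly positive ones almost always
position = np.array([-6.0, 0.0, 6.0, 10.0])
mask = binarize(position)
```

The resulting mask indexes the feature columns passed to the SVM when evaluating a candidate subset.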


2019 ◽  
Vol 6 (1) ◽  
pp. 138-149
Author(s):  
Ukhti Ikhsani Larasati ◽  
Much Aziz Muslim ◽  
Riza Arifudin ◽  
Alamsyah Alamsyah

Data processing can be done with text mining techniques. Processing large text data requires a machine that can extract opinions, whether positive or negative. Sentiment analysis is a text mining process that aims to determine whether the content of a text dataset is positive or negative. The support vector machine is one of the classification algorithms that can be used for sentiment analysis; however, it works less well on large data. In addition, the text mining process is constrained by the number of attributes used: many attributes reduce the performance of the classifier and thus lower accuracy. The purpose of this research is to increase support vector machine accuracy by implementing feature selection and feature weighting. Feature selection reduces the large number of irrelevant attributes; in this study, features are selected based on the top K = 500 values. Once the relevant attributes are selected, feature weighting is performed to calculate the weight of each selected attribute. The feature selection method used is the chi-square statistic, and feature weighting uses Term Frequency-Inverse Document Frequency (TF-IDF). Experiments using Matlab R2017b show that integrating the support vector machine with the chi-square statistic and TF-IDF, evaluated with 10-fold cross validation, increases accuracy by 11.5%: the support vector machine without the chi-square statistic and TF-IDF achieves an accuracy of 68.7%, while applying them raises accuracy to 80.2%.
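The TF-IDF weighting plus chi-square selection plus SVM pipeline can be sketched with scikit-learn (the study itself used Matlab R2017b, so this is only an illustrative equivalent). The toy corpus and the small k are assumptions; the paper's K = 500 applies to its real corpus.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# tiny labeled corpus: 1 = positive sentiment, 0 = negative
texts = ["great product, very happy", "awful quality, very sad",
         "happy with the great service", "sad about the awful support"]
labels = [1, 0, 1, 0]

pipe = make_pipeline(
    TfidfVectorizer(),          # feature weighting: TF-IDF
    SelectKBest(chi2, k=4),     # feature selection: top-k by chi-square
    LinearSVC(),                # linear SVM classifier
)
pipe.fit(texts, labels)
```

Chi-square scoring discards terms such as "very" that occur equally in both classes, keeping only the discriminative ones for the SVM.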


Author(s):  
Qasem A. Al-Radaideh ◽  
Md Nasir Sulaiman ◽  
Mohd Hasan Selamat ◽  
Hamidah Ibrahim

2022 ◽  
Vol 10 (1) ◽  
pp. 0-0

The medical diagnostic process works very similarly to the Case-Based Reasoning (CBR) cycle. CBR is a problem-solving approach based on the reuse of past experiences, called cases. To improve the performance of the retrieval phase, a Random Forest (RF) model is proposed. We used this algorithm in three different ways: the Classic Random Forest (CRF) algorithm; the Random Forest with Feature Selection (RF_FS) algorithm, where we selected the most important attributes and deleted the less important ones; and the Weighted Random Forest (WRF) algorithm, where we emphasized the most important attributes by giving them more weight, multiplying the entropy by the weight corresponding to each attribute. We tested our three algorithms, CRF, RF_FS, and WRF, with CBR on data from 11 medical databases and compared the results they produced. We found that WRF and RF_FS give better results than CRF. The experimental results show the performance and robustness of the proposed approach.
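A split criterion in the spirit of WRF, multiplying the entropy-based gain of an attribute by that attribute's weight, could look roughly like this. The data, threshold, and weight values are invented for illustration, not taken from the paper.

```python
import numpy as np

def entropy(y):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def weighted_gain(x, y, threshold, attr_weight):
    """Information gain of splitting one attribute at `threshold`,
    scaled by a per-attribute importance weight."""
    left, right = y[x <= threshold], y[x > threshold]
    cond = (len(left) * entropy(left) + len(right) * entropy(right)) / len(y)
    return attr_weight * (entropy(y) - cond)

# a perfectly separating attribute: gain is 1 bit before weighting
x = np.array([1.0, 2.0, 8.0, 9.0])
y = np.array([0, 0, 1, 1])
```

Under this scheme, two attributes with equal raw gain are ranked by their weights, so the forest splits more often on attributes deemed important.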


2015 ◽  
Vol 2015 ◽  
pp. 1-18 ◽  
Author(s):  
Thanh-Tung Nguyen ◽  
Joshua Zhexue Huang ◽  
Thuy Thi Nguyen

Random forests (RFs) have been widely used as a powerful classification method. However, with the randomization in both bagging samples and feature selection, the trees in the forest tend to select uninformative features for node splitting, giving RFs poor accuracy on high-dimensional data. RFs also have a bias in the feature selection process, where multivalued features are favored. Aiming at debiasing feature selection in RFs, we propose a new RF algorithm, called xRF, to select good features when learning RFs for high-dimensional data. We first remove the uninformative features using p-value assessment, and a subset of unbiased features is then selected based on some statistical measures. This feature subset is then partitioned into two subsets, and a feature weighting sampling technique is used to sample features from the two subsets for building trees. This approach generates more accurate trees while reducing dimensionality and the amount of data needed for learning RFs. An extensive set of experiments has been conducted on 47 high-dimensional real-world datasets, including image datasets. The experimental results show that RFs with the proposed approach outperform existing random forests in both accuracy and AUC measures.
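The feature-weighting sampling step, drawing the candidate features for a tree node in proportion to importance weights rather than uniformly, can be sketched as follows. The weight vector and subset size (`mtry`) are invented for illustration, standing in for the statistically derived scores the paper computes.

```python
import numpy as np

rng = np.random.default_rng(42)

def weighted_feature_sample(weights, mtry):
    """Sample `mtry` distinct feature indices with probability
    proportional to their weights (informative features are
    drawn more often than uninformative ones)."""
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()
    return rng.choice(len(p), size=mtry, replace=False, p=p)

# illustrative weights, e.g. from a chi-square-style relevance score
weights = [0.05, 0.05, 0.40, 0.30, 0.20]
subset = weighted_feature_sample(weights, mtry=2)
```

Over many node splits, heavily weighted features (index 2 here) appear in the candidate set far more often than the near-zero-weight ones, which is what steers the trees toward informative splits.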

