A novel weighted fuzzy c-means based on feature weight learning

2021 ◽  
pp. 1-19
Author(s):  
Xingguang Pan ◽  
Lin Wang ◽  
Chengquan Huang ◽  
Shitong Wang ◽  
Haiqing Chen

Feature-weighted fuzzy c-means algorithms face two challenges when feature weighting is used to improve their performance. On the one hand, if the feature weights are learnt in advance and then fixed during clustering, the learnt weights may lack flexibility and may not fully reflect the relevance of the features. On the other hand, if the feature weights are adjusted adaptively during clustering, the algorithms may suffer from bad initialization and assign feature weights incorrectly, so their performance may degrade under some conditions. To ease these problems, a novel weighted fuzzy c-means based on feature weight learning (FWL-FWCM) is proposed. It is a hybrid of the fuzzy weighted c-means (FWCM) algorithm and the Improved FWCM (IFWCM) algorithm. FWL-FWCM first learns feature weights from the data as a priori knowledge by minimizing a feature evaluation function with gradient descent. It then iteratively optimizes a clustering objective function that combines the weighted within-cluster dispersion with a term for the discrepancy between the weights and the a priori knowledge. Experiments on an artificial dataset and real datasets demonstrate that the proposed approach outperforms state-of-the-art feature-weighted clustering methods. The convergence property of FWL-FWCM is also presented.
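As a rough illustration of the fixed-weight case mentioned in this abstract, the sketch below runs plain fuzzy c-means with a weighted squared Euclidean distance. It is not the FWL-FWCM objective, whose penalty term on the discrepancy between weights and prior is not specified in the abstract. The function names, the evenly spaced initialization, and the toy data are all assumptions made for the example.

```python
def weighted_sq_dist(x, v, w):
    """Weighted squared Euclidean distance between point x and center v."""
    return sum(wj * (xj - vj) ** 2 for xj, vj, wj in zip(x, v, w))


def weighted_fcm(points, weights, c, m=2.0, iters=100, tol=1e-9):
    """Fuzzy c-means with fixed feature weights (hypothetical helper).

    Returns (centers, memberships). Centers are initialized from evenly
    spaced data points, which is a simplification chosen for determinism.
    """
    n, dim = len(points), len(points[0])
    centers = [list(points[(k * n) // c]) for k in range(c)]
    u = []
    for _ in range(iters):
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)).
        u = []
        for x in points:
            d2 = [max(weighted_sq_dist(x, v, weights), 1e-12) for v in centers]
            inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(inv)
            u.append([val / total for val in inv])
        # Center update: mean of the points weighted by u_ik^m.
        new_centers = []
        for k in range(c):
            um = [u[i][k] ** m for i in range(n)]
            denom = sum(um)
            new_centers.append(
                [sum(um[i] * points[i][j] for i in range(n)) / denom
                 for j in range(dim)]
            )
        shift = max(abs(a - b)
                    for old, new in zip(centers, new_centers)
                    for a, b in zip(old, new))
        centers = new_centers
        if shift < tol:
            break
    return centers, u


# Toy data: feature 0 separates two groups, feature 1 is mostly noise.
points = [(0.0, 0.0), (0.2, 5.0), (0.1, -5.0),
          (10.0, 0.0), (10.2, 5.0), (9.9, -5.0)]
centers, u = weighted_fcm(points, weights=(0.9, 0.1), c=2)
labels = [row.index(max(row)) for row in u]
print(labels)
```

Down-weighting the noisy second feature keeps the partition aligned with the first feature. With adaptive weights, the same loop would add a weight-update step, which is where the initialization sensitivity described in the abstract arises.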

2019 ◽  
Vol 78 ◽  
pp. 324-345 ◽  
Author(s):  
Mahdi Hashemzadeh ◽  
Amin Golzari Oskouei ◽  
Nacer Farajzadeh

2021 ◽  
Vol 12 (1) ◽  
pp. 1
Author(s):  
Rian Sanjaya ◽  
Yessica Nataliani

Comparison of Weighted Criteria and Selection Criteria for Employee Performance Grouping with Fuzzy C-Means.

Abstract. The development of information technology makes it easier for companies to do many things and affects company operations. One of the factors affecting a company's development is its employees. Employees' performance can be observed from their discipline, honesty, cooperation, and work quality. The purpose of this study is to group employees by performance using fuzzy c-means. Two kinds of clustering are explained in this paper: clustering with feature weighting and clustering with feature selection. With feature weights of 25%, 30%, 25%, and 20% for work discipline, honesty, cooperation, and work quality, respectively, clustering with feature weighting gives an accuracy of 0.8462. With feature selection, fuzzy c-means gives an accuracy of 1, with work discipline and honesty as the critical features. From this comparison, we find that honesty is the most essential feature for clustering employees by performance.

Keywords: clustering, employees, fuzzy c-means, feature weighting, feature selection

Abstrak (translated from Indonesian). The development of information technology makes it easier for companies to do many things and affects company operations. One of the factors affecting company operations is employee performance. Employee performance is assessed on four criteria: discipline, honesty, cooperation, and work quality. The aim of this study is to group employees with fuzzy c-means. Two kinds of grouping are carried out: grouping with criteria weighting and grouping with criteria selection. With weights of 25%, 30%, 25%, and 20% for discipline, honesty, cooperation, and work quality, grouping with criteria weighting yields an accuracy of 0.8462. FCM grouping with criteria selection finds that discipline and honesty are the two important criteria for grouping employees, with an accuracy of 1. Comparing the two kinds of grouping shows that honesty is the most important criterion for grouping employees by performance.

Kata Kunci (keywords): grouping, employees, fuzzy c-means, criteria weighting, criteria selection
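The abstract reports clustering accuracy but does not say how cluster indices were matched to the known employee groups. One common approach, sketched here as an assumption, is to try every one-to-one mapping from clusters to classes and keep the best score. The function name and the toy labels are hypothetical, and the brute-force search is only practical for a handful of clusters.

```python
from itertools import permutations


def clustering_accuracy(true_labels, cluster_labels):
    """Best accuracy over one-to-one mappings of clusters to classes.

    Assumes the number of clusters does not exceed the number of classes.
    """
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    best = 0
    for assigned in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, assigned))
        hits = sum(1 for t, c in zip(true_labels, cluster_labels)
                   if mapping[c] == t)
        best = max(best, hits)
    return best / len(true_labels)


# Toy example with made-up labels: one of five items is misplaced.
truth = ["good", "good", "fair", "fair", "poor"]
found = [1, 1, 0, 1, 2]
print(clustering_accuracy(truth, found))  # 0.8
```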


2021 ◽  
Vol 22 (S6) ◽  
Author(s):  
Rui-Yi Li ◽  
Jihong Guan ◽  
Shuigeng Zhou

Background: The rapid development of single-cell RNA sequencing (scRNA-seq) enables the exploration of cell heterogeneity, which is usually done by clustering scRNA-seq data. The essence of scRNA-seq data clustering is to group cells by measuring the similarities among the genes/transcripts of cells. The selection of features for evaluating cell similarity is therefore of great importance, and it significantly affects clustering effectiveness and efficiency. Results: In this paper, we propose a novel method called CaFew to select genes based on cluster-aware feature weighting. By optimizing the clustering objective function, CaFew obtains a feature weight matrix, which is then used for feature selection. Genes that have large weights in at least one cluster, or whose weights vary greatly across clusters, are selected. Experiments on 8 real scRNA-seq datasets show that CaFew clearly improves the clustering performance of existing scRNA-seq data clustering methods. In particular, the combination of CaFew with SC3 achieves state-of-the-art performance. Furthermore, CaFew also benefits the visualization of scRNA-seq data. Conclusion: CaFew is an effective scRNA-seq data clustering method owing to its gene selection mechanism based on cluster-aware feature weighting, and it is a useful tool for scRNA-seq data analysis.
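The selection rule stated in the Results can be sketched as follows, assuming a cluster-by-gene weight matrix is already available. The two thresholds, the use of population variance as the measure of variation, and the function name are assumptions for illustration. The abstract does not say how CaFew quantifies "large" or "vary greatly".

```python
from statistics import pvariance


def select_features(weight_matrix, max_threshold, var_threshold):
    """Return indices of features kept by the two-part rule.

    weight_matrix[k][j] is the weight of feature j in cluster k.
    A feature is kept if its largest per-cluster weight reaches
    max_threshold, or if its weights vary across clusters by at
    least var_threshold (population variance).
    """
    n_features = len(weight_matrix[0])
    selected = []
    for j in range(n_features):
        column = [row[j] for row in weight_matrix]
        if max(column) >= max_threshold or pvariance(column) >= var_threshold:
            selected.append(j)
    return selected


# Two clusters, three genes (made-up weights).
weights = [
    [0.7, 0.1, 0.2],
    [0.1, 0.1, 0.8],
]
selected = select_features(weights, max_threshold=0.5, var_threshold=0.05)
print(selected)  # [0, 2]
```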


2004 ◽  
Vol 25 (10) ◽  
pp. 1123-1132 ◽  
Author(s):  
Xizhao Wang ◽  
Yadong Wang ◽  
Lijuan Wang

2021 ◽  
Vol 5 (4) ◽  
pp. 415
Author(s):  
Yessica Nataliani

The fuzzy c-means (FCM) clustering algorithm is one of the best-known clustering methods, alongside k-means and hierarchical clustering. Because FCM treats all data features as equally important, it may produce a poor clustering result. To solve this problem, feature selection with feature weighting is needed. Besides feature selection that only assigns feature weights, there is also feature selection that assigns feature weights and eliminates the unrelated feature(s). The feature-reduction FCM (FRFCM) clustering algorithm can improve the FCM clustering result by weighting the features and discarding the unrelated feature(s) during the clustering process. Basketball is a popular sport, both internationally and nationally. A basketball team fields five players, each in a different position. A player can generally play in the guard, forward, or center position, and these three general positions call for different physical characteristics. In this paper, FRFCM is used to select the relevant physical feature(s) of basketball players, chosen from height, weight, age, and body mass index, to determine the players' positions. The results show that FRFCM can be applied to determine basketball players' positions and that the most relevant physical feature is the player's height. FRFCM assigns one player to an incorrect position, for an error rate of 0.0435. By comparison, FCM assigns five players to incorrect positions, for an error rate of 0.2174. This method can help a coach decide a new basketball player's position.
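The two error rates can be checked with a short calculation. The abstract does not state the number of players; 23 is inferred here because it is the count at which one and five errors give exactly the reported rates, so treat it as an assumption.

```python
def error_rate(n_incorrect, n_total):
    """Fraction of players assigned to the wrong position."""
    return n_incorrect / n_total


n_players = 23  # inferred from the reported rates, not stated in the abstract

frfcm_rate = round(error_rate(1, n_players), 4)
fcm_rate = round(error_rate(5, n_players), 4)
print(frfcm_rate)  # 0.0435
print(fcm_rate)    # 0.2174
```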


Author(s):  
Yufika Sari Bagi ◽  
Suprapto Suprapto

Retrieval is one of the stages in a case-based reasoning system. It finds a solution to a new problem or case by measuring the similarity between the new case and the old cases in the case base. Some similarity measurement techniques involve feature weights, which indicate the importance of each feature in a case. Feature weights can be obtained from a domain expert or from a feature weighting method, either locally or globally. Gradient descent is a feature weighting method that computes a global weight for each feature. This research implemented gradient descent to obtain feature weights in case-based reasoning for hepatitis diagnosis, with similarity measured by weighted Euclidean distance. Four splits of the data into case base and test data were used: 50% case base and 50% test data in the first variation, 60% and 40% in the second, 70% and 30% in the third, and 80% and 20% in the fourth. For each variation, four scenarios determined which records were marked as test data: the end of the data in the first scenario, the beginning in the second, half at the beginning and half at the end in the third, and the middle in the fourth. The system reached 100% accuracy in scenario 1 of variation 4. Averaged over all four variations and four scenarios, accuracy was 77.55%, recall was 69.74%, and precision was 78.39%. Accuracy was also influenced by the size of the case base and by the scenario used to select cases for it, because the more cases the case base holds, the better the system's chance of finding a similar case.
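The retrieval step with weighted Euclidean distance can be sketched as below. This is a minimal nearest-case lookup under assumed names and toy data. It does not reproduce the study's gradient-descent weight learning or its hepatitis features.

```python
import math


def weighted_euclidean(a, b, weights):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def retrieve(case_base, query, weights):
    """Return the stored (features, solution) pair closest to the query."""
    return min(case_base,
               key=lambda case: weighted_euclidean(case[0], query, weights))


# Hypothetical case base: (feature vector, diagnosis label).
case_base = [
    ((1.0, 0.0), "case A"),
    ((0.0, 1.0), "case B"),
]
query = (0.9, 0.9)

# The retrieved case depends on which feature carries the weight.
by_first = retrieve(case_base, query, weights=(1.0, 0.0))[1]
by_second = retrieve(case_base, query, weights=(0.0, 1.0))[1]
print(by_first, by_second)  # case A case B
```

The same query retrieves different cases under different weightings, which is why the choice of global weights matters for retrieval accuracy.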


2020 ◽  
Vol 8 (1) ◽  
pp. 84-90
Author(s):  
R. Lalchhanhima ◽  
Debdatta Kandar ◽  
R. Chawngsangpuii ◽  
Vanlalmuansangi Khenglawt ◽  
...  

Fuzzy C-Means (FCM) is an unsupervised algorithm for the automatic clustering of data. Segmentation of Synthetic Aperture Radar images has been a challenging task because of the presence of speckle noise. The segmentation process therefore cannot rely directly on intensity information alone but must consider several derived features in order to obtain satisfactory results. This paper attempts to use the fuzzy nature of classification for unsupervised region segmentation, for which FCM is employed. Different features are obtained by filtering the image with different spatial filters and are selected as segmentation criteria. Segmentation performance is assessed by comparing its accuracy with that of various recently proposed state-of-the-art techniques.
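The idea of clustering on filter-derived features rather than raw intensity can be sketched as below. The abstract does not name its spatial filters, so the 3x3 mean filter, the two-element feature vector, and the function names are assumptions. The resulting per-pixel vectors would then be passed to an FCM routine.

```python
def mean_filter_3x3(image):
    """Smooth a 2D list of intensities with a 3x3 mean filter.

    Border pixels average only the neighbours that exist.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [image[rr][cc]
                      for rr in range(max(0, r - 1), min(rows, r + 2))
                      for cc in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = sum(window) / len(window)
    return out


def pixel_features(image):
    """Build one (intensity, local mean) feature vector per pixel."""
    smoothed = mean_filter_3x3(image)
    return [(image[r][c], smoothed[r][c])
            for r in range(len(image))
            for c in range(len(image[0]))]


# Tiny image with one bright, speckle-like pixel.
image = [[0, 0],
         [0, 4]]
features = pixel_features(image)
print(features)
```

The local mean damps the isolated bright pixel, so a clusterer that sees both features is less likely to treat speckle as its own region.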


2021 ◽  
Vol 11 (4) ◽  
pp. 1728
Author(s):  
Hua Zhong ◽  
Li Xu

The prediction interval (PI) is an important research topic in reliability analyses and decision support systems. Data size and computation costs are two issues that may hamper the construction of PIs. This paper proposes an all-batch (AB) loss function for constructing high-quality PIs. Taking full advantage of the likelihood principle, the proposed loss makes it possible to train PI generation models using the gradient descent (GD) method for both small and large batches of samples. Using a structure of dual feedforward neural networks (FNNs), a high-quality PI generation framework is introduced that can be adapted to a variety of problems, including regression analysis. Numerical experiments were conducted on benchmark datasets, and the results show that the proposed scheme achieves higher-quality PIs. Its reliability and stability were also verified in comparison with various state-of-the-art PI construction methods.
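PI quality is commonly summarized by how often the interval contains the target and how wide it is. The abstract does not say which measures the paper uses, so the two below are assumptions, as are the function name and the toy numbers. The AB loss itself is not reproduced here.

```python
def coverage_and_width(lower, upper, targets):
    """Return (coverage fraction, mean interval width) for a set of PIs."""
    covered = sum(1 for lo, hi, y in zip(lower, upper, targets)
                  if lo <= y <= hi)
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(targets)
    return covered / len(targets), mean_width


# Made-up intervals and observations: the second target falls outside.
lower = [0.0, 1.0, 2.0, 3.0]
upper = [2.0, 3.0, 4.0, 5.0]
targets = [1.0, 5.0, 3.0, 4.0]

coverage, width = coverage_and_width(lower, upper, targets)
print(coverage, width)  # 0.75 2.0
```

A good interval keeps coverage near the nominal level while keeping the width small, and a PI loss has to balance these two goals.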

