Adaptive Fuzzy Clustering for improving classification performance in yeast data set

Author(s):  
Man Sun Kim ◽  
Hyung Jeong Yang ◽  
Wooi Ping Cheah
Author(s):  
Türkan Erbay Dalkiliç ◽  
Seda Sağirkaya

In regression analysis, the data may follow different distributions, which requires going beyond classical analysis in the prediction process. In such cases, analysis methods based on fuzzy logic are preferred as alternatives. There are a couple of important steps in regression analysis based on fuzzy logic. One is identifying the clusters that make up the data set; the other is determining the membership degrees, which grade the contribution of the data contained in these clusters. In this study, parameter prediction based on type-2 fuzzy clustering is discussed. First, the type-1 fuzzy clustering problem is solved by the fuzzy c-means (FCM) method with the fuzzifier index equal to two. Then the fuzzifier index m is defined as an interval number, and the membership degrees to the sets are determined by the type-2 fuzzy clustering method. The membership degrees obtained from clustering based on type-1 and type-2 fuzzy logic are used as weights, and the parameters are predicted using these membership degrees, as determined by the proposed algorithm. Finally, the prediction results of the type-1 and type-2 fuzzy clustering approaches are compared using an error criterion based on the difference between the observed and predicted values.
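
The membership step described in this abstract can be illustrated with a minimal sketch. It assumes the standard FCM membership formula for a fuzzifier m > 1 and, for the interval case, one common construction in which memberships are computed at both ends of the fuzzifier interval and the smaller and larger values are kept as lower and upper bounds. The function names and the interval endpoints (`m_low`, `m_high`) are illustrative assumptions; the abstract does not state the authors' exact formulas.

```python
import math

def fcm_memberships(points, centers, m=2.0):
    """Return u[k][i], the membership of point k in cluster i, for fuzzifier m > 1."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [math.dist(x, c) for c in centers]
        if any(d == 0.0 for d in dists):
            # The point coincides with one or more centers: share full membership among them.
            zeros = [d == 0.0 for d in dists]
            row = [1.0 / sum(zeros) if z else 0.0 for z in zeros]
        else:
            # Standard FCM update: u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1)).
            row = [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]
        memberships.append(row)
    return memberships

def interval_memberships(points, centers, m_low=1.5, m_high=2.5):
    """Assumed interval construction: bound each membership by its values at m_low and m_high."""
    u_a = fcm_memberships(points, centers, m_low)
    u_b = fcm_memberships(points, centers, m_high)
    lower = [[min(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(u_a, u_b)]
    upper = [[max(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(u_a, u_b)]
    return lower, upper
```

In a weighted regression, each row of the membership matrix could then serve as the per-cluster weights for the corresponding observation.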


MATEMATIKA ◽  
2020 ◽  
Vol 36 (1) ◽  
pp. 43-49
Author(s):  
T Dwi Ary Widhianingsih ◽  
Heri Kuswanto ◽  
Dedy Dwi Prastyo

Logistic regression is one of the commonly used classification methods. It has some advantages, specifically related to hypothesis testing and its objective function. However, it also has some disadvantages in the case of high-dimensional data, such as multicollinearity, over-fitting, and a high computational burden. Ensemble-based classification methods have been proposed to overcome these problems. The logistic regression ensemble (LORENS) method is expected to improve the classification performance of basic logistic regression. In this paper, we apply it to the case of drug discovery with the objective of obtaining candidate compounds to protect the normal non-cancerous cells, which is considered to be a problem with a data set of high dimensionality. The experimental results show that it performs well, with an accuracy of 69% and AUC of 0.7306.
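
The ensemble idea in this abstract can be sketched as follows. This is a simplified illustration, not the LORENS procedure itself: it assumes the features are randomly partitioned into disjoint subsets, one logistic regression is fitted per subset by plain gradient descent, and the predicted probabilities are averaged. All function names, the learning rate, and the epoch count are assumptions made for the example.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, row):
    # w[0] is the bias; w[1:] are the feature weights.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], row)))

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for row, target in zip(X, y):
            err = predict_proba(w, row) - target
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def fit_partition_ensemble(X, y, n_parts, seed=0):
    """Randomly partition the feature indices and fit one model per subset."""
    if not 1 <= n_parts <= len(X[0]):
        raise ValueError("n_parts must be between 1 and the number of features")
    rng = random.Random(seed)
    idx = list(range(len(X[0])))
    rng.shuffle(idx)
    parts = [idx[i::n_parts] for i in range(n_parts)]
    models = []
    for cols in parts:
        X_sub = [[row[j] for j in cols] for row in X]
        models.append((cols, fit_logistic(X_sub, y)))
    return models

def ensemble_proba(models, row):
    """Average the predicted probabilities of the per-subset models."""
    probs = [predict_proba(w, [row[j] for j in cols]) for cols, w in models]
    return sum(probs) / len(probs)
```

Because each model sees only a fraction of the features, every individual fit is low-dimensional, which is the property that makes this kind of ensemble attractive when the number of features is large.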


2016 ◽  
Vol 2016 ◽  
pp. 1-9
Author(s):  
Abbas Akkasi ◽  
Ekrem Varoğlu ◽  
Nazife Dimililer

Named Entity Recognition (NER) from text constitutes the first step in many text mining applications. The most important preliminary step for NER systems using machine learning approaches is tokenization, where raw text is segmented into tokens. This study proposes an enhanced rule-based tokenizer, ChemTok, which utilizes rules extracted mainly from the training data set. The main novelty of ChemTok is the use of the extracted rules to merge the tokens split in the previous steps, thus producing longer and more discriminative tokens. ChemTok is compared to the tokenization methods utilized by ChemSpot and tmChem. Support Vector Machines and Conditional Random Fields are employed as the learning algorithms. The experimental results show that the classifiers trained on the output of ChemTok outperform all classifiers trained on the output of the other two tokenizers in terms of classification performance and the number of incorrectly segmented entities.


2013 ◽  
Vol 760-762 ◽  
pp. 2220-2223
Author(s):  
Lang Guo

In view of the defects of the K-means algorithm in intrusion detection (the need to preassign the number of clusters, sensitivity to the initial centers, and a tendency to fall into local optima), this paper puts forward a fuzzy clustering algorithm. Fuzzy rules are used to express the intrusion features, and a standardized matrix is adopted for further processing so as to reflect the degree of approximation or correlation between the intrusion indicator data and to establish a similarity matrix. Simulation results on the KDD CUP 1999 data set show that the algorithm achieves better intrusion detection and can effectively detect network intrusion data.
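
The standardization and similarity-matrix steps mentioned in this abstract can be sketched as below. The abstract does not say which standardization or similarity measure the paper uses, so the sketch assumes range (min-max) standardization and a similarity of one minus the mean absolute difference; both choices, and the function names, are illustrative only.

```python
def range_standardize(data):
    """Rescale each column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*data))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        [(v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans)]
        for row in data
    ]

def similarity_matrix(std):
    """Assumed measure: r[i][j] = 1 - mean absolute difference of standardized rows."""
    n, p = len(std), len(std[0])
    return [
        [1.0 - sum(abs(a - b) for a, b in zip(std[i], std[j])) / p for j in range(n)]
        for i in range(n)
    ]
```

With standardized inputs, every entry lies in [0, 1], the diagonal equals 1, and the matrix is symmetric, so it can be read as a fuzzy similarity relation between records.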


Author(s):  
Mashhour H. Baeshen ◽  
Malcolm J. Beynon ◽  
Kate L. Daunt

This chapter presents a study of the development of clustering methodology for data analysis, with particular attention to the move from a crisp environment to a fuzzy environment. An applied problem concerning the service quality (using SERVQUAL) perceived by mobile phone users, and their subsequent loyalty and satisfaction, forms the data set used to demonstrate the clustering issue. Following details on both the crisp k-means and fuzzy c-means clustering techniques, comparable results from their analysis are shown, on a subset of the data, to enable both graphical and statistical elucidation. Fuzzy c-means is then employed on the full SERVQUAL dimensions, and the established results are interpreted before being tested on external variables, namely the level of loyalty and satisfaction across the different clusters established.


Kybernetes ◽  
2019 ◽  
Vol 48 (9) ◽  
pp. 2006-2029
Author(s):  
Hongshan Xiao ◽  
Yu Wang

Purpose: Feature space heterogeneity exists widely in various application fields of classification techniques, such as customs inspection decisions, credit scoring and medical diagnosis. This paper aims to study the relationship between feature space heterogeneity and classification performance.
Design/methodology/approach: A measurement is first developed for measuring and identifying any significant heterogeneity that exists in the feature space of a data set. The main idea of this measurement is derived from meta-analysis. For a data set with significant feature space heterogeneity, a classification algorithm based on factor analysis and clustering is proposed to learn the data patterns, which, in turn, are used for data classification.
Findings: The proposed approach has two main advantages over previous methods. The first lies in the feature transform using orthogonal factor analysis, which results in new features without redundancy and irrelevance. The second rests on partitioning the samples to capture the feature space heterogeneity reflected by differences in factor scores. The validity and effectiveness of the proposed approach are verified on a number of benchmark data sets.
Research limitations/implications: The measurement should be used to guide the heterogeneity elimination process, which is an interesting topic for future research. In addition, developing a classification algorithm that enables scalable and incremental learning for large data sets with significant feature space heterogeneity is also an important issue.
Practical implications: Measuring and eliminating the feature space heterogeneity that may exist in the data are important for accurate classification. This study provides a systematic approach to feature space heterogeneity measurement and elimination for better classification performance, which is favorable for applications of classification techniques in real-world problems.
Originality/value: A measurement based on meta-analysis for measuring and identifying any significant feature space heterogeneity in a classification problem is developed, and an ensemble classification framework is proposed to deal with the feature space heterogeneity and improve the classification accuracy.


2019 ◽  
Vol 27 (9) ◽  
pp. 1779-1792
Author(s):  
Kaijie Xu ◽  
Witold Pedrycz ◽  
Zhiwu Li ◽  
Weike Nie
