A Novel Consensus Fuzzy K-Modes Clustering Using Coupling DNA-Chain-Hypergraph P System for Categorical Data

In this paper, a data clustering method named consensus fuzzy k-modes clustering is proposed to improve clustering performance on categorical data. In addition, a coupling DNA-chain-hypergraph P system is constructed to carry out the clustering process. This P system can prevent the clustering algorithm from falling into a local optimum and realizes the clustering process with implicit parallelism. The consensus fuzzy k-modes algorithm combines the advantages of the fuzzy k-modes algorithm, the weight fuzzy k-modes algorithm and the genetic fuzzy k-modes algorithm. The fuzzy k-modes algorithm produces a soft partition, which is closer to reality, but treats all variables equally. The weight fuzzy k-modes algorithm introduces a weight vector that strengthens basic k-modes clustering by associating higher weights with the features that are useful in analysis. These two methods only improve the k-modes algorithm itself. Therefore, the genetic k-modes algorithm was proposed, which uses genetic operations in the clustering process. In this paper, we examine these three kinds of k-modes algorithms and further introduce DNA genetic optimization operations in the final consensus process. Finally, we conduct experiments on seven UCI datasets and compare the clustering results with those of four other categorical clustering algorithms. The experimental results and statistical test results show that our method obtains better clustering results than the compared clustering algorithms.
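The soft partition mentioned in the abstract can be illustrated with a minimal sketch of the membership update commonly used in fuzzy k-modes. This is not the paper's consensus method or its P system; the function names, the fuzziness exponent `alpha`, and the equal split between coinciding modes are assumptions made for illustration.

```python
def matching_dissimilarity(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def fuzzy_memberships(record, modes, alpha=1.5):
    """Membership degree of one record in each cluster (hypothetical helper).

    alpha > 1 is the fuzziness exponent; values closer to 1 give a
    harder partition.
    """
    dists = [matching_dissimilarity(record, m) for m in modes]
    # Assumption: a record that coincides with one or more modes is
    # shared equally among those modes and excluded from the rest.
    zeros = [i for i, d in enumerate(dists) if d == 0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(modes))]
    exponent = 1.0 / (alpha - 1.0)
    # Standard fuzzy update: closer modes receive larger memberships,
    # and the memberships of one record sum to 1.
    return [1.0 / sum((d_l / d_h) ** exponent for d_h in dists)
            for d_l in dists]


modes = [("a", "b", "x"), ("x", "y", "z")]
weights = fuzzy_memberships(("a", "b", "c"), modes, alpha=2.0)
```

With `alpha=2.0` the record differs from the first mode on one attribute and from the second on three, so it belongs mostly, but not exclusively, to the first cluster.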


Improved minimum-minimum roughness algorithm for clustering categorical data

International Journal of ADVANCED AND APPLIED SCIENCES ◽

10.21833/ijaas.2021.10.006 ◽

2021 ◽

Vol 8 (10) ◽

pp. 43-50

Author(s):

Truong et al. ◽

Keyword(s):

Machine Learning ◽

Data Mining ◽

Hierarchical Clustering ◽

Categorical Data ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Experimental Results ◽

Data Sets ◽

Top Down ◽

Hierarchical Clustering Algorithm

Clustering is a fundamental technique in data mining and machine learning. Recently, many researchers have become interested in the problem of clustering categorical data, and several new approaches have been proposed. One of the successful and pioneering clustering algorithms is the Minimum-Minimum Roughness algorithm (MMR), which is a top-down hierarchical clustering algorithm and can handle the uncertainty in clustering categorical data. However, MMR tends to choose the category with less value leaf node with more objects, leading to undesirable clustering results. To overcome such shortcomings, this paper proposes an improved version of the MMR algorithm for clustering categorical data, called IMMR (Improved Minimum-Minimum Roughness). Experimental results on actual data sets taken from UCI show that the IMMR algorithm outperforms MMR in clustering categorical data.


A Modified Overlapping Partitioning Clustering Algorithm for Categorical Data Clustering

Bulletin of Electrical Engineering and Informatics ◽

10.11591/eei.v7i1.896 ◽

2018 ◽

Vol 7 (1) ◽

pp. 55-62

Author(s):

Mohammad Alaqtash ◽

Moayad A.Fadhil ◽

Ali F. Al-Azzawi

Keyword(s):

Categorical Data ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Numerical Data ◽

Data Representation ◽

The Past ◽

Textual Data ◽

Traditional Algorithm ◽

Clustering Problems ◽

Categorical Data Clustering

Clustering enables the grouping of unlabeled data by partitioning data into clusters with similar patterns. Over the past decades, many clustering algorithms have been developed for various clustering problems. An overlapping partitioning clustering (OPC) algorithm can only handle numerical data. Hence, novel clustering algorithms have been studied extensively to overcome this issue. By increasing the number of objects belonging to one cluster and the distance between cluster centers, the study aimed to cluster the textual data type without losing the main functions. The study was carried out over the twenty newsgroups dataset, which consists of approximately 20000 textual documents. By introducing some modifications to the traditional algorithm, clusters with an acceptable level of homogeneity and completeness were generated. Modifications were made to the pre-processing phase and the data representation, along with a number of methods that influence the primary function of the algorithm. Subsequently, the results were evaluated and compared with those of the k-means algorithm on the training and test datasets. The results indicated that the modified algorithm could successfully handle the categorical data and produce satisfactory clusters.


A Spectral Clustering Algorithm Improved by P Systems

International Journal of Computers Communications & Control ◽

10.15837/ijccc.2018.5.3238 ◽

2018 ◽

Vol 13 (5) ◽

pp. 759-771 ◽

Cited By ~ 1

Author(s):

Guangchun Chen ◽

Juan Hu ◽

Hong Peng ◽

Jun Wang ◽

Xiangnian Huang

Keyword(s):

Spectral Clustering ◽

Clustering Algorithm ◽

Membrane Computing ◽

Clustering Algorithms ◽

P System ◽

Comparison Results ◽

Clustering Effect ◽

Computing Framework ◽

Spectral Clustering Algorithm ◽

Selection Of

It is difficult for the spectral clustering algorithm to find clusters when the dataset has large differences in density, and its clustering effect depends on the selection of initial centers. To overcome these shortcomings, we propose a novel spectral clustering algorithm based on the membrane computing framework, called the MSC algorithm, whose idea is to use a membrane clustering algorithm to realize the clustering component in spectral clustering. A tissue-like P system is used as its computing framework, where each object in the cells denotes a set of cluster centers and the velocity-location model is used as the evolution rules. Under the control of the evolution-communication mechanism, the tissue-like P system can obtain a good clustering partition for each dataset. The proposed spectral clustering algorithm is evaluated on three artificial datasets and ten UCI datasets, and it is further compared with classical spectral clustering algorithms. The comparison results demonstrate the advantage of the proposed spectral clustering algorithm.


P System–Based Clustering Methods Using NoSQL Databases

Computation ◽

10.3390/computation9100102 ◽

2021 ◽

Vol 9 (10) ◽

pp. 102

Author(s):

Péter Lehotay-Kéry ◽

Tamás Tarczali ◽

Attila Kiss

Keyword(s):

Management System ◽

Database Management ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Database Systems ◽

Database Management System ◽

P Systems ◽

Main Element ◽

P System ◽

Clustering Methods

Models of computation are fundamental notions in computer science; consequently, they have been the subject of countless research papers, with numerous novel models proposed even in recent years. Amongst a multitude of different approaches, many of these methods draw inspiration from the biological processes observed in nature. P systems, or membrane systems, make an analogy between the communication in computing and the flow of information that can be perceived in living organisms. These systems serve as a basis for various concepts, ranging from the fields of computational economics and robotics to the techniques of data clustering. In this paper, one such use of these systems, membrane system–based clustering, is brought into focus. Considering the growing amount of data stored worldwide, more and more data have to be handled by clustering algorithms too. To address this issue, bringing these methods closer to the data, their main element, provides several benefits. Database systems equip their users with, for instance, well-integrated security features and more direct control over the data itself. We consider the case where the type of the database management system is given, e.g., NoSQL, but the corporation or the research team can choose which specific database management system is used. Our goal is to give a perspective on how algorithms written in this way behave in such an environment, so that a more substantiated decision can be made about which database management system should be connected to the system. For this purpose, we explore the possibilities of a clustering algorithm based on P systems when used alongside NoSQL database systems, which are designed to manage big data. Variants over two competing databases, MongoDB and Redis, are evaluated and compared to identify the advantages and limitations of using such a solution in these systems.


Assessment Means Management Software

Vestnik NSU Series Information Technologies ◽

10.25205/1818-7900-2020-18-2-5-14 ◽

2020 ◽

Vol 18 (2) ◽

pp. 5-14

Author(s):

Kirill V. Batalin ◽

Gulnara E. Yakhyaeva

Keyword(s):

Categorical Data ◽

Clustering Algorithm ◽

Educational Process ◽

Assessment Tools ◽

Program System ◽

Numerical Attributes ◽

Automatic Mode ◽

Categorical Clustering ◽

The One ◽

Small Clusters

The article describes a program system for managing assessment tools for the educational process. The system reduces the time that the teacher spends on the preparation of test materials by allowing a set of assessment documents to be created in a semi-automatic mode. The system implements the ability to create and manage task banks. Each task in the bank has a set of attributes that are involved in the process of generating assessment documents. The system also implements the ability to automatically generate the required number of variants of the assessment documents. At the same time, the algorithm for generating a set of assessment documents works in such a way that, on the one hand, one set includes the variants of assessment documents that are most similar in structure, and on the other hand, each assessment document in the set is unique. The algorithm for generating a set of assessment documents is based on a clustering algorithm for categorical data. In this paper, a modification of the CLOPE algorithm is presented that automatically determines the required number of clusters, depending on the input data. This modification also solves the problem of small clusters and the problem of categorical clustering of numerical attributes. The paper also describes an iterative algorithm for the generation of a set of assessment documents.


Clustering Categorical Data with k-Modes

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch040 ◽

2011 ◽

pp. 246-250 ◽

Cited By ~ 2

Author(s):

Joshua Zhexue Huang

Keyword(s):

Real World ◽

Categorical Data ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Chemical Information ◽

New Techniques ◽

Small Set ◽

Categorical Attributes ◽

Categorical Attribute ◽

Numeric Data

A lot of data in real world databases are categorical. For example, gender, profession, position, and hobby of customers are usually defined as categorical attributes in the CUSTOMER table. Each categorical attribute is represented with a small set of unique categorical values such as {Female, Male} for the gender attribute. Unlike numeric data, categorical values are discrete and unordered. Therefore, the clustering algorithms for numeric data cannot be used to cluster the categorical data that exist in many real world applications. In data mining research, much effort has been put into the development of new techniques for clustering categorical data (Huang, 1997b; Huang, 1998; Gibson, Kleinberg, & Raghavan, 1998; Ganti, Gehrke, & Ramakrishnan, 1999; Guha, Rastogi, & Shim, 1999; Chaturvedi, Green, Carroll, & Foods, 2001; Barbara, Li, & Couto, 2002; Andritsos, Tsaparas, Miller, & Sevcik, 2003; Li, Ma, & Ogihara, 2004; Chen, & Liu, 2005; Parmar, Wu, & Blackhurst, 2007). The k-modes clustering algorithm (Huang, 1997b; Huang, 1998) is one of the first algorithms for clustering large categorical data. In the past decade, this algorithm has been well studied and widely used in various applications. It is also adopted in commercial software (e.g., Daylight Chemical Information Systems, Inc, http://www.daylight.com/).
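Because categorical values are discrete and unordered, k-modes replaces the distance and the mean used for numeric data with a mismatch count and an attribute-wise mode. The sketch below illustrates these two building blocks and one assignment step; the function names and the sample records are hypothetical, and it is not a full implementation of the published algorithm.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def mode_of(records):
    """Return the most frequent value of each attribute across records."""
    return tuple(Counter(column).most_common(1)[0][0]
                 for column in zip(*records))


def assign(records, modes):
    """Return, for each record, the index of its least dissimilar mode."""
    return [min(range(len(modes)),
                key=lambda k: matching_dissimilarity(record, modes[k]))
            for record in records]


# Hypothetical rows of a CUSTOMER table: (gender, profession).
customers = [("Female", "Engineer"), ("Female", "Doctor"), ("Male", "Engineer")]
center = mode_of(customers)
labels = assign(customers, [("Female", "Doctor"), ("Male", "Engineer")])
```

A complete k-modes run would alternate `assign` and `mode_of` until the assignments stop changing, in the same way k-means alternates assignment and mean updates.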


A P system for hierarchical clustering

International Journal of Modern Physics C ◽

10.1142/s0129183119500621 ◽

2019 ◽

Vol 30 (08) ◽

pp. 1950062

Author(s):

Ping Guo ◽

Wenjie Jiang ◽

Yuchi Liu

Keyword(s):

Parallel Computation ◽

Hierarchical Clustering ◽

Clustering Algorithm ◽

Membrane Computing ◽

Clustering Algorithms ◽

P System ◽

A Cell ◽

Hierarchical Clustering Algorithm

Membrane computing, also known as the P system, is a distributed and parallel computing framework. Hierarchical clustering is one of the most basic and widely applied clustering algorithms. In this paper, the combination of membrane computing and the hierarchical clustering algorithm is studied. A cell-like hierarchical clustering P system with priority evolution rules and promoters is designed by using the maximum parallelism of membrane computing. The feasibility and effectiveness of the designed P system are verified by examples.
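For readers unfamiliar with hierarchical clustering, the following is a minimal sequential sketch of bottom-up (agglomerative) clustering. It does not model the P system, its priority rules, or its parallelism; the single-linkage criterion, the Euclidean distance, and all names are assumptions chosen for illustration, since the abstract does not specify them.

```python
def single_linkage(points, k):
    """Merge the two closest clusters until only k clusters remain.

    Clusters are lists of indices into `points` (hypothetical layout).
    """
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        # Euclidean distance between two points, given by index.
        return sum((x - y) ** 2 for x, y in zip(points[a], points[b])) ** 0.5

    def linkage(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(dist(a, b) for a in c1 for b in c2)

    while len(clusters) > k:
        pairs = ((i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


result = single_linkage([(0, 0), (0, 1), (5, 5), (5, 6)], k=2)
```

Each pass evaluates every cluster pair one after another, which is the step a maximally parallel model would aim to perform simultaneously.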


Intelligent energy optimization for advanced IoT analytics edge computing on wireless sensor networks

International Journal of Distributed Sensor Networks ◽

10.1177/1550147720908772 ◽

2020 ◽

Vol 16 (7) ◽

pp. 155014772090877

Author(s):

Israel Edem Agbehadji ◽

Samuel Ofori Frimpong ◽

Richard C Millham ◽

Simon James Fong ◽

Jason J Jung

Keyword(s):

Wireless Sensor Networks ◽

Sensor Networks ◽

Sensor Network ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Energy Optimization ◽

Base Station ◽

Wireless Sensor ◽

Optimal Time ◽

Local Optimum

The current dispensation of big data analytics requires innovative ways of data capturing and transmission. One of the innovative approaches is the use of a sensor device. However, the challenge with a sensor network is how to balance the energy load of wireless sensor networks, which can be achieved by selecting sensor nodes with an adequate amount of energy from a cluster. The clustering technique is one of the approaches to solve this challenge because it optimizes energy in order to increase the lifetime of the sensor network. In this article, a novel bio-inspired clustering algorithm was proposed for a heterogeneous energy environment. The proposed algorithm (referred to as DEEC-KSA) was integrated with a distributed energy-efficient clustering algorithm to ensure efficient energy optimization and was evaluated through simulation and compared with benchmarked clustering algorithms. During the simulation, the dynamic nature of the proposed DEEC-KSA was observed using different parameters, which were expressed in percentages as 0.1%, 4.5%, 11.3%, and 34% while the percentage of the parameter for comparative algorithms was 10%. The simulation result showed that the performance of DEEC-KSA is efficient among the comparative clustering algorithms for energy optimization in terms of stability period, network lifetime, and network throughput. In addition, the proposed DEEC-KSA has the optimal time (in seconds) to send a higher number of packets to the base station successfully. The advantage of the proposed bio-inspired technique is that it utilizes random encircling and half-life period to quickly adapt to different rounds of iteration and jumps out of any local optimum that might not lead to an ideal cluster formation and better network performance.


Hymenopteran Colony Stream Clustering Algorithm and Comparison with Particle Swarm Optimization and Genetic Optimization Clustering

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2021.9402 ◽

2021 ◽

Vol 18 (4) ◽

pp. 1336-1341

Author(s):

Nikhil Parafe ◽

M. Venkatesan ◽

Prabhavathy Panner

Keyword(s):

Particle Swarm Optimization ◽

Clustering Algorithm ◽

Applied Mathematics ◽

Clustering Algorithms ◽

Particle Swarm ◽

Random Access ◽

Genetic Optimization ◽

Data Set ◽

Swarm Optimization ◽

Stream Clustering

A stream is an endlessly arriving sequence of information; streamed information is unbounded, and each item can often be examined only once. Streamed information is often noisy, and the number of clusters in the information and their statistical properties can change over time, while random access to the information is not possible and storing all the arriving information is impractical. When applying data set processing techniques, and specifically stream clustering algorithms, to real-time information streams, limitations in execution time and memory have to be considered carefully. The proposed hymenopteran colony stream clustering algorithm is a clustering algorithm that forms clusters according to density variation, in which high-density feature regions are separated from low-density feature regions, with mounted movement of hymenopterans. Results show that it creates denser clusters than the previously proposed algorithm. With mounted movement of ants, it also decreases the loss of data points. A modified radius formula for the cluster is also proposed in order to increase the performance of the model and make it more dynamic with a continuous flow of information. We also changed the probability formula for pick-up and drop to reduce outliers. Results from hymenopteran experiments also showed that sorting is carried out in two phases, a primary clustering episode followed by a spacing phase. In this paper, we have also compared the proposed algorithm with particle swarm optimization and genetic optimization using DBSCAN and k-means clustering.


A Novel Clustering Algorithm Inspired by Membrane Computing

The Scientific World JOURNAL ◽

10.1155/2015/929471 ◽

2015 ◽

Vol 2015 ◽

pp. 1-8 ◽

Cited By ~ 6

Author(s):

Hong Peng ◽

Xiaohui Luo ◽

Zhisheng Gao ◽

Jun Wang ◽

Zheng Pei

Keyword(s):

Parallel Computing ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Real Life ◽

P System ◽

Data Sets ◽

Evolutionary Clustering ◽

Distributed Parallel Computing ◽

Real Life Data ◽

Neighborhood Topology

P systems are a class of distributed parallel computing models; this paper presents a novel clustering algorithm, called the membrane clustering algorithm, which is inspired by the mechanism of a tissue-like P system with a loop structure of cells. The objects of the cells express the candidate centers of clusters and are evolved by the evolution rules. Based on the loop membrane structure, the communication rules realize a local neighborhood topology, which helps the coevolution of the objects and improves the diversity of objects in the system. The tissue-like P system can effectively search for the optimal partitioning with the help of its parallel computing advantage. The proposed clustering algorithm is evaluated on four artificial data sets and six real-life data sets. Experimental results show that the proposed clustering algorithm is superior or competitive to the k-means algorithm and several evolutionary clustering algorithms recently reported in the literature.
