Enhanced K-Means Clustering Algorithm Using Collaborative Filtering Approach

2017 ◽  
Vol 10 (2) ◽  
pp. 474-479
Author(s):  
Ankush Saklecha ◽  
Jagdish Raikwal

Clustering is a well-known unsupervised learning method in which a set of elements is separated into homogeneous groups. K-means is one of the most popular partition-based clustering algorithms. In the original K-means, however, the quality of the resulting clusters depends largely on the selection of the initial centroids; a poor selection increases the number of iterations and the running time, which makes the algorithm computationally expensive. Many methods have been proposed for improving the accuracy, performance, and efficiency of the K-means clustering algorithm. This paper proposes an enhanced K-means clustering approach, combined with a collaborative filtering approach, to recommend quality content to users. This research would help users who otherwise have to scroll through pages of results to find important content.
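
The dependence on initial centroids mentioned in this abstract can be illustrated with a small sketch. The code below is plain Lloyd's K-means, not the enhanced method the paper proposes (the abstract does not describe it); the function names `kmeans` and `sse` and the four-point data set are assumptions chosen for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's K-means. Returns (centroids, labels, iterations used)."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:  # no assignment changed: converged
            return centroids, labels, iteration
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:  # leave an empty cluster's centroid where it is
                centroids[k] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels, max_iter


def sse(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))


# Hypothetical data: the corners of a 4 x 1 rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# One seed per short side: converges to the left/right split.
good_c, good_l, _ = kmeans(points, [(0, 0), (4, 0)])
good_sse = sse(points, good_c, good_l)

# Both seeds on the same short side: converges to a top/bottom split.
bad_c, bad_l, _ = kmeans(points, [(0, 0), (0, 1)])
bad_sse = sse(points, bad_c, bad_l)

print(good_sse, bad_sse)
```

On this toy data the second seeding settles into a stable but much worse partition, which is the kind of sensitivity that initialization strategies aim to reduce.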

Author(s):  
R. R. Gharieb ◽  
G. Gendy ◽  
H. Selim

In this paper, the standard hard C-means (HCM) clustering approach to image segmentation is modified by incorporating weighted membership Kullback–Leibler (KL) divergence and local data information into the HCM objective function. The membership KL divergence, used for fuzzification, measures the proximity between each cluster membership function of a pixel and the locally-smoothed value of the membership in the pixel vicinity. The fuzzification weight is a function of the pixel to cluster-center distances. The pixel to cluster-center distance used is composed of the original pixel data distance plus a fraction of the distance generated from the locally-smoothed pixel data. It is shown that the obtained membership function of a pixel is proportional to the locally-smoothed membership function of this pixel multiplied by an exponentially distributed function of the negative of the pixel distance relative to the minimum distance provided by the nearest cluster-center to the pixel. Therefore, because it incorporates the locally-smoothed membership and data information in addition to the relative distance, which is more tolerant to additive noise than the absolute distance, the proposed algorithm has a threefold noise-handling process. The presented algorithm, named local data and membership KL divergence based fuzzy C-means (LDMKLFCM), is tested on synthetic and real-world noisy images and its results are compared with those of several FCM-based clustering algorithms.


2011 ◽  
pp. 24-32 ◽  
Author(s):  
Nicoleta Rogovschi ◽  
Mustapha Lebbah ◽  
Younès Bennani

Most traditional clustering algorithms are limited to handling data sets that contain either continuous or categorical variables. However, data sets with mixed types of variables are common in the data mining field. In this paper we introduce a weighted self-organizing map for the clustering, analysis, and visualization of mixed data (continuous/binary). The weights and prototypes are learned simultaneously, ensuring an optimized data clustering. The higher the weight of a variable, the more the clustering algorithm takes into account the information carried by that variable. The learning of these topological maps is combined with a weighting process for the different variables, computing weights that influence the quality of the clustering. We illustrate the power of this method with data sets taken from a public data set repository: a handwritten digit data set, the Zoo data set, and three other mixed data sets. The results show a good quality of topological ordering and homogeneous clustering.


2021 ◽  
Vol 27 (7) ◽  
pp. 667-692
Author(s):  
Lamia Berkani ◽  
Lylia Betit ◽  
Louiza Belarif

Clustering-based approaches have been demonstrated to be efficient and scalable to large-scale data sets. However, clustering-based recommender systems suffer from relatively low accuracy and coverage. To address these issues, we propose in this article an optimized multiview clustering approach for the recommendation of items in social networks. First, the selection of the initial medoids is optimized using the Bees Swarm optimization algorithm (BSO) in order to generate better partitions (i.e. refining the quality of medoids according to the objective function). Then, the multiview clustering (MV) is applied, where users are iteratively clustered from the views of both rating patterns and social information (i.e. friendships and trust). Finally, a framework is proposed for testing the different alternatives, namely: (1) the standard recommendation algorithms; (2) the clustering-based and the optimized clustering-based recommendation algorithms using BSO; and (3) the MV and the optimized MV (BSO-MV) algorithms. Experimental results conducted on two real-world datasets demonstrate the effectiveness of the proposed BSO-MV algorithm in terms of improving accuracy, as it outperforms the existing related approaches and baselines.


2014 ◽  
Vol 2014 ◽  
pp. 1-11 ◽  
Author(s):  
Lopamudra Dey ◽  
Sanjay Chakraborty

The significance and application of clustering are spread over various fields. Clustering is an unsupervised process in data mining, which is why the proper evaluation of the results and the measurement of the compactness and separability of the clusters are important issues. The procedure of evaluating the results of a clustering algorithm is known as cluster validity measure. Different types of indices are used to solve different types of problems, and the selection of indices depends on the kind of data available. This paper first proposes a Canonical PSO based K-means clustering algorithm, analyses some important clustering indices (intercluster, intracluster), and then evaluates the effects of those indices on a real-time air pollution database and on wholesale customer, wine, and vehicle datasets using typical K-means, Canonical PSO based K-means, simple PSO based K-means, DBSCAN, and hierarchical clustering algorithms. The paper also describes the nature of the clusters and finally compares the performance of these clustering algorithms according to the validity assessment. It also identifies which of these algorithms is most desirable for producing proper, compact clusters on these particular real-life datasets. It deals with the behaviour of these clustering algorithms with respect to validation indices and presents their evaluation results in mathematical and graphical forms.
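
As a rough illustration of the intracluster and intercluster indices named in this abstract, the sketch below computes one simple variant of each: the mean distance of points to their own cluster centroid (compactness) and the minimum distance between any two centroids (separability). The abstract does not give the paper's exact formulas, so these definitions, the function name `intra_inter`, and the sample data are assumptions.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(x) / len(members) for x in zip(*members))


def intra_inter(points, labels):
    """Return (intracluster, intercluster) distances for a labelled partition.

    intracluster: mean distance from each point to its cluster centroid.
    intercluster: smallest distance between two cluster centroids.
    Requires at least two distinct labels.
    """
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    centers = {l: centroid(m) for l, m in clusters.items()}

    intra = sum(dist(p, centers[l]) for p, l in zip(points, labels)) / len(points)

    keys = sorted(centers)
    inter = min(
        dist(centers[a], centers[b])
        for i, a in enumerate(keys)
        for b in keys[i + 1:]
    )
    return intra, inter


# Hypothetical partition: two tight pairs that are far apart.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
labels = [0, 0, 1, 1]
intra, inter = intra_inter(points, labels)
print(intra, inter)
```

A small intracluster value together with a large intercluster value indicates compact, well-separated clusters, so the two numbers are often combined into a single ratio when comparing algorithms.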


Author(s):  
Song Qin ◽  
Nenad Mijatovic ◽  
Jeffrey Fries ◽  
James Kiss

Designed for detecting train presence on tracks, track circuits must maintain a high level of availability for railway signaling systems. Due to the fail-safe nature of these critical devices, any failure will result in a declaration of occupancy in a section of track, which restricts train movements. It is possible to automatically diagnose and, in some cases, predict the failures of track circuits by performing analytics on the track signals. In order to perform these analytics, we need to study the coded signals transmitted to and received from the track. However, these signals consist of heterogeneous pulses that are noisy for data analysis. Thus, we need techniques that will automatically group homogeneous pulses together. In this paper, we present data cleansing techniques that cluster pulses based on digital analysis and machine learning. We report the results of our evaluation of clustering algorithms that improve the quality of analytic data. The data were captured under revenue service conditions operated by Alstom. As the clustering algorithm, we used k-means to cluster heterogeneous pulses. By tailoring the parameters for this algorithm, we can control the pulses of the cluster, allowing for further analysis of the track circuit signals in order to gain insight regarding their performance.


2016 ◽  
Vol 10 (04) ◽  
pp. 527-555
Author(s):  
Lubomir Stanchev

In this article, we examine an algorithm for document clustering using a similarity graph. The graph stores words and common phrases from the English language as nodes and it can be used to compute the degree of semantic similarity between any two phrases. One application of the similarity graph is semantic document clustering, that is, grouping documents based on the meaning of the words in them. Since our algorithm for semantic document clustering relies on multiple parameters, we examine how fine-tuning these values affects the quality of the result. Specifically, we use the Reuters-21578 benchmark, which contains [Formula: see text] newswire stories that are grouped in 82 categories using human judgment. We apply the k-means clustering algorithm to group the documents using a similarity metric that is based on keywords matching and one that uses the similarity graph. We evaluate the results of the clustering algorithms using multiple metrics, such as precision, recall, f-score, entropy, and purity.
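
Two of the evaluation metrics listed in this abstract, purity and entropy, can be sketched as follows. The article's exact definitions are not given here, so the code uses the common textbook forms: purity is the fraction of documents that belong to the majority category of their cluster, and entropy is the size-weighted average of each cluster's category entropy (base 2). The function name `purity_and_entropy` and the sample labels are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def purity_and_entropy(cluster_ids, true_labels):
    """Return (purity, entropy) of a clustering against reference categories."""
    # Count how many items of each true category fall in each cluster.
    by_cluster = defaultdict(Counter)
    for c, t in zip(cluster_ids, true_labels):
        by_cluster[c][t] += 1

    n = len(true_labels)
    # Purity: share of items that match their cluster's majority category.
    purity = sum(max(counts.values()) for counts in by_cluster.values()) / n

    # Entropy: per-cluster category entropy, weighted by cluster size.
    entropy = 0.0
    for counts in by_cluster.values():
        size = sum(counts.values())
        h = -sum((v / size) * log2(v / size) for v in counts.values())
        entropy += (size / n) * h
    return purity, entropy


# Hypothetical result: cluster 0 is half "a", half "b"; cluster 1 is all "b".
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1]
true_labels = ["a", "a", "b", "b", "b", "b", "b", "b"]
purity, entropy = purity_and_entropy(cluster_ids, true_labels)
print(purity, entropy)
```

Higher purity and lower entropy both indicate clusters that align more closely with the human-assigned categories.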


VLSI Design ◽  
2012 ◽  
Vol 2012 ◽  
pp. 1-14
Author(s):  
L. Rakai ◽  
A. Farshidi ◽  
L. Behjat ◽  
D. Westwick

Clustering algorithms have been used to improve the speed and quality of placement. Traditionally, clustering focuses on the local connections between cells. In this paper, a new clustering algorithm that is based on the estimated lengths of circuit interconnects and the connectivity is proposed. In the proposed algorithm, first an a priori length estimation technique is used to estimate the lengths of nets. Then, the estimated lengths are used in a clustering framework to modify a clustering technique based on algebraic multigrid (AMG) that finds the cells with the highest connectivity. Finally, based on the results from the AMG-based process, clusters are made. In addition, a new physical unclustering technique is proposed. The results show that a significant improvement in wire length, with reductions of up to 40%, can be achieved when using the proposed technique with three academic placers on industry-based circuits. Moreover, the runtime is not significantly degraded and can even be improved.


2018 ◽  
Vol 13 (5) ◽  
pp. 759-771 ◽  
Author(s):  
Guangchun Chen ◽  
Juan Hu ◽  
Hong Peng ◽  
Jun Wang ◽  
Xiangnian Huang

It is difficult for spectral clustering algorithms to find the clusters when a dataset has large differences in density, and their clustering effect depends on the selection of initial centers. To overcome these shortcomings, we propose a novel spectral clustering algorithm based on a membrane computing framework, called the MSC algorithm, whose idea is to use a membrane clustering algorithm to realize the clustering component in spectral clustering. A tissue-like P system is used as its computing framework, where each object in the cells denotes a set of cluster centers and a velocity-location model is used as the evolution rules. Under the control of the evolution-communication mechanism, the tissue-like P system can obtain a good clustering partition for each dataset. The proposed spectral clustering algorithm is evaluated on three artificial datasets and ten UCI datasets, and it is further compared with classical spectral clustering algorithms. The comparison results demonstrate the advantage of the proposed spectral clustering algorithm.


2019 ◽  
Vol 8 (4) ◽  
pp. 6036-6040

Data mining is a vital area of research that is applied in many different domains. It has become a highly demanding field because huge amounts of data have been collected in various applications. A database can be clustered in many ways depending on the clustering algorithm used, the parameter settings, and other factors. Multiple clustering algorithms can be combined to obtain the final partitioning of the data, which provides better clustering results. In this paper, an ensemble hybrid KMeans and DBSCAN (HDKA) algorithm is proposed to overcome the drawbacks of the DBSCAN and KMeans clustering algorithms. The proposed algorithm improves the selection of centroid points through its centroid selection strategy. For the experiments, we used two datasets, Colon and Leukemia, from the UCI machine learning repository.


Author(s):  
Elmustafa Sayed Ali Ahmed ◽  
Zahraa Tagelsir Mohammed ◽  
Mona Bakri Hassan ◽  
Rashid A. Saeed

Internet of vehicles (IoV) has recently emerged as a promising field of research owing to the increasing number of vehicles each day. It is a part of the internet of things (IoT) that deals with vehicle communications. Because vehicular nodes are considered to be always in motion, they cause frequent changes in the network topology. These changes cause issues in IoV such as scalability, dynamic topology changes, and shortest-path routing. In this chapter, the authors will discuss different optimization algorithms (i.e., clustering algorithms, ant colony optimization, best interface selection [BIS] algorithm, mobility adaptive density connected clustering algorithm, meta-heuristics algorithms, and quality of service [QoS]-based optimization). These algorithms play an important role in intelligently optimizing the operation of IoV networks and promise to enable new intelligent IoV applications.

