An Improved Similarity-Based Clustering Algorithm for Multi-Database Mining

Entropy ◽  
2021 ◽  
Vol 23 (5) ◽  
pp. 553
Author(s):  
Salim Miloudi ◽  
Yulin Wang ◽  
Wenjia Ding

Clustering algorithms for multi-database mining (MDM) rely on computing $(n^2-n)/2$ pairwise similarities between $n$ databases to generate and evaluate $m\in[1,(n^2-n)/2]$ candidate clusterings, from which they select the partitioning that optimizes a predefined goodness measure. However, when these pairwise similarities are distributed around their mean value, the clustering algorithm becomes indecisive about which database pairs are eligible to be grouped together. Consequently, it produces a trivial result, either putting all $n$ databases in one cluster or returning $n$ singleton clusters. To tackle this problem, we propose a learning algorithm that reduces the fuzziness of the similarity matrix by minimizing a weighted binary entropy loss function via gradient descent and back-propagation. As a result, the learned model improves the certainty of the clustering algorithm, enabling it to correctly identify the optimal database clusters. Additionally, in contrast to gradient-based clustering algorithms, which are sensitive to the choice of learning rate and require more iterations to converge, we propose a learning-rate-free algorithm that assesses the candidate clusterings generated on the fly in fewer, upper-bounded iterations. To achieve this, we use coordinate descent (CD) and back-propagation to search for the optimal clustering of the $n$ databases in a way that minimizes a convex clustering quality measure $L(\theta)$ in fewer than $(n^2-n)/2$ iterations. By using a max-heap data structure within our CD algorithm, we optimally choose the largest weight variable $\theta_{p,q}^{(i)}$ at each iteration $i$, such that taking the partial derivative of $L(\theta)$ with respect to $\theta_{p,q}^{(i)}$ yields the next steepest-descent step minimizing $L(\theta)$ without using a learning rate. Through a series of experiments on multiple database samples, we show that our algorithm outperforms the existing clustering algorithms for MDM.
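To make the pair-ordering idea concrete, here is a minimal Python sketch. It is not the authors' CD algorithm and learns no weights: it pops database pairs from a max-heap in decreasing similarity (Python's `heapq` is a min-heap, so similarities are negated), merges clusters with union-find, and scores every new candidate clustering. The `quality` function is a hypothetical stand-in (mean intra-cluster minus mean inter-cluster similarity), not the paper's $L(\theta)$. The loop performs at most $(n^2-n)/2$ pops, and at most $n-1$ of them change the partition.

```python
import heapq
from itertools import combinations


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def quality(sim, labels):
    """Hypothetical goodness measure (NOT the paper's L(theta)):
    mean intra-cluster similarity minus mean inter-cluster similarity."""
    intra, inter = [], []
    for p, q in combinations(range(len(labels)), 2):
        (intra if labels[p] == labels[q] else inter).append(sim[p][q])
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return mean(intra) - mean(inter)


def heap_clustering(sim):
    """Visit the (n^2 - n)/2 database pairs in decreasing similarity and
    keep the best-scoring candidate clustering seen along the way."""
    n = len(sim)
    heap = [(-sim[p][q], p, q) for p, q in combinations(range(n), 2)]  # max-heap via negation
    heapq.heapify(heap)
    parent = list(range(n))
    best_labels = list(range(n))  # start from n singleton clusters
    best_score = quality(sim, best_labels)
    while heap:
        _, p, q = heapq.heappop(heap)
        rp, rq = find(parent, p), find(parent, q)
        if rp == rq:
            continue  # already grouped: no new candidate clustering
        parent[rq] = rp
        labels = [find(parent, x) for x in range(n)]
        score = quality(sim, labels)
        if score > best_score:
            best_score, best_labels = score, labels
    return best_labels, best_score


# Hypothetical similarity matrix for four databases forming two groups.
sim = [[1.0, 0.9, 0.1, 0.1],
       [0.9, 1.0, 0.1, 0.1],
       [0.1, 0.1, 1.0, 0.8],
       [0.1, 0.1, 0.8, 1.0]]
labels, score = heap_clustering(sim)
```

On this toy matrix the sketch groups databases 0 and 1 together and databases 2 and 3 together, rather than returning one cluster or four singletons.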


2015 ◽  
Vol 13 (1-2) ◽  
pp. 10-24
Author(s):  
Ieroham Baruch ◽  
Edmundo P. Reynaud

In this work, a Recursive Levenberg-Marquardt learning algorithm in the complex domain is developed and applied to the training of two adaptive control schemes composed of Complex-Valued Recurrent Neural Networks. Furthermore, we apply the identification scheme and both control schemes to a particular case of a nonlinear, oscillatory mechanical plant to validate the performance of the adaptive neural controller and the learning algorithm. The comparative simulation results show that the newly proposed Complex-Valued Recursive Levenberg-Marquardt learning algorithm outperforms the gradient-based recursive Back-propagation algorithm.
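For readers unfamiliar with Levenberg-Marquardt, the following Python/NumPy sketch shows the standard batch, real-valued update $\Delta w = -(J^\top J + \mu I)^{-1} J^\top r$ with an adaptive damping factor $\mu$. It illustrates only the underlying technique: the paper's algorithm is recursive and operates on complex-valued recurrent networks, which this sketch does not attempt. The exponential model and all parameter values are hypothetical.

```python
import numpy as np


def levenberg_marquardt(residual, jacobian, w, mu=1e-2, iters=100):
    """Minimise sum(residual(w)**2) with the batch Levenberg-Marquardt method."""
    w = np.asarray(w, dtype=float)
    for _ in range(iters):
        r = residual(w)
        J = jacobian(w)
        # Damped Gauss-Newton step: (J^T J + mu I) dw = -J^T r
        A = J.T @ J + mu * np.eye(w.size)
        dw = np.linalg.solve(A, -J.T @ r)
        w_new = w + dw
        if np.sum(residual(w_new) ** 2) < np.sum(r ** 2):
            w, mu = w_new, mu * 0.1  # success: behave more like Gauss-Newton
        else:
            mu *= 10.0               # failure: behave more like gradient descent
    return w


# Hypothetical example: fit y = a * exp(b * x) to noise-free data.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(0.5 * x)


def model_residual(w):
    return w[0] * np.exp(w[1] * x) - y


def model_jacobian(w):
    return np.column_stack([np.exp(w[1] * x), w[0] * x * np.exp(w[1] * x)])


w_fit = levenberg_marquardt(model_residual, model_jacobian, [1.0, 0.1])
```

The damping factor is what lets LM interpolate between gradient descent (large $\mu$) and Gauss-Newton (small $\mu$), which is the usual argument for its faster convergence compared with plain gradient-based back-propagation.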


2011 ◽  
Vol 48-49 ◽  
pp. 753-756
Author(s):  
Xin Quan Chen

To address the shortcomings of the Affinity Propagation (AP) algorithm, we present two extended and improved AP algorithms. The AP algorithm based on Grid Cells (APGC) is an effective extension of AP at the level of grid cells, and the AP clustering algorithm based on Near-neighbour Sampling (APNS) aims to reduce time and space complexity. Simulated comparison experiments on the three algorithms show that APGC and APNS clearly improve on AP in time and space complexity. They not only achieve good clustering quality on massive data sets but also filter out noise and isolated points well. We therefore consider them two effective clustering algorithms with good application prospects. Finally, several research directions are presented.
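As background, here is a minimal NumPy sketch of standard affinity propagation (Frey and Dueck's responsibility/availability message passing), not of APGC or APNS. Each iteration updates two dense $n \times n$ message matrices, which is the time and space cost the two extensions target. The damping value, iteration count, and median preference are illustrative assumptions.

```python
import numpy as np


def affinity_propagation(S, damping=0.5, iters=200):
    """Standard affinity propagation on a similarity matrix S.
    The diagonal of S holds the preferences (larger => more exemplars)."""
    n = S.shape[0]
    rows = np.arange(n)
    R = np.zeros((n, n))
    A = np.zeros((n, n))
    for _ in range(iters):
        # Responsibilities: r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        best = np.argmax(AS, axis=1)
        first = AS[rows, best]
        AS[rows, best] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, best] = S[rows, best] - second
        R = damping * R + (1 - damping) * R_new
        # Availabilities: a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[rows, rows].copy()  # a(k,k) = sum_{i' != k} max(0, r(i',k))
        A_new = np.minimum(A_new, 0)
        A_new[rows, rows] = diag
        A = damping * A + (1 - damping) * A_new
    exemplars = np.flatnonzero(np.diag(A + R) > 0)
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = np.argmax(S[:, exemplars], axis=1)
    labels[exemplars] = np.arange(exemplars.size)
    return exemplars, labels


# Hypothetical 1-D data: two well-separated groups of three points.
X = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
S = -(X[:, None] - X[None, :]) ** 2        # negative squared distance
np.fill_diagonal(S, np.median(S))          # shared preference
exemplars, labels = affinity_propagation(S)
```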


2020 ◽  
pp. 1-11
Author(s):  
Yufeng Li ◽  
HaiTian Jiang ◽  
Jiyong Lu ◽  
Xiaozhong Li ◽  
Zhiwei Sun ◽  
...  

Many classical clustering algorithms have been fitted into MapReduce, which provides a novel solution for clustering big data. However, most of these algorithms require several iterations to reach an acceptable result, and each iteration must execute a new MapReduce job that loads the dataset into main memory, which results in high I/O overhead and poor efficiency. The BIRCH algorithm clusters big data by storing only statistical information about the objects, in clustering feature (CF) entries organized in a CF tree; however, as the number of tree nodes grows, main memory becomes insufficient to hold more objects. BIRCH then has to reduce the tree, which degrades the clustering quality and slows down the whole execution. To deal with this problem, this paper fits BIRCH into MapReduce as MR-BIRCH. In contrast to many MapReduce-based algorithms, MR-BIRCH loads the dataset only once and processes it in parallel on several machines. The complexity and scalability of MR-BIRCH were analyzed to evaluate its quality, and MR-BIRCH was compared with Python sklearn BIRCH and Apache Mahout k-means on real-world and synthetic datasets. Experimental results show that MR-BIRCH was, most of the time, better than or equal to sklearn BIRCH, and that it was competitive with Mahout k-means.
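The reason BIRCH summaries suit a single pass is CF additivity: a CF entry is the triple $(N, LS, SS)$ (count, linear sum, sum of squared norms), and the CF of two disjoint point sets is the sum of their CFs. The Python sketch below shows this property. The two-chunk "map/reduce" split at the end is a hypothetical illustration, not a description of how MR-BIRCH actually partitions or merges its work.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class CF:
    """BIRCH clustering feature: (N, LS, SS)."""
    n: int          # number of points summarised
    ls: np.ndarray  # linear sum of the points
    ss: float       # sum of squared norms of the points

    @classmethod
    def from_points(cls, X):
        X = np.asarray(X, dtype=float)
        return cls(len(X), X.sum(axis=0), float((X ** 2).sum()))

    def __add__(self, other):
        # CF additivity: the CF of a union of disjoint sets is the sum of their CFs.
        return CF(self.n + other.n, self.ls + other.ls, self.ss + other.ss)

    def centroid(self):
        return self.ls / self.n

    def radius(self):
        # Root-mean-square distance of the points to the centroid.
        c = self.centroid()
        return float(np.sqrt(max(self.ss / self.n - c @ c, 0.0)))


# Hypothetical split: each "mapper" summarises its own chunk, and a "reducer"
# adds the partial CFs without revisiting the raw points.
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
merged = CF.from_points(X[:2]) + CF.from_points(X[2:])
```

Here `merged.centroid()` is `[1, 1]` and `merged.radius()` is $\sqrt{2}$, exactly what a CF built from all four points would give.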


2014 ◽  
Vol 501-504 ◽  
pp. 391-394
Author(s):  
Yi Ming Xiang ◽  
Xue Yan Liu ◽  
Gui Xiang Ling ◽  
Bin Du

An adaptive neuro-fuzzy inference system (ANFIS) model has been developed to predict frost heaving in seasonally frozen regions. The structure of the ANFIS is initialized by the subtractive clustering algorithm. A hybrid learning algorithm consisting of back-propagation and least-squares estimation is used to adjust the parameters of the ANFIS and automatically produce fuzzy rules. Frost heaving test data obtained from the literature are used to train and check the system. The predicted results show that the proposed model outperforms the back-propagation neural network (BPNN) in terms of computational speed, forecast error, and efficiency. The ANFIS-based model proves to be an effective approach that achieves both high accuracy and low computational complexity in predicting frost heaving.
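Subtractive clustering picks cluster centres by density "potential", and each centre would typically seed one fuzzy rule of the initial ANFIS. The NumPy sketch below is a simplified version: it uses $r_b = 1.5\,r_a$ (a common choice) and a single rejection-ratio stopping rule instead of Chiu's full accept/reject criterion. The radius, ratio, and data are illustrative assumptions; in practice the inputs are usually normalised first.

```python
import numpy as np


def subtractive_clustering(X, ra=0.5, reject_ratio=0.15):
    """Simplified subtractive clustering returning cluster centres.

    ra is the neighbourhood radius; the stopping rule uses a single
    rejection ratio rather than the full accept/reject criterion."""
    X = np.asarray(X, dtype=float)
    alpha = 4.0 / ra ** 2
    beta = 4.0 / (1.5 * ra) ** 2           # rb = 1.5 * ra
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    P = np.exp(-alpha * d2).sum(axis=1)    # potential (density measure) of each point
    first = P.max()
    centres = []
    while True:
        k = int(np.argmax(P))
        if P[k] < reject_ratio * first:
            break
        centres.append(X[k])
        # Subtract the new centre's influence so nearby points are unlikely to be chosen next.
        P = P - P[k] * np.exp(-beta * d2[k])
    return np.array(centres)


# Hypothetical 1-D data with two tight groups.
X = np.array([[0.0], [0.05], [0.1], [5.0], [5.05], [5.1]])
centres = subtractive_clustering(X, ra=0.5)
```

The loop always terminates: the chosen point's potential drops to zero and no potential ever increases, so the set of candidates above the threshold shrinks every iteration.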


2016 ◽  
Vol 43 (2) ◽  
pp. 275-292 ◽  
Author(s):  
Aytug Onan ◽  
Hasan Bulut ◽  
Serdar Korukoglu

Document clustering can be applied to document organisation and browsing, document summarisation, and classification. Identifying an appropriate representation for textual documents is extremely important for the performance of clustering and classification algorithms. Textual documents suffer from high dimensionality and irrelevant text features. Moreover, conventional clustering algorithms suffer from several shortcomings, such as slow convergence and sensitivity to initial values. To tackle these problems, metaheuristic algorithms are frequently applied to clustering. In this paper, an improved ant clustering algorithm is presented, in which two novel heuristic methods are proposed to enhance the clustering quality of ant-based clustering. In addition, latent Dirichlet allocation (LDA) is used to represent textual documents in a compact and efficient way. The clustering quality of the proposed ant clustering algorithm is compared with that of conventional clustering algorithms on 25 text benchmarks in terms of F-measure values. The experimental results indicate that the proposed clustering scheme outperforms the compared conventional and metaheuristic clustering methods for textual documents.
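For context, classic ant-based clustering moves items on a grid using probabilistic pick-up and drop rules. The sketch below shows the Lumer-Faieta neighbourhood similarity $f$ combined with the Deneubourg-style probabilities $p_{pick} = (k_1/(k_1+f))^2$ and $p_{drop} = (f/(k_2+f))^2$. It is background only, not the paper's two novel heuristics. The values of `alpha`, `s2`, `k1`, and `k2` are illustrative assumptions.

```python
import numpy as np


def local_density(item, neighbours, alpha=0.5, s2=9):
    """Lumer-Faieta style similarity of an item to the items in its s x s grid
    neighbourhood (s2 = number of cells), floored at zero."""
    neighbours = np.asarray(neighbours, dtype=float)
    if neighbours.size == 0:
        return 0.0
    d = np.linalg.norm(neighbours - np.asarray(item, dtype=float), axis=1)
    return max(0.0, float(np.sum(1.0 - d / alpha)) / s2)


def p_pick(f, k1=0.1):
    """An unladen ant is likely to pick up an item that is dissimilar to its neighbours."""
    return (k1 / (k1 + f)) ** 2


def p_drop(f, k2=0.15):
    """A laden ant is likely to drop its item among similar items."""
    return (f / (k2 + f)) ** 2


# An item surrounded by eight identical items versus one among dissimilar items.
f_similar = local_density([1.0, 0.0], [[1.0, 0.0]] * 8)
f_dissimilar = local_density([0.0, 0.0], [[5.0, 5.0]])
```

With similar neighbours `f_similar` is 8/9, so dropping is far more likely than picking up; with dissimilar neighbours the similarity floors at 0, so the item is always picked up and never dropped there.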


2013 ◽  
Vol 765-767 ◽  
pp. 580-584
Author(s):  
Yu Yang ◽  
Cheng Gui Zhao

Large-scale spectral clustering inevitably faces computational-time and memory-use problems, because it is both compute-intensive and data-intensive. We analyse the time complexity of constructing the similarity matrix, performing the eigendecomposition, and running k-means, and we exploit the SPMD parallel structure supported by the MATLAB Parallel Computing Toolbox (PCT) to reduce the eigendecomposition time. We propose using MATLAB Distributed Computing Server to construct the similarity matrix in parallel, while using a t-nearest-neighbors approach to reduce memory use. Finally, we report clustering time, clustering quality, and clustering accuracy in the experiments.
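The memory saving of the t-nearest-neighbors approach comes from storing only $t$ similarities per point instead of a dense $n \times n$ matrix. The sketch below is in Python/NumPy rather than MATLAB and builds such a sparse graph one row at a time. The Gaussian kernel width `sigma` and the "keep an edge if either endpoint selects it" symmetrisation are assumptions, one common convention among several.

```python
import numpy as np


def tnn_similarity(X, t=3, sigma=1.0):
    """Sparse, symmetric t-nearest-neighbour similarity graph as {(i, j): w} with i < j.
    At most n * t edges are stored instead of n^2 matrix entries."""
    X = np.asarray(X, dtype=float)
    W = {}
    for i in range(len(X)):
        d2 = ((X - X[i]) ** 2).sum(axis=1)    # one row at a time: O(n) working memory
        d2[i] = np.inf                        # exclude the point itself
        nn = np.argpartition(d2, t)[:t]       # the t nearest neighbours (unordered); needs t < n
        for j in map(int, nn):
            w = float(np.exp(-d2[j] / (2.0 * sigma ** 2)))  # Gaussian similarity
            key = (i, j) if i < j else (j, i)
            W[key] = max(W.get(key, 0.0), w)  # symmetrise: keep the edge if either end selects it
    return W


X = np.arange(10.0).reshape(-1, 1)
W = tnn_similarity(X, t=3)
```

The resulting edge dictionary can then be turned into a sparse Laplacian for the eigendecomposition step.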


Author(s):  
Md. Zakir Hossain ◽  
Md. Jakirul Islam ◽  
Md. Waliur Rahman Miah ◽  
Jahid Hasan Rony ◽  
Momotaz Begum

The amount of data has been increasing exponentially in every sector, such as banking and securities, healthcare, education, manufacturing, consumer trade, transportation, and energy. Most of these data are noisy, vary in shape, and contain outliers. In such cases, it is challenging to find the desired data clusters using conventional clustering algorithms. DBSCAN is a popular clustering algorithm that is widely used for noisy, arbitrarily shaped data with outliers. However, its performance depends heavily on the proper selection of the cluster radius (Eps) and the minimum number of points (MinPts) required to form a cluster in the given dataset. In real-world clustering problems, it is difficult to select exact values of Eps and MinPts to cluster unknown datasets. To address this, this paper proposes a dynamic DBSCAN algorithm that calculates suitable values of Eps and MinPts dynamically, thereby improving the clustering quality for the given problem. This paper evaluates the performance of the dynamic DBSCAN algorithm on seven challenging datasets. The experimental results confirm the effectiveness of the dynamic DBSCAN algorithm compared with well-known clustering algorithms.
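The abstract does not state how the dynamic values are computed, so the Python sketch below uses a stand-in heuristic: MinPts = 2 × dimensionality (a widely used rule of thumb) and Eps = the median distance to the (MinPts−1)-th nearest neighbour. `estimate_params` is hypothetical and is not the paper's method. The `dbscan` function is a plain $O(n^2)$ DBSCAN in which a point's neighbourhood includes the point itself.

```python
import numpy as np


def pairwise_distances(X):
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)


def estimate_params(X):
    """Hypothetical data-driven choice of (Eps, MinPts), NOT the paper's rule."""
    min_pts = 2 * X.shape[1]                  # rule of thumb: 2 * dimensionality
    D = np.sort(pairwise_distances(X), axis=1)
    k_dist = D[:, min_pts - 1]                # column 0 is the point itself
    return float(np.median(k_dist)), min_pts


def dbscan(X, eps, min_pts):
    """Plain DBSCAN; returns cluster labels, with -1 marking noise."""
    D = pairwise_distances(X)
    neighbours = [np.flatnonzero(row <= eps) for row in D]  # includes the point itself
    is_core = [len(nb) >= min_pts for nb in neighbours]
    labels = np.full(len(X), -1)
    cluster = 0
    for i in range(len(X)):
        if labels[i] != -1 or not is_core[i]:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:                          # expand the cluster from core points
            p = stack.pop()
            if not is_core[p]:
                continue                      # border points join but do not expand
            for q in neighbours[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels


# Two small square blobs and one far-away outlier.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11],
              [50, 50]], dtype=float)
eps, min_pts = estimate_params(X)
labels = dbscan(X, eps, min_pts)
```

Here the heuristic yields MinPts = 4 and Eps = √2, so each blob becomes one cluster and the outlier is labelled noise.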


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Tushar Jain

Purpose
The overall goal of this research is to develop algorithms for feature-based recognition of 2D parts from intensity images. Most present industrial vision systems are custom-designed systems that can handle only a specific application. This is not surprising, since different applications involve parts with different geometries and different reflectance properties.

Design/methodology/approach
Computer vision recognition has attracted the attention of researchers in many application areas and has been used to solve a wide range of problems. Object recognition is a type of pattern recognition and is widely used in the manufacturing industry for inspection. Machine vision techniques are being applied in areas ranging from medical imaging to remote sensing, from industrial inspection to document processing, and from nanotechnology to multimedia databases. In this work, the recognition of objects manufactured in the mechanical industry is considered. Mechanically manufactured parts are difficult to recognize because of manufacturing-process effects, including machine malfunction, tool wear, and variations in raw material. This paper considers the problem of recognizing and classifying such mechanical parts. Red, green and blue (RGB) images of five objects are used as input. The Fourier descriptor technique is used to recognize the objects, and an artificial neural network (ANN) is used to classify the five objects. The objects are placed in different orientations to test invariance to rotation, translation and scaling. A feed-forward neural network is trained with the back-propagation learning algorithm. This paper shows the effect of the network architecture and the number of hidden nodes on the classification accuracy, as well as the effect of the learning rate and momentum.

Findings
One important finding is that there is no considerable change in network performance after 500 iterations. It was found that, for these data, a smaller network structure and a smaller learning rate and momentum are required. The relative sample size also has a considerable effect on the performance of the classifier. Further studies suggest that classification accuracy is assessed with the confusion matrix of the data used. Hence, with these results, the proposed system can be used efficiently for more objects. Depending on the manufactured product and the process used, dimension verification and surface-roughness measurement may be integrated with the proposed technique to develop a comprehensive vision system. The proposed technique is also highly suitable for web inspections, which do not require dimension and roughness measurement and where a desired accuracy must be achieved at a given speed. In general, most recognition problems provide the identity of an object together with pose estimation. Therefore, the proposed recognition (pose estimation) approach may be integrated with the inspection stage.

Originality/value
This paper considers the problem of recognizing and classifying such mechanical parts. RGB images of five objects are used as input. The Fourier descriptor technique is used to recognize the objects, and an ANN is used to classify the five objects. The objects are placed in different orientations to test invariance to rotation, translation and scaling. A feed-forward neural network is trained with the back-propagation learning algorithm. This paper shows the effect of the network architecture and the number of hidden nodes on the classification accuracy, as well as the effect of the learning rate and momentum.
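The invariance that Fourier descriptors provide can be shown in a few lines of NumPy. In this sketch, boundary points are treated as complex numbers, the DC term (which carries translation) is dropped, magnitudes are kept (rotation and a shift of the starting point only change phases), and the result is divided by the largest magnitude (scale). This is one common normalisation variant, assumed here; the paper's exact descriptor set is not given.

```python
import numpy as np


def fourier_descriptors(contour, k=8):
    """Translation-, rotation-, scale- and start-point-invariant Fourier
    descriptors of a closed 2D contour given as an (N, 2) array of points."""
    pts = np.asarray(contour, dtype=float)
    z = pts[:, 0] + 1j * pts[:, 1]    # boundary points as complex numbers
    mags = np.abs(np.fft.fft(z))      # rotation and start-point shifts only change phases
    mags[0] = 0.0                     # drop the DC (centroid) term: translation invariance
    mags /= mags.max()                # normalise by the largest magnitude: scale invariance
    return mags[1:k + 1]


# A square contour sampled counter-clockwise, and a rotated, scaled,
# translated copy that also starts at a different boundary point.
square = np.array([[0, 0], [1, 0], [2, 0], [3, 0], [4, 0], [4, 1], [4, 2], [4, 3],
                   [4, 4], [3, 4], [2, 4], [1, 4], [0, 4], [0, 3], [0, 2], [0, 1]],
                  dtype=float)
theta = np.pi / 6
rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
moved = np.roll(2.0 * square @ rot.T + np.array([5.0, -3.0]), 3, axis=0)

d_original = fourier_descriptors(square)
d_moved = fourier_descriptors(moved)
```

Both descriptor vectors agree to floating-point precision, so a vector like this can serve as a pose-independent input to the ANN classifier.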


Author(s):  
Nazri Mohd Nawi ◽  
Faridah Hamzah ◽  
Norhamreeza Abdul Hamid ◽  
Muhammad Zubair Rehman ◽  
Mohammad Aamir ◽  
...  
