A Modified MinMax k-Means Algorithm Based on PSO

2016 ◽  
Vol 2016 ◽  
pp. 1-13 ◽  
Author(s):  
Xiaoyan Wang ◽  
Yanping Bai

The MinMax k-means algorithm is widely used to tackle the effect of bad initialization by minimizing the maximum intra-cluster error. Two parameters, the exponent parameter and the memory parameter, are involved in its execution. Since different parameter values lead to different clustering errors, choosing appropriate values is crucial. The original algorithm provides a practical framework that extends MinMax k-means to adapt the exponent parameter to the data set automatically. It has been believed that, once the maximum exponent parameter is set, the program can reach the lowest intra-cluster error. However, our experiments show that this is not always the case. In this paper, we modify the MinMax k-means algorithm with PSO to determine parameter values that allow the algorithm to attain the lowest clustering error. The proposed clustering method is tested on several popular data sets under different initializations and is compared with the k-means algorithm and the original MinMax k-means algorithm. The experimental results indicate that our proposed algorithm reaches the lowest clustering error automatically.
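
As a hedged illustration of the underlying objective (not the authors' PSO-modified version), a minimal NumPy sketch of MinMax k-means with exponent parameter p and memory parameter beta might look as follows; the function name and defaults are illustrative assumptions:

```python
import numpy as np

def minmax_kmeans(X, k, p=0.5, beta=0.3, n_iter=100, seed=0):
    """Sketch of MinMax k-means: cluster weights penalize large
    intra-cluster errors so the maximum error is driven down.
    p is the exponent parameter, beta the memory parameter."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    w = np.full(k, 1.0 / k)                      # cluster weights
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = (w ** p * d2).argmin(axis=1)    # weighted assignment
        E = np.array([d2[labels == j, j].sum() for j in range(k)])
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
        w_new = (E + 1e-12) ** (1.0 / (1.0 - p)) # larger error -> larger weight
        w = beta * w + (1.0 - beta) * w_new / w_new.sum()
    return labels, centers, E
```

In this reading, the paper's PSO component would search over (p, beta) pairs, scoring each candidate by the resulting clustering error.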

2015 ◽  
Vol 17 (5) ◽  
pp. 719-732
Author(s):  
Dulakshi Santhusitha Kumari Karunasingha ◽  
Shie-Yui Liong

A simple clustering method is proposed for extracting representative subsets from lengthy data sets. The main purpose of the extracted subset is to build prediction models (in the form of approximating functional relationships) instead of using the entire large data set. Such smaller subsets are often required in the exploratory stages of studies that involve resource-consuming investigations. A few recent studies have used a subtractive clustering method (SCM) for such data extraction, in the absence of clustering methods designed for function approximation. SCM, however, requires several parameters to be specified. This study proposes a clustering method that requires only a single parameter to be specified, yet is shown to be as effective as SCM. A method to find suitable values for this parameter is also proposed. Owing to its single parameter, the proposed clustering method is shown to be orders of magnitude more efficient to use than SCM. The effectiveness of the proposed method is demonstrated on phase-space prediction of three univariate time series and on prediction of two multivariate data sets. Some drawbacks of SCM when applied to data extraction are identified, and the proposed method is shown to resolve them.
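
The abstract does not reproduce the authors' exact procedure; the following leader-style sketch only conveys the general idea of subset extraction controlled by a single radius parameter r (an assumed stand-in for the paper's parameter):

```python
import numpy as np

def extract_representatives(X, r):
    """Single-parameter subset extraction (illustrative stand-in):
    keep a point only if it lies farther than r from every point
    already kept, so r alone controls the subset's size/coverage."""
    reps = [X[0]]
    for x in X[1:]:
        if min(np.linalg.norm(x - p) for p in reps) > r:
            reps.append(x)
    return np.array(reps)
```

A single pass like this costs O(n·m) for m representatives, which hints at why a one-parameter scheme can be far cheaper than SCM's potential-based iterations.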


Author(s):  
V. Suresh Babu ◽  
P. Viswanath ◽  
Narasimha M. Murty

Non-parametric methods like the nearest neighbor classifier (NNC) and Parzen-window based density estimation (Duda, Hart & Stork, 2000) are more general than parametric methods because they make no assumptions about the form of the probability distribution, and they show good performance in practice with large data sets. These methods, either explicitly or implicitly, estimate the probability density at a given point in feature space by counting the number of points that fall in a small region around it. Popular classifiers that use this approach are the NNC and its variants like the k-nearest neighbor classifier (k-NNC) (Duda, Hart & Stork, 2000), while DBSCAN is a popular density-based clustering method (Han & Kamber, 2001) that uses the same approach. The asymptotic error rate of NNC is less than twice the Bayes error (Cover & Hart, 1967), and DBSCAN can find arbitrarily shaped clusters along with detecting noisy outliers (Ester, Kriegel & Xu, 1996).

The most prominent difficulty in applying non-parametric methods to large data sets is their computational burden. The space and classification-time complexities of NNC and k-NNC are O(n), where n is the training set size, and the time complexity of DBSCAN is O(n^2), so these methods do not scale to large data sets. Some remedies to reduce this burden are as follows. (1) Reduce the training set size by editing techniques that eliminate training patterns which are redundant in some sense (Dasarathy, 1991); the condensed NNC (Hart, 1968) is of this type. (2) Use only a few selected prototypes from the data set; the Leaders-subleaders method and the l-DBSCAN method are of this type (Vijaya, Murthy & Subramanian, 2004; Viswanath & Rajwala, 2006). These two remedies reduce the computational burden, but they can also degrade the method's performance. Using enriched prototypes can improve the performance, as done in (Asharaf & Murthy, 2003), where the prototypes are derived using adaptive rough-fuzzy set theory, and in (Suresh Babu & Viswanath, 2007), where the prototypes are used along with their relative weights.

Prototypes can be derived by employing a clustering method like the leaders method (Spath, 1980) or the k-means method (Jain, Dubes & Chen, 1987), which finds a partition of the data set in which each block (cluster) is represented by a prototype called a leader, centroid, etc. But such prototypes cannot be used to estimate the probability density, since the density information present in the data set is lost while deriving them. The chapter proposes a modified leader clustering method, called the counted-leader method, which, along with deriving the leaders, preserves the crucial density information in the form of a count that can be used in estimating densities. The chapter presents a fast and efficient nearest-prototype classifier, the counted k-nearest leader classifier (ck-NLC), which is on par with the conventional k-NNC but considerably faster. The chapter also presents a density-based clustering method, l-DBSCAN, which is shown to be a faster and scalable version of DBSCAN (Viswanath & Rajwala, 2006). Formally, under some assumptions, it is shown that the number of leaders is upper-bounded by a constant that is independent of the data set size and of the distribution from which the data set is drawn.
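
A minimal sketch of the counted-leader idea and the count-weighted vote, reconstructed from the description above (the distance threshold tau and the assumption that each leader carries a class label are illustrative):

```python
import numpy as np
from collections import Counter

def counted_leaders(X, tau):
    """Leader clustering that also records how many points each
    leader absorbs; the counts preserve the density information
    that plain leader clustering discards."""
    leaders, counts = [X[0]], [1]
    for x in X[1:]:
        d = [np.linalg.norm(x - l) for l in leaders]
        j = int(np.argmin(d))
        if d[j] <= tau:
            counts[j] += 1                 # absorb point, keep its density
        else:
            leaders.append(x)
            counts.append(1)
    return np.array(leaders), np.array(counts)

def ck_nlc(x, leaders, counts, labels, k=5):
    """Counted k-nearest leader classifier (sketch): vote among the
    k nearest leaders, weighting each vote by its stored count."""
    d = np.linalg.norm(leaders - x, axis=1)
    votes = Counter()
    for j in np.argsort(d)[:k]:
        votes[labels[j]] += counts[j]
    return votes.most_common(1)[0][0]
```

Since classification touches only the leaders rather than all n training points, the cost drops from O(n) per query to O(m) for m leaders, which matches the chapter's bounded-leader-count argument.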


Robotics ◽  
2019 ◽  
Vol 8 (3) ◽  
pp. 58
Author(s):  
Yusuke Adachi ◽  
Masahide Ito ◽  
Tadashi Naruse

This paper addresses a strategy learning problem in the RoboCupSoccer Small Size League (SSL). We propose a novel method based on action sequences to cluster an opponent's strategies online. The proposed method consists of three steps: (1) extracting typical actions from geometric data to form action sequences, (2) calculating the dissimilarity between the sequences, and (3) clustering the sequences using that dissimilarity. Handling action sequences instead of raw geometric data reduces the amount of data used in the clustering process and makes actions easier to search. As a result, the proposed clustering method is feasible online and is also applicable to countering an opponent's strategy. The effectiveness of the proposed method was validated by experimental results.
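
The paper's specific dissimilarity measure is not given in this abstract; a common choice for symbolic action sequences is edit distance followed by hierarchical clustering on the precomputed dissimilarity matrix, sketched here with hypothetical action names:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def edit_distance(a, b):
    """Levenshtein distance between two action sequences."""
    m, n = len(a), len(b)
    D = np.zeros((m + 1, n + 1), dtype=int)
    D[:, 0] = np.arange(m + 1)
    D[0, :] = np.arange(n + 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i, j] = min(D[i - 1, j] + 1, D[i, j - 1] + 1,
                          D[i - 1, j - 1] + (a[i - 1] != b[j - 1]))
    return D[m, n]

# Hypothetical opponent action sequences extracted from geometric data.
seqs = [["pass", "shoot"], ["pass", "dribble", "shoot"], ["clear"]]
n = len(seqs)
d = np.array([[edit_distance(seqs[i], seqs[j]) for j in range(n)]
              for i in range(n)], dtype=float)

# Average-linkage clustering on the condensed dissimilarity matrix.
clusters = fcluster(linkage(squareform(d), method="average"),
                    t=2, criterion="maxclust")
```

Because each sequence is a handful of symbols rather than thousands of raw coordinates, the dissimilarity matrix stays small enough to update online between plays.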


2013 ◽  
Vol 3 (4) ◽  
pp. 1-14 ◽  
Author(s):  
S. Sampath ◽  
B. Ramya

Cluster analysis is a branch of data mining that plays a vital role in bringing out hidden information in databases. Clustering algorithms help medical researchers identify natural subgroups in a data set. Different types of clustering algorithms are available in the literature, the most popular among them being k-means. Although k-means clustering is widely used, its application requires knowledge of the number of clusters present in the given data set; several solutions are available in the literature to overcome this limitation. The k-means method also creates a disjoint and exhaustive partition of the data set, whereas in some situations one can come across objects that belong to more than one cluster. In this paper, a clustering algorithm is proposed that produces rough clusters automatically, without requiring the user to supply the number of clusters as input. The efficiency of the algorithm in detecting the number of clusters present in the data set has been studied on some real-life data sets. Further, a nonparametric statistical analysis of the experimental results has been carried out to assess the efficiency of the proposed algorithm in automatically detecting the number of clusters, with the help of a rough version of the Davies-Bouldin index.
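
Two ingredients of the description can be sketched generically: choosing the number of clusters by a Davies-Bouldin criterion, and allowing overlapping (rough) memberships. The sketch below uses the classical index and a simple distance-ratio rule, not the paper's rough-set formulation; the threshold eps is an assumption:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

def pick_k(X, k_max=10):
    """Choose k by minimizing the (classical) Davies-Bouldin index
    over candidate cluster counts."""
    scores = {}
    for k in range(2, k_max + 1):
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
        scores[k] = davies_bouldin_score(X, labels)
    return min(scores, key=scores.get)

def rough_assign(X, centers, eps=1.2):
    """Rough clustering sketch: a point joins the upper approximation
    of every cluster whose center lies within eps times the distance
    to its nearest center, so points may belong to several clusters."""
    d = np.linalg.norm(X[:, None] - centers[None], axis=2)
    return [np.where(row <= eps * row.min())[0] for row in d]
```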


Geophysics ◽  
2019 ◽  
Vol 84 (1) ◽  
pp. V81-V96 ◽  
Author(s):  
Tiago A. Coimbra ◽  
Jorge H. Faccipieri ◽  
João H. Speglich ◽  
Leiv-J. Gelius ◽  
Martin Tygel

Exploiting the redundancy contained in a seismic data set assures enhancement of images that are based on stacking results. This enhancement is the goal of developing multiparametric traveltime equations that can approximate reflection and diffraction events in general source-receiver configurations. The main challenge in using these equations is to estimate a large number of parameters in a computationally feasible, reliable, and fast way. To obtain a better fit for diffraction traveltime events than those in the literature, we have derived a finite-offset (FO) double-square-root (DSR) diffraction traveltime equation (which depends on 10 parameters in three dimensions and four parameters in two dimensions). Moreover, to reduce the number of parameters, we have developed a simplified FO-DSR diffraction traveltime equation (which depends on five parameters in three dimensions and two parameters in two dimensions) that delivers similar performance. We have developed operators that use the simplified FO-DSR traveltime equation to construct so-called diffraction-only data volumes (or, more simply, D-volumes), assuring enhancement of the diffraction extraction process. The D-volume construction has two steps: first, a stacking procedure that separates the diffraction events from the input data set, and second, a spreading procedure that enhances the quality of these diffractions. As a proof of concept, our approach has been tested on 2D/3D synthetic and 2D field data sets with successful results.
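
For orientation, the classical constant-velocity double-square-root diffraction traveltime, of which the authors' FO-DSR equations are multiparametric generalizations, has the textbook 2D form (this is not the paper's 10-parameter expression):

```latex
t(x_s, x_r) \;=\;
\sqrt{\frac{t_0^2}{4} + \frac{(x_s - x_d)^2}{v^2}}
\;+\;
\sqrt{\frac{t_0^2}{4} + \frac{(x_d - x_r)^2}{v^2}},
```

where $x_s$ and $x_r$ are the source and receiver positions, $x_d$ is the lateral position of the diffractor, $t_0$ is its two-way zero-offset traveltime, and $v$ is the medium velocity. One square root per ray branch is what makes the form apt for diffractions at general source-receiver offsets.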


2004 ◽  
Vol 14 (06) ◽  
pp. 355-371 ◽  
Author(s):  
HAMED HAMID MUHAMMED

A new, more efficient variant of a recently developed algorithm for unsupervised fuzzy clustering is introduced. A Weighted Incremental Neural Network (WINN) is introduced and used for this purpose; the new approach is called FC-WINN (Fuzzy Clustering using WINN). The WINN algorithm produces a net of nodes connected by edges, which reflects and preserves the topology of the input data set. Additional weights, proportional to the local densities in input space, are associated with the resulting nodes and edges to store useful information about the topological relations in the input data set. A fuzziness factor, proportional to the connectedness of the net, is introduced into the system. A watershed-like procedure is used to cluster the resulting net, and this procedure determines the number of resulting clusters. Only two parameters must be chosen by the user of the FC-WINN algorithm, determining the resolution and the connectedness of the net. The other parameters that must be specified are those required by the underlying incremental neural network, which is a modified version of the Growing Neural Gas (GNG) algorithm. The FC-WINN algorithm is computationally efficient compared with other approaches for clustering large high-dimensional data sets.
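
The WINN net itself is beyond a short sketch, but the watershed-like clustering of a density-weighted net can be illustrated: each node follows its densest neighbor uphill, so clusters grow from local density maxima. This is one plausible reading of the procedure, not the paper's exact algorithm:

```python
def watershed_cluster(weights, edges):
    """Cluster a density-weighted net: attach each node to its
    densest neighbor; nodes that are local density maxima become
    cluster seeds. weights[i] is node i's density weight; edges
    is a list of (i, j) index pairs."""
    nbrs = {i: set() for i in range(len(weights))}
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    parent = {}
    for i in range(len(weights)):
        best = max(nbrs[i] | {i}, key=lambda j: weights[j])
        # point uphill only on a strict increase, so maxima self-root
        parent[i] = best if weights[best] > weights[i] else i
    def root(i):
        while parent[i] != i:
            i = parent[i]
        return i
    return [root(i) for i in range(len(weights))]
```

The number of distinct roots returned is the number of clusters, matching the abstract's claim that the procedure itself determines the cluster count.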


Symmetry ◽  
2020 ◽  
Vol 12 (4) ◽  
pp. 613
Author(s):  
Guillermo Martínez-Flórez ◽  
Roger Tovar-Falón ◽  
Marvin Jimémez-Narváez

This paper introduces a new family of asymmetric distributions that can fit unimodal as well as bimodal and trimodal data sets. The model extends the normal model by introducing two parameters that control the shape and the asymmetry of the distribution. Basic properties of this new distribution are studied in detail. Parameter estimation is addressed via the maximum likelihood method, and the Fisher information matrix is derived. A small Monte Carlo simulation study is conducted to examine the performance of the resulting estimators. Finally, two data sets are considered to illustrate the developed methodology.
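
The paper's density is not reproduced in this abstract; the maximum likelihood step can still be sketched with SciPy, using the skew-normal as a stand-in asymmetric model:

```python
import numpy as np
from scipy import stats, optimize

def fit_asymmetric(x):
    """MLE fit of an asymmetric model. The skew-normal is a stand-in
    for the paper's two-shape-parameter family, whose density is not
    given in the abstract."""
    def nll(theta):
        shape, loc, scale = theta
        if scale <= 0:
            return np.inf                      # keep scale positive
        return -np.sum(stats.skewnorm.logpdf(x, shape, loc, scale))
    theta0 = np.array([0.1, np.mean(x), np.std(x)])
    res = optimize.minimize(nll, theta0, method="Nelder-Mead")
    return res.x
```

A numerical Hessian of the negative log-likelihood at the optimum would give the observed information matrix, the empirical counterpart of the Fisher information matrix derived in the paper.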


Author(s):  
Yubo Liu ◽  
Yihua Luo ◽  
Qiaoming Deng ◽  
Xuanxing Zhou

This paper aims to explore the idea and method of using deep learning with small sample sets to realize campus layout generation. From the architect's perspective, we construct two small campus layout data sets through manual screening guided by the preferences of specific architects. These data sets are used to train a Pix2Pix model to automatically generate a campus layout given the campus boundary and surrounding roads. Analysis of the experimental results shows that, provided the collected samples are screened effectively, deep learning can achieve good results even with a small sample data set.
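
For reference, the standard Pix2Pix generator objective combines a conditional adversarial term with an L1 reconstruction term; a minimal PyTorch sketch, with the boundary map as the condition and illustrative variable names (the paper's exact training setup is not given here):

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()   # adversarial term
l1 = nn.L1Loss()               # pixel reconstruction term
lam = 100.0                    # L1 weight used in the original Pix2Pix paper

def generator_loss(D, boundary, fake_layout, real_layout):
    """Pix2Pix generator loss: fool the conditional discriminator
    while staying close to the reference layout in L1."""
    pred = D(torch.cat([boundary, fake_layout], dim=1))
    adv = bce(pred, torch.ones_like(pred))
    return adv + lam * l1(fake_layout, real_layout)
```

The strong L1 term is one reason Pix2Pix can remain usable with small, carefully screened data sets: it anchors the generator to paired supervision rather than relying on the adversarial signal alone.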


2006 ◽  
Vol 12 (4) ◽  
pp. 283-288
Author(s):  
Jolita Bernatavičienė ◽  
Gintautas Dzemyda ◽  
Olga Kurasova ◽  
Virginijus Marcinkevičius

In this paper, a method for visualizing large multidimensional data that combines multidimensional scaling (MDS) with clustering is modified and investigated. In the original algorithm, the visualization process is divided into three steps: the basis vector set is constructed using the k-means clustering method; this set is projected onto the plane using the MDS algorithm; and the remaining data are visualized using the relative MDS algorithm. We propose a modification that differs from the original algorithm in the strategy for selecting the basis vectors: in our modification, the set of basis vectors consists of vectors selected from the k clusters in a new way. The experimental investigation showed that the modification outperforms the original algorithm in both visualization quality and computational expense.
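
A sketch of the three-step pipeline, using cluster centers as the basis for simplicity (the paper selects actual data vectors from each cluster, and its relative-MDS details may differ):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import MDS
from scipy.optimize import minimize

def visualize(X, k=10):
    """MDS-with-clustering sketch: embed a k-means basis with MDS,
    then place each remaining vector by minimizing its stress
    relative to the fixed basis positions (relative MDS)."""
    km = KMeans(n_clusters=k, n_init=10).fit(X)
    basis = km.cluster_centers_                    # basis vector per cluster
    Y_basis = MDS(n_components=2,
                  dissimilarity="euclidean").fit_transform(basis)

    def place(x):
        d = np.linalg.norm(basis - x, axis=1)      # target distances
        stress = lambda y: np.sum(
            (np.linalg.norm(Y_basis - y, axis=1) - d) ** 2)
        return minimize(stress, Y_basis.mean(axis=0)).x

    return Y_basis, np.array([place(x) for x in X])
```

Only the small basis set goes through full MDS; every other vector is placed by a cheap 2-variable optimization, which is what makes the scheme viable for large data.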


2021 ◽  
Vol 257 ◽  
pp. 01032
Author(s):  
Dong Hong Huang ◽  
Dan Liu ◽  
Ming Wen ◽  
Xin Li Dong ◽  
Min Wen ◽  
...  

For the design and planning of a gas-fired boiler system, the boiler load is an important piece of basic data. Load clustering analysis, which applies data mining technology to the gas boiler system, uncovers the load patterns hidden in large numbers of disordered, irregular loads and classifies them, helping to solve many problems in gas boiler systems. Current load clustering methods all have shortcomings to some degree. The proposed method first applies PCA dimensionality reduction to the huge volume of gas data and then carries out cluster analysis. In practical applications of gas-fired boilers, the data objects faced are usually unbalanced data sets; to solve this sample-imbalance problem, we use the FCM-SMOTE algorithm to oversample the clustered data, turning it into a balanced data set.
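
A sketch of the described pipeline with scikit-learn and imbalanced-learn; plain SMOTE stands in for the paper's FCM-SMOTE variant, and k-means stands in for its clustering step, so treat this as the shape of the workflow rather than the method itself:

```python
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from imblearn.over_sampling import SMOTE

def load_pipeline(X, n_components=3, n_clusters=4):
    """PCA dimensionality reduction, then clustering, then
    oversampling so minority load-pattern clusters are balanced.
    (SMOTE needs each cluster to have more members than its
    k_neighbors setting.)"""
    Z = PCA(n_components=n_components).fit_transform(X)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Z)
    Z_bal, labels_bal = SMOTE().fit_resample(Z, labels)
    return Z_bal, labels_bal
```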

