The Influence on Clustering Results of Electricity Load Curves Using Different Distances

2013 ◽  
Vol 401-403 ◽  
pp. 1440-1443 ◽  
Author(s):  
Tie Feng Zhang ◽  
Fei Lv ◽  
Rong Gu

Distance choice is an important issue in power load pattern extraction using clustering techniques, so it is necessary to determine how different distances affect the clustering results for load curves. In this paper, several distances are used in the k-means algorithm to cluster load curves, their influence on the clustering results is analyzed, and a suitable distance for the k-means algorithm is identified. An example with the load curves of 147 electricity customers shows that different distances lead to different clustering results under the same clustering algorithm. The comparison results indicate that the choice of distance is an important issue in power load pattern extraction using clustering techniques and that a suitable distance may improve the accuracy of mining algorithms.
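To make the idea of swapping distances inside k-means concrete, here is a minimal sketch in Python with NumPy. The abstract does not name the distances that were compared, so the three metrics below (Euclidean, Manhattan, Chebyshev), the function names, and the toy data are assumptions chosen for illustration only. The centroid update uses the plain mean for every metric, which is a simplification: the mean is only the optimal center for the squared Euclidean distance.

```python
import numpy as np

def pairwise_distance(X, C, metric="euclidean"):
    """Distance from every row of X to every row of C, shape (n, k)."""
    diff = X[:, None, :] - C[None, :, :]
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=2))
    if metric == "manhattan":
        return np.abs(diff).sum(axis=2)
    if metric == "chebyshev":
        return np.abs(diff).max(axis=2)
    raise ValueError(f"unknown metric: {metric}")

def kmeans(X, init_centers, metric="euclidean", n_iter=50):
    """k-means-style loop with a pluggable assignment distance.

    Hypothetical helper: centers are always updated with the mean,
    whichever metric is used for the assignment step.
    """
    X = np.asarray(X, dtype=float)
    centers = np.array(init_centers, dtype=float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assignment step: nearest center under the chosen distance.
        labels = pairwise_distance(X, centers, metric).argmin(axis=1)
        # Update step: mean of the members; empty clusters keep their center.
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members) > 0:
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Toy "load curves" with two obvious groups (made-up numbers).
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
for metric in ("euclidean", "manhattan", "chebyshev"):
    labels, _ = kmeans(X, X[[0, 3]], metric=metric)
    print(metric, labels.tolist())
```

On well-separated toy data like this, all three metrics give the same partition. Differences of the kind the abstract reports would only show up on less clearly separated data, such as real load curves.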

Electronics ◽  
2020 ◽  
Vol 9 (8) ◽  
pp. 1295 ◽  
Author(s):  
Mohiuddin Ahmed ◽  
Raihan Seraj ◽  
Syed Mohammed Shamsul Islam

The k-means clustering algorithm is considered one of the most powerful and popular data mining algorithms in the research community. However, despite its popularity, the algorithm has certain limitations, including problems associated with random initialization of the centroids which leads to unexpected convergence. Additionally, such a clustering algorithm requires the number of clusters to be defined beforehand, which is responsible for different cluster shapes and outlier effects. A fundamental problem of the k-means algorithm is its inability to handle various data types. This paper provides a structured and synoptic overview of research conducted on the k-means algorithm to overcome such shortcomings. Variants of the k-means algorithms including their recent developments are discussed, where their effectiveness is investigated based on the experimental analysis of a variety of datasets. The detailed experimental analysis along with a thorough comparison among different k-means clustering algorithms differentiates our work compared to other existing survey papers. Furthermore, it outlines a clear and thorough understanding of the k-means algorithm along with its different research directions.


2018 ◽  
Vol 13 (5) ◽  
pp. 759-771 ◽  
Author(s):  
Guangchun Chen ◽  
Juan Hu ◽  
Hong Peng ◽  
Jun Wang ◽  
Xiangnian Huang

A spectral clustering algorithm has difficulty finding clusters when the dataset has large differences in density, and its clustering quality depends on the selection of initial centers. To overcome these shortcomings, we propose a novel spectral clustering algorithm based on the membrane computing framework, called the MSC algorithm, whose idea is to use a membrane clustering algorithm to realize the clustering component in spectral clustering. A tissue-like P system is used as its computing framework, where each object in the cells denotes a set of cluster centers and a velocity-location model is used as the evolution rules. Under the control of the evolution-communication mechanism, the tissue-like P system can obtain a good clustering partition for each dataset. The proposed spectral clustering algorithm is evaluated on three artificial datasets and ten UCI datasets, and it is further compared with classical spectral clustering algorithms. The comparison results demonstrate the advantage of the proposed spectral clustering algorithm.


Author(s):  
Naohiko Kinoshita ◽  
◽  
Yasunori Endo ◽  
Akira Sugawara ◽  
◽  
...  

Clustering is a representative unsupervised classification technique. Many researchers have proposed clustering algorithms based on mathematical models – methods we call model-based clustering. Clustering techniques are very useful for determining data structures, but model-based clustering is difficult to use for analyzing data correctly because we cannot select a suitable method unless we know the data structure at least partially. The new clustering algorithm we propose introduces soft computing techniques such as fuzzy reasoning in what we call linguistic-based clustering, whose features are not incident to the data structure. We verify the method’s effectiveness through numerical examples.


Author(s):  
S. May

Abstract. Partition-based clustering techniques are widely used in data mining and also to analyze hyperspectral images. Unsupervised clustering depends only on the data, without any external knowledge. It creates a complete partition of the image with many classes. Sparse labeled samples may then be used to label each cluster, which simplifies the supervised step. Each clustering algorithm has its own advantages and drawbacks (initialization, training complexity). In this paper we propose to use a recursive hierarchical clustering based on standard clustering strategies such as K-Means or Fuzzy-C-Means. The recursive hierarchical approach reduces the algorithm complexity, in order to process large amounts of input pixels and also to produce a clustering with a high number of clusters. Moreover, in hyperspectral images, a classical question is related to the high dimensionality and also to the distance that should be used. Classical clustering algorithms usually use the Euclidean distance to compute the distance between samples and centroids. We propose to implement the spectral angle distance instead and evaluate its performance. It better fits the pixel spectra and is less sensitive to illumination change or spectrum variability inside a semantic class. Different scenes are processed with this method in order to demonstrate its potential.
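As a small illustration of why a spectral angle can be less sensitive to illumination than the Euclidean distance, the sketch below compares the two on a spectrum and a brightened copy of it. It uses the common definition of the spectral angle as the arccosine of the normalized dot product. The function name and the sample values are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def spectral_angle(x, y):
    """Angle in radians between two spectra, treated as vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cos = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    # Clip guards against rounding slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

# Hypothetical pixel spectrum and the same material under brighter light,
# modelled here as a simple scaling of every band.
spectrum = np.array([0.10, 0.25, 0.40, 0.30])
brighter = 2.0 * spectrum

print("Euclidean distance:", np.linalg.norm(spectrum - brighter))
print("Spectral angle:    ", spectral_angle(spectrum, brighter))
```

Under this scaling model, the Euclidean distance grows with the brightness change while the angle stays at approximately zero, because only the direction of the spectrum vector matters.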


2018 ◽  
Vol 2 (4) ◽  
Author(s):  
Pengfei Zhang ◽  
Hwee-Pink Tan ◽  
Gaoxi Xiao

Motivated by recent developments in Wireless Sensor Networks (WSNs), we present distributed clustering algorithms for maximizing the lifetime of WSNs, i.e., the duration till the first node dies. We study the joint problem of prolonging network lifetime by introducing clustering techniques and energy-harvesting (EH) nodes. Firstly we propose distributed clustering algorithm for maximizing the lifetime of clustered WSN, which includes EH nodes, serving as relay nodes for cluster heads (CHs). Secondly graph-based and LP-based EH-CH matching algorithms are proposed which serve as benchmark algorithms. Extensive simulation results show that the proposed algorithms can achieve optimal or suboptimal solutions efficiently.


2016 ◽  
Vol 43 (1) ◽  
pp. 54-74 ◽  
Author(s):  
Baojun Ma ◽  
Hua Yuan ◽  
Ye Wu

Clustering is a powerful unsupervised tool for sentiment analysis from text. However, the clustering results may be affected by any step of the clustering process, such as data pre-processing strategy, term weighting method in Vector Space Model and clustering algorithm. This paper presents the results of an experimental study of some common clustering techniques with respect to the task of sentiment analysis. Different from previous studies, in particular, we investigate the combination effects of these factors with a series of comprehensive experimental studies. The experimental results indicate that, first, the K-means-type clustering algorithms show clear advantages on balanced review datasets, while performing rather poorly on unbalanced datasets by considering clustering accuracy. Second, the comparatively newly designed weighting models are better than the traditional weighting models for sentiment clustering on both balanced and unbalanced datasets. Furthermore, adjective and adverb words extraction strategy can offer obvious improvements on clustering performance, while strategies of adopting stemming and stopword removal will bring negative influences on sentiment clustering. The experimental results would be valuable for both the study and usage of clustering methods in online review sentiment analysis.


Author(s):  
Mohana Priya K ◽  
Pooja Ragavi S ◽  
Krishna Priya G

Clustering is the process of grouping objects into subsets that have meaning in the context of a particular problem. It does not rely on predefined classes. It is referred to as an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed and are used in different applications. Sentence clustering is one of the best clustering techniques. A hierarchical clustering algorithm is applied at multiple levels for accuracy. For tagging, a POS tagger and the Porter stemmer are used. The WordNet dictionary is used to determine similarity by invoking the Jiang-Conrath and cosine similarity measures. Grouping is performed with respect to the highest similarity measure value with a mean threshold. This paper incorporates many parameters for finding the similarity between words. In order to identify the disambiguated words, sense identification is performed for the adjectives and a comparison is made. The SemCor and machine learning datasets are employed. Compared with previous results for WSD, our work shows a considerable improvement, reaching a percentage of 91.2%.


2015 ◽  
pp. 125-138 ◽  
Author(s):  
I. V. Goncharenko

In this article we proposed a new method of non-hierarchical cluster analysis using a k-nearest-neighbor graph and discussed it with respect to vegetation classification. The method of k-nearest neighbor (k-NN) classification was originally developed in 1951 (Fix, Hodges, 1951). Later the term “k-NN graph” and a few algorithms of k-NN clustering appeared (Cover, Hart, 1967; Brito et al., 1997). In biology k-NN is used in the analysis of protein structures and genome sequences. Most k-NN clustering algorithms first build an «excessive» graph, a so-called hypergraph, and then truncate it into subgraphs by partitioning and coarsening the hypergraph. We developed another strategy, “upward” clustering, which forms (assembles sequentially) one cluster after another. Until now, graph-based cluster analysis has not been considered for the classification of vegetation datasets.
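For readers unfamiliar with the term, a k-NN graph links each sample to its k closest samples. The sketch below builds one with NumPy using the Euclidean distance. It only illustrates the graph itself, not the “upward” clustering strategy of the article, and the function name, the choice of distance, and the toy data are assumptions.

```python
import numpy as np

def knn_graph(X, k):
    """Return an (n, k) array: row i lists the indices of the k nearest
    neighbours of sample i (Euclidean distance, self excluded)."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    return np.argsort(dist, axis=1)[:, :k]

# Four one-dimensional toy samples (made-up values).
X = np.array([[0.0], [1.0], [3.0], [10.0]])
print(knn_graph(X, k=1).tolist())
```

Note that the relation is not symmetric: the last sample points to the third as its nearest neighbour, while the third points to the second.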


Author(s):  
Yuancheng Li ◽  
Yaqi Cui ◽  
Xiaolong Zhang

Background: Advanced Metering Infrastructure (AMI) for the smart grid is growing rapidly, which results in exponential growth of the data collected and transmitted in the device. Clustering this data can give the electricity company a better understanding of the personalized and differentiated needs of the user. Objective: The existing clustering algorithms for processing data generally have some problems, such as insufficient data utilization, high computational complexity and low accuracy of behavior recognition. Methods: In order to improve the clustering accuracy, this paper proposes a new clustering method based on the electrical behavior of the user. Starting with the analysis of user load characteristics, the user electricity data samples were constructed. The daily load characteristic curve was extracted through an improved extreme learning machine clustering algorithm and effective index criteria. Moreover, clustering analysis was carried out for different users from industrial areas, commercial areas and residential areas. The improved extreme learning machine algorithm, also called Unsupervised Extreme Learning Machine (US-ELM), is an extension and improvement of the original Extreme Learning Machine (ELM), which realizes the unsupervised clustering task on the basis of the original ELM. Results: Experiments on four different data sets were carried out in MATLAB and compared with other commonly used clustering algorithms. The experimental results show that the US-ELM algorithm has higher accuracy in processing power data. Conclusion: The unsupervised ELM algorithm can greatly reduce the time consumption and improve the effectiveness of clustering.


Author(s):  
M. Tanveer ◽  
Tarun Gupta ◽  
Miten Shah ◽  

Twin Support Vector Clustering (TWSVC) is a clustering algorithm inspired by the principles of Twin Support Vector Machine (TWSVM). TWSVC has already outperformed other traditional plane based clustering algorithms. However, TWSVC uses hinge loss, which maximizes shortest distance between clusters and hence suffers from noise-sensitivity and low re-sampling stability. In this article, we propose Pinball loss Twin Support Vector Clustering (pinTSVC) as a clustering algorithm. The proposed pinTSVC model incorporates the pinball loss function in the plane clustering formulation. Pinball loss function introduces favorable properties such as noise-insensitivity and re-sampling stability. The time complexity of the proposed pinTSVC remains equivalent to that of TWSVC. Extensive numerical experiments on noise-corrupted benchmark UCI and artificial datasets have been provided. Results of the proposed pinTSVC model are compared with TWSVC, Twin Bounded Support Vector Clustering (TBSVC) and Fuzzy c-means clustering (FCM). Detailed and exhaustive comparisons demonstrate the better performance and generalization of the proposed pinTSVC for noise-corrupted datasets. Further experiments and analysis on the performance of the above-mentioned clustering algorithms on structural MRI (sMRI) images taken from the ADNI database, face clustering, and facial expression clustering have been done to demonstrate the effectiveness and feasibility of the proposed pinTSVC model.
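The difference between the two loss functions named above can be sketched with their generic textbook forms. Here `u` stands for the margin violation term (for example `1 - y * f(x)` in a standard SVM), and `tau` is the pinball parameter. Both names are assumptions for this illustration, and the code is not the pinTSVC objective itself, which the abstract does not spell out.

```python
import numpy as np

def hinge_loss(u):
    """Hinge loss: penalises only positive values of u."""
    u = np.asarray(u, dtype=float)
    return np.maximum(0.0, u)

def pinball_loss(u, tau=0.5):
    """Pinball loss: slope 1 for u >= 0 and slope tau for u < 0,
    so points on both sides of the margin contribute."""
    u = np.asarray(u, dtype=float)
    return np.where(u >= 0, u, -tau * u)

u = np.array([-2.0, 0.0, 3.0])
print("hinge:  ", hinge_loss(u).tolist())
print("pinball:", pinball_loss(u, tau=0.5).tolist())
```

The hinge loss ignores samples with negative `u`, so the fit depends on the few samples nearest the boundary. The pinball loss also assigns a penalty to samples with negative `u`, which is the usual explanation given for its lower sensitivity to noise near the boundary.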

