Determination of Optimal Cluster Number in Connection to SCADA

Author(s):  
Jan Vávra ◽  
Martin Hromada
2013 ◽  
Vol 52 (12) ◽  
pp. 2699-2714 ◽  
Author(s):  
Peter Hoffmann ◽  
K. Heinke Schlünzen

Abstract: A classification of weather patterns (WP) is derived that is tailored to best represent situations relevant for the urban heat island (UHI). Three different types of k-means-based cluster methods are applied. The explained cluster variance is used as a measure of quality. Several variables of the 700-hPa fields from the 40-yr ECMWF Re-Analysis (ERA-40) were tested for the classification. The variables as well as the domain for the clustering are chosen to explain the variability of the UHI as well as possible. The combination of geopotential height, relative humidity, vorticity, and the 1000–700-hPa thickness turned out to be best suited. To determine the optimal cluster number k, several statistical measures are applied. Except for autumn (k = 12), an optimal cluster number of k = 7 is found. The WP frequency changes are analyzed using climate projections of two regional climate models (RCM). Both RCMs, the Regional Model (REMO) and Climate Limited-Area Model (CLM), are driven with the A1B simulations from the global climate model ECHAM5. For the periods 2036–65 and 2071–2100, no change in the frequency of the anticyclonic WP is found when compared with 1971–2000. Since these WPs are favorable for the development of a strong UHI, the frequency of strong UHI days stays the same for the city of Hamburg, Germany. For other WPs, changes are found for both future periods. At the end of the century, a large increase (17%–40%) in the frequency of the zonal WP and a large decrease (20%–26%) in the southwesterly WP are projected.
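The abstract does not define "explained cluster variance". A common reading is the share of total variance accounted for by the cluster assignment, 1 - WSS/TSS, where WSS is the within-cluster sum of squares and TSS the total sum of squares. The sketch below assumes that definition; the function name and the toy data are hypothetical, and the authors' exact formula may differ.

```python
from collections import defaultdict


def _centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def _sum_sq(points, center):
    """Sum of squared Euclidean distances from each point to center."""
    return sum(sum((x - c) ** 2 for x, c in zip(p, center)) for p in points)


def explained_cluster_variance(points, labels):
    """Return 1 - WSS/TSS (assumed definition, not taken from the paper)."""
    if len(points) != len(labels) or not points:
        raise ValueError("points and labels must be non-empty and equal in length")
    tss = _sum_sq(points, _centroid(points))
    if tss == 0:
        raise ValueError("all points are identical; the ratio is undefined")
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    wss = sum(_sum_sq(g, _centroid(g)) for g in groups.values())
    return 1.0 - wss / tss


# Toy data: two tight groups that are far apart along the first axis.
pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
ev_two = explained_cluster_variance(pts, [0, 0, 1, 1])
ev_one = explained_cluster_variance(pts, [0, 0, 0, 0])
```

Under this definition a single cluster explains none of the variance, and the value rises toward 1 as k grows, which is why a measure of this kind is usually compared across candidate values of k instead of being maximized outright.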


2021 ◽  
Author(s):  
Congming Shi ◽  
Bingtao Wei ◽  
Shoulin Wei ◽  
Wen Wang ◽  
Hai Liu ◽  
...  

Abstract: Clustering, a traditional machine learning method, plays a significant role in data analysis. Most clustering algorithms depend on a predetermined exact number of clusters, whereas, in practice, the number of clusters is usually not known in advance. Although the Elbow method is one of the most commonly used methods to determine the optimal cluster number, it depends on manual identification of the elbow point on the plotted curve. As a result, even experienced analysts cannot clearly identify the elbow point when the plotted curve is fairly smooth. To solve this problem, a new elbow point discriminant method is proposed that yields a statistical metric for estimating the optimal cluster number of a dataset. First, the average degree of distortion obtained by the Elbow method is normalized to the range of 0 to 10. Second, the normalized results are used to calculate the cosine of the intersection angles between elbow points. Third, the arccosine of these values gives the intersection angles between elbow points. Finally, the index of the minimal intersection angle is used as the estimated optimal cluster number. Experimental results on simulated datasets and a well-known public dataset (the Iris dataset) demonstrated that the optimal cluster number estimated by the proposed method is better than that obtained by the widely used Silhouette method.
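The four steps in the abstract can be sketched as follows. This is one possible reading, not the authors' implementation: it assumes the angle at each candidate k is formed by the two segments joining it to its neighbours on the curve, and that k itself is used unscaled on the horizontal axis. The function names and the sample distortion values are hypothetical.

```python
import math


def normalize(values, lo=0.0, hi=10.0):
    """Linearly rescale values to the range [lo, hi]."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [lo for _ in values]
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


def elbow_by_min_angle(ks, distortions):
    """Return (k, angle in radians) for the sharpest bend of the curve."""
    if len(ks) != len(distortions) or len(ks) < 3:
        raise ValueError("need at least three (k, distortion) pairs")
    ys = normalize(distortions)                      # step 1: scale to 0..10
    best_k, best_angle = None, math.inf
    for i in range(1, len(ks) - 1):
        # Vectors from the candidate point to its left and right neighbours.
        ax, ay = ks[i - 1] - ks[i], ys[i - 1] - ys[i]
        bx, by = ks[i + 1] - ks[i], ys[i + 1] - ys[i]
        cos_t = (ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))  # step 2
        angle = math.acos(max(-1.0, min(1.0, cos_t)))                            # step 3
        if angle < best_angle:                                                   # step 4
            best_k, best_angle = ks[i], angle
    return best_k, best_angle


# Hypothetical distortion curve for k = 1..6 with a visible bend at k = 3.
ks = [1, 2, 3, 4, 5, 6]
distortions = [100.0, 40.0, 10.0, 8.0, 7.0, 6.5]
best_k, best_angle = elbow_by_min_angle(ks, distortions)
```

A nearly straight stretch of the curve gives an angle close to pi, so the smallest angle marks the point where the curve bends most sharply. The result depends on the relative scaling of the two axes, which is presumably why the method fixes the distortion range before measuring angles.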


2021 ◽  
Vol 5 (3) ◽  
pp. 421-428
Author(s):  
Diana Purwitasari ◽  
Aida Muflichah ◽  
Novrindah Alvi Hasanah ◽  
Agus Zainal Arifin

The undergraduate thesis, or final project (Tugas Akhir in Indonesian), is a prerequisite for graduation for every undergraduate student, and completing it successfully is one of the learning outcomes. Choosing a final project topic that matches a student's ability is therefore important. One strategy for deciding on a topic is to read the literature, but this is time-consuming. There is a need for a recommendation system that helps students choose a topic according to their abilities or subject understanding, based on their academic transcripts. This study focused on a system for final project topic recommendations based on evaluating competencies in the academic transcripts of graduated students. The collected data from previous final projects, namely titles and abstracts, were weighted by term occurrences using TF-IDF (term frequency–inverse document frequency) and grouped using K-Means clustering. From each cluster, we prepared candidate topics for recommendation using Latent Dirichlet Allocation (LDA) with Gibbs sampling, focusing on the word distribution of each topic in the cluster. Several evaluations were performed to determine the optimal cluster number and topic number, followed by a more thorough exploration of the recommendation results. Our experiments showed that the proposed system could recommend final project topic ideas based on student competence as represented in their academic transcripts.
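As an illustration of the weighting step only, the sketch below computes TF-IDF weights with one common variant: term frequency as count divided by document length, and inverse document frequency as the natural log of N/df. The abstract does not say which variant the authors used, and the function name, whitespace tokenization, and sample titles are assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (assumed TF-IDF variant)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))          # count each term once per document
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical project titles; real input would be titles plus abstracts.
docs = ["clustering water demand", "clustering building energy"]
weights = tf_idf(docs)
```

A term that appears in every document, such as "clustering" here, gets weight zero, so the vectors passed on to K-Means are dominated by the terms that distinguish one project from another.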


2005 ◽  
Vol 169 (2) ◽  
pp. 1172-1185 ◽  
Author(s):  
Judong Shen ◽  
Shing I. Chang ◽  
E. Stanley Lee ◽  
Youping Deng ◽  
Susan J. Brown

Water ◽  
2019 ◽  
Vol 11 (10) ◽  
pp. 2066
Author(s):  
Chan ◽  
Chin

In this paper, a fusion of unsupervised clustering and incremental similarity tracking of hourly water demand series is proposed. Current research using unsupervised methodologies to detect anomalies in water demand is limited and may suffer from several drawbacks, such as requiring large amounts of data, the need to select an optimal cluster number, or low detection accuracy. Our proposed approach aims to reduce the need for large amounts of data by detecting anomalies through (1) clustering points that are relatively similar at each time step, (2) clustering points at each time step by the similarity in how they vary between time steps, and (3) comparing incoming points with a reference shape for online anomalous trend detection. In addition, using a Bayesian nonparametric approach such as the Dirichlet Process Mixture Model eliminates the need to choose an optimal cluster number and provides a subtle solution for 'reserving' an empty cluster for future anomalies. Among the 165 randomly generated anomalies, the proposed approach detected a total of 159, along with other anomalous trends present in the data. As the data are unlabeled, the identified anomalous trends cannot be verified. However, the results show great potential for using minimal, unlabeled water demand data for preliminary anomaly detection.
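The idea of a 'reserved' empty cluster can be seen in the Chinese restaurant process, the standard predictive view of the Dirichlet process prior. This sketch shows only that prior, not the authors' model or its likelihood term; the function name and the sample cluster sizes are hypothetical.

```python
def crp_assignment_probs(cluster_sizes, alpha):
    """Prior probability that the next point joins each existing cluster.

    The final entry of the returned list is the probability of opening
    a new cluster, which is always positive when alpha > 0.
    """
    if alpha <= 0:
        raise ValueError("alpha (the concentration parameter) must be positive")
    denom = sum(cluster_sizes) + alpha
    probs = [size / denom for size in cluster_sizes]
    probs.append(alpha / denom)               # mass reserved for a new cluster
    return probs


# Two existing clusters holding 3 points and 1 point, with alpha = 1.
probs = crp_assignment_probs([3, 1], alpha=1.0)
```

Because the last entry never reaches zero, an incoming point that fits none of the existing clusters can still open a new one, which is what removes the need to fix the cluster number in advance.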


2019 ◽  
Vol 41 (4) ◽  
pp. 429-440
Author(s):  
Hsu-Yao Huang ◽  
Lung-Chieh Lin ◽  
Ming-Tsun Ke ◽  
Tamilarasan Sathesh ◽  
Wen-Shing Lee

Benchmarking the energy performance of buildings has received increasing attention as striving for energy efficiency through more effective energy management has become a major concern of governments. Various methods for classifying building energy performance have been developed, and the clustering technique is considered one of the best approaches. This paper proposes a method that uses dynamic clustering to analyze the electricity consumption patterns of buildings in order to decide the optimal cluster number and allocate the buildings to corresponding clusters for energy benchmarking. To evaluate the number of clusters, this article employs the inter–intra clustering method with a particle swarm optimization algorithm. The electricity consumption data were collected through an energy survey performed in 30 junior high schools in Taipei, Taiwan. With a traditional method, the 30 schools would be grouped into a single cluster and the energy benchmark would report an average value of 541.4 kWh/year per student. The proposed method, which took the different electricity consumption patterns of the schools into consideration, produced more detailed results: the optimal cluster number was 3 with an inter–intra index value of 0.708, and the energy benchmarking indices of the three clusters were 362, 512, and 851 kWh/year per student, respectively. Practical application: The study proposed an innovative dynamic clustering technique to decide the optimal cluster number and allocate the assessed buildings. The results showed that, compared to a traditional approach that tended to group assessed buildings into one cluster, the proposed method was able to classify the buildings into three clusters for further benchmarking. This method can be used by governments and large corporations. For example, in Hong Kong, primary schools are grouped into one cluster for energy benchmarking. The proposed method could further classify primary schools into more clusters, and a benchmarking index could then be developed for each cluster.

