clustering algorithms
Recently Published Documents





Atmosphere ◽  
2022 ◽  
Vol 13 (1) ◽  
pp. 145
Siti Mariana Che Mat Nor ◽  
Shazlyn Milleana Shaharudin ◽  
Shuhaida Ismail ◽  
Sumayyah Aimi Mohd Najib ◽  
Mou Leong Tan ◽  

This study was conducted to identify the spatiotemporal torrential rainfall patterns of the East Coast of Peninsular Malaysia, as it is the region most affected by the torrential rainfall of the Northeast Monsoon season. Dimension reduction, such as the classical Principal Components Analysis (PCA) coupled with the clustering approach, is often applied to reduce the dimension of the data while simultaneously performing cluster partitions. However, the classical PCA is highly sensitive to outliers, as it assigns equal weights to each set of observations. Hence, applying the classical PCA could affect the cluster partitions of the rainfall patterns. Furthermore, traditional clustering algorithms only allow each element to belong to exactly one cluster, so observations within overlapping clusters of the torrential rainfall datasets might not be captured effectively. In this study, a statistical model of torrential rainfall pattern recognition was proposed to alleviate these issues. Here, a Robust PCA (RPCA) based on Tukey’s biweight correlation was introduced and the optimum breakdown point to extract the number of components was identified. A breakdown point of 0.4 at 85% cumulative variance percentage efficiently extracted the number of components to avoid low-frequency variations or insignificant clusters on a spatial scale. Based on the extracted components, the rainfall patterns were further characterized based on cluster solutions attained using Fuzzy C-means clustering (FCM) to allow data elements to belong to more than one cluster, as the rainfall data structure permits this. Lastly, data generated using a Monte Carlo simulation were used to evaluate the performance of the proposed statistical modeling. It was found that the proposed RPCA-FCM performed better than the classical PCA coupled with FCM in identifying the torrential rainfall patterns of Peninsular Malaysia’s East Coast.
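The soft assignment that Fuzzy C-means provides can be illustrated with a minimal sketch. This is the textbook FCM update only, not the RPCA-FCM pipeline of the study: the function names, the toy points, the fuzzifier m = 2 and the choice of starting centres are all assumptions made for illustration.

```python
import math

def _memberships(points, centers, m, eps=1e-12):
    """Membership of every point in every cluster; each row sums to 1."""
    power = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        # Clamp distances so a point lying exactly on a centre does not divide by zero.
        dists = [max(math.dist(p, c), eps) for c in centers]
        rows.append([1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists])
    return rows

def fuzzy_c_means(points, init_centers, m=2.0, n_iter=50):
    """Textbook fuzzy C-means: alternate centre and membership updates."""
    centers = [tuple(c) for c in init_centers]
    u = _memberships(points, centers, m)
    for _ in range(n_iter):
        new_centers = []
        for i in range(len(centers)):
            # Each centre is the mean of all points, weighted by membership ** m.
            weights = [row[i] ** m for row in u]
            total = sum(weights)
            new_centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(len(points[0]))
            ))
        centers = new_centers
        u = _memberships(points, centers, m)
    return centers, u

# Hypothetical toy data: two tight groups plus one point lying between them.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 4.9), (4.9, 5.1),
          (2.5, 2.5)]
centers, u = fuzzy_c_means(points, init_centers=[points[0], points[3]])
for p, row in zip(points, u):
    print(p, [round(v, 3) for v in row])
```

In this toy run the in-between point receives a sizeable membership in both clusters, which is the overlap behaviour that a hard partition cannot express.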

Biomolecules ◽  
2022 ◽  
Vol 12 (1) ◽  
pp. 140
Georgios N. Dimitrakopoulos ◽  
Maria I. Klapa ◽  
Nicholas K. Moschonas

More than fifteen years after the first high-throughput experiments for human protein–protein interaction (PPI) detection, we are still wondering how close the completion of the genome-scale human PPI network reconstruction is, what needs to be further explored and whether the biological insights gained from the holistic investigation of the current network are valid and useful. The unique structure of PICKLE, a meta-database of the human experimentally determined direct PPI network developed by our group, presently covering ~80% of the UniProtKB/Swiss-Prot reviewed human complete proteome, enables the evaluation of the interactome expansion by comparing the successive PICKLE releases since 2013. We observe a gradual overall increase of 39%, 182%, and 67% in protein nodes, PPIs, and supporting references, respectively. Our results indicate that, in recent years, (a) the PPI addition rate has decreased, (b) the new PPIs are largely determined by high-throughput experiments and mainly concern existing protein nodes and (c), as we had predicted earlier, most of the newly added protein nodes have a low degree. These observations, combined with a largely overlapping k-core between PICKLE releases and a network density increase, imply that an almost complete picture of a structurally defined network has been reached. The comparative unsupervised application of two clustering algorithms indicated that exploring the full interactome topology can reveal the protein neighborhoods involved in closely related biological processes, such as transcriptional regulation and cell signaling, and in multiprotein complexes such as the connexon complex associated with cancers. A well-reconstructed human protein interactome is a powerful tool in network biology and medicine research, forming the basis for multi-omic and dynamic analyses.

2022 ◽  
Vol 53 (3) ◽  
pp. 466-486
Cindy Cindy ◽  
Cynthia Cynthia ◽  
Valentino Vito ◽  
Devvi Sarwinda ◽  
Bevina Desjwiandra Handari ◽  

In Indonesia, Dengue incidence tends to increase every year but has been fluctuating in recent years. The potential for Dengue outbreaks in DKI Jakarta, the capital city, deserves serious attention. Weather factors are suspected of being associated with the incidence of Dengue in Indonesia. This research used weather and Dengue incidence data for five regions of DKI Jakarta, Indonesia, from December 30, 2008, to January 2, 2017. The study used a clustering approach on time-series and non-time-series data using K-Medoids and Fuzzy C-Means Clustering. The clustering results for the non-time-series data showed a positive correlation between the number of Dengue incidents and both average relative humidity and amount of rainfall. However, Dengue incidence and average temperature were negatively correlated. Moreover, the clustering implementation on the time-series data showed that rainfall patterns most closely resembled those of Dengue incidence. Therefore, rainfall can be used to estimate Dengue incidence. Both results suggest that the government could utilize weather data to predict possible spikes in DHF incidence, especially at the onset of the rainy season, and to alert the public to a greater probability of a Dengue outbreak.

Electronics ◽  
2022 ◽  
Vol 11 (2) ◽  
pp. 267
Félix Morales ◽  
Miguel García-Torres ◽  
Gustavo Velázquez ◽  
Federico Daumas-Ladouce ◽  
Pedro E. Gardel-Sotomayor ◽  

Correctly defining and grouping electrical feeders is of great importance for electrical system operators. In this paper, we compare two different clustering techniques, K-means and hierarchical agglomerative clustering, applied to real data from the east region of Paraguay. The raw data were pre-processed, resulting in four data sets, namely, (i) a weekly feeder demand, (ii) a monthly feeder demand, (iii) a statistical feature set extracted from the original data and (iv) a seasonal and daily consumption feature set obtained considering the characteristics of the Paraguayan load curve. Across the four data sets, two clustering algorithms, two distance metrics and five linkage criteria, a total of 36 models were assessed using the Silhouette, Davies–Bouldin and Calinski–Harabasz index scores. The K-means algorithm with the seasonal feature data set showed the best performance on the Silhouette, Calinski–Harabasz and Davies–Bouldin validation index scores, with a configuration of six clusters.
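To make the idea of an internal validation index concrete, here is a minimal sketch of the Silhouette score, one of the three indices named above (the Davies–Bouldin and Calinski–Harabasz indices are not shown). The function name and the toy points are assumptions for illustration, and the code expects at least two clusters.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette over all points (Euclidean distance, at least two clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # by convention, a singleton cluster scores 0
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself adds nothing to the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: two compact, well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(silhouette_score(points, [0, 0, 0, 1, 1, 1]))  # sensible partition
print(silhouette_score(points, [0, 1, 0, 1, 0, 1]))  # scrambled partition
```

Values close to 1 indicate compact, well-separated clusters, so comparing the score across candidate models is one way to rank them, as the study does across its 36 configurations.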

2022 ◽  
Vol 8 (1) ◽  
Luis Lorenzo ◽  
Javier Arroyo

Since the emergence of Bitcoin, cryptocurrencies have grown significantly, not only in terms of capitalization but also in number. Consequently, the cryptocurrency market can be a conducive arena for investors, as it offers many opportunities. However, it is difficult to understand. This study aims to describe, summarize, and segment the main trends of the entire cryptocurrency market in 2018, using data analysis tools. Accordingly, we propose a new clustering-based methodology that provides complementary views of the financial behavior of cryptocurrencies, and one that looks for associations between the clustering results and other factors that are not involved in clustering. In particular, the methodology involves applying three different partitional clustering algorithms, each of which uses a different representation of cryptocurrencies, namely, the yearly mean and standard deviation of the returns, the distribution of returns (which has not been applied to financial markets previously), and the time series of returns. Because each representation provides a different outlook of the market, we also examine the integration of the three clustering results, to obtain a fine-grained analysis of the main trends of the market. Finally, we analyze the association of the clustering results with other descriptive features of cryptocurrencies, including the age, technological attributes, and financial ratios derived from them. This will help to enhance the profiling of the clusters with additional descriptive insights, and to find associations with other variables.
Consequently, this study describes the whole market based on graphical information, and a scalable methodology that can be reproduced by investors who want to understand the main trends in the market quickly, and those that look for cryptocurrencies with different financial performance. In our analysis of 2018, and of 2019 as an extended period, we found that the market can typically be segmented into a few clusters (five or fewer), and even considering the intersections, the six most populated ones account for 75% of the market. Regarding the associations between the clusters and descriptive features, we find associations between some clusters and volume, market capitalization, and some financial ratios, which could be explored in future research.

Energies ◽  
2022 ◽  
Vol 15 (2) ◽  
pp. 462
Guilherme Henrique Apostolo ◽  
Flavia Bernardini ◽  
Luiz C. Schara Magalhães ◽  
Débora C. Muchaluat-Saade

As wireless local area networks grow in size to provide access to users, power consumption becomes an important issue. Power savings in a large-scale Wi-Fi network, with low impact on user service, are undoubtedly desired. In this work, we propose and evaluate the eSCIFI energy saving mechanism for Wireless Local Area Networks (WLANs). eSCIFI is an energy saving mechanism that uses machine learning algorithms as occupancy demand estimators. The eSCIFI mechanism is designed to cope with a broader range of WLANs, which includes Wi-Fi networks such as the Fluminense Federal University (UFF) SCIFI network. eSCIFI can cope with WLANs that cannot acquire data in real time and/or have limited CPU power. The eSCIFI design also includes two clustering algorithms, named cSCIFI and cSCIFI+, that help to guarantee the network’s coverage. eSCIFI uses those network clusters and machine learning predictions as input features to an energy state decision algorithm that then decides which Access Points (APs) can be switched off during the day. To evaluate eSCIFI performance, we conducted several trace-driven simulations comparing the eSCIFI mechanism using both clustering algorithms with other energy saving mechanisms found in the literature using the UFF SCIFI network traces. The results showed that the eSCIFI mechanism using the cSCIFI+ clustering algorithm achieves the best performance and that it can save up to 64.32% of the UFF SCIFI network energy without affecting the user coverage.

2022 ◽  
Vol 5 (1) ◽  
pp. 12
Sakib Shahriar ◽  
A. R. Al-Ali

The COVID-19 pandemic has infected millions and led to a catastrophic loss of lives globally. It has also significantly disrupted the movement of people, businesses, and industries. Additionally, electric vehicle (EV) users have faced challenges in charging their vehicles in public charging locations where there is a risk of COVID-19 exposure. However, a case study of EV charging behavior and its impacts during the SARS-CoV-2 pandemic is not addressed in the existing literature. This paper investigates the impacts of COVID-19 on EV charging behavior by analyzing the charging activity during the pandemic using a dataset from a public charging facility in the USA. Data visualization of charging behavior alongside significant timelines of the pandemic was utilized for analysis. Moreover, a cluster analysis using k-means, hierarchical clustering, and Gaussian mixture models was performed to identify common groups of charging behavior based on the vehicle arrival and departure times. Although the number of vehicles using the charging station was reduced significantly due to lockdown restrictions, the charging activity has picked up again since May 2021 due to an increase in vaccination and easing of public restrictions. However, the charging activity currently remains at around half of its pre-pandemic level. A noticeable decline in charging session length and an increase in energy consumption can be observed as well. Clustering algorithms identified three groups of charging behavior during the pandemic, and their analysis and performance comparison using internal validation measures were also presented.
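Grouping sessions by arrival and departure time can be sketched with plain k-means (Lloyd's algorithm), one of the three methods named above. This is a minimal illustration only: the session times, the function name and the starting centres are invented, and the three toy groups are not the groups reported by the study.

```python
import math

def kmeans(points, init_centers, n_iter=20):
    """Lloyd's algorithm: alternate nearest-centre assignment and mean update."""
    centers = [tuple(c) for c in init_centers]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centre.
        labels = [
            min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            for p in points
        ]
        # Update step: each centre moves to the mean of its members.
        new_centers = []
        for i, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(old)  # leave the centre of an empty cluster in place
        if new_centers == centers:
            break  # converged: no centre moved
        centers = new_centers
    return centers, labels

# Hypothetical sessions as (arrival hour, departure hour) on a 24-hour clock.
sessions = [(8.0, 17.0), (8.5, 17.5), (9.0, 18.0),     # long daytime stays
            (12.0, 13.0), (12.5, 13.5), (11.5, 12.5),  # short midday stops
            (18.0, 22.0), (19.0, 23.0), (18.5, 22.5)]  # evening stays
centers, labels = kmeans(sessions, init_centers=[sessions[0], sessions[3], sessions[6]])
print(centers)
print(labels)
```

Plain Euclidean distance on clock hours ignores the wrap-around at midnight, so real session data that spans midnight would need a different encoding of time.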

Andri M Kristijansson ◽  
Tyr Aegisson

In order to generate precise behavioural patterns or user segmentation, organisations often struggle with pulling information from data and choosing suitable Machine Learning (ML) techniques. Furthermore, many marketing teams are unfamiliar with data-driven classification methods. The goal of this research is to provide a framework that outlines the Unsupervised Machine Learning (UML) methods for User-Profiling (UP) based on essential data attributes. A thorough literature study was undertaken on the most popular UML techniques and their dataset attribute requirements. For UP, a structure is developed that outlines several UML techniques. In terms of data size and dimensions, it offers two-stage clustering algorithms for categorical, quantitative, and mixed types of datasets. The clusters are determined in the first step using a multilevel or model-based classification method. Cluster refining is done in the second step using a non-hierarchical clustering technique. Academics and professionals may use the framework to figure out which UML techniques are best for creating strong profiles or data-driven user segmentation.

2022 ◽  
pp. 103-116
Ravishanker ◽  
Monica Sood ◽  
Prikshat Angra ◽  
Sahil Verma ◽  
Kavita ◽  

2022 ◽  
Vol 2 ◽  
Ivo V. Stuldreher ◽  
Alexandre Merasli ◽  
Nattapong Thammasan ◽  
Jan B. F. van Erp ◽  
Anne-Marie Brouwer

Research on brain signals as indicators of a certain attentional state is moving from laboratory environments to everyday settings. Uncovering the attentional focus of individuals in such settings is challenging because there is usually limited information about real-world events, as well as a lack of data from the real-world context at hand that is correctly labeled with respect to individuals' attentional state. In most approaches, such data is needed to train attention monitoring models. We here investigate whether unsupervised clustering can be combined with physiological synchrony in the electroencephalogram (EEG), electrodermal activity (EDA), and heart rate to automatically identify groups of individuals sharing attentional focus without using knowledge of the sensory stimuli or attentional focus of any of the individuals. We used data from an experiment in which 26 participants listened to an audiobook interspersed with emotional sounds and beeps. Thirteen participants were instructed to focus on the narrative of the audiobook and 13 participants were instructed to focus on the interspersed emotional sounds and beeps. We used a broad range of commonly applied dimensionality reduction ordination techniques—further referred to as mappings—in combination with unsupervised clustering algorithms to identify the two groups of individuals sharing attentional focus based on physiological synchrony. Analyses were performed using the three modalities EEG, EDA, and heart rate separately, and using all possible combinations of these modalities. The best unimodal results were obtained when applying clustering algorithms on physiological synchrony data in EEG, yielding a maximum clustering accuracy of 85%. Even though the use of EDA or heart rate by itself did not lead to accuracies significantly higher than chance level, combining EEG with these measures in a multimodal approach generally resulted in higher classification accuracies than when using only EEG. 
Additionally, classification results of multimodal data were found to be more consistent across algorithms than unimodal data, making algorithm choice less important. Our finding that unsupervised classification into attentional groups is possible is important to support studies on attentional engagement in everyday settings.
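Because cluster labels are arbitrary, scoring a clustering against known groups requires matching clusters to groups first. The sketch below shows one common way to do this, by trying every one-to-one mapping and keeping the best agreement. It is an assumption that this matches how the study computed its accuracy, and the function name, group names and labels are invented for illustration.

```python
from itertools import permutations

def clustering_accuracy(true_groups, cluster_labels):
    """Best agreement over all one-to-one mappings from clusters to groups.

    Assumes there are no more clusters than true groups, and few enough
    groups that trying every mapping is cheap.
    """
    group_ids = sorted(set(true_groups))
    cluster_ids = sorted(set(cluster_labels))
    best_hits = 0
    for perm in permutations(group_ids, len(cluster_ids)):
        mapping = dict(zip(cluster_ids, perm))
        hits = sum(mapping[c] == t for t, c in zip(true_groups, cluster_labels))
        best_hits = max(best_hits, hits)
    return best_hits / len(true_groups)

# Hypothetical example: eight participants, two instructed attentional groups.
truth = ["narrative"] * 4 + ["sounds"] * 4
found = [1, 1, 1, 0, 0, 0, 0, 1]
print(clustering_accuracy(truth, found))
```

With two equally sized groups, chance level under this measure sits above 50%, since the better of the two possible mappings is always chosen.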
