Functional Clustering Based on Weighted Partitioning around Medoid Algorithm with Estimation of Number of Clusters

Author(s):  
Jianan Zhang
1990 ◽  
Vol 29 (03) ◽  
pp. 200-204 ◽  
Author(s):  
J. A. Koziol

A basic problem of cluster analysis is the determination or selection of the number of clusters evinced in any set of data. We address this issue with multinomial data using Akaike’s information criterion and demonstrate its utility in identifying an appropriate number of clusters of tumor types with similar profiles of cell surface antigens.
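
The idea of choosing a number of clusters with Akaike's information criterion (AIC = 2k - 2 ln L) can be illustrated with a minimal sketch. It is not the procedure from the paper: the candidate partitions, the count profiles, and the function names below are all hypothetical, and each cluster is assumed to share a single multinomial probability vector fitted by maximum likelihood. The multinomial coefficient is dropped because it is the same for every partition of the same data.

```python
import math

def multinomial_loglik(rows):
    """Log-likelihood of count rows under one shared probability vector
    fitted by maximum likelihood (multinomial coefficient omitted)."""
    totals = [sum(col) for col in zip(*rows)]
    n = sum(totals)
    return sum(t * math.log(t / n) for t in totals if t > 0)

def aic_for_partition(partition):
    """AIC of a hard partition: a list of clusters, each a list of count rows."""
    n_categories = len(partition[0][0])
    loglik = sum(multinomial_loglik(cluster) for cluster in partition)
    n_params = len(partition) * (n_categories - 1)  # free probabilities per cluster
    return 2 * n_params - 2 * loglik

# Hypothetical count profiles (two categories) for four items.
a, b, c, d = [90, 10], [85, 15], [10, 90], [15, 85]
candidates = {
    1: [[a, b, c, d]],
    2: [[a, b], [c, d]],
    4: [[a], [b], [c], [d]],
}
best_k = min(candidates, key=lambda k: aic_for_partition(candidates[k]))
print(best_k)
```

The penalty term 2k is what stops the criterion from always preferring one cluster per item: splitting further must improve the log-likelihood by more than the added parameters cost.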


2018 ◽  
Author(s):  
Riana Brown ◽  
Sam G. B. Roberts ◽  
Thomas V. Pollet

Personality factors affect the properties of ‘offline’ social networks, but how they are associated with the structural properties of online networks is still unclear. We investigated how the six HEXACO personality factors (Honesty-Humility, Emotionality, Extraversion, Agreeableness, Conscientiousness and Openness to Experience) relate to Facebook use and three objectively measured Facebook network characteristics - network size, density, and number of clusters. Participants (n = 107, mean age = 20.6, 66% female) extracted their Facebook networks using the GetNet app, completed the 60-item HEXACO questionnaire and the Facebook Usage Questionnaire. Users high in Openness to Experience spent less time on Facebook. Extraversion was positively associated with network size and the number of network clusters (but not after controlling for size). These findings suggest that personality factors are associated with Facebook use and the size and structure of Facebook networks, and that personality is an important influence on both online and offline sociality.


2018 ◽  
Vol 14 (1) ◽  
pp. 11-23 ◽  
Author(s):  
Lin Zhang ◽  
Yanling He ◽  
Huaizhi Wang ◽  
Hui Liu ◽  
Yufei Huang ◽  
...  

Background: The RNA methylome has been discovered as an important layer of gene regulation and can be profiled directly with count-based measurements from high-throughput sequencing data. Although the detailed regulatory circuit of the epitranscriptome remains uncharted, a clustering effect in methylation status among different RNA methylation sites can be identified from transcriptome-wide RNA methylation profiles and may reflect epitranscriptomic regulation. Count-based RNA methylation sequencing data has unique features, such as low read coverage, which call for novel clustering approaches.
Objective: Besides handling low read coverage, clustering analysis of count-based RNA methylation sequencing data must also preserve the integer nature of the measurements.
Method: We proposed a nonparametric generative model together with its Gibbs sampling solution for clustering analysis. The proposed approach implements a beta-binomial mixture model to capture the clustering effect in methylation level with the original count-based measurements rather than an estimated continuous methylation level. In addition, it adopts a nonparametric Dirichlet process to automatically determine an optimal number of clusters, avoiding the common model selection problem in clustering analysis.
Results: When tested on the simulated system, the method demonstrated improved clustering performance over hierarchical clustering, K-means, MClust, NMF and EMclust. On a real dataset, it also revealed two novel RNA N6-methyladenosine (m6A) co-methylation patterns that may be induced directly by METTL14 and WTAP, two known regulatory components of the RNA m6A methyltransferase complex.
Conclusion: Our proposed DPBBM method not only properly handles the count-based measurements of RNA methylation data from sites of very low read coverage, but also learns an optimal number of clusters adaptively from the data analyzed.
Availability: The source code and documents of the DPBBM R package are freely available through the Comprehensive R Archive Network (CRAN): https://cran.r-project.org/web/packages/DPBBM/.
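
To make the beta-binomial component concrete, here is a minimal sketch of the beta-binomial log-probability for a site with k methylated reads out of n. It is not the DPBBM implementation (which is an R package) and it omits the Dirichlet process and Gibbs sampling entirely; the function names and the two example parameter pairs are assumptions chosen for illustration.

```python
import math

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_logpmf(k, n, alpha, beta):
    """Log-probability of k successes in n trials when the success
    probability is itself Beta(alpha, beta) distributed."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)

# Hypothetical clusters: one with low methylation, one with high methylation.
low_cluster = (2.0, 8.0)
high_cluster = (8.0, 2.0)

# A site with only 3 reads, none methylated: the integer counts are used
# directly, without first estimating a continuous methylation level.
k, n = 0, 3
score_low = beta_binomial_logpmf(k, n, *low_cluster)
score_high = beta_binomial_logpmf(k, n, *high_cluster)
print(score_low > score_high)
```

Working with counts keeps the uncertainty of low-coverage sites in the model: 0 of 3 reads is much weaker evidence than 0 of 300, even though both give an estimated methylation level of zero.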


2021 ◽  
pp. 0308518X2110127
Author(s):  
Jiangping Zhou ◽  
Sam KS Ho ◽  
Shuyu Lei ◽  
Valarie CK Pang

The impacts of coronavirus disease 2019 (COVID-19) on society and the economy are wide-ranging, long-lasting, and global. The experience of multiple countries or regions in fighting the pandemic indicates that there could be multiple COVID-19 surges, where a growing number of cases can be observed in the more recent surge(s). Were COVID-19 cases and clusters of cases (across surges) randomly distributed in space? Did population density and activity centres influence clusters of cases and associated venues? Based on information on the associated venues of the four surges of COVID-19 cases between January 2020 and February 2021 as well as population density, visuals were made to distinguish the relationships between population density, activity centres, and clusters of cases in Hong Kong. Different spatial patterns were observed across the four surges: fewer cases were observed in the first surge with a more evenly distributed pattern of clusters; the second surge as compared to the first surge saw a wider distribution and an increase in the number/layer of clusters; compared to the second surge, the third surge suffered from many more cases but saw a decrease in the general number of clusters; and compared to the previous three surges, the fourth surge had the largest number of cases, yet even fewer clusters were observed, where several clusters are again concentrated in specific areas similar to the previous surge. Across the four surges, a few locales could see recurrent clusters of cases and a few communities were without cases.


2021 ◽  
Vol 11 (3) ◽  
pp. 1241
Author(s):  
Sergio D. Saldarriaga-Zuluaga ◽  
Jesús M. López-Lezama ◽  
Nicolás Muñoz-Galeano

Microgrids constitute complex systems that integrate distributed generation (DG) and feature different operational modes. The optimal coordination of directional over-current relays (DOCRs) in microgrids is a challenging task, especially if topology changes are taken into account. This paper proposes an adaptive protection approach that takes advantage of multiple setting groups that are available in commercial DOCRs to account for network topology changes in microgrids. Because the number of possible topologies is greater than the available setting groups, unsupervised learning techniques are explored to classify network topologies into a number of clusters that is equal to the number of setting groups. Subsequently, optimal settings are calculated for every topology cluster. Every setting is saved in the DOCRs as a different setting group that would be activated when a corresponding topology takes place. Several tests are performed on a benchmark IEC (International Electrotechnical Commission) microgrid, evidencing the applicability of the proposed approach.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Baicheng Lyu ◽  
Wenhua Wu ◽  
Zhiqiang Hu

With the wide application of cluster analysis, the number of clusters is gradually increasing, as is the difficulty in selecting the judgment indicators of cluster numbers. Also, small clusters are crucial to discovering the extreme characteristics of data samples, but current clustering algorithms focus mainly on analyzing large clusters. In this paper, a bidirectional clustering algorithm based on local density (BCALoD) is proposed. BCALoD establishes the connection between data points based on local density, can automatically determine the number of clusters, is more sensitive to small clusters, and keeps the number of parameters to be tuned to a minimum. On the basis of the robustness of cluster number to noise, a denoising method suitable for BCALoD is proposed. A different cutoff distance and cutoff density are assigned to each data cluster, which results in improved clustering performance. The clustering ability of BCALoD is verified on randomly generated datasets and city light satellite images.
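
The abstract does not define how BCALoD computes local density, so the following is only a generic sketch of one common definition: the number of other points within a cutoff distance. The function name, the sample points, and the cutoff value are assumptions, not details from the paper.

```python
import math

def local_density(points, cutoff):
    """For each point, count the other points within `cutoff`
    (Euclidean distance). A generic definition, not BCALoD's own."""
    densities = []
    for i, p in enumerate(points):
        count = sum(
            1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= cutoff
        )
        densities.append(count)
    return densities

# Hypothetical data: a tight group of three points and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
print(local_density(points, cutoff=1.5))
```

With a single global cutoff, the isolated point gets density zero and would be treated as noise, which suggests why assigning a separate cutoff to each cluster can matter for small clusters.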


2013 ◽  
Vol 321-324 ◽  
pp. 1947-1950
Author(s):  
Lei Gu ◽  
Xian Ling Lu

In the initialization of the traditional k-harmonic means clustering, the initial centers are generated randomly and their number is equal to the number of clusters. Although the k-harmonic means clustering is insensitive to the initial centers, this initialization method cannot improve clustering performance. In this paper, a novel k-harmonic means clustering based on multiple initial centers is proposed. The number of the initial centers is more than the number of clusters in this new method. The new method with multiple initial centers can divide the whole data set into multiple groups and combine these groups into the final solution. Experiments show that the presented algorithm achieves better clustering accuracy than the traditional k-means and k-harmonic means methods.
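
For context, the objective that traditional k-harmonic means minimizes is the sum, over data points, of the harmonic average of the distances (raised to a power p) from the point to every center. The sketch below only evaluates that objective on hypothetical one-dimensional data; it does not implement the center updates or the multiple-initial-centers method proposed in the paper, and the function name, the `eps` clamp, and the data are assumptions.

```python
def khm_objective(points, centers, p=2.0, eps=1e-12):
    """K-harmonic means objective for 1-D data: for each point, the
    harmonic average of its p-th power distances to all centers."""
    k = len(centers)
    total = 0.0
    for x in points:
        # Clamp distances so a point sitting on a center does not divide by zero.
        inv_sum = sum(1.0 / max(abs(x - c), eps) ** p for c in centers)
        total += k / inv_sum
    return total

# Hypothetical data: two well-separated groups.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
good = khm_objective(points, centers=[1.0, 11.0])
poor = khm_objective(points, centers=[5.0, 7.0])
print(good < poor)
```

Because the harmonic average is dominated by the nearest center, every center contributes to every point's term, which is the usual explanation for the method's lower sensitivity to initialization compared with k-means.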

