Functional Clustering Based on Weighted Partitioning around Medoid Algorithm with Estimation of Number of Clusters

A basic problem of cluster analysis is the determination or selection of the number of clusters evinced in any set of data. We address this issue for multinomial data using Akaike’s information criterion and demonstrate its utility in identifying an appropriate number of clusters of tumor types with similar profiles of cell surface antigens.
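As a rough illustration of how Akaike's information criterion can compare candidate numbers of clusters on multinomial count data, the sketch below scores a few hard partitions with AIC = 2p - 2 ln L. The function names, the toy counts, the candidate partitions, and the simplification of hard assignments with pooled category proportions are all assumptions made for illustration; the abstract does not describe the authors' actual procedure.

```python
import math

def multinomial_loglik(rows, labels, n_clusters):
    """Log-likelihood of count rows under one multinomial per cluster.

    Category probabilities are the pooled proportions within each cluster.
    The multinomial coefficient is omitted because it is the same for every
    candidate partition and so does not affect the comparison.
    """
    n_cat = len(rows[0])
    totals = [[0] * n_cat for _ in range(n_clusters)]
    for row, lab in zip(rows, labels):
        for j, count in enumerate(row):
            totals[lab][j] += count
    loglik = 0.0
    for cluster_totals in totals:
        size = sum(cluster_totals)
        for count in cluster_totals:
            if count > 0:  # 0 * log(0) is taken as 0
                loglik += count * math.log(count / size)
    return loglik

def aic(loglik, n_params):
    """Akaike's information criterion: lower is better."""
    return 2 * n_params - 2 * loglik

def score_partition(rows, labels):
    n_clusters = max(labels) + 1
    n_cat = len(rows[0])
    # Assumption: each cluster contributes (n_cat - 1) free probabilities.
    n_params = n_clusters * (n_cat - 1)
    return aic(multinomial_loglik(rows, labels, n_clusters), n_params)

# Hypothetical antigen-positive / antigen-negative counts for four tumor types.
rows = [[90, 10], [88, 12], [10, 90], [12, 88]]
# Hypothetical candidate partitions with 1, 2, and 4 clusters.
candidates = {1: [0, 0, 0, 0], 2: [0, 0, 1, 1], 4: [0, 1, 2, 3]}
scores = {k: score_partition(rows, labels) for k, labels in candidates.items()}
best_k = min(scores, key=scores.get)
print(best_k, scores)
```

In this toy setup, splitting further than two clusters raises the likelihood only slightly, so the penalty term dominates and the smaller model is preferred.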

HEXACO personality factors and their associations with Facebook use and Facebook network characteristics

10.31234/osf.io/3zvhq ◽

2018 ◽

Author(s):

Riana Brown ◽

Sam G. B. Roberts ◽

Thomas V. Pollet

Keyword(s):

Social Networks ◽

Network Size ◽

Openness To Experience ◽

Personality Factors ◽

Number Of Clusters ◽

Network Characteristics ◽

Online Networks ◽

Facebook Use ◽

Objectively Measured ◽

Size And Structure

Personality factors affect the properties of ‘offline’ social networks, but how they are associated with the structural properties of online networks is still unclear. We investigated how the six HEXACO personality factors (Honesty-Humility, Emotionality, Extraversion, Agreeableness, Conscientiousness and Openness to Experience) relate to Facebook use and three objectively measured Facebook network characteristics - network size, density, and number of clusters. Participants (n = 107, mean age = 20.6, 66% female) extracted their Facebook networks using the GetNet app, completed the 60-item HEXACO questionnaire and the Facebook Usage Questionnaire. Users high in Openness to Experience spent less time on Facebook. Extraversion was positively associated with network size and the number of network clusters (but not after controlling for size). These findings suggest that personality factors are associated with Facebook use and the size and structure of Facebook networks, and that personality is an important influence on both online and offline sociality.
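The three network characteristics can be made concrete with a small sketch. It computes size and density for an undirected friendship graph and, as a crude stand-in for "number of clusters", counts connected components. That stand-in is an assumption: the abstract does not say how clusters were detected, and the function name and toy edge list are hypothetical.

```python
def network_stats(n_nodes, edges):
    """Return (size, density, n_components) for an undirected simple graph.

    Nodes are labeled 0..n_nodes-1; edges are pairs of distinct nodes.
    """
    adjacency = {node: set() for node in range(n_nodes)}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    n_edges = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    possible = n_nodes * (n_nodes - 1) / 2
    # Density: observed ties divided by the number of possible ties.
    density = n_edges / possible if possible else 0.0

    # Assumption: connected components stand in for network clusters.
    seen = set()
    components = 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nbr in adjacency[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
    return n_nodes, density, components

# Hypothetical network: a triangle of friends, a separate pair, one isolate.
edges = [(0, 1), (1, 2), (0, 2), (3, 4)]
size, density, clusters = network_stats(6, edges)
print(size, density, clusters)
```

Because larger networks tend to contain more groups, a count like this is naturally correlated with size, which is consistent with the abstract's note that the extraversion effect on clusters did not survive controlling for size.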

Method for determining optimal number of clusters in K-means clustering algorithm

Journal of Computer Applications ◽

10.3724/sp.j.1087.2010.01995 ◽

2010 ◽

Vol 30 (8) ◽

pp. 1995-1998 ◽

Cited By ~ 18

Author(s):

Shi-bing ZHOU ◽

Zhen-yuan XU ◽

Xu-qing TANG

Keyword(s):

Clustering Algorithm ◽

Optimal Number ◽

Number Of Clusters ◽

Optimal Number Of Clusters

Clustering Count-based RNA Methylation Data Using a Nonparametric Generative Model

Current Bioinformatics ◽

10.2174/1574893613666180601080008 ◽

2018 ◽

Vol 14 (1) ◽

pp. 11-23 ◽

Cited By ~ 3

Author(s):

Lin Zhang ◽

Yanling He ◽

Huaizhi Wang ◽

Hui Liu ◽

Yufei Huang ◽

...

Keyword(s):

Clustering Analysis ◽

Methylation Level ◽

Optimal Number ◽

Generative Model ◽

Methylation Data ◽

Sequencing Data ◽

Number Of Clusters ◽

Rna Methylation ◽

Clustering Effect ◽

Optimal Number Of Clusters

Background: The RNA methylome has been identified as an important layer of gene regulation and can be profiled directly with count-based measurements from high-throughput sequencing data. Although the detailed regulatory circuit of the epitranscriptome remains uncharted, a clustering effect in methylation status among different RNA methylation sites can be identified from transcriptome-wide RNA methylation profiles and may reflect epitranscriptomic regulation. Count-based RNA methylation sequencing data has unique features, such as low read coverage, which call for novel clustering approaches. Objective: Besides handling the low read coverage, clustering analysis of count-based RNA methylation sequencing data must also preserve the integer nature of the measurements. Method: We proposed a nonparametric generative model together with its Gibbs sampling solution for clustering analysis. The proposed approach implements a beta-binomial mixture model to capture the clustering effect in methylation level with the original count-based measurements rather than an estimated continuous methylation level. In addition, it adopts a nonparametric Dirichlet process to automatically determine an optimal number of clusters so as to avoid the common model selection problem in clustering analysis. Results: When tested on the simulated system, the method demonstrated improved clustering performance over hierarchical clustering, K-means, MClust, NMF and EMclust. On a real dataset, it also revealed two novel RNA N6-methyladenosine (m6A) co-methylation patterns that may be induced directly by METTL14 and WTAP, two known regulatory components of the RNA m6A methyltransferase complex. Conclusion: Our proposed DPBBM method not only properly handles the count-based measurements of RNA methylation data from sites of very low read coverage, but also learns an optimal number of clusters adaptively from the data analyzed.
Availability: The source code and documents of DPBBM R package are freely available through the Comprehensive R Archive Network (CRAN): https://cran.r-project.org/web/packages/DPBBM/.
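To show why a beta-binomial component suits raw counts, here is a minimal sketch of the beta-binomial log-probability and of the posterior component probabilities for one site in a two-component mixture. It does not use the DPBBM package and does not implement the Dirichlet process or the Gibbs sampler; the function names, component parameters, and weights are hypothetical.

```python
import math

def log_beta(a, b):
    """Log of the Beta function, via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def betabinom_logpmf(k, n, alpha, beta):
    """Log P(k successes out of n) under a beta-binomial(alpha, beta)."""
    log_choose = (math.lgamma(n + 1) - math.lgamma(k + 1)
                  - math.lgamma(n - k + 1))
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)

def responsibilities(k, n, components, weights):
    """Posterior probability of each mixture component for one site."""
    log_terms = [math.log(w) + betabinom_logpmf(k, n, a, b)
                 for w, (a, b) in zip(weights, components)]
    top = max(log_terms)  # subtract the max for numerical stability
    unnorm = [math.exp(t - top) for t in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical components: low-methylation and high-methylation clusters.
components = [(2.0, 8.0), (8.0, 2.0)]
weights = [0.5, 0.5]
# A low-coverage site: 4 of only 5 reads support methylation.
post = responsibilities(k=4, n=5, components=components, weights=weights)
print(post)
```

The counts enter the likelihood directly, so a site with 4 of 5 reads is treated as weaker evidence than one with 400 of 500, which a continuous methylation ratio of 0.8 alone would not convey.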

Population density, activity centres, and pandemic: Visualizing clusters of COVID-19 cases in Hong Kong

Environment and Planning A Economy and Space ◽

10.1177/0308518x211012700 ◽

2021 ◽

pp. 0308518X2110127

Author(s):

Jiangping Zhou ◽

Sam KS Ho ◽

Shuyu Lei ◽

Valarie CK Pang

Keyword(s):

Hong Kong ◽

Population Density ◽

Spatial Patterns ◽

Number Of Clusters ◽

The Third ◽

General Number

The impacts of coronavirus disease 2019 (COVID-19) on society and economy are wide-ranging, long-lasting, and global. The experience of multiple countries or regions in fighting the pandemic indicates that there could be multiple COVID-19 surges, where a growing number of cases can be observed in the more recent surge(s). Were COVID-19 cases and clusters of cases (across surges) randomly distributed in spaces? Did population density and activity centres influence clusters of cases and associated venues? Based on information on the associated venues of the four surges of COVID-19 cases between January 2020 and February 2021 as well as population density, visuals were made to distinguish the relationships between population density, activity centres, and clusters of cases in Hong Kong. Different spatial patterns were observed across the four surges: fewer cases were observed in the first surge with a more evenly distributed pattern of clusters; the second surge as compared to the first surge saw a wider distribution and an increase in the number/layer of clusters; compared to the second surge, the third surge suffered from many more cases but saw a decrease in the general number of clusters; and compared to the previous three surges, the fourth surge had the largest number of cases, yet even fewer clusters were observed, where several clusters are again concentrated in specific areas similar to the previous surge. Across the four surges, a few locales could see recurrent clusters of cases and a few communities were without cases.

Optimal Coordination of Over-Current Relays in Microgrids Using Unsupervised Learning Techniques

Applied Sciences ◽

10.3390/app11031241 ◽

2021 ◽

Vol 11 (3) ◽

pp. 1241

Author(s):

Sergio D. Saldarriaga-Zuluaga ◽

Jesús M. López-Lezama ◽

Nicolás Muñoz-Galeano

Keyword(s):

Unsupervised Learning ◽

Distributed Generation ◽

Network Topology ◽

International Electrotechnical Commission ◽

Number Of Clusters ◽

Learning Techniques ◽

Topology Changes ◽

Network Topologies ◽

Optimal Coordination ◽

Operational Modes

Microgrids constitute complex systems that integrate distributed generation (DG) and feature different operational modes. The optimal coordination of directional over-current relays (DOCRs) in microgrids is a challenging task, especially if topology changes are taken into account. This paper proposes an adaptive protection approach that takes advantage of multiple setting groups that are available in commercial DOCRs to account for network topology changes in microgrids. Because the number of possible topologies is greater than the available setting groups, unsupervised learning techniques are explored to classify network topologies into a number of clusters that is equal to the number of setting groups. Subsequently, optimal settings are calculated for every topology cluster. Every setting is saved in the DOCRs as a different setting group that would be activated when a corresponding topology takes place. Several tests are performed on a benchmark IEC (International Electrotechnical Commission) microgrid, evidencing the applicability of the proposed approach.
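The idea of fixing the number of clusters to the number of relay setting groups can be sketched with a plain k-means. The abstract only says "unsupervised learning techniques", so k-means is a stand-in chosen for illustration, and the feature vectors, the value of `n_setting_groups`, and the function names are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Pick the point farthest from its nearest already-chosen center.
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(farthest)
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(v) / len(members)
                                         for v in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Hypothetical features per topology (e.g. fault-current levels seen by two relays).
topologies = [(1.0, 1.1), (1.2, 0.9), (5.0, 5.2), (5.1, 4.9), (9.0, 1.0), (9.2, 1.1)]
n_setting_groups = 3  # assumption: the relays offer three setting groups
labels, centers = kmeans(topologies, n_setting_groups)
print(labels)
```

Each resulting label would then index one setting group, with the relay settings optimized once per cluster rather than once per topology.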

A novel bidirectional clustering algorithm based on local density

Scientific Reports ◽

10.1038/s41598-021-93244-2 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Baicheng Lyu ◽

Wenhua Wu ◽

Zhiqiang Hu

Keyword(s):

Clustering Algorithm ◽

Local Density ◽

Clustering Algorithms ◽

Cluster Number ◽

Denoising Method ◽

Number Of Clusters ◽

Data Points ◽

Cutoff Distance ◽

Large Clusters ◽

Small Clusters

With the wide application of cluster analysis, the number of clusters is gradually increasing, as is the difficulty of selecting indicators for judging the number of clusters. Also, small clusters are crucial to discovering the extreme characteristics of data samples, but current clustering algorithms focus mainly on analyzing large clusters. In this paper, a bidirectional clustering algorithm based on local density (BCALoD) is proposed. BCALoD establishes the connection between data points based on local density, can automatically determine the number of clusters, is more sensitive to small clusters, and keeps the number of parameters to be tuned to a minimum. On the basis of the robustness of the cluster number to noise, a denoising method suitable for BCALoD is proposed. A different cutoff distance and cutoff density are assigned to each data cluster, which improves clustering performance. The clustering ability of BCALoD is verified on randomly generated datasets and city light satellite images.
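A generic density-linking sketch can illustrate how local density and a cutoff distance let the number of clusters emerge from the data. This is not BCALoD, whose bidirectional linking and denoising steps are not detailed in the abstract; the single global cutoff, the tie-breaking rule, and all names below are assumptions.

```python
import math

def local_density(points, cutoff):
    """Number of other points strictly within the cutoff distance."""
    return [sum(1 for j, q in enumerate(points)
                if j != i and math.dist(p, q) < cutoff)
            for i, p in enumerate(points)]

def density_link_clusters(points, cutoff):
    """Link each point to its nearest denser neighbor within the cutoff.

    Points with no denser neighbor in range become cluster roots, so the
    number of clusters is not specified in advance.
    """
    rho = local_density(points, cutoff)
    parent = list(range(len(points)))
    for i, p in enumerate(points):
        best, best_d = i, cutoff
        for j, q in enumerate(points):
            # Assumption: density ties are broken in favor of the lower index.
            denser = (rho[j], -j) > (rho[i], -i)
            d = math.dist(p, q)
            if denser and d < best_d:
                best, best_d = j, d
        parent[i] = best

    def root(i):
        while parent[i] != i:
            i = parent[i]
        return i

    return [root(i) for i in range(len(points))]

# Hypothetical data: one large group, one small group, one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (30, 30)]
labels = density_link_clusters(points, cutoff=2.0)
n_clusters = len(set(labels))
print(labels, n_clusters)
```

With one global cutoff the small group and the isolated point are still separated here, but in less tidy data a per-cluster cutoff, as the abstract describes, would matter more.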

An Efficient Approach to Determine Number of Clusters Using Principal Component Analysis

2018 International Conference on Current Trends towards Converging Technologies (ICCTCT) ◽

10.1109/icctct.2018.8551182 ◽

2018 ◽

Cited By ~ 1

Author(s):

V. Divya ◽

K. Nirmala Devi

Keyword(s):

Principal Component Analysis ◽

Principal Component ◽

Component Analysis ◽

Number Of Clusters ◽

Efficient Approach

A Novel K-Harmonic Means Clustering Based on Multiple Initial Centers

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.321-324.1947 ◽

2013 ◽

Vol 321-324 ◽

pp. 1947-1950

Author(s):

Lei Gu ◽

Xian Ling Lu

Keyword(s):

New Method ◽

Final Solution ◽

Data Set ◽

Number Of Clusters ◽

Multiple Groups ◽

Harmonic Means

In the initialization of traditional k-harmonic means clustering, the initial centers are generated randomly and their number equals the number of clusters. Although k-harmonic means clustering is insensitive to the initial centers, this initialization method cannot improve clustering performance. In this paper, a novel k-harmonic means clustering based on multiple initial centers is proposed. In this new method, the number of initial centers is greater than the number of clusters. The new method with multiple initial centers can divide the whole data set into multiple groups and combine these groups into the final solution. Experiments show that the presented algorithm achieves better clustering accuracy than the traditional k-means and k-harmonic means methods.
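For reference, the standard k-harmonic means objective sums, over all points, the harmonic average of the distances (raised to a power p) to every center. The sketch below evaluates only that objective; it does not reproduce the paper's multiple-initial-center scheme or its group-merging step, which the abstract does not specify. The function name, the power p = 2, and the toy data are assumptions.

```python
import math

def khm_objective(points, centers, p=2.0, eps=1e-12):
    """K-harmonic means objective: sum over points of k / sum_j d(x, c_j)^-p.

    eps guards against division by zero when a point coincides with a center.
    """
    k = len(centers)
    total = 0.0
    for x in points:
        inv_sum = sum(1.0 / max(math.dist(x, c) ** p, eps) for c in centers)
        total += k / inv_sum
    return total

# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_centers = [(0.33, 0.33), (10.33, 10.33)]  # near the group means
poor_centers = [(5.0, 5.0), (6.0, 6.0)]        # both between the groups
good = khm_objective(points, good_centers)
poor = khm_objective(points, poor_centers)
print(good, poor)
```

Because the harmonic average is dominated by the nearest center, every center influences every point smoothly, which is the usual explanation for the method's low sensitivity to initialization.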

Finding Number of Clusters in a Gene Co-expression Network Using Independent Sets

2013 International Conference on Social Computing ◽

10.1109/socialcom.2013.125 ◽

2013 ◽

Author(s):

Harun Pirim

Keyword(s):

Independent Sets ◽

Number Of Clusters
