Scalable Clustering Algorithm for N-Body Simulations in a Shared-Nothing Cluster

Author(s):  
YongChul Kwon ◽  
Dylan Nunley ◽  
Jeffrey P. Gardner ◽  
Magdalena Balazinska ◽  
Bill Howe ◽  
...  

2021 ◽  
Author(s):  
James Anibal ◽  
Alexandre Day ◽  
Erol Bahadiroglu ◽  
Liam O'Neill ◽  
Long Phan ◽  
...  

Data clustering plays a significant role in biomedical sciences, particularly in single-cell data analysis. Researchers use clustering algorithms to group individual cells into populations that can be evaluated across different levels of disease progression, drug response, and other clinical statuses. In many cases, multiple sets of clusters must be generated to assess varying levels of cluster specificity. For example, there are many subtypes of leukocytes (e.g., T cells), whose individual preponderance and phenotype must be assessed for statistical/functional significance. In this report, we introduce a novel hierarchical density clustering algorithm (HAL-x) that uses supervised linkage methods to build a cluster hierarchy on raw single-cell data. With this new approach, HAL-x can quickly predict multiple sets of labels for very large datasets, with considerably better computational efficiency than existing methods. We also show that cell clusters generated by HAL-x yield near-perfect F1-scores when classifying different clinical statuses based on single-cell profiles. Our hierarchical density clustering algorithm achieves high accuracy in single-cell classification in a scalable, tunable, and rapid manner. We make HAL-x publicly available at: https://pypi.org/project/hal-x/


2017 ◽  
Vol 15 (06) ◽  
pp. 1740006 ◽  
Author(s):  
Mohammad Arifur Rahman ◽  
Nathan LaPierre ◽  
Huzefa Rangwala ◽  
Daniel Barbara

Metagenomics is the collective sequencing of co-existing microbial communities, which are ubiquitous across various clinical and ecological environments. Because community sequencing produces a large volume of random short sequences (reads), analyzing the diversity, abundance, and functions of the different organisms within these communities is challenging. We present a fast and scalable clustering algorithm for analyzing large-scale metagenome sequence data. Our approach achieves efficiency by partitioning the large number of sequence reads into groups (called canopies) using hashing. These canopies are then refined using state-of-the-art sequence clustering algorithms. This canopy-clustering (CC) algorithm can be used as a pre-processing phase for computationally expensive clustering algorithms. We use and compare three hashing schemes for canopy construction with five popular and state-of-the-art sequence clustering methods. We evaluate our clustering algorithm on synthetic and real-world 16S and whole-metagenome benchmarks. We demonstrate the ability of our proposed approach to determine meaningful Operational Taxonomic Units (OTUs) and observe a significant speedup in run time compared to different clustering algorithms. We also make our source code publicly available on GitHub.
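The two-phase idea described above (hash reads into canopies, then refine each canopy with a more expensive clustering method) can be sketched roughly as follows. This is only an illustrative sketch: the min-hash over k-mers used as the canopy key, the function names, and the parameter `k` are all assumptions made for this example and are not taken from the paper, which compares three hashing schemes of its own.

```python
import hashlib
from collections import defaultdict


def kmers(read, k):
    """Return the set of length-k substrings of a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def stable_hash(text):
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.blake2b(text.encode(), digest_size=8).digest(), "big")


def canopy_key(read, k=4):
    """Hypothetical canopy key: the minimum hash over the read's k-mers."""
    ks = kmers(read, k)
    if not ks:
        return None  # read shorter than k: grouped under a shared fallback key
    return min(stable_hash(km) for km in ks)


def build_canopies(reads, k=4):
    """Phase 1: cheap partitioning of reads into canopies by hash key."""
    canopies = defaultdict(list)
    for read in reads:
        canopies[canopy_key(read, k)].append(read)
    return dict(canopies)


def refine(canopies, cluster_fn):
    """Phase 2: run an expensive clustering function inside each canopy only."""
    clusters = []
    for members in canopies.values():
        clusters.extend(cluster_fn(members))
    return clusters
```

As a usage example, `refine(build_canopies(reads), cluster_fn)` would call `cluster_fn` once per canopy, where `cluster_fn` stands in for any sequence clustering tool that takes a list of reads and returns a list of clusters. The expensive method then never compares reads that fall in different canopies, which is where the run-time saving would come from.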


2021 ◽  
Author(s):  
Priti Maratha ◽  
Kapil Gupta

Despite severe limitations on sensor node resources such as memory, computational power, transmission range, and battery, the application areas of Wireless Sensor Networks (WSNs) are increasing day by day. The main challenge in WSNs is energy consumption, which becomes significant when a large number of nodes are deployed. Although clustering is one solution to this problem, it suffers from severe energy consumption due to the non-uniform selection of cluster heads (CHs) and frequent re-clustering. In this paper, we propose a heuristic and fuzzy-based, load-balanced, scalable clustering algorithm for WSNs called HFLBSC. In this algorithm, we segregate the network into a layered structure using the area under the intersection-over-union curve. We select the CHs by considering residual energy and a distance threshold. We avoid frequent re-clustering by using decisions made with the help of fuzzy logic. Our proposed scheme is able to extend the network lifetime. Statistical analysis and simulation results confirm the superiority of the proposed work over its competitor protocol.
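To make the CH selection criterion concrete, here is a minimal sketch of choosing cluster heads by residual energy subject to a distance threshold. It is not the HFLBSC procedure itself: the greedy ordering, the tuple layout, and the parameter names `min_energy` and `min_separation` are assumptions for illustration, and the layering and fuzzy-logic steps are not modeled.

```python
import math


def select_cluster_heads(nodes, min_energy, min_separation):
    """Greedy CH selection (illustrative assumption, not the paper's method).

    nodes: list of (node_id, x, y, residual_energy) tuples.
    A node becomes a CH if its residual energy is at least min_energy and it
    lies at least min_separation away from every CH chosen so far.
    """
    heads = []
    # Visit nodes from highest to lowest residual energy.
    for node in sorted(nodes, key=lambda n: n[3], reverse=True):
        _, x, y, energy = node
        if energy < min_energy:
            break  # all remaining nodes have even less energy
        far_enough = all(
            math.hypot(x - hx, y - hy) >= min_separation
            for _, hx, hy, _ in heads
        )
        if far_enough:
            heads.append(node)
    return heads
```

The distance check is what spreads the CHs across the field; without it, a greedy energy-only choice could place several CHs next to each other, which is the kind of non-uniform selection the abstract identifies as a source of wasted energy.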

