A heterogeneous parallel implementation of the Markov clustering algorithm for large-scale biological networks on distributed CPU–GPU clusters

Author(s):  
You Fu ◽  
Wei Zhou
2018 ◽  
Vol 46 (6) ◽  
pp. e33-e33 ◽  
Author(s):  
Ariful Azad ◽  
Georgios A Pavlopoulos ◽  
Christos A Ouzounis ◽  
Nikos C Kyrpides ◽  
Aydin Buluç

2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Biqiu Li ◽  
Jiabin Wang ◽  
Xueli Liu

Data is an important source of knowledge discovery, but the existence of similar duplicate data not only increases the redundancy of the database but also affects the subsequent data mining work. Cleaning similar duplicate data is helpful to improve work efficiency. Based on the complexity of the Chinese language and the bottleneck of the single machine system to large-scale data computing performance, this paper proposes a Chinese data cleaning method that combines the BERT model and a k-means clustering algorithm and gives a parallel implementation scheme of the algorithm. In the process of text to vector, the position vector is introduced to obtain the context features of words, and the vector is dynamically adjusted according to the semantics so that the polysemous words can obtain different vector representations in different contexts. At the same time, the parallel implementation of the process is designed based on Hadoop. After that, k-means clustering algorithm is used to cluster similar duplicate data to achieve the purpose of cleaning. Experimental results on a variety of data sets show that the parallel cleaning algorithm proposed in this paper not only has good speedup and scalability but also improves the precision and recall of similar duplicate data cleaning, which will be of great significance for subsequent data mining.


2019 ◽  
Vol 48 (4) ◽  
pp. 673-681
Author(s):  
Shufen Zhang ◽  
Zhiyu Liu ◽  
Xuebin Chen ◽  
Changyin Luo

In order to solve the problem of traditional K-Means clustering algorithm in dealing with large-scale data set, a Hadoop K-Means (referred to HKM) clustering algorithm is proposed. Firstly, according to the sample density, the algorithm eliminates the effects of noise points in the data set. Secondly, it optimizes the selection of the initial center point using the thought of the max-min distance. Finally, it uses a MapReduce programming model to realize the parallelization. Experimental results show that the proposed algorithm not only has high accuracy and stability in clustering results, but can also solve the problems of scalability encountered by traditional clustering algorithms in dealing with large scale data.


2020 ◽  
Vol 3 (3) ◽  
pp. 191-200
Author(s):  
M. Syamsuddin Wisnubroto ◽  
Marsudi Siburian ◽  
Febri Dwi Irawati

Proteins interact with other proteins, DNA, and other molecules, forming large-scale protein interaction networks and for easy analysis, clustering methods are needed. Regularized Markov clustering algorithm is an improvement of MCL where operations on expansion are replaced by new operations that update the flow distributions of each node. But to reduce the weaknesses of the RMCL optimization, Pigeon Inspired Optimization Algorithm (PIO) is used to replace the inflation parameters. The simulation results of IPC SARS-Cov-2 (COVID-19) inflation parameters  get the result of 42 proteins as the center of the cluster and 8 protein pairs interacting with each other. Proteins of COVID-19 that interact with 20 or more proteins are ORF8, NSP13, NSP7, M, N, ORF9C, NSP8, and NSP1. Their interactions might be used as a target for drug research.


2012 ◽  
Vol 11 (1) ◽  
pp. 48-64 ◽  
Author(s):  
Mauro Bisson ◽  
Massimo Bernaschi ◽  
Simone Melchionna ◽  
Sauro Succi ◽  
Efthimios Kaxiras

AbstractThe parallel implementation of MUPHY, a concurrent multiscale code for large-scale hemodynamic simulations in anatomically realistic geometries, for multi-GPU platforms is presented. Performance tests show excellent results, with a nearly linear parallel speed-up on up to 32GPUs and a more than tenfold GPU/CPU acceleration, all across the range of GPUs. The basic MUPHY scheme combines a hydrokinetic (Lattice Boltzmann) representation of the blood plasma, with a Particle Dynamics treatment of suspended biological bodies, such as red blood cells. To the best of our knowledge, this represents the first effort in the direction of laying down general design principles for multiscale/physics parallel Particle Dynamics applications in non-ideal geometries. This configures the present multi-GPU version of MUPHY as one of the first examples of a high-performance parallel code for multiscale/physics biofluidic applications in realistically complex geometries.


2009 ◽  
Vol 35 (7) ◽  
pp. 859-866
Author(s):  
Ming LIU ◽  
Xiao-Long WANG ◽  
Yuan-Chao LIU

2021 ◽  
Vol 13 (3) ◽  
pp. 355
Author(s):  
Weixian Tan ◽  
Borong Sun ◽  
Chenyu Xiao ◽  
Pingping Huang ◽  
Wei Xu ◽  
...  

Classification based on polarimetric synthetic aperture radar (PolSAR) images is an emerging technology, and recent years have seen the introduction of various classification methods that have been proven to be effective to identify typical features of many terrain types. Among the many regions of the study, the Hunshandake Sandy Land in Inner Mongolia, China stands out for its vast area of sandy land, variety of ground objects, and intricate structure, with more irregular characteristics than conventional land cover. Accounting for the particular surface features of the Hunshandake Sandy Land, an unsupervised classification method based on new decomposition and large-scale spectral clustering with superpixels (ND-LSC) is proposed in this study. Firstly, the polarization scattering parameters are extracted through a new decomposition, rather than other decomposition approaches, which gives rise to more accurate feature vector estimate. Secondly, a large-scale spectral clustering is applied as appropriate to meet the massive land and complex terrain. More specifically, this involves a beginning sub-step of superpixels generation via the Adaptive Simple Linear Iterative Clustering (ASLIC) algorithm when the feature vector combined with the spatial coordinate information are employed as input, and subsequently a sub-step of representative points selection as well as bipartite graph formation, followed by the spectral clustering algorithm to complete the classification task. Finally, testing and analysis are conducted on the RADARSAT-2 fully PolSAR dataset acquired over the Hunshandake Sandy Land in 2016. Both qualitative and quantitative experiments compared with several classification methods are conducted to show that proposed method can significantly improve performance on classification.


2021 ◽  
Vol 13 (2) ◽  
pp. 176
Author(s):  
Peng Zheng ◽  
Zebin Wu ◽  
Jin Sun ◽  
Yi Zhang ◽  
Yaoqin Zhu ◽  
...  

As the volume of remotely sensed data grows significantly, content-based image retrieval (CBIR) becomes increasingly important, especially for cloud computing platforms that facilitate processing and storing big data in a parallel and distributed way. This paper proposes a novel parallel CBIR system for hyperspectral image (HSI) repository on cloud computing platforms under the guide of unmixed spectral information, i.e., endmembers and their associated fractional abundances, to retrieve hyperspectral scenes. However, existing unmixing methods would suffer extremely high computational burden when extracting meta-data from large-scale HSI data. To address this limitation, we implement a distributed and parallel unmixing method that operates on cloud computing platforms in parallel for accelerating the unmixing processing flow. In addition, we implement a global standard distributed HSI repository equipped with a large spectral library in a software-as-a-service mode, providing users with HSI storage, management, and retrieval services through web interfaces. Furthermore, the parallel implementation of unmixing processing is incorporated into the CBIR system to establish the parallel unmixing-based content retrieval system. The performance of our proposed parallel CBIR system was verified in terms of both unmixing efficiency and accuracy.


Sign in / Sign up

Export Citation Format

Share Document