ELS: Emulation system for debugging and tuning large-scale parallel programs on small clusters

Author(s):  
Fang Lin ◽  
Yi Liu ◽  
Yayu Guo ◽  
Depei Qian
Author(s):  
Adrian Jackson ◽  
Michèle Weiland

This chapter describes experiences using Cloud infrastructures for scientific computing, covering both serial and parallel workloads. Amazon’s High Performance Computing (HPC) Cloud computing resources were compared to traditional HPC resources to quantify performance and to assess the complexity and cost of using the Cloud. Furthermore, a shared Cloud infrastructure is compared to standard desktop resources for scientific simulations. Whilst this is only a small-scale evaluation of these Cloud offerings, it does allow some conclusions to be drawn. In particular, the Cloud cannot currently match the parallel performance of dedicated HPC machines for large-scale parallel programs, but it can match the performance of standard computing resources for serial and small-scale parallel programs. Also, the shared Cloud infrastructure cannot match dedicated computing resources on low-level benchmarks, although for an actual scientific code, performance is comparable.


2003 ◽  
Vol 19 (5) ◽  
pp. 689-700
Author(s):  
Dieter Kranzlmüller ◽  
Nam Thoai ◽  
Jens Volkert

2019 ◽  
Author(s):  
Kexue Li ◽  
Lili Wang ◽  
Lizhen Shi ◽  
Li Deng ◽  
Zhong Wang

Abstract
Motivation: Metagenome assembly from short next-generation sequencing data is a challenging process due to its large scale and computational complexity. Clustering short reads before assembly offers a unique opportunity for parallel downstream assembly of genomes with individualized optimization. However, current read clustering methods suffer from either false negative (under-clustering) or false positive (over-clustering) problems.
Results: Based on SpaRC, a previously developed scalable read clustering method on Apache Spark that has very low false positives, we extended its capability by adding a new method to further cluster small clusters. This method exploits statistics derived from multiple samples in a dataset to reduce the under-clustering problem. Using a synthetic dataset from mouse gut microbiomes, we show that this method has the potential to cluster almost all of the reads from genomes with sufficient sequencing coverage. We also explored several clustering parameters that differentially affect genomes with various sequencing coverage.
Availability: https://bitbucket.org/berkeleylab/jgi-sparc/

