Parallel distributed-memory simplex for large-scale stochastic LP problems

2013, Vol 55 (3), pp. 571-596
Author(s): Miles Lubin, J. A. Julian Hall, Cosmin G. Petra, Mihai Anitescu

2013, Vol 29 (2), pp. 571-579
Author(s): Didier Devaurs, Thierry Simeon, Juan Cortes

2009, Vol 180, pp. 012011
Author(s): K Devine, L Diachin, J Kraftcheck, K E Jansen, V Leung, ...

2019
Author(s): Priyanka Ghosh, Sriram Krishnamoorthy, Ananth Kalyanaraman

Abstract: De novo genome assembly is a fundamental problem in bioinformatics that aims to assemble the DNA sequence of an unknown genome from the numerous short DNA fragments (aka reads) obtained from it. With the advent of high-throughput sequencing technologies, billions of reads can be generated in a matter of hours, necessitating efficient parallelization of the assembly process. While multiple parallel solutions have been proposed in the past, conducting assemblies at large scale remains challenging because of the inherent complexities of data movement and the irregular memory and I/O access footprints of the computation. In this paper, we present a novel algorithm, called PaKman, to address the problem of performing large-scale genome assemblies on a distributed-memory parallel computer. Our approach focuses on improving performance through a combination of novel data structures and algorithmic strategies that reduce the communication and I/O footprint during assembly. PaKman provides a solution for the two most time-consuming phases of the full genome assembly pipeline, namely k-mer counting and contig generation.

A key aspect of our algorithm is its graph data structure, which comprises fat nodes (or what we call "macro-nodes") that reduce the communication burden during contig generation. We present an extensive performance and qualitative evaluation of our algorithm, including comparisons to other state-of-the-art parallel assemblers. Our results demonstrate the ability to achieve near-linear speedups on up to 8K cores (tested); to outperform state-of-the-art distributed-memory and shared-memory tools in performance while delivering comparable (if not better) quality; and to reduce time to solution significantly. For instance, PaKman is able to generate a high-quality set of assembled contigs for complex genomes such as the human and wheat genomes in a matter of minutes on 8K cores.
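To make the k-mer counting phase concrete, here is a minimal serial C++ sketch that hash-partitions k-mers across a set of simulated owner ranks, the way a distributed counter shards its table. The reads, the value of k, and the partitioning function are illustrative assumptions, not PaKman's actual data structures or distribution scheme.

```cpp
// Minimal k-mer counting sketch (serial). In a distributed assembler,
// k-mers would be hash-partitioned across ranks; here we simulate the
// partitioning with one hash map per "rank" (owner-computes).
#include <cstddef>
#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

int main() {
    const std::size_t k = 4;        // illustrative k-mer length
    const std::size_t nranks = 4;   // simulated number of ranks
    // Illustrative reads; a real run would stream these from FASTQ.
    std::vector<std::string> reads = {
        "ACGTACGTGA", "CGTACGTACG", "TTACGTACGT"};

    // One counting table per simulated rank.
    std::vector<std::unordered_map<std::string, long>> tables(nranks);
    std::hash<std::string> h;

    for (const auto& read : reads) {
        if (read.size() < k) continue;
        for (std::size_t i = 0; i + k <= read.size(); ++i) {
            std::string kmer = read.substr(i, k);
            // Hash-partition: this k-mer's "owner" rank counts it.
            tables[h(kmer) % nranks][kmer] += 1;
        }
    }

    // Report counts, grouped by owner rank.
    for (std::size_t r = 0; r < nranks; ++r)
        for (const auto& [kmer, count] : tables[r])
            std::cout << "rank " << r << ": " << kmer
                      << " x" << count << "\n";
    return 0;
}
```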


Author(s): Chao Yang, Padma Raghavan, Lloyd Arrowood, Donald W. Noid, Bobby G. Sumpter, ...

Summary: A parallel computational scheme for analyzing large-scale molecular vibration on distributed-memory computing platforms is presented in this paper. The method combines the implicitly restarted Lanczos algorithm with a state-of-the-art parallel sparse direct solver to compute a set of low-frequency vibrational modes for molecular systems containing tens of thousands of atoms. Although the original motivation for developing such a scheme was to overcome memory limitations on traditional sequential and shared-memory machines, our computational experiments show that with a careful parallel design and data-partitioning scheme one can achieve scalable performance on loosely coupled distributed-memory parallel systems. In particular, we demonstrate the performance enhancement achieved by using the latency-tolerant "selective inversion" scheme in the sparse triangular substitution phase of the computation.
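The Lanczos kernel at the core of such a scheme is compact enough to sketch. Below is a minimal serial C++ illustration of plain (non-restarted) Lanczos tridiagonalization on a small dense symmetric matrix; the test matrix, step count, and starting vector are placeholders, and the shift-invert solve that the parallel sparse direct solver performs in the actual scheme is replaced here by an ordinary matrix-vector product.

```cpp
// Plain Lanczos tridiagonalization sketch (serial, dense matvec).
// The implicitly restarted, shift-inverted, distributed version in the
// paper would replace matvec() with a sparse direct solve.
#include <cmath>
#include <iostream>
#include <vector>

using Vec = std::vector<double>;

// Small symmetric test matrix (illustrative placeholder).
const std::vector<Vec> A = {
    {4.0, 1.0, 0.0},
    {1.0, 3.0, 1.0},
    {0.0, 1.0, 2.0}};

Vec matvec(const Vec& x) {
    Vec y(x.size(), 0.0);
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j)
            y[i] += A[i][j] * x[j];
    return y;
}

double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

int main() {
    const std::size_t n = 3, m = 3;  // dimension and Lanczos steps
    Vec v(n, 0.0), vprev(n, 0.0);
    v[0] = 1.0;                      // starting vector (placeholder)
    double beta = 0.0;
    std::vector<double> alphas, betas;

    for (std::size_t j = 0; j < m; ++j) {
        Vec w = matvec(v);                                   // w = A v_j
        for (std::size_t i = 0; i < n; ++i) w[i] -= beta * vprev[i];
        double alpha = dot(w, v);                            // diagonal entry
        for (std::size_t i = 0; i < n; ++i) w[i] -= alpha * v[i];
        beta = std::sqrt(dot(w, w));                         // off-diagonal
        alphas.push_back(alpha);
        if (j + 1 < m) {
            betas.push_back(beta);
            vprev = v;
            for (std::size_t i = 0; i < n; ++i) v[i] = w[i] / beta;
        }
    }
    // Eigenvalues of the tridiagonal (alphas, betas) approximate
    // extremal eigenvalues of A.
    for (std::size_t j = 0; j < alphas.size(); ++j)
        std::cout << "alpha[" << j << "] = " << alphas[j] << "\n";
    for (std::size_t j = 0; j < betas.size(); ++j)
        std::cout << "beta[" << j << "] = " << betas[j] << "\n";
    return 0;
}
```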


1993, Vol 2 (4), pp. 179-192
Author(s): Sandeep Bhatt, Marina Chen, James Cowie, Cheng-Yee Lin, Pangfeng Liu

This article reports on experiments from our ongoing project whose goal is to develop a C++ library which supports adaptive and irregular data structures on distributed-memory supercomputers. We demonstrate the use of our abstractions in implementing "tree codes" for large-scale N-body simulations. These algorithms require dynamically evolving tree-like data structures, as well as load balancing, both of which are widely believed to make the application difficult and cumbersome to program for distributed-memory machines. The ease of writing the application code on top of our C++ library abstractions (which themselves are application-independent), and the low overhead of the resulting C++ code (over hand-crafted C code), support our belief that object-oriented approaches are eminently suited to programming distributed-memory machines in a manner that (to the applications programmer) is architecture-independent. Our contribution to parallel programming methodology is to identify and encapsulate general classes of communication and load-balancing strategies useful across applications and MIMD architectures. This article reports experimental results from simulations of half a million particles using multiple methods.
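As a point of reference for the kind of structure such a library must manage, here is a minimal serial Barnes-Hut-style quadtree sketch in C++. The 2-D setting, class names, and incremental center-of-mass update are illustrative assumptions, not the abstractions of the library described above.

```cpp
// Minimal Barnes-Hut-style quadtree sketch (2-D, serial).
// Assumes distinct particle positions (no recursion guard); the library
// above would wrap such trees in distribution- and load-balancing-aware
// abstractions.
#include <iostream>
#include <memory>
#include <vector>

struct Body { double x, y, mass; };

struct Node {
    double cx, cy, half;              // square cell: center, half-width
    double mx = 0, my = 0, mass = 0;  // accumulated center of mass
    int count = 0;                    // bodies in this subtree
    std::unique_ptr<Node> child[4];
    const Body* body = nullptr;       // set iff leaf holding one body

    Node(double cx_, double cy_, double h) : cx(cx_), cy(cy_), half(h) {}

    int quadrant(const Body& b) const {
        return (b.x >= cx ? 1 : 0) + (b.y >= cy ? 2 : 0);
    }

    void insert(const Body& b) {
        // Incrementally fold b into this cell's center of mass.
        mx = (mx * mass + b.x * b.mass) / (mass + b.mass);
        my = (my * mass + b.y * b.mass) / (mass + b.mass);
        mass += b.mass;
        if (++count == 1) { body = &b; return; }  // first body: leaf
        if (body) {                               // was a leaf: split
            const Body* old = body;
            body = nullptr;
            descend(*old);
        }
        descend(b);
    }

    void descend(const Body& b) {
        int q = quadrant(b);
        if (!child[q]) {
            double h = half / 2;
            child[q] = std::make_unique<Node>(
                cx + ((q & 1) ? h : -h), cy + ((q & 2) ? h : -h), h);
        }
        child[q]->insert(b);
    }
};

int main() {
    std::vector<Body> bodies = {
        {0.1, 0.2, 1.0}, {0.8, 0.7, 2.0}, {0.3, 0.9, 1.5}};
    Node root(0.5, 0.5, 0.5);  // unit square domain
    for (const auto& b : bodies) root.insert(b);
    std::cout << "total mass " << root.mass
              << ", center (" << root.mx << ", " << root.my << ")\n";
    return 0;
}
```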


1999, Vol 09 (02), pp. 243-252
Author(s): O. Larsson, M. Feig, L. Johnsson

We demonstrate good metacomputing efficiency and portability for three typical large-scale parallel applications: one molecular dynamics code and two electromagnetics codes. The codes were developed for distributed-memory parallel platforms using Fortran77 or Fortran90 with MPI. The performance measurements were made on a testbed of two IBM SPs connected through the vBNS. No changes to the application codes were required for correct execution on the testbed, using the Globus Toolkit for the required metacomputing services. However, we observe that for good performance it may be necessary for MPI codes to make use of overlapped computation and communication. For such MPI codes, a communications library designed for hierarchical or clustered communication can yield very good metacomputing efficiencies when high-performance networks, such as the vBNS or Abilene, are used for platform connectivity. We demonstrate this by inserting a thin layer between the MPI application and the MPI libraries, providing some clustering of communications between platforms.
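The overlap the authors call for is the standard nonblocking MPI idiom: post the sends and receives first, compute on data that does not depend on the incoming messages, then wait. Below is a minimal C++ sketch; the ring neighbors, buffer length, and "interior work" are placeholders, and the clustering layer described above would sit between calls like these and the MPI library.

```cpp
// Minimal sketch of overlapping computation with communication in MPI.
// Compile with an MPI C++ wrapper, e.g. mpicxx.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int N = 1024;  // placeholder buffer length
    double sendbuf[N], recvbuf[N];
    for (int i = 0; i < N; ++i) sendbuf[i] = rank + i * 1e-3;

    int right = (rank + 1) % size;        // ring neighbors (placeholder)
    int left  = (rank - 1 + size) % size;

    MPI_Request reqs[2];
    // Post communication first ...
    MPI_Irecv(recvbuf, N, MPI_DOUBLE, left, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    // ... then do work that does not depend on recvbuf while the
    // messages are (potentially) in flight.
    double interior = 0.0;
    for (int i = 0; i < N; ++i) interior += sendbuf[i] * sendbuf[i];

    // Only block once the overlapped work is done.
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    double boundary = recvbuf[0];  // now safe to use received data
    std::printf("rank %d: interior %.3f, first halo value %.3f\n",
                rank, interior, boundary);

    MPI_Finalize();
    return 0;
}
```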

