Divide-and-conquer quantum mechanical material simulations with exascale supercomputers

2014 ◽  
Vol 1 (4) ◽  
pp. 604-617 ◽  
Author(s):  
Lin-Wang Wang

Abstract: Recent developments in large-scale materials science simulations, especially those based on the divide-and-conquer method, are reviewed. The pros and cons of the divide-and-conquer method are discussed. It is argued that divide-and-conquer methods, such as the linear-scaling 3D fragment method, are an ideal approach for taking advantage of the heterogeneous architectures of modern supercomputers, despite their relatively large prefactors among linear-scaling methods. Some developments in graphics processing unit (GPU) electronic structure calculations are also reviewed. Accelerators such as GPUs could be an essential part of future exascale supercomputing.

Author(s):  
Alan Gray ◽  
Kevin Stratford

Leading high-performance computing systems achieve their status through the use of highly parallel devices such as NVIDIA graphics processing units or Intel Xeon Phi many-core CPUs. The concept of performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer which allows grid-based applications to target data-parallel hardware in a platform-agnostic manner. We demonstrate the effectiveness of our pragmatic approach by presenting performance results for a complex fluid application (with which the model was co-designed), plus a separate lattice quantum chromodynamics particle physics code. For each application, a single source code base is seen to achieve portable performance, as assessed within the context of the Roofline model. TargetDP can be combined with the Message Passing Interface (MPI) to allow use on systems containing multiple nodes: we demonstrate this by providing scaling results on traditional and graphics processing unit-accelerated large-scale supercomputers.
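The Roofline assessment mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard form of the model, in which attainable performance is the smaller of the peak compute rate and the product of memory bandwidth and arithmetic intensity. The function names and the numbers in the usage note are hypothetical and are not taken from the paper.

```python
def roofline_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Attainable performance (GFLOP/s) under the standard Roofline model.

    peak_gflops    -- peak floating-point rate of the device (GFLOP/s)
    bandwidth_gbs  -- peak memory bandwidth (GB/s)
    flops_per_byte -- arithmetic intensity of the kernel (FLOP per byte moved)
    """
    # A kernel is limited either by compute or by memory traffic,
    # whichever ceiling is lower at its arithmetic intensity.
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def fraction_of_roofline(measured_gflops, peak_gflops, bandwidth_gbs, flops_per_byte):
    """Measured performance as a fraction of the Roofline ceiling."""
    ceiling = roofline_gflops(peak_gflops, bandwidth_gbs, flops_per_byte)
    return measured_gflops / ceiling
```

For example, with illustrative values, a kernel with an intensity of 0.5 FLOP/byte on a device with 1000 GFLOP/s peak and 200 GB/s bandwidth is memory-bound with a ceiling of 100 GFLOP/s, so a measured 50 GFLOP/s corresponds to half of the attainable performance.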


Author(s):  
Timothy Dykes ◽  
Claudio Gheller ◽  
Marzia Rivi ◽  
Mel Krokos

With the increasing size and complexity of data produced by large-scale numerical simulations, it is of primary importance for scientists to be able to exploit all available hardware in heterogeneous high-performance computing environments for increased throughput and efficiency. We focus on the porting and optimization of Splotch, a scalable visualization algorithm, to utilize the Xeon Phi, Intel’s coprocessor based upon the new many integrated core architecture. We discuss the steps taken to offload data to the coprocessor, as well as algorithmic modifications that aid faster processing on the many-core architecture and make use of the uniquely wide vector capabilities of the device, with accompanying performance results using multiple Xeon Phi coprocessors. Finally, we compare performance against results achieved with the graphics processing unit (GPU) based implementation of Splotch.


Author(s):  
Shen Lu ◽  
Richard S. Segall

Big data is large-scale data and can be either discrete or continuous. This article discusses the continuous case of big data, often called “data streaming.” More and more businesses will depend on being able to process and make decisions on streams of data. The article draws on the algorithmic side of data stream processing, often called “stream analytics” or “stream mining.” A data stream Window Join can be accelerated by using a graphics processing unit (GPU) for higher-performance computing. The data streams are generated by two independent threads: one thread generates Data Stream A, and the other generates Data Stream B. A Window Join thread then merges the two data streams; this is the “Data Stream Window Join” process. The Window Join process can be implemented in parallel, which can efficiently improve computing speed. Experiments are provided for Data Stream Window Joins using both static and dynamic data.
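The window join described in this abstract can be sketched as follows. This is a minimal single-threaded illustration, not the article's implementation: it assumes the two streams have already been merged into one time-ordered sequence, it joins on equal keys within a time window, and it omits the generator threads and the GPU-parallel probe entirely. The function name and the tuple layout are hypothetical.

```python
from collections import deque


def window_join(events, window):
    """Join streams "A" and "B" on equal keys within a sliding time window.

    events -- iterable of (timestamp, stream, key, value) tuples, sorted by
              timestamp, where stream is "A" or "B" (assumed layout)
    window -- maximum timestamp difference for two tuples to be joined

    Returns a list of (key, value_from_A, value_from_B) tuples.
    """
    buffers = {"A": deque(), "B": deque()}
    results = []
    for ts, stream, key, value in events:
        # Drop buffered tuples that are too old to match anything from now on.
        for buf in buffers.values():
            while buf and ts - buf[0][0] > window:
                buf.popleft()
        # Probe the opposite stream's buffer for matching keys.
        other = "B" if stream == "A" else "A"
        for _, other_key, other_value in buffers[other]:
            if other_key == key:
                if stream == "A":
                    results.append((key, value, other_value))
                else:
                    results.append((key, other_value, value))
        # Keep the new tuple so later arrivals on the other stream can match it.
        buffers[stream].append((ts, key, value))
    return results
```

The probe loop over the opposite buffer is the part that compares one incoming tuple against many buffered ones independently, which is the kind of work a parallel implementation would distribute.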


2013 ◽  
Vol 2013 ◽  
pp. 1-7 ◽  
Author(s):  
Congying Han ◽  
Tingting Feng ◽  
Guoping He ◽  
Tiande Guo

A modified parallel variable distribution (PVD) algorithm for solving large-scale constrained optimization problems is developed, which modifies the quadratic subproblem QPl at each iteration instead of the QPl0 of the SQP-type PVD algorithm proposed by C. A. Sagastizábal and M. V. Solodov in 2002. The algorithm can circumvent the difficulties associated with the possible inconsistency of the QPl0 subproblem of the original SQP method. Moreover, we introduce a nonmonotone technique instead of the penalty function to carry out the line search procedure more flexibly. Under appropriate conditions, the global convergence of the method is established. In the final part, parallel numerical experiments are implemented in CUDA on a graphics processing unit (GPU).


2010 ◽  
Vol 20 (04) ◽  
pp. 293-306 ◽  
Author(s):  
NIALL EMMART ◽  
CHARLES WEEMS

In this paper we evaluate the potential for using an NVIDIA graphics processing unit (GPU) to accelerate high precision integer multiplication, addition, and subtraction. The reported peak vector performance for a typical GPU appears to offer good potential for accelerating such a computation. Because of limitations in the on-chip memory, the high cost of kernel launches, and the nature of the architecture's support for parallelism, we used a hybrid algorithmic approach to obtain good performance on multiplication. On the GPU itself we adapt the Strassen FFT algorithm to multiply 32KB chunks, while on the CPU we adapt the Karatsuba divide-and-conquer approach to optimize application of the GPU's partial multiplies, which are viewed as "digits" by our implementation of Karatsuba. Even with this approach, the result is at best a factor of three increase in performance, compared with using the GMP package on a 64-bit CPU at a comparable technology node. Our implementations of addition and subtraction achieve up to a factor of eight improvement. We identify the issues that limit performance and discuss the likely impact of planned advances in GPU architecture.
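The hybrid scheme in this abstract, Karatsuba recursion over fixed-size "digits" with a separate routine for the digit-by-digit products, can be sketched as below. This is only an illustration under stated assumptions: `base_multiply` is a hypothetical stand-in for the GPU's FFT-based partial multiply and simply uses Python's built-in integer product, and `base_bits` is an illustrative digit size rather than the paper's 32 KB chunks.

```python
def karatsuba_digits(a, b, base_bits=64):
    """Multiply non-negative integers with Karatsuba over base_bits-wide digits."""

    def base_multiply(x, y):
        # Stand-in for the fixed-size partial multiply (assumption: in the
        # paper this role is played by the GPU kernel, not by Python's `*`).
        return x * y

    def rec(x, y, bits):
        if bits <= base_bits:
            return base_multiply(x, y)
        half = bits // 2
        mask = (1 << half) - 1
        x_lo, x_hi = x & mask, x >> half
        y_lo, y_hi = y & mask, y >> half
        lo = rec(x_lo, y_lo, half)
        hi = rec(x_hi, y_hi, half)
        # Three recursive products instead of four. The sums can carry one
        # extra bit; Python integers absorb this, whereas a fixed-width
        # implementation would have to handle the carry explicitly.
        mid = rec(x_lo + x_hi, y_lo + y_hi, half) - lo - hi
        return (hi << (2 * half)) + (mid << half) + lo

    # Round the operand size up to base_bits times a power of two so that
    # every split lands on a digit boundary.
    bits = base_bits
    while bits < max(a.bit_length(), b.bit_length()):
        bits *= 2
    return rec(a, b, bits)
```

A quick check against the built-in product, for example `karatsuba_digits(3**500, 7**400) == 3**500 * 7**400`, confirms the recombination step.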

