Coarrays in the Context of XcalableMP

Author(s):  
Hidetoshi Iwashita ◽  
Masahiro Nakao

Abstract: Coarray features have been implemented in the Omni XcalableMP compiler with a source-to-source translator and layered runtime libraries. Three memory allocation methods for coarrays were implemented for the GASNet and MPI-3 communication libraries and for Fujitsu's native interface. For coarray PUT/GET communication, algorithms using DMA (zero-copy) and buffering were introduced. Two techniques were important for achieving high performance: non-blocking PUT communication, implemented in the runtime library, and optimization of GET communication in the translator. Fundamental performance was evaluated and analyzed using the ping-pong benchmark and a modified version of it. The MPI version of the Himeno benchmark was ported to coarrays and modified to make full use of non-blocking PUT. In the evaluation, the non-blocking coarray version clearly outperformed both the original and the non-blocking MPI versions.
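The ping-pong measurement pattern named in the abstract can be illustrated with a minimal sketch. This is not the authors' benchmark and does not use coarrays, GASNet, or MPI: it bounces a payload between two Python threads through queues, purely to show how a one-way time is derived from timed round trips. The function name `ping_pong` and its parameters are hypothetical.

```python
import queue
import threading
import time


def ping_pong(iterations, payload_bytes):
    """Bounce a payload between two threads; return mean one-way time in seconds."""
    to_pong = queue.Queue()
    to_ping = queue.Queue()
    payload = bytes(payload_bytes)

    def pong():
        # Echo every received message straight back to the sender.
        for _ in range(iterations):
            to_ping.put(to_pong.get())

    worker = threading.Thread(target=pong)
    worker.start()

    start = time.perf_counter()
    for _ in range(iterations):
        to_pong.put(payload)      # "ping"
        echoed = to_ping.get()    # block until the "pong" arrives
        assert len(echoed) == payload_bytes
    elapsed = time.perf_counter() - start
    worker.join()

    # Each iteration is a full round trip, so halve it for a one-way estimate.
    return elapsed / (2 * iterations)
```

Calling `ping_pong(1000, 8)` and `ping_pong(1000, 65536)` gives one-way estimates for a small and a larger payload. The absolute numbers reflect Python thread scheduling, not network or DMA behaviour.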

2000 ◽  
Vol 01 (02) ◽  
pp. 73-94
Author(s):  
A. FERREIRA ◽  
A. GOLDMAN ◽  
S. W. SONG

In most distributed-memory MIMD multiprocessors, processors are connected by a point-to-point interconnection network, usually modeled by a graph in which processors are nodes and communication links are edges. Since interprocessor communication is frequently a serious bottleneck, several architectures have been proposed that enhance point-to-point topologies with multiple bus systems to improve communication efficiency. In this paper we study parallel architectures in which communication relies solely on buses. These architectures can exploit the power of bus technologies, providing a way to interconnect many more processors in a simple and efficient manner. We present the hyperpath, hypergrid, hyperring, and hypertorus architectures, which are the bus-based versions of widely used point-to-point interconnection networks. Using (hyper)graph-theoretic concepts to model interprocessor communication in such networks, we give optimal algorithms for broadcasting a message from one processor to all the others. To derive high-performance communication patterns, we developed a new tool called simplification. The idea is to construct a graph, called the representative graph, from the original hyper-topology in such a way that communication schemes are easy to describe and perform on the representative graph and also fit the hyper-topology. The simplification concept also allows us to partially reuse known communication algorithms for ordinary networks.
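The hypergraph view of a bus network can be made concrete with a small sketch. It treats each bus as a hyperedge (a list of attached processors) and uses breadth-first search to count the minimum number of bus transmissions needed to reach every processor from a source. That count is only a lower bound on broadcast time under the assumption that a message crosses one bus per step; it is not the paper's optimal broadcasting algorithm. The row-and-column layout in `row_column_buses` is a hypothetical topology chosen for illustration and is not claimed to match the paper's hypergrid definition.

```python
from collections import deque


def bus_hops_from(source, buses):
    """Return {node: minimum number of bus transmissions needed to reach it}."""
    # Index which buses each processor is attached to.
    attached = {}
    for bus_id, members in enumerate(buses):
        for node in members:
            attached.setdefault(node, []).append(bus_id)

    hops = {source: 0}
    used_buses = set()
    frontier = deque([source])
    while frontier:
        node = frontier.popleft()
        for bus_id in attached.get(node, []):
            if bus_id in used_buses:
                continue  # an earlier (closer or equal) node already used this bus
            used_buses.add(bus_id)
            for neighbour in buses[bus_id]:
                if neighbour not in hops:
                    hops[neighbour] = hops[node] + 1
                    frontier.append(neighbour)
    return hops


def row_column_buses(rows, cols):
    """Hypothetical layout: one bus per grid row and one bus per grid column."""
    row_buses = [[(r, c) for c in range(cols)] for r in range(rows)]
    col_buses = [[(r, c) for r in range(rows)] for c in range(cols)]
    return row_buses + col_buses
```

For example, `bus_hops_from((0, 0), row_column_buses(4, 4))` reaches every processor in the source's row and column in one transmission and all remaining processors in two.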


2011 ◽  
Vol 320 ◽  
pp. 335-340 ◽  
Author(s):  
Ji Tang Liu ◽  
Zhao Song Ma ◽  
Shi Hai Li ◽  
Ying Zhao

GPUs are high-performance co-processors to the CPU for scientific computing, including CFD. We present an optimistic shared memory allocation strategy for solving 2D CFD problems with the Red-Black SOR method on a GPU using CUDA (Compute Unified Device Architecture). Lid-driven results are compared with benchmark data. The speedup for the same problem size on an NVIDIA GTX480 relative to an Intel Core-Dual 3.0GHz processor is discussed: the GPU is 120 times faster than the sequential code on the CPU with a problem size of 756756. Based on this work, we conclude that using the memory hierarchy properly plays a key role in improving the computational performance of the GPU.
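As background for the Red-Black SOR method mentioned in the abstract, here is a minimal sequential sketch. It is not the authors' CUDA kernel and says nothing about their shared memory strategy. It relaxes Laplace's equation on a small grid with fixed boundary values, which is an assumed stand-in for the CFD problem. The function name and the default relaxation factor `omega=1.5` are illustrative choices.

```python
def red_black_sor(grid, omega=1.5, sweeps=200):
    """Relax the interior of `grid` in place toward a solution of Laplace's equation."""
    n_rows, n_cols = len(grid), len(grid[0])
    for _ in range(sweeps):
        # Colour 0 ("red") first, then colour 1 ("black"). A point of one
        # colour depends only on neighbours of the other colour, so every
        # point within a half-sweep could be updated in parallel.
        for colour in (0, 1):
            for i in range(1, n_rows - 1):
                for j in range(1, n_cols - 1):
                    if (i + j) % 2 != colour:
                        continue
                    average = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                      + grid[i][j - 1] + grid[i][j + 1])
                    # Over-relaxed update toward the neighbour average.
                    grid[i][j] += omega * (average - grid[i][j])
    return grid
```

The independence of same-coloured points within a half-sweep is what makes the scheme a natural fit for data-parallel hardware such as a GPU.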


2013 ◽  
Vol 60 (6) ◽  
pp. 4595-4602 ◽  
Author(s):  
Gerry Bauer ◽  
Ulf Behrens ◽  
James Branson ◽  
Sebastian Bukowiec ◽  
Olivier Chaze ◽  
...  

2011 ◽  
Vol 48-49 ◽  
pp. 902-905
Author(s):  
Jiang Sun ◽  
Ju Long Lan ◽  
Yu Feng Li

Drawing on the zero-copy idea and multi-core binding, this paper realizes a high-performance packet capture platform based on multi-core binding (MCPCP). By modifying the kernel's memory management of sk_buff, user-space programs can access packet data directly, which gives a generally applicable zero-copy scheme. Multi-core binding is then used to schedule and control each CPU core, so that multi-threaded user programs minimize cache jitter and packet capture becomes more efficient. Experiments show that, in a low-end configuration, MCPCP throughput for 64-byte and 1500-byte messages is 620,000 pps (about 320 Mbps) and 78,000 pps (about 941 Mbps), respectively. In a high-end configuration, it reaches 1.46 million pps (748 Mbps) and 81,000 pps (979 Mbps). MCPCP outperforms traditional packet capture platforms.
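The relationship between the packet rates and the bit rates quoted above can be checked with a short calculation. The sketch assumes that throughput is simply packet rate times message size times 8 bits, with 1 Mbps taken as 10^6 bits per second. The abstract does not state how its Mbps figures were derived, so this is an assumption, and the helper name `payload_mbps` is hypothetical.

```python
def payload_mbps(packets_per_second, packet_bytes):
    """Convert a packet rate to throughput in Mbps (10**6 bits per second)."""
    return packets_per_second * packet_bytes * 8 / 1e6


# Packet rates and message sizes quoted in the abstract.
reported = [
    (620_000, 64),      # low-end configuration, 64-byte messages
    (78_000, 1500),     # low-end configuration, 1500-byte messages
    (1_460_000, 64),    # high-end configuration, 64-byte messages
    (81_000, 1500),     # high-end configuration, 1500-byte messages
]
computed = [payload_mbps(pps, size) for pps, size in reported]
```

Under this assumption the four cases come out at roughly 317, 936, 748, and 972 Mbps. The 64-byte figures match the abstract closely, while the 1500-byte figures land a little below the reported 941 and 979 Mbps, which may indicate a different accounting of packet size in the original measurements.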


2007 ◽  
Vol 189 (11) ◽  
pp. 3954-3959 ◽  
Author(s):  
Zhe Yang ◽  
Chung-Dar Lu

Abstract: The arginine transaminase (ATA) pathway represents one of the multiple pathways for L-arginine catabolism in Pseudomonas aeruginosa. The AruH protein was proposed to catalyze the first step in the ATA pathway, converting the substrates L-arginine and pyruvate into 2-ketoarginine and L-alanine. Here we report the initial biochemical characterization of this enzyme. The aruH gene was overexpressed in Escherichia coli, and its product was purified to homogeneity. High-performance liquid chromatography and mass spectrometry (MS) analyses were employed to detect the presence of the transamination products 2-ketoarginine and L-alanine, thus demonstrating the proposed biochemical reaction catalyzed by AruH. The enzymatic properties and kinetic parameters of dimeric recombinant AruH were determined by a coupled reaction with NAD+ and L-alanine dehydrogenase. The optimal activity of AruH was found at pH 9.0, and it has a novel substrate specificity with an order of preference of Arg > Lys > Met > Leu > Orn > Gln. With L-arginine and pyruvate as the substrates, Lineweaver-Burk plots of the data revealed a series of parallel lines characteristic of a ping-pong kinetic mechanism with calculated Vmax and kcat values of 54.6 ± 2.5 μmol/min/mg and 38.6 ± 1.8 s−1. The apparent Km and catalytic efficiency (kcat/Km) were 1.6 ± 0.1 mM and 24.1 mM−1 s−1 for pyruvate and 13.9 ± 0.8 mM and 2.8 mM−1 s−1 for L-arginine. When L-lysine was used as the substrate, MS analysis suggested Δ1-piperideine-2-carboxylate as its transamination product. These results implied that AruH may have a broader physiological function in amino acid catabolism.
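The catalytic efficiencies quoted above follow directly from the reported kcat and apparent Km values, as this short check shows. It uses only the central values from the abstract and ignores the stated uncertainties. The function name `catalytic_efficiency` is hypothetical.

```python
def catalytic_efficiency(kcat_per_s, km_mM):
    """Return kcat/Km in mM^-1 s^-1 from kcat (s^-1) and apparent Km (mM)."""
    return kcat_per_s / km_mM


KCAT = 38.6  # s^-1, reported for AruH

pyruvate_efficiency = catalytic_efficiency(KCAT, 1.6)    # apparent Km for pyruvate
arginine_efficiency = catalytic_efficiency(KCAT, 13.9)   # apparent Km for L-arginine
```

Both quotients agree with the reported 24.1 and 2.8 mM−1 s−1 to within rounding.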

