A STATIC EXECUTION MODEL FOR DATA PARALLELISM

1994 ◽  
Vol 04 (04) ◽  
pp. 367-378 ◽  
Author(s):  
C. GERMAIN ◽  
F. DELAPLACE ◽  
R. CARLIER

The performance of parallel architectures is limited at least as much by data transfer ability as by computing power. The main limitation concerns transfers over the interconnection network. However, a majority of these communications can be known at compile time. The static model intends to exploit this a priori knowledge in order to drastically reduce the overhead of message passing, the ultimate goal being to confine communication delays to the hardware propagation delays. In this paper, we present an abstract machine which is the target of a static-oriented compilation. We show how to recognize and sequence the static communication patterns, and we discuss the application scope of the model.
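
As a loose illustration of the idea (not the paper's abstract machine), the sketch below executes a communication pattern that is fully known before the program runs, so no routing decisions are made at run time; the cyclic-shift schedule and the mpi4py calls are assumptions made only for this example.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# A "static" schedule: the communication pattern (a cyclic shift) is fixed
# before execution, so the transfer can be carried out with no runtime
# routing decisions at all.
SCHEDULE = [{"dest": (rank + 1) % size, "source": (rank - 1) % size}]

local = {"rank": rank, "payload": rank * rank}
for step in SCHEDULE:
    # Send to the precomputed destination, receive from the precomputed source.
    local = comm.sendrecv(local, dest=step["dest"], source=step["source"])
```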


2021 ◽  
Vol 08 (03) ◽  
pp. 01-15
Author(s):  
Celine Azar

Embedded platforms are projected to integrate hundreds of cores in the near future, and scaling the interconnection network remains a key challenge. We propose SNet, a new Scalable NETwork paradigm that extends NoCs with a software/hardware dynamic routing mechanism. To build routing pathways among communicating processes, it uses a distributed, adaptive, non-supervised routing method based on the Ant Colony Optimization (ACO) algorithm. A small-footprint hardware unit called DMC (Direct Management of Communications) speeds up data transfer. SNet has the benefit of being extremely versatile, allowing a broad range of routing topologies to be created to meet the needs of various applications. In this work we present the DMC module and assess SNet performance by executing a large number of test cases.
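
The abstract does not give SNet's actual routing rules, so the following is only a generic Ant Colony Optimization sketch of probabilistic next-hop selection and pheromone reinforcement; the link names and parameters are illustrative.

```python
import random

# Per-node pheromone table: outgoing link -> pheromone level (illustrative names).
pheromone = {"north": 1.0, "east": 1.0, "south": 1.0, "west": 1.0}

def choose_next_hop(pheromone, candidates, alpha=1.0):
    """Pick a next hop with probability proportional to pheromone**alpha."""
    weights = [pheromone[c] ** alpha for c in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

def reinforce(pheromone, used_links, deposit=1.0, evaporation=0.1):
    """Evaporate pheromone everywhere, then deposit along a successful route."""
    for link in pheromone:
        pheromone[link] *= 1.0 - evaporation
    for link in used_links:
        pheromone[link] += deposit

hop = choose_next_hop(pheromone, ["north", "east"])
reinforce(pheromone, [hop])
```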


2015 ◽  
Vol 8 (10) ◽  
pp. 8981-9020 ◽  
Author(s):  
C. Zhang ◽  
L. Liu ◽  
G. Yang ◽  
R. Li ◽  
B. Wang

Abstract. Data transfer, which means transferring data fields between two component models or rearranging data fields among processes of the same component model, is a fundamental operation of a coupler. Most state-of-the-art couplers currently use an implementation based on the point-to-point (P2P) communication of the Message Passing Interface (MPI) (hereafter called the "P2P implementation"). In this paper, we reveal the drawbacks of the P2P implementation, including low communication bandwidth due to small message sizes, a large and variable number of MPI messages, and jams during communication. To overcome these drawbacks, we propose a butterfly implementation for data transfer. Although the butterfly implementation can outperform the P2P implementation in many cases, it degrades performance in some cases because the total message size transferred by the butterfly implementation is larger than that of the P2P implementation. To improve data transfer in all cases, we design and implement an adaptive data transfer library that combines the advantages of both the butterfly and the P2P implementations. Performance evaluation shows that the adaptive data transfer library significantly improves the performance of data transfer in most cases and does not decrease performance in any case. The adaptive data transfer library is now publicly available and has been integrated into the coupler C-Coupler1 to improve the performance of its data transfer. We believe that it can also improve other couplers.
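
The butterfly pattern the authors refer to can be illustrated by a recursive-doubling exchange, sketched below with mpi4py. This is a generic butterfly allgather, not the library's actual implementation, and it assumes the number of processes is a power of two.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()   # size assumed to be a power of two

# Each process starts with its own block; after log2(size) butterfly stages
# every process holds all blocks, using only size/2 pairwise exchanges per stage.
blocks = {rank: f"field-chunk-{rank}"}
stage = 1
while stage < size:
    partner = rank ^ stage                       # butterfly partner for this stage
    received = comm.sendrecv(blocks, dest=partner, source=partner)
    blocks.update(received)                      # accumulate everything seen so far
    stage <<= 1
```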


2009 ◽  
Vol 6 (2) ◽  
pp. 23
Author(s):  
Siti Arpah Ahmad ◽  
Mohamed Faidz Mohamed Said ◽  
Norazan Mohamed Ramli ◽  
Mohd Nasir Taib

This paper focuses on the performance of basic communication primitives, namely the overlap of message transfer with computation in point-to-point communication within a small cluster of four nodes. The mpptest benchmark has been used to measure the basic performance of MPI message passing routines with a variety of message sizes. mpptest is capable of measuring performance with many participating processes, thus exposing contention and scalability problems, and it enables programmers to select message sizes in order to isolate and evaluate sudden changes in performance. Investigating these matters is interesting because non-blocking calls have the advantage of allowing the system to schedule communications even when many processes are running simultaneously. Moreover, understanding the characteristics of computation and communication overlap is significant, because high-performance kernels often strive to achieve it, since it is advantageous for both data transfer and latency hiding. The results indicate that certain overlap sizes utilize greater node processing power in either blocking or non-blocking send and receive operations. The results provide a detailed MPI characterization of the performance of overlapping message transfer with computation in a small cluster system.
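
A minimal mpi4py sketch of the overlap being measured: non-blocking send and receive requests are posted, computation proceeds while the message is in flight, and a wait synchronizes before the received data is used. The buffer size, tag, and two-rank setup are arbitrary choices for the example, not the paper's benchmark configuration.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
peer = 1 - rank                      # assumes exactly two ranks for this sketch

send_buf = np.full(1 << 20, rank, dtype=np.float64)
recv_buf = np.empty(1 << 20, dtype=np.float64)

# Post non-blocking transfers, then compute while the message is in flight.
requests = [comm.Isend(send_buf, dest=peer, tag=7),
            comm.Irecv(recv_buf, source=peer, tag=7)]

local = np.sin(send_buf).sum()       # computation overlapped with communication

MPI.Request.Waitall(requests)        # synchronize before touching recv_buf
total = local + recv_buf[0]
```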


2020 ◽  
Vol 12 (1) ◽  
Author(s):  
M. Withnall ◽  
E. Lindelöf ◽  
O. Engkvist ◽  
H. Chen

Abstract. Neural Message Passing for graphs is a promising and relatively recent approach for applying Machine Learning to networked data. As molecules can be described intrinsically as a molecular graph, it makes sense to apply these techniques to improve molecular property prediction in the field of cheminformatics. We introduce Attention and Edge Memory schemes to the existing message passing neural network framework, and benchmark our approaches against eight different physical–chemical and bioactivity datasets from the literature. We remove the need to introduce a priori knowledge of the task and chemical descriptor calculation by using only fundamental graph-derived properties. Our results consistently perform on-par with other state-of-the-art machine learning approaches, and set a new standard on sparse multi-task virtual screening targets. We also investigate model performance as a function of dataset preprocessing, and make some suggestions regarding hyperparameter selection.
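
As a rough sketch of attention-weighted message passing on a molecular graph (not the authors' exact Attention or Edge Memory schemes), the NumPy snippet below computes per-neighbour attention weights and aggregates the weighted messages into each node's update; all shapes, the softmax scoring, and the tanh update rule are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_message_pass(h, edges, W_msg, a):
    """One attention-weighted message passing step (illustrative only).

    h     : (n_nodes, d) node features
    edges : list of directed (src, dst) bonds
    W_msg : (d, d) message transform
    a     : (2*d,) attention vector
    """
    new_h = h.copy()
    for v in range(h.shape[0]):
        nbrs = [u for u, w in edges if w == v]
        if not nbrs:
            continue
        msgs = np.stack([h[u] @ W_msg for u in nbrs])              # candidate messages
        scores = np.array([a @ np.concatenate([h[v], m]) for m in msgs])
        alpha = softmax(scores)                                    # attention weights
        new_h[v] = np.tanh(h[v] + alpha @ msgs)                    # aggregate + update
    return new_h

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))                     # 4 atoms, 8 features each
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
h = attention_message_pass(h, edges, rng.normal(size=(8, 8)), rng.normal(size=16))
```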


2001 ◽  
Vol 02 (03) ◽  
pp. 345-364 ◽  
Author(s):  
DAVID RIDDOCH ◽  
STEVE POPE ◽  
DEREK ROBERTS ◽  
GLENFORD MAPP ◽  
DAVID CLARKE ◽  
...  

Existing user-level network interfaces deliver high-bandwidth, low-latency performance to applications, but are typically unable to support diverse styles of communication and are unsuitable for use in multiprogrammed environments. Often this is because the network abstraction is presented at too high a level, and support for synchronisation is inflexible. In this paper we present a new primitive for in-band synchronisation: the Tripwire. Tripwires provide a flexible, efficient and scalable means for synchronisation that is orthogonal to data transfer. We describe the implementation of a non-coherent distributed shared memory network interface, with Tripwires for synchronisation. This interface provides a low-level communications model with gigabit-class bandwidth and very low overhead and latency. We show how it supports a variety of communication styles, including remote procedure call, message passing and streaming.
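
The snippet below is only a software analogy of the Tripwire idea, using Python threads: the synchronisation (an event fired when a chosen slot of a shared buffer is written) is kept separate from the data transfer itself. The real primitive matches addresses in the network interface hardware; the class names here are invented for illustration.

```python
import threading

class Tripwire:
    """Fires an event when a chosen slot of a shared region is written."""
    def __init__(self, index):
        self.index = index
        self.hit = threading.Event()

class SharedRegion:
    def __init__(self, size):
        self.buf = [None] * size
        self.tripwires = []

    def write(self, index, value):          # the "data transfer"
        self.buf[index] = value
        for tw in self.tripwires:           # synchronisation handled separately
            if tw.index == index:
                tw.hit.set()

region = SharedRegion(16)
tw = Tripwire(index=3)
region.tripwires.append(tw)
threading.Thread(target=region.write, args=(3, "payload")).start()
tw.hit.wait()                               # blocks until slot 3 is written
```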


2000 ◽  
Vol 01 (02) ◽  
pp. 73-94
Author(s):  
A. FERREIRA ◽  
A. GOLDMAN ◽  
S. W. SONG

In most distributed memory MIMD multiprocessors, processors are connected by a point-to-point interconnection network, usually modeled by a graph where processors are nodes and communication links are edges. Since interprocessor communication frequently constitutes a serious bottleneck, several architectures were proposed that enhance point-to-point topologies with multiple bus systems so as to improve communication efficiency. In this paper we study parallel architectures whose communication means are constituted solely by buses. These architectures can exploit the power of bus technologies, providing a way to interconnect many more processors in a simple and efficient manner. We present the hyperpath, hypergrid, hyperring, and hypertorus architectures, which are the bus-based versions of the widely used point-to-point interconnection networks. Using (hyper)graph theoretic concepts to model inter-processor communication in such networks, we give optimal algorithms for broadcasting a message from one processor to all the others. For deriving high-performance communication patterns we developed a new tool called simplification. The idea is to construct a graph, called the representative graph, from the original hyper-topology, in such a way that communication schemes are easy to describe and perform on the former and then carry over to the latter; the simplification concept also allows us to partially reuse already known communication algorithms for usual networks.
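
A toy model of one-to-all broadcasting over buses, where each bus (a hyperedge, i.e. a set of processors) carries one message per round. This is a simplification for illustration, not the paper's optimal algorithms, and the small hyperring example at the end is made up.

```python
def broadcast_rounds(buses, source, n_procs):
    """Count rounds until all processors are informed, assuming each bus
    can carry one broadcast per round (a simplified bus model)."""
    informed = {source}
    rounds = 0
    while len(informed) < n_procs:
        newly = set()
        for bus in buses:                       # a bus is a set of processors
            if bus & informed:                  # someone on the bus already knows
                newly |= bus                    # one broadcast reaches the whole bus
        if newly <= informed:
            raise ValueError("hyper-topology is disconnected")
        informed |= newly
        rounds += 1
    return rounds

# A small hyperring of 6 processors with buses of size 3 (illustrative only).
buses = [frozenset({i, (i + 1) % 6, (i + 2) % 6}) for i in range(0, 6, 2)]
print(broadcast_rounds(buses, source=0, n_procs=6))   # -> 2 rounds
```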


2005 ◽  
Vol 06 (03) ◽  
pp. 245-263 ◽  
Author(s):  
Dang Minh Quan ◽  
Odej Kao

Service Level Agreements (SLAs) are currently one of the major research topics in Grid Computing, as they serve as a foundation for reliable and predictable Grids. SLAs define an explicit statement of expectations and obligations in a business relationship between provider and customer. Thus, SLAs should guarantee the desired and a-priori negotiated Quality of Service (QoS), which is a mandatory prerequisite for the Next Generation Grids. This development is evidenced by extensive research on SLAs and on architectures for implementing SLAs in Grid environments. However, this work mostly addresses SLAs for standard, monolithic Grid jobs and neglects the dependencies between different steps of operation. The complexity of an SLA specification for workflows grows significantly, as the characteristics of correlated sub-jobs, the data transfer phases, the deadline constraints and possible failures have to be considered. Thus, an architecture for an SLA-aware workflow implementation needs sophisticated mechanisms for specification and management, sub-job mapping, data transfer optimization and fault reaction. Therefore, this paper presents an architecture for SLA-aware Grid workflows. The main contributions are an improved specification language for SLA-aware workflows, a mapping and optimization algorithm for assigning sub-jobs to Grid resources, and a prototype implementation using standard middleware. Experimental measurements demonstrate the quality of the approach.
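
The paper's mapping and optimization algorithm is not reproduced in the abstract; the sketch below shows only a naive greedy earliest-finish assignment of sub-jobs to resources that accounts for data transfer time and checks deadlines. All field names, units, and the bandwidth model are invented for the example.

```python
def map_subjobs(subjobs, resources, bandwidth_mb_s):
    """Greedy earliest-finish mapping of workflow sub-jobs to Grid resources.

    subjobs   : list of dicts {name, work_ops, input_mb, deadline_s}, topologically ordered
    resources : dict name -> {"speed_ops_s": ..., "free_at_s": ...}
    Illustrative only -- a real SLA-aware mapper is considerably more elaborate.
    """
    plan = {}
    for job in subjobs:
        best = None
        for rname, r in resources.items():
            transfer = job["input_mb"] / bandwidth_mb_s           # data staging time
            finish = r["free_at_s"] + transfer + job["work_ops"] / r["speed_ops_s"]
            if best is None or finish < best[1]:
                best = (rname, finish)
        rname, finish = best
        if finish > job["deadline_s"]:
            raise RuntimeError(f"SLA deadline violated for {job['name']}")
        resources[rname]["free_at_s"] = finish                    # reserve the resource
        plan[job["name"]] = (rname, finish)
    return plan
```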


2020 ◽  
Vol 245 ◽  
pp. 03027
Author(s):  
David Cameron ◽  
Vincent Garonne ◽  
Paul Millar ◽  
Shaojun Sun ◽  
Wenjing Wu

ATLAS@Home is a volunteer computing project which enables members of the public to contribute computing power to run simulations of the ATLAS experiment at CERN’s Large Hadron Collider. The computing resources provided to ATLAS@Home increasingly come not only from traditional volunteers, but also from data centres or office computers at institutes associated with ATLAS. The design of ATLAS@Home was built around not giving out sensitive credentials to volunteers, which means that a sandbox is needed to bridge data transfers between trusted and untrusted domains. As the scale of ATLAS@Home increases, this sandbox becomes a potential data management bottleneck. This paper explores solutions to this problem based on relaxing the constraints of sending credentials to trusted volunteers, allowing direct data transfer to grid storage and avoiding the intermediate sandbox. Fully trusted resources such as grid worker nodes can run with full access to grid storage, whereas semi-trusted resources such as student desktops can be provided with “macaroons”: time-limited access tokens which can only be used for specific files. The steps towards implementing these solutions, as well as initial results with real ATLAS simulation tasks, are discussed along with the experience gained so far and the next steps in the project.
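
Macaroons are chained-HMAC tokens whose caveats restrict where and for how long they can be used. The standard-library sketch below imitates only the end result, a token limited to one file path and an expiry time, and is not the actual macaroon format used by grid storage systems; the secret, path, and TTL are made up.

```python
import hashlib, hmac, time

SECRET = b"storage-element-key"          # hypothetical secret held by the storage element

def issue_token(path, ttl_s):
    """Issue a token valid only for `path` until now + ttl_s (a simplified
    stand-in for a macaroon carrying path and expiry caveats)."""
    expiry = int(time.time()) + ttl_s
    msg = f"{path}|{expiry}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{path}|{expiry}|{sig}"

def verify_token(token, requested_path):
    path, expiry, sig = token.rsplit("|", 2)
    msg = f"{path}|{expiry}".encode()
    good = hmac.compare_digest(sig, hmac.new(SECRET, msg, hashlib.sha256).hexdigest())
    return good and path == requested_path and time.time() < int(expiry)

token = issue_token("/atlas/sim/output.root", ttl_s=3600)
assert verify_token(token, "/atlas/sim/output.root")        # valid for this file only
assert not verify_token(token, "/atlas/sim/other.root")     # rejected for any other file
```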

