Improving data transfer for model coupling

2015 ◽  
Vol 8 (10) ◽  
pp. 8981-9020 ◽  
Author(s):  
C. Zhang ◽  
L. Liu ◽  
G. Yang ◽  
R. Li ◽  
B. Wang

Abstract. Data transfer, which means transferring data fields between two component models or rearranging data fields among processes of the same component model, is a fundamental operation of a coupler. Most state-of-the-art couplers currently use an implementation based on the point-to-point (P2P) communication of the Message Passing Interface (MPI) (referred to as the "P2P implementation" for short). In this paper, we reveal the drawbacks of the P2P implementation, including low communication bandwidth due to small message sizes, a variable and large number of MPI messages, and congestion during communication. To overcome these drawbacks, we propose a butterfly implementation for data transfer. Although the butterfly implementation can outperform the P2P implementation in many cases, it degrades performance in some cases because the total message size transferred by the butterfly implementation is larger than that of the P2P implementation. To improve data transfer in all cases, we design and implement an adaptive data transfer library that combines the advantages of both the butterfly implementation and the P2P implementation. Performance evaluation shows that the adaptive data transfer library significantly improves the performance of data transfer in most cases and does not decrease performance in any case. The adaptive data transfer library is now publicly available and has been incorporated into the coupler C-Coupler1 to improve its data transfer performance. We believe that it can also benefit other couplers.
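The contrast between the two implementations comes down to message counts: P2P can require one small message per sender/receiver pair, while a butterfly exchange runs log2(P) stages with one larger message per process per stage. A back-of-the-envelope sketch in Python (a hypothetical worst-case count for illustration, not a measurement from the paper):

```python
import math

def p2p_message_count(n_senders, n_receivers):
    # Worst case for P2P: every sender process holds data destined
    # for every receiver process, so up to n_senders * n_receivers
    # small messages cross the network.
    return n_senders * n_receivers

def butterfly_message_count(n_procs):
    # A butterfly exchange over P processes runs log2(P) stages; in
    # each stage every process exchanges one message with a partner
    # whose rank differs in exactly one bit.
    stages = int(math.log2(n_procs))
    return n_procs * stages

# With 64 processes on each side: 4096 P2P messages vs. 384
# butterfly messages (64 processes * 6 stages).
print(p2p_message_count(64, 64), butterfly_message_count(64))
```

The butterfly's messages are fewer but individually larger, which is exactly the trade-off the abstract identifies: higher per-message bandwidth at the cost of a larger total transferred volume.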

2016 ◽  
Vol 9 (6) ◽  
pp. 2099-2113
Author(s):  
Cheng Zhang ◽  
Li Liu ◽  
Guangwen Yang ◽  
Ruizhe Li ◽  
Bin Wang

Abstract. Data transfer means transferring data fields from a sender to a receiver. It is a fundamental and frequently used operation of a coupler. Most versions of state-of-the-art couplers currently use an implementation based on the point-to-point (P2P) communication of the Message Passing Interface (MPI) (referred to as the “P2P implementation” hereafter). In this paper, we reveal the drawbacks of the P2P implementation when the parallel decompositions of the sender and the receiver are different, including low communication bandwidth due to small message sizes, a variable and high number of MPI messages, as well as network contention. To overcome these drawbacks, we propose a butterfly implementation for data transfer. Although the butterfly implementation outperforms the P2P implementation in many cases, it degrades performance when the sender and the receiver have similar parallel decompositions or when the number of processes used for running the models is small. To ensure data transfer with optimal performance, we design and implement an adaptive data transfer library that combines the advantages of both the butterfly implementation and the P2P implementation. As the adaptive data transfer library automatically uses the better implementation for each transfer, it outperforms the P2P implementation in many cases while not decreasing performance in any case. The adaptive data transfer library is now publicly available and has been incorporated into the C-Coupler1 coupler to improve its data transfer performance. We believe that other couplers can also benefit from it.
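The abstract does not give the library's actual decision rule, but an adaptive chooser of this kind can be sketched with a simple latency/bandwidth cost model. Everything here (the model form and the default constants) is an illustrative assumption, not the paper's algorithm:

```python
import math

def choose_transfer(p2p_bytes, butterfly_bytes, p2p_messages, n_procs,
                    latency=1e-6, bandwidth=1e9):
    # Hypothetical cost model: time ~ latency * n_messages + bytes / bandwidth.
    # The butterfly typically moves more total bytes but far fewer messages.
    stages = max(1, int(math.log2(max(n_procs, 2))))
    t_p2p = latency * p2p_messages + p2p_bytes / bandwidth
    t_butterfly = latency * n_procs * stages + butterfly_bytes / bandwidth
    return "butterfly" if t_butterfly < t_p2p else "p2p"

# Many tiny messages between very different decompositions: butterfly wins.
print(choose_transfer(1e8, 2e8, 1_000_000, 1024))   # -> butterfly
# Similar decompositions, few processes, few messages: P2P wins.
print(choose_transfer(1e8, 3e8, 8, 8))              # -> p2p
```

This mirrors the behaviour the abstract describes: the message-count term dominates when decompositions differ across many processes, while the byte-volume term dominates when decompositions are similar or process counts are small.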


2009 ◽  
Vol 6 (2) ◽  
pp. 23
Author(s):  
Siti Arpah Ahmad ◽  
Mohamed Faidz Mohamed Said ◽  
Norazan Mohamed Ramli ◽  
Mohd Nasir Taib

This paper focuses on the performance of basic communication primitives, namely the overlap of message transfer with computation in point-to-point communication within a small cluster of four nodes. The mpptest benchmark was used to measure the basic performance of MPI message-passing routines with a variety of message sizes. The mpptest tool can measure performance with many participating processes, thus exposing contention and scalability problems, and it enables programmers to select message sizes in order to isolate and evaluate sudden changes in performance. Investigating these matters is interesting because non-blocking calls have the advantage of allowing the system to schedule communications even when many processes are running simultaneously. On the other hand, understanding the characteristics of computation and communication overlap is significant, because high-performance kernels often strive to achieve this overlap, since it benefits both data transfer and latency hiding. The results indicate that certain overlap sizes utilize greater node processing power in either blocking or non-blocking send and receive operations. The results provide a detailed MPI characterization of the performance of overlapping message transfer with computation in a small cluster system.
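The payoff of overlap can be expressed with a small timing model (an illustrative first-order model, not mpptest's methodology): blocking calls serialize communication and computation, while non-blocking calls let some fraction of the communication hide behind computation.

```python
def blocking_time(t_comm, t_comp):
    # Blocking MPI_Send/MPI_Recv: communication and computation
    # serialize, so their costs simply add.
    return t_comm + t_comp

def nonblocking_time(t_comm, t_comp, overlap=1.0, overhead=0.0):
    # Non-blocking MPI_Isend/MPI_Irecv + MPI_Wait: the fraction of
    # communication the system can progress in the background hides
    # behind computation; only the un-hidden remainder adds to the total.
    hidden = min(t_comm * overlap, t_comp)
    return t_comm + t_comp - hidden + overhead

# 2 units of communication fully hidden behind 5 units of computation:
print(blocking_time(2, 5))      # -> 7
print(nonblocking_time(2, 5))   # -> 5
```

The `overlap` parameter (a made-up knob for this sketch) captures the paper's point that the achievable overlap depends on message size: for some sizes the system progresses communication fully in the background, for others hardly at all.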


Author(s):  
Omer Subasi ◽  
Tatiana Martsinkevich ◽  
Ferad Zyulkyarov ◽  
Osman Unsal ◽  
Jesus Labarta ◽  
...  

We present a unified fault-tolerance framework for task-parallel message-passing applications to mitigate transient errors. First, we propose a fault-tolerant message-logging protocol that only requires the restart of the task that experienced the error and transparently handles any Message Passing Interface calls inside the task. In our experiments we demonstrate that our fault-tolerant solution has a reasonable overhead, with a maximum observed overhead of 4.5%. We also show that fine-grained parallelization is important for hiding the overheads related to the protocol as well as the recovery of tasks. Second, we develop a mathematical model that unifies task-level checkpointing and our protocol with system-wide checkpointing in order to provide complete failure coverage. We provide closed formulas for the optimal checkpointing interval and the performance score of the unified scheme. Experimental results show that the performance improvement can be as high as 98% with the unified scheme.
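The paper's own closed formulas are not reproduced in the abstract. As an illustration of the kind of result involved, here is Young's classic first-order approximation for the optimal system-wide checkpoint interval; this is a well-known stand-in for such derivations, not the unified scheme's actual formula:

```python
import math

def young_interval(checkpoint_cost, mtbf):
    # Young's first-order approximation: the interval between
    # checkpoints that minimizes expected lost work plus checkpoint
    # overhead is sqrt(2 * C * MTBF), where C is the cost of taking
    # one checkpoint and MTBF is the mean time between failures.
    return math.sqrt(2.0 * checkpoint_cost * mtbf)

# A 30 s checkpoint with a one-day MTBF suggests checkpointing
# roughly every 38 minutes.
print(young_interval(30, 86400))
```

A unified model like the paper's would additionally fold in the cheaper task-level restart path, shifting the optimum toward longer system-wide intervals.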


Author(s):  
Brendan J. Donegan ◽  
Daniel C. Doolan ◽  
Sabin Tabirca

The Mobile Message Passing Interface is a library which implements MPI functionality on Bluetooth-enabled mobile phones. It provides many of the functions available in MPI, including point-to-point and global communication. The main restriction of the library is that it was designed to work over Bluetooth piconets. Piconet-based networks allow a maximum of eight devices to be connected simultaneously, which limits the library's usefulness for parallel computing. A solution to this problem is presented that provides the same functionality as the original Mobile MPI library, but implemented over a Bluetooth scatternet. A scatternet may be defined as a number of piconets interconnected by common node(s). An outline of the scatternet design is explained and its major components are discussed.
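The capacity arithmetic behind the scatternet can be sketched quickly. Assuming a simple chain topology in which each piconet holds one master plus up to seven active slaves and adjacent piconets share a single bridge node (the paper's actual topology may differ), the piconet count grows as follows:

```python
import math

def piconets_needed(n_devices):
    # One piconet holds 1 master + up to 7 active slaves = 8 devices.
    # In a chain-style scatternet each additional piconet shares one
    # bridge node with its neighbour, so every piconet after the
    # first contributes at most 7 new devices.
    if n_devices <= 8:
        return 1
    return 1 + math.ceil((n_devices - 8) / 7)

# 8 phones fit in one piconet; a 9th forces a second piconet.
print(piconets_needed(8), piconets_needed(9))
```

This makes the restriction concrete: a single piconet caps a parallel job at eight phones, while a scatternet's size is limited only by how many bridge hops the routing layer tolerates.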


Author(s):  
Daniel C. Doolan ◽  
Sabin Tabirca ◽  
Laurence T. Yang

The Message Passing Interface (MPI) standardization effort began in 1992, and many implementations have since been developed. The MPICH library is one of the most well-known and freely available implementations. These libraries allow for the simplification of parallel computing on clusters and parallel machines. The system provides the developer with an easy-to-use set of functions for point-to-point and global communications. The details of how the actual communication takes place are hidden from the programmers, allowing them to focus on the domain-specific problem at hand. Communication between nodes on such systems is carried out via high-speed cabled interconnects (Gigabit Ethernet and upwards). Mobile computing, especially on mobile phones, is now ubiquitous. Mobile devices do not have any facility to allow for connections using traditional high-speed cabling; therefore, it is necessary to make use of wireless communication mechanisms to achieve interdevice communication. The majority of medium- to high-end phones are Bluetooth-enabled as standard, allowing for wireless communication to take place. The Mobile Message Passing Interface (MMPI) provides the developer with an intuitive set of functions to allow for communications between nodes (mobile phones) across a Bluetooth network. This chapter looks at the MMPI library and how it may be used for parallel computing on mobile phones (Smartphones).


2012 ◽  
Vol 9 (4) ◽  
pp. 1361-1383
Author(s):  
Yufei Lin ◽  
Xinhai Xu ◽  
Yuhua Tang ◽  
Xin Zhang ◽  
Xiaowei Guo

Designing and implementing a large-scale parallel system can be time-consuming and costly. It is therefore desirable to enable system developers to predict the performance of a parallel system at its design phase so that they can evaluate design alternatives to better meet performance requirements. Before the target machine is completely built, the developers can always build a symmetric multi-processor (SMP) for evaluation purposes. In this paper, we introduce an SMP-based discrete-event execution-driven performance simulation method for Message Passing Interface (MPI) programs and describe the design and implementation of a simulator called SMP-SIM. As the processes share the same memory space on an SMP, SMP-SIM manages events globally at the granularity of central processing units (CPUs). Furthermore, by re-implementing the core MPI point-to-point communication primitives, SMP-SIM handles communication virtually while executing the sequential computation directly. Our experimental results show that SMP-SIM is highly accurate and scalable, with errors of less than 7.60% for both SMP and SMP-Cluster target machines.
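The discrete-event core of such a simulator is a priority queue of timestamped events whose handlers may schedule further events. A minimal sketch of that engine (illustrative only; SMP-SIM's event model and MPI primitive handling are far richer):

```python
import heapq

def simulate(initial_events):
    # Minimal discrete-event engine: repeatedly pop the earliest
    # event and run its handler, which may schedule later events.
    # Events are (time, seq, name, handler); the integer seq breaks
    # timestamp ties so handlers themselves are never compared.
    queue = list(initial_events)
    heapq.heapify(queue)
    log = []
    while queue:
        time, _, name, handler = heapq.heappop(queue)
        log.append((time, name))
        for event in handler(time) or []:
            heapq.heappush(queue, event)
    return log

# Model one point-to-point message: a send at t=0 whose matching
# receive completes 5 time units later.
seq = iter(range(1000))
def send(t):
    return [(t + 5, next(seq), "recv", lambda _t: None)]

log = simulate([(0, next(seq), "send", send)])
print(log)  # [(0, 'send'), (5, 'recv')]
```

An execution-driven simulator like SMP-SIM would compute the "+5" from a network model while running the program's real computation between events; only the communication is virtualized.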


2020 ◽  
Vol 15 ◽  
Author(s):  
Weiwen Zhang ◽  
Long Wang ◽  
Theint Theint Aye ◽  
Juniarto Samsudin ◽  
Yongqing Zhu

Background: Genotype imputation as a service enables researchers to estimate genotypes on haplotyped data without performing whole-genome sequencing. However, genotype imputation is computation-intensive, and it remains a challenge to satisfy the high performance requirements of genome-wide association studies (GWAS). Objective: In this paper, we propose a high-performance computing solution for genotype imputation on supercomputers to enhance its execution performance. Method: We design and implement a multi-level parallelization that includes job-level, process-level and thread-level parallelization, enabled by job scheduling management, the Message Passing Interface (MPI) and OpenMP, respectively. It involves job distribution, chunk partition and execution, parallelized iteration for imputation, and data concatenation. This multi-level design lets us exploit the multi-machine/multi-core architecture to improve the performance of genotype imputation. Results: Experimental results show that our proposed method outperforms a Hadoop-based implementation of genotype imputation. Moreover, we conduct experiments on supercomputers to evaluate the performance of the proposed method. The evaluation shows that it can significantly shorten the execution time, thus improving the performance of genotype imputation. Conclusion: The proposed multi-level parallelization, when deployed as imputation as a service, will help bioinformatics researchers in Singapore conduct genotype imputation and enhance association studies.
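The chunk-partition / parallel-execution / concatenation pipeline at the thread level can be sketched with a thread pool. The per-chunk "imputation" below is a deliberately trivial stand-in (filling missing entries with the chunk's most common value), since the paper's actual statistical method is not described in the abstract:

```python
from concurrent.futures import ThreadPoolExecutor

def impute_chunk(chunk):
    # Stand-in for the real per-chunk imputation step: fill missing
    # genotypes (None) with the chunk's most common observed value.
    known = [g for g in chunk if g is not None]
    fill = max(set(known), key=known.count) if known else 0
    return [fill if g is None else g for g in chunk]

def impute(genotypes, chunk_size=4, workers=4):
    # Chunk partition -> parallel execution -> concatenation,
    # mirroring the thread level of the multi-level scheme.
    chunks = [genotypes[i:i + chunk_size]
              for i in range(0, len(genotypes), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(impute_chunk, chunks))
    return [g for chunk in results for g in chunk]

print(impute([1, None, 1, 0, None, 0, 0, 1], chunk_size=4))
```

In the full scheme the same partitioning idea recurs one level up: MPI processes own disjoint genomic regions, and the job scheduler distributes whole cohorts across machines.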


Energies ◽  
2021 ◽  
Vol 14 (8) ◽  
pp. 2284
Author(s):  
Krzysztof Przystupa ◽  
Mykola Beshley ◽  
Olena Hordiichuk-Bublivska ◽  
Marian Kyryk ◽  
Halyna Beshley ◽  
...  

The problem of analyzing large amounts of user data to determine user preferences and, based on these data, to recommend new products is important. Depending on the correctness and timeliness of the recommendations, significant profits or losses can result. The task of analyzing data on the users of a company's services is carried out by special recommendation systems. However, with a large number of users, the data to be processed becomes very large, which complicates the work of recommendation systems. For efficient data analysis in commercial systems, the Singular Value Decomposition (SVD) method can perform intelligent analysis of the information. For large amounts of processed information, we propose to use distributed systems. This approach reduces the time needed for data processing and for delivering recommendations to users. For the experimental study, we implemented the distributed SVD method using Message Passing Interface, Hadoop and Spark technologies, and obtained results showing reduced data processing time when using distributed systems compared to non-distributed ones.
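The numerical core that gets distributed is the SVD itself. A serial pure-Python sketch of its simplest piece, the dominant singular triple via power iteration on AᵀA (an illustrative kernel, not the paper's distributed algorithm):

```python
def rank1_svd(A, iters=100):
    # Power iteration on A^T A converges to the dominant right
    # singular vector v; then sigma = |A v| and u = A v / sigma.
    def matvec(M, x):
        return [sum(row[j] * x[j] for j in range(len(x))) for row in M]
    n = len(A[0])
    AT = [[A[i][j] for i in range(len(A))] for j in range(n)]
    v = [1.0] * n
    for _ in range(iters):
        w = matvec(AT, matvec(A, v))
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    Av = matvec(A, v)
    sigma = sum(x * x for x in Av) ** 0.5
    u = [x / sigma for x in Av]
    return sigma, u, v

# diag(2, 1) has singular values 2 and 1; the dominant one is 2.
sigma, u, v = rank1_svd([[2.0, 0.0], [0.0, 1.0]])
print(sigma)
```

In a user-item rating matrix, u and v are the first latent factors for users and items; the distributed implementations in the paper parallelize exactly these matrix-vector products across MPI processes or Spark partitions.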


1996 ◽  
Vol 22 (6) ◽  
pp. 789-828 ◽  
Author(s):  
William Gropp ◽  
Ewing Lusk ◽  
Nathan Doss ◽  
Anthony Skjellum
