rMPI: Message Passing on Multicore Processors with On-Chip Interconnect

Author(s):  
James Psota ◽  
Anant Agarwal
Author(s):  
Ram Prasad Mohanty ◽  
Ashok Kumar Turuk ◽  
Bibhudatta Sahoo

The growing number of cores increases the demand for a powerful memory subsystem, which in turn drives up cache sizes in multicore processors. Caches give the processing elements faster, higher-bandwidth local memory to work with. In this chapter, we analyze the impact of cache size on the performance of multicore processors by varying the L1 and L2 cache sizes of a multicore processor with an internal network (MPIN), referenced from the Niagara architecture. As the number of cores increases, traditional on-chip interconnects such as the bus and crossbar prove inefficient and suffer from poor scalability. To overcome the scalability and efficiency issues of these conventional interconnects, a ring-based design has been proposed. We analyze the effect of the interconnect on the performance of multicore processors and propose a novel scalable on-chip interconnection mechanism (INoC). Benchmark results are obtained with a full-system simulator and show that, compared with the MPIN, the proposed INoC significantly reduces execution time.
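To illustrate why bus- and crossbar-style interconnects lose ground as core counts grow, the toy model below compares average hop counts on a bidirectional ring against a fully connected crossbar. This is only a sketch of the scalability argument; the chapter's actual INoC design and simulator results are not reproduced here, and the function names are my own.

```python
def ring_avg_hops(n: int) -> float:
    """Average shortest-path hops between distinct cores on a bidirectional ring."""
    total = sum(min(abs(i - j), n - abs(i - j))
                for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))

def crossbar_avg_hops(n: int) -> float:
    """A crossbar connects every pair of cores directly: always one hop."""
    return 1.0

# A ring's average distance grows with n, while the crossbar stays flat
# (its cost is instead paid in wiring, which scales as n^2).
for n in (4, 8, 16, 64):
    print(n, crossbar_avg_hops(n), round(ring_avg_hops(n), 2))
```

The crossbar's constant latency comes at quadratic wiring cost, which is the efficiency problem the abstract refers to; the ring trades a linearly growing average distance for constant per-node wiring.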


2016 ◽  
Vol 15 (1) ◽  
pp. 33-36 ◽  
Author(s):  
P. Garcia ◽  
T. Gomes ◽  
J. Monteiro ◽  
A. Tavares ◽  
M. Ekpanyapong

2013 ◽  
Vol 2013 ◽  
pp. 1-7 ◽  
Author(s):  
Xizhong Wang ◽  
Deyun Chen

We introduce a parallel chaos-based encryption algorithm that takes advantage of multicore processors. The chaotic cryptosystem is generated by the piecewise linear chaotic map (PWLCM). The parallel algorithm is designed with a master/slave communication model using the Message Passing Interface (MPI), and it is suitable not only for multicore processors but also for single-processor architectures. Experimental results show that the chaos-based cryptosystem possesses good statistical properties. The parallel algorithm performs much better than the serial one and is useful for encrypting and decrypting large files or multimedia.
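The core of such a cryptosystem can be sketched as a PWLCM-driven keystream XORed with the plaintext. This is a minimal illustration of the map and the stream-cipher structure, not the authors' exact algorithm or parameters; in the MPI version, each slave would encrypt an independent chunk with its own seed, while here everything runs serially.

```python
def pwlcm(x: float, p: float) -> float:
    """Piecewise linear chaotic map on [0, 1) with control parameter p in (0, 0.5)."""
    if x >= 0.5:
        x = 1.0 - x        # the map is symmetric about 0.5
    return x / p if x < p else (x - p) / (0.5 - p)

def keystream(seed: float, p: float, n: int) -> bytes:
    """Iterate the map and quantize each state to one keystream byte."""
    out, x = bytearray(), seed
    for _ in range(n):
        x = pwlcm(x, p)
        out.append(int(x * 256) % 256)
    return bytes(out)

def crypt(data: bytes, seed: float, p: float) -> bytes:
    """XOR with the chaotic keystream; encryption and decryption are identical."""
    return bytes(a ^ b for a, b in zip(data, keystream(seed, p, len(data))))

msg = b"multicore chaos"
enc = crypt(msg, seed=0.3141, p=0.27)
assert crypt(enc, seed=0.3141, p=0.27) == msg  # round-trip recovers the plaintext
```

Because XOR is its own inverse, the same routine serves for both directions; the (seed, p) pair plays the role of the secret key.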


Author(s):  
Martin J. Chorley ◽  
David W. Walker ◽  
Martyn F. Guest

Hybrid programming, whereby shared-memory and message-passing programming techniques are combined within a single parallel application, has often been discussed as a method for increasing code performance on clusters of symmetric multiprocessors (SMPs). This paper examines whether the hybrid model brings any performance benefit for clusters based on multicore processors. A molecular dynamics application has been parallelized using both MPI and hybrid MPI/OpenMP programming models. The performance of this application has been examined on two high-end multicore clusters using both InfiniBand and Gigabit Ethernet interconnects. The hybrid model has been found to perform well on the higher-latency Gigabit Ethernet connection but offers no performance benefit on the low-latency InfiniBand interconnect. The changes in performance are attributed to the differing communication profiles of the hybrid and MPI codes.


2013 ◽  
Vol 22 (05) ◽  
pp. 1350036
Author(s):  
F. A. ESCOBAR-JUZGA ◽  
F. E. SEGURA-QUIJANO

Networks-on-Chip (NoCs) are commonly used to integrate complex embedded systems and multiprocessor platforms owing to their scalability and versatility. Functional-level modeling tools use SystemC to perform hardware–software co-design and error correction concurrently, reducing time to market. This work analyzes a JPEG encoding algorithm mapped onto a configurable M × N mesh/torus NoC platform described in SystemC with the transaction-level modeling (TLM) standard; timing constraints for both the router and the network interface controller are assigned according to a hardware description language (HDL) model written for this purpose. Processing nodes are also described as SystemC threads, and their computation delays are assigned according to the number and cost of the operations they perform. The programming model employed is message passing. We start by describing and profiling the JPEG algorithm as a task graph; four partitioning proposals are then mapped onto three NoCs of different sizes. Our analysis covers changes in topology, virtual-channel depth, routing algorithm, network speed, and task-to-node assignment. Through several high-level simulations we evaluate the impact of each parameter and show that, for the proposed model, most of the improvement comes from the algorithm partitioning.
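Among the routing algorithms such studies typically sweep, deterministic XY (dimension-order) routing on a mesh is the simplest baseline. The sketch below shows that algorithm only; the paper's router is a timed SystemC/TLM model, and the function name here is my own.

```python
def xy_route(src, dst):
    """Hop sequence from src to dst on a mesh: resolve X first, then Y.

    Coordinates are (x, y) tuples; XY routing is deadlock-free on a mesh
    because it never turns from the Y dimension back into X.
    """
    (sx, sy), (dx, dy) = src, dst
    path = [(sx, sy)]
    while sx != dx:                     # travel along the X dimension first
        sx += 1 if dx > sx else -1
        path.append((sx, sy))
    while sy != dy:                     # then along Y
        sy += 1 if dy > sy else -1
        path.append((sx, sy))
    return path

print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

A torus variant would additionally consider wrap-around links when choosing the direction in each dimension, shortening paths at the mesh edges.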


Electronics ◽  
2021 ◽  
Vol 10 (21) ◽  
pp. 2681
Author(s):  
Joonmoo Huh ◽  
Deokwoo Lee

Shared memory is the most popular parallel programming model for multicore processors, while message passing is generally used for large distributed machines. However, as the number of cores on a chip increases, the relative merits of shared memory versus message passing change, and we argue that message passing becomes a viable, high-performing parallel programming model. To test this hypothesis, we compare a shared-memory architecture with a new message-passing architecture on a suite of applications tuned for each system independently. Perhaps surprisingly, when optimized for both models the applications studied in this work behave very similarly, and both versions execute efficiently on multicore architectures despite their differing implementations. Furthermore, if the hardware is tuned to support message passing, with bulk message transfer, elimination of unnecessary coherence overheads, and effective support for global operations, then some applications perform much better on a message-passing architecture. Leveraging these insights, we design a message-passing architecture that supports both memory-to-memory and cache-to-cache messaging in hardware. With the new architecture, message passing outperforms its shared-memory counterpart on many of the applications, owing to the unique advantages of the message-passing hardware over cache coherence. In the best case, message passing is up to 34% faster than its shared-memory counterpart, and on average it is 10% faster. In the worst case, message passing is slower on two applications, CG (conjugate gradient) and FT (Fourier transform), because it handles their data-sharing patterns less well than shared memory does.
Overall, our analysis demonstrates the importance of considering message passing as a high-performing, hardware-supported programming model for future multicore architectures.
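The programming-model contrast the paper draws at the hardware level can be sketched in software: shared-memory code performs many fine-grained updates to shared data under synchronization, while message-passing code sends a few bulk messages. This toy (with invented worker names, using Python threads and a queue) only illustrates the two styles, not the paper's architectures or measurements.

```python
import queue
import threading

# Shared-memory style: workers update one global counter under a lock
# (a software analog of coherence traffic on fine-grained shared data).
counter = 0
lock = threading.Lock()

def shared_worker(items):
    global counter
    for x in items:
        with lock:
            counter += x

# Message-passing style: each worker sends a single bulk partial-sum
# message, and the master combines them; no fine-grained sharing.
inbox = queue.Queue()

def mp_worker(items):
    inbox.put(sum(items))

data = [list(range(i, i + 100)) for i in range(0, 400, 100)]

for target in (shared_worker, mp_worker):
    threads = [threading.Thread(target=target, args=(chunk,)) for chunk in data]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

mp_total = sum(inbox.get() for _ in data)
print(counter, mp_total)  # both equal sum(range(400)) = 79800
```

Both styles compute the same reduction; the paper's point is that when the bulk-transfer pattern is supported directly in hardware, the message-passing version can avoid coherence overhead entirely.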

