Optimized Communication Architecture of MPSoCs with a Hardware Scheduler

Diandian Zhang; Han Zhang; Jeronimo Castrillon; Torsten Kempf; Bart Vanthournout; Gerd Ascheid; Rainer Leupers

doi:10.4018/jertcs.2011070101

Optimized Communication Architecture of MPSoCs with a Hardware Scheduler

Adoption and Optimization of Embedded and Real-Time Communication Systems; 10.4018/978-1-4666-2776-5.ch009; 2013; pp. 163-180

Cited By ~ 1

Author(s): Diandian Zhang; Han Zhang; Jeronimo Castrillon; Torsten Kempf; Bart Vanthournout; ...

Keyword(s): High Performance; Programming Model; Real Life; Point Of View; Communication Architecture; Dynamic Task; Computational Performance; Systems On Chip; Promising Solution; On Chip

Efficient runtime resource management in multi-processor systems-on-chip (MPSoCs) for achieving high performance and low energy consumption is one of the key challenges for system designers. OSIP, an operating system application-specific instruction-set processor, together with its well-defined programming model, provides a promising solution. It delivers high computational performance for dynamic task scheduling and mapping. Being programmable, it can easily be adapted to different systems. However, distributing computation among the different processing elements adds complexity to the communication architecture, which tends to become the bottleneck of such systems. In this work, the authors highlight the vital importance of the communication architecture for OSIP-based systems and optimize it. Furthermore, the effects of OSIP and the communication architecture are investigated jointly from the system point of view, based on a broad case study with a real-life application (H.264) and a synthetic benchmark application.


Dynamic Task Distribution Model for On-Chip Reconfigurable High Speed Computing System

International Journal of Reconfigurable Computing; 10.1155/2015/783237; 2015; Vol 2015; pp. 1-12

Author(s): Mahendra Vucha; Arvind Rajawat

Keyword(s): High Speed; High Performance; Real Life; Computing System; Distribution Model; Design Parameters; Dynamic Task; Task Distribution; Field Programmable; On Chip

Modern embedded systems are being modeled as Reconfigurable High Speed Computing Systems (RHSCS), in which reconfigurable hardware, that is, a Field Programmable Gate Array (FPGA), and softcore processors configured on the FPGA act as computing elements. As system complexity increases, efficient task distribution methodologies are essential to obtain high performance. A dynamic task distribution methodology based on the Minimum Laxity First (MLF) policy (DTD-MLF) distributes the tasks of an application dynamically onto the RHSCS and utilizes the available RHSCS resources effectively. The DTD-MLF methodology takes advantage of the runtime design parameters of an application represented as a directed acyclic graph (DAG) and considers the attributes of the tasks in the DAG and of the computing resources when distributing the tasks onto the RHSCS. In this paper, we describe the DTD-MLF model and verify its effectiveness by distributing several real-life benchmark applications, represented as DAGs, onto an RHSCS configured on a Virtex-5 FPGA device. The performance of the MLF-based dynamic task distribution methodology is compared with that of a static task distribution methodology. The comparison shows that the dynamic task distribution model with the MLF criterion outperforms static task distribution techniques in terms of schedule length and effective utilization of the available RHSCS resources.
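The minimum-laxity-first idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' DTD-MLF implementation: the `Task` fields, the `mlf_schedule` function, the example task set, and the assumption of identical resources are all hypothetical and chosen only for illustration. Laxity is taken in its usual sense, deadline minus current time minus remaining execution time.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    # All fields are illustrative assumptions, not taken from the paper.
    name: str
    exec_time: int                 # execution time in abstract time units
    deadline: int                  # absolute deadline in the same units
    preds: list = field(default_factory=list)  # names of predecessor tasks


def mlf_schedule(tasks, num_resources):
    """List-schedule a task DAG onto identical resources, least laxity first."""
    by_name = {t.name: t for t in tasks}
    finish = {}                    # task name -> scheduled finish time
    free_at = [0] * num_resources  # time at which each resource becomes idle
    schedule = []                  # (task name, resource, start, finish)
    pending = set(by_name)

    while pending:
        # A task is ready once all of its predecessors have been scheduled.
        ready = [by_name[n] for n in pending
                 if all(p in finish for p in by_name[n].preds)]
        if not ready:
            raise ValueError("task graph contains a cycle or unknown predecessor")

        # Take the resource that becomes idle first.
        res = min(range(num_resources), key=lambda r: free_at[r])
        now = free_at[res]

        # Laxity = deadline - current time - execution time; ties break by name.
        task = min(ready, key=lambda t: (t.deadline - now - t.exec_time, t.name))

        # The task cannot start before its predecessors have finished.
        start = max([now] + [finish[p] for p in task.preds])
        end = start + task.exec_time
        finish[task.name] = end
        free_at[res] = end
        schedule.append((task.name, res, start, end))
        pending.remove(task.name)

    return schedule


# Hypothetical four-task DAG: A feeds B and C, which both feed D.
example_tasks = [
    Task("A", exec_time=2, deadline=10),
    Task("B", exec_time=3, deadline=6, preds=["A"]),
    Task("C", exec_time=1, deadline=5, preds=["A"]),
    Task("D", exec_time=2, deadline=12, preds=["B", "C"]),
]
result = mlf_schedule(example_tasks, num_resources=2)
schedule_length = max(end for _, _, _, end in result)
print(result, schedule_length)
```

The schedule length computed at the end corresponds to the metric the abstract uses for comparison. A static distribution would fix the task-to-resource mapping before execution instead of choosing it at each scheduling step.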


Methods for Achieving Maximum Efficiency of a Prototyping Platform for High-Performance Systems-on-Chip on Artificial Intelligence Tasks

Nanoindustry Russia; 10.22184/1993-8578.2020.13.3s.585.588; 2020; Vol 96 (3s); pp. 585-588

Author(s): С.Е. Фролова; Е.С. Янакова

Keyword(s): Neural Network; Artificial Intelligence; Computer Vision; High Performance; Systems On Chip; High Performance Systems; On Chip; Network Technologies; Neural Network Technologies

Methods are proposed for building prototyping platforms for high-performance systems-on-chip targeting artificial intelligence tasks. The requirements for platforms of this class and the principles for adapting an SoC design for implementation in the prototype are described, along with methods for debugging designs on the prototyping platform. Results are presented for computer vision algorithms using neural network technologies on an FPGA prototype of the ELcore semantic cores.


Framework for Design Exploration and Performance Analysis of RF-NoC Manycore Architecture

Journal of Low Power Electronics and Applications; 10.3390/jlpea10040037; 2020; Vol 10 (4); pp. 37

Author(s): Habiba Lahdhiri; Jordane Lorandel; Salvatore Monteleone; Emmanuelle Bourdel; Maurizio Palesi

Keyword(s): High Performance; Design Space Exploration; Routing Algorithm; Long Distance; Promising Solution; And Performance; On Chip; Many Core; High Degree; Real Traffic

The Network-on-Chip (NoC) paradigm has been proposed as a promising solution for handling the high degree of integration in multi-/many-core architectures. Despite their advantages, wired NoC infrastructures face several performance issues in multi-hop, long-distance communication. RF-NoC is an attractive solution offering high performance and multicast/broadcast capabilities. However, managing RF links is a critical aspect that depends on both application-dependent and architectural parameters. This paper proposes a design space exploration framework for OFDMA-based RF-NoC architectures, which combines real application benchmarks simulated using Sniper with an RF-NoC architecture modeled using Noxim. We used the proposed framework to fine-tune a routing algorithm working with real traffic, achieving a delay reduction of up to 45% compared to a wired NoC setup under similar conditions.


20 nm FDSOI process and design platforms for high performance/low power systems on chip

2012 IEEE International SOI Conference (SOI); 10.1109/soi.2012.6404361; 2012

Cited By ~ 1

Author(s): M. Haond

Keyword(s): Power Systems; Low Power; High Performance; Systems On Chip; On Chip; 20 Nm


A New Router Architecture for High-Performance Intrachip Networks

Journal of Integrated Circuits and Systems; 10.29292/jics.v3i1.278; 2008; Vol 3 (1); pp. 23-31

Author(s): Everton Carara; Ney Calazans; Fernando Moraes

Keyword(s): High Performance; Packet Switching; Circuit Switching; Transmission Method; Communication Architecture; Virtual Channels; Trade Offs; Communication Architectures; Session Layer; On Chip

For almost a decade now, Network on Chip (NoC) concepts have evolved to provide an interesting alternative to more traditional intrachip communication architectures (e.g. shared busses) for the design of complex Systems on Chip (SoCs). A considerable number of NoC proposals are available, focusing on different sets of optimization aspects related to specific classes of applications. Each such application employs a NoC as part of its underlying implementation infrastructure. Many of these optimization aspects target results such as Quality of Service (QoS) achievement and/or power consumption reduction. On the other hand, the use of NoCs brings new design problems to solve, such as the choice of synchronization method between NoC routers and the mapping of application modules. Although a wide range of NoC structures is already available, some design choices underlie many, if not most, NoC proposals. These include the use of wormhole packet switching and virtual channels. This work argues against this practice. It discusses the trade-offs of using circuit or packet switching, arguing in favor of the former with fixed-size packets (cells). Quantitative data supports the argument. The work also proposes and justifies replacing virtual channels with replicated channels, based on the abundance of wires in current and expected deep sub-micron technologies. Finally, the work proposes a transmission method that couples session layer structures with circuit switching to better support application implementation. The main reported result is a router with reduced latency and area, and a communication architecture adapted for high-performance applications.


System-Level Analysis of MPSoCs with a Hardware Scheduler

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Advancing Embedded Systems and Real-Time Communications with Emerging Technologies; 10.4018/978-1-4666-6034-2.ch014; 2014; pp. 335-367

Author(s): Diandian Zhang; Jeronimo Castrillon; Stefan Schürmans; Gerd Ascheid; Rainer Leupers; ...

Keyword(s): Resource Management; Distributed Memory; Point Of View; System Level; Video Decoder; Communication Architecture; Communication Architectures; Runtime Management; On Chip; Ip Blocks

Efficient runtime resource management in heterogeneous Multi-Processor Systems-on-Chip (MPSoCs) for achieving high performance and energy efficiency is one key challenge for system designers. In the past years, several IP blocks have been proposed that implement system-wide runtime task and resource management. As the processor count continues to increase, it is important to analyze the scalability of runtime managers at the system-level for different communication architectures. In this chapter, the authors analyze the scalability of an Application-Specific Instruction-Set Processor (ASIP) for runtime management called OSIP on two platform paradigms: shared and distributed memory. For the former, a generic bus is used as interconnect. For distributed memory, a Network-on-Chip (NoC) is used. The effects of OSIP and the communication architecture are jointly investigated from the system point of view, based on a broad case study with real applications (an H.264 video decoder and a digital receiver for wireless communications) and a synthetic benchmark application.


3D Stacked Cache Data Management for Energy Minimization of 3D Chip Multiprocessor

International Journal of Students Research in Technology & Management; 10.18510/ijsrtm.2015.325; 2015; Vol 3 (2); pp. 264-268

Author(s): K. Suresh Kumar; S. Anitha; M. Gayathri

Keyword(s): Temperature Distribution; High Performance; Chip Multiprocessors; Electrical Power; Chip Multiprocessor; Energy Reduction; Experimental Result; Data Mapping; Promising Solution; On Chip

A runtime cache data mapping is discussed for 3-D stacked L2 caches to minimize the overall energy of 3-D chip multiprocessors (CMPs). The suggested method considers both the temperature distribution and the memory traffic of 3-D CMPs. Experimental results show an energy reduction of up to 22.88% compared to an existing solution that considers only the temperature distribution. Recent trends envisage 3D Multi-Processor System-on-Chip (MPSoC) design as a promising solution to keep increasing the performance of next-generation high performance computing (HPC) systems. However, as the power density of HPC systems increases with the arrival of 3D MPSoCs, supplying electrical power to the computing equipment and constantly removing the generated heat is rapidly becoming the dominant cost in any HPC facility. An energy reduction of up to 19.55% is also reported.
