Design Aspects of Self-Organizing Heterogeneous Multi-Core Architectures (German title: Entwurfsaspekte selbstorganisierender, heterogener Multicore-Architekturen)

Abstract: Already today we face architectures featuring up to several hundred processors, able to manage several thousand concurrent threads. Future architectures, however, will not only see an increase in parallelism but also an increase in heterogeneity and reconfigurability. Judging from current production and prototype architectures, we also see that such systems will be tiled, i.e., composed of individual cores with local memory interconnected through some means of on-chip communication. Current discussions show that existing approaches to application mapping, parallelization, data locality optimization, and system management do not match these upcoming architectures well, thus hampering rather than harnessing the power of future systems. We will therefore outline the requirements of upcoming architectures and demonstrate how self-organization techniques, including bio-inspired ones, may help to manage system complexity. Key to these techniques is a sophisticated decentralized, hierarchical monitoring approach suitable for sustained real-time monitoring and event correlation for current and future high-performance architectures.


Scheduling computation and communication on a software-defined photonic Network-on-Chip architecture for high-performance real-time systems

Journal of Systems Architecture, 2018, Vol 90, pp. 54-71. DOI: 10.1016/j.sysarc.2018.07.007. Cited By ~ 5

Author(s): Hüseyin Temuçin, Kayhan M. İmre

Keyword(s): Real Time, High Performance, Network On Chip, Real Time Systems, Photonic Network, On Chip, Time Systems


Evaluation of a Connectionless Technique for System-on-Chip Interconnection

Journal of Circuits, Systems and Computers, 2016, Vol 25 (10), pp. 1630005. DOI: 10.1142/s0218126616300051. Cited By ~ 2

Author(s): Marcelo Daniel Berejuck, Antônio A. Fröhlich

Keyword(s): Real Time, High Performance, Communication Channel, Resource Reservation, Multimedia Applications, Worst Case, Average Latency, On Chip, Hard Real Time

We present the design and evaluation of a high-performance network-on-chip (NoC) focused on telecommunication and multimedia applications that tolerate latency and bandwidth variations. The design is based on a connectionless strategy in which flits from different communication flows are interleaved in the same communication channel. Each flit carries routing information that is used by routers to perform arbitration and scheduling of the corresponding output ports in order to balance channel utilization. In order to compare our approach with others, we introduce an analytic model for the worst-case latency (WCL) of our NoC and recall those of related approaches. Analytic comparisons and experimental data show that our approach keeps average WCL lower for variable-bit-rate multimedia applications than a network based on resource reservation. For these applications, the overall throughput is larger than that of networks that perform resource reservation. A case study based on the proposed NoC shows that the average latency was 28% lower than the WCL expected for the experiment. Indeed, hard real-time flows designed considering the absolute WCL of the network will always meet the requirements of the associated hard real-time tasks, so no deadline can be lost due to network contention.
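The connectionless strategy described in this abstract can be illustrated with a toy model. The sketch below is not the authors' router: the abstract does not state the arbitration policy, so plain round-robin is assumed here, and the `Flit` fields (`flow`, `dest`, `seq`) and the `interleave` function are hypothetical names chosen for illustration. It only shows how flits from different flows, each carrying its own routing information, can share one channel.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Flit:
    flow: str  # hypothetical flow identifier
    dest: int  # hypothetical output port, carried by every flit
    seq: int   # position of the flit within its flow


def interleave(flows: Dict[str, List[Flit]]) -> List[Flit]:
    """Merge several flows onto one channel, one flit per flow per round."""
    pending = deque(deque(flits) for flits in flows.values() if flits)
    channel: List[Flit] = []
    while pending:
        queue = pending.popleft()
        channel.append(queue.popleft())  # forward exactly one flit
        if queue:
            pending.append(queue)        # flow still has flits: back of the line
    return channel


video = [Flit("video", 2, i) for i in range(3)]
audio = [Flit("audio", 1, i) for i in range(1)]
order = [(f.flow, f.seq) for f in interleave({"video": video, "audio": audio})]
```

Because no flow reserves the channel, the short audio flow is served in the first round instead of waiting behind the whole video burst, which is the intuition behind comparing this style of network with one based on resource reservation.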


Network-On-Chip Topologies: Potentials, Technical Challenges, Recent Advances and Research Direction

2021. DOI: 10.5772/intechopen.97262

Author(s): Isiaka A. Alimi, Romil K. Patel, Oluyomi Aboderin, Abdelgader M. Abdalla, Ramoni A. Gbadamosi, ...

Keyword(s): High Performance, Research Direction, Network On Chip, Limiting Factor, Single Chip, Chip Design, Application Mapping, Technology Advancement, On Chip, Heterogeneous Cores

Advances in integration technology have reshaped the System-on-Chip (SoC), in which heterogeneous cores are supported on a single chip. Given the large number of supported heterogeneous cores, efficient communication between the associated processors has to be considered at all levels of the system design to ensure global interconnection. This can be achieved through a design-friendly, flexible, scalable, and high-performance interconnection architecture. It is noteworthy that the interconnections between multiple cores on a chip have a considerable influence on the performance and communication of the chip design regarding throughput, end-to-end delay, and packet loss ratio. Although hierarchical architectures have addressed the majority of the challenges associated with traditional interconnection techniques, their main limiting factor is scalability. Network-on-Chip (NoC) has been presented as a scalable and well-structured alternative solution that is capable of addressing communication issues in on-chip systems. In this context, several NoC topologies have been presented to support various routing techniques and attend to different chip architectural requirements. This book chapter reviews some of the existing NoC topologies and their associated characteristics. Also, application mapping algorithms and some key challenges of NoC are considered.


THAMON: Thermal-aware High-performance Application Mapping onto Opto-electrical network-on-chip

Journal of Systems Architecture, 2021, pp. 102315. DOI: 10.1016/j.sysarc.2021.102315

Author(s): Meisam Abdollahi, Yasaman Firouzabadi, Fatemeh Dehghani, Siamak Mohammadi

Keyword(s): High Performance, Network On Chip, Electrical Network, Application Mapping, On Chip, High Performance Application


Architecture Design for Feature Extraction and Template Matching in a Real-Time Iris Recognition System

Electronics, 2021, Vol 10 (3), pp. 241. DOI: 10.3390/electronics10030241

Author(s): Hau Ngo, Ryan Rakvic, Randy Broussard, Robert Ives, Matthew Carothers

Keyword(s): Feature Extraction, Real Time, Approximation Method, Template Matching, High Performance, Iris Recognition, Recognition System, Architecture Design, Pipeline Architecture, On Chip

Real-time support for an iris recognition algorithm is a considerable challenge for a portable system that is commonly used in the field. In this paper, an efficient parallel and pipeline architecture design for the feature extraction and template matching processes in the Ridge Energy Direction (RED) algorithm for iris recognition is presented. Several techniques used in the proposed architecture design to reduce the computational complexity while supporting a high performance capability include (i) a circle approximation method for the iris unwrapping process, (ii) a parallel design with an on-chip buffer for 2D convolution in the feature extraction process, and (iii) an approximation method for log2 and inverse-log2 conversion in the template matching process. Performance analysis shows that the proposed architecture achieves a speedup of 881 times compared to the conventional method. The proposed design can be integrated with an embedded microprocessor to realize a complete system-on-chip solution for a portable iris recognition system.
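The abstract does not say which approximation is used for the log2 and inverse-log2 conversion. As an illustration only, the sketch below uses Mitchell's classic piecewise-linear approximation, a common hardware-friendly choice that needs only a leading-one detector, shifts, and adds. It is an assumption that this resembles the paper's method, and the function names `approx_log2` and `approx_exp2` are hypothetical.

```python
import math


def approx_log2(x: int) -> float:
    """Mitchell's approximation: log2(2**k * (1 + m)) is taken as k + m, 0 <= m < 1."""
    if x <= 0:
        raise ValueError("x must be a positive integer")
    k = x.bit_length() - 1          # position of the leading one bit
    mantissa = x / (1 << k) - 1.0   # fractional part m, in [0, 1)
    return k + mantissa


def approx_exp2(y: float) -> float:
    """Inverse conversion: 2**(k + f) is taken as 2**k * (1 + f), 0 <= f < 1."""
    k = math.floor(y)
    frac = y - k
    return (2.0 ** k) * (1.0 + frac)


# Largest absolute error of the forward conversion over a sample range.
worst = max(abs(approx_log2(x) - math.log2(x)) for x in range(1, 4096))
```

The forward conversion is exact at powers of two and underestimates in between, by less than 0.09 over the sampled range. In the log domain a multiplication becomes an addition, which is why such conversions are attractive for matching hardware.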


Industrial robot arm controller based on programmable System-on-Chip device

FME Transactions, 2021, Vol 49 (4), pp. 1025-1034. DOI: 10.5937/fme2104025c

Author(s): Vo Cong

Keyword(s): Real Time, High Performance, Industrial Robot, Industrial Applications, System On Chip, Robot Arm, Single Chip, Arm Processor, Field Programmable, On Chip

Field-programmable gate arrays (FPGAs) and, more recently, System-on-Chip (SoC) devices have been applied in a wide range of applications because of their flexibility for real-time implementations, which increases both hardware processing capability and real-time processing speed. The most important applications based on FPGA/SoC devices focus on signal and image processing, Internet of Things (IoT) technology, artificial intelligence (AI) algorithms, energy systems, automatic control, and industrial applications. This paper develops a robot arm controller based on a programmable System-on-Chip (SoC) device that combines the high performance and flexibility of a CPU with the processing power of an FPGA. The CPU consists of a dual-core ARM processor that handles algorithm calculations and motion planning and manages communication and data manipulation. The FPGA is mainly used to generate the servo control signals and to read the feedback signals from the encoders. Data from the ARM processor is transferred to the programmable logic side via the AXI protocol. This combination delivers superior parallel-processing and computing power, real-time performance, and versatile connectivity. Additionally, having the complete controller on a single chip allows the hardware design to be simpler, more reliable, and less expensive.


Novel CNN-Based AP2D-Net Accelerator: An Area and Power Efficient Solution for Real-Time Applications on Mobile FPGA

Electronics, 2020, Vol 9 (5), pp. 832. DOI: 10.3390/electronics9050832. Cited By ~ 2

Author(s): Shuai Li, Kuangyuan Sun, Yukui Luo, Nandakishor Yadav, Ken Choi

Keyword(s): Real Time, Power Efficiency, High Performance, Memory Storage, Ultra Low Power, Power Efficient, Battery Capacity, Point Representation, On Chip, Better Than

Standard convolutional neural networks (CNNs) have large amounts of data redundancy, and the same accuracy can be obtained even with lower-bit weights instead of floating-point representation. Most CNNs have to be developed and executed on high-end GPU-based workstations, and it is hard to port the existing implementations to portable edge FPGAs because of the limited on-chip block memory size and battery capacity. In this paper, we present the adaptive pointwise convolution and 2D convolution joint network (AP2D-Net), an ultra-low-power and relatively high-throughput system combined with dynamic-precision weights and activations. Our system has high performance, and we make a trade-off between accuracy and power efficiency by adopting unmanned aerial vehicle (UAV) object detection scenarios. We evaluate our system on the Zynq UltraScale+ MPSoC Ultra96 mobile FPGA platform. The target board reaches a real-time speed of 30 fps under 5.6 W, and the FPGA on-chip power is only 0.6 W. The power efficiency of our system is 2.8× better than the best system design on a Jetson TX2 GPU and 1.9× better than the design on a PYNQ-Z1 SoC FPGA.
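The throughput and power figures quoted in this abstract can be combined into a single efficiency number. The abstract does not define how power efficiency is measured, so the sketch below assumes frames per second per watt (equivalently, frames per joule); the function name `frames_per_joule` is hypothetical. The 2.8× and 1.9× comparisons cannot be reproduced from the abstract, because the figures for the Jetson TX2 and PYNQ-Z1 designs are not given.

```python
def frames_per_joule(fps: float, watts: float) -> float:
    """Throughput divided by power; fps / W has the unit frames per joule."""
    if watts <= 0:
        raise ValueError("power must be positive")
    return fps / watts


# Figures taken from the abstract: 30 fps at 5.6 W for the whole board,
# with 0.6 W attributed to the FPGA fabric itself.
board_efficiency = frames_per_joule(30.0, 5.6)  # about 5.36 frames per joule
fpga_efficiency = frames_per_joule(30.0, 0.6)   # about 50 frames per joule
```

The gap between the two numbers shows why it matters whether an efficiency claim refers to the whole board or to the on-chip power alone.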


On Design and Application Mapping of a Network-on-Chip (NoC) Architecture

Parallel Processing Letters, 2008, Vol 18 (02), pp. 239-255. DOI: 10.1142/s0129626408003363. Cited By ~ 16

Author(s): Jun Ho Bahn, Seung Eun Lee, Yoon Seok Yang, Jungsook Yang, Nader Bagherzadeh

Keyword(s): Low Power, High Performance, Programming Model, Current System, Future Generation, Network On Chip, System Level, Processor Array, Application Mapping, On Chip

As the number of integrated IP cores in current System-on-Chips (SoCs) keeps increasing, communication requirements among cores cannot be sufficiently satisfied using either traditional or multi-layer bus architectures because of their poor scalability and the bandwidth limitation of a single bus. While new interconnection techniques have been explored to overcome this limitation, the notion of utilizing Network-on-Chip (NoC) technologies for the future generation of high-performance and low-power chips for a myriad of applications, in particular wireless communication and multimedia processing, has been of great importance. In order for NoC technologies to succeed, realistic specifications such as throughput, latency, moderate design complexity, programming model, and design tools are necessary requirements. For this purpose, we have covered some of the key and challenging design issues specific to the NoC architecture, such as the router design, network interface (NI) issues, and complete system-level modeling. In this paper, we propose a multi-processor system platform adopting NoC techniques, called NePA (Network-based Processor Array). As a component of the system platform, the fundamental NoC techniques, including the router architecture and generic NI, are defined and implemented adopting low-power and clock-efficient techniques. Using a high-level cycle-accurate simulation, various parameters relevant to its performance and its systematic modeling are extracted and analyzed. By combining the various systematic models developed, we construct the tool chain to pursue the hardware/software design tradeoffs necessary for a better understanding of the NoC techniques. Finally, using an implementation of parallel FFT algorithms on the homogeneous NePA, the feasibility and advantages of using NoC techniques are shown.


Neural Network Feature Detector for Real-Time Video Signal Processing

International Journal of Neural Systems, 1993, Vol 04 (04), pp. 337-349. DOI: 10.1142/s0129065793000286

Author(s): David Naylor, Simon Jones, David Myers, John Vincent

Keyword(s): Image Processing, Real Time, High Performance, Feature Recognition, Learning Algorithm, Neural System, Neural Learning, Network Feature, On Chip, Time Image

The application of artificial neural networks to real-time image processing tasks requires the use of dedicated, high-performance hardware. A linear array processor called HANNIBAL has been developed which implements the backpropagation neural learning algorithm on-chip. This paper considers the design of a complete neural system which integrates HANNIBAL into an existing image processing environment. The goals for the design of the system have been set partly by the primary application, namely feature recognition, but mainly by the desire for a flexible, high-performance hardware tool for the study and evaluation of a range of neural image processing applications.


Real Time Network on Chip (NOC) Architecture with CDMA Techniques with Audio Decoders

Oct. 17-19, 2017, Dubai (UAE), 2018. DOI: 10.15242/dirpub.dir1017020

Keyword(s): Real Time, Network On Chip, On Chip
