Design Aspects of Self-Organizing Heterogeneous Multi-Core Architectures

2008 ◽  
Vol 50 (5) ◽  
Author(s):  
Rainer Buchty ◽  
Wolfgang Karl

Already today we face architectures featuring up to several hundred processors, able to manage several thousand concurrent threads. Future architectures, however, will not only see an increase in parallelism but also an increase in heterogeneity and reconfigurability. Judging from current production and prototype architectures, we also see that such systems will be tiled, i.e., composed of individual cores with local memory interconnected through some means of on-chip communication. Current discussions show that existing approaches to application mapping, parallelization, data locality optimization, and system management do not match these upcoming architectures well and thus hamper rather than harness the power of future systems. We therefore outline the requirements of upcoming architectures and demonstrate how self-organization, including bio-inspired techniques, may help to manage system complexity. Key to these techniques is a sophisticated decentralized, hierarchical monitoring approach suitable for sustained real-time monitoring and event correlation in current and future high-performance architectures.

2016 ◽  
Vol 25 (10) ◽  
pp. 1630005 ◽  
Author(s):  
Marcelo Daniel Berejuck ◽  
Antônio A. Fröhlich

We present the design and evaluation of a high-performance network-on-chip (NoC) focused on telecommunication and multimedia applications that tolerate latency and bandwidth variations. The design is based on a connectionless strategy in which flits from different communication flows are interleaved in the same communication channel. Each flit carries routing information that is used by routers to perform arbitration and scheduling of the corresponding output ports in order to balance channel utilization. In order to compare our approach with others, we introduce an analytic model for the worst-case latency (WCL) of our NoC and recall those of related approaches. Analytic comparisons and experimental data show that our approach keeps average WCL lower for variable-bit-rate multimedia applications than a network based on resource reservation. For these applications, the overall throughput is larger than that of networks that perform resource reservation. A case study based on the proposed NoC shows that the average latency was 28% lower than the WCL expected for the experiment. Indeed, hard real-time flows designed considering the absolute WCL of the network will always meet the requirements of the associated hard real-time tasks, so no deadline can be lost due to network contention.


2021 ◽  
Author(s):  
Isiaka A. Alimi ◽  
Romil K. Patel ◽  
Oluyomi Aboderin ◽  
Abdelgader M. Abdalla ◽  
Ramoni A. Gbadamosi ◽  
...  

Advances in integration technology have shaped the System-on-Chip (SoC), in which heterogeneous cores are supported on a single chip. Given the large number of supported heterogeneous cores, efficient communication between the associated processors has to be considered at all levels of the system design to ensure global interconnection. This can be achieved through a design-friendly, flexible, scalable, and high-performance interconnection architecture. It is noteworthy that the interconnections between multiple cores on a chip considerably influence the performance and communication of the chip design in terms of throughput, end-to-end delay, and packet loss ratio. Although hierarchical architectures have addressed the majority of the challenges associated with traditional interconnection techniques, their main limiting factor is scalability. Network-on-Chip (NoC) has been presented as a scalable and well-structured alternative that is capable of addressing communication issues in on-chip systems. In this context, several NoC topologies have been presented to support various routing techniques and address different chip architectural requirements. This book chapter reviews some of the existing NoC topologies and their associated characteristics. Application mapping algorithms and some key challenges of NoC are also considered.


Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 241
Author(s):  
Hau Ngo ◽  
Ryan Rakvic ◽  
Randy Broussard ◽  
Robert Ives ◽  
Matthew Carothers

Real-time support for an iris recognition algorithm is a considerable challenge for a portable system that is commonly used in the field. In this paper, an efficient parallel and pipeline architecture design for the feature extraction and template matching processes in the Ridge Energy Direction (RED) algorithm for iris recognition is presented. Several techniques used in the proposed architecture design to reduce the computational complexity while supporting a high performance capability include (i) a circle approximation method for the iris unwrapping process, (ii) a parallel design with an on-chip buffer for 2D convolution in the feature extraction process, and (iii) an approximation method for log2 and inverse-log2 conversion in the template matching process. Performance analysis shows that the proposed architecture achieves a speedup of 881 times compared to the conventional method. The proposed design can be integrated with an embedded microprocessor to realize a complete system-on-chip solution for a portable iris recognition system.
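The abstract does not say which log2 and inverse-log2 approximation the authors use. As an illustration only, the sketch below uses Mitchell's classic piecewise-linear approximation, a common hardware-friendly choice, to show how such a conversion can replace a multiplication with an addition in the log domain. The function names `approx_log2`, `approx_exp2`, and `approx_multiply` are hypothetical and are not taken from the paper.

```python
import math


def approx_log2(x: int) -> float:
    """Mitchell-style approximation (an assumption, not the paper's method):
    log2(x) ~ k + (x / 2**k - 1), where k is the index of the leading one bit."""
    if x <= 0:
        raise ValueError("x must be a positive integer")
    k = x.bit_length() - 1         # integer part: position of the leading one
    fraction = x / (1 << k) - 1.0  # linear stand-in for log2 of the mantissa, in [0, 1)
    return k + fraction


def approx_exp2(y: float) -> float:
    """Inverse conversion: 2**y ~ 2**k * (1 + f), with k = floor(y), f = y - k."""
    k = math.floor(y)
    f = y - k
    return (1.0 + f) * 2.0 ** k


def approx_multiply(a: int, b: int) -> float:
    """Approximate a * b by adding approximate logs and converting back."""
    return approx_exp2(approx_log2(a) + approx_log2(b))


if __name__ == "__main__":
    for a, b in [(8, 4), (3, 3), (100, 37)]:
        print(a, b, a * b, approx_multiply(a, b))
```

In hardware, the leading-one detection and the shift map to a priority encoder and a barrel shifter, which is why this family of approximations is attractive when multipliers are expensive. The price is a bounded error: the result is exact for powers of two and underestimates other products by up to roughly 11%.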


2021 ◽  
Vol 49 (4) ◽  
pp. 1025-1034
Author(s):  
Vo Cong

Field-programmable gate arrays (FPGAs) and, more recently, System-on-Chip (SoC) devices have been applied in a wide range of applications because of their flexibility for real-time implementations, which increases both the processing capability in hardware and the speed of processing information in real time. The most important applications based on FPGA/SoC devices focus on signal/image processing, Internet of Things (IoT) technology, artificial intelligence (AI) algorithms, energy systems, automatic control, and industrial applications. This paper develops a robot arm controller based on a programmable System-on-Chip (SoC) device that combines the high performance and flexibility of a CPU with the processing power of an FPGA. The CPU consists of a dual-core ARM processor that handles algorithm calculations and motion planning and manages communication and data manipulation. The FPGA is mainly used to generate the signals that control the servos and to read the feedback signals from the encoders. Data from the ARM processor is transferred to the programmable logic side via the AXI protocol. This combination delivers superior parallel-processing and computing power, real-time performance, and versatile connectivity. Additionally, having the complete controller on a single chip allows the hardware design to be simpler, more reliable, and less expensive.


Electronics ◽  
2020 ◽  
Vol 9 (5) ◽  
pp. 832 ◽  
Author(s):  
Shuai Li ◽  
Kuangyuan Sun ◽  
Yukui Luo ◽  
Nandakishor Yadav ◽  
Ken Choi

Standard convolutional neural networks (CNNs) contain a large amount of data redundancy, and the same accuracy can be obtained even with lower-bit weights instead of a floating-point representation. Most CNNs have to be developed and executed on high-end GPU-based workstations, and it is hard to port the existing implementations to portable edge FPGAs because of the limited on-chip block memory size and battery capacity. In this paper, we present the adaptive pointwise convolution and 2D convolution joint network (AP2D-Net), an ultra-low-power and relatively high-throughput system combined with dynamic-precision weights and activations. Our system has high performance, and we make a trade-off between accuracy and power efficiency by adopting unmanned aerial vehicle (UAV) object detection scenarios. We evaluate our system on the Zynq UltraScale+ MPSoC Ultra96 mobile FPGA platform. The target board achieves a real-time speed of 30 fps under 5.6 W, and the FPGA on-chip power is only 0.6 W. The power efficiency of our system is 2.8× better than the best system design on a Jetson TX2 GPU and 1.9× better than the design on a PYNQ-Z1 SoC FPGA.


2008 ◽  
Vol 18 (02) ◽  
pp. 239-255 ◽  
Author(s):  
JUN HO BAHN ◽  
SEUNG EUN LEE ◽  
YOON SEOK YANG ◽  
JUNGSOOK YANG ◽  
NADER BAGHERZADEH

As the number of integrated IP cores in current Systems-on-Chip (SoCs) keeps increasing, communication requirements among cores cannot be sufficiently satisfied using either traditional or multi-layer bus architectures because of their poor scalability and the bandwidth limitation of a single bus. While new interconnection techniques have been explored to overcome this limitation, the notion of utilizing Network-on-Chip (NoC) technologies for the future generation of high-performance and low-power chips for a myriad of applications, in particular wireless communication and multimedia processing, has been of great importance. For NoC technologies to succeed, realistic specifications such as throughput, latency, moderate design complexity, programming model, and design tools are necessary requirements. For this purpose, we have covered some of the key and challenging design issues specific to the NoC architecture, such as the router design, network interface (NI) issues, and complete system-level modeling. In this paper, we propose a multi-processor system platform adopting NoC techniques, called NePA (Network-based Processor Array). As components of the system platform, the fundamental NoC techniques, including the router architecture and a generic NI, are defined and implemented using low-power and clock-efficient techniques. Using a high-level cycle-accurate simulation, various parameters relevant to its performance and its systematic modeling are extracted and analyzed. By combining the various systematic models developed, we construct a tool chain for exploring the hardware/software design tradeoffs necessary for a better understanding of the NoC techniques. Finally, using an implementation of parallel FFT algorithms on the homogeneous NePA, the feasibility and advantages of using NoC techniques are shown.


1993 ◽  
Vol 04 (04) ◽  
pp. 337-349
Author(s):  
DAVID NAYLOR ◽  
SIMON JONES ◽  
DAVID MYERS ◽  
JOHN VINCENT

The application of artificial neural networks to real-time image processing tasks requires the use of dedicated, high-performance hardware. A linear array processor called HANNIBAL has been developed which implements the backpropagation neural learning algorithm on-chip. This paper considers the design of a complete neural system which integrates HANNIBAL into an existing image processing environment. The goals for the design of the system have been set partly by the primary application, namely feature recognition, but mainly by the desire for a flexible, high-performance hardware tool for the study and evaluation of a range of neural image processing applications.

