Improving Performance-Power-Programmability in Space Avionics with Edge Devices: VBN on Myriad2 SoC

The advent of powerful edge devices and AI algorithms has already revolutionized many terrestrial applications; however, for both technical and historical reasons, the space industry is still striving to adopt these key enabling technologies in new mission concepts. In this context, the current work evaluates a heterogeneous multi-core system-on-chip processor for use on-board future spacecraft to support novel, computationally demanding digital signal processing and AI functionalities. Given the importance of low power consumption in satellites, we consider the Intel Movidius Myriad2 system-on-chip and focus on software development and performance aspects. We design a methodology and framework to accommodate efficient partitioning, mapping, parallelization, code optimization, and tuning of complex algorithms. Furthermore, we propose an avionics architecture combining this commercial off-the-shelf chip with a field programmable gate array device to facilitate, among other things, interfacing with traditional space instruments via SpaceWire transcoding. We prototype our architecture in the lab, targeting vision-based navigation tasks. We implement a representative computer vision pipeline to track the 6D pose of ENVISAT using megapixel images during hypothetical spacecraft proximity operations. Overall, we achieve 2.6 to 4.9 FPS with only 0.8 to 1.1 W on Myriad2, i.e., a 10-fold acceleration versus modern rad-hard processors. Based on the results, we assess various benefits of utilizing Myriad2 instead of conventional field programmable gate arrays and CPUs.
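
As a back-of-the-envelope reading of the figures above, the reported throughput and power ranges bound the energy spent per processed frame. The abstract does not say which power figure goes with which frame rate, so the sketch below only computes the extreme combinations; the variable and function names are illustrative assumptions, not part of the original work.

```python
# Bounds on energy per frame from the ranges quoted in the abstract.
# Assumption: the FPS/power pairing is unknown, so only the two extreme
# combinations are computed. All names here are illustrative.

FPS_RANGE = (2.6, 4.9)      # frames per second (min, max), from the abstract
POWER_W_RANGE = (0.8, 1.1)  # watts (min, max), from the abstract


def energy_per_frame_bounds(fps_range, power_range):
    """Return (lowest, highest) possible joules per frame for the given ranges."""
    fps_min, fps_max = fps_range
    p_min, p_max = power_range
    lowest = p_min / fps_max    # least power at the highest frame rate
    highest = p_max / fps_min   # most power at the lowest frame rate
    return lowest, highest


if __name__ == "__main__":
    lo, hi = energy_per_frame_bounds(FPS_RANGE, POWER_W_RANGE)
    print(f"energy per frame lies between {lo:.3f} J and {hi:.3f} J")
```

Under these assumptions, every operating point in the reported ranges costs well under one joule per megapixel frame.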


FPGA Prototyping of Micro-Blaze soft-processor based Multi-core System on Chip

International Journal of Engineering & Technology ◽ 10.14419/ijet.v7i2.16.11416 ◽ 2018 ◽ Vol 7 (2.16) ◽ pp. 57

Author(s): G Prasad Acharya ◽ M Asha Rani

Keyword(s): Computer Aided Design ◽ Processing System ◽ System On Chip ◽ Core System ◽ Design Cycle ◽ Field Programmable ◽ Programmable Gate Arrays ◽ On Chip ◽ Aided Design ◽ Level Parallelism

The increased demand for processor-level parallelism has multiplied the challenges SoC designers face in designing, simulating, and verifying/validating today’s multi-core System-on-Chip (SoC), owing to increased system complexity. There is also a need to reduce the design cycle time of a complex multi-core SoC so that the product can be brought to market within an affordable time. Computer-Aided Design (CAD) tools and Field Programmable Gate Arrays (FPGAs) provide a solution for rapidly prototyping and validating the system. This paper presents an implementation of a multi-core SoC consisting of 6 Xilinx Micro-Blaze soft-core processors integrated with the Zynq Processing System (PS) using IP Integrator; the cores communicate through the AXI bus. The functionality of the system is verified using the Micro-Blaze system debugger. The hardware framework for the system is implemented and verified on an FPGA.


High-Performance Time Server Core for FPGA System-on-Chip

Electronics ◽ 10.3390/electronics8050528 ◽ 2019 ◽ Vol 8 (5) ◽ pp. 528 ◽ Cited By ~ 1

Author(s): Julian Viejo ◽ Jorge Juan-Chico ◽ Manuel J. Bellido ◽ Paulino Ruiz-de-Clavijo ◽ David Guerrero ◽ ...

Keyword(s): High Performance ◽ System On Chip ◽ The Core ◽ Performance Time ◽ Network Time ◽ Wide Range ◽ Network Time Protocol ◽ Field Programmable ◽ And Performance ◽ On Chip

This paper presents the complete design and implementation of a low-cost, low-footprint network time protocol server core for field programmable gate arrays. The core uses a carefully designed modular architecture, which is fully implemented in hardware using digital circuits and systems. The most notable novelties introduced are a hardware-optimized implementation of the timekeeping algorithm, a full-hardware protocol stack, and automatic network configuration. As a result, the core is able to achieve accuracy and performance similar to typical high-performance network time protocol server equipment. The core uses a standard global positioning system receiver as its time reference, has a small footprint, and can easily fit in a low-range field-programmable chip, greatly scaling down from previous system-on-chip time synchronization systems. Accuracy and performance results show that the core can serve hundreds of thousands of network time clients with negligible accuracy degradation, in contrast to state-of-the-art high-performance time server equipment. Therefore, this core provides a valuable time server solution for a wide range of emerging embedded and distributed network applications such as the Internet of Things and the smart grid, at a fraction of the cost and footprint of current discrete and embedded solutions.


Low-Process–Voltage–Temperature-Sensitivity Multi-Stage Timing Monitor for System-on-Chip Applications

Electronics ◽ 10.3390/electronics10131587 ◽ 2021 ◽ Vol 10 (13) ◽ pp. 1587

Author(s): Duo Sheng ◽ Hsueh-Ru Lin ◽ Li Tai

Keyword(s): High Performance ◽ Power Reduction ◽ System On Chip ◽ Timing Information ◽ Multi Stage ◽ Dynamic Voltage ◽ And Performance ◽ On Chip ◽ Maximum Measurement ◽ Maximum Measurement Error

High-performance, complex system-on-chip (SoC) designs require a high-throughput and stable timing monitor to reduce the impact of timing uncertainty and to implement a dynamic voltage and frequency scaling (DVFS) scheme for overall power reduction. This paper presents a multi-stage timing monitor that combines three timing-monitoring stages to achieve a high timing-monitoring resolution and a wide timing-monitoring range simultaneously. Additionally, because the proposed timing monitor has high immunity to process–voltage–temperature (PVT) variation, it provides more stable time-monitoring results. The time-monitoring resolution and range of the proposed timing monitor are 47 ps and 2.2 µs, respectively, and the maximum measurement error is 0.06%. Therefore, the proposed multi-stage timing monitor not only provides the timing information of the specified signals to maintain the functionality and performance of the SoC, but also makes the operation of the DVFS scheme more efficient and accurate in SoC design.
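
To put the quoted resolution and range in perspective, the short sketch below counts how many distinct 47 ps steps are needed to cover 2.2 µs and how wide a binary output word that implies. This is simple arithmetic on the two figures from the abstract; it makes no claim about how the three stages of the actual monitor divide the work, and the function names are illustrative assumptions.

```python
import math

# Figures quoted in the abstract.
RESOLUTION_S = 47e-12  # timing-monitoring resolution: 47 ps
RANGE_S = 2.2e-6       # timing-monitoring range: 2.2 µs


def quantization_steps(range_s: float, resolution_s: float) -> int:
    """Number of resolution-sized steps needed to span the full range."""
    return math.ceil(range_s / resolution_s)


def output_bits(steps: int) -> int:
    """Minimum width of a binary code that can label `steps` distinct values."""
    return max(1, (steps - 1).bit_length())


if __name__ == "__main__":
    steps = quantization_steps(RANGE_S, RESOLUTION_S)
    print(f"{steps} steps, {output_bits(steps)}-bit code")
```

A single flat stage with that many steps would be costly, which is one plausible motivation for splitting the measurement into coarse and fine stages.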


A System-On-Chip Approach in Designing a Dedicated RISC Microcontroller Unit Using the Field-Programmable Gate Array

2010 Fifth International Conference on Systems ◽ 10.1109/icons.2010.40 ◽ 2010

Author(s): Elena Roxana Buhus ◽ Alexandru Lazar ◽ Adriano Tavares

Keyword(s): Field Programmable Gate Array ◽ System On Chip ◽ Field Programmable ◽ Gate Array ◽ On Chip ◽ Microcontroller Unit


A Modular and Distributed Setup for Power and Performance Analysis of Multi-Processor System-on-Chip at Electronic System Level

2020 IEEE 39th International Performance Computing and Communications Conference (IPCCC) ◽ 10.1109/ipccc50635.2020.9391516 ◽ 2020

Author(s): Muhammad Mudussir Ayub ◽ Franz Kreupl

Keyword(s): Performance Analysis ◽ Electronic System ◽ System On Chip ◽ System Level ◽ Electronic System Level ◽ And Performance ◽ On Chip


Argus CNN Accelerator Based on Kernel Clustering and Resource-Aware Pruning

Elektronika ir Elektrotechnika ◽ 10.5755/j02.eie.28922 ◽ 2021 ◽ Vol 27 (3) ◽ pp. 57-70

Author(s): Damjan M. Rakanovic ◽ Vuk Vranjkovic ◽ Rastislav J. R. Struharik

Keyword(s): Digital Signal Processor ◽ State Of The Art ◽ Digital Signal ◽ Pruning Algorithm ◽ Kernel Clustering ◽ Field Programmable ◽ Comparable Performance ◽ On Chip ◽ Resource Characteristics ◽ Resource Aware

This paper proposes a two-step Convolutional Neural Network (CNN) pruning algorithm and a resource-efficient Field-Programmable Gate Array (FPGA) CNN accelerator named “Argus”. The proposed CNN pruning algorithm first combines similar kernels into clusters, which are then pruned using the same regular pruning pattern. The pruning algorithm is carefully tailored for FPGAs, considering their resource characteristics. Regular sparsity results in high Multiply-Accumulate (MAC) efficiency, reducing the amount of logic required to balance workloads among different MAC units. As a result, the Argus accelerator requires about 170 Look-Up Tables (LUTs) per Digital Signal Processor (DSP) block. This number is close to the average LUT/DSP ratio for various FPGA families, enabling balanced resource utilization when implementing Argus. Benchmarks conducted using a Xilinx Zynq UltraScale+ Multi-Processor System-on-Chip (MPSoC) indicate that Argus achieves up to 25 times higher frames per second than NullHop, 2 and 2.5 times higher than NEURAghe and Snowflake, respectively, and 2 times higher than NVDLA. Argus shows comparable performance to MIT’s Eyeriss v2 and Caffeine, requiring up to 3 times less memory bandwidth and utilizing 4 times fewer DSP blocks, respectively. Besides the absolute performance, Argus has at least 1.3 and 2 times better GOP/s/DSP and GOP/s/Block-RAM (BRAM) ratios, while being competitive in terms of GOP/s/LUT, compared to some of the state-of-the-art solutions.


A secure field programmable gate array based System-on-Chip for Telemedicine application

International Conference on Information Society (i-Society 2011) ◽ 10.1109/i-society18435.2011.5978518 ◽ 2011

Author(s): Norashikin M. Thamrin ◽ Illiasaak Ahmad ◽ Mohamed Khalil Hani

Keyword(s): Field Programmable Gate Array ◽ System On Chip ◽ Telemedicine Application ◽ Field Programmable ◽ Gate Array ◽ On Chip


Low-Complexity Nonlinear Self-Inverse Permutation for Creating Physically Clone-Resistant Identities

Cryptography ◽ 10.3390/cryptography4010006 ◽ 2020 ◽ Vol 4 (1) ◽ pp. 6 ◽ Cited By ~ 1

Author(s): Saleh Mulhem ◽ Ayoub Mars ◽ Wael Adi

Keyword(s): Field Programmable Gate Arrays ◽ Low Complexity ◽ System On Chip ◽ Large Classes ◽ Physical Unclonable Functions ◽ Gate Arrays ◽ Field Programmable ◽ Programmable Gate Arrays ◽ Security Levels ◽ On Chip

New large classes of permutations over ℤ_{2^n}, based on T-Functions, are presented as Self-Inverting Permutation Functions (SIPFs). The presented classes exhibit negligible or low complexity when implemented in emerging FPGA technologies. The target use of such functions is in creating so-called Secret Unknown Ciphers (SUCs) to serve as resilient clone-resistant structures in smart non-volatile Field Programmable Gate Array (FPGA) devices. SUC concepts were proposed a decade ago as digital, consistent alternatives to the conventional analog, inconsistent Physical Unclonable Functions (PUFs). The proposed permutation classes are designed and optimized particularly to use non-consumed Mathblock cores in programmable System-on-Chip (SoC) FPGA devices. Hardware and software complexities for realizing such structures are optimized and evaluated for a sample expected target FPGA technology. The attained security levels of the resulting SUCs are evaluated and shown to be scalable and usable even for post-quantum crypto systems.
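
To make the self-inverse property concrete, here is a minimal sketch of one nonlinear self-inverting T-function on n-bit words. It is not the construction from the paper: the name `sipf`, the 8-bit word width, the 4-bit split, and the quadratic mixing term are all assumptions chosen for illustration. The low bits pass through unchanged and a nonlinear function of them is XORed into the high bits, so each output bit depends only on input bits at the same or lower positions (the T-function property), and applying the map twice cancels the mask.

```python
# Illustrative self-inverse T-function permutation on N_BITS-bit words.
# Assumption: this is a toy example, NOT the permutation classes of the paper.

N_BITS = 8                        # word width n (illustrative)
LOW_BITS = 4                      # number of low bits left untouched (illustrative)
WORD_MASK = (1 << N_BITS) - 1
LOW_MASK = (1 << LOW_BITS) - 1


def sipf(x: int) -> int:
    """Map x (0 <= x < 2**N_BITS) to its partner; sipf(sipf(x)) == x."""
    low = x & LOW_MASK
    # Nonlinear (quadratic) function of the low bits only.
    mix = low * low + low + 1
    # The shifted mask has zeros in the low bits, so `low` is preserved and
    # a second application recomputes the same mask and undoes the XOR.
    return (x ^ (mix << LOW_BITS)) & WORD_MASK


if __name__ == "__main__":
    domain = range(1 << N_BITS)
    assert all(sipf(sipf(x)) == x for x in domain)           # involution
    assert sorted(sipf(x) for x in domain) == list(domain)   # bijection
    print("sipf is a self-inverse permutation on", len(domain), "values")
```

Because the same circuit both encrypts and decrypts, an involution of this kind needs no separate inverse logic, which is the general appeal of self-inverting functions in resource-constrained hardware.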
