Hardware-Assisted Security Monitoring Unit for Real-Time Ensuring Secure Instruction Execution and Data Processing in Embedded Systems

The hardware security of embedded systems is raising growing concern in numerous safety-critical applications, such as automotive, aerospace, avionic, and railway systems. Embedded systems are gaining popularity in these safety-sensitive sectors because their high performance, low power, and high reliability make them ideal control platforms for instruction execution and data processing. However, modern embedded systems still expose many potential hardware vulnerabilities to malicious attacks, both software-level and hardware-level, which can cause program execution failure and confidential data leakage. For this reason, this paper presents a novel embedded system that integrates a hardware-assisted security monitoring unit (SMU) to form a reinforced system-on-chip (SoC) that secures program execution and data processing. The architecture was implemented and evaluated on a Xilinx Virtex-5 FPGA development board. The SMU hardware implementation was evaluated in terms of performance overhead, security capability, and resource consumption. The experimental results indicate that the SMU does not significantly slow the processor while executing different benchmarks, and its average performance overhead falls to 2.18% with typical 8-KB I/D-Caches. The security capability evaluation confirms the monitoring effectiveness of the SMU against both instruction and data tampering attacks. Meanwhile, the SoC strikes a good balance between high security and resource overhead.

Adaptivity in high-performance embedded systems: a reactive control model for reliable and flexible design

The Knowledge Engineering Review ◽

10.1017/s0269888914000150 ◽

2014 ◽

Vol 29 (4) ◽

pp. 433-451

Author(s):

Huafeng Yu ◽

Abdoulaye Gamatié ◽

Éric Rutten ◽

Jean-Luc Dekeyser

Keyword(s):

Embedded Systems ◽

Embedded System ◽

Code Generation ◽

High Performance ◽

Control Model ◽

Multimedia System ◽

Reactive Control ◽

Data Intensive ◽

On Chip ◽

Automatic Code

System adaptivity is increasingly demanded in high-performance embedded systems, particularly in multimedia system-on-chip (SoC), owing to growing quality-of-service requirements. This paper presents a reactive control model that has been introduced in Gaspard, our framework dedicated to SoC hardware/software co-design. This model aims at expressing adaptivity as well as reconfigurability in systems performing data-intensive computations. It is generic enough to be used for description in the different parts of an embedded system, for example, specification of how different data-intensive algorithms can be chosen according to some computation modes at the functional level; and expression of how hardware components can be selected via the usage of a library of intellectual properties according to execution performances. The transformation of this model toward synchronous languages is also presented, in order to allow an automatic code generation usable for formal verification, based on techniques such as model checking and controller synthesis, as illustrated in the paper. This work, based on Model-Driven Engineering and the standard UML MARTE profile, has been implemented in Gaspard.

Efficient Instruction and Data Caching for High Performance Embedded Processors

Jornada de Jóvenes Investigadores del I3A ◽

10.26754/jji-i3a.201201788 ◽

1970 ◽

pp. 9

Author(s):

A. Ferrerón Labari ◽

D. Suárez Gracia ◽

V. Viñals Yúfera

Keyword(s):

Embedded Systems ◽

Power Consumption ◽

Low Power ◽

Interconnection Networks ◽

High Performance ◽

Critical Issue ◽

Content Management ◽

Structure Design ◽

Portable Devices ◽

On Chip

In recent years, embedded systems have evolved to offer capabilities previously found only in high-performance systems. Portable devices already have on-chip multiprocessors (such as PowerPC 476FP or ARM Cortex A9 MP), usually multi-threaded, and a powerful on-chip multi-level cache memory hierarchy. As most of these systems are battery-powered, power consumption becomes a critical issue. Achieving high performance and low power consumption together is a highly complex challenge for which some proposals have already been made. Suarez et al. proposed a new on-chip cache hierarchy, the LP-NUCA (Low Power NUCA), which reduces access latency by taking advantage of NUCA (Non-Uniform Cache Architecture) properties. The key points are decoupling the functionality and using three specialized on-chip networks. This structure has proved efficient for data hierarchies, achieving good performance and reducing energy consumption. Instruction caches, however, have different requirements and characteristics than data caches, which conflict with the requirements of low-power embedded systems, especially in SMT (simultaneous multi-threading) environments. We want to study the benefits of using small tiled caches for the instruction hierarchy, so we propose a new design, ID-LP-NUCAs. This requires completely re-evaluating our previous design in terms of structure design, interconnection networks (including topologies, flow control and routing), content management (with special interest in hardware/software content allocation policies), and structure sharing. In CMP (chip multiprocessor) environments with parallel workloads, coherence plays an important role and must be taken into consideration.

Congestion aware adaptive routing for network-on-chip communication

10.32920/ryerson.14645025.v1 ◽

2021 ◽

Author(s):

Stephen Chui

Keyword(s):

Embedded Systems ◽

High Performance ◽

Data Transfer ◽

Adaptive Routing ◽

Network On Chip ◽

Message Routing ◽

Data Packets ◽

Novel Approach ◽

Communication Links ◽

On Chip

Network-on-Chip (NoC) has surpassed traditional bus-based on-chip communication in offering better performance for data transfers among the many processing, peripheral and other cores of high-performance embedded systems. Adaptive routing provides an effective means of efficient on-chip communication among NoC cores, and more efficient message routing can further improve the performance of NoC-based embedded systems on a chip. Congestion awareness has been applied to adaptive routing to achieve better data throughput and latency. This thesis presents a novel approach to analyzing congestion that improves NoC throughput by improving packet allocation in NoC routers. The routers learn the traffic conditions around themselves from the congestion information. We employ header flits to store the congestion information, so no additional communication links between the routers are required. Prioritizing data packets that are likely to suffer the worst congestion would improve overall NoC data transfer latency.
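
The prioritization idea in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's allocator: the `Packet` type, the `congestion_hint` field (standing in for the congestion information carried in the header flit), and the `pick_next_packet` function are all hypothetical names chosen for illustration, and the policy shown (serve the packet with the highest hint first) is only one plausible reading of "prioritizing data packets that are likely to suffer the worst congestion".

```python
from dataclasses import dataclass


@dataclass
class Packet:
    packet_id: int
    # Hypothetical header-flit field: a higher value means the packet
    # is expected to meet heavier congestion on its remaining path.
    congestion_hint: int


def pick_next_packet(waiting):
    """Return the waiting packet with the highest congestion hint.

    Python's max() returns the first maximal element, so packets with
    equal hints are served in arrival (list) order.
    """
    if not waiting:
        raise ValueError("no packets waiting at this router port")
    return max(waiting, key=lambda p: p.congestion_hint)


# Example: three packets waiting at one router output port.
queue = [Packet(1, 2), Packet(2, 7), Packet(3, 7)]
chosen = pick_next_packet(queue)
```

Because the hint travels inside the packet's own header, this selection needs no extra signalling between routers, which matches the abstract's point about avoiding additional communication links.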

Integrating Hardware and Software Filtering in Embedded System Audio Data Processing: An Embedded Systems Course Project

10.18260/1-2-1115-36458 ◽

2021 ◽

Author(s):

Vincent Winstead

Keyword(s):

Embedded Systems ◽

Data Processing ◽

Embedded System ◽

Audio Data ◽

Software Filtering

Implementation of master-slave method on multiprocessor-based embedded system: case study on mobile robot

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.2.12732 ◽

2018 ◽

Vol 7 (2.2) ◽

pp. 53

Author(s):

Agusma Wajiansyah ◽

Hari Purwadi ◽

Asrina Astagani ◽

Supriadi Supriadi

Keyword(s):

Embedded Systems ◽

Mobile Robot ◽

Embedded System ◽

Execution Time ◽

Experimental Results ◽

Program Execution ◽

Time Average ◽

Single Processor ◽

Number Of Iterations

In this research, the master-slave method is implemented on an embedded system using three processors and applied to a mobile robot in order to measure the robot's program execution speed. For comparison, a robot with an embedded system based on a single processor is also used. In the experimental results, the master-slave method obtained an execution time of 546.5 μs and 1079 iterations, while the single-processor system obtained an average execution time of 67828 μs and an average of 147 iterations. The number of iterations is obtained by running the robot for 10 s. From this experiment, it is concluded that there is a performance increase of 7.3% compared to embedded systems based on a single processor.
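
The two iteration counts reported in this abstract can be compared directly. The short Python sketch below uses only those two counts; the variable names are illustrative. The ratio comes out at roughly 7.3, which suggests that the abstract's 7.3 figure may be closer to a speed-up factor than to a percentage. That reading is an inference from the numbers, not something the authors state.

```python
# Iteration counts over a 10 s run, as reported in the abstract.
multi_processor_iterations = 1079   # master-slave, three processors
single_processor_iterations = 147   # single-processor baseline

# How many times more iterations the master-slave system completed.
iteration_ratio = multi_processor_iterations / single_processor_iterations

# The same comparison expressed as a relative increase in percent.
relative_increase_percent = (
    (multi_processor_iterations - single_processor_iterations)
    / single_processor_iterations
    * 100
)

print(f"ratio: {iteration_ratio:.2f}x")
print(f"relative increase: {relative_increase_percent:.0f}%")
```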

Two-Stage Checkpoint Based Security Monitoring and Fault Recovery Architecture for Embedded Processor

Electronics ◽

10.3390/electronics9071165 ◽

2020 ◽

Vol 9 (7) ◽

pp. 1165

Author(s):

Xiang Wang ◽

Zongmin Zhao ◽

Dongdong Xu ◽

Zhun Zhang ◽

Qiang Hao ◽

...

Keyword(s):

Embedded System ◽

Basic Block ◽

Program Execution ◽

Fault Recovery ◽

Transient Faults ◽

Two Stage ◽

Embedded Processor ◽

Research Attention ◽

Security Monitoring ◽

Integrity Check

The secure program execution of embedded processors has attracted considerable research attention, since a growing number of code tampering attacks and transient faults are seriously affecting the security of embedded processors. Program monitoring and fault recovery strategies are not only closely related to the security of embedded devices, but also directly affect the performance of the processor. This paper presents a security monitoring and fault recovery architecture for run-time program execution, which takes regular backup copies using a two-stage checkpoint. In this framework, integrity checking based on the basic block (BB) is used to monitor program execution in real time, and a rollback operation is performed once an integrity check fails. In addition, a Monitoring Cache (M-Cache) is built to buffer the reference data for integrity checking. Moreover, a recovery strategy covering three tampered positions (registers in the processor, instructions in the cache, and code in memory) is provided to keep the embedded system running smoothly. Finally, the OpenRISC processor is adopted to implement and verify the presented security architecture, which proves effective at detecting tampering attacks during program execution and at quickly recovering both the running environment and the code.
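
The check-then-rollback flow described in this abstract can be sketched in software. The Python example below is a simplified illustration under stated assumptions, not the paper's hardware design: it uses SHA-256 as the integrity function (the abstract does not name one), a plain dictionary in place of the M-Cache, and a single checkpoint copy rather than the two-stage scheme. The names `block_digest` and `check_and_recover` are hypothetical.

```python
import hashlib


def block_digest(instructions):
    """Hash a basic block given as a list of 32-bit instruction words.

    SHA-256 is an assumption made for this sketch.
    """
    h = hashlib.sha256()
    for word in instructions:
        h.update(word.to_bytes(4, "big"))
    return h.digest()


def check_and_recover(memory, start, length, reference, checkpoint):
    """Verify one basic block and roll it back if it was tampered with.

    `reference` maps a block's start address to its expected digest
    (a stand-in for the M-Cache). `checkpoint` is a known-good copy of
    `memory`. Returns True if the block passed the integrity check,
    False if a rollback was needed.
    """
    block = memory[start:start + length]
    if block_digest(block) == reference[start]:
        return True
    # Integrity check failed: restore the block from the checkpoint.
    memory[start:start + length] = checkpoint[start:start + length]
    return False


# Example: a four-word program whose first three words form one block.
program = [0x10000001, 0x20000002, 0x30000003, 0x40000004]
backup = list(program)
refs = {0: block_digest(program[0:3])}

program[1] = 0xDEADBEEF  # simulate a code tampering attack
passed = check_and_recover(program, 0, 3, refs, backup)
```

After the call, `passed` is False and the tampered word has been restored from the backup copy, so a second check of the same block succeeds.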

High-Efficiency Parallel Cryptographic Accelerator for Real-Time Guaranteeing Dynamic Data Security in Embedded Systems

Micromachines ◽

10.3390/mi12050560 ◽

2021 ◽

Vol 12 (5) ◽

pp. 560

Author(s):

Zhun Zhang ◽

Xiang Wang ◽

Qiang Hao ◽

Dongdong Xu ◽

Jinlei Zhang ◽

...

Keyword(s):

Embedded Systems ◽

Data Processing ◽

Data Security ◽

Data Exchange ◽

High Efficiency ◽

Main Memory ◽

Processing Efficiency ◽

Dynamic Data ◽

Field Programmable ◽

On Chip

Dynamic data security in embedded systems is raising growing concern in numerous safety-critical applications. In particular, data exchanges through main memory in embedded Systems-on-Chip (SoCs) expose many security vulnerabilities to external attacks, which can cause confidential information leakage and program execution failures for SoCs at key points. Therefore, this paper presents a secure SoC architecture that integrates a four-parallel Advanced Encryption Standard-Galois/Counter Mode (AES-GCM) cryptographic accelerator for high-efficiency data processing, guaranteeing the security of data exchanged between the SoC and main memory against bus monitoring, off-line analysis, and data tampering attacks. The architecture design has been implemented and verified on a Xilinx Virtex-5 Field Programmable Gate Array (FPGA) platform. The cryptographic accelerator was evaluated in terms of performance overhead, security capability, processing efficiency, and resource consumption. Experimental results show that the parallel cryptographic accelerator does not incur significant performance overhead in providing confidentiality and integrity protection for exchanged data; its average performance overhead is as low as 2.65% with typical 8-KB I/D-Caches, and its data processing efficiency is around 3 times that of the pipelined AES-GCM construction. Data tampering attacks and benchmark tests on the reinforced SoC confirm its effectiveness against external physical attacks and show a good trade-off between high efficiency and hardware overhead.

GPU Accelerated Adaptive Banded Event Alignment for Rapid Comparative Nanopore Signal Analysis

10.1101/756122 ◽

2019 ◽

Cited By ~ 1

Author(s):

Hasindu Gamaarachchi ◽

Chun Wai Lam ◽

Gihan Jayatilaka ◽

Hiruna Samarakoon ◽

Jared T. Simpson ◽

...

Keyword(s):

Dna Methylation ◽

Embedded System ◽

High Performance ◽

Point Of Care ◽

Dynamic Programming Algorithm ◽

Reference Sequence ◽

Programming Algorithm ◽

Sequencing Data ◽

On Chip ◽

Gpu Architectures

Nanopore sequencing has the potential to revolutionise genomics by realising portable, real-time sequencing applications, including point-of-care diagnostics and in-the-field genotyping. Achieving these applications requires efficient bioinformatic algorithms for the analysis of raw nanopore signal data. For instance, comparing raw nanopore signals to a biological reference sequence is a computationally complex task despite leveraging a dynamic programming algorithm for Adaptive Banded Event Alignment (ABEA), a commonly used approach to polish sequencing data and identify non-standard nucleotides, such as measuring DNA methylation. Here, we parallelise and optimise an implementation of the ABEA algorithm (termed f5c) to efficiently run on heterogeneous CPU-GPU architectures. By optimising memory, compute and load balancing between CPU and GPU, we demonstrate how f5c can perform ~3-5× faster than the original implementation of ABEA in the Nanopolish software package. We also show that f5c enables DNA methylation detection on-the-fly using an embedded System on Chip (SoC) equipped with GPUs. Our work not only demonstrates that complex genomics analyses can be performed on lightweight computing systems, but also benefits High-Performance Computing (HPC). The associated source code for f5c along with GPU optimised ABEA is available at https://github.com/hasindu2008/f5c.

Implementation of AMBA Based AHB2APB Bridge

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.e6908.038620 ◽

2020 ◽

Vol 8 (6) ◽

pp. 1033-1037

Keyword(s):

Embedded System ◽

High Speed ◽

High Performance ◽

Data Loss ◽

Verilog Hdl ◽

Main Target ◽

Functional Blocks ◽

Timing Simulation ◽

On Chip ◽

Bus Architecture

The Advanced Microcontroller Bus Architecture (AMBA) bus protocol is used to build high-performance system-on-chip (SoC) designs. It achieves communication by connecting different functional blocks (or IP). By using multiple controllers and peripherals, it makes it possible to develop a multiprocessor unit. It allows IP to be reused across the different AMBA buses, which can reduce the communication gap between high-performance buses and low-speed buses. Performing high-speed pipelined data transfers makes an AMBA-based embedded system analytically demanding, given the different bus signals that AMBA supports. The main target of this work is to synthesize and simulate the composite connection between the Advanced High-performance Bus (AHB) and the Advanced Peripheral Bus (APB), known as the AHB2APB bridge, with no data loss during transfer. The bridge module is designed in Verilog HDL, and its functional and timing simulation is done on a Xilinx platform.

Congestion aware adaptive routing for network-on-chip communication

10.32920/ryerson.14645025 ◽

2021 ◽

Author(s):

Stephen Chui

Keyword(s):

Embedded Systems ◽

High Performance ◽

Data Transfer ◽

Adaptive Routing ◽

Network On Chip ◽

Message Routing ◽

Data Packets ◽

Novel Approach ◽

Communication Links ◽

On Chip
