Compiler-Aided Run-Time Performance Speed-Up in Super-Scalar Processor

Mapping Intimacies ◽

10.28945/3391 ◽

2009 ◽

Author(s):

Moshe Pelleh

Keyword(s):

Embedded Systems ◽

Embedded System ◽

General Purpose ◽

Experimental Results ◽

Medium Size ◽

Organic System ◽

Software Application ◽

Organic Systems ◽

Special Cases ◽

Run Time

In our world, where most systems are becoming embedded systems, the approach to designing embedded systems is still frequently similar to the approach to designing organic (that is, non-embedded) systems. An organic system, like a personal computer or a workstation, must be able to run any task submitted to it at any time (with certain constraints depending on the machine). Consequently, it must have a sophisticated general-purpose Operating System (OS) to schedule, dispatch, maintain and monitor the tasks and assist them in special cases (particularly communication and synchronization between them and with external devices). These OSs impose an overhead on memory, on the cache and on run time. Moreover, they are generally task oriented rather than machine oriented; therefore the processor's throughput is penalized. On the other hand, an embedded system, like an Anti-lock Braking System (ABS), always executes the same software application. Frequently it is a small or medium size system, or is made up of several such systems. Many small or medium size embedded systems, with a limited number of tasks, can be scheduled by our proposed hardware architecture, based on the Motorola 500 MHz MPC7410 processor, enhancing its throughput and avoiding the software OS overhead, complexity, maintenance and price. Encouraged by our experimental results, we shall develop a compiler to assist our method. In the meantime, we present here our proposal and the experimental results.

A Generalization Performance Study Using Deep Learning Networks in Embedded Systems

Sensors ◽

10.3390/s21041031 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1031

Author(s):

Joseba Gorospe ◽

Rubén Mulero ◽

Olatz Arbelaitz ◽

Javier Muguerza ◽

Miguel Ángel Antón

Keyword(s):

Deep Learning ◽

Embedded Systems ◽

Embedded System ◽

General Purpose ◽

Learning Networks ◽

Performance Study ◽

Learning Techniques ◽

Wide Range ◽

Learning Architectures

Deep learning techniques are being increasingly used in the scientific community as a consequence of the high computational capacity of current systems and the increase in the amount of data available as a result of the digitalisation of society in general and the industrial world in particular. In addition, the emergence of the field of edge computing, which focuses on integrating artificial intelligence as close as possible to the client, makes it possible to implement systems that act in real time without the need to transfer all of the data to centralised servers. The combination of these two concepts can lead to systems with the capacity to make correct decisions and act on them immediately and in situ. Despite this, the low capacity of embedded systems greatly hinders this integration, so the ability to integrate deep learning into a wide range of microcontrollers can be a great advantage. This paper contributes an environment based on Mbed OS and TensorFlow Lite that can be embedded in any general-purpose embedded system, allowing the introduction of deep learning architectures. The experiments herein show that the proposed system is competitive when compared to other commercial systems.

Implementation of master-slave method on multiprocessor-based embedded system: case study on mobile robot

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.2.12732 ◽

2018 ◽

Vol 7 (2.2) ◽

pp. 53

Author(s):

Agusma Wajiansyah ◽

Hari Purwadi ◽

Asrina Astagani ◽

Supriadi Supriadi

Keyword(s):

Embedded Systems ◽

Mobile Robot ◽

Embedded System ◽

Execution Time ◽

Experimental Results ◽

Program Execution ◽

Time Average ◽

Single Processor ◽

Number Of Iterations

In this research, the master-slave method is implemented on an embedded system with three processors and applied to a mobile robot in order to measure the robot's program execution speed. For comparison, a robot with an embedded system based on a single processor is also used. In the experimental results, the master-slave method yields an execution time of 546,5 μs and 1079 iterations, while the single-processor system yields an average execution time of 67828 μs and an average of 147 iterations. The number of iterations is obtained by running the robot for 10 s. From this experiment, it can be concluded that there is a performance increase of 7.3% compared with the embedded system based on a single processor.
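As a rough check on the figures quoted in this abstract, the sketch below turns the two reported iteration counts into iterations per second over the stated 10 s run and takes their ratio. The function and variable names are illustrative assumptions, not taken from the paper, and the calculation uses only the iteration counts, not the reported execution times.

```python
def iterations_per_second(iterations, run_seconds):
    """Average loop rate over one run of the robot program."""
    return iterations / run_seconds


RUN_SECONDS = 10.0  # run length stated in the abstract

# Iteration counts as reported in the abstract.
multi_rate = iterations_per_second(1079, RUN_SECONDS)   # three-processor master-slave system
single_rate = iterations_per_second(147, RUN_SECONDS)   # single-processor system

# Ratio of loop rates: how many times more iterations the master-slave system completes.
ratio = multi_rate / single_rate
print(f"master-slave: {multi_rate:.1f} it/s, single: {single_rate:.1f} it/s, ratio: {ratio:.2f}")
```

On these counts alone, the master-slave robot completes a little over seven times as many iterations in the same interval.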

Design of Virtual Machine Monitor for Embedded Systems

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.263-266.1629 ◽

2012 ◽

Vol 263-266 ◽

pp. 1629-1632

Author(s):

Sung Hoon Son

Keyword(s):

Embedded Systems ◽

Embedded System ◽

Virtual Machine ◽

Performance Metrics ◽

Virtual Machines ◽

General Purpose ◽

Virtual Machine Monitor ◽

Hardware Resource ◽

General Purpose Computer ◽

Purpose Computer

Recently, virtualization has been one of the most popular research topics in the system software area. While there are many commercial virtualization products for general-purpose computer systems, little effort has been made to virtualize embedded systems. In this paper, we design and implement a virtual machine monitor which divides each physical hardware resource of an embedded system into logical ones and reorganizes them into many virtual machines so that several real-time operating systems run concurrently on a single embedded system. We measure various performance metrics of the virtual machine monitor developed on a real embedded system. The results of the measurement study show that our virtual machine monitor has sufficient potential for application to real-world embedded systems.

Energy Efficiency of Task Allocation for Embedded JPEG Systems

The Scientific World JOURNAL ◽

10.1155/2014/718348 ◽

2014 ◽

Vol 2014 ◽

pp. 1-8 ◽

Cited By ~ 3

Author(s):

Yang-Hsin Fan ◽

Jan-Ou Wu ◽

San-Fu Wang

Keyword(s):

Energy Efficiency ◽

Energy Consumption ◽

Embedded Systems ◽

Embedded System ◽

Energy Saving ◽

Task Allocation ◽

Smart Home ◽

Consumer Electronics ◽

Experimental Results ◽

Software Components

Embedded systems work everywhere, repeatedly performing a few particular functions. Well-known products include consumer electronics, smart home applications, telematics devices, and so forth. Recently, embedded system development methodology has been applied to the design of cloud embedded systems, making the applications of embedded systems more diverse. However, the more embedded systems work, the more energy they consume. This study presents hyperrectangle technology (HT) for embedded systems to obtain energy savings. The HT adopts a drift effect to construct embedded systems with more hardware circuits than software components or vice versa. It can quickly construct an embedded system from a set of hardware circuits and software components. Moreover, it is of great benefit for quickly exploring the energy consumption of various embedded systems. The effects are presented by assessing JPEG benchmarks. Experimental results demonstrate that the HT achieves average energy savings of 29.84%, 2.07%, and 68.80% relative to GA, GHO, and Lin, respectively.

Domain-Specific Programming Environment for Heterogeneous Multicore Embedded Systems

International Journal of Embedded and Real-Time Communication Systems ◽

10.4018/ijertcs.2014100101 ◽

2014 ◽

Vol 5 (4) ◽

pp. 1-23 ◽

Cited By ~ 13

Author(s):

Alexey Syschikov ◽

Yuriy Sheynin ◽

Boris Sedov ◽

Vera Ivanova

Keyword(s):

Embedded Systems ◽

Software Development ◽

Embedded System ◽

Multicore Processors ◽

General Purpose ◽

Domain Experts ◽

Wide Range ◽

Portable Software ◽

The Embedded System ◽

Computing Platforms

Nowadays, embedded systems are used in a broad range of domains such as avionics, space, automotive, mobile, and domestic appliances. Sophisticated software determines the quality of embedded systems and requires highly qualified experts for software development. Software becomes the main asset of embedded systems, one that is valuable to retain as computing platforms change during embedded systems evolution. Computing platforms for embedded systems have become multicore processors and SoCs, and they can change during the embedded system's lifetime, which can be long (dozens of years for an automobile or an airplane). This requires porting software to new platforms as a regular process. Many tools and approaches allow domain experts to develop software, but mainly for general-purpose computing systems. In this paper the authors present a complex technology and tools that allow domain experts to be involved in software development for embedded systems. The proposed technology has various aspects and abilities that can be used to build verifiable and portable software for a wide range of embedded platforms.

Choosing the Optimized OS for an MPSoC Embedded System

Reconfigurable Embedded Control Systems ◽

10.4018/978-1-60960-086-0.ch016 ◽

2011 ◽

pp. 434-443

Author(s):

Abderrazak Jemai

Keyword(s):

Embedded Systems ◽

Comparative Study ◽

Embedded System ◽

Operating Systems ◽

General Purpose ◽

Multiprocessor System ◽

System A ◽

On Chip ◽

Application Specific ◽

Main Operating

This chapter provides a comparative study of recent operating systems designed for embedded systems. Our study focuses, in particular, on systems designed for multiprocessor implementations called MPSoC. An OS can be seen as an abstract layer or an interface between the embedded application and the underlying hardware. In this chapter, we give a comparative study of the main operating systems used in embedded systems. The originality of this chapter is that we focus specifically on the ability of the OS to be optimized to support and manage a multiprocessor architecture. A multiprocessor system-on-chip is software driven, and mastering the development complexity of the software part of an MPSoC is the key to reducing development time. This can be achieved through the use of a document giving a detailed description and analysis of criteria related to MPSoC. The wide diversity of existing operating systems, the huge complexity of developing an application-specific or a general-purpose one, and the aggressive evolution of embedded systems make the development of such a system a very difficult task. These considerations lead to the realization that a work that provides guidance for MPSoC designers will be very beneficial for these communities.

A Novel Rail-Network Hardware Simulator for Embedded System Programming

Electronics ◽

10.3390/electronics10010013 ◽

2020 ◽

Vol 10 (1) ◽

pp. 13

Author(s):

Balaji M ◽

Chandrasekaran M ◽

Vaithiyanathan Dhandapani

Keyword(s):

Embedded Systems ◽

Embedded System ◽

Real Time ◽

Learning Outcomes ◽

Real World ◽

Time Constraints ◽

Network Simulator ◽

Practical Applications ◽

Rail Network ◽

Knowledge Enhancement

A novel rail-network hardware platform with simulation facilities is presented in this paper. The hardware is designed to facilitate the learning of application-oriented, logical, real-time programming in an embedded system environment. The platform enables the creation of multiple unique programming scenarios of varying complexity without any hardware changes. Prior experimental hardware comes with static programming facilities that focus the students’ learning on hardware features and programming basics, leaving them ill-equipped to take up practical applications with more real-time constraints. This hardware complements and completes their learning to help them program real-world embedded systems. The hardware uses LEDs to simulate the movement of trains in a network. The network has train stations, intersections and parking slots where the train movements can be controlled by using a 16-bit Renesas RL78/G13 microcontroller. Additionally, simulation facilities are provided to enable the students to navigate the trains by manual controls using switches and indicators. This helps them gain an easy understanding of train navigation functions before taking up programming. The students start with simple tasks and gradually progress, on their own, to more complicated ones with real-time constraints. During training, students’ learning outcomes are evaluated by obtaining their feedback and conducting a test at the end to measure their knowledge acquisition. A Students’ Knowledge Enhancement Index is introduced to measure the knowledge acquired by the students. It is observed that 87% of students successfully enhanced their knowledge by undergoing training with this rail-network simulator.

Singular Value Decomposition in Embedded Systems Based on ARM Cortex-M Architecture

Electronics ◽

10.3390/electronics10010034 ◽

2020 ◽

Vol 10 (1) ◽

pp. 34

Author(s):

Michele Alessandrini ◽

Giorgio Biagetti ◽

Paolo Crippa ◽

Laura Falaschetti ◽

Lorenzo Manoni ◽

...

Keyword(s):

Embedded Systems ◽

Singular Value Decomposition ◽

Embedded System ◽

Mimo Systems ◽

Singular Value ◽

Measurement Unit ◽

Comprehensive Treatment ◽

Mathematical Tool ◽

Value Decomposition ◽

Speed Accuracy

Singular value decomposition (SVD) is a central mathematical tool for several emerging applications in embedded systems, such as multiple-input multiple-output (MIMO) systems, data analytics, and sparse representation of signals. Since SVD algorithms reduce to solving an eigenvalue problem, which is computationally expensive, both specific hardware solutions and parallel implementations have been proposed to overcome this bottleneck. However, as those solutions require additional hardware resources that are not in general available in embedded systems, optimized algorithms are demanded in this context. The aim of this paper is to present an efficient implementation of the SVD algorithm on ARM Cortex-M. To this end, we proceed to (i) present a comprehensive treatment of the most common algorithms for SVD, providing a fairly complete and deep overview of these algorithms, with a common notation, (ii) implement them on an ARM Cortex-M4F microcontroller, in order to develop a library suitable for embedded systems without an operating system, (iii) find, through a comparative study of the proposed SVD algorithms, the best implementation suitable for a low-resource bare-metal embedded system, (iv) show a practical application to Kalman filtering of an inertial measurement unit (IMU), as an example of how SVD can improve the accuracy of existing algorithms and of its usefulness on such a low-resource system. All these contributions can be used as guidelines for embedded system designers. Regarding the second point, the chosen algorithms have been implemented on ARM Cortex-M4F microcontrollers with very limited hardware resources with respect to more advanced CPUs. Several experiments have been conducted to select which algorithms guarantee the best performance in terms of speed, accuracy and energy consumption.
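To make the idea of a library-free SVD concrete, here is a minimal sketch of the one-sided (Hestenes) Jacobi method, a classic algorithm that needs only multiply-accumulate loops and square roots. It is an illustration only: the abstract does not say which algorithms the paper evaluates, so choosing Jacobi here is an assumption, as are the function name `jacobi_singular_values` and its `sweeps` and `tol` parameters. The sketch is written in Python for readability rather than in the C one would expect on a Cortex-M target.

```python
import math


def jacobi_singular_values(a, sweeps=30, tol=1e-12):
    """Singular values of a dense matrix (list of rows) via one-sided Jacobi.

    Pairs of columns are rotated until all columns are mutually orthogonal;
    the column norms of the result are the singular values.
    """
    m, n = len(a), len(a[0])
    u = [row[:] for row in a]  # work on a copy so the input is left untouched
    for _ in range(sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                # Squared norms of columns p and q, and their inner product.
                alpha = sum(u[i][p] * u[i][p] for i in range(m))
                beta = sum(u[i][q] * u[i][q] for i in range(m))
                gamma = sum(u[i][p] * u[i][q] for i in range(m))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # columns already orthogonal to working precision
                rotated = True
                # Rotation angle that zeroes the inner product of the two columns.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for i in range(m):
                    up, uq = u[i][p], u[i][q]
                    u[i][p] = c * up - s * uq
                    u[i][q] = s * up + c * uq
        if not rotated:
            break  # a full sweep without rotations means convergence
    sigma = [math.sqrt(sum(u[i][j] ** 2 for i in range(m))) for j in range(n)]
    return sorted(sigma, reverse=True)
```

Calling `jacobi_singular_values([[2.0, 0.0], [1.0, 2.0]])` returns the two singular values in descending order; their product equals the absolute determinant of the matrix, which gives a quick sanity check.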

Lightweight Blockchain Processing. Case Study: Scanned Document Tracking on Tezos Blockchain

Applied Sciences ◽

10.3390/app11157169 ◽

2021 ◽

Vol 11 (15) ◽

pp. 7169

Author(s):

Mohamed Allouche ◽

Tarek Frikha ◽

Mihai Mitrea ◽

Gérard Memmi ◽

Faten Chaabane

Keyword(s):

Load Balancing ◽

Relative Error ◽

Execution Time ◽

General Purpose ◽

Experimental Results ◽

Raspberry Pi ◽

Embedded Platform ◽

Memory Resources ◽

Processing Solution

To bridge the current gap between Blockchain expectations and their intensive computation constraints, the present paper advances a lightweight processing solution, based on a load-balancing architecture, compatible with the lightweight/embedded processing paradigms. In this way, the execution of complex operations is securely delegated to an off-chain general-purpose computing machine while the intimate Blockchain operations are kept on-chain. The illustrations correspond to an on-chain Tezos configuration and to a multiprocessor ARM embedded platform (integrated into a Raspberry Pi). The performances are assessed in terms of security, execution time, and CPU consumption when achieving a visual document fingerprint task. It is thus demonstrated that the advanced solution makes it possible for a computation-intensive application to be deployed under severely constrained computation and memory resources, as set by a Raspberry Pi 3. The experimental results show that up to nine Tezos nodes can be deployed on a single Raspberry Pi 3 and that the limitation is not derived from the memory but from the computation resources. The execution time with a limited number of fingerprints is 40% higher than using a classical PC solution (value computed with 95% relative error lower than 5%).

The Design of a 2D Graphics Accelerator for Embedded Systems

Electronics ◽

10.3390/electronics10040469 ◽

2021 ◽

Vol 10 (4) ◽

pp. 469

Author(s):

Hyun Woo Oh ◽

Ji Kwang Kim ◽

Gwan Beom Hwang ◽

Seung Eun Lee

Keyword(s):

Embedded Systems ◽

Embedded System ◽

Real Time ◽

Line Drawing ◽

Cmos Process ◽

Embedded Processor ◽

Processor Core ◽

Field Programmable ◽

The Embedded System ◽

Graphics Processing

Recently, advances in technology have enabled embedded systems to be adopted for a variety of applications. Some of these applications require real-time 2D graphics processing under tight design constraints such as low power consumption and a small area. In order to satisfy such conditions, including a specific 2D graphics accelerator in the embedded system is an effective method. This method reduces the workload of the processor in the embedded system by exploiting the accelerator. The accelerator helps the system perform 2D graphics processing in real time. Therefore, a variety of applications that require 2D graphics processing can be implemented with an embedded processor. In this paper, we present a 2D graphics accelerator for tiny embedded systems. The accelerator includes an optimized line-drawing operation based on Bresenham’s algorithm. The optimized operation enables the accelerator to deal with various kinds of 2D graphics processing and to perform the line drawing instead of the system processor. Moreover, the accelerator also relieves the processor core of workload by removing the need for the core to access the frame buffer memory. We measure the performance of the accelerator by implementing the processor, including the accelerator, on a field-programmable gate array (FPGA), and ascertain the possibility of realization by synthesizing it using a 180 nm CMOS process.
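For readers unfamiliar with Bresenham's algorithm mentioned above, the following is a minimal software sketch of the textbook integer-only version that handles all octants. It is not the paper's optimized hardware operation; the function name `bresenham_line` and the list-of-points return value are assumptions made for illustration. The point to notice is that the inner loop uses only additions, comparisons and a doubling, which is what makes the algorithm attractive for a small accelerator.

```python
def bresenham_line(x0, y0, x1, y1):
    """Return the integer pixel coordinates on the line from (x0, y0) to (x1, y1)."""
    points = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1  # step direction along x
    sy = 1 if y0 < y1 else -1  # step direction along y
    err = dx + dy              # combined error term for both axes
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:           # error allows a step in x
            err += dy
            x0 += sx
        if e2 <= dx:           # error allows a step in y
            err += dx
            y0 += sy
    return points


print(bresenham_line(0, 0, 4, 2))
```

In a real system each returned point would be written to the frame buffer; the sketch returns a list only so the result can be inspected.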