scholarly journals A Two-Layer Component-Based Allocation for Embedded Systems with GPUs

Designs ◽  
2019 ◽  
Vol 3 (1) ◽  
pp. 6
Author(s):  
Gabriel Campeanu ◽  
Mehrdad Saadatmand

Component-based development is a software engineering paradigm that can facilitate the construction of embedded systems and tackle its complexities. The modern embedded systems have more and more demanding requirements. One way to cope with such a versatile and growing set of requirements is to employ heterogeneous processing power, i.e., CPU–GPU architectures. The new CPU–GPU embedded boards deliver an increased performance but also introduce additional complexity and challenges. In this work, we address the component-to-hardware allocation for CPU–GPU embedded systems. The allocation for such systems is much complex due to the increased amount of GPU-related information. For example, while in traditional embedded systems the allocation mechanism may consider only the CPU memory usage of components to find an appropriate allocation scheme, in heterogeneous systems, the GPU memory usage needs also to be taken into account in the allocation process. This paper aims at decreasing the component-to-hardware allocation complexity by introducing a two-layer component-based architecture for heterogeneous embedded systems. The detailed CPU–GPU information of the system is abstracted at a high-layer by compacting connected components into single units that behave as regular components. The allocator, based on the compacted information received from the high-level layer, computes, with a decreased complexity, feasible allocation schemes. In the last part of the paper, the two-layer allocation method is evaluated using an existing embedded system demonstrator; namely, an underwater robot.

2008 ◽  
Vol 17 (06) ◽  
pp. 973-993 ◽  
Author(s):  
NASER MOHAMMADZADEH ◽  
SHAAHIN HESSABI ◽  
MAZIAR GOUDARZI ◽  
MAHDI MALAKI

The growing complexity of today's embedded systems demands new methodologies and tools to manage the problems of analysis, design, implementation, and validation of complex-embedded systems. Focusing on this issue, this paper describes a design and implementation toolset using our ODYSSEY methodology, which advocates object-oriented (OO) modeling of embedded systems and its ASIP-based implementation. The proposed approach promotes a smooth transition from high-level object-oriented specification to the final embedded system, which is composed of hardware and software components. The transition from higher to lower abstraction levels is facilitated by the use of our GUI, which supports the intermediate steps of the design and implementation process. In order to illustrate the proposed approach and related toolset, we apply this top-down design and implementation framework to real-world embedded systems, namely JPEG codec and Motion JPEG codec. Experimental results show that the developed tool remarkably decreases the design and verification time with modest performance penalty.


2016 ◽  
Vol 2016 ◽  
pp. 1-12
Author(s):  
Hua Hua ◽  
Xiaomin Yang ◽  
Binyu Yan ◽  
Kai Zhou ◽  
Wei Lu

Main challenges for image enlargement methods in embedded systems come from the requirements of good performance, low computational cost, and low memory usage. This paper proposes an efficient image enlargement method which can meet these requirements in embedded system. Firstly, to improve the performance of enlargement methods, this method extracts different kind of features for different morphologies with different approaches. Then, various dictionaries based on different kind of features are learned, which represent the image in a more efficient manner. Secondly, to accelerate the enlargement speed and reduce the memory usage, this method divides the atoms of each dictionary into several clusters. For each cluster, separate projection matrix is calculated. This method reformulates the problem as a least squares regression. The high-resolution (HR) images can be reconstructed based on a few projection matrixes. Numerous experiment results show that this method has advantages such as being efficient and real-time and having less memory cost. These advantages make this method easy to implement in mobile embedded system.


2021 ◽  
Author(s):  
Afef Salhi ◽  
Fahmi Ghozzi ◽  
Ahmed Fakhfakh

Co-design embedded system are very important step in digital vehicle and airplane. The multicore and multiprocessor SoC (MPSoC) started a new computing era. It is becoming increasingly used because it can provide designers much more opportunities to meet specific performances. Designing embedded systems includes two main phases: (i) HW/SW Partitioning performed from high-level (eclipse C/C++ or python (machine learning and deep learning)) functional and architecture models (with virtual prototype and real prototype). And (ii) Software Design performed with significantly more detailed models with scheduling and partitioning tasks algorithm DAG Directed Acyclic Graph and GGEN Generation Graph Estimation Nodes (there are automatic DAG algorithm). Partitioning decisions are made according to performance assumptions that should be validated on the more refined software models for ME block and GGEN algorithm. In this paper, we focus to optimize a execution time and amelioration for quality of video with a scheduling and partitioning tasks in video codec. We show how they can be modeled the video sequence test with the size of video in height and width (three models of scheduling tasks in four processor). This modeling with DAG and GGEN are partitioning at different platform in OVP (partitioning, SW design). We can know the optimization of consumption energy and execution time in SoC and MPSoC platform.


2013 ◽  
Vol 645 ◽  
pp. 221-226
Author(s):  
Hua Jing Zheng ◽  
Hai Chuan Lu ◽  
Quan Jiang

Developing highly efficient and reliable embedded system demands hardware/software co-design. To be high-configurable embedded systems, designers seek for transaction-level modeling and component based development. Our approach employ platform-based design and component-based design and apply it to a networked sensor system. We introduce a multiple-language environment unified component-based model and apply it to co-simulation of sensor system instances. The case studies demonstrate that our approach is readily applicable to real-world networked sensor systems and reduces the co-simulation complexities.


10.28945/3391 ◽  
2009 ◽  
Author(s):  
Moshe Pelleh

In our world, where most systems become embedded systems, the approach of designing embedded systems is still frequently similar to the approach of designing organic systems (or not embedded systems). An organic system, like a personal computer or a work station, must be able to run any task submitted to it at any time (with certain constrains depending on the machine). Consequently, it must have a sophisticated general purpose Operating System (OS) to schedule, dispatch, maintain and monitor the tasks and assist them in special cases (particularly communication and synchronization between them and with external devices). These OSs require an overhead on the memory, on the cache and on the run time. Moreover, generally they are task oriented rather than machine oriented; therefore the processor's throughput is penalized. On the other hand, an embedded system, like an Anti-lock Braking System (ABS), executes always the same software application. Frequently it is a small or medium size system, or made up of several such systems. Many small or medium size embedded systems, with limited number of tasks, can be scheduled by our proposed hardware architecture, based on the Motorola 500MHz MPC7410 processor, enhancing its throughput and avoiding the software OS overhead, complexity, maintenance and price. Encouraged by our experimental results, we shall develop a compiler to assist our method. In the meantime we will present here our proposal and the experimental results.


Electronics ◽  
2020 ◽  
Vol 10 (1) ◽  
pp. 13
Author(s):  
Balaji M ◽  
Chandrasekaran M ◽  
Vaithiyanathan Dhandapani

A Novel Rail-Network Hardware with simulation facilities is presented in this paper. The hardware is designed to facilitate the learning of application-oriented, logical, real-time programming in an embedded system environment. The platform enables the creation of multiple unique programming scenarios with variability in complexity without any hardware changes. Prior experimental hardware comes with static programming facilities that focus the students’ learning on hardware features and programming basics, leaving them ill-equipped to take up practical applications with more real-time constraints. This hardware complements and completes their learning to help them program real-world embedded systems. The hardware uses LEDs to simulate the movement of trains in a network. The network has train stations, intersections and parking slots where the train movements can be controlled by using a 16-bit Renesas RL78/G13 microcontroller. Additionally, simulating facilities are provided to enable the students to navigate the trains by manual controls using switches and indicators. This helps them get an easy understanding of train navigation functions before taking up programming. The students start with simple tasks and gradually progress to more complicated ones with real-time constraints, on their own. During training, students’ learning outcomes are evaluated by obtaining their feedback and conducting a test at the end to measure their knowledge acquisition during the training. Students’ Knowledge Enhancement Index is originated to measure the knowledge acquired by the students. It is observed that 87% of students have successfully enhanced their knowledge undergoing training with this rail-network simulator.


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1031
Author(s):  
Joseba Gorospe ◽  
Rubén Mulero ◽  
Olatz Arbelaitz ◽  
Javier Muguerza ◽  
Miguel Ángel Antón

Deep learning techniques are being increasingly used in the scientific community as a consequence of the high computational capacity of current systems and the increase in the amount of data available as a result of the digitalisation of society in general and the industrial world in particular. In addition, the immersion of the field of edge computing, which focuses on integrating artificial intelligence as close as possible to the client, makes it possible to implement systems that act in real time without the need to transfer all of the data to centralised servers. The combination of these two concepts can lead to systems with the capacity to make correct decisions and act based on them immediately and in situ. Despite this, the low capacity of embedded systems greatly hinders this integration, so the possibility of being able to integrate them into a wide range of micro-controllers can be a great advantage. This paper contributes with the generation of an environment based on Mbed OS and TensorFlow Lite to be embedded in any general purpose embedded system, allowing the introduction of deep learning architectures. The experiments herein prove that the proposed system is competitive if compared to other commercial systems.


Electronics ◽  
2020 ◽  
Vol 10 (1) ◽  
pp. 34
Author(s):  
Michele Alessandrini ◽  
Giorgio Biagetti ◽  
Paolo Crippa ◽  
Laura Falaschetti ◽  
Lorenzo Manoni ◽  
...  

Singular value decomposition (SVD) is a central mathematical tool for several emerging applications in embedded systems, such as multiple-input multiple-output (MIMO) systems, data analytics, sparse representation of signals. Since SVD algorithms reduce to solve an eigenvalue problem, that is computationally expensive, both specific hardware solutions and parallel implementations have been proposed to overcome this bottleneck. However, as those solutions require additional hardware resources that are not in general available in embedded systems, optimized algorithms are demanded in this context. The aim of this paper is to present an efficient implementation of the SVD algorithm on ARM Cortex-M. To this end, we proceed to (i) present a comprehensive treatment of the most common algorithms for SVD, providing a fairly complete and deep overview of these algorithms, with a common notation, (ii) implement them on an ARM Cortex-M4F microcontroller, in order to develop a library suitable for embedded systems without an operating system, (iii) find, through a comparative study of the proposed SVD algorithms, the best implementation suitable for a low-resource bare-metal embedded system, (iv) show a practical application to Kalman filtering of an inertial measurement unit (IMU), as an example of how SVD can improve the accuracy of existing algorithms and of its usefulness on a such low-resources system. All these contributions can be used as guidelines for embedded system designers. Regarding the second point, the chosen algorithms have been implemented on ARM Cortex-M4F microcontrollers with very limited hardware resources with respect to more advanced CPUs. Several experiments have been conducted to select which algorithms guarantee the best performance in terms of speed, accuracy and energy consumption.


Electronics ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 469
Author(s):  
Hyun Woo Oh ◽  
Ji Kwang Kim ◽  
Gwan Beom Hwang ◽  
Seung Eun Lee

Recently, advances in technology have enabled embedded systems to be adopted for a variety of applications. Some of these applications require real-time 2D graphics processing running on limited design specifications such as low power consumption and a small area. In order to satisfy such conditions, including a specific 2D graphics accelerator in the embedded system is an effective method. This method reduces the workload of the processor in the embedded system by exploiting the accelerator. The accelerator assists the system to perform 2D graphics processing in real-time. Therefore, a variety of applications that require 2D graphics processing can be implemented with an embedded processor. In this paper, we present a 2D graphics accelerator for tiny embedded systems. The accelerator includes an optimized line-drawing operation based on Bresenham’s algorithm. The optimized operation enables the accelerator to deal with various kinds of 2D graphics processing and to perform the line-drawing instead of the system processor. Moreover, the accelerator also distributes the workload of the processor core by removing the need for the core to access the frame buffer memory. We measure the performance of the accelerator by implementing the processor, including the accelerator, on a field-programmable gate array (FPGA), and ascertaining the possibility of realization by synthesizing using the 180 nm CMOS process.


Sign in / Sign up

Export Citation Format

Share Document