Design, Development, Benchmarking and Evaluation of Parallel Applications for High Performance Embedded Systems

Abstract: MATLAB is a technical computing language used in a variety of fields, such as control systems, image and signal processing, visualization, and financial process simulation, in an easy-to-use environment. MATLAB offers "toolboxes", which are specialized libraries for a variety of scientific domains, and a simplified interface to high-performance libraries (LAPACK, BLAS, and FFTW). MATLAB now also supports parallel computing through the Parallel Computing Toolbox™ and MATLAB Distributed Computing Server™. In this article we present some of the key features of MATLAB parallel applications, focusing on the use of GPUs for image processing.

Efficient Instruction and Data Caching for High Performance Embedded Processors

Jornada de Jóvenes Investigadores del I3A · 10.26754/jji-i3a.201201788 · 1970 · pp. 9

Author(s): A. Ferrerón Labari, D. Suárez Gracia, V. Viñals Yúfera

Keyword(s): Embedded Systems, Power Consumption, Low Power, Interconnection Networks, High Performance, Critical Issue, Content Management, Structure Design, Portable Devices, On Chip

In recent years, embedded systems have evolved to offer capabilities previously found only in high-performance systems. Portable devices already include on-chip multiprocessors (such as the PowerPC 476FP or ARM Cortex-A9 MPCore), usually multithreaded, and a powerful multi-level on-chip cache hierarchy. Because most of these systems are battery-powered, power consumption becomes a critical issue. Achieving both high performance and low power consumption is a complex challenge for which several proposals have already been made. Suárez et al. proposed a new on-chip cache hierarchy, the LP-NUCA (Low Power NUCA), which reduces access latency by exploiting the properties of NUCA (Non-Uniform Cache Architecture). Its key ideas are decoupling functionality and using three specialized on-chip networks. This structure has been shown to be efficient for data hierarchies, achieving good performance while reducing energy consumption. Instruction caches, however, have different requirements and characteristics from data caches, and these conflict with the requirements of low-power embedded systems, especially in SMT (simultaneous multithreading) environments. We want to study the benefits of using small tiled caches for the instruction hierarchy, so we propose a new design, ID-LP-NUCAs. This requires a complete re-evaluation of our previous design in terms of structure design, interconnection networks (including topologies, flow control, and routing), content management (with special interest in hardware/software content allocation policies), and structure sharing. In chip multiprocessor (CMP) environments running parallel workloads, coherence plays an important role and must also be taken into account.
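
The latency benefit that NUCA designs exploit can be illustrated with a toy model: a hit costs a fixed bank access time plus a round trip over the on-chip network to the tile that holds the line, so keeping frequently used lines in nearby tiles lowers the average hit latency. The Python sketch below is a hypothetical illustration only; `TiledCacheModel`, its cycle counts, and the hit distributions are assumptions, and the model does not reproduce LP-NUCA's networks or placement policies.

```python
from dataclasses import dataclass


@dataclass
class TiledCacheModel:
    """Hypothetical latency model of a NUCA-style tiled cache."""
    bank_cycles: int  # cycles to read a bank once the request arrives (assumed)
    hop_cycles: int   # cycles per network hop, in each direction (assumed)

    def hit_latency(self, hops: int) -> int:
        # The request travels `hops` hops to the tile and the reply travels back.
        return self.bank_cycles + 2 * hops * self.hop_cycles

    def average_hit_latency(self, hit_share_by_hops: dict[int, float]) -> float:
        # Weight each tile distance by the share of hits served at that distance.
        total = sum(hit_share_by_hops.values())
        weighted = sum(share * self.hit_latency(hops)
                       for hops, share in hit_share_by_hops.items())
        return weighted / total


if __name__ == "__main__":
    cache = TiledCacheModel(bank_cycles=2, hop_cycles=1)
    near_heavy = {0: 0.6, 1: 0.3, 2: 0.1}  # most hits served by the nearest tiles
    uniform = {0: 1.0, 1: 1.0, 2: 1.0}     # hits spread evenly over distances
    print(cache.average_hit_latency(near_heavy))
    print(cache.average_hit_latency(uniform))
```

With these assumed numbers, the near-heavy distribution averages about 3 cycles per hit and the uniform one 4 cycles, which is the kind of gap a NUCA placement policy tries to open up.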

Statistical and machine learning models for optimizing energy in parallel applications

The International Journal of High Performance Computing Applications · 10.1177/1094342019842915 · 2019 · Vol 33 (6) · pp. 1079-1097 · Cited By ~ 2

Author(s): Mark Endrei, Chao Jin, Minh Ngoc Dinh, David Abramson, Heidi Poxon, ...

Keyword(s): Machine Learning, Energy Efficiency, High Performance, Large Scale, Energy Use, Parallel Applications, Learning Models, Trade Off, Time Required, Machine Learning Models

Rising power costs and constraints are driving a growing focus on the energy efficiency of high-performance computing systems. The unique characteristics of a particular system and workload, and their effect on performance and energy efficiency, are typically difficult for application users to assess and control. Settings for optimum performance and for optimum energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that require only a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS), and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH). We demonstrate that we can train the models using as few as 12 runs, with a prediction error of less than 10%. Our AMG results identify trade-off options that provide up to 45% improvement in energy efficiency for around 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.
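
A Pareto-optimal trade-off option is one that no other option beats on both runtime and energy at once. The sketch below shows only how such a front is extracted from (runtime, energy) points, whether measured or predicted; it is not the paper's statistical or machine learning models, and the `Run` type, configuration labels, and numbers are hypothetical.

```python
from typing import NamedTuple


class Run(NamedTuple):
    config: str      # hypothetical label for a user-controllable setting
    time_s: float    # runtime in seconds (lower is better)
    energy_j: float  # energy in joules (lower is better)


def dominates(a: Run, b: Run) -> bool:
    # a dominates b if it is no worse in both objectives and better in at least one.
    no_worse = a.time_s <= b.time_s and a.energy_j <= b.energy_j
    better = a.time_s < b.time_s or a.energy_j < b.energy_j
    return no_worse and better


def pareto_front(runs: list[Run]) -> list[Run]:
    # Keep every run that no other run dominates, ordered from fastest to slowest.
    front = [r for r in runs if not any(dominates(o, r) for o in runs)]
    return sorted(front, key=lambda r: r.time_s)


if __name__ == "__main__":
    runs = [
        Run("cfg-A", 10.0, 500.0),
        Run("cfg-B", 11.0, 300.0),
        Run("cfg-C", 12.0, 350.0),  # slower and more energy-hungry than cfg-B
        Run("cfg-D", 9.5, 700.0),
    ]
    for r in pareto_front(runs):
        print(r.config, r.time_s, r.energy_j)
```

Each point left on the front is a legitimate choice; picking among them (for example, accepting some performance loss for an energy saving) is the user's trade-off decision.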

Experiences on the characterization of parallel applications in embedded systems with Extrae/Paraver

49th International Conference on Parallel Processing - ICPP · 10.1145/3404397.3404440 · 2020

Author(s): Adrian Munera, Sara Royuela, Germán Llort, Estanislao Mercadal, Franck Wartel, ...

Keyword(s): Embedded Systems, Parallel Applications

SHIL and DHIL Simulations of Nonlinear Control Methods Applied for Power Converters Using Embedded Systems

Electronics · 10.3390/electronics7100241 · 2018 · Vol 7 (10) · pp. 241 · Cited By ~ 10

Author(s): Arthur Rosa, Matheus Silva, Marcos Campos, Renato Santana, Welbert Rodrigues, ...

Keyword(s): Embedded Systems, Nonlinear Control, Real Time, Digital Signal Processor, High Performance, Power Converters, Digital Signal, Simulation Method, Hardware In The Loop, Control Techniques

In this work, a new real-time simulation method is designed for nonlinear control techniques applied to power converters. We propose two implementations: in the first (Single Hardware-in-the-Loop, SHIL), both the converter model and the control laws run on the same digital signal processor (DSP); in the second (Double Hardware-in-the-Loop, DHIL), the model and control equations are loaded onto different embedded systems. With this methodology, linear and nonlinear control techniques can be designed and compared through a quick and inexpensive real-time realization of the proposed systems, which is ideal for students and engineers interested in learning about and validating converter performance. The methodology can be applied to buck, boost, buck-boost, flyback, SEPIC, and three-phase AC-DC boost converters, showing that new high-performance embedded systems can evaluate different nonlinear controllers. The approach is implemented using MATLAB/Simulink on commodity Texas Instruments digital signal processors (TI-DSPs). The main purpose is to demonstrate the feasibility of the proposed real-time implementations without using expensive HIL systems such as Opal-RT and Typhoon HIL.
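
In the SHIL arrangement the converter model and the control law execute on the same processor, which amounts to advancing both in one fixed-step loop. The Python sketch below assumes the standard averaged continuous-conduction model of a buck converter (L di/dt = d·Vin − v, C dv/dt = i − v/R) and a simple integral voltage controller; the function name, component values, gain, and step size are hypothetical, and the controller is a stand-in, not one of the paper's nonlinear control laws or its TI-DSP code.

```python
def simulate_buck(v_in=12.0, v_ref=5.0, l_h=100e-6, c_f=100e-6, r_load=10.0,
                  ki=20.0, dt=1e-6, t_end=0.05):
    """Advance an averaged buck-converter model and its controller in one
    fixed-step loop, as in a single-processor (SHIL-style) setup.
    All parameter values are hypothetical."""
    i_l = 0.0    # inductor current [A]
    v_out = 0.0  # output (capacitor) voltage [V]
    integ = 0.0  # integrator state of the controller [V*s]
    duty = 0.0   # duty ratio applied to the switch
    for _ in range(int(round(t_end / dt))):
        # Control law: integral action on the voltage error, clamped to [0, 1].
        integ += (v_ref - v_out) * dt
        duty = min(max(ki * integ, 0.0), 1.0)
        # Plant: averaged CCM buck model, semi-implicit Euler step.
        i_l += dt * (duty * v_in - v_out) / l_h
        v_out += dt * (i_l - v_out / r_load) / c_f
    return v_out, duty


if __name__ == "__main__":
    v, d = simulate_buck()
    print(f"v_out = {v:.3f} V, duty = {d:.3f}")
```

With these assumed values the output should settle near 5 V and the duty ratio near 5/12, the steady-state ratio of the ideal averaged model. Conceptually, a DHIL setup splits the loop body so that the control-law half and the plant half run on separate devices and exchange `duty` and `v_out` every step.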

GPUs: High-performance Accelerators for Parallel Applications

Ubiquity · 10.1145/2618401 · 2014 · Vol 2014 (August) · pp. 1-13 · Cited By ~ 1

Author(s): Mark Silberstein

Keyword(s): High Performance, Parallel Applications

Technology Roadmap for Eco-Friendly Building Materials Industry

Energies · 10.3390/en12050804 · 2019 · Vol 12 (5) · pp. 804 · Cited By ~ 1

Author(s): Hyunsook Shim, Taeyeon Kim, Gyunghyun Choi

Keyword(s): Quality Function Deployment, High Performance, Building Materials, Free Market, Process Analysis, Low Carbon, Design Development, Analytic Hierarchy, Technology Roadmap, Save Energy

As quality of life has improved, the need for high-performance building materials that meet specific technological requirements has increased. Residential environments have also changed owing to climate change. A technology roadmap could define and systematically reflect a timeline for the development of future core technologies. The purpose of this research is to build a technology roadmap that can guide technology development in the eco-friendly building materials industry. The research combines multiple analysis processes (patent analysis, the Delphi method, and analytic hierarchy process analysis) that minimize the uncertainty caused by the lack of information in the eco-friendly construction industry by securing objective forecast data. Quality function deployment is then applied to verify the feasibility of the constructed technology roadmap. Designing various types of functional, low-carbon building materials could reduce carbon emissions and save energy by ensuring a market free of hazardous materials in the future. A design development roadmap is needed to complement this technology roadmap.
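
The abstract names analytic hierarchy process (AHP) analysis without giving details. As a hedged illustration of the standard AHP step, the sketch below derives priority weights from a pairwise comparison matrix by power iteration (approximating the principal eigenvector) and checks Saaty's consistency ratio. The function names and the example matrix are hypothetical and are not taken from the study.

```python
# Commonly tabulated values of Saaty's random consistency index (RI) by matrix order.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12}


def ahp_priorities(matrix: list[list[float]], iterations: int = 100):
    """Return (weights, lambda_max) for a positive reciprocal comparison matrix,
    using power iteration to approximate the principal eigenvector."""
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        weights = [x / total for x in product]  # normalize so weights sum to 1
    # Estimate lambda_max from A w ~= lambda_max * w, averaged over components.
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    return weights, lambda_max


def consistency_ratio(lambda_max: float, n: int) -> float:
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1).
    Matrices of order 1 or 2 are always consistent."""
    if n <= 2:
        return 0.0
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


if __name__ == "__main__":
    # Hypothetical comparison of three criteria (e.g. cost, CO2 reduction, durability).
    a = [[1.0, 3.0, 5.0],
         [1 / 3, 1.0, 3.0],
         [1 / 5, 1 / 3, 1.0]]
    w, lam = ahp_priorities(a)
    print([round(x, 3) for x in w], round(consistency_ratio(lam, len(a)), 3))
```

A common rule of thumb from Saaty treats a consistency ratio below 0.1 as acceptable; larger values suggest the pairwise judgments should be revisited.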

Hardware-Assisted Security Monitoring Unit for Real-Time Ensuring Secure Instruction Execution and Data Processing in Embedded Systems

Micromachines · 10.3390/mi12121450 · 2021 · Vol 12 (12) · pp. 1450

Author(s): Xiang Wang, Zhun Zhang, Qiang Hao, Dongdong Xu, Jiqing Wang, ...

Keyword(s): Embedded Systems, Data Processing, Embedded System, High Performance, Program Execution, Security Monitoring, Railway Systems, On Chip, Capability Evaluation, Security Capability

The hardware security of embedded systems is a growing concern in numerous safety-critical applications, such as automotive, aerospace, avionic, and railway systems. Embedded systems are gaining popularity in these safety-sensitive sectors because their high performance, low power consumption, and high reliability make them ideal control platforms for instruction execution and data processing. However, modern embedded systems still expose many potential hardware vulnerabilities to malicious attacks, including software-level and hardware-level attacks, which can cause program execution failures and leakage of confidential data. This paper therefore presents a novel embedded system that integrates a hardware-assisted security monitoring unit (SMU), yielding a hardened system-on-chip (SoC) that secures program execution and data processing. The architecture was implemented and evaluated on a Xilinx Virtex-5 FPGA development board. The SMU hardware implementation was evaluated in terms of performance overhead, security capability, and resource consumption. The experimental results indicate that the SMU does not significantly slow down the processor across different benchmarks, and its average performance overhead falls to 2.18% with typical 8-KB instruction and data caches. The security capability evaluation confirms that the SMU effectively detects both instruction and data tampering attacks. The SoC also strikes a good balance between security and resource overhead.
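
The 2.18% figure is an average performance overhead. The abstract does not give the exact formula, so the sketch below assumes a common definition: the relative increase in execution cycles when the monitor is enabled, averaged arithmetically over benchmarks. Function names, benchmark labels, and cycle counts are hypothetical, not measurements from the paper.

```python
def overhead_percent(baseline_cycles: float, monitored_cycles: float) -> float:
    """Relative slowdown caused by enabling the monitor, in percent."""
    return 100.0 * (monitored_cycles - baseline_cycles) / baseline_cycles


def average_overhead_percent(runs: dict[str, tuple[float, float]]) -> float:
    """Arithmetic mean of per-benchmark overheads; values are (baseline, monitored)."""
    per_benchmark = [overhead_percent(base, mon) for base, mon in runs.values()]
    return sum(per_benchmark) / len(per_benchmark)


if __name__ == "__main__":
    # Hypothetical cycle counts, not measurements from the paper.
    runs = {"bench-a": (1_000_000, 1_020_000), "bench-b": (2_000_000, 2_060_000)}
    print(f"{average_overhead_percent(runs):.2f}%")
```

Averaging per-benchmark percentages weights every benchmark equally; a geometric mean of runtime ratios is another common choice when benchmarks differ widely in length.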

Microprocessors KOMDIV for High Performance Embedded Systems

INFORMATION TECHNOLOGY IN INDUSTRY · 10.17762/itii.v7i3.71 · 2021 · Vol 7 (3)

Author(s): S.G. Bobkov

Keyword(s): Embedded Systems, Power Consumption, High Performance, Clock Cycle, Embedded Computing, Computing Systems, Processor Performance

This paper considers the problems of creating high-performance embedded computing systems based on KOMDIV microprocessors. Processor performance depends on three characteristics: clock cycle time, clock cycles per instruction (CPI), and instruction count. For KOMDIV microprocessors, these characteristics are optimized with respect to the performance-to-power-consumption ratio and the requirements of embedded systems.
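
The three characteristics combine into the classic processor performance equation: CPU time = instruction count × CPI × clock cycle time (equivalently, instruction count × CPI / clock frequency). The sketch evaluates it together with a simple MIPS-per-watt figure of merit as one possible reading of "performance/power consumption"; the function names, the figure of merit, and all numbers are illustrative assumptions, not KOMDIV data.

```python
def cpu_time_s(instruction_count: float, cpi: float, clock_hz: float) -> float:
    """Processor performance equation: IC x CPI x clock period (= 1 / f)."""
    return instruction_count * cpi / clock_hz


def mips_per_watt(cpi: float, clock_hz: float, power_w: float) -> float:
    """A simple performance/power figure of merit: native MIPS per watt."""
    mips = clock_hz / cpi / 1e6
    return mips / power_w


if __name__ == "__main__":
    # Hypothetical numbers: 10^9 instructions, CPI 1.5, 500 MHz clock, 2 W.
    print(cpu_time_s(1e9, 1.5, 500e6))     # seconds
    print(mips_per_watt(1.5, 500e6, 2.0))  # MIPS per watt
```

The equation makes the design levers explicit: halving CPI at the same clock frequency halves execution time and, if power stays constant, doubles MIPS per watt, whereas raising the clock frequency usually also raises power.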

Multi-Softcore Architecture on FPGA

International Journal of Reconfigurable Computing · 10.1155/2014/979327 · 2014 · Vol 2014 · pp. 1-13 · Cited By ~ 4

Author(s): Mouna Baklouti, Mohamed Abid

Keyword(s): High Performance, Design Methodology, Matrix Multiplication, Rapid Prototype, General Purpose, Parallel Applications, Multicore Systems, Processor Core, Nios II, Wide Range

To meet the high performance demands of embedded multimedia applications, embedded systems are integrating multiple processing units. However, these systems are mostly based on a custom-logic design methodology. Designing parallel multicore systems from available standard intellectual property (IP) cores while maintaining high performance is also challenging. Softcore processors and field-programmable gate arrays (FPGAs) are a cheap and fast option for developing and testing such systems. This paper describes an FPGA-based design methodology for implementing a rapid prototype of parametric multicore systems. A study of the viability of building the SoC with the Nios II soft processor core from Altera is also presented. The Nios II features a general-purpose RISC CPU architecture designed to address a wide range of applications. The performance of the implemented architecture is discussed, and several parallel applications are used to test the speedup and efficiency of the system. Experimental results demonstrate the performance of the proposed multicore system, which achieves better speedup than the GPU used for comparison (29.5% faster for the FIR filter and 23.6% faster for matrix-matrix multiplication).
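
Speedup and efficiency, the two metrics used to test the multicore system, have standard definitions: S(p) = T(1) / T(p) and E(p) = S(p) / p for p cores. The sketch computes both from wall-clock timings. The timings are hypothetical, and because the abstract does not state how the "29.5% faster" comparison with the GPU is computed, that comparison is not modeled here.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """S(p) = T(1) / T(p): how many times faster the parallel run is."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, cores: int) -> float:
    """E(p) = S(p) / p; 1.0 corresponds to perfectly linear scaling."""
    return speedup(t_serial, t_parallel) / cores


if __name__ == "__main__":
    # Hypothetical timings for a kernel on 1 and 4 soft cores.
    t1, t4 = 8.0, 2.5
    print(f"speedup = {speedup(t1, t4):.2f}, efficiency = {efficiency(t1, t4, 4):.2f}")
```

Efficiency below 1.0 reflects overheads such as communication over the on-chip interconnect and load imbalance among cores.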
