Design, Development, Benchmarking and Evaluation of Parallel Applications for High Performance Embedded Systems

2000 ◽  
Author(s):  
Wei-keng Liao ◽  
Donald Weiner ◽  
Alok Choudhary
2012 ◽  
Vol 17 (4) ◽  
pp. 207-216 ◽  
Author(s):  
Magdalena Szymczyk ◽  
Piotr Szymczyk

Abstract The MATLAB is a technical computing language used in a variety of fields, such as control systems, image and signal processing, visualization, financial process simulations in an easy-to-use environment. MATLAB offers "toolboxes" which are specialized libraries for variety scientific domains, and a simplified interface to high-performance libraries (LAPACK, BLAS, FFTW too). Now MATLAB is enriched by the possibility of parallel computing with the Parallel Computing ToolboxTM and MATLAB Distributed Computing ServerTM. In this article we present some of the key features of MATLAB parallel applications focused on using GPU processors for image processing.


Author(s):  
A. Ferrerón Labari ◽  
D. Suárez Gracia ◽  
V. Viñals Yúfera

In the last years, embedded systems have evolved so that they offer capabilities we could only find before in high performance systems. Portable devices already have multiprocessors on-chip (such as PowerPC 476FP or ARM Cortex A9 MP), usually multi-threaded, and a powerful multi-level cache memory hierarchy on-chip. As most of these systems are battery-powered, the power consumption becomes a critical issue. Achieving high performance and low power consumption is a high complexity challenge where some proposals have been already made. Suarez et al. proposed a new cache hierarchy on-chip, the LP-NUCA (Low Power NUCA), which is able to reduce the access latency taking advantage of NUCA (Non-Uniform Cache Architectures) properties. The key points are decoupling the functionality, and utilizing three specialized networks on-chip. This structure has been proved to be efficient for data hierarchies, achieving a good performance and reducing the energy consumption. On the other hand, instruction caches have different requirements and characteristics than data caches, contradicting the low-power embedded systems requirements, especially in SMT (simultaneous multi-threading) environments. We want to study the benefits of utilizing small tiled caches for the instruction hierarchy, so we propose a new design, ID-LP-NUCAs. Thus, we need to re-evaluate completely our previous design in terms of structure design, interconnection networks (including topologies, flow control and routing), content management (with special interest in hardware/software content allocation policies), and structure sharing. In CMP environments (chip multiprocessors) with parallel workloads, coherence plays an important role, and must be taken into consideration.


Author(s):  
Mark Endrei ◽  
Chao Jin ◽  
Minh Ngoc Dinh ◽  
David Abramson ◽  
Heidi Poxon ◽  
...  

Rising power costs and constraints are driving a growing focus on the energy efficiency of high performance computing systems. The unique characteristics of a particular system and workload and their effect on performance and energy efficiency are typically difficult for application users to assess and to control. Settings for optimum performance and energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that only require a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), Large-scale Atomic Molecular Massively Parallel Simulator, and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics. We demonstrate that we can train the models using as few as 12 runs, with prediction error of less than 10%. Our AMG results identify trade-off options that provide up to 45% improvement in energy efficiency for around 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.


Author(s):  
Adrian Munera ◽  
Sara Royuela ◽  
Germán Llort ◽  
Estanislao Mercadal ◽  
Franck Wartel ◽  
...  

Electronics ◽  
2018 ◽  
Vol 7 (10) ◽  
pp. 241 ◽  
Author(s):  
Arthur Rosa ◽  
Matheus Silva ◽  
Marcos Campos ◽  
Renato Santana ◽  
Welbert Rodrigues ◽  
...  

In this work, a new real-time Simulation method is designed for nonlinear control techniques applied to power converters. We propose two different implementations: in the first one (Single Hardware in The Loop: SHIL), both model and control laws are inserted in the same Digital Signal Processor (DSP), and in the second approach (Double Hardware in The Loop: DHIL), the equations are loaded in different embedded systems. With this methodology, linear and nonlinear control techniques can be designed and compared in a quick and cheap real-time realization of the proposed systems, ideal for both students and engineers who are interested in learning and validating converters performance. The methodology can be applied to buck, boost, buck-boost, flyback, SEPIC and 3-phase AC-DC boost converters showing that the new and high performance embedded systems can evaluate distinct nonlinear controllers. The approach is done using matlab-simulink over commodity Texas Instruments Digital Signal Processors (TI-DSPs). The main purpose is to demonstrate the feasibility of proposed real-time implementations without using expensive HIL systems such as Opal-RT and Typhoon-HL.


Ubiquity ◽  
2014 ◽  
Vol 2014 (August) ◽  
pp. 1-13 ◽  
Author(s):  
Mark Silberstein

Energies ◽  
2019 ◽  
Vol 12 (5) ◽  
pp. 804 ◽  
Author(s):  
Hyunsook Shim ◽  
Taeyeon Kim ◽  
Gyunghyun Choi

As quality of life has improved, the need for high-performance building materials that meet specific technological requirements has increased. Residential environments have also changed owing to climate change. A technology roadmap could define and systematically reflect a timeline for the development of future core technologies. The purpose of this research is to build a technology roadmap that could be utilized for the development of technology in the eco-friendly building material industry. This research is composed of multiple analysis processes—patent analysis, Delphi, and analytic hierarchy process analysis—that minimize the uncertainty caused by the lack of information in the eco-friendly construction industry by securing objective future forecast data. Subsequently, the quality function deployment test is implemented to verify the feasibility of the technology roadmap that is constructed. The design of various types of functional, low-carbon building materials could reduce carbon emissions and save energy by ensuring a hazardous-material-free market in the future. This design development roadmap is required to complement this technology roadmap.


Micromachines ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 1450
Author(s):  
Xiang Wang ◽  
Zhun Zhang ◽  
Qiang Hao ◽  
Dongdong Xu ◽  
Jiqing Wang ◽  
...  

The hardware security of embedded systems is raising more and more concerns in numerous safety-critical applications, such as in the automotive, aerospace, avionic, and railway systems. Embedded systems are gaining popularity in these safety-sensitive sectors with high performance, low power, and great reliability, which are ideal control platforms for executing instruction operation and data processing. However, modern embedded systems are still exposing many potential hardware vulnerabilities to malicious attacks, including software-level and hardware-level attacks; these can cause program execution failure and confidential data leakage. For this reason, this paper presents a novel embedded system by integrating a hardware-assisted security monitoring unit (SMU), for achieving a reinforced system-on-chip (SoC) on ensuring program execution and data processing security. This architecture design was implemented and evaluated on a Xilinx Virtex-5 FPGA development board. Based on the evaluation of the SMU hardware implementation in terms of performance overhead, security capability, and resource consumption, the experimental results indicate that the SMU does not lead to a significant speed degradation to processor while executing different benchmarks, and its average performance overhead reduces to 2.18% on typical 8-KB I/D-Caches. Security capability evaluation confirms the monitoring effectiveness of SMU against both instruction and data tampering attacks. Meanwhile, the SoC satisfies a good balance between high-security and resource overhead.


2021 ◽  
Vol 7 (3) ◽  
Author(s):  
S.G. Bobkov

The problems of creating of high-performance embedded computing systems based on microprocessors KOMDIV is considered. Processor performance is dependent upon three characteristics: clock cycle, clock cycles per instruction, and instruction count. These characteristics for microprocessors KOMDIV are optimized using parameter performance/power consumption and requirements of embedded systems.


2014 ◽  
Vol 2014 ◽  
pp. 1-13 ◽  
Author(s):  
Mouna Baklouti ◽  
Mohamed Abid

To meet the high performance demands of embedded multimedia applications, embedded systems are integrating multiple processing units. However, they are mostly based on custom-logic design methodology. Designing parallel multicore systems using available standards intellectual properties yet maintaining high performance is also a challenging issue. Softcore processors and field programmable gate arrays (FPGAs) are a cheap and fast option to develop and test such systems. This paper describes a FPGA-based design methodology to implement a rapid prototype of parametric multicore systems. A study of the viability of making the SoC using the NIOS II soft-processor core from Altera is also presented. The NIOS II features a general-purpose RISC CPU architecture designed to address a wide range of applications. The performance of the implemented architecture is discussed, and also some parallel applications are used for testing speedup and efficiency of the system. Experimental results demonstrate the performance of the proposed multicore system, which achieves better speedup than the GPU (29.5% faster for the FIR filter and 23.6% faster for the matrix-matrix multiplication).


Sign in / Sign up

Export Citation Format

Share Document