A Simple Model for Portable and Fast Prediction of Execution Time and Power Consumption of GPU Kernels

2021 ◽  
Vol 18 (1) ◽  
pp. 1-25
Author(s):  
Lorenz Braun ◽  
Sotirios Nikas ◽  
Chen Song ◽  
Vincent Heuveline ◽  
Holger Fröning
2019 ◽  
Vol 2019 ◽  
pp. 1-19
Author(s):  
Karim M. A. Ali ◽  
Rabie Ben Atitallah ◽  
Abdessamad Ait El Cadi ◽  
Nizar Fakhfakh ◽  
Jean-Luc Dekeyser

Embedded video applications are now part of sophisticated transportation systems such as autonomous vehicles and driver assistance systems. As silicon capacity increases, the design productivity gap widens for currently available design tools. High-level synthesis (HLS) tools emerged to reduce that gap by shifting design effort to higher abstraction levels. In this paper, we present ViPar, a tool for exploring different video processing architectures at a higher design level. First, we propose a parametrizable parallel architectural model dedicated to video applications. Second, targeting this architectural model, we developed the ViPar tool with two main features: (1) An empirical model estimates power consumption from hardware utilization and operating frequency. In addition, we derived equations for estimating the hardware utilization and execution time of each design point during design space exploration. (2) Given the main characteristics of the parallel video architecture, such as the parallelism level, the number of input/output ports, and the pixel distribution pattern, ViPar can automatically generate the dedicated architecture for hardware implementation. In the experimental validation, we used ViPar to automatically generate an efficient hardware implementation of a Multiwindow Sum of Absolute Difference stereo matching algorithm on a Xilinx Zynq ZC706 board. We increased design productivity by converging rapidly on the designs that fit our system constraints in terms of power consumption, hardware utilization, and frame execution time.


2015 ◽  
Vol 24 (10) ◽  
pp. 1550161 ◽  
Author(s):  
Muhammad Yasir Qadri ◽  
Nadia N. Qadri ◽  
Martin Fleury ◽  
Klaus D. McDonald-Maier

This paper proposes a method of buffering instructions by software-based prefetching. The method allows low-end processors to improve their instruction throughput with a minimum of additional logic and power consumption. Low-end embedded processors do not employ caches mainly for two reasons. The first reason is that the overhead of cache implementation in terms of energy and area is considerable. The second reason is that, because a cache's performance primarily depends on the number of hits, an increasing number of misses could cause a processor to remain in stall mode for a longer duration. As a result, a cache may become more of a liability than an advantage. In contrast, the benchmarked results for the proposed software-based prefetch buffering without a cache show a 5–10% improvement in execution time. They also show a 4% or more reduction in the energy-delay-square-product (ED2P) with a maximum reduction of 40%. The results additionally demonstrate that the performance and efficiency of the proposed architecture scale with the number of multicycle instructions. The benchmarked routines tested to arrive at these results are widely deployed components of embedded applications.
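For readers unfamiliar with the ED2P metric mentioned in the abstract above, the following minimal sketch uses its usual definition, energy multiplied by the square of the delay. The function names and all numeric values are hypothetical illustrations, not figures or code from the paper.

```python
def ed2p(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay-squared product, E * D^2 (lower is better)."""
    return energy_joules * delay_seconds ** 2


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline


# Hypothetical example: same energy, execution time shortened by 5%.
baseline = ed2p(energy_joules=2.0, delay_seconds=1.0)
improved = ed2p(energy_joules=2.0, delay_seconds=0.95)
reduction = relative_reduction(baseline, improved)

print(f"ED2P reduction: {reduction:.2%}")
```

Because the delay enters squared, a change in execution time weighs more heavily on ED2P than an equal relative change in energy.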


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Tarek Frikha ◽  
Faten Chaabane ◽  
Nadhir Aouinti ◽  
Omar Cheikhrouhou ◽  
Nader Ben Amor ◽  
...  

The adoption of Internet of Things (IoT) technology across many applications, such as autonomous systems, communication, and healthcare, is driving market growth. The emergence of advanced data analytics techniques such as blockchain for connected IoT devices has the potential to reduce costs and increase cloud platform adoption. Blockchain is a key technology for real-time IoT applications, providing trust in distributed robotic systems running on embedded hardware without the need for certification authorities. Blockchain IoT applications face many challenges, such as power consumption and execution time. These constraints have to be considered carefully alongside others such as the number of nodes and data security. In this paper, a novel approach is discussed, based on a hybrid HW/SW architecture and designed for Proof of Work (PoW), the most widely used consensus mechanism in blockchain. The proposed architecture is validated using the Ethereum blockchain with Keccak 256 and the ZedBoard field-programmable gate array (FPGA) development kit. This implementation shows an improvement of 338% in execution time and of 255% in power consumption compared with Nvidia Maxwell GPUs.


Author(s):  
Anwar H. Katrawi ◽  
Rosni Abdullah ◽  
Mohammed Anbar ◽  
Ammar Kamal Abasi

Using MapReduce in Hadoop helps lower the execution time and power consumption of large-scale data processing. However, job processing can be delayed when tasks are assigned to bad or congested machines; such tasks are called "straggler tasks". They increase execution time and power consumption, which raises costs and degrades the performance of computing systems. This research proposes a hybrid MapReduce framework referred to as the combinatory late-machine (CLM) framework. Implementing this framework facilitates early detection and identification of stragglers so that prompt, appropriate, and effective action can be taken.


AIChE Journal ◽  
2008 ◽  
Vol 54 (3) ◽  
pp. 646-656 ◽  
Author(s):  
Alessandro Paglianti ◽  
Maria Fujasova ◽  
Giuseppina Montante
