Three-Level Parallelism for FDK Algorithm Using Multi-GPU Based Cluster System

Author(s):  
Xing Wei ◽  
Bin Yan ◽  
Lei Li ◽  
Feng Zhang ◽  
Hongkui Liu ◽  
...  
2018 ◽  
Vol 55 (4) ◽  
pp. 652-657 ◽  
Author(s):  
Gabriel Murariu ◽  
Razvan Adrian Mahu ◽  
Adrian Gabriel Murariu ◽  
Mihai Daniel Dragu ◽  
Lucian P. Georgescu ◽  
...  

This article presents the design of an in-house unmanned aerial vehicle (UAV) prototype. The UAV is a flying-wing type that can take off with a small boost. The design combines major advantages of fixed-wing aircraft: the ability to fly horizontally at a constant altitude and a long flight time. The aerodynamic models presented in this paper are optimized to improve the vehicle's operational performance, especially its stability and its capacity for long gliding flight. Both aspects are important for completing the vehicle's intended tasks efficiently. The simulations were run with ANSYS 13 on our university's cluster system. In a next step, the numerical results will be compared with measurements from experimental flights. The paper reports the main results of the numerical simulations, including the computed values of the main flight coefficients.


1999 ◽  
Vol 513 (2) ◽  
pp. 733-751 ◽  
Author(s):  
Arunav Kundu ◽  
Bradley C. Whitmore ◽  
William B. Sparks ◽  
F. Duccio Macchetto ◽  
Stephen E. Zepf ◽  
...  

SIMULATION ◽  
2010 ◽  
Vol 87 (12) ◽  
pp. 1021-1031 ◽  
Author(s):  
Zafeirios C Papazachos ◽  
Helen D Karatza

2012 ◽  
Vol 47 (6) ◽  
pp. 347-358 ◽  
Author(s):  
Jun Liu ◽  
Yuanrui Zhang ◽  
Ohyoung Jang ◽  
Wei Ding ◽  
Mahmut Kandemir

2021 ◽  
Vol 11 (3) ◽  
pp. 1225 ◽  
Author(s):  
Woohyong Lee ◽  
Jiyoung Lee ◽  
Bo Kyung Park ◽  
R. Young Chul Kim

Geekbench is one of the most widely referenced cross-platform benchmarks in the mobile world. Most of its workloads are synthetic, but some aim to simulate real-world behavior. Its microarchitectural behavior on mobile devices has rarely been reported, because hardware profiling features are not generally available to the public. This paper reports a thorough experimental study of Geekbench performance characterization with detailed performance metrics. The study also identifies the impact of mobile system-on-chip (SoC) microarchitecture, including the cache subsystem, instruction-level parallelism, and branch performance. It reveals the bottlenecks of the workloads, especially in the cache subsystem: on some systems, a change in data set size significantly affects the performance score, which undermines the fairness of the CPU benchmark. The experiments used Samsung's Exynos9820-based platform as the device under test, with binaries built using the Android Native Development Kit (NDK). The Exynos9820 is a superscalar processor capable of dual-issuing some instructions. To support performance analysis, we enabled the collection of performance events through performance monitoring unit (PMU) registers. The PMU is a set of hardware performance counters built into microprocessors to count hardware-related activities. The ARM DS5 tool was used to collect runtime PMU profiles, including OS-level performance data. Functional and microarchitectural performance profiles were studied throughout the experiments, and this paper describes those studies in detail. With this comparative study, users can better understand mobile architecture behavior, which helps in evaluating which benchmark is preferable for fair performance comparison.
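As a rough illustration of how raw PMU counts turn into the kinds of metrics the abstract mentions (cache, instruction-level parallelism, and branch behavior), the sketch below derives a few common ratios from event counts. The `PmuSample` fields, the `derive_metrics` helper, and the sample numbers are illustrative assumptions; they are not events, tooling, or results taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class PmuSample:
    """Hypothetical raw event counts read from PMU counters for one run."""
    cycles: int
    instructions: int
    l1d_accesses: int
    l1d_refills: int          # L1 data cache misses that caused a refill
    branches: int
    branch_mispredicts: int


def derive_metrics(s: PmuSample) -> dict:
    """Turn raw counts into ratios commonly used in performance analysis."""
    def ratio(num: float, den: float) -> float:
        # Guard against empty samples instead of raising ZeroDivisionError.
        return num / den if den else 0.0

    return {
        "ipc": ratio(s.instructions, s.cycles),
        "l1d_miss_rate": ratio(s.l1d_refills, s.l1d_accesses),
        "l1d_mpki": ratio(1000 * s.l1d_refills, s.instructions),
        "branch_mispredict_rate": ratio(s.branch_mispredicts, s.branches),
    }


# Made-up counts, chosen only to make the arithmetic easy to follow.
sample = PmuSample(
    cycles=2_000_000,
    instructions=3_000_000,
    l1d_accesses=1_000_000,
    l1d_refills=50_000,
    branches=500_000,
    branch_mispredicts=10_000,
)
metrics = derive_metrics(sample)
```

Comparing such ratios across data set sizes is one way a cache-bound workload would show up: the miss rate and misses per thousand instructions rise while instructions per cycle falls.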


Author(s):  
Dennis Wolf ◽  
Andreas Engel ◽  
Tajas Ruschke ◽  
Andreas Koch ◽  
Christian Hochberger

Coarse-Grained Reconfigurable Arrays or Architectures (CGRAs) are a concept for hardware accelerators based on distributing the workload over processing elements. These processors exploit instruction-level parallelism while remaining energy efficient because of their simple internal structure. However, incorporating them into a complete computing system raises severe challenges at both the hardware and software levels. This article evaluates in detail a CGRA integrated into a control engineering environment targeting a Xilinx Zynq System on Chip (SoC). Besides the execution performance of the application itself, the practicability of the configuration toolchain is validated. Challenges of real-world integration are discussed and practical insights are highlighted.

