Three-Level Parallelism for FDK Algorithm Using Multi-GPU Based Cluster System

This article presents the design of an unmanned aerial vehicle (UAV) prototype that we built ourselves. Our UAV is a flying-wing type and can take off with a small boost. The system combines major advantages of airplanes, namely the ability to fly horizontally at a constant altitude and a long flight time. The aerodynamic models presented in this paper are optimized to improve the operational performance of the vehicle, especially its stability and its capacity for long gliding flight. Both aspects are important for achieving the mission goals efficiently and for getting the work done. The simulations were run using ANSYS 13 installed on our university's cluster system. In a next step, the numerical results will be compared with those obtained during experimental flights. This paper presents the main results of the numerical simulations, including the magnitudes of the main flight coefficients.

The Globular Cluster System in the Inner Region of M87

The Astrophysical Journal ◽

10.1086/306865 ◽

1999 ◽

Vol 513 (2) ◽

pp. 733-751 ◽

Cited By ~ 107

Author(s):

Arunav Kundu ◽

Bradley C. Whitmore ◽

William B. Sparks ◽

F. Duccio Macchetto ◽

Stephen E. Zepf ◽

...

Keyword(s):

Globular Cluster ◽

Cluster System ◽

Globular Cluster System ◽

Inner Region

Gang scheduling in a two-cluster system implementing migrations and periodic feedback

SIMULATION ◽

10.1177/0037549710371218 ◽

2010 ◽

Vol 87 (12) ◽

pp. 1021-1031 ◽

Cited By ~ 5

Author(s):

Zafeirios C Papazachos ◽

Helen D Karatza

Keyword(s):

Cluster System ◽

Gang Scheduling ◽

Periodic Feedback

A compiler framework for extracting superword level parallelism

ACM SIGPLAN Notices ◽

10.1145/2345156.2254106 ◽

2012 ◽

Vol 47 (6) ◽

pp. 347-358 ◽

Cited By ~ 2

Author(s):

Jun Liu ◽

Yuanrui Zhang ◽

Ohyoung Jang ◽

Wei Ding ◽

Mahmut Kandemir

Keyword(s):

Compiler Framework ◽

Level Parallelism

The Milky Way globular cluster system within the context of a closed box

Astrophysics and Space Science ◽

10.1007/s10509-020-03897-0 ◽

2020 ◽

Vol 365 (12) ◽

Author(s):

Graeme H. Smith

Keyword(s):

Globular Cluster ◽

Milky Way ◽

Cluster System ◽

Globular Cluster System

Microarchitectural Characterization on a Mobile Workload

Applied Sciences ◽

10.3390/app11031225 ◽

2021 ◽

Vol 11 (3) ◽

pp. 1225

Author(s):

Woohyong Lee ◽

Jiyoung Lee ◽

Bo Kyung Park ◽

R. Young Chul Kim

Keyword(s):

Performance Monitoring ◽

Performance Metrics ◽

Performance Comparison ◽

Instruction Level Parallelism ◽

Data Set ◽

Performance Events ◽

Hardware Performance Counters ◽

On Chip ◽

The Comparative Study ◽

Level Parallelism

Geekbench is one of the most referenced cross-platform benchmarks in the mobile world. Most of its workloads are synthetic, but some aim to simulate real-world behavior. Its microarchitectural behavior on mobile devices has rarely been reported, because hardware profiling features are not widely available to the public; despite its popularity as a mobile performance workload, Geekbench’s microarchitectural characteristics are hard to find. In this paper, a thorough experimental study of Geekbench performance characterization is reported with detailed performance metrics. The study also identifies the impact of the mobile system on chip (SoC) microarchitecture, such as the cache subsystem, instruction-level parallelism, and branch performance. The study reveals the bottlenecks of the workloads, especially in the cache subsystem. This means that a change in data set size has a direct and significant impact on the performance score on some systems and undermines the fairness of the CPU benchmark. In the experiment, Samsung’s Exynos9820-based platform was used as the device under test, with binaries built using the Android Native Development Kit (NDK). The Exynos9820 is a superscalar processor capable of dual-issuing some instructions. To aid performance analysis, we enabled the collection of performance events through performance monitoring unit (PMU) registers. The PMU is a set of hardware performance counters built into microprocessors to store the counts of hardware-related activities. Throughout the experiment, functional and microarchitectural performance profiles were fully studied. This paper describes the details of these mobile performance studies. The ARM DS5 tool was used to collect runtime PMU profiles, including OS-level performance data. With the comparative study complete, users will understand more about mobile architecture behavior, which will help them evaluate which benchmark is preferable for fair performance comparison.

UltraSynth: Insights of a CGRA Integration into a Control Engineering Environment

Journal of Signal Processing Systems ◽

10.1007/s11265-021-01641-7 ◽

2021 ◽

Author(s):

Dennis Wolf ◽

Andreas Engel ◽

Tajas Ruschke ◽

Andreas Koch ◽

Christian Hochberger

Keyword(s):

Computing System ◽

Coarse Grained ◽

Instruction Level Parallelism ◽

Control Engineering ◽

Processing Elements ◽

Actual Application ◽

Reconfigurable Arrays ◽

Engineering Environment ◽

On Chip ◽

Level Parallelism

Coarse Grained Reconfigurable Arrays (CGRAs) or Architectures are a concept for hardware accelerators based on the idea of distributing workload over Processing Elements. These processors exploit instruction level parallelism, while being energy efficient due to their simplistic internal structure. However, the incorporation into a complete computing system raises severe challenges at the hardware and software level. This article evaluates in detail a CGRA integrated into a control engineering environment targeting a Xilinx Zynq System on Chip (SoC). Besides the actual application execution performance, the practicability of the configuration toolchain is validated. Challenges of the real-world integration are discussed and practical insights are highlighted.

Introducing multi-level parallelism, at coarse, fine and instruction level to enhance the performance of iterative solvers for large sparse linear systems on Multi- and Many-core architecture

2020 IEEE/ACM 6th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC) and Workshop on Hierarchical Parallelism for Exascale Computing (HiPar) ◽

10.1109/llvmhpchipar51896.2020.00014 ◽

2020 ◽

Author(s):

Jean-Marc Gratien

Keyword(s):

Linear Systems ◽

Iterative Solvers ◽

Sparse Linear Systems ◽

Multi Level ◽

Many Core ◽

Level Parallelism
