Performance Analysis of Cell Broadband Engine for High Memory Bandwidth Applications

Author(s):  
Daniel Jimenez-Gonzalez ◽  
Xavier Martorell ◽  
Alex Ramirez
2021 ◽  
Vol 13 (2) ◽  
pp. 717
Author(s):  
Alex Karbachevsky ◽  
Chaim Baskin ◽  
Evgenii Zheltonozhskii ◽  
Yevgeny Yermolin ◽  
Freddy Gabbay ◽  
...  

The demand for running NNs in embedded environments has increased significantly in recent years due to the significant success of convolutional neural network (CNN) approaches in various tasks, including image recognition and generation. The task of achieving high accuracy on resource-restricted devices, however, is still considered to be challenging, which is mainly due to the vast number of design parameters that need to be balanced. While the quantization of CNN parameters leads to a reduction of power and area, it can also generate unexpected changes in the balance between communication and computation. This change is hard to evaluate, and the lack of balance may lead to lower utilization of either memory bandwidth or computational resources, thereby reducing performance. This paper introduces a hardware performance analysis framework for identifying bottlenecks in the early stages of CNN hardware design. We demonstrate how the proposed method can help in evaluating different architecture alternatives of resource-restricted CNN accelerators (e.g., part of real-time embedded systems) early in design stages and, thus, prevent making design mistakes.


2009 ◽  
Vol 17 (1-2) ◽  
pp. 173-184 ◽  
Author(s):  
Jusub Kim ◽  
Joseph JaJa

Interactive high quality volume rendering is becoming increasingly more important as the amount of more complex volumetric data steadily grows. While a number of volumetric rendering techniques have been widely used, ray casting has been recognized as an effective approach for generating high quality visualization. However, for most users, the use of ray casting has been limited to datasets that are very small because of its high demands on computational power and memory bandwidth. However the recent introduction of the Cell Broadband Engine (Cell B.E.) processor, which consists of 9 heterogeneous cores designed to handle extremely demanding computations with large streams of data, provides an opportunity to put the ray casting into practical use. In this paper, we introduce an efficient parallel implementation of volume ray casting on the Cell B.E. The implementation is designed to take full advantage of the computational power and memory bandwidth of the Cell B.E. using an intricate orchestration of the ray casting computation on the available heterogeneous resources. Specifically, we introduce streaming model based schemes and techniques to efficiently implement acceleration techniques for ray casting on Cell B.E. In addition to ensuring effective SIMD utilization, our method provides two key benefits: there is no cost for empty space skipping and there is no memory bottleneck on moving volumetric data for processing. Our experimental results show that we can interactively render practical datasets on a single Cell B.E. processor.


Sign in / Sign up

Export Citation Format

Share Document