Peak Performance Model for a Custom Precision Floating-Point Dot Product on FPGAs

Author(s):  
Manfred Mücke ◽  
Bernd Lesser ◽  
Wilfried N. Gansterer
IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Yarib Nevarez ◽  
David Rotermund ◽  
Klaus R. Pawelzik ◽  
Alberto Garcia-Ortiz

2016 ◽  
Vol 63 (3) ◽  
pp. 370-378
Author(s):  
Jongwook Sohn ◽  
Earl E. Swartzlander

2019 ◽  
pp. 169-206
Author(s):  
Patricia Melton Allen ◽  
Frances E. Alston ◽  
Emily Millikin DeKerchove

2009 ◽  
Vol 17 (1-2) ◽  
pp. 199-208
Author(s):  
Olaf Lubeck ◽  
Michael Lang ◽  
Ram Srinivasan ◽  
Greg Johnson

The IBM Cell Broadband Engine (BE) is a novel multi-core chip with the potential to deliver the demanding floating-point performance required for high-fidelity scientific simulations. However, data movement within the chip can be a major obstacle to realizing its peak floating-point rates. In this paper, we present the results of implementing Sweep3D on the Cell/B.E. using an intra-chip message-passing model that minimizes data movement. We compare the advantages and disadvantages of this programming model with those of a previous implementation that used a master–worker threading strategy. We apply a previously validated micro-architecture performance model, based on our earlier work on Monte Carlo performance models, to the application executing on the Cell/B.E.; the model predicts overall CPI (cycles per instruction) and gives a detailed breakdown of processor stalls. Finally, we use the micro-architecture model to assess the performance impact of future design parameters for the Cell/B.E. micro-architecture. The methodologies and results have broader implications that extend to multi-core architectures.

