Peak Performance Model for a Custom Precision Floating-Point Dot Product on FPGAs

Euro-Par 2010 Parallel Processing Workshops - Lecture Notes in Computer Science ◽

10.1007/978-3-642-21878-1_49 ◽

2011 ◽

pp. 399-406 ◽

Author(s):

Manfred Mücke ◽

Bernd Lesser ◽

Wilfried N. Gansterer

Keyword(s):

Performance Model ◽

Floating Point ◽

Peak Performance ◽


Accelerating Spike-by-Spike Neural Networks on FPGA with Hybrid Custom Floating-Point and Logarithmic Dot-Product Approximation

IEEE Access ◽

10.1109/access.2021.3085216 ◽

2021 ◽

pp. 1-1

Author(s):

Yarib Nevarez ◽

David Rotermund ◽

Klaus R. Pawelzik ◽

Alberto Garcia-Ortiz

Keyword(s):

Neural Networks ◽

Floating Point ◽


Optimized Fused Floating-Point Many-Term Dot-Product Hardware for Machine Learning Accelerators

2019 IEEE 26th Symposium on Computer Arithmetic (ARITH) ◽

10.1109/arith.2019.00021 ◽

2019 ◽

Author(s):

Himanshu Kaul ◽

Mark Anders ◽

Sanu Mathew ◽

Seongjong Kim ◽

Ram Krishnamurthy

Keyword(s):

Machine Learning ◽

Floating Point ◽


An FPGA-based floating-point processor array supporting a high-precision dot product

2006 IEEE International Conference on Field Programmable Technology ◽

10.1109/fpt.2006.270337 ◽

2006 ◽

Author(s):

Fritz Mayer-Lindenberg ◽

Valerij Beller

Keyword(s):

High Precision ◽

Floating Point ◽

Processor Array ◽


Improved Architectures for a Floating-Point Fused Dot Product Unit

2013 IEEE 21st Symposium on Computer Arithmetic ◽

10.1109/arith.2013.26 ◽

2013 ◽

Author(s):

Jongwook Sohn ◽

E. E. Swartzlander

Keyword(s):

Floating Point ◽


A high speed floating point dot product unit

2014 International Conference on Issues and Challenges in Intelligent Computing Techniques (ICICT) ◽

10.1109/icicict.2014.6781299 ◽

2014 ◽

Author(s):

Akash Kumar Gupta ◽

Birendra Biswal

Keyword(s):

Floating Point ◽


A Fused Floating-Point Four-Term Dot Product Unit

IEEE Transactions on Circuits and Systems I Regular Papers ◽

10.1109/tcsi.2016.2525042 ◽

2016 ◽

Vol 63 (3) ◽

pp. 370-378 ◽

Author(s):

Jongwook Sohn ◽

Earl E. Swartzlander

Keyword(s):

Floating Point ◽


A new architecture for accurate dot product of floating point numbers

The 2010 International Conference on Computer Engineering & Systems ◽

10.1109/icces.2010.5674841 ◽

2010 ◽

Author(s):

Ahmad M. Zaki ◽

Mohamed H. El-Shafey ◽

Ayman M. Bahaa Eldin ◽

Gamal M. Ali

Keyword(s):

Floating Point ◽

Dot Product ◽

Floating Point Numbers


Accurate summation, dot product and polynomial evaluation in complex floating point arithmetic

Information and Computation ◽

10.1016/j.ic.2011.09.003 ◽

2012 ◽

Vol 216 ◽

pp. 57-71 ◽

Author(s):

Stef Graillat ◽

Valérie Ménissier-Morain

Keyword(s):

Floating Point ◽

Polynomial Evaluation ◽

Floating Point Arithmetic ◽

Dot Product


The peak performance model applied to Spud’s Chemical Company, LLC

Peak Performance ◽

10.1201/9780429451508-9 ◽

2019 ◽

pp. 169-206

Author(s):

Patricia Melton Allen ◽

Frances E. Alston ◽

Emily Millikin DeKerchove

Keyword(s):

Performance Model ◽

Peak Performance ◽

Chemical Company


Implementation and Performance Modeling of Deterministic Particle Transport (Sweep3D) on the IBM Cell/B.E.

Scientific Programming ◽

10.1155/2009/784153 ◽

2009 ◽

Vol 17 (1-2) ◽

pp. 199-208 ◽

Author(s):

Olaf Lubeck ◽

Michael Lang ◽

Ram Srinivasan ◽

Greg Johnson

Keyword(s):

Message Passing ◽

Programming Model ◽

Performance Model ◽

Floating Point ◽

Design Parameters ◽

Data Movement ◽

Architecture Model ◽

Future Design ◽

Scientific Simulations

The IBM Cell Broadband Engine (Cell/B.E.) is a novel multi-core chip with the potential to deliver the demanding floating-point performance required for high-fidelity scientific simulations. However, data movement within the chip can be a major obstacle to realizing the benefits of the peak floating-point rates. In this paper, we present the results of implementing Sweep3D on the Cell/B.E. using an intra-chip message-passing model that minimizes data movement. We compare the advantages and disadvantages of this programming model with a previous implementation that used a master–worker threading strategy. We apply a previously validated micro-architecture performance model (based on our previous work in Monte Carlo performance models) to the application executing on the Cell/B.E.; the model predicts overall CPI (cycles per instruction) and gives a detailed breakdown of processor stalls. Finally, we use the micro-architecture model to assess the performance of future design parameters for the Cell/B.E. micro-architecture. The methodologies and results have broader implications that extend to multi-core architectures.
