roofline model Latest Research Papers

Continuous enhancements and diversity in modern multi-core hardware, such as wider and deeper core pipelines and memory subsystems, raise hard-to-solve challenges when modeling upper-bound capabilities and identifying the main application bottlenecks. Insightful roofline models are widely used for this purpose, but existing approaches overly abstract the micro-architectural complexity, yielding unrealistic performance bounds that lead to a misleading characterization of real-world applications. To address this problem, the Mansard Roofline Model (MaRM), proposed in this work, identifies a minimum set of architectural features that must be considered to provide insightful yet accurate and realistic modeling of performance upper bounds for modern processors. By encapsulating the retirement constraints due to the number of retirement slots and the Reorder-Buffer and Physical Register File sizes, the proposed model accurately models the capabilities of a real platform (average rRMSE of 5.4%) and characterizes 12 application kernels from standard benchmark suites. Following the MaRM interpretation methodology and guidelines proposed herein, speed-ups of up to 5× are obtained when optimizing a real-world bioinformatics application, as well as a super-linear speedup of 18.5× when parallelized.
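
For context, the classic roofline bound that models such as MaRM refine can be sketched in a few lines. This is the textbook formulation only, not the MaRM model itself (it ignores retirement slots, Reorder-Buffer and Physical Register File limits), and the machine numbers below are hypothetical placeholders rather than values from the paper.

```python
def roofline_bound(arithmetic_intensity, peak_flops, peak_bandwidth):
    """Classic roofline: attainable FLOP/s is the lower of the compute roof
    and the memory roof (arithmetic intensity times memory bandwidth)."""
    if arithmetic_intensity <= 0:
        raise ValueError("arithmetic intensity must be positive (FLOP/byte)")
    return min(peak_flops, arithmetic_intensity * peak_bandwidth)


def ridge_point(peak_flops, peak_bandwidth):
    """Arithmetic intensity (FLOP/byte) at which the two roofs meet."""
    return peak_flops / peak_bandwidth


if __name__ == "__main__":
    # Hypothetical machine: 1 TFLOP/s peak compute, 100 GB/s memory bandwidth.
    PEAK_FLOPS = 1.0e12
    PEAK_BANDWIDTH = 1.0e11

    for intensity in (0.5, 2.0, 10.0, 100.0):
        bound = roofline_bound(intensity, PEAK_FLOPS, PEAK_BANDWIDTH)
        side = "memory" if intensity < ridge_point(PEAK_FLOPS, PEAK_BANDWIDTH) else "compute"
        print(f"AI={intensity:6.1f} FLOP/byte -> bound {bound:.2e} FLOP/s ({side}-bound)")
```

Kernels whose intensity falls below the ridge point are limited by memory bandwidth in this simple model; those above it are limited by peak compute.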

Autotuning Benchmarking Techniques: A Roofline Model Case Study

2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) ◽ 10.1109/ipdpsw52791.2021.00119 ◽ 2021

Author(s): Jacob O. Torring ◽ Jan Christian Meyer ◽ Anne C. Elster

Keyword(s): Model Case ◽ Roofline Model

Performance Optimization on GPGPU & Multicore CPU Using Roofline Model

IOP Conference Series Materials Science and Engineering ◽ 10.1088/1757-899x/1152/1/012021 ◽ 2021 ◽ Vol 1152 (1) ◽ pp. 012021

Author(s): Noor M. Allayla ◽ Shefa A. Dawwd

Keyword(s): Performance Optimization ◽ Roofline Model ◽ Multicore Cpu

Dynamic Optimizations in GPU using Roofline Model

2021 IEEE International Symposium on Circuits and Systems (ISCAS) ◽ 10.1109/iscas51556.2021.9401255 ◽ 2021

Author(s): Winnie Thomas ◽ Suryakant Toraskar ◽ Virendra Singh

Keyword(s): Dynamic Optimizations ◽ Roofline Model

A Comprehensive Methodology to Optimize FPGA Designs via the Roofline Model

IEEE Transactions on Computers ◽ 10.1109/tc.2021.3111761 ◽ 2021 ◽ pp. 1-1

Author(s): Marco Siracusa ◽ Emanuele Del Sozzo ◽ Marco Rabozzi ◽ Lorenzo Di Tucci ◽ Samuel Williams ◽ ...

Keyword(s): Roofline Model

A CAD-based methodology to optimize HLS code via the roofline model

Proceedings of the 39th International Conference on Computer-Aided Design ◽ 10.1145/3400302.3415730 ◽ 2020

Author(s): Marco Siracusa ◽ Lorenzo Di Tucci ◽ Marco Rabozzi ◽ Samuel Williams ◽ Emanuele Del Sozzo ◽ ...

Keyword(s): Roofline Model

ESTIMATION OF THE WORKLOAD OF A HYBRID COMPUTING CLUSTER IN TASKS OF MODELING IN MATERIALS SCIENCE

Mathematical modeling in materials science of electronic component ◽ 10.29003/m1511.mmmsec-2020/30-33 ◽ 2020

Author(s): Konstantin Volovich

Keyword(s): High Performance Computing ◽ High Performance ◽ Materials Science ◽ Cluster Performance ◽ Computing Systems ◽ Hybrid Computing ◽ Methods Of Calculation ◽ Roofline Model ◽ Performance Computing

This article addresses methods for calculating and evaluating how effectively hybrid computing systems operate. It proposes a method for calculating the workload from the peak values of cluster performance, and uses the roofline model to analyze the results and the quality of operation of cloud-based scientific high-performance computing services.

Performance engineering for real and complex tall & skinny matrix multiplication kernels on GPUs

The International Journal of High Performance Computing Applications ◽ 10.1177/1094342020965661 ◽ 2020 ◽ Vol 35 (1) ◽ pp. 5-19

Author(s): Dominik Ernst ◽ Georg Hager ◽ Jonas Thies ◽ Gerhard Wellein

Keyword(s): Code Generation ◽ Large Range ◽ State Of The Art ◽ Matrix Multiplication ◽ Double Precision ◽ General Matrix ◽ Performance Engineering ◽ Key Characteristics ◽ Roofline Model ◽ Mapping Scheme

General matrix-matrix multiplications with double-precision real and complex entries (DGEMM and ZGEMM) in vendor-supplied BLAS libraries are best optimized for square matrices but often show poor performance for tall & skinny matrices, which are much taller than they are wide. In this case, NVIDIA’s current CUBLAS implementation delivers only a fraction of the potential performance indicated by the roofline model. We describe the challenges and key characteristics of an implementation that can achieve close to optimal performance. We further evaluate different strategies of parallelization and thread distribution and devise a flexible, configurable mapping scheme. To ensure flexibility and allow for highly tailored implementations, we use code generation combined with autotuning. For a large range of matrix sizes in the domain of interest, we achieve at least 2/3 of the roofline performance and often substantially outperform state-of-the-art CUBLAS results on an NVIDIA Volta GPGPU.
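
To illustrate why the roofline model is a natural yardstick here, the following sketch estimates the arithmetic intensity of a tall & skinny product and the fraction of the roofline a measurement reaches. It assumes, purely for illustration, the real double-precision product C = AᵀB with A of size K×M and B of size K×N, K much larger than M and N, and each input element read from memory exactly once; the abstract does not spell out the kernel or the traffic model, and the hardware numbers in the usage lines are hypothetical.

```python
def tall_skinny_intensity(m, n, bytes_per_element=8):
    """FLOP/byte for C = A^T B with A (K x M) and B (K x N), real entries.

    Assumption: per row index k, the kernel reads M + N elements once and
    performs M * N multiply-adds (2 FLOPs each); writing the small M x N
    result is neglected because K >> M, N.
    """
    flops_per_row = 2 * m * n
    bytes_per_row = bytes_per_element * (m + n)
    return flops_per_row / bytes_per_row


def roofline_fraction(measured_flops, intensity, peak_flops, peak_bandwidth):
    """Measured performance as a fraction of the classic roofline bound."""
    bound = min(peak_flops, intensity * peak_bandwidth)
    return measured_flops / bound


if __name__ == "__main__":
    # Hypothetical GPU: 7 TFLOP/s double precision, 800 GB/s memory bandwidth.
    PEAK_FLOPS = 7.0e12
    PEAK_BANDWIDTH = 8.0e11

    for width in (4, 8, 32):
        ai = tall_skinny_intensity(width, width)
        bound = min(PEAK_FLOPS, ai * PEAK_BANDWIDTH)
        print(f"M=N={width:3d}: AI={ai:.2f} FLOP/byte, bound={bound:.2e} FLOP/s")
```

Under these assumptions the intensity for M = N is N/8 FLOP/byte, so narrow matrices sit far to the left of the ridge point and the bound is set by memory bandwidth rather than by peak compute.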

roofline model
Recently Published Documents

Evaluating performance of AI operators using roofline model

OpenCL FPGA Optimization guided by memory accesses and roofline model analysis applied to tomography acceleration

Mansard Roofline Model: Reinforcing the Accuracy of the Roofs

Autotuning Benchmarking Techniques: A Roofline Model Case Study

Performance Optimization on GPGPU & Multicore CPU Using Roofline Model

Dynamic Optimizations in GPU using Roofline Model

A Comprehensive Methodology to Optimize FPGA Designs via the Roofline Model

A CAD-based methodology to optimize HLS code via the roofline model

ESTIMATION OF THE WORKLOAD OF A HYBRID COMPUTING CLUSTER IN TASKS OF MODELING IN MATERIALS SCIENCE

Performance engineering for real and complex tall & skinny matrix multiplication kernels on GPUs
