BISWSRBS: A Winograd-based CNN Accelerator with a Fine-grained Regular Sparsity Pattern and Mixed Precision Quantization

Field-programmable gate arrays (FPGAs) are a high-performance computing platform for Convolutional Neural Network (CNN) inference. The Winograd algorithm, weight pruning, and quantization are widely adopted to reduce the storage and arithmetic overhead of CNNs on FPGAs. Recent studies prune the weights in the Winograd domain; however, this produces irregular sparse patterns, which lead to low parallelism and reduced resource utilization. Moreover, few works discuss a suitable quantization scheme for Winograd. In this article, we propose a regular sparse pruning pattern for Winograd-based CNNs, the Sub-row-balanced Sparsity (SRBS) pattern, to overcome the challenge of irregular sparsity. We then develop a two-step hardware co-optimization approach to improve the accuracy of models that use the SRBS pattern. Based on the pruned model, we apply mixed precision quantization to further reduce the computational complexity of bit operations. Finally, we design an FPGA accelerator that exploits both the SRBS pattern, to eliminate low-parallelism computation and irregular memory accesses, and mixed precision quantization, to obtain a layer-wise bit width. Experimental results on VGG16/VGG-nagadomi with CIFAR-10 and ResNet-18/34/50 with ImageNet show up to 11.8×/8.67× and 8.17×/8.31×/10.6× speedup and 12.74×/9.19× and 8.75×/8.81×/11.1× energy efficiency improvement, respectively, compared with the state-of-the-art dense Winograd accelerator [20], with negligible loss of model accuracy. We also show that our design achieves a 4.11× speedup over the state-of-the-art sparse Winograd accelerator [19] on VGG16.
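For readers unfamiliar with the Winograd domain mentioned in the abstract, the following is a minimal sketch of the standard 1-D F(2,3) Winograd convolution, not the accelerator's implementation. The function names are illustrative assumptions. The sketch only shows why a zero in the transformed filter saves a multiplication; the abstract does not specify the SRBS pattern itself, so it is not modeled here.

```python
def direct_conv_1d(d, g):
    """Two outputs of a 3-tap filter over 4 inputs: 6 multiplications."""
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]


def transform_filter(g):
    """Move the 3-tap filter into the Winograd domain (4 values)."""
    return [g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2]]


def winograd_f23(d, u):
    """F(2,3) with a pre-transformed filter u: at most 4 multiplications.

    Any entry of u that has been pruned to zero lets the matching
    multiplication be skipped.
    """
    v = [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]  # input transform
    m = [vi * ui if ui != 0 else 0.0 for vi, ui in zip(v, u)]
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]           # output transform


d = [1, 2, 3, 4]
g = [1, 2, 3]
print(direct_conv_1d(d, g))
print(winograd_f23(d, transform_filter(g)))
```

The filter transform is computed once per filter, so at inference time only the input transform, the element-wise products, and the output transform remain.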

Advancing the state of the art in high-performance logic and array technology

IBM Journal of Research and Development ◽

10.1147/rd.365.0821 ◽

1992 ◽

Vol 36 (5) ◽

pp. 821-828 ◽

Cited By ~ 9

Author(s):

K. H. Brown ◽

D. A. Grose ◽

R. C. Lange ◽

T. H. Ning ◽

P. A. Totta

Keyword(s):

High Performance ◽

State Of The Art ◽

The State ◽

Array Technology

Resilient gossip-inspired all-reduce algorithms for high-performance computing: Potential, limitations, and open questions

The International Journal of High Performance Computing Applications ◽

10.1177/1094342018762531 ◽

2018 ◽

Vol 33 (2) ◽

pp. 366-383

Author(s):

Marc Casas ◽

Wilfried N Gansterer ◽

Elias Wimmer

Keyword(s):

Fault Tolerance ◽

High Performance Computing ◽

High Performance ◽

State Of The Art ◽

The State ◽

Reduction Algorithm ◽

Data Corruption ◽

Parallel Reduction ◽

Open Questions ◽

Performance Computing

We investigate the usefulness of gossip-based reduction algorithms in a high-performance computing (HPC) context. We compare them to state-of-the-art deterministic parallel reduction algorithms in terms of fault tolerance and resilience against silent data corruption (SDC), as well as in terms of performance and scalability. We propose new gossip-based reduction algorithms that significantly improve the state of the art in resilience against SDC. We also propose a new gossip-inspired reduction algorithm that promises much more competitive runtime performance in an HPC context than classical gossip-based algorithms, in particular for low accuracy requirements.
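As background on what a classical gossip-based reduction looks like, here is a minimal single-process simulation of the well-known push-sum averaging scheme. It is not one of the algorithms proposed in the paper; the function name, the synchronous round model, and the fixed seed are assumptions made for illustration.

```python
import random


def push_sum_average(values, rounds, seed=0):
    """Simulate push-sum: every node converges to the global average.

    Each node keeps a (sum, weight) pair. In every round it keeps half of
    the pair and sends the other half to one random peer. Requires at
    least two nodes.
    """
    rng = random.Random(seed)
    n = len(values)
    s = [float(v) for v in values]
    w = [1.0] * n
    for _ in range(rounds):
        new_s = [0.0] * n
        new_w = [0.0] * n
        for i in range(n):
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1  # pick a peer other than node i
            half_s, half_w = s[i] / 2, w[i] / 2
            new_s[i] += half_s
            new_w[i] += half_w
            new_s[j] += half_s
            new_w[j] += half_w
        s, w = new_s, new_w
    return [si / wi for si, wi in zip(s, w)]


estimates = push_sum_average([1, 2, 3, 4, 5, 6, 7, 8], rounds=60)
print(estimates)
```

Because no node depends on a fixed communication tree, a lost or delayed message perturbs the estimates instead of stalling the reduction, which is the property that makes this family interesting for fault tolerance. Fewer rounds give a coarser estimate, which matches the abstract's remark about low accuracy requirements.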

Ultra-high-performance microscope objectives: the state of the art in design, manufacturing, and testing

10.1117/12.692202 ◽

2007 ◽

Author(s):

Thomas Sure ◽

Lambert Danner ◽

Peter Euteneuer ◽

Gerhard Hoppen ◽

Armin Pausch ◽

...

Keyword(s):

High Performance ◽

State Of The Art ◽

The State

The State-of-the-Art Trends in Education Strategy for Sustainable Development of the High Performance Computing Ecosystem

Communications in Computer and Information Science - Supercomputing ◽

10.1007/978-3-319-71255-0_40 ◽

2017 ◽

pp. 494-504 ◽

Cited By ~ 1

Author(s):

Sergey Mosin

Keyword(s):

Sustainable Development ◽

High Performance Computing ◽

High Performance ◽

State Of The Art ◽

The State ◽

Education Strategy ◽

Performance Computing

Efficient Hardware Implementations of Binary-to-BCD Conversion Schemes for Decimal Multiplication

Journal of Circuits System and Computers ◽

10.1142/s021812661550019x ◽

2014 ◽

Vol 24 (02) ◽

pp. 1550019

Author(s):

Osama Al-Khaleel ◽

Zakaria Al-Qudah ◽

Mohammad Al-Khaleel ◽

Raed Bani-Hani ◽

Christos Papachristou ◽

...

Keyword(s):

High Performance ◽

State Of The Art ◽

The State ◽

Partial Product ◽

Hardware Implementations ◽

Array Multipliers ◽

Decimal Multiplication ◽

Multiplier Circuit

This paper proposes two high-performance binary-to-binary-coded-decimal (BCD) conversion algorithms for use in BCD multiplication. These algorithms split the 7-bit binary partial product of two BCD digits into two groups, compute the contribution of each group to the equivalent BCD partial product, and add these contributions to obtain the final BCD partial product. Designs for the proposed architectures and their implementations targeting both ASIC and FPGA are compared with others. BCD array multipliers have been implemented using both our conversion circuits and existing conversion circuits. The synthesis results for both ASIC and FPGA show that the proposed designs are faster and occupy less area than the state-of-the-art conversion circuits. Furthermore, comparing BCD multipliers of various sizes shows that the area saving in the conversion circuit grows into a sizable area improvement in the multiplier circuit.
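The split-and-add idea can be sketched in software as follows. The abstract does not say where the 7 bits are split, so the 3-bit/4-bit split, the lookup table, and the function name below are assumptions chosen for illustration, not the paper's circuits.

```python
# BCD value (tens, units) contributed by the upper 3 bits, i.e. 16 * k.
# Only k = 0..5 can occur because the largest product is 9 * 9 = 81.
HIGH_GROUP_BCD = [(0, 0), (1, 6), (3, 2), (4, 8), (6, 4), (8, 0)]


def bcd_partial_product(a, b):
    """Return (tens, units) of a * b for two BCD digits a and b."""
    if not (0 <= a <= 9 and 0 <= b <= 9):
        raise ValueError("operands must be single BCD digits")
    p = a * b                  # 7-bit binary partial product, 0..81
    low = p & 0xF              # lower group: 4 bits, value 0..15
    high = p >> 4              # upper group: 3 bits, value 0..5

    low_tens, low_units = divmod(low, 10)      # small table in hardware
    high_tens, high_units = HIGH_GROUP_BCD[high]

    # Add the two contributions digit by digit with a decimal carry.
    units = low_units + high_units
    carry = 1 if units > 9 else 0
    units -= 10 * carry
    tens = low_tens + high_tens + carry
    return tens, units


print(bcd_partial_product(7, 8))
```

Each group's contribution depends on only a few input bits, so in hardware both lookups can be small and evaluated in parallel before the final decimal addition.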

The state-of-the-art mobility enhancing schemes for high-performance logic CMOS technologies

2008 9th International Conference on Solid-State and Integrated-Circuit Technology ◽

10.1109/icsict.2008.4734481 ◽

2008 ◽

Author(s):

Steve S. Chung

Keyword(s):

High Performance ◽

State Of The Art ◽

The State

AutoFolio: An Automatically Configured Algorithm Selector (Extended Abstract)

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/715 ◽

2017 ◽

Cited By ~ 4

Author(s):

Marius Lindauer ◽

Frank Hutter ◽

Holger H. Hoos ◽

Torsten Schaub

Keyword(s):

High Performance ◽

State Of The Art ◽

The State ◽

Problem Instance ◽

Algorithm Selection ◽

Algorithm Configuration ◽

Optimal Values ◽

Art Performance ◽

The One

Algorithm selection (AS) techniques -- which involve choosing from a set of algorithms the one expected to solve a given problem instance most efficiently -- have substantially improved the state of the art in solving many prominent AI problems, such as SAT, CSP, ASP, MAXSAT and QBF. Although several AS procedures have been introduced, not too surprisingly, none of them dominates all others across all AS scenarios. Furthermore, these procedures have parameters whose optimal values vary across AS scenarios. In this extended abstract of our 2015 JAIR article of the same title, we summarize AutoFolio, which uses an algorithm configuration procedure to automatically select an AS approach and optimize its parameters for a given AS scenario. AutoFolio allows researchers and practitioners across a broad range of applications to exploit the combined power of many different AS methods and to automatically construct high-performance algorithm selectors. We demonstrate that AutoFolio was able to produce new state-of-the-art algorithm selectors for 7 well-studied AS scenarios and matches state-of-the-art performance statistically on all other scenarios. Compared to the best single algorithm for each AS scenario, AutoFolio achieved average speedup factors between 1.3 and 15.4.
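To make the notion of per-instance algorithm selection and "speedup over the best single algorithm" concrete, here is a toy sketch. It is not AutoFolio: the nearest-neighbour selector, the function names, and every number in the data are hypothetical and chosen only for illustration.

```python
def single_best(train):
    """Algorithm with the lowest total runtime over the training set."""
    algos = train[0]["runtimes"].keys()
    return min(algos, key=lambda a: sum(t["runtimes"][a] for t in train))


def select_1nn(train, features):
    """Pick the algorithm that was fastest on the most similar training instance."""
    nearest = min(
        train,
        key=lambda t: sum((x - y) ** 2 for x, y in zip(t["features"], features)),
    )
    return min(nearest["runtimes"], key=nearest["runtimes"].get)


def speedup_over_single_best(train, test):
    """Total runtime of the single best algorithm divided by the selector's."""
    sbs = single_best(train)
    sbs_time = sum(t["runtimes"][sbs] for t in test)
    sel_time = sum(t["runtimes"][select_1nn(train, t["features"])] for t in test)
    return sbs_time / sel_time


# Hypothetical instance features and runtimes in seconds.
train = [
    {"features": (0.1,), "runtimes": {"A": 1.0, "B": 8.0}},
    {"features": (0.2,), "runtimes": {"A": 2.0, "B": 9.0}},
    {"features": (0.9,), "runtimes": {"A": 20.0, "B": 3.0}},
]
test = [
    {"features": (0.15,), "runtimes": {"A": 1.0, "B": 10.0}},
    {"features": (0.8,), "runtimes": {"A": 30.0, "B": 2.0}},
]
print(speedup_over_single_best(train, test))
```

A selector only pays off when different algorithms win on different instances, as in this toy data; the choice of selection model and its parameters is what the abstract describes as being configured automatically.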

Energy Efficiency Improvement in DC Railway Systems: The State of the Art

Newest Updates in Physical Science Research Vol. 5 ◽

10.9734/bpi/nupsr/v5/8457d ◽

2021 ◽

pp. 66-106

Author(s):

Mihaela Popescu ◽

Alexandru Bitoleanu

Keyword(s):

Energy Efficiency ◽

State Of The Art ◽

The State ◽

Efficiency Improvement ◽

Railway Systems ◽

Energy Efficiency Improvement

Ultra High Performance Microscope Objective - The State of the Art in Design, Manufacturing, and Testing

International Optical Design ◽

10.1364/iodc.2006.md2 ◽

2006 ◽

Cited By ~ 1

Author(s):

Thomas Sure ◽

Peter Euteneuer ◽

Armin Pausch ◽

Lambert Danner ◽

Gerhardt Hoppen ◽

...

Keyword(s):

High Performance ◽

State Of The Art ◽

The State ◽

Microscope Objective

High-performance conjugated polymer donor materials for polymer solar cells with narrow-bandgap nonfullerene acceptors

Energy & Environmental Science ◽

10.1039/c9ee02531f ◽

2019 ◽

Vol 12 (11) ◽

pp. 3225-3246 ◽

Cited By ~ 64

Author(s):

Chaohua Cui ◽

Yongfang Li

Keyword(s):

Solar Cells ◽

Conjugated Polymer ◽

High Performance ◽

State Of The Art ◽

Polymer Solar Cells ◽

The State ◽

High Performance Polymer ◽

Narrow Bandgap

The state-of-the-art conjugated polymer donor materials for high-performance polymer solar cells based on narrow-bandgap nonfullerene acceptors are summarized and discussed.
