Tutorial on high-performance theorem provers: Efficient implementation and parallelisation

High performance and memory efficient implementation of matrix multiplication on FPGAs

2010 International Conference on Field-Programmable Technology ◽

10.1109/fpt.2010.5681769 ◽

2010 ◽

Cited By ~ 5

Author(s):

Guiming Wu ◽

Yong Dou ◽

Miao Wang

Keyword(s):

High Performance ◽

Matrix Multiplication ◽

Efficient Implementation ◽

Memory Efficient

A novel, high performance and power efficient implementation of 8×8 multiplier unit using MT-CMOS technique

2013 Sixth International Conference on Contemporary Computing (IC3) ◽

10.1109/ic3.2013.6612187 ◽

2013 ◽

Cited By ~ 1

Author(s):

N. Rajput ◽

M. Sethi ◽

P. Dobriyal ◽

K. Sharma ◽

G. Sharma

Keyword(s):

High Performance ◽

Efficient Implementation ◽

Power Efficient

An efficient look up table based approximate adder for field programmable gate array

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v25.i1.pp144-151 ◽

2022 ◽

Vol 25 (1) ◽

pp. 144

Author(s):

Hadise Ramezani ◽

Majid Mohammadi ◽

Amir Sabbagh Molahosseini

Keyword(s):

Integrated Circuits ◽

High Performance ◽

Efficient Implementation ◽

Approximate Computing ◽

Gaussian Filter ◽

Gate Arrays ◽

Output Quality ◽

Field Programmable ◽

Programmable Gate Arrays ◽

Application Specific

The approximate computing is an alternative computing approach which can lead to high-performance implementation of audio and image processing as well as deep learning applications. However, most of the available approximate adders have been designed using application specific integrated circuits (ASICs), and they would not result in an efficient implementation on field programmable gate arrays (FPGAs). In this paper, we have designed a new approximate adder customized for efficient implementation on FPGAs, and then it has been used to build the Gaussian filter. The experimental results of the implementation of Gaussian filter based on the proposed approximate adder on a Virtex-7 FPGA, indicated that the resource utilization has decreased by 20-51%, and the designed filter delay based on the modified design methodology for building approximate adders for FPGA-based systems (MDeMAS) adder has improved 10-35%, due to the obtained output quality.

Developing a prototype of high-performance graph-processing framework for NEC SX–Aurora TSUBASA vector architecture

Numerical Methods and Programming (Vychislitel'nye Metody i Programmirovanie) ◽

10.26089/nummet.v21r325 ◽

2020 ◽

pp. 290-305

Author(s):

И.В. Афанасьев

Keyword(s):

Graph Algorithms ◽

High Performance ◽

Graph Algorithm ◽

Efficient Implementation ◽

Graph Processing ◽

Irregular Structure ◽

Vector Systems ◽

Order Of Magnitude ◽

Vector Graph ◽

Processing Framework

В данной статье описан подход к созданию прототипа графового фреймворка VGL (Vector Graph Library), нацеленного на эффективную реализацию графовых алгоритмов для современной векторной архитектуры NEC SX–Aurora TSUBASA. Современные векторные системы позволяют значительно ускорять приложения, интенсивно использующие подсистему памяти, подклассом которых являются графовые алгоритмы. Однако подходы к эффективной реализации графовых алгоритмов для векторных систем на сегодняшний день исследованы крайне слабо: вследствие сильно нерегулярной структуры графов реального мира, эффективно задействовать векторные особенности целевых платформ затруднительно. В работе показано, что разработанные на основе предложенного фреймворка VGL реализации графовых алгоритмов не уступают в производительности оптимизированным “вручную” аналогам за счет инкапсуляции большого числа оптимизаций графовых алгоритмов, характерных для векторных систем. Вместе с этим предложенный фреймворк позволяет значительно упростить процесс разработки графовых алгоритмов для векторных систем, на порядок сокращая объем кода реализуемых алгоритмов и скрывая от пользователя особенности программирования систем данного класса. This article describes a prototype of graph-processing framework VGL (Vector Graph Library), aimed at the efficient implementation of graph algorithms for the modern NEC SX–Aurora TSUBASA vector architecture. Present day vector systems can significantly speed up various memory-intensive applications, including graph algorithms. However, approaches to the efficient implementation of graph algorithms for vector systems have been studied extremely poorly as of today: due to the highly irregular structure of real-world graphs, it is difficult to effectively use vector features of target platforms. This paper shows that the implementations of graph algorithms developed on the basis of the proposed VGL framework show the performance comparable to their manually optimized versions due to the encapsulation of a large number of graph algorithm optimizations typical for vector systems. At the same time, the proposed framework makes it possible to significantly simplify the process of developing graph algorithms for vector systems, by an order of magnitude reducing the amount of code for implemented algorithms and hiding the programming features of systems of this class from the user.

HIGH THROUGHPUT FILTER ARCHITECTURE FOR OPTIMAL FPGA-BASED IMPLEMENTATIONS

Journal of Circuits System and Computers ◽

10.1142/s0218126613500345 ◽

2013 ◽

Vol 22 (05) ◽

pp. 1350034 ◽

Cited By ~ 1

Author(s):

HAMID M. KAMBOH ◽

SHOAB A. KHAN

Keyword(s):

High Speed ◽

High Performance ◽

Intermediate Frequency ◽

Efficient Implementation ◽

Design Algorithm ◽

Average Improvement ◽

Field Programmable ◽

Optimal Feed ◽

Embedded Blocks ◽

Algorithm Implementation

Modern field programmable gate arrays (FPGAs) offer built in support for efficient implementation of signal processing algorithms in the form of specialized embedded blocks such as high speed carry chains, specialized shift registers, adders, multiply accumulators (MAC) and block memories. These dedicated elements provide increased computational power and are used for efficient implementation of computationally extensive algorithms. This paper proposes a novel algorithm and architecture for the design and implementation of high performance intermediate frequency (IF) filters on FPGAs. In this research, we have proposed innovative design methodologies for generation of optimal feed forward and recursive architectures to be mapped on a family of FPGAs. Keeping in perspective the limited number of registers within the embedded blocks, the new methodology applies transformations to achieve higher throughput by applying various optimizations to the design algorithm. Implementation options include systolic MAC, transpose direct form MAC, canonic signed digit and distributed arithmetic based filters to suite the most economical FPGA implementation. The paper demonstrates the methodology and shows its applicability by synthesizing the designs and comparing the results to a number of traditional architectures and intellectual property cores. Using Xilinx Virtex-5 FPGA, our results show a throughput improvement between 7% and 30% with an average improvement of 16% over traditional implementations of these designs.

Efficient Implementation of Multi-View Video Compression for High Performance Application

International Journal of Intelligent Engineering and Systems ◽

10.22266/ijies2018.0430.04 ◽

2018 ◽

Vol 11 (2) ◽

pp. 28-38

Author(s):

Shaik Rahimunnisha ◽

◽

Ghanta Sudhavani ◽

Keyword(s):

Video Compression ◽

High Performance ◽

Efficient Implementation ◽

High Performance Application

Efficient implementation of high performance Read out Integrated Circuit

2014 IEEE International Conference on Electron Devices and Solid-State Circuits ◽

10.1109/edssc.2014.7061270 ◽

2014 ◽

Author(s):

Hari Shanker Gupta ◽

Subhananda Chakrabarti ◽

Maryam Shojaei Baghini ◽

D. K. Sharma ◽

A S Kiran Kumar ◽

...

Keyword(s):

Integrated Circuit ◽

High Performance ◽

Efficient Implementation

Efficient implementation of rotation operations for high performance QRD-RLS filtering

Proceedings IEEE International Conference on Application-Specific Systems Architectures and Processors ASAP-97 ◽

10.1109/asap.1997.606823 ◽

2002 ◽

Cited By ~ 11

Author(s):

B. Haller ◽

J. Gotze ◽

J.R. Cavallaro

Keyword(s):

High Performance ◽

Efficient Implementation

Direct Training for Spiking Neural Networks: Faster, Larger, Better

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33011311 ◽

2019 ◽

Vol 33 ◽

pp. 1311-1318 ◽

Cited By ~ 12

Author(s):

Yujie Wu ◽

Lei Deng ◽

Guoqi Li ◽

Jun Zhu ◽

Yuan Xie ◽

...

Keyword(s):

Neural Networks ◽

High Performance ◽

Large Scale ◽

Learning Algorithm ◽

Efficient Implementation ◽

Spiking Neural Networks ◽

Direct Training ◽

Neuromorphic Hardware ◽

Rate Coding ◽

Neural Selectivity

Spiking neural networks (SNNs) that enables energy efficient implementation on emerging neuromorphic hardware are gaining more attention. Yet now, SNNs have not shown competitive performance compared with artificial neural networks (ANNs), due to the lack of effective learning algorithms and efficient programming frameworks. We address this issue from two aspects: (1) We propose a neuron normalization technique to adjust the neural selectivity and develop a direct learning algorithm for deep SNNs. (2) Via narrowing the rate coding window and converting the leaky integrate-and-fire (LIF) model into an explicitly iterative version, we present a Pytorch-based implementation method towards the training of large-scale SNNs. In this way, we are able to train deep SNNs with tens of times speedup. As a result, we achieve significantly better accuracy than the reported works on neuromorphic datasets (N-MNIST and DVSCIFAR10), and comparable accuracy as existing ANNs and pre-trained SNNs on non-spiking datasets (CIFAR10). To our best knowledge, this is the first work that demonstrates direct training of deep SNNs with high performance on CIFAR10, and the efficient implementation provides a new way to explore the potential of SNNs.

METEORs: High Performance Theorem Provers using Model Elimination

Automated Reasoning Series - Automated Reasoning ◽

10.1007/978-94-011-3488-0_2 ◽

1991 ◽

pp. 31-59 ◽

Cited By ~ 15

Author(s):

O. L. Astrachan ◽

D. W. Loveland

Keyword(s):

High Performance ◽

Model Elimination ◽

Theorem Provers