Tutorial on high-performance theorem provers: Efficient implementation and parallelisation

Author(s):  
J. Schumann  
R. Letz  
F. Kurfess

Author(s):  
Hadise Ramezani  
Majid Mohammadi  
Amir Sabbagh Molahosseini

Approximate computing is an alternative computing paradigm that can lead to high-performance implementations of audio and image processing as well as deep-learning applications. However, most available approximate adders have been designed for application-specific integrated circuits (ASICs) and do not map efficiently onto field-programmable gate arrays (FPGAs). In this paper, we design a new approximate adder customized for efficient implementation on FPGAs and use it to build a Gaussian filter. Experimental results for the Gaussian filter based on the proposed approximate adder, implemented on a Virtex-7 FPGA, show that resource utilization decreases by 20-51% and that the filter delay improves by 10-35% relative to the MDeMAS (modified design methodology for building approximate adders for FPGA-based systems) adder, for the obtained output quality.
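The abstract does not detail the proposed adder, but the general idea behind approximate adders is to shorten the carry chain by giving up exactness in the low-order bits. The following is a minimal bit-level sketch of one common scheme (exact upper part, OR-approximated lower part); it is an illustration of the concept only, not the paper's MDeMAS-style design, and the word width and number of approximated bits are arbitrary choices.

```python
# Illustrative bit-level model of a generic "lower-part OR" approximate adder.
# NOT the adder proposed in the paper; shown only to convey how accuracy in the
# low bits is traded for a shorter carry chain (and thus less FPGA carry logic).

def approx_add(a: int, b: int, width: int = 16, approx_bits: int = 4) -> int:
    """Add two unsigned integers, approximating the lowest `approx_bits` bits."""
    lo_mask = (1 << approx_bits) - 1
    # Lower part: bitwise OR instead of addition, so no carry propagation.
    lo = (a | b) & lo_mask
    # Upper part: exact addition; the carry chain starts above the approximate bits.
    hi = ((a >> approx_bits) + (b >> approx_bits)) << approx_bits
    return (hi | lo) & ((1 << width) - 1)

if __name__ == "__main__":
    exact = (1234 + 5678) & 0xFFFF
    approx = approx_add(1234, 5678)
    print(f"exact={exact}, approx={approx}, error={abs(exact - approx)}")
```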



Author(s):  
И.В. Афанасьев

This article describes a prototype of the graph-processing framework VGL (Vector Graph Library), aimed at the efficient implementation of graph algorithms on the modern NEC SX-Aurora TSUBASA vector architecture. Present-day vector systems can significantly speed up memory-intensive applications, a subclass of which is graph algorithms. However, approaches to the efficient implementation of graph algorithms on vector systems remain poorly studied: because of the highly irregular structure of real-world graphs, it is difficult to exploit the vector features of the target platforms effectively. This paper shows that graph-algorithm implementations developed on top of the proposed VGL framework match the performance of manually optimized counterparts, thanks to the encapsulation of a large number of graph-algorithm optimizations typical for vector systems. At the same time, the proposed framework significantly simplifies the development of graph algorithms for vector systems, reducing the amount of code for the implemented algorithms by an order of magnitude and hiding the programming peculiarities of this class of systems from the user.
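The abstract does not show VGL's API, so the sketch below is a hypothetical illustration of the kind of edge-centric "advance" abstraction such frameworks encapsulate: the graph is stored in CSR form and a traversal step is expressed as flat array gathers and masked updates, which a vector ISA can map to hardware gather/scatter. All names (`CSRGraph`, `advance`, the field names) are assumptions for illustration, not VGL's real interface.

```python
# Hypothetical sketch of a vector-friendly edge-centric traversal step (one BFS
# level), using NumPy array operations to stand in for vectorized execution.
import numpy as np

class CSRGraph:
    def __init__(self, row_ptr: np.ndarray, col_idx: np.ndarray):
        self.row_ptr = row_ptr                      # size |V|+1, offsets into col_idx
        self.col_idx = col_idx                      # size |E|, destination of each edge
        self.num_vertices = len(row_ptr) - 1
        # Source vertex of every edge, precomputed so edge processing is one flat loop.
        self.src_idx = np.repeat(np.arange(self.num_vertices), np.diff(row_ptr))

def advance(graph: CSRGraph, levels: np.ndarray, level: int) -> np.ndarray:
    """Relax all edges whose source lies on the current BFS frontier."""
    src_lv = levels[graph.src_idx]                  # gather source levels
    dst_lv = levels[graph.col_idx]                  # gather destination levels
    active = (src_lv == level) & (dst_lv < 0)       # edges leaving the frontier
    levels[graph.col_idx[active]] = level + 1       # masked scatter of the new level
    return levels

if __name__ == "__main__":
    # Tiny path graph 0 -> 1 -> 2 -> 3
    g = CSRGraph(np.array([0, 1, 2, 3, 3]), np.array([1, 2, 3]))
    levels = np.full(4, -1)
    levels[0] = 0
    for lv in range(3):
        levels = advance(g, levels, lv)
    print(levels)   # [0 1 2 3]
```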



2013, Vol. 22 (05), pp. 1350034  
Author(s):  
HAMID M. KAMBOH  
SHOAB A. KHAN

Modern field-programmable gate arrays (FPGAs) offer built-in support for efficient implementation of signal-processing algorithms in the form of specialized embedded blocks such as high-speed carry chains, specialized shift registers, adders, multiply-accumulators (MACs), and block memories. These dedicated elements provide increased computational power and are used for efficient implementation of computationally intensive algorithms. This paper proposes a novel algorithm and architecture for the design and implementation of high-performance intermediate-frequency (IF) filters on FPGAs. We propose innovative design methodologies for generating optimal feed-forward and recursive architectures to be mapped onto a family of FPGAs. Keeping in view the limited number of registers within the embedded blocks, the new methodology applies transformations to achieve higher throughput through various optimizations of the design algorithm. Implementation options include systolic MAC, transpose direct-form MAC, canonic signed digit, and distributed-arithmetic-based filters, to suit the most economical FPGA implementation. The paper demonstrates the methodology and shows its applicability by synthesizing the designs and comparing the results with a number of traditional architectures and intellectual-property cores. On a Xilinx Virtex-5 FPGA, our results show a throughput improvement between 7% and 30%, with an average improvement of 16%, over traditional implementations of these designs.
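Among the implementation options the abstract lists, the transpose direct-form MAC filter has a particularly simple structure: each input sample is broadcast to all coefficient multipliers, and the products flow through a register chain of partial sums, which maps naturally onto cascaded FPGA DSP/MAC blocks. Below is a small behavioral model of that structure in Python; it illustrates the architecture class only and is not the paper's optimized design.

```python
# Behavioral model of a transpose direct-form FIR filter. Each cycle the new
# sample is multiplied by every coefficient; partial sums are held in registers
# between MAC stages, mirroring a cascade of FPGA DSP slices.
from typing import List

def fir_transpose(samples: List[float], coeffs: List[float]) -> List[float]:
    n = len(coeffs)
    regs = [0.0] * (n - 1)          # registered partial sums between MAC stages
    out = []
    for x in samples:
        y = coeffs[0] * x + (regs[0] if regs else 0.0)
        # Each register takes its own product plus the next register's value.
        for i in range(n - 2):
            regs[i] = coeffs[i + 1] * x + regs[i + 1]
        if regs:
            regs[-1] = coeffs[-1] * x
        out.append(y)
    return out

if __name__ == "__main__":
    h = [0.25, 0.5, 0.25]                       # simple low-pass example
    x = [1.0, 2.0, 3.0, 4.0]
    print(fir_transpose(x, h))                  # matches direct convolution: [0.25, 1.0, 2.0, 3.0]
```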





Author(s):  
Hari Shanker Gupta  
Subhananda Chakrabarti  
Maryam Shojaei Baghini  
D. K. Sharma  
A S Kiran Kumar  
...  


Author(s):  
Yujie Wu  
Lei Deng  
Guoqi Li  
Jun Zhu  
Yuan Xie  
...  

Spiking neural networks (SNNs), which enable energy-efficient implementation on emerging neuromorphic hardware, are gaining increasing attention. So far, however, SNNs have not shown performance competitive with artificial neural networks (ANNs), due to the lack of effective learning algorithms and efficient programming frameworks. We address this issue from two aspects: (1) we propose a neuron normalization technique to adjust neural selectivity and develop a direct learning algorithm for deep SNNs; (2) by narrowing the rate-coding window and converting the leaky integrate-and-fire (LIF) model into an explicitly iterative version, we present a PyTorch-based implementation method for training large-scale SNNs. In this way, we are able to train deep SNNs with a speedup of tens of times. As a result, we achieve significantly better accuracy than previously reported work on neuromorphic datasets (N-MNIST and DVS-CIFAR10) and accuracy comparable to existing ANNs and pre-trained SNNs on non-spiking datasets (CIFAR10). To the best of our knowledge, this is the first work to demonstrate direct training of deep SNNs with high performance on CIFAR10, and the efficient implementation provides a new way to explore the potential of SNNs.
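The key enabling step mentioned in the abstract is rewriting the LIF dynamics as an explicit per-timestep iteration so that standard autograd machinery can train the network. A minimal PyTorch sketch of such an iterative LIF update with a surrogate-gradient spike function is shown below; the decay constant, threshold, reset rule, and rectangular surrogate shape are illustrative choices, not the authors' exact formulation.

```python
# Minimal sketch of an explicitly iterative LIF neuron step in PyTorch, in the
# spirit of the approach described above (constants and surrogate gradient are
# illustrative assumptions).
import torch

class SurrogateSpike(torch.autograd.Function):
    @staticmethod
    def forward(ctx, v_minus_thresh):
        ctx.save_for_backward(v_minus_thresh)
        return (v_minus_thresh > 0).float()          # hard threshold in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        (v,) = ctx.saved_tensors
        # Rectangular surrogate gradient in a window around the threshold.
        return grad_output * (v.abs() < 0.5).float()

def lif_step(x_t, v, spike_prev, decay=0.5, v_th=1.0):
    """One iterative LIF update: leak, reset after a spike, integrate input, fire."""
    v = decay * v * (1.0 - spike_prev) + x_t
    spike = SurrogateSpike.apply(v - v_th)
    return v, spike

if __name__ == "__main__":
    T, batch, features = 8, 4, 10
    inputs = torch.rand(T, batch, features)
    v = torch.zeros(batch, features)
    spikes = torch.zeros(batch, features)
    for t in range(T):                               # unrolled over the time window
        v, spikes = lif_step(inputs[t], v, spikes)
    print(spikes.mean())                             # firing rate at the last step
```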




