Algorithmic Structures for Realizing Short-Length Circular Convolutions with Reduced Complexity

We propose a set of efficient algorithms suited to fully parallel hardware implementation of short-length circular convolution cores. Their advantage is that they require significantly fewer multiplications than the naive method of implementing this operation. The algorithms were synthesized using the matrix notation of the cyclic convolution operation, which represents it as a matrix-vector product. Because the matrix in this product is circulant, it can be factorized effectively, which reduces the number of multiplications needed to compute the product. The proposed algorithms are oriented towards fully parallel hardware implementation, but compared with the naive fully parallel approach they require significantly fewer hardwired multipliers. Since a hardwired multiplier occupies a much larger area on a VLSI chip and consumes more power than a hardwired adder, the proposed solutions are resource-efficient and energy-efficient in hardware. We consider circular convolutions for sequences of lengths N = 2, 3, 4, 5, 6, 7, 8, and 9.
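As a rough illustration of how factorizing a circulant matrix saves multiplications, the sketch below treats the N = 2 case. It uses the textbook sum/difference factorization, which is not necessarily the structure derived in the paper. The function names are hypothetical, and integer inputs are assumed so that exact rational arithmetic can be used. The naive product needs 4 multiplications; the factorized form needs 2, provided the two kernel-dependent constants are precomputed.

```python
from fractions import Fraction


def circ_conv_naive(h, x):
    """Circular convolution as a circulant matrix-vector product (N*N multiplications)."""
    n = len(h)
    return [sum(h[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]


def circ_conv2_fast(h, x):
    """Length-2 circular convolution with 2 multiplications (illustrative sketch).

    Assumes integer inputs. s0 and s1 depend only on the kernel h, so in
    hardware they would be precomputed constants rather than run-time work.
    """
    s0 = Fraction(h[0] + h[1], 2)
    s1 = Fraction(h[0] - h[1], 2)
    a = s0 * (x[0] + x[1])  # multiplication 1
    b = s1 * (x[0] - x[1])  # multiplication 2
    return [a + b, a - b]   # post-additions only


if __name__ == "__main__":
    h, x = [3, 5], [2, 7]
    print(circ_conv_naive(h, x))
    print([int(v) for v in circ_conv2_fast(h, x)])
```

The saving comes at the cost of extra additions before and after the multiplications, which matches the abstract's premise that adders are much cheaper than multipliers in hardware.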

Some Algorithms for Computing Short-Length Linear Convolution

Electronics ◽

10.3390/electronics9122115 ◽

2020 ◽

Vol 9 (12) ◽

pp. 2115

Author(s):

Aleksandr Cariow ◽

Janusz P. Paplinski

Keyword(s):

Data Processing ◽

Energy Efficient ◽

Hardware Implementation ◽

Building Blocks ◽

Short Length ◽

Complex Data ◽

Linear Convolution ◽

Naive Approach

In this article, we propose a set of efficient algorithmic solutions for computing short linear convolutions, aimed at hardware implementation in VLSI. We consider convolutions for sequences of length N = 2, 3, 4, 5, 6, 7, and 8. Hardwired units that implement these algorithms can be used as building blocks when designing VLSI-based accelerators for more complex data processing systems. The proposed algorithms target fully parallel hardware implementation, but compared to the naive fully parallel approach they require from 25% to about 60% fewer hardware multipliers, depending on the length N. Since a multiplier takes up a much larger area on the chip than an adder and consumes more power, the proposed algorithms are resource-efficient and energy-efficient in terms of their hardware implementation.
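To make the multiplier saving concrete, here is a minimal sketch for N = 2 using the well-known Karatsuba-style trick, which is an assumption here and may differ from the authors' algorithm. The function names are hypothetical. The naive method needs 4 multiplications and this version needs 3, a 25% reduction, which matches the lower end of the range quoted above.

```python
def lin_conv_naive(h, x):
    """Linear convolution by direct summation (len(h) * len(x) multiplications)."""
    y = [0] * (len(h) + len(x) - 1)
    for i, hi in enumerate(h):
        for j, xj in enumerate(x):
            y[i + j] += hi * xj
    return y


def lin_conv2_fast(h, x):
    """Length-2 linear convolution with 3 multiplications (Karatsuba-style sketch)."""
    y0 = h[0] * x[0]                  # multiplication 1
    y2 = h[1] * x[1]                  # multiplication 2
    m = (h[0] + h[1]) * (x[0] + x[1])  # multiplication 3
    y1 = m - y0 - y2                  # cross term recovered with additions only
    return [y0, y1, y2]


if __name__ == "__main__":
    print(lin_conv_naive([3, 5], [2, 7]))
    print(lin_conv2_fast([3, 5], [2, 7]))
```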

Selecting optimal SpMV realizations for GPUs via machine learning

The International Journal of High Performance Computing Applications ◽

10.1177/1094342021990738 ◽

2021 ◽

pp. 109434202199073

Author(s):

Ernesto Dufrechou ◽

Pablo Ezzatti ◽

Enrique S Quintana-Ortí

Keyword(s):

Machine Learning ◽

Sparse Matrix ◽

Machine Learning Techniques ◽

Optimal Method ◽

Learning Techniques ◽

General Rules ◽

Machine Learning Approach ◽

The Matrix ◽

Time And Energy ◽

Matrix Vector

More than 10 years of research on efficient GPU routines for the sparse matrix-vector product (SpMV) have led to several realizations, each with its own strengths and weaknesses. In this work, we review some of the most relevant efforts on the subject, evaluate a few prominent, publicly available routines using more than 3000 matrices from different applications, and apply machine learning techniques to anticipate which SpMV realization will perform best for each sparse matrix on a given parallel platform. Our numerical experiments confirm that the methods behave so differently depending on the matrix structure that identifying general rules to select the optimal method for a given matrix becomes extremely difficult, though some useful strategies (heuristics) can be defined. Using a machine learning approach, we show that it is possible to obtain inexpensive classifiers that predict the best method for a given sparse matrix with over 80% accuracy, demonstrating that this approach can deliver important reductions in both execution time and energy consumption.
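For readers unfamiliar with the operation being tuned, the following is a minimal pure-Python sketch of SpMV on a matrix in compressed sparse row (CSR) form. It is a sequential reference only and does not represent any of the GPU routines evaluated in the paper. The function name and the argument names `data`, `indices`, and `indptr` are assumptions chosen to follow the common CSR convention.

```python
def spmv_csr(data, indices, indptr, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    data    -- nonzero values, row by row
    indices -- column index of each nonzero
    indptr  -- indptr[r]:indptr[r + 1] delimits the nonzeros of row r
    """
    y = []
    for r in range(len(indptr) - 1):
        acc = 0
        for k in range(indptr[r], indptr[r + 1]):
            acc += data[k] * x[indices[k]]
        y.append(acc)
    return y


if __name__ == "__main__":
    # A = [[1, 0, 2],
    #      [0, 3, 0],
    #      [4, 0, 5]]
    data = [1, 2, 3, 4, 5]
    indices = [0, 2, 1, 0, 2]
    indptr = [0, 2, 3, 5]
    print(spmv_csr(data, indices, indptr, [1, 2, 3]))
```

The work per row depends on how many nonzeros that row holds, which is one reason performance varies so much with matrix structure.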

An Energy-Efficient Hardware Implementation of HOG-Based Object Detection at 1080HD 60 fps with Multi-Scale Support

Journal of Signal Processing Systems ◽

10.1007/s11265-015-1080-7 ◽

2015 ◽

Vol 84 (3) ◽

pp. 325-337 ◽

Cited By ~ 27

Author(s):

Amr Suleiman ◽

Vivienne Sze

Keyword(s):

Object Detection ◽

Energy Efficient ◽

Hardware Implementation ◽

Multi Scale

Application of Spline Collocation Method in Analysis of Beam and Continuous Beam

Journal of Mechanics ◽

10.1017/s1727719100004354 ◽

2003 ◽

Vol 19 (2) ◽

pp. 319-326 ◽

Cited By ~ 1

Author(s):

Lai-Yun Wu ◽

Yang-Tzung Chen

Keyword(s):

Collocation Method ◽

Differential Quadrature Method ◽

Spline Functions ◽

Continuous Beam ◽

Spline Collocation ◽

Spline Collocation Method ◽

The Matrix ◽

Weighting Coefficients ◽

Matrix Vector ◽

Excellent Accuracy

Abstract: In this paper, the spline collocation method (SCM) is successfully extended to solve generalized problems of beam structures. The spline functions in SCM are re-formulated by the finite difference method in a systematic way that is easily understood by engineers. The manipulation of SCM is further simplified by the introduction of a quintic table, so that the matrix-vector governing equation can be easily formulated to solve for the weighting coefficients. SCM is first examined on problems of a generalized single-span beam under various types of loading and boundary conditions, and it is then extended to problems of continuous beams with multiple spans. Comparison with available analytical results and, where available, results from the differential quadrature method (DQM) shows excellent accuracy in deflection.

Advanced three-dimensional electromagnetic modeling using a nested integral equation approach

Geophysical Journal International ◽

10.1093/gji/ggab072 ◽

2021 ◽

Author(s):

Chaojian Chen ◽

Mikhail Kruglyakov ◽

Alexey Kuvshinov

Keyword(s):

Integral Equation ◽

Three Dimensional ◽

Region Of Interest ◽

Electromagnetic Modeling ◽

Multi Scale ◽

Scale Modeling ◽

Uniform Grids ◽

The Matrix ◽

Integral Equation Approach ◽

Matrix Vector

Summary: Most of the existing three-dimensional (3-D) electromagnetic (EM) modeling solvers based on the integral equation (IE) method exploit the fast Fourier transform (FFT) to accelerate the matrix-vector multiplications. This in turn requires a laterally-uniform discretization of the modeling domain. However, there is often a need for multi-scale modeling and inversion, for instance, to properly account for the effects of non-uniform distant structures and, at the same time, to accurately model the effects of local anomalies. In such scenarios, the use of laterally-uniform grids leads to excessive computational loads, both in terms of memory and time. To alleviate this problem, we developed an efficient 3-D EM modeling tool based on a multi-nested IE approach. Within this approach, the IE modeling is first performed on a large domain with a (laterally-uniform) coarse grid, and then the results are refined in the region of interest by performing modeling on a smaller domain with a (laterally-uniform) denser grid. At the latter stage, the modeling results obtained at the previous stage are exploited. The lateral uniformity of the grids at each stage allows us to keep using the FFT for the matrix-vector multiplications. An important novelty of the paper is the development of a “rim domain” concept, which further improves the performance of the multi-nested IE approach. We verify the developed tool on both idealized and realistic 3-D conductivity models, and demonstrate its efficiency and accuracy.

Improved Generalized Single-Source Tangential Equivalence Principle Algorithm with Contact-Region Modeling Method for Array Structures

International Journal of Antennas and Propagation ◽

10.1155/2018/9875041 ◽

2018 ◽

Vol 2018 ◽

pp. 1-11

Author(s):

Yao Han ◽

Hanru Shao ◽

Jianfeng Dong

Keyword(s):

Equivalence Principle ◽

Contact Region ◽

Near Field ◽

Single Source ◽

Matrix Vector Multiplication ◽

The Matrix ◽

Array Structures ◽

Fast Multipole Algorithm ◽

Matrix Vector ◽

Equivalence Principle Algorithm

An improved generalized single-source tangential equivalence principle algorithm (GSST-EPA) is proposed for analyzing array structures with connected elements. To exploit the advantages of GSST-EPA, the connected array elements are decomposed and computed by a contact-region modeling (CRM) method, which ensures that each element has the same mesh. The unknowns of the elements can be transferred onto the equivalence surfaces by GSST-EPA. Because every element has the same mesh, the scattering matrix in GSST-EPA needs to be solved and stored only once. The shift invariance of the translation matrices is also used to reduce the computation of near-field interactions. Furthermore, the multilevel fast multipole algorithm (MLFMA) is used to accelerate the matrix-vector multiplication in GSST-EPA. Numerical results demonstrate the accuracy and efficiency of the proposed method.

The Impact of Voltage-Frequency Scaling for the Matrix-Vector Product on the IBM POWER8

Euro-Par 2016: Parallel Processing - Lecture Notes in Computer Science ◽

10.1007/978-3-319-43659-3_8 ◽

2016 ◽

pp. 103-116 ◽

Cited By ~ 1

Author(s):

Sandra Catalán ◽

A. Cristiano I. Malossi ◽

Costas Bekas ◽

Enrique S. Quintana-Ortí

Keyword(s):

Vector Product ◽

Frequency Scaling ◽

The Matrix ◽

Voltage Frequency ◽

The Impact ◽

Matrix Vector

A submatrix algorithm for the matrix-vector multiplication of very large matrices

Journal of Computational Chemistry ◽

10.1002/jcc.540100307 ◽

1989 ◽

Vol 10 (3) ◽

pp. 344-345

Author(s):

Roland Lindh ◽

Per-Årke Malmquist

Keyword(s):

Matrix Vector Multiplication ◽

The Matrix ◽

Matrix Vector

Novel reconfigurable hardware implementation of polynomial matrix/vector multiplications

2014 International Conference on Field-Programmable Technology (FPT) ◽

10.1109/fpt.2014.7082785 ◽

2014 ◽

Author(s):

Server Kasap ◽

Soydan Redif

Keyword(s):

Hardware Implementation ◽

Polynomial Matrix ◽

Reconfigurable Hardware ◽

Matrix Vector

Reachability Problems for Products of Matrices in Semirings

International Journal of Algebra and Computation ◽

10.1142/s021819670600313x ◽

2006 ◽

Vol 16 (03) ◽

pp. 603-627 ◽

Cited By ~ 13

Author(s):

Stéphane Gaubert ◽

Ricardo D. Katz

Keyword(s):

Matrix Product ◽

Reachability Problem ◽

Tropical Semiring ◽

The Matrix ◽

Left And Right ◽

Matrix Vector ◽

Products Of Matrices

We consider the following matrix reachability problem: given r square matrices with entries in a semiring, is there a product of these matrices which attains a prescribed matrix? Similarly, we define the vector (resp. scalar) reachability problem, by requiring that the matrix product, acting by right multiplication on a prescribed row vector, gives another prescribed row vector (resp. when multiplied on the left and right by prescribed row and column vectors, gives a prescribed scalar). We show that over any semiring, scalar reachability reduces to vector reachability which is equivalent to matrix reachability, and that for any of these problems, the specialization to any r ≥ 2 is equivalent to the specialization to r = 2. As an application of these reductions and of a theorem of Krob, we show that when r = 2, the vector and matrix reachability problems are undecidable over the max-plus semiring (ℤ∪{-∞}, max ,+). These reductions also improve known results concerning the classical zero corner problem. Finally, we show that the matrix, vector, and scalar reachability problems are decidable over semirings whose elements are "positive", like the tropical semiring (ℤ∪{+∞}, min ,+).
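As a small aid to reading the abstract, the sketch below shows what a matrix product means over the max-plus semiring (ℤ∪{-∞}, max, +): semiring "addition" is max and semiring "multiplication" is ordinary +, with -∞ playing the role of zero. The function name `maxplus_matmul` is hypothetical, and `float("-inf")` is an assumed stand-in for -∞. This only illustrates the product whose reachability is studied. It does not decide reachability, which the paper shows to be undecidable in this setting.

```python
NEG_INF = float("-inf")  # the "zero" element of the max-plus semiring


def maxplus_matmul(a, b):
    """Matrix product over (Z ∪ {-inf}, max, +): C[i][j] = max_k (A[i][k] + B[k][j])."""
    inner = len(b)
    cols = len(b[0])
    return [
        [max(row[k] + b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


if __name__ == "__main__":
    a = [[0, 1], [NEG_INF, 2]]
    b = [[3, NEG_INF], [0, 1]]
    print(maxplus_matmul(a, b))
```

The tropical semiring (ℤ∪{+∞}, min, +) mentioned at the end of the abstract would be obtained by swapping max for min and -∞ for +∞.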
