Large Matrix Multiplication on a Novel Heterogeneous Parallel DSP Architecture

Author(s):  
Joar Sohl ◽  
Jian Wang ◽  
Dake Liu


Author(s):  
D.T.V. Dharmajee Rao ◽  
K.V. Ramana

Deep Neural Network training algorithms consume long training times, especially when the number of hidden layers and nodes is large. Matrix multiplication is the key operation carried out at every node of each layer, several hundred thousand times, during the training of a Deep Neural Network. Blocking is a well-proven optimization technique for improving the performance of matrix multiplication, and blocked matrix multiplication algorithms can easily be parallelized to accelerate performance further. This paper proposes a novel approach of implementing parallel blocked matrix multiplication algorithms to reduce the long training time. The proposed approach was implemented using the OpenMP parallel programming model with the collapse() clause for the multiplication of the input and weight matrices of the Backpropagation and Boltzmann Machine algorithms for training Deep Neural Networks, and was tested on a multi-core processor system. Experimental results showed that the proposed approach achieved approximately a two-times speedup over the classic algorithms.
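The blocking (tiling) strategy the abstract describes can be sketched compactly. The following is a minimal illustrative Python version, not the paper's implementation (the paper parallelizes the outer block loops in C with OpenMP's collapse() clause); the block size of 2 is an arbitrary choice for the example:

```python
def blocked_matmul(A, B, block=2):
    """Blocked (tiled) matrix multiplication: C = A * B.

    Iterating over sub-blocks improves cache locality; in an OpenMP
    version, the outer block loops would be parallelized with a
    `#pragma omp parallel for collapse(2)` directive.
    """
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for ii in range(0, n, block):          # block row of C
        for jj in range(0, m, block):      # block column of C
            for kk in range(0, k, block):  # block of the shared dimension
                for i in range(ii, min(ii + block, n)):
                    for j in range(jj, min(jj + block, m)):
                        s = 0.0
                        for p in range(kk, min(kk + block, k)):
                            s += A[i][p] * B[p][j]
                        C[i][j] += s
    return C
```

Each (ii, jj, kk) iteration touches only small tiles of A, B, and C, which is what makes the blocked loop order cache-friendly and easy to parallelize.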


2013 ◽  
Vol 33 (12) ◽  
pp. 3339-3344 ◽  
Author(s):  
Yuanshuai SUN ◽  
Yao CHEN ◽  
Xinjun GUAN ◽  
Chen LIN

PLoS ONE ◽  
2021 ◽  
Vol 16 (8) ◽  
pp. e0256584
Author(s):  
Sam Pimentel ◽  
Youssef Qranfal

The process of integrating observations into a numerical model of an evolving dynamical system, known as data assimilation, has become an essential tool in computational science. These methods, however, are computationally expensive as they typically involve large matrix multiplication and inversion. Furthermore, it is challenging to incorporate a constraint into the procedure, such as requiring a positive state vector. Here we introduce an entirely new approach to data assimilation, one that satisfies an information measure and uses the unnormalized Kullback-Leibler divergence, rather than the standard choice of Euclidean distance. Two sequential data assimilation algorithms are presented within this framework and are demonstrated numerically. These new methods are solved iteratively and do not require an adjoint. We find them to be computationally more efficient than Optimal Interpolation (3D-Var solution) and the Kalman filter whilst maintaining similar accuracy. Furthermore, these Kullback-Leibler data assimilation (KL-DA) methods naturally embed constraints, unlike Kalman filter approaches. They are ideally suited to systems that require positive valued solutions as the KL-DA guarantees this without need of transformations, projections, or any additional steps. This Kullback-Leibler framework presents an interesting new direction of development in data assimilation theory. The new techniques introduced here could be developed further and may hold potential for applications in the many disciplines that utilize data assimilation, especially where there is a need to evolve variables of large-scale systems that must obey physical constraints.
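The unnormalized (generalized) Kullback-Leibler divergence underlying the KL-DA framework, D(p‖q) = Σᵢ (pᵢ log(pᵢ/qᵢ) − pᵢ + qᵢ), is simple to evaluate; the following sketch is a generic illustration of the measure itself, not the authors' assimilation code:

```python
import math

def unnormalized_kl(p, q):
    """Generalized KL divergence for nonnegative vectors p, q:

        D(p || q) = sum_i p_i * log(p_i / q_i) - p_i + q_i.

    Unlike Euclidean distance, it is defined only for nonnegative
    entries, which is why a KL-based objective naturally keeps
    state-vector estimates positive.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            total += pi * math.log(pi / qi) - pi + qi
        else:  # limit of p*log(p/q) - p as p -> 0 is 0
            total += qi
    return total
```

D(p‖q) is zero exactly when p = q and positive otherwise, so it behaves like a distance while remaining tied to the positive orthant.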


Information ◽  
2019 ◽  
Vol 10 (3) ◽  
pp. 115 ◽  
Author(s):  
Fan Lin ◽  
Yingpin Chen ◽  
Lingzhi Wang ◽  
Yuqun Chen ◽  
Wei Zhu ◽  
...  

The total variation (TV) regularization-based methods are proven to be effective in removing random noise. However, these solutions usually suffer from staircase effects. This paper proposes a new image reconstruction method based on TV regularization with an Lp-quasinorm and group gradient sparsity. In this method, the group gradient sparsity regularization term retrieves the neighborhood information of the image gradient, while the Lp-quasinorm constraint characterizes the sparsity of the image gradient. The method can effectively deblur images and remove impulse noise while preserving image edge information and reducing the staircase effect. To improve image recovery efficiency, a Fast Fourier Transform (FFT) is introduced to avoid large matrix multiplication operations. Moreover, an accelerated alternating direction method of multipliers (ADMM), which allows a fast restart of the optimization process, makes the method run faster. In numerical experiments on standard test images sourced from the Emory University and CVG-UGR (Computer Vision Group, University of Granada) image databases, the advantage of the new method is verified by comparison with existing advanced TV-based methods in terms of peak signal-to-noise ratio (PSNR), structural similarity (SSIM), and running time.
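The FFT trick mentioned in the abstract rests on a standard fact: multiplying by a circulant matrix (e.g., a blur operator under periodic boundary conditions) is equivalent to pointwise multiplication in the Fourier domain. A small self-contained Python sketch of that equivalence, using a naive O(n²) DFT for clarity rather than an actual FFT library:

```python
import cmath

def dft(x):
    """Naive DFT (an FFT computes the same values in O(n log n))."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def circulant_matvec_direct(c, x):
    """y = C x, where C is the circulant matrix with first column c.
    This is exactly circular convolution: y_i = sum_j c[(i-j) mod n] x_j."""
    n = len(x)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]

def circulant_matvec_fourier(c, x):
    """Same product via the convolution theorem:
    DFT(y) = DFT(c) * DFT(x) pointwise, so no n-by-n matrix is formed."""
    Y = [ck * xk for ck, xk in zip(dft(c), dft(x))]
    return [y.real for y in idft(Y)]
```

The Fourier route replaces an O(n²) matrix-vector product with transforms and an elementwise product, which is the saving the paper exploits at image scale.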


Author(s):  
John T. Armstrong

One of the most cited papers in the geological sciences has been that of Albee and Bence on the use of empirical "α-factors" to correct quantitative electron microprobe data. During the past 25 years this method has remained the most commonly used correction for geological samples, despite the facts that few investigators have actually determined empirical α-factors, instead employing tables of α-factors calculated using one of the conventional "ZAF" correction programs; that a number of investigators have shown the assumption of a constant α-factor to be incorrect in binary systems with large matrix corrections (e.g., 2-3); and that the procedure's desirability in terms of program size and computational speed is much less important today because of developments in computing capabilities. The question thus exists whether it is time to honorably retire the Bence-Albee procedure and turn to more modern, robust correction methods. This paper proposes that, although it is perhaps time to retire the original Bence-Albee procedure, it should be replaced by a similar method based on composition-dependent polynomial α-factor expressions.


Author(s):  
Yaniv Aspis ◽  
Krysia Broda ◽  
Alessandra Russo ◽  
Jorge Lobo

We introduce a novel approach for the computation of stable and supported models of normal logic programs in continuous vector spaces by a gradient-based search method. Specifically, the application of the immediate consequence operator of a program reduct can be computed in a vector space. To do this, Herbrand interpretations of a propositional program are embedded as 0-1 vectors in $\mathbb{R}^N$ and program reducts are represented as matrices in $\mathbb{R}^{N \times N}$. Using these representations we prove that the underlying semantics of a normal logic program is captured through matrix multiplication and a differentiable operation. As supported and stable models of a normal logic program can now be seen as fixed points in a continuous space, non-monotonic deduction can be performed using an optimisation process such as Newton's method. We report the results of several experiments using synthetically generated programs that demonstrate the feasibility of the approach and highlight how different parameter values can affect the behaviour of the system.

