Sparse and dense matrix multiplication hardware for heterogeneous multi-precision neural networks

Array ◽  
2021 ◽  
pp. 100101
Author(s):  
Jose Nunez-Yanez ◽  
Mohammad Hosseinabady
2014 ◽  
Vol 82 (1) ◽  
pp. 147-158
Author(s):  
Wilson M. José ◽  
Ana Rita Silva ◽  
Mário P. Véstias ◽  
Horácio C. Neto

Author(s):  
Penporn Koanantakool ◽  
Ariful Azad ◽  
Aydin Buluc ◽  
Dmitriy Morozov ◽  
Sang-Yun Oh ◽  
...  

2004 ◽  
Vol 12 (3) ◽  
pp. 169-183 ◽  
Author(s):  
Alexandros V. Gerbessiotis ◽  
Seung-Yeop Lee

In this work we make a strong case for remote memory access (RMA) as an effective way to program a parallel computer, proposing a framework that supports RMA in a library-independent, simple, and intuitive way. Parallel code written with our approach runs transparently under both MPI-2-enabled libraries and bulk-synchronous parallel libraries. The advantages of RMA are code simplicity, reduced programming complexity, and increased efficiency. We support the latter claims by implementing, within this framework, a collection of benchmark programs: a communication and synchronization performance assessment program, a dense matrix multiplication algorithm, and two variants of a parallel radix-sort algorithm. We examine their performance on a Linux-based PC cluster under three RMA-enabled libraries: LAM MPI, BSPlib, and PUB. We conclude that implementations of such parallel algorithms using RMA communication primitives yield code that is as efficient as the equivalent message-passing code, and in the case of radix sort substantially more efficient. In addition, our work can serve as a comparative study of the relevant capabilities of the three libraries.
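The radix-sort benchmark mentioned above can be illustrated with a sequential sketch. This is a generic least-significant-digit (LSD) bucketing implementation, not the paper's parallel RMA code; in the parallel variants it is the per-pass bucket redistribution that is spread across processes.

```python
def radix_sort(a, base=256):
    """LSD radix sort for non-negative integers.

    Performs one stable bucketing pass per base-`base` digit,
    from least to most significant.
    """
    if not a:
        return a
    # Number of digit passes needed to cover the largest key.
    passes, m = 0, max(a)
    while base ** passes <= m:
        passes += 1
    for p in range(passes):
        shift = base ** p
        buckets = [[] for _ in range(base)]
        for x in a:                      # stable scatter by current digit
            buckets[(x // shift) % base].append(x)
        a = [x for b in buckets for x in b]  # gather buckets in order
    return a
```

Because each pass is stable, keys already ordered by lower digits stay ordered, so after the final pass the list is fully sorted.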


2020 ◽  
Author(s):  
David Moss

Optical artificial neural networks (ONNs) have significant potential for ultra-high computing speed and energy efficiency. We report a new approach to ONNs based on integrated Kerr micro-combs that is programmable, highly scalable and capable of reaching ultra-high speeds, demonstrating the building block of the ONN — a single neuron perceptron — by mapping synapses onto 49 wavelengths to achieve a single-unit throughput of 11.9 Giga-OPS at 8 bits per OP, or 95.2 Gbps. We test the perceptron on handwritten-digit recognition and cancer-cell detection — achieving over 90% and 85% accuracy, respectively. By scaling the perceptron to a deep learning network using off-the-shelf telecom technology we can achieve high throughput operation for matrix multiplication for real-time massive data processing.
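The throughput figures in the abstract follow from simple arithmetic; a quick check, with the numbers taken directly from the text above:

```python
# Figures quoted in the abstract: 49 comb wavelengths carry the synapses,
# and the single perceptron sustains 11.9 Giga-operations per second
# at 8 bits per operation.
wavelengths = 49
throughput_ops = 11.9e9       # operations per second
bits_per_op = 8
bitrate = throughput_ops * bits_per_op   # equivalent data rate in bits/s
print(bitrate / 1e9, "Gbps")             # matches the quoted 95.2 Gbps
```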


2021 ◽  
Author(s):  
David Moss

Optical artificial neural networks (ONNs) have significant potential for ultra-high computing speed and energy efficiency. We report a novel approach to ONNs that uses integrated Kerr optical micro-combs. This approach is programmable and scalable and is capable of reaching ultra-high speeds. We demonstrate the basic building block of ONNs, a single-neuron perceptron, by mapping synapses onto 49 wavelengths to achieve an operating speed of 11.9 × 10^9 operations per second (Giga-OPS) at 8 bits per operation, which equates to 95.2 gigabits/s (Gbps). We test the perceptron on handwritten-digit recognition and cancer-cell detection, achieving over 90% and 85% accuracy, respectively. By scaling the perceptron to a deep learning network using off-the-shelf telecom technology, we can achieve high-throughput matrix multiplication for real-time massive data processing.


2020 ◽  
Author(s):  
David Moss ◽  
Xingyuan Xu ◽  
Mengxi Tan ◽  
Jiayang Wu ◽  
Roberto Morandotti

Optical artificial neural networks (ONNs), analog computing hardware tailored for machine learning, have significant potential for ultra-high computing speed and energy efficiency. We propose a new approach to architectures for ONNs based on integrated Kerr micro-comb sources that is programmable, highly scalable, and capable of reaching ultra-high speeds. We experimentally demonstrate the building block of the ONN, a single-neuron perceptron, by mapping synapses onto 49 wavelengths of a micro-comb to achieve a high single-unit throughput of 11.9 Giga-OPS at 8 bits per operation, corresponding to 95.2 Gbps. We test the perceptron on simple standard benchmark datasets, handwritten-digit recognition and cancer-cell detection, achieving over 90% and 85% accuracy, respectively. This performance is a direct result of the record-small wavelength spacing (49 GHz) for a coherent integrated micro-comb source, which yields an unprecedented number of wavelengths for neuromorphic optics. Finally, we propose an approach to scaling the perceptron to a deep learning network using the same single micro-comb device and standard off-the-shelf telecommunications technology, enabling high-throughput full matrix multiplication for applications such as real-time massive data processing for unmanned-vehicle and aircraft tracking.
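Electronically, the weighted sum that the micro-comb performs optically corresponds to a dot product over 49 synapses at 8-bit precision. A minimal NumPy sketch of that computation follows; the function name, the symmetric quantization scale, and the hard-threshold activation are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def perceptron_8bit(x, w, scale=127):
    """Single-neuron perceptron with inputs and weights quantized to
    signed 8-bit integers, mimicking the 8-bit-per-operation precision
    of the optical demonstration. Here synapse values are ordinary
    array entries rather than comb wavelengths."""
    wq = np.clip(np.round(w * scale), -128, 127).astype(np.int32)
    xq = np.clip(np.round(x * scale), -128, 127).astype(np.int32)
    acc = int(np.dot(xq, wq))     # one multiply-accumulate per synapse
    return 1 if acc > 0 else 0    # hard-threshold activation

# 49 synapses, matching the 49 comb wavelengths quoted in the abstract.
label = perceptron_8bit(np.ones(49), 0.5 * np.ones(49))
```

A deep network then amounts to stacking such dot products into full matrix multiplication, which is the scaling path the abstract describes.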
