Sparse and dense matrix multiplication hardware for heterogeneous multi-precision neural networks

Array ◽  
2021 ◽  
pp. 100101
Author(s):  
Jose Nunez-Yanez ◽  
Mohammad Hosseinabady
2014 ◽  
Vol 82 (1) ◽  
pp. 147-158
Author(s):  
Wilson M. José ◽  
Ana Rita Silva ◽  
Mário P. Véstias ◽  
Horácio C. Neto

Author(s):  
Penporn Koanantakool ◽  
Ariful Azad ◽  
Aydin Buluc ◽  
Dmitriy Morozov ◽  
Sang-Yun Oh ◽  
...  

2004 ◽  
Vol 12 (3) ◽  
pp. 169-183 ◽  
Author(s):  
Alexandros V. Gerbessiotis ◽  
Seung-Yeop Lee

In this work we make a strong case for remote memory access (RMA) as an effective way to program a parallel computer, proposing a framework that supports RMA in a library-independent, simple, and intuitive way. Parallel code written with our approach runs transparently under both MPI-2-enabled libraries and bulk-synchronous parallel libraries. The advantages of RMA are code simplicity, reduced programming complexity, and increased efficiency. We support the latter claims by implementing, within this framework, a collection of benchmark programs: a communication and synchronization performance assessment program, a dense matrix multiplication algorithm, and two variants of a parallel radix-sort algorithm. We examine their performance on a Linux-based PC cluster under three RMA-enabled libraries: LAM MPI, BSPlib, and PUB. We conclude that implementations of such parallel algorithms using RMA communication primitives yield code that is as efficient as the equivalent message-passing code, and in the case of radix sort substantially more efficient. In addition, our work can serve as a comparative study of the relevant capabilities of the three libraries.
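The radix-sort benchmark mentioned above can be illustrated with a sequential sketch. This is a generic least-significant-digit (LSD) bucketing implementation, not the paper's parallel RMA code; in the parallel variants it is the per-pass bucket redistribution that is spread across processes.

```python
def radix_sort(a, base=256):
    """LSD radix sort for non-negative integers.

    Performs one stable bucketing pass per base-`base` digit,
    from least to most significant.
    """
    if not a:
        return a
    # Number of digit passes needed to cover the largest key.
    passes, m = 0, max(a)
    while base ** passes <= m:
        passes += 1
    for p in range(passes):
        shift = base ** p
        buckets = [[] for _ in range(base)]
        for x in a:                      # stable scatter by current digit
            buckets[(x // shift) % base].append(x)
        a = [x for b in buckets for x in b]  # gather buckets in order
    return a
```

Because each pass is stable, keys already ordered by lower digits stay ordered, so after the final pass the list is fully sorted.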


2020 ◽  
Author(s):  
David Moss

Optical artificial neural networks (ONNs) have significant potential for ultra-high computing speed and energy efficiency. We report a new approach to ONNs based on integrated Kerr micro-combs that is programmable, highly scalable and capable of reaching ultra-high speeds, demonstrating the building block of the ONN — a single neuron perceptron — by mapping synapses onto 49 wavelengths to achieve a single-unit throughput of 11.9 Giga-OPS at 8 bits per OP, or 95.2 Gbps. We test the perceptron on handwritten-digit recognition and cancer-cell detection — achieving over 90% and 85% accuracy, respectively. By scaling the perceptron to a deep learning network using off-the-shelf telecom technology we can achieve high throughput operation for matrix multiplication for real-time massive data processing.
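The throughput figures in the abstract follow from simple arithmetic; a quick check, with the numbers taken directly from the text above:

```python
# Figures quoted in the abstract: 49 comb wavelengths carry the synapses,
# and the single perceptron sustains 11.9 Giga-operations per second
# at 8 bits per operation.
wavelengths = 49
throughput_ops = 11.9e9       # operations per second
bits_per_op = 8
bitrate = throughput_ops * bits_per_op   # equivalent data rate in bits/s
print(bitrate / 1e9, "Gbps")             # matches the quoted 95.2 Gbps
```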


2021 ◽  
Author(s):  
David Moss

Optical artificial neural networks (ONNs) have significant potential for ultra-high computing speed and energy efficiency. We report a novel approach to ONNs that uses integrated Kerr optical micro-combs. This approach is programmable and scalable and is capable of reaching ultra-high speeds. We demonstrate the basic building block of ONNs, a single-neuron perceptron, by mapping synapses onto 49 wavelengths to achieve an operating speed of 11.9 × 10^9 operations per second (Giga-OPS) at 8 bits per operation, which equates to 95.2 gigabits/s (Gbps). We test the perceptron on handwritten-digit recognition and cancer-cell detection, achieving over 90% and 85% accuracy, respectively. By scaling the perceptron to a deep learning network using off-the-shelf telecom technology, we can achieve high-throughput matrix multiplication for real-time massive data processing.


2020 ◽  
Author(s):  
David Moss ◽  
Xingyuan Xu ◽  
Mengxi Tan ◽  
Jiayang Wu ◽  
Roberto Morandotti

Optical artificial neural networks (ONNs), analog computing hardware tailored for machine learning, have significant potential for ultra-high computing speed and energy efficiency. We propose a new approach to architectures for ONNs based on integrated Kerr micro-comb sources that is programmable, highly scalable, and capable of reaching ultra-high speeds. We experimentally demonstrate the building block of the ONN, a single-neuron perceptron, by mapping synapses onto 49 wavelengths of a micro-comb to achieve a high single-unit throughput of 11.9 Giga-OPS at 8 bits per operation, corresponding to 95.2 Gbps. We test the perceptron on simple standard benchmark datasets, handwritten-digit recognition and cancer-cell detection, achieving over 90% and 85% accuracy, respectively. This performance is a direct result of the record-small wavelength spacing (49 GHz) for a coherent integrated micro-comb source, which yields an unprecedented number of wavelengths for neuromorphic optics. Finally, we propose an approach to scaling the perceptron to a deep learning network using the same single micro-comb device and standard off-the-shelf telecommunications technology, enabling high-throughput full matrix multiplication for applications such as real-time massive data processing for unmanned-vehicle and aircraft tracking.
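Electronically, the weighted sum that the micro-comb performs optically corresponds to a dot product over 49 synapses at 8-bit precision. A minimal NumPy sketch of that computation follows; the function name, the symmetric quantization scale, and the hard-threshold activation are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def perceptron_8bit(x, w, scale=127):
    """Single-neuron perceptron with inputs and weights quantized to
    signed 8-bit integers, mimicking the 8-bit-per-operation precision
    of the optical demonstration. Here synapse values are ordinary
    array entries rather than comb wavelengths."""
    wq = np.clip(np.round(w * scale), -128, 127).astype(np.int32)
    xq = np.clip(np.round(x * scale), -128, 127).astype(np.int32)
    acc = int(np.dot(xq, wq))     # one multiply-accumulate per synapse
    return 1 if acc > 0 else 0    # hard-threshold activation

# 49 synapses, matching the 49 comb wavelengths quoted in the abstract.
label = perceptron_8bit(np.ones(49), 0.5 * np.ones(49))
```

A deep network then amounts to stacking such dot products into full matrix multiplication, which is the scaling path the abstract describes.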
