Toward Multi-FPGA Acceleration of the Neural Networks

2021 ◽  
Vol 17 (2) ◽  
pp. 1-23
Author(s):  
Saman Biookaghazadeh ◽  
Pravin Kumar Ravi ◽  
Ming Zhao

High-throughput and low-latency Convolutional Neural Network (CNN) inference is increasingly important for many cloud- and edge-computing applications. FPGA-based acceleration of CNN inference has demonstrated various benefits compared to other high-performance devices such as GPGPUs. Current FPGA CNN-acceleration solutions are based on single-FPGA designs, which are limited by the resources available on one FPGA. In addition, they can only accelerate conventional 2D neural networks. To address these limitations, we present a generic multi-FPGA solution, written in OpenCL, which can accelerate more complex CNNs (e.g., the C3D CNN) and achieves a near-linear speedup with respect to the available single-FPGA solutions. The design is built upon the Intel Deep Learning Accelerator architecture, with three extensions. First, it includes updates for better area efficiency (up to 25%) and higher performance (up to 24%). Second, it supports 3D convolutions for more challenging applications such as video learning. Third, it supports multi-FPGA communication for higher inference throughput. The results show that utilizing multiple FPGAs can linearly increase the overall bandwidth while maintaining the same end-to-end latency. In addition, the design can outperform other FPGA 2D accelerators by up to 8.4 times and 3D accelerators by up to 1.7 times.
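
As a quick, hypothetical illustration of why a multi-FPGA pipeline can raise throughput without raising end-to-end latency, the following Python sketch partitions a CNN's layers into contiguous per-FPGA stages and reports the resulting pipeline metrics. The per-layer costs and the greedy partitioning heuristic are invented for illustration and are not taken from the paper.

```python
# Hypothetical sketch: partitioning CNN layers across FPGAs as a pipeline.
# Layer costs and the partitioning heuristic are illustrative assumptions.

def partition_layers(layer_costs, num_fpgas):
    """Greedily split layers into contiguous groups of roughly equal cost."""
    target = sum(layer_costs) / num_fpgas
    groups, current, acc = [], [], 0.0
    for i, c in enumerate(layer_costs):
        current.append(i)
        acc += c
        if acc >= target and len(groups) < num_fpgas - 1:
            groups.append(current)
            current, acc = [], 0.0
    groups.append(current)
    return groups

def pipeline_metrics(layer_costs, groups):
    """Throughput is set by the slowest stage; latency is the sum of stages."""
    stage_costs = [sum(layer_costs[i] for i in g) for g in groups]
    return 1.0 / max(stage_costs), sum(stage_costs)

costs = [4.0, 6.0, 5.0, 7.0, 3.0, 5.0]          # per-layer execution times (ms)
for n in (1, 2, 3):
    thr, lat = pipeline_metrics(costs, partition_layers(costs, n))
    print(f"{n} FPGA(s): throughput {thr:.3f} img/ms, latency {lat:.1f} ms")
```

With more FPGAs the slowest stage shrinks (higher throughput) while the summed stage time, and hence the per-image latency, stays constant, matching the scaling behaviour the abstract describes.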

2020 ◽  
Vol 11 (28) ◽  
pp. 7335-7348 ◽  
Author(s):  
Timothy E. H. Allen ◽  
Andrew J. Wedlake ◽  
Elena Gelžinytė ◽  
Charles Gong ◽  
Jonathan M. Goodman ◽  
...  

Deep learning neural networks, constructed for the prediction of chemical binding at 79 pharmacologically important human biological targets, show extremely high performance on test data (accuracy 92.2 ± 4.2%, MCC 0.814 ± 0.093, ROC-AUC 0.96 ± 0.04).
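
For reference, the reported test metrics can be computed per target with scikit-learn as in the short sketch below; the labels and scores here are placeholders, and the paper aggregates such per-target metrics as mean ± standard deviation across the 79 targets.

```python
# Minimal sketch of the reported evaluation metrics; data are placeholders.
import numpy as np
from sklearn.metrics import accuracy_score, matthews_corrcoef, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                   # binder / non-binder labels
y_prob = np.array([0.9, 0.2, 0.8, 0.7, 0.4, 0.1, 0.6, 0.3])   # model scores
y_pred = (y_prob >= 0.5).astype(int)

print("accuracy", accuracy_score(y_true, y_pred))
print("MCC     ", matthews_corrcoef(y_true, y_pred))
print("ROC-AUC ", roc_auc_score(y_true, y_prob))
```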


2018 ◽  
Vol 246 ◽  
pp. 03044 ◽  
Author(s):  
Guozhao Zeng ◽  
Xiao Hu ◽  
Yueyue Chen

Convolutional Neural Networks (CNNs) have become among the most advanced algorithms for deep learning. They are widely used in image processing, object detection, and automatic translation. As the demand for CNNs continues to increase, the platforms on which they are deployed continue to expand. As an excellent low-power, high-performance embedded solution, the Digital Signal Processor (DSP) is used frequently in many key areas. This paper deploys a CNN on Texas Instruments (TI)'s TMS320C6678 multi-core DSP and optimizes the main operation (convolution) to suit the DSP architecture. The optimized convolution operation is tens of times more efficient than the baseline implementation.
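
The abstract does not spell out how the convolution was restructured; one common way to adapt convolution to matrix-multiply-friendly hardware such as DSPs is im2col followed by a single large matrix multiplication, sketched below as a generic, assumed illustration rather than the paper's actual kernel.

```python
# Hedged illustration: recasting 2D convolution as im2col + one matrix multiply.
import numpy as np

def conv2d_im2col(x, w):
    """x: (C, H, W) input, w: (K, C, R, S) filters, stride 1, no padding."""
    C, H, W = x.shape
    K, _, R, S = w.shape
    out_h, out_w = H - R + 1, W - S + 1
    # Gather every receptive field into a column: (C*R*S, out_h*out_w).
    cols = np.empty((C * R * S, out_h * out_w))
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            cols[:, idx] = x[:, i:i + R, j:j + S].ravel()
            idx += 1
    # The convolution is now one large matrix multiplication.
    out = w.reshape(K, -1) @ cols
    return out.reshape(K, out_h, out_w)

x = np.random.rand(3, 8, 8)
w = np.random.rand(4, 3, 3, 3)
print(conv2d_im2col(x, w).shape)   # (4, 6, 6)
```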


2015 ◽  
Vol 781 ◽  
pp. 624-627 ◽  
Author(s):  
Rati Wongsathan ◽  
Pasit Pothong

Neural Networks (NNs) have emerged as an important tool for classification in the field of decision making. The main objective of this work is to design the structure and select the optimal parameters of the neural networks to implement a heart disease classifier. Three types of neural networks, i.e., the Multi-layered Perceptron Neural Network (MLP-NN), the Radial Basis Function Neural Network (RBF-NN), and the Generalized Regression Neural Network (GR-NN), have been used to test the performance of heart disease classification. The classification accuracy obtained by the RBF-NN was considerably higher than that of the MLP-NN and the GR-NN. The accuracy is also very promising compared with previously reported results using other types of neural networks.
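
As a hedged illustration of the RBF-NN approach (not the authors' configuration), the sketch below builds a minimal Gaussian RBF network with least-squares output weights on synthetic stand-in data.

```python
# Minimal RBF-network sketch: Gaussian hidden units, least-squares output layer.
# The data, centres, and width are placeholders, not the paper's configuration.
import numpy as np

def rbf_design(X, centres, sigma):
    """Gaussian activation of every sample against every centre."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 13))                 # e.g. 13 clinical features
y = (X[:, 0] + X[:, 1] > 0).astype(float)      # synthetic binary label

centres = X[rng.choice(len(X), 10, replace=False)]   # 10 hidden units
H = rbf_design(X, centres, sigma=3.0)
w, *_ = np.linalg.lstsq(H, y, rcond=None)      # linear output layer

pred = (rbf_design(X, centres, 3.0) @ w > 0.5).astype(float)
print("training accuracy:", (pred == y).mean())
```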


2021 ◽  
Vol 13 (1) ◽  
pp. 01-08
Author(s):  
Allana dos Santos Campos ◽  
César Alberto Bravo Pariente

Initially, neural networks were developed with the objective of creating a computational system that models the functioning of the human brain; over time, however, they came to be used to solve specific tasks. Adaline and the Perceptron are two neural networks that compute a function of their inputs using a set of adaptive weights and a bias. Despite their similarities, it is known that the Adaline neural network converges to a result more quickly than the Perceptron. This work was designed as a didactic exercise, in order to present how such conclusions are obtained, using the IRIS database for classification and training. Throughout the work, the programming language Processing was used to develop the neural networks, and Python was used for the visual presentation of results. The results show the higher performance of the Adaline neural network over the Perceptron, indicating which database classes can be linearly separated and which cannot; the metric used to compare the neural networks is the percentage of correct answers in the data classifications. Adaline showed the best performance when classifying petal length and width between the Iris-setosa and Iris-virginica classes, among all the other classifications.
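
A minimal re-implementation sketch (in Python rather than the authors' Processing code, with synthetic data in place of IRIS) of the two update rules being compared: the Perceptron corrects only on misclassification of the thresholded output, while Adaline descends on the continuous pre-threshold error.

```python
# Didactic sketch, assumed re-implementation: Perceptron vs Adaline update rules.
import numpy as np

def perceptron_epoch(X, y, w, lr=0.01):
    for xi, yi in zip(X, y):
        pred = 1.0 if xi @ w >= 0 else -1.0
        w += lr * (yi - pred) * xi            # update only when pred is wrong
    return w

def adaline_epoch(X, y, w, lr=0.01):
    errors = y - X @ w                         # continuous (pre-threshold) error
    return w + lr * X.T @ errors / len(X)      # batch gradient-descent step

rng = np.random.default_rng(1)
X = np.c_[np.ones(100), rng.normal(size=(100, 2))]   # bias column + 2 features
y = np.where(X[:, 1] + X[:, 2] > 0, 1.0, -1.0)

wp, wa = np.zeros(3), np.zeros(3)
for _ in range(20):
    wp = perceptron_epoch(X, y, wp)
    wa = adaline_epoch(X, y, wa)
print("perceptron acc:", (np.sign(X @ wp) == y).mean())
print("adaline acc:   ", (np.sign(X @ wa) == y).mean())
```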


2021 ◽  
Author(s):  
Bo Wang ◽  
Eric R Gamazon

Alzheimer's Disease (AD) is a debilitating form of dementia with a high prevalence in the global population and a large burden on the community and health care systems. AD's complex pathobiology consists of extracellular β-amyloid deposition and intracellular hyperphosphorylated tau. Comprehensive mutational analyses can generate a wealth of knowledge about protein properties and enable crucial insights into molecular mechanisms of disease. Deep Mutational Scanning (DMS) has enabled multiplexed measurement of mutational effects on protein properties, including kinematics and self-organization, with unprecedented resolution. However, potential bottlenecks of DMS characterization include experimental design, data quality, and the depth of mutational coverage. Here, we apply Deep Learning to comprehensively model the mutational effect of the AD-associated peptide Aβ42 on aggregation-related biochemical traits from DMS measurements. Among tested neural network architectures, Convolutional Neural Networks (ConvNets) and Recurrent Neural Networks (RNN) are found to be the most cost-effective models, with robustly high performance even on insufficiently sampled DMS studies. While sequence features are essential for satisfactory prediction from neural networks, geometric-structural features further enhance the prediction performance. Notably, we demonstrate how mechanistic insights into phenotype may be extracted from suitably designed neural networks themselves. This methodological benefit is particularly relevant for biochemical systems displaying a strong coupling between structure and phenotype, such as the conformation of the Aβ42 aggregate and its nucleation, as shown here using a Graph Convolutional Neural Network (GCN) developed from the protein atomic structure input. In addition to accurate imputation of missing values (which ranged up to 55% of all phenotype values at key residues), the mutationally-defined nucleation phenotype generated from a GCN shows improved resolution for identifying known disease-causing mutations relative to the original DMS phenotype. Our study suggests that neural network derived sequence-phenotype mapping can be exploited not only to provide direct support for protein engineering or genome editing but also to facilitate therapeutic design with the gained perspectives from biological modeling.
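
As a hedged sketch of the sequence-only baseline described here, the following PyTorch snippet defines a small 1D ConvNet that regresses a scalar DMS phenotype from one-hot-encoded 42-residue variants; the layer sizes and encoding are illustrative assumptions, not the architecture used in the study.

```python
# Hedged sketch: 1D ConvNet regressing a scalar phenotype from one-hot sequences.
import torch
import torch.nn as nn

class SeqConvNet(nn.Module):
    def __init__(self, n_aa=20, seq_len=42):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_aa, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.head = nn.Linear(32 * seq_len, 1)   # scalar phenotype (e.g. nucleation)

    def forward(self, x):                         # x: (batch, 20, 42) one-hot
        h = self.conv(x)
        return self.head(h.flatten(1)).squeeze(-1)

model = SeqConvNet()
x = torch.zeros(8, 20, 42)                        # batch of one-hot variant sequences
x[:, 0, :] = 1.0                                  # placeholder encoding
print(model(x).shape)                             # torch.Size([8])
```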


2021 ◽  
Vol 2062 (1) ◽  
pp. 012016
Author(s):  
Sunil Pandey ◽  
Naresh Kumar Nagwani ◽  
Shrish Verma

The training of deep learning convolutional neural networks is extremely compute intensive and takes a long time to complete on all but small datasets. This is a major limitation inhibiting the widespread adoption of convolutional neural networks in real-world applications despite their better image classification performance in comparison with other techniques. Multidirectional research and development efforts are therefore being pursued with the objective of boosting the computational performance of convolutional neural networks. Development of parallel and scalable deep learning convolutional neural network implementations for multisystem high performance computing architectures is important against this background. Prior analysis based on computational experiments indicates that a combination of pipeline and task parallelism results in significant convolutional neural network performance gains of up to 18 times. This paper discusses the aspects that are important for implementing parallel and scalable convolutional neural networks on central-processing-unit-based multisystem high performance computing architectures, including computational pipelines, convolutional neural networks, convolutional neural network pipelines, multisystem high performance computing architectures, and parallel programming models.
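
A minimal sketch of the pipeline-parallel idea on a CPU system, using Python's multiprocessing module: each stage runs in its own process and streams mini-batches downstream through queues. The stage bodies are trivial placeholders standing in for convolutional and fully connected blocks.

```python
# Pipeline-parallelism sketch with multiprocessing; stage bodies are placeholders.
import multiprocessing as mp

def stage(fn, q_in, q_out):
    while True:
        item = q_in.get()
        if item is None:            # poison pill terminates the stage
            q_out.put(None)
            break
        q_out.put(fn(item))

def conv_block(x):   return x + 1       # stand-in for convolution layers
def fc_block(x):     return x * 10      # stand-in for fully connected layers

if __name__ == "__main__":
    q0, q1, q2 = mp.Queue(), mp.Queue(), mp.Queue()
    workers = [mp.Process(target=stage, args=(conv_block, q0, q1)),
               mp.Process(target=stage, args=(fc_block, q1, q2))]
    for w in workers:
        w.start()
    for batch in range(5):
        q0.put(batch)                # feed mini-batches into the pipeline
    q0.put(None)
    results = []
    while (r := q2.get()) is not None:
        results.append(r)
    print(results)                   # [10, 20, 30, 40, 50]
    for w in workers:
        w.join()
```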


2019 ◽  
Vol 11 (4) ◽  
pp. 86 ◽  
Author(s):  
César Pérez López ◽  
María Delgado Rodríguez ◽  
Sonia de Lucas Santos

The goal of the present research is to contribute to the detection of tax fraud concerning personal income tax returns (IRPF, in Spanish) filed in Spain, through the use of Machine Learning advanced predictive tools, by applying Multilayer Perceptron neural network (MLP) models. The possibilities springing from these techniques have been applied to a broad range of personal income return data supplied by the Institute of Fiscal Studies (IEF). The use of the neural networks enabled taxpayer segmentation as well as calculation of the probability that an individual taxpayer will attempt to evade taxes. The results showed that the selected model has an efficiency rate of 84.3%, an improvement over other models utilized in tax fraud detection. The proposal can be generalized to quantify an individual's propensity to commit fraud with regard to other kinds of taxes. These models will help tax offices arrive at the best decisions regarding action plans to combat tax fraud.
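
A hedged sketch of the general approach (an MLP producing a per-taxpayer fraud-propensity probability) using scikit-learn on synthetic placeholder data; the features, architecture, and data are assumptions and do not reflect the IEF returns or the paper's tuned model.

```python
# Hedged sketch: MLP classifier yielding a fraud-propensity probability per taxpayer.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 12))                      # placeholder return features
y = (X[:, 0] - 0.5 * X[:, 3] + rng.normal(scale=0.5, size=1000) > 0.8).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)

propensity = clf.predict_proba(X_te)[:, 1]           # probability of a fraud attempt
print("test accuracy:", clf.score(X_te, y_te))
print("highest-risk cases:", np.argsort(propensity)[-5:])
```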


Entropy ◽  
2021 ◽  
Vol 23 (2) ◽  
pp. 223
Author(s):  
Yen-Ling Tai ◽  
Shin-Jhe Huang ◽  
Chien-Chang Chen ◽  
Henry Horng-Shing Lu

Nowadays, deep learning methods with high structural complexity and flexibility inevitably lean on the computational capability of the hardware. A platform with high-performance GPUs and large amounts of memory can support neural networks with large numbers of layers and kernels. However, naively pursuing high-cost hardware would probably hold back the technical development of deep learning methods. In this article, we therefore establish a new preprocessing method to reduce the computational complexity of the neural networks. Inspired by the band theory of solids in physics, we map the image space isomorphically onto a non-interacting physical system and treat image voxels as particle-like clusters. We then use the Fermi–Dirac distribution as a correction function for the normalization of the voxel intensity and as a filter of insignificant cluster components. The filtered clusters can then delineate the morphological heterogeneity of the image voxels. We used the BraTS 2019 datasets and the dimensional fusion U-net for the algorithmic validation, and the proposed Fermi–Dirac correction function exhibited performance comparable to the other employed preprocessing methods. Compared with the conventional z-score normalization function and the Gamma correction function, the proposed algorithm can save at least 38% of the computational time cost on a low-cost hardware architecture. Even though the global histogram equalization correction function has the lowest computational time among the employed correction functions, the proposed Fermi–Dirac correction function exhibits better image augmentation and segmentation capabilities.
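
A minimal sketch of a Fermi–Dirac-shaped intensity correction of the kind described here, with the sign convention chosen so that brighter voxels map toward 1; the μ and temperature values are illustrative assumptions, not the parameters used in the paper.

```python
# Sketch of a Fermi-Dirac-shaped intensity correction; parameters are placeholders.
import numpy as np

def fermi_dirac_correction(volume, mu=None, temperature=0.1):
    """Map intensities into (0, 1) with 1 / (exp((mu - x) / T) + 1)."""
    v = (volume - volume.min()) / (volume.max() - volume.min() + 1e-8)
    if mu is None:
        mu = float(np.median(v))           # chemical-potential analogue
    return 1.0 / (np.exp((mu - v) / temperature) + 1.0)

volume = np.random.rand(4, 64, 64)         # placeholder MRI-like volume
corrected = fermi_dirac_correction(volume)
print(corrected.min(), corrected.max())
```

Voxels well below the reference intensity μ are squashed toward 0 (filtering insignificant components), while those above it saturate toward 1, which is the normalization-plus-filtering behaviour the abstract attributes to the correction function.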


2021 ◽  
Vol 26 (1) ◽  
pp. 200-215
Author(s):  
Muhammad Alam ◽  
Jian-Feng Wang ◽  
Cong Guangpei ◽  
LV Yunrong ◽  
Yuanfang Chen

In recent years, the success of deep learning in natural scene image processing has boosted its application to the analysis of remote sensing images. In this paper, we apply Convolutional Neural Networks (CNN) to the semantic segmentation of remote sensing images. We adapt the encoder-decoder CNN structures SegNet (with index pooling) and U-net to make them suitable for multi-target semantic segmentation of remote sensing images. The results show that the two models have their own advantages and disadvantages in the segmentation of different objects. In addition, we propose an integrated algorithm that combines the two models. Experimental results show that the presented integrated algorithm can exploit the advantages of both models for multi-target segmentation and achieve better segmentation than either model alone.
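
One simple way to integrate two segmentation networks, sketched below as an assumption rather than the paper's actual rule, is to fuse their per-pixel class probabilities with per-class weights (for example, favouring whichever model has the better validation IoU on each class) and take the argmax.

```python
# Assumed integration sketch: per-class weighted fusion of two softmax maps.
import numpy as np

def integrate(prob_segnet, prob_unet, class_weights):
    """prob_*: (num_classes, H, W) softmax maps; class_weights: per-class
    preference for SegNet (1.0) vs U-net (0.0)."""
    w = class_weights[:, None, None]
    fused = w * prob_segnet + (1.0 - w) * prob_unet
    return fused.argmax(axis=0)                   # per-pixel class labels

num_classes, H, W = 4, 8, 8
p1 = np.random.dirichlet(np.ones(num_classes), size=(H, W)).transpose(2, 0, 1)
p2 = np.random.dirichlet(np.ones(num_classes), size=(H, W)).transpose(2, 0, 1)
weights = np.array([0.7, 0.3, 0.5, 0.5])          # e.g. favour SegNet on class 0
print(integrate(p1, p2, weights).shape)            # (8, 8)
```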


2021 ◽  
Vol 13 (11) ◽  
pp. 6194
Author(s):  
Selma Tchoketch_Kebir ◽  
Nawal Cheggaga ◽  
Adrian Ilinca ◽  
Sabri Boulouma

This paper presents an efficient neural network-based method for fault diagnosis in photovoltaic arrays. The proposed method is elaborated in three main steps: the data-feeding step, the fault-modeling step, and the decision step. The first step consists of feeding the real meteorological and electrical data to the neural networks, namely solar irradiance, panel temperature, photovoltaic current, and photovoltaic voltage. The second step consists of modeling a healthy mode of operation and five additional faulty operational modes; the modeling process is carried out using two artificial neural networks. From this step, six classes are obtained, where each class corresponds to a predefined model, namely the faultless scenario and five faulty scenarios. The third step involves the diagnosis decision about the system's state: based on the results of the previous step, two probabilistic neural networks classify each generated data point into one of the six classes. The obtained results show that the developed method can effectively detect different types of faults and classify them. Moreover, the method maintains high performance even in the presence of noise, and it provides a diagnosis even when data are injected at a reduced real-time rate, which demonstrates its robustness.
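
A minimal Parzen-window probabilistic neural network (PNN) sketch of the decision step: each class score is the average Gaussian kernel response of a sample against that class's training exemplars. The six classes mirror the healthy mode plus five faulty modes; the features and smoothing parameter are placeholders.

```python
# Minimal PNN (Parzen-window) classification sketch; data and sigma are placeholders.
import numpy as np

def pnn_predict(X_train, y_train, x, sigma=0.5):
    scores = []
    for c in np.unique(y_train):
        d2 = ((X_train[y_train == c] - x) ** 2).sum(axis=1)
        scores.append(np.exp(-d2 / (2 * sigma ** 2)).mean())
    return np.unique(y_train)[int(np.argmax(scores))]

rng = np.random.default_rng(0)
# 4 features: irradiance, panel temperature, PV current, PV voltage
X_train = rng.normal(size=(600, 4)) + np.repeat(np.arange(6), 100)[:, None]
y_train = np.repeat(np.arange(6), 100)            # classes 0..5 (healthy + 5 faults)

sample = X_train[250] + rng.normal(scale=0.1, size=4)
print("predicted operating mode:", pnn_predict(X_train, y_train, sample))
```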

