Adaptive Computation Reuse for Energy-Efficient Training of Deep Neural Networks

2021 ◽  
Vol 20 (6) ◽  
pp. 1-24
Author(s):  
Jason Servais ◽  
Ehsan Atoofian

In recent years, Deep Neural Networks (DNNs) have been deployed into a diverse set of applications from voice recognition to scene generation mostly due to their high-accuracy. DNNs are known to be computationally intensive applications, requiring a significant power budget. There have been a large number of investigations into energy-efficiency of DNNs. However, most of them primarily focused on inference while training of DNNs has received little attention. This work proposes an adaptive technique to identify and avoid redundant computations during the training of DNNs. Elements of activations exhibit a high degree of similarity, causing inputs and outputs of layers of neural networks to perform redundant computations. Based on this observation, we propose Adaptive Computation Reuse for Tensor Cores (ACRTC) where results of previous arithmetic operations are used to avoid redundant computations. ACRTC is an architectural technique, which enables accelerators to take advantage of similarity in input operands and speedup the training process while also increasing energy-efficiency. ACRTC dynamically adjusts the strength of computation reuse based on the tolerance of precision relaxation in different training phases. Over a wide range of neural network topologies, ACRTC accelerates training by 33% and saves energy by 32% with negligible impact on accuracy.

Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 230
Author(s):  
Jaechan Cho ◽  
Yongchul Jung ◽  
Seongjoo Lee ◽  
Yunho Jung

Binary neural networks (BNNs) have attracted significant interest for the implementation of deep neural networks (DNNs) on resource-constrained edge devices, and various BNN accelerator architectures have been proposed to achieve higher efficiency. BNN accelerators can be divided into two categories: streaming and layer accelerators. Although streaming accelerators designed for a specific BNN network topology provide high throughput, they are infeasible for various sensor applications in edge AI because of their complexity and inflexibility. In contrast, layer accelerators with reasonable resources can support various network topologies, but they operate with the same parallelism for all the layers of the BNN, which degrades throughput performance at certain layers. To overcome this problem, we propose a BNN accelerator with adaptive parallelism that offers high throughput performance in all layers. The proposed accelerator analyzes target layer parameters and operates with optimal parallelism using reasonable resources. In addition, this architecture is able to fully compute all types of BNN layers thanks to its reconfigurability, and it can achieve a higher area–speed efficiency than existing accelerators. In performance evaluation using state-of-the-art BNN topologies, the designed BNN accelerator achieved an area–speed efficiency 9.69 times higher than previous FPGA implementations and 24% higher than existing VLSI implementations for BNNs.


2007 ◽  
Vol 189 (11) ◽  
pp. 4161-4167 ◽  
Author(s):  
Mark D. Farrar ◽  
Karen M. Howson ◽  
Richard A. Bojar ◽  
David West ◽  
James C. Towler ◽  
...  

ABSTRACT Cutaneous propionibacteria are important commensals of human skin and are implicated in a wide range of opportunistic infections. Propionibacterium acnes is also associated with inflammatory acne vulgaris. Bacteriophage PA6 is the first phage of P. acnes to be sequenced and demonstrates a high degree of similarity to many mycobacteriophages both morphologically and genetically. PA6 possesses an icosahedreal head and long noncontractile tail characteristic of the Siphoviridae. The overall genome organization of PA6 resembled that of the temperate mycobacteriophages, although the genome was much smaller, 29,739 bp (48 predicted genes), compared to, for example, 50,550 bp (86 predicted genes) for the Bxb1 genome. PA6 infected only P. acnes and produced clear plaques with turbid centers, but it lacked any obvious genes for lysogeny. The host range of PA6 was restricted to P. acnes, but the phage was able to infect and lyse all P. acnes isolates tested. Sequencing of the PA6 genome makes an important contribution to the study of phage evolution and propionibacterial genetics.


2018 ◽  
Author(s):  
Christopher McComb

The design of a system commits a significant portion of the final cost of that system. Many computational approaches have been developed to assist designers in the analysis (e.g., computational fluid dynamics) and synthesis (e.g., topology optimization) of engineered systems. However, many of these approaches are computationally intensive, taking significant time to complete an analysis and even longer to iteratively synthesize a solution. The current work proposes a methodology for rapidly evaluating and syn- thesizing engineered systems through the use of deep neural networks. The proposed methodology is applied to the analysis and synthesis of offshore structures such as oil platforms. These structures are constructed in a ma- rine environment and are typically designed to achieve specific dynamics in response to a known spectrum of ocean waves. Results show that deep learning can be used to accurately and rapidly synthesize and analyze off- shore structure.


2022 ◽  
Vol 18 (2) ◽  
pp. 1-25
Author(s):  
Saransh Gupta ◽  
Mohsen Imani ◽  
Joonseop Sim ◽  
Andrew Huang ◽  
Fan Wu ◽  
...  

Stochastic computing (SC) reduces the complexity of computation by representing numbers with long streams of independent bits. However, increasing performance in SC comes with either an increase in area or a loss in accuracy. Processing in memory (PIM) computes data in-place while having high memory density and supporting bit-parallel operations with low energy consumption. In this article, we propose COSMO, an architecture for co mputing with s tochastic numbers in me mo ry, which enables SC in memory. The proposed architecture is general and can be used for a wide range of applications. It is a highly dense and parallel architecture that supports most SC encodings and operations in memory. It maximizes the performance and energy efficiency of SC by introducing several innovations: (i) in-memory parallel stochastic number generation, (ii) efficient implication-based logic in memory, (iii) novel memory bit line segmenting, (iv) a new memory-compatible SC addition operation, and (v) enabling flexible block allocation. To show the generality and efficiency of our stochastic architecture, we implement image processing, deep neural networks (DNNs), and hyperdimensional (HD) computing on the proposed hardware. Our evaluations show that running DNN inference on COSMO is 141× faster and 80× more energy efficient as compared to GPU.


Author(s):  
Alistair Baretto ◽  
Noel Pudussery ◽  
Veerasai Subramaniam ◽  
Amroz Siddiqui

The rapid growth that has taken place in Computer Vision has been instrumental in driving the advancement of Image processing techniques and drawing inferences from them. Combined with the enormous capabilities that Deep Neural networks bring to the table, computers can be efficiently trained to automate the tasks and yield accurate and robust results quickly thus optimizing the process. Technological growth has enabled us to bring such computationally intensive tasks to lighter and lower-end mobile devices thus opening up a wide range of possibilities. WebRTC-the open-source web standard enables us to send multimedia-based data from peer to peer paving the way for Real-time Communication over the Web. With this project, we aim to build on one such opportunity that can enable us to perform custom object detection through an android based application installed on our mobile phones. Therefore, our problem statement is to be able to capture real-time feeds, perform custom object detection, generate inference results, and appropriately send intruder alerts when needed. To implement this, we propose a mobile-based over-the-cloud solution that can capitalize on the enormous and encouraging features of the YOLO algorithm and incorporate the functionalities of OpenCV’s DNN module for providing us with fast and correct inferences. Coupled with a good and intuitive UI, we can ensure ease of use of our application.


Author(s):  
Ulas Isildak ◽  
Alessandro Stella ◽  
Matteo Fumagalli

1AbstractBalancing selection is an important adaptive mechanism underpinning a wide range of phenotypes. Despite its relevance, the detection of recent balancing selection from genomic data is challenging as its signatures are qualitatively similar to those left by ongoing positive selection. In this study we developed and implemented two deep neural networks and tested their performance to predict loci under recent selection, either due to balancing selection or incomplete sweep, from population genomic data. Specifically, we generated forward-intime simulations to train and test an artificial neural network (ANN) and a convolutional neural network (CNN). ANN received as input multiple summary statistics calculated on the locus of interest, while CNN was applied directly on the matrix of haplotypes. We found that both architectures have high accuracy to identify loci under recent selection. CNN generally outperformed ANN to distinguish between signals of balancing selection and incomplete sweep and was less affected by incorrect training data. We deployed both trained networks on neutral genomic regions in European populations and demonstrated a lower false positive rate for CNN than ANN. We finally deployed CNN within the MEFV gene region and identified several common variants predicted to be under incomplete sweep in a European population. Notably, two of these variants are functional changes and could modulate susceptibility to Familial Mediterranean Fever, possibly as a consequence of past adaptation to pathogens. In conclusion, deep neural networks were able to characterise signals of selection on intermediate-frequency variants, an analysis currently inaccessible by commonly used strategies.


2020 ◽  
pp. 107754632092914
Author(s):  
Mohammed Alabsi ◽  
Yabin Liao ◽  
Ala-Addin Nabulsi

Deep learning has seen tremendous growth over the past decade. It has set new performance limits for a wide range of applications, including computer vision, speech recognition, and machinery health monitoring. With the abundance of instrumentation data and the availability of high computational power, deep learning continues to prove itself as an efficient tool for the extraction of micropatterns from machinery big data repositories. This study presents a comparative study for feature extraction capabilities using stacked autoencoders considering the use of expert domain knowledge. Case Western Reserve University bearing dataset was used for the study, and a classifier was trained and tested to extract and visualize features from 12 different failure classes. Based on the raw data preprocessing, four different deep neural network structures were studied. Results indicated that integrating domain knowledge with deep learning techniques improved feature extraction capabilities and reduced the deep neural networks size and computational requirements without the need for exhaustive deep neural networks architecture tuning and modification.


2019 ◽  
Vol 28 (06) ◽  
pp. 1960008 ◽  
Author(s):  
Grega Vrbančič ◽  
Iztok Fister ◽  
Vili Podgorelec

Over the past years, the application of deep neural networks in a wide range of areas is noticeably increasing. While many state-of-the-art deep neural networks are providing the performance comparable or in some cases even superior to humans, major challenges such as parameter settings for learning deep neural networks and construction of deep learning architectures still exist. The implications of those challenges have a significant impact on how a deep neural network is going to perform on a specific task. With the proposed method, presented in this paper, we are addressing the problem of parameter setting for a deep neural network utilizing swarm intelligence algorithms. In our experiments, we applied the proposed method variants to the classification task for distinguishing between phishing and legitimate websites. The performance of the proposed method is evaluated and compared against four different phishing datasets, two of which we prepared on our own. The results, obtained from the conducted empirical experiments, have proven the proposed approach to be very promising. By utilizing the proposed swarm intelligence based methods, we were able to statistically significantly improve the predictive performance when compared to the manually tuned deep neural network. In general, the improvement of classification accuracy ranges from 2.5% to 3.8%, while the improvement of F1-score reached even 24% on one of the datasets.


Author(s):  
Kosuke Takagi

Abstract Despite the recent success of deep learning models in solving various problems, their ability is still limited compared with human intelligence, which has the flexibility to adapt to a changing environment. To obtain a model which achieves adaptability to a wide range of problems and tasks is a challenging problem. To achieve this, an issue that must be addressed is identification of the similarities and differences between the human brain and deep neural networks. In this article, inspired by the human flexibility which might suggest the existence of a common mechanism allowing solution of different kinds of tasks, we consider a general learning process in neural networks, on which no specific conditions and constraints are imposed. Subsequently, we theoretically show that, according to the learning progress, the network structure converges to the state, which is characterized by a unique distribution model with respect to network quantities such as the connection weight and node strength. Noting that the empirical data indicate that this state emerges in the large scale network in the human brain, we show that the same state can be reproduced in a simple example of deep learning models. Although further research is needed, our findings provide an insight into the common inherent mechanism underlying the human brain and deep learning. Thus, our findings provide suggestions for designing efficient learning algorithms for solving a wide variety of tasks in the future.


Author(s):  
Ziyuan Zhong ◽  
Yuchi Tian ◽  
Baishakhi Ray

AbstractDeep Neural Networks (DNNs) are being deployed in a wide range of settings today, from safety-critical applications like autonomous driving to commercial applications involving image classifications. However, recent research has shown that DNNs can be brittle to even slight variations of the input data. Therefore, rigorous testing of DNNs has gained widespread attention.While DNN robustness under norm-bound perturbation got significant attention over the past few years, our knowledge is still limited when natural variants of the input images come. These natural variants, e.g., a rotated or a rainy version of the original input, are especially concerning as they can occur naturally in the field without any active adversary and may lead to undesirable consequences. Thus, it is important to identify the inputs whose small variations may lead to erroneous DNN behaviors. The very few studies that looked at DNN’s robustness under natural variants, however, focus on estimating the overall robustness of DNNs across all the test data rather than localizing such error-producing points. This work aims to bridge this gap.To this end, we study the local per-input robustness properties of the DNNs and leverage those properties to build a white-box (DeepRobust-W) and a black-box (DeepRobust-B) tool to automatically identify the non-robust points. Our evaluation of these methods on three DNN models spanning three widely used image classification datasets shows that they are effective in flagging points of poor robustness. In particular, DeepRobust-W and DeepRobust-B are able to achieve an F1 score of up to 91.4% and 99.1%, respectively. We further show that DeepRobust-W can be applied to a regression problem in a domain beyond image classification. Our evaluation on three self-driving car models demonstrates that DeepRobust-W is effective in identifying points of poor robustness with F1 score up to 78.9%.


Sign in / Sign up

Export Citation Format

Share Document