On the Discrepancy between the Theoretical Analysis and Practical Implementations of Compressed Communication for Distributed Deep Learning

2020 ◽  
Vol 34 (04) ◽  
pp. 3817-3824
Author(s):  
Aritra Dutta ◽  
El Houcine Bergou ◽  
Ahmed M. Abdelmoniem ◽  
Chen-Yu Ho ◽  
Atal Narayan Sahu ◽  
...  

Compressed communication, in the form of sparsification or quantization of stochastic gradients, is employed to reduce communication costs in distributed data-parallel training of deep neural networks. However, there exists a discrepancy between theory and practice: while theoretical analysis of most existing compression methods assumes compression is applied to the gradients of the entire model, many practical implementations operate individually on the gradients of each layer of the model. In this paper, we prove that layer-wise compression is, in theory, better, because its convergence rate is upper bounded by that of entire-model compression for a wide range of biased and unbiased compression methods. However, despite the theoretical bound, our experimental study of six well-known methods shows that convergence, in practice, may or may not be better, depending on the actual trained model and compression ratio. Our findings suggest that it would be advantageous for deep learning frameworks to include support for both layer-wise and entire-model compression.
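
To make the distinction concrete, the sketch below contrasts the two schemes with a simple top-k sparsifier; the compressor choice, the 10% ratio, and the toy layer sizes are illustrative assumptions, not the exact operators studied in the paper.

```python
# Entire-model vs. layer-wise top-k gradient sparsification (illustrative).
import numpy as np

def topk_sparsify(g, ratio=0.1):
    """Keep the largest-magnitude `ratio` fraction of entries, zero the rest."""
    k = max(1, int(g.size * ratio))
    out = np.zeros_like(g)
    idx = np.argpartition(np.abs(g), -k)[-k:]
    out[idx] = g[idx]
    return out

# Per-layer gradients of a toy two-layer model.
grads = {"layer1": np.random.randn(1000), "layer2": np.random.randn(10)}

# Entire-model compression: concatenate all gradients, compress once.
flat = np.concatenate(list(grads.values()))
entire = topk_sparsify(flat)

# Layer-wise compression: compress each layer's gradient independently,
# so every layer is guaranteed to retain some of its coordinates.
layerwise = {name: topk_sparsify(g) for name, g in grads.items()}

# With entire-model top-k, the small layer2 can lose all of its entries
# to the much larger layer1; layer-wise compression cannot.
print((entire[-10:] != 0).sum(), (layerwise["layer2"] != 0).sum())
```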

Author(s):  
Oksana Stupak

The multidimensional nature of the notion of "system" stems from the wide range of factors considered in philosophical, pedagogical, and social studies. A review of the scientific and encyclopaedic literature on systemic research confirms the diversity of approaches to defining "system". The purpose of the article is a theoretical analysis of scientific approaches to the concept of "system" and a characterization of system types in the context of pedagogical research. The approaches to this category identified in scientific theory and practice allow the system to be considered as an ordered set of interrelated elements which, interacting with the environment, form a whole and are oriented towards the achievement of a specific goal. It has been established that a set of goal-oriented components forms the basis of the research system and of systematic research methods. The article determines that autonomy in choosing ways of acting, on the basis of developed criteria, is the characteristic feature of purposeful systems. The research emphasizes the importance of introducing the systematic approach into modern pedagogical science, in particular for the problem of forming the social activity of youth in the institutions of civil society. The systematic approach involves performing a number of tasks: developing system goals; constructing objects as a system; building models of the system; determining system properties; and studying the functioning of the system. In the context of this study, the social, pedagogical, social-pedagogical, and innovative systems, which justify a number of principles, features, and characteristics relevant to the formation of social activity, contribute to the development of a system enabling young people's social activity. The analysis of the scientific-pedagogical literature made it possible to identify the main characteristics of these systems. Based on the results of the theoretical analysis, the concept and development stages of a system for forming young people's social activity in the institutions of civil society were determined.
Keywords: system, systematic approach, pedagogical system, innovative system, social-pedagogical system, youth.


2020 ◽  
pp. 107754632092914
Author(s):  
Mohammed Alabsi ◽  
Yabin Liao ◽  
Ala-Addin Nabulsi

Deep learning has seen tremendous growth over the past decade. It has set new performance limits for a wide range of applications, including computer vision, speech recognition, and machinery health monitoring. With the abundance of instrumentation data and the availability of high computational power, deep learning continues to prove itself an efficient tool for extracting micropatterns from machinery big-data repositories. This study presents a comparison of feature extraction capabilities using stacked autoencoders, considering the use of expert domain knowledge. The Case Western Reserve University bearing dataset was used for the study, and a classifier was trained and tested to extract and visualize features from 12 different failure classes. Based on the raw-data preprocessing, four different deep neural network structures were studied. Results indicated that integrating domain knowledge with deep learning techniques improved feature extraction capabilities and reduced the size and computational requirements of the deep neural networks without the need for exhaustive architecture tuning and modification.
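
As a rough illustration of this pipeline, the following Keras sketch pretrains a stacked autoencoder on feature vectors and reuses its encoder for a 12-class fault classifier; the layer sizes, feature dimension, and random stand-in data are assumptions for illustration, not the study's configuration.

```python
# Stacked autoencoder pretraining + classifier fine-tuning (illustrative).
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_features, n_classes = 400, 12

# Encoder: stacked dense layers compressing the vibration features
# (optionally preprocessed with domain knowledge, e.g., envelope spectra).
inputs = keras.Input(shape=(n_features,))
h = layers.Dense(128, activation="relu")(inputs)
h = layers.Dense(32, activation="relu")(h)  # bottleneck features

# Decoder head used for unsupervised pretraining by reconstruction.
decoded = layers.Dense(128, activation="relu")(h)
decoded = layers.Dense(n_features, activation="linear")(decoded)
autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")

# Classifier head reusing the (pretrained) encoder for the failure classes.
clf_out = layers.Dense(n_classes, activation="softmax")(h)
classifier = keras.Model(inputs, clf_out)
classifier.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])

# Toy data standing in for CWRU-derived features.
X = np.random.randn(256, n_features).astype("float32")
y = np.random.randint(0, n_classes, size=256)
autoencoder.fit(X, X, epochs=1, verbose=0)   # unsupervised pretraining
classifier.fit(X, y, epochs=1, verbose=0)    # supervised fine-tuning
```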


Author(s):  
Kosuke Takagi

Despite the recent success of deep learning models in solving various problems, their ability is still limited compared with human intelligence, which has the flexibility to adapt to a changing environment. Obtaining a model that adapts to a wide range of problems and tasks remains a challenging problem. One issue that must be addressed to achieve this is identifying the similarities and differences between the human brain and deep neural networks. In this article, inspired by human flexibility, which may suggest the existence of a common mechanism for solving different kinds of tasks, we consider a general learning process in neural networks on which no specific conditions or constraints are imposed. We then show theoretically that, as learning progresses, the network structure converges to a state characterized by a unique distribution of network quantities such as connection weight and node strength. Noting that empirical data indicate this state emerges in large-scale networks in the human brain, we show that the same state can be reproduced in a simple example of a deep learning model. Although further research is needed, our findings provide insight into the common mechanism underlying the human brain and deep learning, as well as suggestions for designing efficient learning algorithms for solving a wide variety of tasks in the future.
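
The node strength referred to above can be made concrete in a few lines; the random weight matrix below merely stands in for a trained layer, and the log-scale summary is only meant to show how the distributional claim could be checked empirically.

```python
# Node strength of a network layer (illustrative stand-in for trained weights).
import numpy as np

W = np.random.randn(512, 512)        # stand-in for a trained weight matrix
strength = np.abs(W).sum(axis=1)     # node strength: summed incident weights

# Log scale is the natural axis for comparing against heavy-tailed
# candidate distributions (e.g., log-normal).
log_s = np.log(strength)
print("mean, std of log node strength:", log_s.mean(), log_s.std())
```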


2021 ◽  
Vol 20 (5s) ◽  
pp. 1-24
Author(s):  
Gokul Krishnan ◽  
Sumit K. Mandal ◽  
Manvitha Pannala ◽  
Chaitali Chakrabarti ◽  
Jae-Sun Seo ◽  
...  

In-memory computing (IMC) on a monolithic chip for deep learning faces dramatic challenges in area, yield, and on-chip interconnection cost due to ever-increasing model sizes. 2.5D integration, or chiplet-based architectures, interconnects multiple small chips (i.e., chiplets) to form a large computing system, presenting a feasible solution beyond a monolithic IMC architecture for accelerating large deep learning models. This paper presents a new benchmarking simulator, SIAM, to evaluate the performance of chiplet-based IMC architectures and explore the potential of such a paradigm shift in IMC architecture design. SIAM integrates device, circuit, architecture, network-on-chip (NoC), network-on-package (NoP), and DRAM access models to realize an end-to-end system. SIAM is scalable in its support of a wide range of deep neural networks (DNNs), customizable to various network structures and configurations, and capable of efficient design space exploration. We demonstrate the flexibility, scalability, and simulation speed of SIAM by benchmarking different state-of-the-art DNNs with the CIFAR-10, CIFAR-100, and ImageNet datasets. We further calibrate the simulation results against a published silicon result, SIMBA. The chiplet-based IMC architecture obtained through SIAM shows 130× and 72× improvement in energy efficiency for ResNet-50 on the ImageNet dataset compared to Nvidia V100 and T4 GPUs, respectively.
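
The end-to-end composition idea can be sketched as aggregating per-component estimates; the component names and the simple additive energy/latency model below are illustrative assumptions, not SIAM's actual device or interconnect models.

```python
# Composing per-component models into system-level estimates (illustrative).
from dataclasses import dataclass

@dataclass
class Component:
    name: str
    energy_pj: float    # energy contribution per inference (pJ)
    latency_us: float   # latency contribution per inference (us)

def end_to_end(components):
    """Aggregate per-component estimates into system-level figures."""
    energy = sum(c.energy_pj for c in components)
    latency = sum(c.latency_us for c in components)
    return energy, latency

system = [
    Component("IMC chiplet array", 900.0, 40.0),
    Component("NoC (intra-chiplet)", 120.0, 8.0),
    Component("NoP (inter-chiplet)", 250.0, 15.0),
    Component("DRAM access", 400.0, 25.0),
]
print(end_to_end(system))  # (total energy, total latency)
```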


Author(s):  
NhatHai Phan ◽  
Minh N. Vu ◽  
Yang Liu ◽  
Ruoming Jin ◽  
Dejing Dou ◽  
...  

In this paper, we propose a novel Heterogeneous Gaussian Mechanism (HGM) to preserve differential privacy in deep neural networks, with provable robustness against adversarial examples. We first relax the constraint of the privacy budget in the traditional Gaussian Mechanism from (0, 1] to (0, ∞), with a new bound of the noise scale to preserve differential privacy. The noise in our mechanism can be arbitrarily redistributed, offering a distinctive ability to address the trade-off between model utility and privacy loss. To derive provable robustness, our HGM is applied to inject Gaussian noise into the first hidden layer. Then, a tighter robustness bound is proposed. Theoretical analysis and thorough evaluations show that our mechanism notably improves the robustness of differentially private deep neural networks, compared with baseline approaches, under a variety of model attacks.
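
A minimal sketch of the underlying idea, noise injection into the first hidden layer, is shown below using the classical Gaussian-mechanism noise scale (which is valid only for ε in (0, 1]); the HGM's relaxed bound and noise redistribution are not reproduced here, and the sensitivity and budget values are placeholders.

```python
# Gaussian-mechanism noise injection into a first hidden layer (illustrative).
import numpy as np

def gaussian_sigma(sensitivity, epsilon, delta):
    """Classical Gaussian-mechanism noise scale (requires 0 < epsilon <= 1)."""
    return np.sqrt(2.0 * np.log(1.25 / delta)) * sensitivity / epsilon

def first_hidden_layer(x, W, b, sensitivity=1.0, epsilon=0.5, delta=1e-5):
    sigma = gaussian_sigma(sensitivity, epsilon, delta)
    h = np.maximum(x @ W + b, 0.0)                      # ReLU activations
    return h + np.random.normal(0.0, sigma, h.shape)    # DP noise injection

x = np.random.randn(4, 8)
W, b = np.random.randn(8, 16), np.zeros(16)
print(first_hidden_layer(x, W, b).shape)
```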


Author(s):  
Osval Antonio Montesinos López ◽  
Abelardo Montesinos López ◽  
Jose Crossa

This chapter provides elements for implementing deep neural networks (deep learning) for continuous outcomes. We give details of the hyperparameters to be tuned in deep neural networks and provide a general guide for performing this task with a higher probability of success. We then explain the most popular deep learning frameworks that can be used to implement these models, as well as the most popular optimizers available in many software programs for deep learning. Several practical examples with plant breeding data for implementing deep neural networks in the Keras library are outlined. These examples take into account many components in the predictor as well as many hyperparameters (hidden layers, number of neurons, learning rate, optimizers, penalization, etc.), for which we also illustrate how the tuning process can be done to increase the probability of a successful application.
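
A minimal Keras sketch in the spirit of those examples is given below: a feed-forward regression network whose commonly tuned hyperparameters (hidden layers, units, learning rate, optimizer, L2 penalization, dropout) are exposed as arguments. The values and the toy data are illustrative assumptions, not the chapter's worked examples.

```python
# Feed-forward regression network with tunable hyperparameters (illustrative).
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def build_model(n_features, hidden_layers=2, units=64,
                learning_rate=1e-3, l2_penalty=1e-4, dropout=0.2):
    """Build a regression net with the commonly tuned knobs exposed."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(n_features,)))
    for _ in range(hidden_layers):
        model.add(layers.Dense(units, activation="relu",
                               kernel_regularizer=regularizers.l2(l2_penalty)))
        model.add(layers.Dropout(dropout))
    model.add(layers.Dense(1))  # linear output for a continuous outcome
    model.compile(optimizer=keras.optimizers.Adam(learning_rate), loss="mse")
    return model

# Toy stand-in for plant-breeding predictors and a continuous trait.
X = np.random.randn(200, 50).astype("float32")
y = (0.5 * X[:, 0] + 0.1 * np.random.randn(200)).astype("float32")
model = build_model(n_features=50)
model.fit(X, y, validation_split=0.2, epochs=5, verbose=0)
```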


Acta Numerica ◽  
2021 ◽  
Vol 30 ◽  
pp. 203-248
Author(s):  
Mikhail Belkin

In the past decade the mathematical theory of machine learning has lagged far behind the triumphs of deep neural networks on practical challenges. However, the gap between theory and practice is gradually starting to close. In this paper I will attempt to assemble some pieces of the remarkable and still incomplete mathematical mosaic emerging from the efforts to understand the foundations of deep learning. The two key themes will be interpolation and its sibling, over-parametrization. Interpolation corresponds to fitting data, even noisy data, exactly. Over-parametrization enables interpolation and provides flexibility to select a suitable interpolating model. As we will see, just as a physical prism separates colours mixed within a ray of light, the figurative prism of interpolation helps to disentangle generalization and optimization properties within the complex picture of modern machine learning. This article is written in the belief and hope that a clearer understanding of these issues will bring us a step closer to a general theory of deep learning and machine learning.
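
A small numerical illustration of interpolation through over-parametrization: with more basis functions than data points, a least-squares fit reproduces even noisy targets exactly (up to machine precision). The polynomial basis and minimum-norm solver below are illustrative choices, not constructions taken from the article.

```python
# Exact interpolation of noisy data via an over-parametrized model.
import numpy as np

rng = np.random.default_rng(0)
n = 10
x = np.linspace(-1, 1, n)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(n)  # noisy targets

# Over-parametrized design matrix: 30 polynomial features for 10 points.
degree = 30
Phi = np.vander(x, degree, increasing=True)

# lstsq returns the minimum-norm coefficients, which interpolate the data.
coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print("max training error:", np.max(np.abs(Phi @ coef - y)))  # ~0: exact fit
```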


Aviation ◽  
2018 ◽  
Vol 22 (1) ◽  
pp. 6-12 ◽  
Author(s):  
Victor SINEGLAZOV ◽  
Olena CHUMACHENKO ◽  
Vladyslav GORBATIUK

Neural network-based methods such as deep neural networks show great efficiency for a wide range of applications. In this paper, a deep learning-based hybrid approach is proposed for forecasting the yearly revenue passenger kilometers time series of Australia's major domestic airlines. The essence of the approach is to use a resilient error-backpropagation algorithm with dropout to "tune" the polynomial neural network obtained from a multi-layered GMDH algorithm. The article compares the performance of the suggested algorithm on this time series with other popular forecasting methods: a deep belief network, the multi-layered GMDH algorithm, the Box-Jenkins method, and the ANFIS model. The minimum MAE reached by the proposed algorithm was approximately 25% lower than the minimum MAE of the next-best method, GMDH, indicating that practical application of the algorithm can give good results compared with other well-known methods.
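
For readers unfamiliar with GMDH, the sketch below shows its basic building block: candidate neurons that are quadratic polynomials of input pairs, fitted by least squares, with the best candidates kept to form the next layer. The selection criterion, candidate count, and the omitted backpropagation "tuning" stage are simplifications relative to the paper's hybrid.

```python
# One GMDH layer: quadratic polynomial neurons over input pairs (illustrative).
import numpy as np
from itertools import combinations

def poly_features(a, b):
    """Quadratic polynomial basis of two inputs."""
    return np.column_stack([np.ones_like(a), a, b, a * b, a**2, b**2])

def gmdh_layer(X, y, keep=4):
    """Fit a quadratic neuron for every input pair; keep the `keep` best."""
    candidates = []
    for i, j in combinations(range(X.shape[1]), 2):
        Phi = poly_features(X[:, i], X[:, j])
        coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        pred = Phi @ coef
        candidates.append((np.mean((pred - y) ** 2), pred))
    candidates.sort(key=lambda c: c[0])  # rank by training MSE
    return np.column_stack([pred for _, pred in candidates[:keep]])

X = np.random.randn(100, 6)
y = X[:, 0] * X[:, 1] + 0.1 * np.random.randn(100)
layer1 = gmdh_layer(X, y)       # outputs of layer 1 feed layer 2
layer2 = gmdh_layer(layer1, y)
print(layer2.shape)
```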


2019 ◽  
Author(s):  
Ammar Tareen ◽  
Justin B. Kinney

The adoption of deep learning techniques in genomics has been hindered by the difficulty of mechanistically interpreting the models that these techniques produce. In recent years, a variety of post-hoc attribution methods have been proposed for addressing this neural network interpretability problem in the context of gene regulation. Here we describe a complementary way of approaching this problem. Our strategy is based on the observation that two large classes of biophysical models of cis-regulatory mechanisms can be expressed as deep neural networks in which nodes and weights have explicit physiochemical interpretations. We also demonstrate how such biophysical networks can be rapidly inferred, using modern deep learning frameworks, from the data produced by certain types of massively parallel reporter assays (MPRAs). These results suggest a scalable strategy for using MPRAs to systematically characterize the biophysical basis of gene regulation in a wide range of biological contexts. They also highlight gene regulation as a promising venue for the development of scientifically interpretable approaches to deep learning.
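
To illustrate what it means for nodes and weights to have physical interpretations, the sketch below writes a one-site thermodynamic activation model as a tiny network whose single weight is a binding energy; the model form and parameter values are illustrative assumptions, not the paper's specific architectures.

```python
# A one-site thermodynamic model of activation as a tiny "network" (illustrative).
import numpy as np

def occupancy(concentration, delta_g):
    """Equilibrium occupancy of one binding site (a logistic in log-space)."""
    w = concentration * np.exp(-delta_g)  # Boltzmann weight of the bound state
    return w / (1.0 + w)

def transcription_rate(tf_conc, delta_g=-2.0, rate_max=100.0):
    # Rate proportional to activator occupancy: a one-layer computation
    # whose single weight (delta_g) has a direct physical meaning.
    return rate_max * occupancy(tf_conc, delta_g)

print(transcription_rate(np.array([0.01, 0.1, 1.0, 10.0])))
```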


2020 ◽  
Author(s):  
James Lloyd McClelland ◽  
Matthew M. Botvinick

Recent years have seen an explosion of interest in deep learning and deep neural networks. Deep learning lies at the heart of unprecedented feats of machine intelligence as well as software people use every day. Systems built on deep learning have surpassed human capabilities in complex strategy games like Go and chess, and we use them for speech recognition, image captioning, and a wide range of other applications. A consideration of deep learning is crucial for a Handbook of Human Memory, since human brains are deep neural networks, and an understanding of artificial deep learning systems may contribute to our understanding of how humans and animals learn and remember. Deep neural networks are complex, structured systems that process information in a parallel, distributed, and context-sensitive fashion, and deep learning is the effort to use these systems to acquire capabilities we associate with intelligence through an experience-dependent learning process. Within the field of Artificial Intelligence, work in deep learning is typically directed toward the goal of creating and understanding intelligence using all available tools and resources, without consideration of their biological plausibility. Many of the ideas at the heart of deep learning, however, draw their inspiration from the brain and from characteristics of human intelligence we believe are best captured by these brain-inspired systems (Rumelhart, McClelland, and the PDP Research Group, 1986). Furthermore, ideas emerging from deep learning research can help inform us about memory and learning in humans and animals. Thus, deep learning research can be seen as fertile ground for engagement between researchers who work on related issues with implications for both biological and machine intelligence.

We begin by introducing the basic constructs employed in deep learning and then consider several of the widely used learning paradigms and architectures used in these systems. We then turn to a consideration of how the constructs of deep learning relate to traditional constructs in the psychological literature on learning and memory. Next, we consider recent developments in the field of reinforcement learning that have broad implications for human learning and memory. We conclude with a consideration of areas where human capabilities still far exceed current deep learning approaches, and describe possible future directions toward understanding how these abilities might best be captured.

