Network as Regularization for Training Deep Neural Networks: Framework, Model and Performance

2020 ◽  
Vol 34 (04) ◽  
pp. 6013-6020
Author(s):  
Kai Tian ◽  
Yi Xu ◽  
Jihong Guan ◽  
Shuigeng Zhou

Despite their powerful representation ability, deep neural networks (DNNs) are prone to over-fitting because of over-parametrization. Existing works have explored various regularization techniques to tackle the over-fitting problem. Some of them employ soft targets rather than one-hot labels to guide network training (e.g., label smoothing in classification tasks); we call these target-based regularization approaches in this paper. To alleviate the over-fitting problem, we propose a new and general regularization framework that introduces an auxiliary network to dynamically incorporate guided semantic disturbance into the labels. We call it Network as Regularization (NaR for short). During training, the disturbance is constructed as a convex combination of the predictions of the target network and the auxiliary network. The two networks are initialized separately, and the auxiliary network is trained independently of the target network while progressively providing instance-level and class-level semantic information to the latter. We conduct extensive experiments to validate the effectiveness of the proposed method. Experimental results show that NaR outperforms many state-of-the-art target-based regularization methods, and that other regularization approaches (e.g., mixup) can also benefit from being combined with NaR.
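To make the mechanism concrete, the following is a minimal sketch of how such a soft target could be formed from a convex combination of the two networks' predictions; the mixing weights `alpha` and `epsilon` and the PyTorch rendering are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def nar_soft_target_loss(target_logits, aux_logits, labels,
                         alpha=0.5, epsilon=0.1):
    """Cross-entropy of the target network against one-hot labels
    perturbed by a convex combination of the two networks' predictions.
    alpha balances the two networks; epsilon (assumed) controls how much
    disturbance is mixed into the ground-truth labels."""
    with torch.no_grad():
        # Guided semantic disturbance: convex combination of predictions.
        disturbance = (alpha * F.softmax(target_logits, dim=1)
                       + (1 - alpha) * F.softmax(aux_logits, dim=1))
    one_hot = F.one_hot(labels, target_logits.size(1)).float()
    soft_target = (1 - epsilon) * one_hot + epsilon * disturbance
    return -(soft_target * F.log_softmax(target_logits, dim=1)).sum(1).mean()
```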

Sensors ◽  
2021 ◽  
Vol 21 (1) ◽  
pp. 229
Author(s):  
Xianzhong Tian ◽  
Juan Zhu ◽  
Ting Xu ◽  
Yanjun Li

The latest results in Deep Neural Networks (DNNs) have greatly improved the accuracy and performance of a variety of intelligent applications. However, running such computation-intensive DNN-based applications on resource-constrained mobile devices leads to long latency and high energy consumption. The traditional approach is to run DNNs in the central cloud, but this requires large amounts of data to be transferred over the wireless network and likewise results in long latency. To solve this problem, offloading part of the DNN computation to edge clouds has been proposed, realizing collaborative execution between mobile devices and edge clouds. In addition, the mobility of mobile devices can easily cause computation offloading to fail. In this paper, we develop a mobility-included DNN partition offloading algorithm (MDPO) that adapts to user mobility. The objective of MDPO is to minimize the total latency of completing a DNN job while the mobile user is moving. The MDPO algorithm is suitable for DNNs with both chain and graph topologies. We evaluate the performance of MDPO against local-only and edge-only execution; experiments show that MDPO significantly reduces total latency, improves DNN performance, and adapts well to different network conditions.
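As a simplified illustration of the underlying trade-off (not the MDPO algorithm itself), the sketch below enumerates partition points of a chain-topology DNN and picks the split minimizing device latency plus transfer time plus edge latency; all profiling numbers are hypothetical.

```python
def best_partition(local_ms, edge_ms, input_kb, out_kb, bw_kb_per_ms):
    """local_ms[i]/edge_ms[i]: latency of layer i on the device / edge cloud.
    input_kb: raw input size; out_kb[i]: output size of layer i.
    Split at k: layers 0..k-1 run locally, layers k..n-1 run on the edge."""
    n = len(local_ms)
    best_k, best_latency = 0, float("inf")
    for k in range(n + 1):                 # k = 0: edge-only, k = n: local-only
        upload = input_kb if k == 0 else out_kb[k - 1]
        transfer = 0.0 if k == n else upload / bw_kb_per_ms
        total = sum(local_ms[:k]) + transfer + sum(edge_ms[k:])
        if total < best_latency:
            best_k, best_latency = k, total
    return best_k, best_latency

# Hypothetical 4-layer profile; under mobility, bw_kb_per_ms varies over time,
# which is what makes re-deciding the partition point worthwhile.
split, latency = best_partition(
    local_ms=[12.0, 30.0, 28.0, 6.0],
    edge_ms=[1.5, 4.0, 3.5, 0.8],
    input_kb=600.0,
    out_kb=[300.0, 80.0, 20.0, 1.0],
    bw_kb_per_ms=50.0,
)
print(f"split after layer {split}, total latency {latency:.1f} ms")
```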


2021 ◽  
Vol 20 (5s) ◽  
pp. 1-25
Author(s):  
Elbruz Ozen ◽  
Alex Orailoglu

As deep learning algorithms are widely adopted, an increasing number of them are positioned in embedded application domains with strict reliability constraints. The expenditure of significant resources to satisfy performance requirements in deep neural network accelerators has thinned out the margins for delivering safety in embedded deep learning applications, precluding the adoption of conventional fault tolerance methods. The potential of exploiting the inherent resilience characteristics of deep neural networks, however, remains largely unexplored, offering a promising low-cost path towards safety in embedded deep learning applications. This work demonstrates such exploitation by combining the reduction of the vulnerability surface through proper design of the quantization scheme with the shaping of each layer's parameter distribution through appropriate training methods, thus delivering deep neural networks of high resilience purely through algorithmic modifications. Unequaled error resilience characteristics can thus be injected into safety-critical deep learning applications, tolerating substantially elevated bit error rates at absolutely zero hardware, energy, and performance cost while improving the error-free model accuracy even further.
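Such resilience claims are typically evaluated by injecting random bit flips into quantized parameters. The following is a minimal sketch of a fault-injection harness for signed 8-bit weights; it illustrates the evaluation setting only and is not the authors' quantization scheme or training method.

```python
import numpy as np

def inject_bit_flips(weights_int8, bit_error_rate, rng):
    """Flip each bit of each int8 weight independently with probability
    bit_error_rate, mimicking random memory faults in an accelerator."""
    w = weights_int8.astype(np.uint8)          # reinterpret bits as unsigned
    for bit in range(8):
        flips = rng.random(w.shape) < bit_error_rate
        w[flips] ^= np.uint8(1 << bit)
    return w.astype(np.int8)                   # back to two's-complement view

rng = np.random.default_rng(0)
w = rng.integers(-128, 128, size=(256, 256), dtype=np.int8)
faulty = inject_bit_flips(w, bit_error_rate=1e-3, rng=rng)
print("fraction of weights corrupted:", np.mean(faulty != w))
```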


2021 ◽  
Author(s):  
Jason Munger ◽  
Carlos W. Morato

This project explores how raw image data obtained from AV cameras can provide a model with more spatial information than can be learned from simple RGB images alone. This paper leverages advances in deep neural networks to demonstrate steering angle prediction for autonomous vehicles through an end-to-end multi-channel CNN model using only the image data provided by an onboard camera. The image data is processed through existing neural networks to produce pixel segmentation and depth estimates, which are fed, along with the raw input image, into a new neural network to provide enhanced feature signals from the environment. Various input combinations of multi-channel CNNs are evaluated, and their effectiveness is compared to single-input CNNs using the individual data sources. The model with the most accurate steering predictions is identified, and its performance is compared to previous neural networks.
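A minimal sketch of the multi-channel input idea follows: RGB, a depth estimate, and a segmentation map are stacked along the channel dimension before a small steering-regression CNN. The PyTorch layout, channel counts, and layer sizes are illustrative assumptions, not the project's actual architecture.

```python
import torch
import torch.nn as nn

class MultiChannelSteeringNet(nn.Module):
    def __init__(self, in_channels=5):        # 3 RGB + 1 depth + 1 segmentation
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 24, 5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(48, 50),
                                  nn.ReLU(), nn.Linear(50, 1))

    def forward(self, rgb, depth, seg):
        x = torch.cat([rgb, depth, seg], dim=1)   # stack input channels
        return self.head(self.features(x))        # predicted steering angle

net = MultiChannelSteeringNet()
angle = net(torch.rand(1, 3, 66, 200),            # raw RGB frame
            torch.rand(1, 1, 66, 200),            # depth estimate
            torch.rand(1, 1, 66, 200))            # segmentation map
print(angle.shape)                                # torch.Size([1, 1])
```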


2021 ◽  
Vol 13 (2) ◽  
pp. 36-40
Author(s):  
A. Smorodin

The article investigates a modification of stochastic gradient descent (SGD) based on a previously developed theory for stabilizing cycles of discrete dynamical systems. The relation between cycle stabilization in discrete dynamical systems and the search for extremum points allows new control methods to be applied to accelerate gradient descent as it approaches local minima. Gradient descent is often used, alongside other iterative methods, in training deep neural networks. Two gradient methods, SGD and Adam, were used in comparative experiments, all conducted while solving a practical problem of tooth recognition in 2-D panoramic images. Network training showed that the new method outperforms plain SGD and, with suitable parameter choices, approaches the capabilities of Adam, a state-of-the-art method. This demonstrates the practical utility of control theory in the training of deep neural networks and the possibility of expanding its applicability when creating new algorithms in this important field.
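For reference, here are textbook forms of the two baseline update rules compared in the article, plain SGD and Adam; this is a generic sketch and does not include the authors' cycle-stabilizing control modification.

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    """Plain SGD: step against the (stochastic) gradient."""
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step; m, v are running moment estimates, t the step count."""
    m = b1 * m + (1 - b1) * grad               # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2          # second-moment estimate
    m_hat = m / (1 - b1 ** t)                  # bias corrections
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```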


Author(s):  
Shiva Prasad Kasiviswanathan ◽  
Nina Narodytska ◽  
Hongxia Jin

Deep neural networks are powerful learning models that achieve state-of-the-art performance on many computer vision, speech, and language processing tasks. In this paper, we study a fundamental question that arises when designing deep network architectures: given a target network architecture, can we design a 'smaller' network architecture that 'approximates' the operation of the target network? The question is, in part, motivated by the challenge of parameter reduction (compression) in modern deep neural networks, as the ever-increasing storage and memory requirements of these networks pose a problem in resource-constrained environments. In this work, we focus on deep convolutional neural network architectures and propose a novel randomized tensor sketching technique that we utilize to develop a unified framework for approximating the operation of both the convolutional and fully connected layers. By applying the sketching technique along different tensor dimensions, we design changes to the convolutional and fully connected layers that substantially reduce the number of effective parameters in a network. We show that the resulting smaller network can be trained directly and has a classification accuracy comparable to the original network.
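The flavor of the approach can be illustrated with a generic randomized sketch of a fully connected layer, where a fixed random projection reduces the number of effective parameters; this is a simplified stand-in under assumed dimensions, not the authors' exact tensor-sketching construction.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 512, 1024, 64                # layer is m x n; k << n sets compression

S = rng.standard_normal((k, n)) / np.sqrt(k)   # fixed random sketching matrix
U = rng.standard_normal((m, k)) * 0.01         # trainable factor: m*k parameters

x = rng.standard_normal(n)
y = U @ (S @ x)                        # sketched layer: O(k*n + m*k) multiply
print("effective parameters:", U.size, "vs dense:", m * n)
```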


2020 ◽  
Vol 29 (03n04) ◽  
pp. 2060002
Author(s):  
Jia Bi ◽  
Steve R. Gunn

Deep neural networks have become increasingly popular owing to their ability to solve very complex pattern recognition problems. However, they often require massive computational and memory resources, which is the main reason they are difficult to run efficiently, or at all, on embedded platforms. This work addresses the problem by reducing the computational and memory requirements of deep neural networks: it proposes a variance-reduced (VR) optimization method with regularization techniques that compresses the memory requirements of models while keeping training fast. It is shown theoretically and experimentally that sparsity-inducing regularization works effectively with VR-based optimization, where a hyper-parameter controls the behavior of the stochastic element in the optimizer for solving non-convex problems.
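A textbook combination that matches this description is SVRG with an L1 proximal step; the sketch below is an illustrative rendering on a toy least-squares problem, not the authors' exact optimizer, and the hyper-parameters are assumptions.

```python
import numpy as np

def prox_l1(w, thresh):
    """Soft-thresholding: proximal operator of the L1 (sparsity) penalty."""
    return np.sign(w) * np.maximum(np.abs(w) - thresh, 0.0)

def svrg_l1(grad_i, w0, n, lr=0.05, lam=1e-3, epochs=5, inner=100, seed=0):
    rng = np.random.default_rng(seed)
    w = w0.copy()
    for _ in range(epochs):
        w_snap = w.copy()
        full_grad = np.mean([grad_i(w_snap, i) for i in range(n)], axis=0)
        for _ in range(inner):
            i = rng.integers(n)
            # Variance-reduced gradient: stochastic gradient corrected by
            # the snapshot's stochastic and full gradients.
            g = grad_i(w, i) - grad_i(w_snap, i) + full_grad
            w = prox_l1(w - lr * g, lr * lam)  # gradient step, then prox
    return w

# Toy example: gradient of 0.5 * (a_i . w - b_i)^2 is (a_i . w - b_i) * a_i.
A = np.random.default_rng(1).standard_normal((200, 10))
b = A @ np.array([1.0, -2.0] + [0.0] * 8)      # sparse ground truth
w = svrg_l1(lambda w, i: (A[i] @ w - b[i]) * A[i], np.zeros(10), n=200)
print("recovered weights (rounded):", np.round(w, 2))
```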


2020 ◽  
Vol 34 (04) ◽  
pp. 5784-5791
Author(s):  
Sungho Shin ◽  
Jinhwan Park ◽  
Yoonho Boo ◽  
Wonyong Sung

Quantization of deep neural networks is essential for efficient implementation. Low-precision networks are typically designed to represent their original floating-point counterparts with high fidelity, and several elaborate quantization algorithms have been developed. We propose a novel training scheme for quantized neural networks that reaches flat minima in the loss surface with the aid of quantization noise. The proposed scheme alternates between high and low precision (high-low-high-low) during network training, and the learning rate is abruptly changed at each stage for coarse- or fine-tuning. With the proposed training technique, we show considerable performance improvements for convolutional neural networks compared to the previous fine-tuning based quantization scheme. We achieve state-of-the-art results for recurrent neural network based language modeling with 2-bit weights and activations.
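A minimal sketch of the alternating-precision schedule follows; low-precision stages use straight-through fake quantization, and each stage resets the learning rate. The PyTorch rendering, stage lengths, and the toy loss are illustrative assumptions, not the paper's training recipe.

```python
import torch

def fake_quantize(w, bits=2):
    """Uniform symmetric quantization with a straight-through estimator:
    the forward pass sees quantized weights, the backward pass is identity."""
    scale = w.abs().max() / (2 ** (bits - 1) - 0.5)
    q = torch.clamp(torch.round(w / scale),
                    -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return w + (q * scale - w).detach()

# Alternating (precision, learning rate) stages: high-low-high-low.
stages = [("high", 1e-2), ("low", 1e-2), ("high", 1e-3), ("low", 1e-3)]
w = torch.randn(8, requires_grad=True)
for precision, lr in stages:
    for _ in range(100):                       # training steps per stage
        w_eff = w if precision == "high" else fake_quantize(w, bits=2)
        loss = ((w_eff - 1.0) ** 2).sum()      # stand-in for the task loss
        loss.backward()
        with torch.no_grad():
            w -= lr * w.grad
            w.grad = None
```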


2019 ◽  
Vol 141 (8) ◽  
Author(s):  
Ali Madani ◽  
Ahmed Bakhaty ◽  
Jiwon Kim ◽  
Yara Mubarak ◽  
Mohammad R. K. Mofrad

Finite element and machine learning modeling are two predictive paradigms that have rarely been bridged. In this study, we develop a parametric model to generate arterial geometries and accumulate a database of 12,172 2D finite element simulations modeling the hyperelastic behavior and resulting stress distribution. The arterial wall composition mimics vessels in atherosclerosis, a complex cardiovascular disease and one of the leading causes of death globally. We formulate the training data to predict the maximum von Mises stress, which could indicate risk of plaque rupture. Trained deep learning models are able to predict the maximum von Mises stress within 9.86% error on a held-out test set. The deep neural networks outperform alternative prediction models, and performance scales with the amount of training data. Lastly, we examine the importance of contributing features for stress value and location prediction to gain intuition about the underlying process. Deep neural networks can thus capture the functional mapping described by the finite element method, which has far-reaching implications for real-time and multiscale prediction tasks in biomechanics.
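The surrogate-modeling setup can be illustrated with a small regression example mapping geometric and material parameters to a peak stress value; the features, data, and scikit-learn model here are synthetic stand-ins, not the study's models or database.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 6))   # e.g. wall thicknesses, plaque size, moduli
y = 10 + 50 * X[:, 0] ** 2 + 20 * X[:, 1] * X[:, 2] \
    + rng.normal(0, 0.5, 2000)    # toy stand-in for max von Mises stress

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
model.fit(X_tr, y_tr)
rel_err = np.abs(model.predict(X_te) - y_te) / y_te
print(f"median relative error: {np.median(rel_err):.1%}")
```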


2018 ◽  
Vol 28 (1) ◽  
pp. 34-39 ◽  
Author(s):  
Robert A. Jacobs ◽  
Christopher J. Bates

Although deep neural networks (DNNs) are state-of-the-art artificial intelligence systems, it is unclear what insights, if any, they provide about human intelligence. We address this issue in the domain of visual perception. After briefly describing DNNs, we provide an overview of recent results comparing human visual representations and performance with those of DNNs. In many cases, DNNs acquire visual representations and processing strategies that are very different from those used by people. We conjecture that there are at least two factors preventing them from serving as better psychological models. First, DNNs are currently trained with impoverished data, such as data lacking important visual cues to three-dimensional structure, data lacking multisensory statistical regularities, and data in which stimuli are unconnected to an observer’s actions and goals. Second, DNNs typically lack adaptations to capacity limits, such as attentional mechanisms, visual working memory, and compressed mental representations biased toward preserving task-relevant abstractions.

