Communication-Efficient Stochastic Gradient MCMC for Neural Networks

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33014173 ◽

2019 ◽

Vol 33 ◽

pp. 4173-4180 ◽

Cited By ~ 1

Author(s):

Chunyuan Li ◽

Changyou Chen ◽

Yunchen Pu ◽

Ricardo Henao ◽

Lawrence Carin

Keyword(s):

Neural Networks ◽

Probability Distributions ◽

Computational Cost ◽

Time Estimation ◽

Stochastic Gradient ◽

Communication Overhead ◽

Test Accuracy ◽

Training Time ◽

Learning Probability ◽

Policy Optimization

Learning probability distributions on the weights of neural networks has recently proven beneficial in many applications. Bayesian methods such as Stochastic Gradient Markov Chain Monte Carlo (SG-MCMC) offer an elegant framework to reason about model uncertainty in neural networks. However, these advantages usually come with a high computational cost. We propose accelerating SG-MCMC under the master-worker framework: workers asynchronously and in parallel share responsibility for gradient computations, while the master collects the final samples. To reduce communication overhead, two protocols (downpour and elastic) are developed to allow periodic interaction between the master and workers. We provide a theoretical analysis of the finite-time estimation consistency of posterior expectations, and establish connections to sample thinning. Our experiments on various neural networks demonstrate that the proposed algorithms can greatly reduce training time while achieving comparable (or better) test accuracy/log-likelihood levels, relative to traditional SG-MCMC. When applied to reinforcement learning, the approach naturally provides exploration for asynchronous policy optimization, with encouraging performance improvements.
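For orientation, a minimal sketch of the basic single-chain SG-MCMC update that such schemes parallelize, assuming stochastic gradient Langevin dynamics (SGLD, Welling and Teh 2011) on a hypothetical one-dimensional Gaussian-mean model. The downpour and elastic master-worker protocols are not reproduced. The hypothetical thin parameter keeps every thin-th sample after burn-in, illustrating the kind of sample thinning the abstract mentions.

```python
import math
import random


def grad_log_post(theta, batch, n_total, prior_var=10.0, noise_var=1.0):
    """Minibatch estimate of d/dtheta log p(theta | data).

    Hypothetical model: x_i ~ N(theta, noise_var), theta ~ N(0, prior_var).
    The likelihood term is rescaled by n_total / len(batch).
    """
    g_prior = -theta / prior_var
    g_lik = sum((x - theta) / noise_var for x in batch)
    return g_prior + (n_total / len(batch)) * g_lik


def sgld(data, n_iters=5000, step=1e-4, batch_size=10, burn_in=1000, thin=10, seed=0):
    """Run SGLD and return thinned samples collected after burn-in."""
    rng = random.Random(seed)
    theta = 0.0
    samples = []
    for t in range(n_iters):
        batch = rng.sample(data, batch_size)
        g = grad_log_post(theta, batch, len(data))
        # Langevin update: half a step along the stochastic gradient
        # plus Gaussian noise whose variance equals the step size.
        theta += 0.5 * step * g + rng.gauss(0.0, math.sqrt(step))
        # Keep one sample every `thin` iterations once burn-in is over.
        if t >= burn_in and (t - burn_in) % thin == 0:
            samples.append(theta)
    return samples


rng = random.Random(1)
data = [rng.gauss(2.0, 1.0) for _ in range(1000)]
samples = sgld(data)
print(len(samples), sum(samples) / len(samples))  # 400 samples; mean near the data mean
```

In a master-worker variant, the gradient evaluations inside the loop are what workers would compute in parallel; how stale gradients are handled is exactly what the paper's protocols address.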

Evaluation of Mixed Deep Neural Networks for Reverberant Speech Enhancement

Biomimetics ◽

10.3390/biomimetics5010001 ◽

2019 ◽

Vol 5 (1) ◽

pp. 1 ◽

Cited By ~ 1

Author(s):

Michelle Gutiérrez-Muñoz ◽

Astryd González-Salazar ◽

Marvin Coto-Jiménez

Keyword(s):

Neural Networks ◽

Short Term Memory ◽

Computational Cost ◽

Real Life ◽

Fixed Number ◽

Training Procedure ◽

Statistical Validation ◽

Significant Drop ◽

Training Time ◽

Important Solution

Speech signals are degraded in real-life environments by background noise and other factors. Processing such signals in voice recognition and voice analysis systems presents important challenges. One of the adverse conditions that is difficult for those systems to handle is reverberation, produced by sound-wave reflections that travel from the source to the microphone in multiple directions. To enhance signals in such adverse conditions, several deep learning-based methods have been proposed and proven effective. Recently, recurrent neural networks, especially those with long short-term memory (LSTM), have shown surprising results in tasks involving time-dependent signal processing, such as speech. One of the most challenging aspects of LSTM networks is the high computational cost of training, which has limited extensive experimentation in several cases. In this work, we evaluate hybrid neural network models that learn different reverberation conditions without any prior information. The results show that some combinations of LSTM and perceptron layers produce good results compared with pure LSTM networks, given a fixed number of layers. The evaluation was based on quality measurements of the signal’s spectrum, the training time of the networks, and statistical validation of the results. In total, 120 artificial neural networks of eight different types were trained and compared. The results support the view that hybrid networks are an important solution for speech signal enhancement, since training time is reduced by about 30% in processes that can normally take several days or weeks, depending on the amount of data. The hybrid networks also offer efficiency advantages without a significant drop in quality.

Evaluation of Mixed Deep Neural Networks for Reverberant Speech Enhancement

10.20944/preprints201910.0376.v1 ◽

2019 ◽

Author(s):

Michelle Gutiérrez-Muñoz ◽

Astryd González-Salazar ◽

Marvin Coto-Jiménez

Keyword(s):

Neural Networks ◽

Short Term Memory ◽

Computational Cost ◽

Real Life ◽

Fixed Number ◽

Training Procedure ◽

Statistical Validation ◽

Training Time ◽

Important Solution ◽

Reverberant Speech

Speech signals are degraded in real-life environments as a product of background noise or other factors. The processing of such signals for voice recognition and voice analysis systems presents important challenges. One of the conditions that make adverse quality difficult to handle in those systems is reverberation, produced by sound wave reflections that travel from the source to the microphone in multiple directions. To enhance signals in such adverse conditions, several deep learning-based methods have been proposed and proven to be effective. Recently, recurrent neural networks, especially those with long short-term memory (LSTM), have presented surprising results in tasks related to time-dependent processing of signals, such as speech. One of the most challenging aspects of LSTM networks is the high computational cost of the training procedure, which has limited extensive experimentation in several cases. In this work, we present a proposal to evaluate hybrid models of neural networks that learn different reverberation conditions without any prior information. The results show that some combinations of LSTM and perceptron layers produce good results in comparison to those from pure LSTM networks, given a fixed number of layers. The evaluation has been based on quality measurements of the signal's spectrum, the training time of the networks, and statistical validation of the results. The results help to affirm that hybrid networks represent an important solution for speech signal enhancement, with advantages in efficiency but without a significant drop in quality.

Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33015693 ◽

2019 ◽

Vol 33 ◽

pp. 5693-5700 ◽

Cited By ~ 16

Author(s):

Hao Yu ◽

Sen Yang ◽

Shenghuo Zhu

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Model Averaging ◽

Communication Overhead ◽

Single Server ◽

Training Time ◽

Distributed Training ◽

Speed Up ◽

Experimental Works ◽

Single Worker

In distributed training of deep neural networks, parallel mini-batch SGD is widely used to speed up training with multiple workers. The workers sample local stochastic gradients in parallel, a single server aggregates all gradients to obtain their average, and each worker's local model is updated with an SGD step using the averaged gradient. Ideally, parallel mini-batch SGD can achieve a linear speed-up in training time (with respect to the number of workers) compared with SGD on a single worker. In practice, however, this linear scalability is significantly limited by the growing demand for gradient communication as more workers are involved. Model averaging, which periodically averages individual models trained on parallel workers, has been another common practice for distributed training of deep neural networks since Zinkevich et al. (2010) and McDonald, Hall, and Mann (2010). Compared with parallel mini-batch SGD, model averaging significantly reduces communication overhead. Impressively, a large body of experimental work has verified that model averaging can still achieve a good speed-up in training time as long as the averaging interval is carefully controlled. However, why such a simple heuristic works so well has remained a theoretical mystery. This paper provides a thorough and rigorous theoretical study of why model averaging can work as well as parallel mini-batch SGD with significantly less communication overhead.
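The periodic model averaging described above can be sketched as follows, assuming hypothetical per-worker quadratic losses f_k(w) = 0.5 (w - t_k)^2 with noisy gradients. Each worker runs interval local SGD steps, then all workers restart from the average model. This illustrates generic model averaging (often called local SGD), not the paper's convergence analysis; all names and parameters are illustrative.

```python
import random


def local_sgd(targets, n_rounds=50, interval=5, lr=0.1, noise=0.1, seed=0):
    """Periodic model averaging across len(targets) simulated workers.

    Worker k minimizes the hypothetical loss 0.5 * (w - targets[k]) ** 2;
    the average of these losses is minimized at the mean of the targets.
    Returns the final averaged model and the number of averaging rounds.
    """
    rng = random.Random(seed)
    n_workers = len(targets)
    avg = 0.0
    comm_rounds = 0
    for _ in range(n_rounds):
        # Every worker restarts from the current averaged model.
        local = [avg] * n_workers
        for _ in range(interval):
            for k in range(n_workers):
                grad = (local[k] - targets[k]) + rng.gauss(0.0, noise)
                local[k] -= lr * grad
        # One communication round: average the local models.
        avg = sum(local) / n_workers
        comm_rounds += 1
    return avg, comm_rounds


w, rounds = local_sgd([1.0, 2.0, 3.0, 6.0])
print(w, rounds)  # w near 3.0 after 50 averaging rounds
```

Here each worker takes 250 local steps but communicates only 50 times, whereas parallel mini-batch SGD would communicate at every step; how large the interval can grow without hurting convergence is the question the paper studies.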

Apple quality identification and classification by image processing based on convolutional neural networks

Scientific Reports ◽

10.1038/s41598-021-96103-2 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Yanfei Li ◽

Xianying Feng ◽

Yandong Liu ◽

Xingchang Han

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Support Vector ◽

Svm Classifier ◽

Test Accuracy ◽

Training Time ◽

Proposed Model ◽

Specific Complex ◽

Occurrence Matrix ◽

Apple Quality

This work studied apple quality identification and classification from real images containing complicated disturbance information (backgrounds similar to the surface of the apples). This paper proposed a novel model based on convolutional neural networks (CNN) aimed at accurate and fast grading of apple quality. The proposed model captured specific, complex, and useful image characteristics for detection and classification. Compared with existing methods, the proposed model could better learn high-order features of two adjacent layers that were not in the same channel but were closely related. The proposed model was trained and validated, with best training and validation accuracies of 99% and 98.98% at the 2590th and 3000th steps, respectively. The overall accuracy of the proposed model, tested on an independent dataset of 300 apples, was 95.33%. The results showed that the training accuracy, overall test accuracy and training time of the proposed model were better than those of the Google Inception v3 model and of a traditional image processing method based on histogram of oriented gradient (HOG) and gray level co-occurrence matrix (GLCM) feature merging with a support vector machine (SVM) classifier. The proposed model has great potential for apple quality detection and classification.

Malware Classification Based on Shallow Neural Network

Future Internet ◽

10.3390/fi12120219 ◽

2020 ◽

Vol 12 (12) ◽

pp. 219

Author(s):

Pin Yang ◽

Huiyu Zhou ◽

Yue Zhu ◽

Liang Liu ◽

Lei Zhang

Keyword(s):

Neural Network ◽

Neural Networks ◽

Computational Cost ◽

Malicious Code ◽

Classification Model ◽

Evolutionary Trend ◽

Training Time ◽

Binary File ◽

Malware Classification ◽

N Gram

The emergence of large amounts of new malicious code poses a serious threat to network security, and most of it consists of derivative versions of existing malicious code. Classifying malicious code helps analyze the evolutionary trends of malware families and trace the sources of cybercrime. Existing malware classification methods emphasize the depth of the neural network, which leads to long training times and large computational cost. In this work, we propose the shallow neural network-based malware classifier (SNNMAC), a malware classification model based on shallow neural networks and static analysis. Our approach bridges the gap between the precise but slow methods and the fast but less precise methods in existing works. For each sample, we first use a decompiler to obtain the opcode sequence of its binary file and generate n-grams from that sequence. An improved n-gram algorithm based on control transfer instructions is designed to reduce the n-gram dataset. Then, SNNMAC uses a shallow neural network, which replaces the fully connected layer and softmax with an average pooling layer and hierarchical softmax, to learn from the dataset and perform classification. We perform experiments on the Microsoft malware dataset. The evaluation shows that SNNMAC outperforms most related works, with 99.21% classification precision, and reduces training time by more than half compared with methods using DNNs (deep neural networks).
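The first feature-extraction step, counting n-grams in an opcode sequence, can be sketched as below. The opcode list is a hypothetical disassembly output, and the paper's control-transfer-based reduction of the n-gram set is not reproduced; this only counts contiguous n-grams.

```python
from collections import Counter


def opcode_ngrams(opcodes, n=3):
    """Count contiguous n-grams (as tuples) in a list of opcode mnemonics."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence shorter than n yields an empty range and an empty Counter.
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))


ops = ["push", "mov", "call", "mov", "call"]
counts = opcode_ngrams(ops, n=2)
print(counts[("mov", "call")])  # 2
```

The resulting counts (or hashed versions of them) would form the per-sample feature representation that a classifier learns from.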

Machine learning and quantum devices

SciPost Physics Lecture Notes ◽

10.21468/scipostphyslectnotes.29 ◽

2021 ◽

Author(s):

Florian Marquardt

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Quantum Physics ◽

Quantum Information Processing ◽

Control Strategies ◽

Probability Distributions ◽

Boltzmann Machines ◽

Convolutional Networks ◽

Quantum Domain ◽

Learning Probability

These brief lecture notes cover the basics of neural networks and deep learning as well as their applications in the quantum domain, for physicists without prior knowledge. In the first part, we describe training using backpropagation, image classification, convolutional networks and autoencoders. The second part is about advanced techniques like reinforcement learning (for discovering control strategies), recurrent neural networks (for analyzing time traces), and Boltzmann machines (for learning probability distributions). In the third lecture, we discuss first recent applications to quantum physics, with an emphasis on quantum information processing machines. Finally, the fourth lecture is devoted to the promise of using quantum effects to accelerate machine learning.

Implications of Pooling Strategies in Convolutional Neural Networks: A Deep Insight

Foundations of Computing and Decision Sciences ◽

10.2478/fcds-2019-0016 ◽

2019 ◽

Vol 44 (3) ◽

pp. 303-330 ◽

Cited By ~ 3

Author(s):

Shallu Sharma ◽

Rajesh Mehra

Keyword(s):

Neural Networks ◽

Computer Vision ◽

Convolutional Neural Networks ◽

Network Architecture ◽

Computational Cost ◽

Activation Function ◽

Training Time ◽

Pooling Strategies ◽

Deep Cnn ◽

And Training

Convolutional neural networks (CNNs) are a contemporary technique for computer vision applications, in which pooling is an integral part of the deep CNN. Pooling provides the ability to learn invariant features and also acts as a regularizer that further reduces overfitting. Additionally, pooling techniques significantly reduce the computational cost and training time of networks, which are equally important to consider. Here, the performance of pooling strategies on different datasets is analyzed and discussed qualitatively. This study presents a detailed review of conventional and recent pooling strategies, which should help apprise readers of the upsides and downsides of each strategy. We have also identified four fundamental factors, namely network architecture, activation function, overlapping and regularization approaches, which strongly affect the performance of pooling operations. We believe this work will help extend the understanding of the significance of CNNs, along with pooling regimes, for solving computer vision problems.
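As a reference point for the strategies such a review compares, a minimal sketch of the two conventional operations, max and average pooling, over a single 2-D feature map with no padding. The function name and parameters are illustrative; choosing size > stride gives overlapping pooling, one of the factors listed above.

```python
def pool2d(x, size=2, stride=2, mode="max"):
    """Max or average pooling over a 2-D list of numbers, without padding.

    Windows are size x size and move by `stride`; size > stride overlaps them.
    """
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    rows, cols = len(x), len(x[0])
    out = []
    for r in range(0, rows - size + 1, stride):
        out_row = []
        for c in range(0, cols - size + 1, stride):
            window = [x[r + i][c + j] for i in range(size) for j in range(size)]
            out_row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(out_row)
    return out


fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
print(pool2d(fmap))                    # [[6, 8], [14, 16]]
print(pool2d(fmap, mode="avg"))        # [[3.5, 5.5], [11.5, 13.5]]
print(pool2d(fmap, size=3, stride=2))  # [[11]]
```

Both variants shrink the 4x4 map to 2x2, which is where the reduction in computational cost comes from; they differ in whether the strongest or the mean activation survives.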

An Innovative Multi-Model Neural Network Approach for Feature Selection in Emotion Recognition Using Deep Feature Clustering

Sensors ◽

10.3390/s20133765 ◽

2020 ◽

Vol 20 (13) ◽

pp. 3765 ◽

Cited By ~ 2

Author(s):

Muhammad Adeel Asghar ◽

Muhammad Jamil Khan ◽

Muhammad Rizwan ◽

Raja Majid Mehmood ◽

Sun-Hee Kim

Keyword(s):

Neural Networks ◽

Feature Selection ◽

Emotion Recognition ◽

Computational Cost ◽

Emotional Awareness ◽

Eeg Signal ◽

Neural Network Approach ◽

Feature Clustering ◽

Training Time ◽

Deep Feature

Emotional awareness perception is a rapidly growing field that allows for more natural interactions between people and machines. Electroencephalography (EEG) has emerged as a convenient way to measure and track a user’s emotional state. The non-linear characteristics of the EEG signal produce a high-dimensional feature vector, resulting in high computational cost. In this paper, characteristics of multiple neural networks are combined using Deep Feature Clustering (DFC) to select high-quality attributes, as opposed to traditional feature selection methods. The DFC method shortens the training time of the network by omitting unusable attributes. First, Empirical Mode Decomposition (EMD) is applied to decompose the raw EEG signal into a series of frequency components. The spatiotemporal component of the decomposed EEG signal is expressed as a two-dimensional spectrogram before feature extraction using the Analytic Wavelet Transform (AWT). Four pre-trained Deep Neural Networks (DNNs) are used to extract deep features. Dimensionality reduction and feature selection are achieved using differential entropy-based EEG channel selection and the DFC technique, which computes a range of vocabularies using k-means clustering. A histogram feature is then computed from a series of visual vocabulary items. Classification results on the SEED, DEAP and MAHNOB datasets, combined with the capabilities of DFC, show that the proposed method improves emotion recognition performance with short processing time and is more competitive than the latest emotion recognition methods.
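The vocabulary-and-histogram step can be sketched generically as a bag-of-visual-words: k-means builds a vocabulary of centroids from feature vectors, and each sample is summarized by the normalized histogram of its features' nearest centroids. The 2-D toy features, the naive first-k initialization and all function names are assumptions for illustration; the paper's deep features, channel selection and choice of k are not reproduced.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda k: sq_dist(vec, centroids[k]))


def kmeans(points, k, n_iters=20):
    """Plain Lloyd's k-means, naively initialized with the first k points."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    centroids = [list(p) for p in points[:k]]
    for _ in range(n_iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def bovw_histogram(features, centroids):
    """Normalized histogram of nearest-centroid ("visual word") assignments."""
    counts = [0] * len(centroids)
    for f in features:
        counts[nearest(f, centroids)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


vocab = kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)
print(vocab)  # [[0.0, 0.5], [10.0, 10.5]]
print(bovw_histogram([[0, 0.2], [9, 9], [10, 12], [11, 10]], vocab))  # [0.25, 0.75]
```

The histogram has one entry per vocabulary word regardless of how many features a sample has, which is what makes it usable as a fixed-length input to a classifier.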

Damped Newton Stochastic Gradient Descent Method for Neural Networks Training

Mathematics ◽

10.3390/math9131533 ◽

2021 ◽

Vol 9 (13) ◽

pp. 1533

Author(s):

Jingcheng Zhou ◽

Wei Wei ◽

Ruizhi Zhang ◽

Zhiming Zheng

Keyword(s):

Neural Networks ◽

Gradient Descent ◽

Hessian Matrix ◽

Second Order ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Gradient Descent Method ◽

Classification Problems ◽

Training Time ◽

Second Order Methods

First-order methods such as stochastic gradient descent (SGD) have recently become popular optimization methods for training deep neural networks (DNNs) with good generalization; however, they require long training times. Second-order methods, which can reduce training time, are rarely used because of the prohibitive computational cost of obtaining second-order information. Many works have therefore approximated the Hessian matrix to cut computing cost, but the approximate Hessian can deviate substantially from the true one. In this paper, we explore the convexity of the Hessian matrix of a subset of the parameters and propose the damped Newton stochastic gradient descent (DN-SGD) method and the stochastic gradient descent damped Newton (SGD-DN) method to train DNNs for regression problems with mean square error (MSE) and classification problems with cross-entropy loss (CEL). In contrast to other second-order methods that estimate the Hessian matrix of all parameters, our methods compute the Hessian accurately for only a small part of the parameters, which greatly reduces the computational cost and makes the learning process converge much faster and more accurately than SGD and Adagrad. Several numerical experiments on real datasets were performed to verify the effectiveness of our methods for regression and classification problems.
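A minimal sketch of a damped Newton update on a small parameter subset, assuming Levenberg-style damping (adding damping times the identity to the Hessian before solving) and a two-parameter subset so the linear solve can use Cramer's rule. The quadratic loss and all names are hypothetical; how DN-SGD selects the subset and interleaves these steps with SGD on the remaining parameters is not reproduced.

```python
def damped_newton_step(theta, grad, hess, damping=1e-3):
    """One update theta - (H + damping * I)^-1 grad for two parameters."""
    a, b = hess[0][0] + damping, hess[0][1]
    c, d = hess[1][0], hess[1][1] + damping
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("damped Hessian is (numerically) singular")
    # Solve (H + damping * I) p = grad by Cramer's rule.
    p0 = (d * grad[0] - b * grad[1]) / det
    p1 = (a * grad[1] - c * grad[0]) / det
    return [theta[0] - p0, theta[1] - p1]


# Hypothetical quadratic loss 0.5 * theta^T A theta - rhs^T theta:
# gradient A theta - rhs, constant Hessian A, minimizer A^-1 rhs = [0.2, 0.4].
A = [[3.0, 1.0], [1.0, 2.0]]
rhs = [1.0, 1.0]


def grad_fn(theta):
    return [A[0][0] * theta[0] + A[0][1] * theta[1] - rhs[0],
            A[1][0] * theta[0] + A[1][1] * theta[1] - rhs[1]]


theta = [0.0, 0.0]
theta = damped_newton_step(theta, grad_fn(theta), A, damping=0.0)
print(theta)  # approximately [0.2, 0.4]
```

With zero damping a single step reaches the minimizer of a quadratic; a positive damping shortens the step toward a scaled gradient step, which guards against ill-conditioned Hessians.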

Remaining Useful Life Estimation for Engineered Systems Operating under Uncertainty with Causal GraphNets

Sensors ◽

10.3390/s21196325 ◽

2021 ◽

Vol 21 (19) ◽

pp. 6325

Author(s):

Charilaos Mylonas ◽

Eleni Chatzi

Keyword(s):

Neural Networks ◽

Message Passing ◽

Probability Distributions ◽

Causal Structure ◽

Recurrent Network ◽

Remaining Useful Life ◽

Novel Approach ◽

Useful Life ◽

Engineered Systems ◽

Learning Probability

In this work, a novel approach, termed GNN-tCNN, is presented for the construction and training of Remaining Useful Life (RUL) models. The method exploits Graph Neural Networks (GNNs) and deals with the problem of efficiently learning from time series with non-equidistant observations, which may span multiple temporal scales. The efficacy of the method is demonstrated on a simulated stochastic degradation dataset and on a real-world accelerated life testing dataset for ball-bearings. The proposed method learns a model that describes the evolution of the system implicitly rather than at the raw observation level and is based on message-passing neural networks, which encode the irregularly sampled causal structure. The proposed approach is compared to a recurrent network with a temporal convolutional feature extractor head (LSTM-tCNN), which forms a viable alternative for the problem considered. Finally, by taking advantage of recent advances in the computation of reparametrization gradients for learning probability distributions, a simple, yet efficient, technique is employed for representing prediction uncertainty as a gamma distribution over RUL predictions.
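One way a predicted gamma distribution over RUL could be scored is by its negative log-likelihood, sketched below under the assumption that a model outputs a positive shape and rate for each prediction. This is an illustrative loss, not the paper's stated objective; its parameterization and use of reparametrization gradients are not reproduced.

```python
import math


def gamma_nll(y, shape, rate):
    """Negative log-likelihood of a positive target y under Gamma(shape, rate).

    log p(y) = shape*log(rate) - lgamma(shape) + (shape - 1)*log(y) - rate*y
    """
    if y <= 0 or shape <= 0 or rate <= 0:
        raise ValueError("y, shape and rate must all be positive")
    log_p = (shape * math.log(rate) - math.lgamma(shape)
             + (shape - 1.0) * math.log(y) - rate * y)
    return -log_p


def gamma_mean_std(shape, rate):
    """Mean and standard deviation of Gamma(shape, rate): a point RUL estimate and its spread."""
    return shape / rate, math.sqrt(shape) / rate


print(gamma_nll(2.0, 1.0, 1.0))  # 2.0 (shape 1 reduces to an exponential)
print(gamma_mean_std(4.0, 2.0))  # (2.0, 1.0)
```

A gamma distribution suits RUL because it has positive support, so the predicted uncertainty never assigns probability to negative remaining life.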