scholarly journals The HSIC Bottleneck: Deep Learning without Back-Propagation

2020 ◽  
Vol 34 (04) ◽  
pp. 5085-5092 ◽  
Author(s):  
Wan-Duo Kurt Ma ◽  
J. P. Lewis ◽  
W. Bastiaan Kleijn

We introduce the HSIC (Hilbert-Schmidt independence criterion) bottleneck for training deep neural networks. The HSIC bottleneck is an alternative to the conventional cross-entropy loss and backpropagation that has a number of distinct advantages. It mitigates exploding and vanishing gradients, resulting in the ability to learn very deep networks without skip connections. There is no requirement for symmetric feedback or update locking. We find that the HSIC bottleneck provides performance on MNIST/FashionMNIST/CIFAR10 classification comparable to backpropagation with a cross-entropy target, even when the system is not encouraged to make the output resemble the classification labels. Appending a single layer trained with SGD (without backpropagation) to reformat the information further improves performance.

2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Tal Linzen ◽  
Marco Baroni

Modern deep neural networks achieve impressive performance in engineering applications that require extensive linguistic skills, such as machine translation. This success has sparked interest in probing whether these models are inducing human-like grammatical knowledge from the raw data they are exposed to and, consequently, whether they can shed new light on long-standing debates concerning the innate structure necessary for language acquisition. In this article, we survey representative studies of the syntactic abilities of deep networks and discuss the broader implications that this work has for theoretical linguistics. Expected final online publication date for the Annual Review of Linguistics, Volume 7 is January 14, 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.


2020 ◽  
Vol 34 (04) ◽  
pp. 4884-4891
Author(s):  
Qingliang Liu ◽  
Jinmei Lai

Training deep neural networks is inherently subject to the predefined and fixed loss functions during optimizing. To improve learning efficiency, we develop Stochastic Loss Function (SLF) to dynamically and automatically generating appropriate gradients to train deep networks in the same round of back-propagation, while maintaining the completeness and differentiability of the training pipeline. In SLF, a generic loss function is formulated as a joint optimization problem of network weights and loss parameters. In order to guarantee the requisite efficiency, gradients with the respect to the generic differentiable loss are leveraged for selecting loss function and optimizing network weights. Extensive experiments on a variety of popular datasets strongly demonstrate that SLF is capable of obtaining appropriate gradients at different stages during training, and can significantly improve the performance of various deep models on real world tasks including classification, clustering, regression, neural machine translation, and objection detection.


2021 ◽  
pp. 1-47
Author(s):  
Yeonjong Shin

Deep neural networks have been used in various machine learning applications and achieved tremendous empirical successes. However, training deep neural networks is a challenging task. Many alternatives have been proposed in place of end-to-end back-propagation. Layer-wise training is one of them, which trains a single layer at a time, rather than trains the whole layers simultaneously. In this paper, we study a layer-wise training using a block coordinate gradient descent (BCGD) for deep linear networks. We establish a general convergence analysis of BCGD and found the optimal learning rate, which results in the fastest decrease in the loss. We identify the effects of depth, width, and initialization. When the orthogonal-like initialization is employed, we show that the width of intermediate layers plays no role in gradient-based training beyond a certain threshold. Besides, we found that the use of deep networks could drastically accelerate convergence when it is compared to those of a depth 1 network, even when the computational cost is considered. Numerical examples are provided to justify our theoretical findings and demonstrate the performance of layer-wise training by BCGD.


Informatics ◽  
2021 ◽  
Vol 8 (4) ◽  
pp. 77
Author(s):  
Ali Alqahtani ◽  
Xianghua Xie ◽  
Mark W. Jones

Deep networks often possess a vast number of parameters, and their significant redundancy in parameterization has become a widely-recognized property. This presents significant challenges and restricts many deep learning applications, making the focus on reducing the complexity of models while maintaining their powerful performance. In this paper, we present an overview of popular methods and review recent works on compressing and accelerating deep neural networks. We consider not only pruning methods but also quantization methods, and low-rank factorization methods. This review also intends to clarify these major concepts, and highlights their characteristics, advantages, and shortcomings.


2021 ◽  
pp. 78-89
Author(s):  
Panle Li ◽  
Xiaohui He ◽  
Dingjun Song ◽  
Zihao Ding ◽  
Mengjia Qiao ◽  
...  

2019 ◽  
Vol 2019 ◽  
pp. 1-8
Author(s):  
Danzi Wu ◽  
Xue Han ◽  
Guan Wang ◽  
Yu Sun ◽  
Haiyan Zhang ◽  
...  

Plant identification is a fine-grained classification task which aims to identify the family, genus, and species according to plant appearance features. Inspired by the hierarchical structure of taxonomic tree, the taxonomic loss was proposed, which could encode the hierarchical relationships among multilevel labels into the deep learning objective function by simple group and sum operation. By training various neural networks on PlantCLEF 2015 and PlantCLEF 2017 datasets, the experimental results demonstrated that the proposed loss function was easy to implement and outperformed the most commonly adopted cross-entropy loss. Eight neural networks were trained, respectively, by two different loss functions on PlantCLEF 2015 dataset, and the models trained by taxonomic loss led to significant performance improvements. On PlantCLEF 2017 dataset with 10,000 species, the SENet-154 model trained by taxonomic loss achieved the accuracies of 84.07%, 79.97%, and 73.61% at family, genus and species levels, which improved those of model trained by cross-entropy loss by 2.23%, 1.34%, and 1.08%, respectively. The taxonomic loss could further facilitate the fine-grained classification task with hierarchical labels.


Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1579
Author(s):  
Dongqi Wang ◽  
Qinghua Meng ◽  
Dongming Chen ◽  
Hupo Zhang ◽  
Lisheng Xu

Automatic detection of arrhythmia is of great significance for early prevention and diagnosis of cardiovascular disease. Traditional feature engineering methods based on expert knowledge lack multidimensional and multi-view information abstraction and data representation ability, so the traditional research on pattern recognition of arrhythmia detection cannot achieve satisfactory results. Recently, with the increase of deep learning technology, automatic feature extraction of ECG data based on deep neural networks has been widely discussed. In order to utilize the complementary strength between different schemes, in this paper, we propose an arrhythmia detection method based on the multi-resolution representation (MRR) of ECG signals. This method utilizes four different up to date deep neural networks as four channel models for ECG vector representations learning. The deep learning based representations, together with hand-crafted features of ECG, forms the MRR, which is the input of the downstream classification strategy. The experimental results of big ECG dataset multi-label classification confirm that the F1 score of the proposed method is 0.9238, which is 1.31%, 0.62%, 1.18% and 0.6% higher than that of each channel model. From the perspective of architecture, this proposed method is highly scalable and can be employed as an example for arrhythmia recognition.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Florian Stelzer ◽  
André Röhm ◽  
Raul Vicente ◽  
Ingo Fischer ◽  
Serhiy Yanchuk

AbstractDeep neural networks are among the most widely applied machine learning tools showing outstanding performance in a broad range of tasks. We present a method for folding a deep neural network of arbitrary size into a single neuron with multiple time-delayed feedback loops. This single-neuron deep neural network comprises only a single nonlinearity and appropriately adjusted modulations of the feedback signals. The network states emerge in time as a temporal unfolding of the neuron’s dynamics. By adjusting the feedback-modulation within the loops, we adapt the network’s connection weights. These connection weights are determined via a back-propagation algorithm, where both the delay-induced and local network connections must be taken into account. Our approach can fully represent standard Deep Neural Networks (DNN), encompasses sparse DNNs, and extends the DNN concept toward dynamical systems implementations. The new method, which we call Folded-in-time DNN (Fit-DNN), exhibits promising performance in a set of benchmark tasks.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Dipendra Jha ◽  
Vishu Gupta ◽  
Logan Ward ◽  
Zijiang Yang ◽  
Christopher Wolverton ◽  
...  

AbstractThe application of machine learning (ML) techniques in materials science has attracted significant attention in recent years, due to their impressive ability to efficiently extract data-driven linkages from various input materials representations to their output properties. While the application of traditional ML techniques has become quite ubiquitous, there have been limited applications of more advanced deep learning (DL) techniques, primarily because big materials datasets are relatively rare. Given the demonstrated potential and advantages of DL and the increasing availability of big materials datasets, it is attractive to go for deeper neural networks in a bid to boost model performance, but in reality, it leads to performance degradation due to the vanishing gradient problem. In this paper, we address the question of how to enable deeper learning for cases where big materials data is available. Here, we present a general deep learning framework based on Individual Residual learning (IRNet) composed of very deep neural networks that can work with any vector-based materials representation as input to build accurate property prediction models. We find that the proposed IRNet models can not only successfully alleviate the vanishing gradient problem and enable deeper learning, but also lead to significantly (up to 47%) better model accuracy as compared to plain deep neural networks and traditional ML techniques for a given input materials representation in the presence of big data.


Sign in / Sign up

Export Citation Format

Share Document