The HSIC Bottleneck: Deep Learning without Back-Propagation

Wan-Duo Kurt Ma; J. P. Lewis; W. Bastiaan Kleijn

doi:10.1609/aaai.v34i04.5950

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5950 ◽

2020 ◽

Vol 34 (04) ◽

pp. 5085-5092 ◽

Author(s):

Wan-Duo Kurt Ma ◽

J. P. Lewis ◽

W. Bastiaan Kleijn

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Deep Neural Networks ◽

Back Propagation ◽

Single Layer ◽

Cross Entropy ◽

Entropy Loss ◽

Deep Networks ◽

Independence Criterion

We introduce the HSIC (Hilbert-Schmidt independence criterion) bottleneck for training deep neural networks. The HSIC bottleneck is an alternative to the conventional cross-entropy loss and backpropagation that has a number of distinct advantages. It mitigates exploding and vanishing gradients, resulting in the ability to learn very deep networks without skip connections. There is no requirement for symmetric feedback or update locking. We find that the HSIC bottleneck provides performance on MNIST/FashionMNIST/CIFAR10 classification comparable to backpropagation with a cross-entropy target, even when the system is not encouraged to make the output resemble the classification labels. Appending a single layer trained with SGD (without backpropagation) to reformat the information further improves performance.
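
The abstract names the Hilbert-Schmidt independence criterion but does not state how it is computed or how it enters the training objective. As a minimal sketch only, the following uses the standard biased empirical HSIC estimator, trace(K H L H) / (m - 1)^2, with Gaussian kernels. The kernel choice, the bandwidth `sigma`, the toy data, and all function names are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def gaussian_kernel(x, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of x."""
    sq_norms = np.sum(x * x, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * x @ x.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC between paired samples x and y (one sample per row)."""
    m = x.shape[0]
    k = gaussian_kernel(x, sigma)
    l = gaussian_kernel(y, sigma)
    h = np.eye(m) - np.ones((m, m)) / m  # centering matrix
    return float(np.trace(k @ h @ l @ h)) / (m - 1) ** 2


# Toy data (assumed): y_dep depends nonlinearly on x, y_ind does not.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y_dep = x ** 2 + 0.1 * rng.normal(size=(200, 1))
y_ind = rng.normal(size=(200, 1))

hsic_dependent = hsic(x, y_dep)
hsic_independent = hsic(x, y_ind)
print(hsic_dependent, hsic_independent)
```

Because HSIC is built from kernel matrices, it can pick up nonlinear dependence such as the quadratic relationship above, which a plain correlation coefficient would largely miss.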

Syntactic Structure from Deep Learning

Annual Review of Linguistics ◽

10.1146/annurev-linguistics-032020-051035 ◽

2020 ◽

Vol 7 (1) ◽

Author(s):

Tal Linzen ◽

Marco Baroni

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Machine Translation ◽

Online Publication ◽

Deep Neural Networks ◽

Syntactic Structure ◽

Annual Review ◽

Publication Date ◽

Grammatical Knowledge ◽

Deep Networks

Modern deep neural networks achieve impressive performance in engineering applications that require extensive linguistic skills, such as machine translation. This success has sparked interest in probing whether these models are inducing human-like grammatical knowledge from the raw data they are exposed to and, consequently, whether they can shed new light on long-standing debates concerning the innate structure necessary for language acquisition. In this article, we survey representative studies of the syntactic abilities of deep networks and discuss the broader implications that this work has for theoretical linguistics. Expected final online publication date for the Annual Review of Linguistics, Volume 7 is January 14, 2021. Please see http://www.annualreviews.org/page/journal/pubdates for revised estimates.

Stochastic Loss Function

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5925 ◽

2020 ◽

Vol 34 (04) ◽

pp. 4884-4891

Author(s):

Qingliang Liu ◽

Jinmei Lai

Keyword(s):

Neural Networks ◽

Loss Function ◽

Real World ◽

Optimization Problem ◽

Deep Neural Networks ◽

Back Propagation ◽

Loss Functions ◽

Joint Optimization ◽

Neural Machine Translation ◽

Deep Networks

Training deep neural networks is inherently constrained by loss functions that are predefined and fixed during optimization. To improve learning efficiency, we develop the Stochastic Loss Function (SLF), which dynamically and automatically generates appropriate gradients to train deep networks in the same round of back-propagation, while maintaining the completeness and differentiability of the training pipeline. In SLF, a generic loss function is formulated as a joint optimization problem over network weights and loss parameters. To guarantee the requisite efficiency, gradients with respect to the generic differentiable loss are leveraged for selecting the loss function and optimizing the network weights. Extensive experiments on a variety of popular datasets demonstrate that SLF is capable of obtaining appropriate gradients at different stages of training, and can significantly improve the performance of various deep models on real-world tasks including classification, clustering, regression, neural machine translation, and object detection.

Effects of depth, width, and initialization: A convergence analysis of layer-wise training for deep linear neural networks

Analysis and Applications ◽

10.1142/s0219530521500263 ◽

2021 ◽

pp. 1-47

Author(s):

Yeonjong Shin

Keyword(s):

Neural Networks ◽

Convergence Analysis ◽

Deep Neural Networks ◽

Computational Cost ◽

Back Propagation ◽

Single Layer ◽

Intermediate Layers ◽

Machine Learning Applications ◽

Gradient Based ◽

Accelerate Convergence

Deep neural networks have been used in various machine learning applications and have achieved tremendous empirical success. However, training deep neural networks is a challenging task. Many alternatives to end-to-end back-propagation have been proposed. Layer-wise training is one of them: it trains a single layer at a time rather than training all layers simultaneously. In this paper, we study layer-wise training using block coordinate gradient descent (BCGD) for deep linear networks. We establish a general convergence analysis of BCGD and find the optimal learning rate, which results in the fastest decrease in the loss. We identify the effects of depth, width, and initialization. When the orthogonal-like initialization is employed, we show that the width of intermediate layers plays no role in gradient-based training beyond a certain threshold. In addition, we find that the use of deep networks can drastically accelerate convergence compared to that of a depth-1 network, even when the computational cost is considered. Numerical examples are provided to justify our theoretical findings and demonstrate the performance of layer-wise training by BCGD.
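
To make the layer-wise idea concrete, here is a minimal sketch of block coordinate gradient descent on a deep linear network with a mean squared loss. Several things are assumptions for illustration and are not taken from the paper: all layers are square, the weights start near the identity, and each block uses a generic 1/L step (L being the Lipschitz constant of that block's gradient) rather than the optimal learning rate the abstract refers to.

```python
import numpy as np


def chain(mats, dim):
    """Product mats[-1] @ ... @ mats[0]; the identity if mats is empty."""
    out = np.eye(dim)
    for w in mats:
        out = w @ out
    return out


def loss(weights, x, y):
    """Mean squared loss 0.5/n * ||W_L ... W_1 x - y||_F^2."""
    n = x.shape[1]
    residual = chain(weights, x.shape[0]) @ x - y
    return 0.5 * np.sum(residual * residual) / n


def bcgd_sweep(weights, x, y):
    """Update each layer once, in order, holding the other layers fixed."""
    d, n = x.shape
    for i in range(len(weights)):
        below = chain(weights[:i], d) @ x   # input seen by layer i
        above = chain(weights[i + 1:], d)   # map from layer i's output to the prediction
        residual = above @ weights[i] @ below - y
        grad = above.T @ residual @ below.T / n
        # The loss is quadratic in this block; 1/L is a safe generic step size.
        lipschitz = np.linalg.norm(above, 2) ** 2 * np.linalg.norm(below, 2) ** 2 / n
        weights[i] = weights[i] - grad / lipschitz
    return weights


# Toy regression problem (assumed): recover a random linear map.
rng = np.random.default_rng(0)
d, n, depth = 3, 50, 4
x = rng.normal(size=(d, n))
y = rng.normal(size=(d, d)) @ x
weights = [np.eye(d) + 0.01 * rng.normal(size=(d, d)) for _ in range(depth)]

losses = [loss(weights, x, y)]
for _ in range(100):
    weights = bcgd_sweep(weights, x, y)
    losses.append(loss(weights, x, y))
print(losses[0], losses[-1])
```

Each block update only needs the product of the layers below and above it, so no gradient is propagated through the whole network in a single backward pass.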

Literature Review of Deep Network Compression

Informatics ◽

10.3390/informatics8040077 ◽

2021 ◽

Vol 8 (4) ◽

pp. 77

Author(s):

Ali Alqahtani ◽

Xianghua Xie ◽

Mark W. Jones

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Deep Neural Networks ◽

Low Rank ◽

Vast Number ◽

Deep Networks ◽

Factorization Methods ◽

Network Compression ◽

Rank Factorization ◽

Pruning Methods

Deep networks often possess a vast number of parameters, and the significant redundancy in their parameterization has become a widely recognized property. This presents significant challenges and restricts many deep learning applications, shifting the focus to reducing the complexity of models while maintaining their powerful performance. In this paper, we present an overview of popular methods and review recent works on compressing and accelerating deep neural networks. We consider not only pruning methods but also quantization methods and low-rank factorization methods. This review also intends to clarify these major concepts and highlight their characteristics, advantages, and shortcomings.
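
As one illustration of the low-rank factorization family mentioned above, the sketch below replaces a dense weight matrix with two thin factors obtained from a truncated SVD. The matrix sizes, the chosen rank, and the synthetic nearly low-rank weights are assumptions for the example; the review itself covers a range of methods and is not being reproduced here.

```python
import numpy as np


def low_rank_factors(w, rank):
    """Return a (m x rank) and b (rank x n) such that a @ b approximates w."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # fold the singular values into the left factor
    b = vt[:rank, :]
    return a, b


# Synthetic weight matrix (assumed): rank 8 plus a little noise.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 8)) @ rng.normal(size=(8, 128))
w = w + 0.01 * rng.normal(size=(64, 128))

a, b = low_rank_factors(w, rank=8)
rel_error = np.linalg.norm(w - a @ b) / np.linalg.norm(w)
params_before = w.size
params_after = a.size + b.size
print(params_before, params_after, rel_error)
```

A layer computing `w @ x` can then compute `a @ (b @ x)` instead, which stores 64*8 + 8*128 numbers rather than 64*128. How much accuracy survives depends on how close the trained weights actually are to low rank.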

Competitive Cross-Entropy Loss: A Study on Training Single-Layer Neural Networks for Solving Nonlinearly Separable Classification Problems

Neural Processing Letters ◽

10.1007/s11063-018-9906-5 ◽

2018 ◽

Vol 50 (2) ◽

pp. 1115-1122

Author(s):

Kamaledin Ghiasi-Shirazi

Keyword(s):

Neural Networks ◽

Single Layer ◽

Cross Entropy ◽

Classification Problems ◽

Entropy Loss

Improved Categorical Cross-Entropy Loss for Training Deep Neural Networks with Noisy Labels

10.1007/978-3-030-88013-2_7 ◽

2021 ◽

pp. 78-89

Author(s):

Panle Li ◽

Xiaohui He ◽

Dingjun Song ◽

Zihao Ding ◽

Mengjia Qiao ◽

...

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Cross Entropy ◽

Entropy Loss ◽

Noisy Labels

Deep Learning with Taxonomic Loss for Plant Identification

Computational Intelligence and Neuroscience ◽

10.1155/2019/2015017 ◽

2019 ◽

Vol 2019 ◽

pp. 1-8

Author(s):

Danzi Wu ◽

Xue Han ◽

Guan Wang ◽

Yu Sun ◽

Haiyan Zhang ◽

...

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Cross Entropy ◽

Classification Task ◽

Plant Identification ◽

Entropy Loss ◽

Fine Grained ◽

Performance Improvements ◽

The Hierarchical Structure ◽

Significant Performance

Plant identification is a fine-grained classification task that aims to identify the family, genus, and species of a plant from its appearance features. Inspired by the hierarchical structure of the taxonomic tree, the taxonomic loss was proposed, which encodes the hierarchical relationships among multilevel labels into the deep learning objective function by a simple group-and-sum operation. Experiments training various neural networks on the PlantCLEF 2015 and PlantCLEF 2017 datasets demonstrated that the proposed loss function was easy to implement and outperformed the most commonly adopted cross-entropy loss. Eight neural networks were each trained with the two loss functions on the PlantCLEF 2015 dataset, and the models trained with the taxonomic loss showed significant performance improvements. On the PlantCLEF 2017 dataset with 10,000 species, the SENet-154 model trained with the taxonomic loss achieved accuracies of 84.07%, 79.97%, and 73.61% at the family, genus, and species levels, improving on those of the model trained with cross-entropy loss by 2.23%, 1.34%, and 1.08%, respectively. The taxonomic loss could further facilitate fine-grained classification tasks with hierarchical labels.
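
The abstract does not spell out the group-and-sum operation, so the following is only one plausible reading, given as a sketch: species probabilities are summed within each genus to obtain genus probabilities, those are summed within each family, and the cross-entropy terms of the three levels are added. The equal weighting of levels, the tiny four-species taxonomy, and all names are assumptions for illustration.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.asarray(logits, dtype=float)
    e = np.exp(z - np.max(z))
    return e / e.sum()


def group_sum(probs, parent_of, n_parents):
    """Sum child probabilities into their parent groups."""
    out = np.zeros(n_parents)
    np.add.at(out, parent_of, probs)
    return out


def taxonomic_loss(species_logits, species, genus_of_species, family_of_genus):
    """Sum of cross-entropy terms at species, genus, and family level (assumed form)."""
    genus_of_species = np.asarray(genus_of_species, dtype=int)
    family_of_genus = np.asarray(family_of_genus, dtype=int)
    p_species = softmax(species_logits)
    p_genus = group_sum(p_species, genus_of_species, len(family_of_genus))
    p_family = group_sum(p_genus, family_of_genus, int(family_of_genus.max()) + 1)
    genus = genus_of_species[species]
    family = family_of_genus[genus]
    return float(-(np.log(p_species[species])
                   + np.log(p_genus[genus])
                   + np.log(p_family[family])))


# Hypothetical taxonomy: species 0 and 1 share genus 0; genera 0 and 1 share family 0.
genus_of_species = [0, 0, 1, 2]
family_of_genus = [0, 0, 1]

loss_uniform = taxonomic_loss([0, 0, 0, 0], 0, genus_of_species, family_of_genus)
loss_near = taxonomic_loss([0, 5, 0, 0], 0, genus_of_species, family_of_genus)
loss_far = taxonomic_loss([0, 0, 0, 5], 0, genus_of_species, family_of_genus)
print(loss_uniform, loss_near, loss_far)
```

Under this reading, confidently predicting a sibling species in the correct genus (`loss_near`) is penalized less than confidently predicting a species from a different family (`loss_far`), which is the kind of hierarchy awareness the abstract describes.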

Automatic Detection of Arrhythmia Based on Multi-Resolution Representation of ECG Signal

Sensors ◽

10.3390/s20061579 ◽

2020 ◽

Vol 20 (6) ◽

pp. 1579

Author(s):

Dongqi Wang ◽

Qinghua Meng ◽

Dongming Chen ◽

Hupo Zhang ◽

Lisheng Xu

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Deep Neural Networks ◽

Channel Model ◽

Expert Knowledge ◽

Automatic Detection ◽

Data Representation ◽

Learning Technology ◽

Arrhythmia Detection ◽

Automatic Feature Extraction

Automatic detection of arrhythmia is of great significance for early prevention and diagnosis of cardiovascular disease. Traditional feature engineering methods based on expert knowledge lack multidimensional and multi-view information abstraction and data representation ability, so traditional pattern recognition research on arrhythmia detection cannot achieve satisfactory results. Recently, with the rise of deep learning technology, automatic feature extraction from ECG data based on deep neural networks has been widely discussed. In order to utilize the complementary strengths of different schemes, in this paper we propose an arrhythmia detection method based on the multi-resolution representation (MRR) of ECG signals. This method utilizes four different up-to-date deep neural networks as four channel models for learning ECG vector representations. The deep learning based representations, together with hand-crafted ECG features, form the MRR, which is the input of the downstream classification strategy. Experimental results for multi-label classification on a large ECG dataset confirm that the F1 score of the proposed method is 0.9238, which is 1.31%, 0.62%, 1.18%, and 0.6% higher than those of the four channel models, respectively. From the perspective of architecture, the proposed method is highly scalable and can be employed as an example for arrhythmia recognition.

Deep neural networks using a single neuron: folded-in-time architecture using feedback-modulated delay loops

Nature Communications ◽

10.1038/s41467-021-25427-4 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Florian Stelzer ◽

André Röhm ◽

Raul Vicente ◽

Ingo Fischer ◽

Serhiy Yanchuk

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Neural Network ◽

Single Neuron ◽

Deep Neural Networks ◽

Back Propagation ◽

Local Network ◽

Multiple Time ◽

Learning Tools ◽

Back Propagation Algorithm

Deep neural networks are among the most widely applied machine learning tools showing outstanding performance in a broad range of tasks. We present a method for folding a deep neural network of arbitrary size into a single neuron with multiple time-delayed feedback loops. This single-neuron deep neural network comprises only a single nonlinearity and appropriately adjusted modulations of the feedback signals. The network states emerge in time as a temporal unfolding of the neuron’s dynamics. By adjusting the feedback-modulation within the loops, we adapt the network’s connection weights. These connection weights are determined via a back-propagation algorithm, where both the delay-induced and local network connections must be taken into account. Our approach can fully represent standard Deep Neural Networks (DNN), encompasses sparse DNNs, and extends the DNN concept toward dynamical systems implementations. The new method, which we call Folded-in-time DNN (Fit-DNN), exhibits promising performance in a set of benchmark tasks.

Enabling deeper learning on big data for materials informatics applications

Scientific Reports ◽

10.1038/s41598-021-83193-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Dipendra Jha ◽

Vishu Gupta ◽

Logan Ward ◽

Zijiang Yang ◽

Christopher Wolverton ◽

...

Keyword(s):

Neural Networks ◽

Big Data ◽

Deep Learning ◽

Deep Neural Networks ◽

Materials Science ◽

Prediction Models ◽

Model Performance ◽

Materials Informatics ◽

Learning Framework ◽

Significant Attention

The application of machine learning (ML) techniques in materials science has attracted significant attention in recent years, due to their impressive ability to efficiently extract data-driven linkages from various input materials representations to their output properties. While the application of traditional ML techniques has become quite ubiquitous, there have been limited applications of more advanced deep learning (DL) techniques, primarily because big materials datasets are relatively rare. Given the demonstrated potential and advantages of DL and the increasing availability of big materials datasets, it is attractive to go for deeper neural networks in a bid to boost model performance, but in reality, it leads to performance degradation due to the vanishing gradient problem. In this paper, we address the question of how to enable deeper learning for cases where big materials data is available. Here, we present a general deep learning framework based on Individual Residual learning (IRNet) composed of very deep neural networks that can work with any vector-based materials representation as input to build accurate property prediction models. We find that the proposed IRNet models can not only successfully alleviate the vanishing gradient problem and enable deeper learning, but also lead to significantly (up to 47%) better model accuracy as compared to plain deep neural networks and traditional ML techniques for a given input materials representation in the presence of big data.
