Evaluating Dropout Placements in Bayesian Regression ResNet

Author(s):  
Lei Shi ◽  
Cosmin Copot ◽  
Steve Vanlanduit

Abstract Deep Neural Networks (DNNs) have shown great success in many fields, and various network architectures have been developed for different applications. Regardless of their complexity, however, DNNs do not provide model uncertainty. Bayesian Neural Networks (BNNs), on the other hand, are able to make probabilistic inferences. Among the various types of BNNs, Dropout as a Bayesian Approximation converts a Neural Network (NN) into a BNN by adding a dropout layer after each weight layer, providing a simple transformation from an NN to a BNN. For DNNs, however, adding a dropout layer after each weight layer leads to strong regularization because of the deep architecture. Previous studies [1, 2, 3] have shown that adding a dropout layer after every weight layer in a DNN is unnecessary, but how to place dropout layers in a ResNet for regression tasks is less explored. In this work, we perform an empirical study of how different dropout placements affect the performance of a Bayesian DNN. We use a regression model modified from ResNet as the DNN and place dropout layers at different positions in this regression ResNet. Our experimental results show that it is not necessary to add a dropout layer after every weight layer in the regression ResNet for it to perform Bayesian inference. Placing dropout layers between the stacked blocks (i.e., Dense+Identity+Identity blocks) gives the best Prediction Interval Coverage Probability (PICP), while placing a dropout layer after each stacked block gives the best Root Mean Square Error (RMSE).
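
As a concrete illustration of the placement being evaluated, the sketch below puts a dropout layer between stacked residual blocks rather than after every weight layer, and uses Monte Carlo dropout at test time for Bayesian inference. It assumes PyTorch; the block sizes, channel counts, and the name RegressionResNet are illustrative, not the paper's exact model.

    import torch
    import torch.nn as nn

    class IdentityBlock(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
            self.relu = nn.ReLU()

        def forward(self, x):
            out = self.relu(self.conv1(x))
            out = self.conv2(out)
            return self.relu(out + x)  # identity skip connection

    class RegressionResNet(nn.Module):
        def __init__(self, channels=16, p=0.1):
            super().__init__()
            self.stem = nn.Conv2d(3, channels, 3, padding=1)
            # each stack stands in for a Dense+Identity+Identity block group
            self.stack1 = nn.Sequential(IdentityBlock(channels), IdentityBlock(channels))
            self.drop1 = nn.Dropout2d(p)  # dropout BETWEEN stacked blocks
            self.stack2 = nn.Sequential(IdentityBlock(channels), IdentityBlock(channels))
            self.drop2 = nn.Dropout2d(p)
            self.head = nn.Linear(channels, 1)  # scalar regression output

        def forward(self, x):
            x = self.stem(x)
            x = self.drop1(self.stack1(x))
            x = self.drop2(self.stack2(x))
            x = x.mean(dim=(2, 3))  # global average pooling
            return self.head(x)

    def mc_predict(model, x, T=50):
        # Monte Carlo dropout: keep dropout active, average T stochastic passes
        model.train()
        preds = torch.stack([model(x) for _ in range(T)])
        return preds.mean(0), preds.std(0)  # predictive mean and uncertainty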

2019 ◽  
Vol 31 (3) ◽  
pp. 538-554
Author(s):  
Michael Hauser ◽  
Sean Gunn ◽  
Samer Saab ◽  
Asok Ray

This letter deals with neural networks as dynamical systems governed by finite-difference equations. It shows that the introduction of k-many skip connections into network architectures, such as residual networks and additive dense networks, defines kth-order dynamical equations on the layer-wise transformations. Closed-form solutions are found for the state-space representations of general kth-order additive dense networks, where the concatenation operation is replaced by addition, as well as of kth-order smooth networks. This development endows deep neural networks with an algebraic structure. Furthermore, it is shown that imposing kth-order smoothness on network architectures with d-many nodes per layer increases the state-space dimension by a multiple of k, and so the effective dimension in which the neural network embeds the data manifold is k·d. It follows that network architectures of these types reduce the number of parameters needed to maintain the same embedding dimension by a factor of k when compared to an equivalent first-order residual network. Numerical simulations and experiments on CIFAR10, SVHN, and MNIST have been conducted to help understand the developed theory and the efficacy of the proposed concepts.
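
The core correspondence can be written down directly. The snippet below contrasts a first-order residual update with a second-order update obtained from an additional skip connection; it is a minimal NumPy sketch with an arbitrary layer map, not the paper's construction.

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(8, 8))
    f = lambda x: np.tanh(W @ x)  # illustrative layer-wise transformation

    x_prev = rng.normal(size=8)   # hidden state at layer t-1
    x_curr = rng.normal(size=8)   # hidden state at layer t

    # first order (residual network): x_{t+1} = x_t + f(x_t)
    x_next_first = x_curr + f(x_curr)

    # second order (k = 2): x_{t+1} - 2 x_t + x_{t-1} = f(x_t), i.e. the extra
    # skip contributes a discrete second-derivative term (x_t - x_{t-1})
    x_next_second = x_curr + (x_curr - x_prev) + f(x_curr)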


Author(s):  
Peter K. Koo ◽  
Matt Ploenzke

Abstract Despite deep neural networks (DNNs) having found great success at improving performance on various prediction tasks in computational genomics, it remains difficult to understand why they make any given prediction. In genomics, the main approaches to interpreting a high-performing DNN are to visualize learned representations via weight visualizations and attribution methods. While these methods can be informative, each has strong limitations. For instance, attribution methods only uncover the independent contribution of single-nucleotide variants in a given sequence. Here we discuss and argue for global importance analysis, which can quantify the population-level importance of putative features and their interactions learned by a DNN. We highlight recent work that has benefited from this interpretability approach and then discuss connections between global importance analysis and causality.
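
A minimal sketch of the global importance analysis procedure follows: embed a putative motif into a population of background sequences and measure the average change in the model's prediction. It assumes a trained model with a Keras-style predict method over one-hot DNA input; the background sampling scheme and fixed embedding position are illustrative choices.

    import numpy as np

    ALPHABET = 'ACGT'

    def one_hot(seq):
        idx = {c: i for i, c in enumerate(ALPHABET)}
        x = np.zeros((len(seq), 4))
        for i, c in enumerate(seq):
            x[i, idx[c]] = 1.0
        return x

    def global_importance(model, motif, n=1000, length=200, seed=0):
        rng = np.random.default_rng(seed)
        # sample background sequences from an (assumed) null distribution
        backgrounds = [''.join(rng.choice(list(ALPHABET), size=length)) for _ in range(n)]
        pos = (length - len(motif)) // 2  # embed the motif at a fixed position
        with_motif = [s[:pos] + motif + s[pos + len(motif):] for s in backgrounds]
        x0 = np.stack([one_hot(s) for s in backgrounds])
        x1 = np.stack([one_hot(s) for s in with_motif])
        # population-level importance: mean prediction shift across backgrounds
        return float(np.mean(model.predict(x1) - model.predict(x0)))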


Author(s):  
Vikas Verma ◽  
Alex Lamb ◽  
Juho Kannala ◽  
Yoshua Bengio ◽  
David Lopez-Paz

We introduce Interpolation Consistency Training (ICT), a simple and computationally efficient algorithm for training Deep Neural Networks in the semi-supervised learning paradigm. ICT encourages the prediction at an interpolation of unlabeled points to be consistent with the interpolation of the predictions at those points. In classification problems, ICT moves the decision boundary to low-density regions of the data distribution. Our experiments show that ICT achieves state-of-the-art performance when applied to standard neural network architectures on the CIFAR-10 and SVHN benchmark datasets.
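
The consistency term is simple to state in code. Below is a minimal PyTorch sketch of the ICT objective, assuming a student network and a mean-teacher (EMA) copy providing targets; the Beta(alpha, alpha) mixing coefficient and the MSE-on-softmax choice are common conventions rather than the paper's exact settings.

    import torch
    import torch.nn.functional as F
    from torch.distributions import Beta

    def ict_loss(student, teacher, u1, u2, alpha=1.0):
        lam = Beta(alpha, alpha).sample().item()   # mixup coefficient
        u_mix = lam * u1 + (1 - lam) * u2          # interpolation of unlabeled points
        with torch.no_grad():                      # teacher targets are not backpropagated
            target = lam * teacher(u1) + (1 - lam) * teacher(u2)
        # prediction at the interpolation should match the interpolated predictions
        return F.mse_loss(F.softmax(student(u_mix), dim=1),
                          F.softmax(target, dim=1))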


Author(s):  
Shiva Prasad Kasiviswanathan ◽  
Nina Narodytska ◽  
Hongxia Jin

Deep neural networks are powerful learning models that achieve state-of-the-art performance on many computer vision, speech, and language processing tasks. In this paper, we study a fundamental question that arises when designing deep network architectures: given a target network architecture, can we design a 'smaller' network architecture that 'approximates' the operation of the target network? The question is, in part, motivated by the challenge of parameter reduction (compression) in modern deep neural networks, as the ever-increasing storage and memory requirements of these networks pose a problem in resource-constrained environments. In this work, we focus on deep convolutional neural network architectures and propose a novel randomized tensor sketching technique that we utilize to develop a unified framework for approximating the operation of both the convolutional and fully connected layers. By applying the sketching technique along different tensor dimensions, we design changes to the convolutional and fully connected layers that substantially reduce the number of effective parameters in a network. We show that the resulting smaller network can be trained directly and has a classification accuracy comparable to that of the original network.
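
To make the idea concrete, here is a deliberately simplified sketch of sketching-based parameter reduction for a fully connected layer: the input is compressed by a fixed random count-sketch matrix, and a smaller trainable matrix acts on the sketch. This illustrates the general technique only; the paper's actual construction sketches along multiple tensor dimensions of both convolutional and fully connected layers.

    import torch
    import torch.nn as nn

    class SketchedLinear(nn.Module):
        def __init__(self, in_features, out_features, sketch_dim):
            super().__init__()
            g = torch.Generator().manual_seed(0)
            # count sketch: each input coordinate hashes to one of sketch_dim
            # buckets with a random sign (fixed, not trained)
            h = torch.randint(0, sketch_dim, (in_features,), generator=g)
            s = (torch.randint(0, 2, (in_features,), generator=g) * 2 - 1).float()
            S = torch.zeros(sketch_dim, in_features)
            S[h, torch.arange(in_features)] = s
            self.register_buffer('S', S)
            # trainable part: out_features x sketch_dim parameters instead of
            # out_features x in_features
            self.linear = nn.Linear(sketch_dim, out_features)

        def forward(self, x):
            return self.linear(x @ self.S.t())

    layer = SketchedLinear(4096, 1024, sketch_dim=256)  # roughly 16x fewer weights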


2020 ◽  
Vol 185 ◽  
pp. 02025
Author(s):  
Guo Yanan ◽  
Cao Xiaoqun ◽  
Peng Kecheng

Atmospheric systems are typically chaotic, and their chaotic nature is an important limiting factor for weather forecasting and climate prediction. So far, there have been many studies on the simulation and prediction of chaotic systems using numerical simulation methods. However, numerical simulation faces many intractable problems, such as sensitivity to initial values, error accumulation, and unreasonable parameterization of physical processes, which often lead to forecast failure. With the continuous improvement of observational techniques, data assimilation has gradually become an effective method for improving numerical prediction. In addition, with the advent of big data and the growth of computing resources, machine learning has achieved great success. Studies have shown that deep neural networks are capable of mining and extracting the complex physical relationships behind large amounts of data to build very good forecasting models. Therefore, in this paper, we propose a prediction method for chaotic systems that combines deep neural networks and data assimilation. To test its effectiveness, we perform forecasting experiments on the Lorenz96 model. The experimental results show that the method combining a neural network with data assimilation is very effective in predicting the state variables of Lorenz96. However, Lorenz96 is a relatively simple model; our next step will be to extend the experiments to more complex system models to further test, optimize, and improve the proposed method.
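
For reference, the Lorenz96 testbed itself is compact enough to state in full. The sketch below integrates the standard 40-variable Lorenz96 system with a fourth-order Runge-Kutta step, producing the kind of trajectories a neural-network emulator would be trained on; the hybrid assimilation pipeline is summarized in the abstract, not shown here.

    import numpy as np

    def lorenz96(x, F=8.0):
        # dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F, with cyclic indices
        return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

    def rk4_step(x, dt=0.01, F=8.0):
        k1 = lorenz96(x, F)
        k2 = lorenz96(x + 0.5 * dt * k1, F)
        k3 = lorenz96(x + 0.5 * dt * k2, F)
        k4 = lorenz96(x + dt * k3, F)
        return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

    # generate a training trajectory from a slightly perturbed rest state
    x = 8.0 + 0.01 * np.random.default_rng(0).normal(size=40)
    trajectory = []
    for _ in range(1000):
        x = rk4_step(x)
        trajectory.append(x)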


2021 ◽  
Author(s):  
Alfonso Rojas-Domínguez ◽  
Ivvan Valdez ◽  
Manuel Ornelas-Rodríguez ◽  
Martín Carpio

Abstract Fostered by technological and theoretical developments, deep neural networks have achieved great success in many applications, but their training by means of mini-batch stochastic gradient descent (SGD) can be very costly, due to the possibly tens of millions of parameters to be optimized and the large amounts of training examples that must be processed. This computational cost is exacerbated by the inefficiency of the uniform sampling typically used by SGD to form the training mini-batches: since not all training examples are equally relevant for training, sampling them under a uniform distribution is far from optimal. A better strategy is to form the mini-batches by sampling the training examples under a distribution where the probability of being selected is proportional to the relevance of each individual example. This can be achieved through Importance Sampling (IS), which also minimizes the variance of the gradients w.r.t. the network parameters, further improving convergence. In this paper, an IS-based adaptive sampling method that exploits side information to construct the required probability distribution is studied. This method is modified to enable its application to deep neural networks, and the improved method is dubbed Regularized Adaptive Sampling (RAS). Experimental comparison of RAS against SGD and against another state-of-the-art sampling method, using deep convolutional networks for classification of the MNIST and CIFAR-10 datasets, shows that RAS achieves relative improvements in the training process without incurring significant overhead or affecting the accuracy of the networks.
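
The core of any IS-based batch construction can be sketched in a few lines: sample examples with probability proportional to a relevance score and reweight their gradients to keep the estimate unbiased. The sketch below uses running per-example losses as a stand-in for the side information RAS exploits; the smoothing term and the score choice are illustrative, not the paper's exact scheme.

    import numpy as np

    def sample_batch(scores, batch_size, rng, smoothing=1e-3):
        p = scores + smoothing                # smoothing keeps every p_i > 0
        p = p / p.sum()                       # selection probability ~ relevance
        idx = rng.choice(len(scores), size=batch_size, replace=False, p=p)
        # importance weights 1 / (N * p_i) keep the gradient estimate unbiased
        weights = 1.0 / (len(scores) * p[idx])
        return idx, weights

    rng = np.random.default_rng(0)
    losses = rng.random(50_000)               # running per-example losses
    idx, w = sample_batch(losses, 128, rng)   # scale each example's gradient by w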


2022 ◽  
Vol 13 (1) ◽  
Author(s):  
Tianyu Wang ◽  
Shi-Yuan Ma ◽  
Logan G. Wright ◽  
Tatsuhiro Onodera ◽  
Brian C. Richard ◽  
...  

Abstract Deep learning has become a widespread tool in both science and industry. However, continued progress is hampered by the rapid growth in energy costs of ever-larger deep neural networks. Optical neural networks provide a potential means to solve the energy-cost problem faced by deep learning. Here, we experimentally demonstrate an optical neural network based on optical dot products that achieves 99% accuracy on handwritten-digit classification using ~3.1 detected photons per weight multiplication and ~90% accuracy using ~0.66 photons (~2.5 × 10⁻¹⁹ J of optical energy) per weight multiplication. The fundamental principle enabling our sub-photon-per-multiplication demonstration, namely noise reduction from the accumulation of scalar multiplications in dot-product sums, is applicable to many different optical-neural-network architectures. Our work shows that optical neural networks can achieve accurate results using extremely low optical energies.
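
The noise-accumulation principle can be checked with a few lines of simulation: if each scalar multiplication is detected with Poisson shot noise at well under one photon on average, the relative error of the summed dot product still shrinks roughly as 1/sqrt(N). The numbers below are illustrative, not the experimental conditions.

    import numpy as np

    rng = np.random.default_rng(0)
    photons_per_mult = 0.66                    # mean detected photons per product

    for n in [1, 100, 10_000]:
        x = rng.random(n)                      # nonnegative activations (illustrative)
        w = rng.random(n)                      # nonnegative weights (illustrative)
        mean_counts = photons_per_mult * x * w # expected photons per multiplication
        sums = rng.poisson(mean_counts, size=(1000, n)).sum(axis=1)
        print(f"N={n:6d}  relative shot-noise error ~ {sums.std() / sums.mean():.3f}")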


2020 ◽  
Author(s):  
Ronnypetson Da Silva ◽  
Valter M. Filho ◽  
Mario Souza

Many works that apply Deep Neural Networks (DNNs) to Speech Emotion Recognition (SER) use single datasets, or train and evaluate the models separately when using multiple datasets. These datasets are constructed with specific guidelines, and the subjective nature of SER labels makes it difficult to obtain robust and general models. We investigate how DNNs learn shared representations for different datasets in both multi-task and unified setups. We also analyse how each dataset benefits from the others in different combinations of datasets and popular neural network architectures. We show that the longstanding belief that more data results in more general models does not always hold for SER, as a different combination of datasets and meta-parameters holds the best result for each of the analysed datasets.
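
A typical shared-representation setup from the multi-task analysis looks like the following PyTorch sketch: one acoustic encoder shared across datasets, with a separate classification head per dataset because the label sets differ. Dataset names, feature dimensions, and layer sizes are hypothetical.

    import torch.nn as nn

    class MultiTaskSER(nn.Module):
        def __init__(self, head_sizes, n_features=40, hidden=128):
            super().__init__()
            self.encoder = nn.GRU(n_features, hidden, batch_first=True)
            # one classification head per dataset, since emotion labels differ
            self.heads = nn.ModuleDict({name: nn.Linear(hidden, n)
                                        for name, n in head_sizes.items()})

        def forward(self, x, dataset):   # x: (batch, frames, features)
            _, h = self.encoder(x)       # shared acoustic representation
            return self.heads[dataset](h[-1])

    model = MultiTaskSER({'datasetA': 4, 'datasetB': 8})  # hypothetical label counts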


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Wanheng Liu ◽  
Ling Yin ◽  
Cong Wang ◽  
Fulin Liu ◽  
Zhiyu Ni

In this paper, we present a novel approach to Chinese medical knowledge graphs for smart healthcare based on IoT and WoT, using deep neural networks combined with self-attention to generate the knowledge graph and thereby make disease diagnosis and treatment advisement more convenient. Although great success has been achieved on medical knowledge graphs in recent studies, the issue of a comprehensive Chinese medical knowledge graph appropriate for telemedicine or mobile devices has been overlooked. Our working theory is based on semantic mobile computing and deep learning. Several experiments demonstrate that the approach performs better at generating various types of Chinese medical knowledge graphs, with quality similar to that of the state of the art. It also performs well in accuracy and comprehensiveness, which are much higher and highly consistent with the predictions of the theoretical model. We find it encouraging that our work on Chinese medical knowledge graphs can stimulate the development of smart healthcare.
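
For readers unfamiliar with the attention component, scaled dot-product self-attention, the building block the paper combines with deep networks, is sketched below in PyTorch. The paper's full graph-generation architecture is not specified here, so this is a generic sketch, not the authors' exact design.

    import torch
    import torch.nn.functional as F

    def self_attention(x, Wq, Wk, Wv):
        # x: (batch, tokens, d_model); Wq/Wk/Wv: (d_model, d_head) projections
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        scores = q @ k.transpose(-2, -1) / (k.shape[-1] ** 0.5)
        return F.softmax(scores, dim=-1) @ v  # attention-weighted mix of values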


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Anand Ramachandran ◽  
Steven S. Lumetta ◽  
Eric W. Klee ◽  
Deming Chen

Abstract
Background: Modern next-generation and third-generation sequencing methods, such as the Illumina and PacBio Circular Consensus Sequencing platforms, provide accurate sequencing data. Parallel developments in deep learning have enabled the application of Deep Neural Networks to variant calling, surpassing the accuracy of classical approaches in many settings. DeepVariant, arguably the most popular such method, transforms variant calling into an image-recognition problem in which a Deep Neural Network analyzes sequencing data formatted as images, achieving high accuracy. In this paper, we explore an alternative approach to designing Deep Neural Networks for variant calling, using meticulously designed network architectures and customized variant inference functions that account for the underlying nature of sequencing data instead of converting the problem to image recognition.
Results: Results from 27 whole-genome variant calling experiments spanning Illumina, PacBio, and hybrid Illumina-PacBio settings suggest that our method allows vastly smaller Deep Neural Networks to outperform the Inception-v3 architecture used in DeepVariant for indel and substitution-type variant calls. For example, our method reduces the number of indel call errors by up to 18%, 55%, and 65% for Illumina, PacBio, and hybrid Illumina-PacBio variant calling, respectively, compared to a similarly trained DeepVariant pipeline. In these cases, our models are between 7 and 14 times smaller.
Conclusions: We believe that the improved accuracy and problem-specific customization of our models will enable more accurate pipelines and further method development in the field. HELLO is available at https://github.com/anands-repo/hello
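
As an illustration of the design direction described above (networks tailored to sequencing data rather than image recognition), here is a small PyTorch sketch of a compact 1-D convolutional caller over per-site pileup feature vectors. The feature layout, sizes, and three-genotype output are hypothetical, not HELLO's actual architecture.

    import torch.nn as nn

    class SmallVariantCaller(nn.Module):
        def __init__(self, n_channels=20, n_genotypes=3):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv1d(n_channels, 64, kernel_size=5, padding=2), nn.ReLU(),
                nn.Conv1d(64, 64, kernel_size=5, padding=2), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1),
            )
            self.classify = nn.Linear(64, n_genotypes)  # e.g. hom-ref/het/hom-alt

        def forward(self, x):            # x: (batch, channels, window)
            return self.classify(self.features(x).squeeze(-1))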

