HatchEnsemble: an efficient and practical uncertainty quantification method for deep neural networks

Complex & Intelligent Systems, 2021. DOI: 10.1007/s40747-021-00463-1

Author(s): Yufeng Xia, Jun Zhang, Tingsong Jiang, Zhiqiang Gong, Wen Yao, ...

Keyword(s): Neural Networks, Uncertainty Quantification, Bayesian Methods, Large Scale, Deep Neural Networks, Computational Cost, Main Idea, Quantification Theory, Complete Uncertainty, Scale Models

Abstract: Quantifying predictive uncertainty in deep neural networks is a challenging and still unsolved problem. Existing quantification approaches fall into two lines. Bayesian methods provide a complete uncertainty quantification theory but often do not scale to large models. Non-Bayesian methods, by contrast, scale well and can quantify uncertainty with high quality. The most notable idea in this line is Deep Ensemble, but its expensive computational cost limits it in practice. We therefore propose HatchEnsemble to improve the efficiency and practicality of Deep Ensemble. The main idea is to use function-preserving transformations so that HatchNets inherit the knowledge learned by a single model called SeedNet. This process is called hatching, and a HatchNet is obtained by continuously widening the SeedNet. Based on our method, two different hatches are proposed, for ensembling networks of the same architecture and of different architectures, respectively. To ensure the diversity of models, we also add random noise to the parameters during hatching. Experiments on both clean and corrupted datasets show that HatchEnsemble gives competitive prediction performance and better-calibrated uncertainty quantification in a shorter time than the baselines.
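The function-preserving widening mentioned in the abstract can be illustrated with a minimal sketch. It assumes a Net2WiderNet-style rule (duplicate a hidden unit and split its outgoing weights in half) on a tiny two-layer ReLU network; the abstract does not spell out the exact transformation, so the rule, the function names (`forward`, `widen`, `add_noise`), and the weights are all illustrative assumptions rather than the authors' implementation.

```python
import random


def forward(x, W1, b1, W2, b2):
    """Two-layer MLP: x -> ReLU(W1 x + b1) -> W2 h + b2."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    return [sum(w * hi for w, hi in zip(row, h)) + b
            for row, b in zip(W2, b2)]


def widen(W1, b1, W2, unit):
    """Duplicate hidden unit `unit` and halve its outgoing weights.

    The copy receives the same incoming weights and bias, so both units
    produce the same activation; splitting the outgoing weight between
    them leaves the network output unchanged.
    """
    W1_new = [row[:] for row in W1] + [W1[unit][:]]
    b1_new = b1[:] + [b1[unit]]
    W2_new = []
    for row in W2:
        half = row[unit] / 2.0
        new_row = row[:]
        new_row[unit] = half
        new_row.append(half)
        W2_new.append(new_row)
    return W1_new, b1_new, W2_new


def add_noise(W, scale, rng):
    """Hypothetical diversity step: add small Gaussian noise to every weight."""
    return [[w + rng.gauss(0.0, scale) for w in row] for row in W]


# Toy "seed" network with 2 inputs, 2 hidden units, 1 output (made-up weights).
W1 = [[0.5, -1.0], [1.5, 0.25]]
b1 = [0.1, -0.2]
W2 = [[1.0, -2.0]]
b2 = [0.3]
x = [1.0, 2.0]

y_seed = forward(x, W1, b1, W2, b2)

# Widen the hidden layer by duplicating unit 1.
W1_wide, b1_wide, W2_wide = widen(W1, b1, W2, unit=1)
y_hatch = forward(x, W1_wide, b1_wide, W2_wide, b2)

# Perturb the widened weights so that ensemble members can diverge in training.
rng = random.Random(0)
W1_noisy = add_noise(W1_wide, scale=0.01, rng=rng)
```

Before noise is added, `y_hatch` matches `y_seed` up to floating-point rounding, which is what lets a widened network start from the knowledge of the seed network instead of from scratch.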


Understanding approximate Fisher information for fast convergence of natural gradient descent in wide neural networks

Journal of Statistical Mechanics Theory and Experiment, 2021, Vol 2021 (12), pp. 124010. DOI: 10.1088/1742-5468/ac3ae3

Author(s): Ryo Karakida, Kazuki Osawa

Keyword(s): Neural Networks, Function Space, Fisher Information, Gradient Descent, Large Scale, Deep Neural Networks, Theoretical Perspective, Computational Cost, Fast Convergence, Natural Gradient

Abstract: Natural gradient descent (NGD) helps to accelerate the convergence of gradient descent dynamics, but it requires approximations in large-scale deep neural networks because of its high computational cost. Empirical studies have confirmed that some NGD methods with approximate Fisher information converge sufficiently fast in practice. Nevertheless, it remains unclear from the theoretical perspective why and under what conditions such heuristic approximations work well. In this work, we reveal that, under specific conditions, NGD with approximate Fisher information achieves the same fast convergence to global minima as exact NGD. We consider deep neural networks in the infinite-width limit, and analyze the asymptotic training dynamics of NGD in function space via the neural tangent kernel. In the function space, the training dynamics with the approximate Fisher information are identical to those with the exact Fisher information, and they converge quickly. The fast convergence holds in layer-wise approximations; for instance, in block diagonal approximation where each block corresponds to a layer as well as in block tri-diagonal and K-FAC approximations. We also find that a unit-wise approximation achieves the same fast convergence under some assumptions. All of these different approximations have an isotropic gradient in the function space, and this plays a fundamental role in achieving the same convergence properties in training. Thus, the current study gives a novel and unified theoretical foundation with which to understand NGD methods in deep learning.
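To make the contrast between natural gradient descent and plain gradient descent concrete, here is a small sketch on a toy problem. It assumes a two-parameter linear model with unit-variance Gaussian noise, for which the exact Fisher information is the input second-moment matrix; it uses the exact Fisher matrix, not the approximate ones analyzed in the paper, and the data, function names, and learning rates are illustrative assumptions.

```python
def gradient(w, X, y):
    """Gradient of the mean squared error (1/2n) * sum((w.x - y)^2)."""
    n = len(X)
    residuals = [sum(wj * xj for wj, xj in zip(w, x)) - yi
                 for x, yi in zip(X, y)]
    return [sum(residuals[i] * X[i][j] for i in range(n)) / n
            for j in range(2)]


def fisher(X):
    """Exact Fisher matrix for a linear-Gaussian model: (1/n) * X^T X."""
    n = len(X)
    return [[sum(X[i][a] * X[i][b] for i in range(n)) / n for b in range(2)]
            for a in range(2)]


def inverse_2x2(F):
    """Closed-form inverse of a 2x2 matrix."""
    det = F[0][0] * F[1][1] - F[0][1] * F[1][0]
    return [[F[1][1] / det, -F[0][1] / det],
            [-F[1][0] / det, F[0][0] / det]]


def gd_step(w, X, y, lr):
    """Plain gradient descent: w <- w - lr * g."""
    g = gradient(w, X, y)
    return [w[j] - lr * g[j] for j in range(2)]


def ngd_step(w, X, y, lr):
    """Natural gradient descent: w <- w - lr * F^{-1} g."""
    g = gradient(w, X, y)
    F_inv = inverse_2x2(fisher(X))
    step = [F_inv[j][0] * g[0] + F_inv[j][1] * g[1] for j in range(2)]
    return [w[j] - lr * step[j] for j in range(2)]


def error(w, w_ref):
    """Euclidean distance between two parameter vectors."""
    return sum((a - b) ** 2 for a, b in zip(w, w_ref)) ** 0.5


# Noiseless toy data generated from w_true (made-up values).
w_true = [2.0, -1.0]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [sum(wj * xj for wj, xj in zip(w_true, x)) for x in X]

w0 = [0.0, 0.0]
w_gd = gd_step(w0, X, y, lr=0.1)
w_ngd = ngd_step(w0, X, y, lr=1.0)

err_gd = error(w_gd, w_true)
err_ngd = error(w_ngd, w_true)
```

On this quadratic problem a single NGD step with learning rate 1 lands on the least-squares solution, while one plain gradient step is still far away. Deep networks are not quadratic, which is why the paper's question of when approximate Fisher matrices retain fast convergence is non-trivial.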


Financial Market Prediction and Improving the Performance Based on Large-scale Exogenous Variables and Deep Neural Networks

Korean Institute of Smart Media, 2020, Vol 9 (4), pp. 26-35. DOI: 10.30693/smj.2020.9.4.26

Author(s): Sung Gil Cheon, Ju Hong Lee, Bum Ghi Choi, Jae Won Song

Keyword(s): Neural Networks, Financial Market, Large Scale, Deep Neural Networks, Exogenous Variables


Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home

2017. DOI: 10.21437/interspeech.2017-1510. Cited By ~ 35

Author(s): Chanwoo Kim, Ananya Misra, Kean Chin, Thad Hughes, Arun Narayanan, ...

Keyword(s): Neural Networks, Speech Recognition, Large Scale, Deep Neural Networks, Far Field


An efficient pruning scheme of deep neural networks for Internet of Things applications

EURASIP Journal on Advances in Signal Processing, 2021, Vol 2021 (1). DOI: 10.1186/s13634-021-00744-4

Author(s): Chen Qi, Shibo Shen, Rongpeng Li, Zhifeng Zhao, Qing Liu, ...

Keyword(s): Neural Network, Neural Networks, Internet Of Things, Deep Neural Networks, Computational Cost, Superior Performance, Compact Structure, Resource Limited, Benchmark Datasets, IoT Devices

Abstract: Deep neural networks (DNNs) are now rapidly being deployed for functions such as sensing, imaging, classification, and recognition. However, the computationally intensive requirements of DNNs make them difficult to apply on resource-limited Internet of Things (IoT) devices. In this paper, we propose a novel pruning-based paradigm that aims to reduce the computational cost of DNNs by uncovering a more compact structure and learning the effective weights therein, without compromising the expressive capability of DNNs. In particular, our algorithm achieves efficient end-to-end training that directly converts a redundant neural network into a compact one with a specifically targeted compression rate. We comprehensively evaluate our approach on various representative benchmark datasets and compare it with typical advanced convolutional neural network (CNN) architectures. The experimental results verify the superior performance and robust effectiveness of our scheme. For example, when pruning VGG on CIFAR-10, our proposed scheme reduces its FLOPs (floating-point operations) and number of parameters by 76.2% and 94.1%, respectively, while still maintaining a satisfactory accuracy. To sum up, our scheme could facilitate the integration of DNNs into the common machine-learning-based IoT framework and establish distributed training of neural networks in both cloud and edge.
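The reported percentages can be converted into compression ratios with a line of arithmetic. The sketch below assumes that 76.2% and 94.1% are the fractions of FLOPs and parameters removed by pruning, which is how the abstract reads; the helper name `compression_ratio` is illustrative.

```python
def compression_ratio(reduction_percent):
    """Ratio of original size to remaining size after a given percent reduction."""
    remaining_fraction = 1.0 - reduction_percent / 100.0
    return 1.0 / remaining_fraction


# Figures quoted in the abstract for VGG on CIFAR-10.
flops_ratio = compression_ratio(76.2)   # about 4.2x fewer FLOPs
param_ratio = compression_ratio(94.1)   # about 16.9x fewer parameters
```

Under that reading, the pruned network keeps roughly a quarter of the original FLOPs and about one seventeenth of the original parameters.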


Efficient Binarized Convolutional Layers for Visual Inspection Applications on Resource-Limited FPGAs and ASICs

Electronics, 2021, Vol 10 (13), pp. 1511. DOI: 10.3390/electronics10131511

Author(s): Taylor Simons, Dah-Jye Lee

Keyword(s): Neural Networks, Visual Inspection, Deep Neural Networks, Computational Cost, Quality Inspection, Agricultural Produce, Resource Limited, Inspection Tasks, Computational Resources, Small Models

There has been a recent surge in publications related to binarized neural networks (BNNs), which use binary values to represent both the weights and activations in deep neural networks (DNNs). Due to the bitwise nature of BNNs, there have been many efforts to implement BNNs on ASICs and FPGAs. While BNNs are excellent candidates for these kinds of resource-limited systems, most implementations still require very large FPGAs or CPU-FPGA co-processing systems. Our work focuses on reducing the computational cost of BNNs even further, making them more efficient to implement on FPGAs. We target embedded visual inspection tasks, such as quality inspection sorting of manufactured parts and agricultural produce sorting. We propose a new binarized convolutional layer, called the neural jet features layer, that learns well-known classic computer vision kernels that are efficient to calculate as a group. We show that on visual inspection tasks, neural jet features perform comparably to standard BNN convolutional layers while using fewer computational resources. We also show that neural jet features tend to be more stable than BNN convolutional layers when training small models.
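The "bitwise nature" of BNNs refers to the fact that a dot product between two vectors of +1/-1 values reduces to an XOR (or XNOR) followed by a bit count. The sketch below shows that general identity only; it is not the neural jet features layer proposed in the paper, and the helper names `pack` and `binary_dot` and the example vectors are illustrative assumptions.

```python
def pack(values):
    """Pack a list of +1/-1 values into an int; bit i is set when values[i] == +1."""
    bits = 0
    for i, v in enumerate(values):
        if v == 1:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1, +1} vectors given as packed bits.

    Positions where the bits agree contribute +1 and positions where they
    differ contribute -1, so the result is n - 2 * (number of mismatches).
    """
    mismatches = bin(a_bits ^ b_bits).count("1")
    return n - 2 * mismatches


# Example binarized weights and activations (made-up values).
weights = [1, -1, 1, 1, -1, -1, 1, -1]
activations = [1, 1, -1, 1, -1, 1, 1, -1]

reference = sum(w * a for w, a in zip(weights, activations))
fast = binary_dot(pack(weights), pack(activations), len(weights))
```

In hardware the XOR and bit count map onto a handful of logic gates per bit, which is why this formulation suits FPGAs and ASICs far better than floating-point multiply-accumulate.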


Towards pixel-to-pixel deep nucleus detection in microscopy images

BMC Bioinformatics, 2019, Vol 20 (1). DOI: 10.1186/s12859-019-3037-5. Cited By ~ 3

Author(s): Fuyong Xing, Yuanpu Xie, Xiaoshuang Shi, Pingjun Chen, Zizhao Zhang, ...

Keyword(s): Neural Networks, Large Scale, Deep Neural Networks, Image Data, Fine Tuning, Cell Detection, Imaging Protocol, Microscopy Image, Microscopy Images, Target Data

Abstract:

Background: Nucleus or cell detection is a fundamental task in microscopy image analysis and supports many other quantitative studies such as object counting, segmentation, and tracking. Deep neural networks are emerging as a powerful tool for biomedical image computing; in particular, convolutional neural networks have been widely applied to nucleus/cell detection in microscopy images. However, almost all models are tailored for specific datasets and their applicability to other microscopy image data remains unknown. Some existing studies casually learn and evaluate deep neural networks on multiple microscopy datasets, but there are still several critical, open questions to be addressed.

Results: We analyze the applicability of deep models specifically for nucleus detection across a wide variety of microscopy image data. More specifically, we present a fully convolutional network-based regression model and extensively evaluate it on large-scale digital pathology and microscopy image datasets, which consist of 23 organs (or cancer diseases) and come from multiple institutions. We demonstrate that for a specific target dataset, training with images from the same types of organs might usually be necessary for nucleus detection. Although the images can be visually similar due to the same staining technique and imaging protocol, deep models learned with images from different organs might not deliver desirable results and would require model fine-tuning to be on a par with those trained with target data. We also observe that training with a mixture of target and other/non-target data does not always mean a higher accuracy of nucleus detection, and it might require proper data manipulation during model training to achieve good performance.

Conclusions: We conduct a systematic case study on deep models for nucleus detection in a wide variety of microscopy images, aiming to address several important but previously understudied questions. We present and extensively evaluate an end-to-end, pixel-to-pixel fully convolutional regression network and report a few significant findings, some of which might not have been reported in previous studies. The model performance analysis and observations should be helpful for nucleus detection in microscopy images.


Off-the-shelf deep learning is not enough, and requires parsimony, Bayesianity, and causality

npj Computational Materials, 2021, Vol 7 (1). DOI: 10.1038/s41524-020-00487-0

Author(s): Rama K. Vasudevan, Maxim Ziatdinov, Lukas Vlcek, Sergei V. Kalinin

Keyword(s): Neural Networks, Computer Vision, Deep Learning, Bayesian Methods, Deep Neural Networks, Applied Research, Modern Science, Generative Models, Knowledge Development, Physical Constraints

Abstract: Deep neural networks (‘deep learning’) have emerged as a technology of choice to tackle problems in speech recognition, computer vision, finance, etc. However, adoption of deep learning in physical domains brings substantial challenges stemming from the correlative nature of deep learning methods compared to the causal, hypothesis-driven nature of modern science. We argue that the broad adoption of Bayesian methods incorporating prior knowledge, the development of solutions with incorporated physical constraints, parsimonious structural descriptors, and generative models, and ultimately the adoption of causal models offer a path forward for fundamental and applied research.


Extensive deep neural networks for transferring small scale learning to large scale systems

Chemical Science, 2019, Vol 10 (15), pp. 4129-4140. DOI: 10.1039/c8sc04578j. Cited By ~ 9

Author(s): Kyle Mills, Kevin Ryczko, Iryna Luchak, Adam Domurad, Chris Beeler, ...

Keyword(s): Neural Network, Neural Networks, Large Scale, Deep Neural Network, Deep Neural Networks, Small Scale, Large Scale Systems, Energy Entropy, Large Systems, Number Of Particles

We present a physically-motivated topology of a deep neural network that can efficiently infer extensive parameters (such as energy, entropy, or number of particles) of arbitrarily large systems, doing so with scaling.


Improving human cortical sulcal curve labeling in large scale cross-sectional MRI using deep neural networks

Journal of Neuroscience Methods, 2019, Vol 324, pp. 108311. DOI: 10.1016/j.jneumeth.2019.108311. Cited By ~ 2

Author(s): Prasanna Parvathaneni, Vishwesh Nath, Maureen McHugo, Yuankai Huo, Susan M. Resnick, ...

Keyword(s): Neural Networks, Large Scale, Deep Neural Networks, Cross Sectional


MSpectraAI: a powerful platform for deciphering proteome profiling of multi-tumor mass spectrometry data by using deep neural networks

BMC Bioinformatics, 2020, Vol 21 (1). DOI: 10.1186/s12859-020-03783-0

Author(s): Shisheng Wang, Hongwen Zhu, Hu Zhou, Jingqiu Cheng, Hao Yang

Keyword(s): Mass Spectrometry, Neural Networks, Large Scale, Deep Neural Networks, Spectral Feature, Mass Spectrometry Data, Learning Approaches, Proteomics Data, Proteome Profiling, Analytical Technique

Abstract:

Background: Mass spectrometry (MS) has become a promising analytical technique to acquire proteomics information for the characterization of biological samples. Nevertheless, most studies focus on the final proteins identified through a suite of algorithms that compare partial MS spectra with the sequence database, while the pattern recognition and classification of raw mass-spectrometric data remain unresolved.

Results: We developed an open-source and comprehensive platform, named MSpectraAI, for analyzing large-scale MS data through deep neural networks (DNNs); this system involves spectral-feature swath extraction, classification, and visualization. Moreover, this platform allows users to create their own DNN model by using Keras. To evaluate this tool, we collected the publicly available proteomics datasets of six tumor types (a total of 7,997,805 mass spectra) from the ProteomeXchange consortium and classified the samples based on the spectra profiling. The results suggest that MSpectraAI can distinguish different types of samples based on the fingerprint spectrum and achieve better prediction accuracy at the MS1 level (average 0.967).

Conclusion: This study deciphers proteome profiling of raw mass spectrometry data and broadens the promising application of the classification and prediction of proteomics data from multi-tumor samples using deep learning methods. MSpectraAI also shows better performance than the other classical machine learning approaches.
