A Comparison of Optimization Algorithms for Deep Learning

In recent years, we have witnessed the rise of deep learning. Deep neural networks have proved their success in many areas. However, the optimization of these networks has become more difficult as neural networks going deeper and datasets becoming bigger. Therefore, more advanced optimization algorithms have been proposed over the past years. In this study, widely used optimization algorithms for deep learning are examined in detail. To this end, these algorithms called adaptive gradient methods are implemented for both supervised and unsupervised tasks. The behavior of the algorithms during training and results on four image datasets, namely, MNIST, CIFAR-10, Kaggle Flowers and Labeled Faces in the Wild are compared by pointing out their differences against basic optimization algorithms.

Download Full-text

DeepQGHO: Quantized Greedy Hyperparameter Optimization in Deep Neural Networks for on-the-fly Learning

10.21203/rs.3.rs-1146054/v1 ◽

2021 ◽

Author(s):

Anjir Ahmed Chowdhury ◽

Md Abir Hossen ◽

Md Ali Azam ◽

Md. Hafizur Rahman

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Energy Consumption ◽

Deep Neural Networks ◽

State Of The Art ◽

Optimization Algorithms ◽

Computation Time ◽

Hyperparameter Optimization ◽

Computationally Expensive ◽

Time And Energy

Abstract Hyperparameter optimization or tuning plays a significant role in the performance and reliability of deep learning (DL). Many hyperparameter optimization algorithms have been developed for obtaining better validation accuracy in DL training. Most state-of-the-art hyperparameters are computationally expensive due to a focus on validation accuracy. Therefore, they are unsuitable for online or on-the-fly training applications which require computational efficiency. In this paper, we develop a novel greedy approach-based hyperparameter optimization (GHO) algorithm for faster training applications, e.g., on-the-fly training. We perform an empirical study to compute the performance such as computation time and energy consumption of the GHO and compare it with two state-of-the-art hyperparameter optimization algorithms. We also deploy the GHO algorithm in an edge device to validate the performance of our algorithm. We perform post-training quantization to the GHO algorithm to reduce inference time and latency.

Download Full-text

Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1719367115 ◽

2018 ◽

Vol 115 (25) ◽

pp. E5716-E5725 ◽

Cited By ~ 184

Author(s):

Mohammad Sadegh Norouzzadeh ◽

Anh Nguyen ◽

Margaret Kosmala ◽

Alexandra Swanson ◽

Meredith S. Palmer ◽

...

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Deep Neural Networks ◽

Data Extraction ◽

High Volume ◽

Camera Trap ◽

Camera Traps ◽

Deep Convolutional Neural Networks ◽

Manual Task ◽

In The Wild

Having accurate, detailed, and up-to-date information about the location and behavior of animals in the wild would improve our ability to study and conserve ecosystems. We investigate the ability to automatically, accurately, and inexpensively collect such data, which could help catalyze the transformation of many fields of ecology, wildlife biology, zoology, conservation biology, and animal behavior into “big data” sciences. Motion-sensor “camera traps” enable collecting wildlife pictures inexpensively, unobtrusively, and frequently. However, extracting information from these pictures remains an expensive, time-consuming, manual task. We demonstrate that such information can be automatically extracted by deep learning, a cutting-edge type of artificial intelligence. We train deep convolutional neural networks to identify, count, and describe the behaviors of 48 species in the 3.2 million-image Snapshot Serengeti dataset. Our deep neural networks automatically identify animals with >93.8% accuracy, and we expect that number to improve rapidly in years to come. More importantly, if our system classifies only images it is confident about, our system can automate animal identification for 99.3% of the data while still performing at the same 96.6% accuracy as that of crowdsourced teams of human volunteers, saving >8.4 y (i.e., >17,000 h at 40 h/wk) of human labeling effort on this 3.2 million-image dataset. Those efficiency gains highlight the importance of using deep neural networks to automate data extraction from camera-trap images, reducing a roadblock for this widely used technology. Our results suggest that deep learning could enable the inexpensive, unobtrusive, high-volume, and even real-time collection of a wealth of information about vast numbers of animals in the wild.

Download Full-text

Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation

Acta Numerica ◽

10.1017/s0962492921000039 ◽

2021 ◽

Vol 30 ◽

pp. 203-248

Author(s):

Mikhail Belkin

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Mathematical Theory ◽

Deep Neural Networks ◽

Noisy Data ◽

Theory And Practice ◽

The Past ◽

Complex Picture ◽

Modern Machine

In the past decade the mathematical theory of machine learning has lagged far behind the triumphs of deep neural networks on practical challenges. However, the gap between theory and practice is gradually starting to close. In this paper I will attempt to assemble some pieces of the remarkable and still incomplete mathematical mosaic emerging from the efforts to understand the foundations of deep learning. The two key themes will be interpolation and its sibling over-parametrization. Interpolation corresponds to fitting data, even noisy data, exactly. Over-parametrization enables interpolation and provides flexibility to select a suitable interpolating model.As we will see, just as a physical prism separates colours mixed within a ray of light, the figurative prism of interpolation helps to disentangle generalization and optimization properties within the complex picture of modern machine learning. This article is written in the belief and hope that clearer understanding of these issues will bring us a step closer towards a general theory of deep learning and machine learning.

Download Full-text

Fastened CROWN: Tightened Neural Network Robustness Certificates

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5944 ◽

2020 ◽

Vol 34 (04) ◽

pp. 5037-5044

Author(s):

Zhaoyang Lyu ◽

Ching-Yun Ko ◽

Zhifeng Kong ◽

Ngai Wong ◽

Dahua Lin ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

Linear Programming ◽

Deep Learning ◽

Deep Neural Networks ◽

Convex Relaxation ◽

Real Life ◽

Network Robustness ◽

The Past ◽

Computationally Expensive

The rapid growth of deep learning applications in real life is accompanied by severe safety concerns. To mitigate this uneasy phenomenon, much research has been done providing reliable evaluations of the fragility level in different deep neural networks. Apart from devising adversarial attacks, quantifiers that certify safeguarded regions have also been designed in the past five years. The summarizing work in (Salman et al. 2019) unifies a family of existing verifiers under a convex relaxation framework. We draw inspiration from such work and further demonstrate the optimality of deterministic CROWN (Zhang et al. 2018) solutions in a given linear programming problem under mild constraints. Given this theoretical result, the computationally expensive linear programming based method is shown to be unnecessary. We then propose an optimization-based approach FROWN (Fastened CROWN): a general algorithm to tighten robustness certificates for neural networks. Extensive experiments on various networks trained individually verify the effectiveness of FROWN in safeguarding larger robust regions.

Download Full-text

Towards Understanding Deep Learning from Noisy Labels with Small-Loss Criterion

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/340 ◽

2021 ◽

Author(s):

Xian-Jin Gui ◽

Wei Wang ◽

Zhang-Hao Tian

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Real World ◽

Deep Neural Networks ◽

Theoretical Explanation ◽

Small Loss ◽

The Past ◽

Real World Applications ◽

Theoretical Analyses ◽

Noisy Labels

Deep neural networks need large amounts of labeled data to achieve good performance. In real-world applications, labels are usually collected from non-experts such as crowdsourcing to save cost and thus are noisy. In the past few years, deep learning methods for dealing with noisy labels have been developed, many of which are based on the small-loss criterion. However, there are few theoretical analyses to explain why these methods could learn well from noisy labels. In this paper, we theoretically explain why the widely-used small-loss criterion works. Based on the explanation, we reformalize the vanilla small-loss criterion to better tackle noisy labels. The experimental results verify our theoretical explanation and also demonstrate the effectiveness of the reformalization.

Download Full-text

Automatic Detection of Arrhythmia Based on Multi-Resolution Representation of ECG Signal

Sensors ◽

10.3390/s20061579 ◽

2020 ◽

Vol 20 (6) ◽

pp. 1579

Author(s):

Dongqi Wang ◽

Qinghua Meng ◽

Dongming Chen ◽

Hupo Zhang ◽

Lisheng Xu

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Deep Neural Networks ◽

Channel Model ◽

Expert Knowledge ◽

Automatic Detection ◽

Data Representation ◽

Learning Technology ◽

Arrhythmia Detection ◽

Automatic Feature Extraction

Automatic detection of arrhythmia is of great significance for early prevention and diagnosis of cardiovascular disease. Traditional feature engineering methods based on expert knowledge lack multidimensional and multi-view information abstraction and data representation ability, so the traditional research on pattern recognition of arrhythmia detection cannot achieve satisfactory results. Recently, with the increase of deep learning technology, automatic feature extraction of ECG data based on deep neural networks has been widely discussed. In order to utilize the complementary strength between different schemes, in this paper, we propose an arrhythmia detection method based on the multi-resolution representation (MRR) of ECG signals. This method utilizes four different up to date deep neural networks as four channel models for ECG vector representations learning. The deep learning based representations, together with hand-crafted features of ECG, forms the MRR, which is the input of the downstream classification strategy. The experimental results of big ECG dataset multi-label classification confirm that the F1 score of the proposed method is 0.9238, which is 1.31%, 0.62%, 1.18% and 0.6% higher than that of each channel model. From the perspective of architecture, this proposed method is highly scalable and can be employed as an example for arrhythmia recognition.

Download Full-text

Modeling of aquifer vulnerability index using deep learning neural networks coupling with optimization algorithms

Environmental Science and Pollution Research ◽

10.1007/s11356-021-14522-0 ◽

2021 ◽

Author(s):

Hussam Eldin Elzain ◽

Sang Yong Chung ◽

Venkatramanan Senapathi ◽

Selvam Sekar ◽

Namsik Park ◽

...

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Optimization Algorithms ◽

Vulnerability Index ◽

Aquifer Vulnerability

Download Full-text

Enabling deeper learning on big data for materials informatics applications

Scientific Reports ◽

10.1038/s41598-021-83193-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Dipendra Jha ◽

Vishu Gupta ◽

Logan Ward ◽

Zijiang Yang ◽

Christopher Wolverton ◽

...

Keyword(s):

Neural Networks ◽

Big Data ◽

Deep Learning ◽

Deep Neural Networks ◽

Materials Science ◽

Prediction Models ◽

Model Performance ◽

Materials Informatics ◽

Learning Framework ◽

Significant Attention

AbstractThe application of machine learning (ML) techniques in materials science has attracted significant attention in recent years, due to their impressive ability to efficiently extract data-driven linkages from various input materials representations to their output properties. While the application of traditional ML techniques has become quite ubiquitous, there have been limited applications of more advanced deep learning (DL) techniques, primarily because big materials datasets are relatively rare. Given the demonstrated potential and advantages of DL and the increasing availability of big materials datasets, it is attractive to go for deeper neural networks in a bid to boost model performance, but in reality, it leads to performance degradation due to the vanishing gradient problem. In this paper, we address the question of how to enable deeper learning for cases where big materials data is available. Here, we present a general deep learning framework based on Individual Residual learning (IRNet) composed of very deep neural networks that can work with any vector-based materials representation as input to build accurate property prediction models. We find that the proposed IRNet models can not only successfully alleviate the vanishing gradient problem and enable deeper learning, but also lead to significantly (up to 47%) better model accuracy as compared to plain deep neural networks and traditional ML techniques for a given input materials representation in the presence of big data.

Download Full-text

Beyond Deep Learning: An Econometric Example

International Journal of Uncertainty Fuzziness and Knowledge-Based Systems ◽

10.1142/s0218488520400036 ◽

2020 ◽

Vol 28 (Supp01) ◽

pp. 31-38 ◽

Cited By ~ 1

Author(s):

Ruofan Liao ◽

Paravee Maneejuk ◽

Songsak Sriboonchitta

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Prediction Models ◽

Original Data ◽

Parametric Model ◽

Parametric Models ◽

The Past ◽

Currency Exchange ◽

Currency Exchange Rate ◽

Linear And Nonlinear

In the past, in many areas, the best prediction models were linear and nonlinear parametric models. In the last decade, in many application areas, deep learning has shown to lead to more accurate predictions than the parametric models. Deep learning-based predictions are reasonably accurate, but not perfect. How can we achieve better accuracy? To achieve this objective, we propose to combine neural networks with parametric model: namely, to train neural networks not on the original data, but on the differences between the actual data and the predictions of the parametric model. On the example of predicting currency exchange rate, we show that this idea indeed leads to more accurate predictions.

Download Full-text

Laplacian networks: bounding indicator function smoothness for neural networks robustness

APSIPA Transactions on Signal and Information Processing ◽

10.1017/atsip.2021.2 ◽

2021 ◽

Vol 10 ◽

Author(s):

Carlos Lassance ◽

Vincent Gripon ◽

Antonio Ortega

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Supervised Learning ◽

Indicator Function ◽

Training Data ◽

Theoretical Justification ◽

The Past ◽

Noisy Examples

For the past few years, deep learning (DL) robustness (i.e. the ability to maintain the same decision when inputs are subject to perturbations) has become a question of paramount importance, in particular in settings where misclassification can have dramatic consequences. To address this question, authors have proposed different approaches, such as adding regularizers or training using noisy examples. In this paper we introduce a regularizer based on the Laplacian of similarity graphs obtained from the representation of training data at each layer of the DL architecture. This regularizer penalizes large changes (across consecutive layers in the architecture) in the distance between examples of different classes, and as such enforces smooth variations of the class boundaries. We provide theoretical justification for this regularizer and demonstrate its effectiveness to improve robustness on classical supervised learning vision datasets for various types of perturbations. We also show it can be combined with existing methods to increase overall robustness.

Download Full-text