A Zeroth-Order Adaptive Learning Rate Method to Reduce Cost of Hyperparameter Tuning for Deep Learning

2021 ◽  
Vol 11 (21) ◽  
pp. 10184
Author(s):  
Yanan Li ◽  
Xuebin Ren ◽  
Fangyuan Zhao ◽  
Shusen Yang

Owing to its powerful data representation ability, deep learning has dramatically improved the state of the art in many practical applications. However, its utility depends heavily on the fine-tuning of hyper-parameters, including learning rate, batch size, and network initialization. Although many first-order adaptive methods (e.g., Adam, Adagrad) have been proposed to adjust the learning rate based on gradients, they are sensitive to the initial learning rate and network architecture. Therefore, the main challenge of using deep learning in practice is how to reduce the cost of tuning hyper-parameters. To address this, we propose a heuristic zeroth-order learning rate method, Adacomp, which adaptively adjusts the learning rate based only on values of the loss function. The main idea is that Adacomp penalizes large learning rates to ensure convergence and compensates small learning rates to accelerate the training process. Therefore, Adacomp is robust to the initial learning rate. Extensive experiments were conducted, including comparisons with six typical adaptive methods (Momentum, Adagrad, RMSprop, Adadelta, Adam, and Adamax) on several benchmark datasets for image classification (MNIST, KMNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100). Experimental results show that Adacomp is robust not only to the initial learning rate but also to the network architecture, network initialization, and batch size.
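The abstract describes the method only at a high level, so the following is a minimal sketch of a loss-value-only (zeroth-order) learning rate adjustment, not the Adacomp update rule itself. The function names `adjust_lr` and `train_quadratic`, the factors `shrink` and `grow`, and the toy objective are all assumptions made for illustration.

```python
def adjust_lr(lr, prev_loss, curr_loss, shrink=0.5, grow=1.1):
    """Return a new learning rate using only two loss values (no gradients).

    Hypothetical rule, not the published Adacomp formula:
    penalize the rate when the loss went up, compensate it when the loss went down.
    """
    if curr_loss > prev_loss:
        # The last step made things worse: the rate was probably too large.
        return lr * shrink
    # The loss decreased: cautiously raise a possibly too-small rate.
    return lr * grow


def train_quadratic(lr, steps=50):
    """Minimize the toy objective f(w) = w**2 with the rule above; return the final loss."""
    w = 5.0
    prev_loss = w * w
    for _ in range(steps):
        w -= lr * 2.0 * w  # plain gradient step; the gradient of w**2 is 2w
        curr_loss = w * w
        lr = adjust_lr(lr, prev_loss, curr_loss)
        prev_loss = curr_loss
    return prev_loss
```

On this toy problem a fixed rate of 1.5 would diverge, whereas `train_quadratic(1.5)` recovers because the rate is cut whenever the loss rises. That is the kind of robustness to the initial learning rate the abstract refers to, shown here only for a one-dimensional quadratic.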

2021 ◽  
Author(s):  
Ryan Santoso ◽  
Xupeng He ◽  
Marwa Alsinan ◽  
Hyung Kwak ◽  
Hussein Hoteit

Automatic fracture recognition from borehole images or outcrops is applicable to the construction of fractured reservoir models. Deep learning for fracture recognition is subject to uncertainty due to sparse and imbalanced training sets and random initialization. We present a new workflow to optimize a deep learning model under uncertainty using U-Net. We consider both the epistemic and the aleatoric uncertainty of the model. We propose a U-Net architecture in which a dropout layer is inserted after every "weighting" layer. We vary the dropout probability to investigate its impact on the uncertainty response. We build the training set and assign a uniform distribution to each training parameter, such as the number of epochs, batch size, and learning rate. We then perform uncertainty quantification by running the model multiple times for each realization, where we capture the aleatoric response. In this approach, which is based on Monte Carlo Dropout, the variance map and F1-scores are used to decide whether to craft additional augmentations or stop the process. This work demonstrates the existence of uncertainty within deep learning caused by sparse and imbalanced training sets. This issue leads to unstable predictions. The overall responses are accommodated in the form of aleatoric uncertainty. Our workflow uses the uncertainty response (variance map) as a measure to craft additional augmentations in the training set. High variance in certain features denotes the need to add new augmented images containing those features, either through affine transformation (rotation, translation, and scaling) or by using similar images. The augmentation improves the accuracy of the prediction, reduces the prediction variance, and stabilizes the output. Architecture, number of epochs, batch size, and learning rate are optimized under a fixed-uncertain training set. We perform the optimization by searching for the global maximum of accuracy after running multiple realizations.
Besides the quality of the training set, the learning rate is the most influential factor in the optimization process. The selected learning rate controls the diffusion of information in the model. Under the imbalanced condition, high learning rates cause the model to miss the main features. The other challenge in fracture recognition on a real outcrop is to optimally pick the parental images used to generate the initial training set. We suggest picking images from multiple sides of the outcrop, which show significant variations of the features. This technique is needed to avoid long iterations within the workflow. We introduce a new approach to address the uncertainties associated with the training process and with the physical problem. The proposed approach is general in concept and can be applied to various deep-learning problems in geoscience.
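The Monte Carlo Dropout step in this workflow can be illustrated with a minimal sketch: dropout stays active at prediction time, the same input is passed through the model many times, and the spread of the outputs is summarized as a variance. The one-layer linear "model", the function names, and the default values below are assumptions for illustration; the paper applies the idea to a U-Net and obtains a variance per pixel.

```python
import random
from statistics import mean, pvariance


def dropout_forward(x, weights, p, rng):
    """One stochastic forward pass: each input is dropped with probability p.

    Kept inputs are rescaled by 1 / (1 - p) (inverted dropout).
    """
    kept = [xi / (1.0 - p) if rng.random() >= p else 0.0 for xi in x]
    return sum(w * k for w, k in zip(weights, kept))


def mc_dropout(x, weights, p=0.5, passes=200, seed=0):
    """Repeat the stochastic pass and return (mean prediction, variance)."""
    rng = random.Random(seed)
    outputs = [dropout_forward(x, weights, p, rng) for _ in range(passes)]
    return mean(outputs), pvariance(outputs)


# Hypothetical input and weights, chosen only to make the example concrete.
prediction, variance = mc_dropout([1.0, 2.0, 3.0], [0.5, -1.0, 2.0])
```

With `p=0` every pass is identical and the variance is zero. A larger variance flags inputs on which the model is unstable, which is the signal the workflow uses to decide where further augmentation is needed.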


In the recent past, Deep Learning models [1] have been predominantly used in Object Detection algorithms because of their accurate Image Recognition capability. These models extract features from the input images and videos [2] to identify the objects present in them. Applications of these models include Image Processing, Video Analysis, Speech Recognition, Biomedical Image Analysis, Biometric Recognition, Iris Recognition, National Security applications, Cyber Security, Natural Language Processing [3], Weather Forecasting applications, Renewable Energy Generation Scheduling, etc. These models use the concept of the Convolutional Neural Network (CNN) [3], which consists of several layers of artificial neurons. The accuracy of Deep Learning models [1] depends on various parameters such as ‘Learning-rate’, ‘Training batch size’, ‘Validation batch size’, ‘Activation Function’, ‘Drop-out rate’, etc. These parameters are known as Hyper-Parameters. Object detection accuracy depends on the selection of Hyper-Parameters, which in turn determines the optimum accuracy. Hence, finding the best values for these parameters is a challenging task. Fine-Tuning is a process used to select a suitable Hyper-Parameter value to improve object detection accuracy. Selection of an inappropriate Hyper-Parameter value leads to Over-Fitting or Under-Fitting of data. Over-Fitting is the case in which the training data is larger than required, which results in learning noise and inaccurate object detection. Under-Fitting is the case in which the model is unable to capture the trend of the data, which leads to more erroneous results on testing or training data. In this paper, a balance between Over-Fitting and Under-Fitting is achieved by varying the ‘Learning-rate’ of various Deep Learning models. Four Deep Learning models, namely VGG16, VGG19, InceptionV3 and Xception, are considered in this paper for analysis.
The best zone of Learning-rate for each model, with respect to maximum Object Detection accuracy, is analyzed. In this paper, a dataset of 70 object classes is used, and the prediction accuracy is analyzed by changing the ‘Learning-rate’ while keeping the rest of the Hyper-Parameters constant. This paper mainly concentrates on the impact of ‘Learning-rate’ on accuracy and identifies an optimum accuracy zone in Object Detection.


Insects ◽  
2020 ◽  
Vol 11 (8) ◽  
pp. 458
Author(s):  
Sijing Ye ◽  
Shuhan Lu ◽  
Xuesong Bai ◽  
Jinfeng Gu

Locusts are agricultural pests found in many parts of the world. Developing efficient and accurate locust information acquisition techniques helps in understanding the relation between locust distribution density and structural changes in locust communities. It also helps in understanding the hydrothermal and vegetation growth conditions that affect locusts in their habitats in various parts of the world, as well as in providing rapid and accurate warnings of locust plague outbreaks. This study is a preliminary attempt to explore whether a batch normalization-based convolutional neural network (CNN) model can be used to perform automatic classification of East Asian migratory locust (AM locust), Oxya chinensis (rice locusts), and cotton locusts. In this paper, we present a way of applying the CNN technique to identify species and instars of locusts using the proposed ResNet-Locust-BN model. This model is based on the ResNet architecture and introduces a BatchNorm function before each convolution layer to improve the network’s stability, convergence speed, and classification accuracy. Subsequently, locust image data collected in the field were used as input to train the model. Based on comparison experiments on the activation function, initial learning rate, and batch size, we selected ReLU as the preferred activation function. The initial learning rate and batch size were set to 0.1 and 32, respectively. Experiments performed to evaluate the accuracy of the proposed ResNet-Locust-BN model show that the model can effectively distinguish AM locust from rice locusts (93.60% accuracy) and cotton locusts (97.80% accuracy). The model also performed well in identifying the growth status information of AM locusts (third-instar (77.20% accuracy), fifth-instar (88.40% accuracy), and adult (93.80% accuracy)) with an overall accuracy of 90.16%.
This is higher than the accuracy scores obtained by using other typical models: AlexNet (73.68%), GoogLeNet (69.12%), ResNet 18 (67.60%), ResNet 50 (80.84%), and VggNet (81.70%). Further, the model shows good robustness and a fast convergence rate.
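For readers unfamiliar with the BatchNorm operation mentioned above, here is a minimal sketch of the standard batch normalization transform applied to one feature across a mini-batch. It is not the ResNet-Locust-BN implementation. The function name, the default `gamma`, `beta`, and `eps` values, and the sample activations are assumptions.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift it.

    y = gamma * (x - mean) / sqrt(variance + eps) + beta
    """
    n = len(values)
    batch_mean = sum(values) / n
    batch_var = sum((v - batch_mean) ** 2 for v in values) / n
    scale = gamma / math.sqrt(batch_var + eps)
    return [scale * (v - batch_mean) + beta for v in values]


# Hypothetical activations of a single channel for a batch of four images.
normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
```

The output has approximately zero mean and unit variance regardless of the scale of the input, which is why the transform tends to make training less sensitive to settings such as the initial learning rate.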


2021 ◽  
Vol 30 (1) ◽  
pp. 1-18
Author(s):  
Yusuf Hendrawan ◽  
Shinta Widyaningtyas ◽  
Muchammad Riza Fauzy ◽  
Sucipto Sucipto ◽  
Retno Damayanti ◽  
...  

Luwak coffee (palm civet coffee) is known as one of the most expensive coffees in the world. To lower production costs, Indonesian producers and retailers often mix high-priced Luwak coffee with regular coffee green beans. However, the absence of tools and methods for classifying Luwak coffee counterfeiting makes the development of a sensing method urgent. This research aimed to detect and classify the purity of Luwak coffee green beans into the following categories: very low (0-25%), low (25-50%), medium (50-75%), and high (75-100%). The classification method relied on a low-cost commercial visible light camera and a deep learning model. The research also compared the performance of four pre-trained convolutional neural network (CNN) models: SqueezeNet, GoogLeNet, ResNet-50, and AlexNet. In addition, a sensitivity analysis was performed by varying CNN parameters such as the optimization technique (SGDm, Adam, RMSProp) and the initial learning rate (0.00005 and 0.0001). In training and validation, GoogLeNet was the best CNN model, with the Adam optimizer and a learning rate of 0.0001, reaching 89.65% accuracy. In testing with a confusion matrix on different sample data, the best CNN model was ResNet-50 with the RMSProp optimizer and a learning rate of 0.0001, with an average accuracy of up to 85.00%. In the future, the CNN model can be used to establish a real-time, non-destructive, rapid, and precise purity detection system.


2020 ◽  
Vol 10 (17) ◽  
pp. 5792 ◽  
Author(s):  
Biserka Petrovska ◽  
Tatjana Atanasova-Pacemska ◽  
Roberto Corizzo ◽  
Paolo Mignone ◽  
Petre Lameski ◽  
...  

Remote Sensing (RS) image classification has recently attracted great attention for its application in different tasks, including environmental monitoring, battlefield surveillance, and geospatial object detection. The best practices for these tasks often involve transfer learning from pre-trained Convolutional Neural Networks (CNNs). A common approach in the literature is to employ CNNs for feature extraction and subsequently train classifiers that exploit such features. In this paper, we propose the adoption of transfer learning by fine-tuning pre-trained CNNs for end-to-end aerial image classification. Our approach performs feature extraction from the fine-tuned neural networks and remote sensing image classification with a Support Vector Machine (SVM) model with linear and Radial Basis Function (RBF) kernels. To tune the learning rate hyperparameter, we employ a linear decay learning rate scheduler as well as cyclical learning rates. Moreover, in order to mitigate the overfitting problem of pre-trained models, we apply label smoothing regularization. For the fine-tuning and feature extraction process, we adopt the Inception-v3 and Xception inception-based CNNs, as well as the residual-based networks ResNet50 and DenseNet121. We present extensive experiments on two real-world remote sensing image datasets: AID and NWPU-RESISC45. The results show that the proposed method exhibits classification accuracy of up to 98%, outperforming other state-of-the-art methods.
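The three training techniques named above (linear decay, cyclical learning rates, and label smoothing) have simple standard forms, sketched below. The abstract does not give the authors' exact settings, so the function names, the triangular cycle shape, and all numeric values are assumptions.

```python
def linear_decay_lr(step, total_steps, base_lr):
    """Learning rate that falls linearly from base_lr at step 0 to 0 at total_steps."""
    return base_lr * (1.0 - step / total_steps)


def triangular_clr(step, step_size, min_lr, max_lr):
    """Triangular cyclical rate: min_lr -> max_lr -> min_lr every 2 * step_size steps."""
    cycle_pos = step % (2 * step_size)
    frac = cycle_pos / step_size  # runs from 0 to 2 within one cycle
    if frac > 1.0:
        frac = 2.0 - frac  # descending half of the triangle
    return min_lr + (max_lr - min_lr) * frac


def smooth_labels(one_hot, eps=0.1):
    """Label smoothing: mix a one-hot target with the uniform distribution."""
    k = len(one_hot)
    return [(1.0 - eps) * y + eps / k for y in one_hot]


# Hypothetical settings: a 4-class target and a cycle of 20 steps.
target = smooth_labels([0.0, 1.0, 0.0, 0.0])
rates = [triangular_clr(s, 10, 0.001, 0.01) for s in range(40)]
```

Smoothing lowers the target for the true class slightly and gives every other class a small non-zero target, which discourages over-confident predictions. The smoothed vector still sums to 1.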


2019 ◽  
Vol 3 (2) ◽  
pp. 422
Author(s):  
Jaya Tata Hardinata ◽  
Harly Okprana ◽  
Agus Perdana Windarto ◽  
Widodo Saputra

Backpropagation is an artificial neural network method whose architecture is trained to determine the right parameters, so that it produces the correct output for inputs that are similar, but not identical, to those seen in training. One of the parameters that influences the backpropagation architecture is the learning rate: if the learning rate is too high, the network becomes unstable, whereas if it is too low, the network converges but takes a long time to train. The research data are secondary data sourced from the UCI Machine Learning Repository. The best network architecture in this study is 13-10-3, tested with different learning rates ranging from 0.01, 0.03, 0.06, 0.01, 0.13, 0.16, 0.2, 0.23, 0.026, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.9. Across the 21 learning rate values tested on the 13-10-3 architecture, the learning rate proves very important for obtaining an accurate and fast network. This can be seen in the experiments, where a learning rate of 0.65 produces better accuracy than learning rates smaller than 0.65.
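The trade-off described above (too high a learning rate is unstable, too low is slow) can be reproduced on a toy problem. The sketch below uses plain gradient descent on f(w) = w**2 rather than the 13-10-3 network of the study. The function name, tolerance, step limit, and the learning rates tried are assumptions chosen for illustration.

```python
def steps_to_converge(lr, tol=1e-6, max_steps=10_000):
    """Gradient descent on f(w) = w**2 from w = 1.

    Returns the number of steps until |w| < tol, or None if the run
    diverges or does not converge within max_steps.
    """
    w = 1.0
    for step in range(1, max_steps + 1):
        w -= lr * 2.0 * w  # the gradient of w**2 is 2w
        if abs(w) > 1e12:
            return None  # unstable: the learning rate is too high
        if abs(w) < tol:
            return step
    return None  # too slow: the learning rate is too low for this budget


# Sweep a few hypothetical learning rates, from too low to too high.
for lr in (0.0001, 0.01, 0.3, 1.2):
    print(lr, steps_to_converge(lr))
```

In this sweep the smallest rate runs out of steps, the largest diverges, and the moderate rates converge, with 0.3 needing far fewer steps than 0.01. The best value depends on the problem, so the numbers here say nothing about the 0.65 found in the study.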


2019 ◽  
Vol 11 (5) ◽  
pp. 523 ◽  
Author(s):  
Charlotte Pelletier ◽  
Geoffrey Webb ◽  
François Petitjean

The latest remote sensing sensors are capable of acquiring high spatial and spectral Satellite Image Time Series (SITS) of the world. These image series are a key component of classification systems that aim at obtaining up-to-date and accurate land cover maps of the Earth’s surfaces. More specifically, current SITS combine high temporal, spectral and spatial resolutions, which makes it possible to closely monitor vegetation dynamics. Although traditional classification algorithms, such as Random Forest (RF), have been successfully applied to create land cover maps from SITS, these algorithms do not make the most of the temporal domain. This paper proposes a comprehensive study of Temporal Convolutional Neural Networks (TempCNNs), a deep learning approach which applies convolutions in the temporal dimension in order to automatically learn temporal (and spectral) features. The goal of this paper is to quantitatively and qualitatively evaluate the contribution of TempCNNs for SITS classification, as compared to RF and Recurrent Neural Networks (RNNs), a standard deep learning approach that is particularly suited to temporal data. We carry out experiments on a Formosat-2 scene with 46 images and one million labelled time series. The experimental results show that TempCNNs are more accurate than the current state of the art for SITS classification. We provide some general guidelines on the network architecture, common regularization mechanisms, and hyper-parameter values such as batch size; we also draw out some differences with standard results in computer vision (e.g., about pooling layers). Finally, we assess the visual quality of the land cover maps produced by TempCNNs.
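To make "convolutions in the temporal dimension" concrete, here is a minimal sketch of a single one-dimensional filter sliding along the time series of one pixel. In a TempCNN the filter weights are learned and many filters are stacked. The fixed difference kernel, the function name, and the sample values below are assumptions for illustration only.

```python
def temporal_conv1d(series, kernel):
    """'Valid' 1-D convolution along time (cross-correlation, as used in deep learning)."""
    k = len(kernel)
    return [
        sum(kernel[j] * series[t + j] for j in range(k))
        for t in range(len(series) - k + 1)
    ]


# Made-up vegetation-index values for one pixel over six acquisition dates.
pixel_series = [0.2, 0.3, 0.6, 0.8, 0.7, 0.4]

# A hand-picked kernel that responds to change between dates t and t + 2.
change = temporal_conv1d(pixel_series, [-1.0, 0.0, 1.0])
```

The output is positive while the index is rising and negative while it is falling, so even this fixed filter exposes temporal structure that a classifier treating the dates as unordered features would not see directly.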


With the availability of high-performance hardware at lower prices, it has become possible to successfully train multi-layered neural networks. Since then, several training algorithms have been developed, from algorithms that are statically initialized to algorithms that change adaptively. It is observed that, to improve the training process of neural networks, the hyper-parameters need to be fine-tuned. Learning rate, decay rate, number of epochs, number of hidden layers, and number of neurons in the network are some of the hyper-parameters of concern. Of these, the learning rate plays a crucial role in enhancing the learning capability of the network. The learning rate is the value by which the weights of a neural network are adjusted along the gradient as it descends towards the expected optimum. This paper discusses four types of learning rate scheduling that help to find the best learning rates in fewer epochs. Following these scheduling methods makes it easier to find a better initial learning rate and to update it step-wise during the later phase of training. In addition, the discussed learning rate schedules are demonstrated using the COIL-100, Caltech-101 and CIFAR-10 datasets trained on ResNet. The performance is evaluated using the metrics Precision, Recall and F1-Score. Analysis of the results shows that the performance of a learning rate scheduling policy varies with the nature of the dataset. Hence, the choice of scheduling policy for training a neural network should be based on the data.
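The abstract does not name its four scheduling policies, so the sketch below shows three commonly used schedules purely as illustrations of what a learning rate schedule is. They are not necessarily the ones the paper evaluates, and all function names and default constants are assumptions.

```python
import math


def step_decay(epoch, base_lr, drop=0.5, epochs_per_drop=10):
    """Multiply the rate by `drop` once every `epochs_per_drop` epochs."""
    return base_lr * drop ** (epoch // epochs_per_drop)


def exponential_decay(epoch, base_lr, k=0.1):
    """Smooth decay: base_lr * exp(-k * epoch)."""
    return base_lr * math.exp(-k * epoch)


def time_based_decay(epoch, base_lr, decay=0.01):
    """Decay in inverse proportion to elapsed epochs: base_lr / (1 + decay * epoch)."""
    return base_lr / (1.0 + decay * epoch)


# Compare the three schedules over 30 epochs from a hypothetical base rate of 0.1.
for epoch in (0, 10, 20, 30):
    print(epoch,
          step_decay(epoch, 0.1),
          exponential_decay(epoch, 0.1),
          time_based_decay(epoch, 0.1))
```

All three start at the same rate and only differ in how fast and how smoothly they reduce it. That difference is the kind of choice the abstract says should be made per dataset.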


2020 ◽  
Vol 19 (2) ◽  
pp. 151
Author(s):  
Ida Bagus Leo Mahadya Suta ◽  
Made Sudarma ◽  
I Nyoman Satya Kumara

Brain tumors are among the deadliest diseases, with 3.7% per 100,000 patients having malignant tumors. Brain tumors can be analyzed through segmentation of Magnetic Resonance Imaging (MRI) images. Automatic image analysis is needed to save time and improve the accuracy of the diagnosis. Automatic segmentation can be performed with deep learning. U-NET is one of the methods used for medical image segmentation because it works at the pixel level. By applying the ReLU activation function and the Adam optimizer, this method can solve the brain tumor segmentation problem. The dataset for the training and validation processes is BRATS 2017. Several hyperparameters are applied in this method, namely learning rate (lr) = 0.0001, batch size (bz) = 5, epoch = 80, and beta (β) = 0.9. From the series of processes carried out, the accuracy of the U-NET method was calculated using the Dice Coefficient formula, yielding the following accuracy values: 90.22% (Full Tumor), 78.09% (Core Tumor), and 80.20% (Enhancing Tumor).
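The Dice Coefficient used for evaluation has the standard definition 2|A ∩ B| / (|A| + |B|) for a predicted mask A and a ground-truth mask B. The sketch below computes it for flat binary masks. The function name, the handling of two empty masks, and the sample masks are assumptions, not details taken from the paper.

```python
def dice_coefficient(pred, target):
    """Dice score for two equal-length binary masks given as lists of 0/1 values."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        return 1.0  # both masks empty: treated here as perfect agreement (an assumption)
    return 2.0 * intersection / total


# Hypothetical 2x3 masks, flattened row by row.
predicted = [1, 1, 0, 0, 1, 0]
ground_truth = [1, 0, 0, 0, 1, 1]
score = dice_coefficient(predicted, ground_truth)
```

Here two of the three predicted tumor pixels overlap the three ground-truth pixels, so the score is 2 * 2 / (3 + 3), about 0.67. A score of 1.0 means the masks match exactly and 0.0 means they do not overlap at all.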

