Learning Rate Scheduling Policies

With the availability of high-performance hardware at affordable prices, it has become feasible to train multi-layered neural networks successfully. Numerous training algorithms have since been developed, ranging from statically initialized algorithms to adaptively changing ones. It is observed that, to improve the training process of a neural network, its hyper-parameters must be fine-tuned. The learning rate, decay rate, number of epochs, number of hidden layers and number of neurons in the network are some of the hyper-parameters of concern. Of these, the learning rate plays a crucial role in enhancing the learning capability of the network: it is the value by which the weights are adjusted with respect to the gradient descending towards the expected optimum. This paper discusses four types of learning rate scheduling that help to find good learning rates in fewer epochs. Following these scheduling methods facilitates finding a better initial learning rate value and applying step-wise updates during the later phase of the training process. The discussed learning rate schedules are demonstrated using the COIL-100, Caltech-101 and CIFAR-10 datasets trained on ResNet, and performance is evaluated using the metrics precision, recall and F1-score. The analysis of the results shows that the performance of a learning rate scheduling policy varies with the nature of the dataset; hence the choice of the scheduling policy used to train a neural network should be made based on the data.
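Common learning rate schedule families can be sketched as pure functions of the epoch index; these are standard textbook formulas, not necessarily the four policies the paper evaluates:

```python
import math

def step_decay(lr0, epoch, drop=0.5, epochs_per_drop=10):
    """Multiply the rate by `drop` every `epochs_per_drop` epochs."""
    return lr0 * drop ** (epoch // epochs_per_drop)

def exponential_decay(lr0, epoch, k=0.1):
    """Smooth exponential decay: lr = lr0 * exp(-k * epoch)."""
    return lr0 * math.exp(-k * epoch)

def time_based_decay(lr0, epoch, decay=0.01):
    """Rate shrinks as 1 / (1 + decay * epoch)."""
    return lr0 / (1.0 + decay * epoch)

def cosine_annealing(lr0, epoch, total_epochs, lr_min=0.0):
    """Cosine curve from lr0 down to lr_min over the whole run."""
    cos = math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr0 - lr_min) * (1 + cos)
```

In each case the initial value `lr0` and the decay hyper-parameters are exactly the quantities the paper's step-wise updating is meant to tune.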

2016
Vol 25 (06)
pp. 1650033
Author(s):  
Hossam Faris ◽  
Ibrahim Aljarah ◽  
Nailah Al-Madi ◽  
Seyedali Mirjalili

Evolutionary Neural Networks have proven beneficial in solving challenging datasets, mainly due to their high local-optima avoidance. Stochastic operators in such techniques reduce the probability of stagnation in local solutions and help them supersede conventional training algorithms such as Back Propagation (BP) and Levenberg-Marquardt (LM). According to the No-Free-Lunch (NFL) theorem, however, there is no single optimization technique for solving all optimization problems. This means that a Neural Network trained by a new algorithm has the potential to solve a new set of problems or outperform current techniques on existing ones. This motivates our attempt to investigate the efficiency of the recently proposed Evolutionary Algorithm called the Lightning Search Algorithm (LSA) in training Neural Networks, for the first time in the literature. The LSA-based trainer is benchmarked on 16 popular medical diagnosis problems and compared to BP, LM, and six other evolutionary trainers. The quantitative and qualitative results show that the LSA algorithm provides not only better local-optima avoidance but also faster convergence compared to the other algorithms employed. In addition, the statistical tests conducted prove that the LSA-based trainer is significantly superior to the current algorithms on the majority of datasets.
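Population-based weight training of the kind compared here can be sketched with a simple elitist evolution strategy standing in for LSA (the lightning-search operators themselves are not reproduced; the network size and all settings are illustrative):

```python
import numpy as np

def mlp_predict(w, X, n_hidden):
    # Unpack a flat weight vector into a one-hidden-layer sigmoid MLP.
    n_in = X.shape[1]
    i = n_in * n_hidden
    W1 = w[:i].reshape(n_in, n_hidden)
    b1 = w[i:i + n_hidden]; i += n_hidden
    W2 = w[i:i + n_hidden]; i += n_hidden
    b2 = w[i]
    h = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

def evolve_weights(X, y, n_hidden=4, pop=30, gens=200, sigma=0.3, seed=0):
    """Elitist evolution strategy: keep the better half of the
    population, refill it with Gaussian-mutated copies."""
    rng = np.random.default_rng(seed)
    dim = X.shape[1] * n_hidden + n_hidden + n_hidden + 1
    population = rng.normal(0.0, 1.0, (pop, dim))
    def mse(w):
        return float(np.mean((mlp_predict(w, X, n_hidden) - y) ** 2))
    for _ in range(gens):
        scores = np.array([mse(w) for w in population])
        elite = population[np.argsort(scores)[:pop // 2]]
        children = elite + rng.normal(0.0, sigma, elite.shape)
        population = np.vstack([elite, children])
    return min(population, key=mse)
```

Because the elite always survives, the best error is non-increasing, which is the stagnation-avoidance property the abstract contrasts with gradient-based BP/LM training.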


2020
Author(s):  
B Wang ◽  
Y Sun ◽  
Bing Xue ◽  
Mengjie Zhang

© 2019, Springer Nature Switzerland AG. Image classification is a difficult machine learning task, to which Convolutional Neural Networks (CNNs) have been applied for over 20 years. In recent years, instead of the traditional way of only connecting the current layer with its next layer, shortcut connections have been proposed to connect the current layer with layers further forward, which has been proven to facilitate the training of deep CNNs. However, since there are various ways to build shortcut connections, it is hard to manually design the best ones when solving a particular problem, especially given that designing the network architecture is already very challenging. In this paper, a hybrid evolutionary computation (EC) method is proposed to automatically evolve both the architecture of deep CNNs and the shortcut connections. The three major contributions of this work are: firstly, a new encoding strategy is proposed to encode a CNN, where the architecture and the shortcut connections are encoded separately; secondly, a hybrid two-level EC method, which combines particle swarm optimisation and genetic algorithms, is developed to search for the optimal CNNs; lastly, an adjustable learning rate is introduced for the fitness evaluations, which provides a better learning rate for the training process given a fixed number of epochs. The proposed algorithm is evaluated on three widely used image classification benchmark datasets and compared with 12 non-EC-based peer competitors and one EC-based competitor. The experimental results demonstrate that the proposed method outperforms all of the peer competitors in terms of classification accuracy.
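The idea of encoding the architecture separately from the shortcut connections can be sketched as two independent chromosomes; the representation below (a list of filter counts plus one bit per candidate forward connection) is an illustrative guess at such an encoding, not the paper's exact strategy:

```python
import random

def random_architecture(max_layers=8, filter_choices=(32, 64, 128)):
    """First chromosome: one filter count per convolutional layer."""
    n = random.randint(2, max_layers)
    return [random.choice(filter_choices) for _ in range(n)]

def random_shortcuts(n_layers):
    """Second chromosome: one bit per candidate forward pair
    (i -> j with j > i + 1, i.e. skipping at least one layer)."""
    return {(i, j): random.random() < 0.5
            for i in range(n_layers) for j in range(i + 2, n_layers)}
```

Keeping the two parts separate lets a two-level search vary the layer settings and the skip pattern independently, which is the point of the hybrid PSO/GA scheme described above.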


2021
Vol 5 (2)
pp. 312-318
Author(s):  
Rima Dias Ramadhani ◽  
Afandi Nur Aziz Thohari ◽  
Condro Kartiko ◽  
Apri Junaidi ◽  
Tri Ginanjar Laksana ◽  
...  

Waste consists of goods or materials that have no value in the scope of production; in some cases it is disposed of carelessly and can damage the environment. In 2019 the Indonesian government recorded 66-67 million tons of waste, higher than the previous year's 64 million tons. Waste is differentiated by type into organic and inorganic waste. In the field of computer science, the type of waste can be recognized using a camera and the Convolutional Neural Network (CNN) method, a type of neural network that receives input in the form of images. The input is trained using a CNN architecture to produce output that can recognize the input object. This study optimizes the use of the CNN method to obtain accurate results in identifying types of waste. Optimization is done by adding several hyperparameters to the CNN architecture: with them, the accuracy is 91.2%, whereas without them it is only 67.6%. Three hyperparameters are used to increase the model's accuracy: dropout, padding, and stride. A dropout rate of 20% is applied to reduce overfitting during training, whereas padding and stride are used to speed up the model training process.
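How padding and stride change layer sizes (and hence training speed), and how dropout works, follow from standard formulas; a minimal sketch, with all names mine rather than taken from the paper:

```python
import random

def conv_output_size(n, k, padding=0, stride=1):
    """Spatial size of a convolution output: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - k) // stride + 1

def dropout(activations, rate=0.2):
    """Inverted dropout: zero each unit with probability `rate`,
    rescale survivors by 1 / (1 - rate) so the expected sum is kept."""
    keep = 1.0 - rate
    return [a / keep if random.random() < keep else 0.0 for a in activations]

# A 32x32 input with a 3x3 kernel: padding 1 preserves the size,
# while stride 2 halves it, shrinking all later feature maps.
```

With stride 2, each following layer processes a quarter of the pixels, which is the speed-up the abstract attributes to padding and stride choices.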


Author(s):  
Yunong Zhang ◽  
Ning Tan

Artificial neural networks (ANNs), especially those with error back-propagation (BP) training algorithms, have been widely investigated and applied in various science and engineering fields. However, BP algorithms are essentially gradient-based iterative methods, which adjust the neural-network weights to bring the network input/output behavior into a desired mapping by taking a gradient-based descent direction. This kind of iterative neural-network (NN) method has shown some inherent weaknesses, such as: 1) the possibility of being trapped in local minima, 2) the difficulty of choosing appropriate learning rates, and 3) the inability to design the optimal or smallest NN structure. To resolve such weaknesses of BP neural networks, we have asked ourselves a special question: could neural-network weights be determined directly, without iterative BP training? The answer appears to be YES, which is demonstrated in this chapter with three positive but different examples. In other words, a new type of artificial neural network with linearly independent or orthogonal activation functions is presented, analyzed, simulated and verified by us, whose weights and structure can be decided directly and more deterministically (in comparison with conventional BP neural networks).
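A minimal sketch of the direct-weight-determination idea: with a hidden layer of fixed, linearly independent activation functions (powers of the input here, chosen purely for illustration; the chapter's own basis functions may differ), the output weights are the solution of a single least-squares problem, with no learning rate and no iteration:

```python
import numpy as np

def direct_train(x, y, n_basis=6):
    """Solve for output weights in one shot via least squares over a
    fixed basis 1, x, x^2, ..., x^(n_basis-1)."""
    Phi = np.vander(x, n_basis, increasing=True)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

def direct_predict(w, x):
    return np.vander(x, len(w), increasing=True) @ w
```

None of the three BP weaknesses listed above arise: the quadratic problem has no spurious local minima, no learning rate exists to tune, and the structure is fixed by the chosen basis size.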


2000
Vol 68 (1)
pp. 57-64
Author(s):  
D. Kaiser ◽  
C. Tmej ◽  
P. Chiba ◽  
K.-J. Schaper ◽  
G. Ecker

A data set of 48 propafenone-type modulators of multidrug resistance was used to investigate the influence of the learning rate and momentum factor on the predictive power of artificial neural networks of different architectures. Generally, small learning rates and medium-sized momentum factors are preferred. Some of the networks showed higher cross-validated Q2 values than the corresponding linear model (0.87 vs. 0.83). Screening of a 158-compound virtual library identified several new lead compounds with activities in the nanomolar range.
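The two hyper-parameters studied here interact through the classical momentum update rule; a minimal sketch on a 1-D quadratic loss (a toy stand-in, not the QSAR networks of the study):

```python
def momentum_step(w, grad, velocity, lr=0.01, momentum=0.8):
    """Classical momentum update: v <- m*v - lr*grad; w <- w + v.
    Small lr keeps steps stable; moderate momentum smooths them."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity
```

On f(w) = (w - 3)^2 the gradient is 2(w - 3), and a small learning rate with a medium momentum factor converges smoothly to the minimum, matching the preference reported above.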


Author(s):  
MOHAMED ZINE EL ABIDINE SKHIRI ◽  
MOHAMED CHTOUROU

This paper investigates the applicability of the constructive approach proposed in Ref. 1 to wavelet neural networks (WNNs). Two incremental training algorithms are presented. The first, known as the one-pattern-at-a-time (OPAT) approach, is the WNN version of the method applied in Ref. 1. The second, known as the one-epoch-at-a-time (OEAT) approach, proposes a modified version of Ref. 1. In the OPAT approach, the input patterns are trained incrementally one by one until all patterns have been presented. If the algorithm gets stuck in a local minimum and cannot escape after a fixed number of successive attempts, a new wavelet, also called a wavelon, is recruited. In the OEAT approach, however, all the input patterns are presented one epoch at a time: during one epoch, each pattern is trained only once until all patterns have been trained. If the resulting overall error is reduced, all the patterns are retrained for one more epoch; otherwise, a new wavelon is recruited. To guarantee the convergence of the trained networks, an adaptive learning rate has been introduced using the discrete Lyapunov stability theorem.
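The OEAT control flow can be sketched as a generic loop, with the WNN-specific epoch training and wavelon recruitment passed in as callbacks (an illustrative skeleton, not the authors' implementation):

```python
def oeat_train(train_epoch, add_unit, n_epochs=50, tol=1e-6):
    """One-epoch-at-a-time loop: keep training while the overall
    error drops; recruit a new unit (wavelon) when it stalls."""
    prev = float("inf")
    for _ in range(n_epochs):
        err = train_epoch()        # one pass over all patterns
        if err >= prev - tol:
            add_unit()             # error stalled: grow the network
        prev = err
    return prev
```

The constructive step is triggered purely by the stall test, so the network only grows when the current size has stopped paying off.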


1999
Vol 11 (5)
pp. 1069-1077
Author(s):  
Danilo P. Mandic ◽  
Jonathon A. Chambers

A relationship between the learning rate η in the learning algorithm, and the slope β in the nonlinear activation function, for a class of recurrent neural networks (RNNs) trained by the real-time recurrent learning algorithm is provided. It is shown that an arbitrary RNN can be obtained via the referent RNN, with some deterministic rules imposed on its weights and the learning rate. Such relationships reduce the number of degrees of freedom when solving the nonlinear optimization task of finding the optimal RNN parameters.
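For a single sigmoid neuron this kind of η–β coupling can be checked numerically: training with slope β and rate η follows the same trajectory as a referent slope-1 neuron whose weight is scaled by β and whose rate is ηβ². This is only an illustrative special case; the paper's deterministic rules for full RNNs trained by real-time recurrent learning are more general.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(w, x, d, lr, beta, steps=20):
    """One sigmoid neuron with activation slope `beta`, trained by
    gradient descent on squared error for a single (x, d) pair."""
    for _ in range(steps):
        y = sigmoid(beta * w * x)
        w += lr * (d - y) * beta * y * (1 - y) * x
    return w
```

Scaling the trained weight of the (β, η) neuron by β reproduces the referent (1, ηβ²) neuron's weight, which is the degrees-of-freedom reduction the abstract describes.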


2019
Vol 2
pp. 1-8
Author(s):  
Dongeun Kim ◽  
Youngok Kang ◽  
Yearim Park ◽  
Nayeon Kim ◽  
Juyoon Lee ◽  
...  

Abstract. In this study we aim to analyze the urban image of Seoul as perceived by tourists through the photos uploaded on Flickr, one of the Social Network Service (SNS) platforms on which people can share geo-tagged photos. We first categorize the photos uploaded to the site by tourists and then perform image mining using a Convolutional Neural Network (CNN), one of the artificial neural networks with deep learning capability. We find that tourists are interested in old palaces, historical monuments, stores, food, etc., which are considered the signature sightseeing elements of Seoul; these key elements differ from the major sightseeing attractions within Seoul. The purpose of this study is two-fold: first, we analyze the image of Seoul by applying image mining to the photos uploaded on Flickr by tourists; second, we draw out significant sightseeing factors by region of attraction that tourists prefer to visit within Seoul.


2016
Vol 101 (1)
pp. 27-35
Author(s):  
Maria Mrówczyńska

Abstract The field of processing information provided by measurement results is one of the most important components of geodetic technologies. The dynamic development of this field improves classic algorithms for numerical calculations with respect to analytical solutions that are otherwise difficult to achieve. Algorithms based on artificial intelligence in the form of artificial neural networks, including the topology of connections between neurons, have become an important instrument for processing and modelling. This concept results from the integration of neural networks and parameter optimization methods, and it makes it possible to avoid arbitrarily defining the structure of a network. This kind of extension of the training process is exemplified by the Group Method of Data Handling (GMDH) algorithm, which belongs to the class of evolutionary algorithms. The article presents a GMDH-type network used for modelling deformations of the geometrical axis of a steel chimney during its operation.
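The GMDH idea of growing a network by fitting and selecting small polynomial "partial descriptions" can be sketched as follows. This is a toy single layer: a proper GMDH uses a separate selection set rather than training error and stacks such layers until the error stops improving.

```python
import itertools
import numpy as np

def fit_quadratic_pair(xi, xj, y):
    """One GMDH partial description:
    y ~ a0 + a1*xi + a2*xj + a3*xi*xj + a4*xi^2 + a5*xj^2."""
    A = np.column_stack([np.ones_like(xi), xi, xj, xi * xj, xi ** 2, xj ** 2])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, A @ coef

def gmdh_layer(X, y, keep=4):
    """Fit every feature pair, keep the `keep` best models' outputs
    as the inputs (features) of the next layer."""
    candidates = []
    for i, j in itertools.combinations(range(X.shape[1]), 2):
        _, pred = fit_quadratic_pair(X[:, i], X[:, j], y)
        candidates.append((float(np.mean((pred - y) ** 2)), pred))
    candidates.sort(key=lambda c: c[0])
    return np.column_stack([p for _, p in candidates[:keep]])
```

Because each neuron's parameters come from a least-squares fit and only the survivors propagate, the network's structure emerges from the data instead of being defined arbitrarily, as the abstract notes.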


Author(s):  
Houcheng Tang ◽  
Leila Notash

Abstract In this paper, a neural-network-based transfer learning approach to the inverse displacement analysis of robot manipulators is studied. Neural networks with different structures are trained using data from different configurations of a manipulator. Transfer learning is then conducted between manipulators with different geometric layouts, with training performed both on neural networks with pretrained initial parameters and on neural networks with random initialization. To investigate the rate of convergence of the data fitting comprehensively, different performance targets are defined, and the computing epochs and performance measures are compared. It is shown that, depending on the structure of the neural network, the proposed transfer learning can accelerate the training process and achieve higher accuracy. For different datasets, the transfer learning approach improves performance to different degrees.
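The effect of pretrained versus random initialization on the number of training epochs can be illustrated with a toy linear "network" fitted by gradient descent; the task and all names are illustrative, not the manipulator kinematics from the paper:

```python
import numpy as np

def train_until(W0, X, Y, target_mse, lr=0.1, max_epochs=5000):
    """Gradient-descend a linear map from W0 until the MSE reaches
    the target; return the number of epochs used."""
    W = W0.copy()
    for epoch in range(1, max_epochs + 1):
        err = X @ W - Y
        if float(np.mean(err ** 2)) <= target_mse:
            return epoch
        W -= lr * (X.T @ err) / len(X)   # gradient of (1/2) * MSE
    return max_epochs

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
W_true = rng.normal(size=(5, 3))
Y = X @ W_true

# "Source task" solution reused as the pretrained initialization for a
# related target task; a random initialization starts from scratch.
W_pre = W_true + 0.1 * rng.normal(size=W_true.shape)
W_rand = rng.normal(size=W_true.shape)
```

Starting closer to the target mapping, the pretrained initialization reaches the same performance target in no more epochs than the random one, mirroring the comparison described above.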

