An Adaptive Optimization Method Based on Learning Rate Schedule for Neural Networks

Artificial intelligence (AI) is achieved by optimizing a cost function constructed from learning data. Changing the parameters in the cost function is the AI learning process (AI learning, for short). If AI learning is performed well, the value of the cost function reaches the global minimum. For learning to succeed, the parameters should stop changing once the cost function is at the global minimum. One useful optimization method is the momentum method; however, the momentum method has difficulty stopping the parameters when the cost function reaches the global minimum (the non-stop problem). The proposed method is based on the momentum method. To solve the non-stop problem, we incorporate the value of the cost function into our method, so that, as learning proceeds, the amount of change in the parameters shrinks with the value of the cost function. We verified the method through a proof of convergence and numerical experiments comparing it with existing methods to confirm that the learning works well.
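The idea of damping momentum updates by the cost value can be sketched as follows. This is only an illustration under stated assumptions, not the update rule from the paper: the toy cost `(w - 3)^2`, the `tanh(cost)` damping factor, and the names `cost_scaled_momentum`, `lr`, and `beta` are all hypothetical choices made for the example.

```python
import math


def cost(w):
    """Toy cost with global minimum value 0 at w = 3 (illustrative assumption)."""
    return (w - 3.0) ** 2


def grad(w):
    """Derivative of the toy cost."""
    return 2.0 * (w - 3.0)


def cost_scaled_momentum(w0, lr=0.05, beta=0.5, steps=5000):
    """Heavy-ball momentum whose step is damped by tanh(cost).

    The damping factor is a hypothetical stand-in for "using the value of
    the cost function": it is close to 1 when the cost is large and tends
    to 0 as the cost approaches its minimum value of 0.
    """
    w, v = w0, 0.0
    step_sizes = []
    for _ in range(steps):
        v = beta * v + grad(w)        # accumulate momentum
        scale = math.tanh(cost(w))    # hypothetical cost-based damping in [0, 1)
        delta = lr * scale * v
        w -= delta
        step_sizes.append(abs(delta))
    return w, step_sizes


w_final, step_sizes = cost_scaled_momentum(0.0)
print(cost(w_final), step_sizes[0], step_sizes[-1])
```

In this sketch the parameter change shrinks as the cost falls, which is the qualitative behavior the abstract describes. The damping only makes sense here because the minimum cost is known to be 0; a cost with a nonzero minimum would need a different factor.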

An Enhanced Optimization Scheme Based on Gradient Descent Methods for Machine Learning

Symmetry ◽ 10.3390/sym11070942 ◽ 2019 ◽ Vol 11 (7) ◽ pp. 942 ◽ Cited By ~ 3

Author(s): Dokkyun Yi ◽ Sangmin Ji ◽ Sunyoung Bu

Keyword(s): Machine Learning ◽ Cost Function ◽ Gradient Descent ◽ Local Minimum ◽ Global Minimum ◽ Convergence Condition ◽ First Derivative ◽ Estimation Scheme ◽ The Cost ◽ Learning Data

The learning process of machine learning consists of finding the values of unknown weights by minimizing a cost function based on learning data. However, since the cost function is not convex, finding its minimum value is difficult. Existing methods for finding the minimum usually use the first derivative of the cost function. When a local minimum that is not a global minimum is reached, the first derivative of the cost function becomes zero, so these methods return the local minimum and the desired global minimum cannot be found. To overcome this problem, in this paper we modify one of the existing schemes, the adaptive momentum estimation scheme, by adding a new term that prevents the new optimizer from staying at a local minimum. The convergence condition for the proposed scheme and the convergence value are also analyzed, and are further illustrated through several numerical experiments with non-convex cost functions.
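The problem the abstract describes, a first-derivative method stopping at a local minimum, can be reproduced in a few lines. The sketch below shows only the problem, not the modified scheme from the paper; the quartic `cost`, the step size, and the starting points are assumptions chosen for illustration.

```python
def cost(x):
    """Non-convex toy cost: local minimum near x = 1.13, global minimum near x = -1.30."""
    return x**4 - 3.0 * x**2 + x


def grad(x):
    """First derivative of the toy cost."""
    return 4.0 * x**3 - 6.0 * x + 1.0


def gradient_descent(x0, lr=0.01, steps=2000):
    """Plain gradient descent; it halts wherever the derivative vanishes."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


x_right = gradient_descent(2.0)   # starts in the basin of the local minimum
x_left = gradient_descent(-2.0)   # starts in the basin of the global minimum
print(x_right, cost(x_right))
print(x_left, cost(x_left))
```

Both runs end at a point where the derivative is numerically zero, but only the run started at -2.0 reaches the lower cost. The derivative alone cannot tell the two stopping points apart.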

Convergence of Simulated Annealing with Feedback Temperature Schedules

Probability in the Engineering and Informational Sciences ◽ 10.1017/s0269964800004836 ◽ 1997 ◽ Vol 11 (3) ◽ pp. 279-304 ◽ Cited By ~ 4

Author(s): M. Kolonko ◽ M. T. Tran

Keyword(s): Simulated Annealing ◽ Cost Function ◽ Job Shop ◽ Job Shop Scheduling ◽ Optimization Method ◽ Search Process ◽ Local Optimum ◽ Fixed Sequence ◽ Temperature Parameter ◽ The Cost

It is well known that the standard simulated annealing optimization method converges in distribution to the minimum of the cost function if the probability α of accepting an increase in costs goes to 0. α is controlled by the “temperature” parameter, which in the standard setup is a fixed sequence of values converging slowly to 0. We study a more general model in which the temperature may depend on the state of the search process. This allows us to adapt the temperature to the landscape of the cost function. The temperature may temporarily rise such that the process can leave a local optimum more easily. We give weak conditions on the temperature schedules such that the process of solutions finally concentrates near the optimal solutions. We also briefly sketch computational results for the job shop scheduling problem.
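A temperature that depends on the state of the search can be sketched as below. This is a minimal illustration, not the schedule analyzed in the paper: the stall counter, the `boost` factor, the logarithmic baseline cooling, and the one-dimensional `landscape` are all hypothetical choices.

```python
import math
import random


def feedback_annealing(costs, start, steps=2000, t0=1.0, boost=0.5, seed=0):
    """Simulated annealing on a 1-D list of costs with a feedback temperature.

    The temperature follows a slowly decreasing baseline but is raised while
    the search is stalled (consecutive rejected moves), so the process can
    leave a local optimum more easily. Returns the best index visited.
    """
    rng = random.Random(seed)
    state = best = start
    stall = 0  # number of consecutive rejected moves
    for k in range(1, steps + 1):
        # Baseline cooling, temporarily raised in proportion to the stall count.
        temperature = t0 / math.log(k + 1) * (1.0 + boost * stall)
        # Propose a neighbouring index, clamped to the valid range.
        candidate = min(max(state + rng.choice((-1, 1)), 0), len(costs) - 1)
        delta = costs[candidate] - costs[state]
        # Always accept improvements; accept increases with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            state, stall = candidate, 0
        else:
            stall += 1
        if costs[state] < costs[best]:
            best = state
    return best


landscape = [5, 3, 4, 6, 2, 7, 1, 8]  # index 1 is a local optimum
best = feedback_annealing(landscape, start=1)
print(best, landscape[best])
```

The returned index is never worse than the start, since the best visited state is tracked separately from the current one. Whether the global optimum is found depends on the seed and the parameters.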

Quasi-static ensemble variational data assimilation: a theoretical and numerical study with the iterative ensemble Kalman smoother

Nonlinear Processes in Geophysics ◽ 10.5194/npg-25-315-2018 ◽ 2018 ◽ Vol 25 (2) ◽ pp. 315-334 ◽ Cited By ~ 2

Author(s): Anthony Fillion ◽ Marc Bocquet ◽ Serge Gratton

Keyword(s): Data Assimilation ◽ Cost Function ◽ Global Minimum ◽ Numerical Study ◽ Variational Data Assimilation ◽ Kalman Smoother ◽ Local Extrema ◽ Starting Point ◽ The Cost ◽ Temporal Extent

Abstract. The analysis in nonlinear variational data assimilation is the solution of a non-quadratic minimization. Thus, the analysis efficiency relies on its ability to locate a global minimum of the cost function. If this minimization uses a Gauss–Newton (GN) method, it is critical for the starting point to be in the attraction basin of a global minimum. Otherwise the method may converge to a local extremum, which degrades the analysis. With chaotic models, the number of local extrema often increases with the temporal extent of the data assimilation window, making the former condition harder to satisfy. This is unfortunate because the assimilation performance also increases with this temporal extent. However, a quasi-static (QS) minimization may overcome these local extrema. It accomplishes this by gradually injecting the observations in the cost function. This method was introduced by Pires et al. (1996) in a 4D-Var context. We generalize this approach to four-dimensional strong-constraint nonlinear ensemble variational (EnVar) methods, which are based on both a nonlinear variational analysis and the propagation of dynamical error statistics via an ensemble. This forces one to consider the cost function minimizations in the broader context of cycled data assimilation algorithms. We adapt this QS approach to the iterative ensemble Kalman smoother (IEnKS), an exemplar of nonlinear deterministic four-dimensional EnVar methods. Using low-order models, we quantify the positive impact of the QS approach on the IEnKS, especially for long data assimilation windows. We also examine the computational cost of QS implementations and suggest cheaper algorithms.

Parallel sequential Monte Carlo for stochastic gradient-free nonconvex optimization

Statistics and Computing ◽ 10.1007/s11222-020-09964-4 ◽ 2020 ◽ Vol 30 (6) ◽ pp. 1645-1663

Author(s): Ömer Deniz Akyildiz ◽ Dan Crisan ◽ Joaquín Míguez

Keyword(s): Monte Carlo ◽ Cost Function ◽ Global Minimum ◽ Sequential Monte Carlo ◽ Convergence Rates ◽ Optimization Problems ◽ Search Space ◽ Gradient Based ◽ Multiple Minima ◽ The Cost

Abstract. We introduce and analyze a parallel sequential Monte Carlo methodology for the numerical solution of optimization problems that involve the minimization of a cost function that consists of the sum of many individual components. The proposed scheme is a stochastic zeroth-order optimization algorithm which demands only the capability to evaluate small subsets of components of the cost function. It can be depicted as a bank of samplers that generate particle approximations of several sequences of probability measures. These measures are constructed in such a way that they have associated probability density functions whose global maxima coincide with the global minima of the original cost function. The algorithm selects the best performing sampler and uses it to approximate a global minimum of the cost function. We prove analytically that the resulting estimator converges to a global minimum of the cost function almost surely and provide explicit convergence rates in terms of the number of generated Monte Carlo samples and the dimension of the search space. We show, by way of numerical examples, that the algorithm can tackle cost functions with multiple minima or with broad “flat” regions which are hard to minimize using gradient-based techniques.

Optimal Eco-Driving Cycles for Conventional Vehicles Using a Genetic Algorithm

Energies ◽ 10.3390/en13174362 ◽ 2020 ◽ Vol 13 (17) ◽ pp. 4362

Author(s): Subramaniam Saravana Sankar ◽ Yiqun Xia ◽ Julaluk Carmai ◽ Saiprasit Koetniyom

Keyword(s): Genetic Algorithm ◽ Cost Function ◽ Fuel Consumption ◽ Optimization Algorithm ◽ Internal Combustion Engines ◽ Optimization Method ◽ Computational Time ◽ Weight Factor ◽ Driving Cycles ◽ The Cost

The goal of this work is to compute the eco-driving cycles for vehicles equipped with internal combustion engines by using a genetic algorithm (GA) with a focus on reducing energy consumption. The proposed GA-based optimization method uses an optimal control problem (OCP), which is framed considering both fuel consumption and driver comfort in the cost function formulation with the support of a tunable weight factor to enhance the overall performance of the algorithm. The results and functioning of the optimization algorithm are analyzed with several widely used standard driving cycles and a simulated real-world driving cycle. For the selected optimal weight factor, the simulation results show that an average reduction of eight percent in fuel consumption is achieved. The results of parallelizing the computation of the cost function indicate that the computational time required by the optimization algorithm is reduced, depending on the hardware used.

Fitting Parametric Vortices to Aliased Doppler Velocities Scanned from Hurricanes

Monthly Weather Review ◽ 10.1175/mwr-d-12-00362.1 ◽ 2014 ◽ Vol 142 (1) ◽ pp. 94-106 ◽ Cited By ~ 5

Author(s): Qin Xu ◽ Yuan Jiang ◽ Liping Liu

Keyword(s): Cost Function ◽ Global Minimum ◽ Least Squares Method ◽ Tangential Velocity ◽ Radial Distance ◽ Initial Guess ◽ Radial Velocities ◽ Vortex Center ◽ Robust Least Squares ◽ The Cost

Abstract. An alias-robust least squares method that produces fewer errors than established methods is developed to produce reference radial velocities for automatically correcting raw aliased Doppler velocities scanned from hurricanes. This method estimates the maximum tangential velocity VM and its radial distance RM from the hurricane vortex center by fitting a parametric vortex model directly to raw aliased velocities at and around each selected vertical level. In this method, aliasing-caused zigzag discontinuities in the relationship between the observed and true radial velocities are formulated into the cost function by applying an alias operator to the entire analysis-minus-observation term to ensure that the cost function is smooth and concave around the global minimum. Simulated radar velocity observations are used to examine the cost function geometry around the global minimum in the space of control parameters (VM, RM). The results show that the global minimum point can estimate the true (VM, RM) approximately if the hurricane vortex center location is approximately known and the hurricane core and vicinity areas are adequately covered by the radar scans, and the global minimum can be found accurately by an efficient descent algorithm as long as the initial guess is in the concave vicinity of the global minimum. The method is used with elaborated refinements for automated dealiasing, and this utility is highlighted by an example applied to severely aliased radial velocities scanned from a hurricane.
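Applying an alias operator to the whole analysis-minus-observation term can be illustrated with a small sketch. It assumes the common definition of aliasing as folding a velocity into the Nyquist interval; the function names, the Nyquist velocity of 10, and the sample velocities are hypothetical, and the vortex model itself is not reproduced here.

```python
def alias(v, v_nyquist):
    """Fold a velocity into the Nyquist interval [-v_nyquist, v_nyquist)."""
    return (v + v_nyquist) % (2.0 * v_nyquist) - v_nyquist


def alias_robust_cost(analysis, observed, v_nyquist):
    """Sum of squared residuals with the alias operator applied to each
    analysis-minus-observation term."""
    return sum(alias(a - o, v_nyquist) ** 2 for a, o in zip(analysis, observed))


def naive_cost(analysis, observed):
    """Ordinary least-squares cost, for comparison."""
    return sum((a - o) ** 2 for a, o in zip(analysis, observed))


v_nyquist = 10.0
true_velocities = [25.0, -12.0, 4.0]
# Raw radar observations are the true velocities folded into [-10, 10).
observed = [alias(v, v_nyquist) for v in true_velocities]

print(observed)
print(alias_robust_cost(true_velocities, observed, v_nyquist))
print(naive_cost(true_velocities, observed))
```

With the alias operator, an analysis equal to the true velocities has zero cost even though the observations are folded. The ordinary least-squares cost penalizes the same analysis heavily.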

A Novel Learning Rate Schedule in Optimization for Neural Networks and It’s Convergence

Symmetry ◽ 10.3390/sym12040660 ◽ 2020 ◽ Vol 12 (4) ◽ pp. 660 ◽ Cited By ~ 2

Author(s): Jieun Park ◽ Dokkyun Yi ◽ Sangmin Ji

Keyword(s): Neural Networks ◽ Cost Function ◽ Numerical Experiments ◽ Learning Rate ◽ Optimal Parameters ◽ Effective Learning ◽ Symmetric Optimization ◽ Constant Rate ◽ The Cost ◽ Iteration Time

The process of machine learning is to find parameters that minimize the cost function constructed from the learning data. This is called optimization, and the resulting parameters are called the optimal parameters of the neural network. In the search for the optimum, there have been attempts to solve the symmetric optimization problem or to initialize the parameters symmetrically. Furthermore, to obtain the optimal parameters, existing methods decrease the learning rate over the iteration time or change it according to a certain ratio. These methods decrease the learning rate monotonically at a constant rate according to the iteration time. Our idea is to let the learning rate vary, unlike in the monotonically decreasing methods. We introduce a method for finding the optimal parameters that adaptively changes the learning rate according to the value of the cost function. Therefore, when the cost function is optimized, the learning is complete and the optimal parameters are obtained. This paper proves that the method ensures convergence to the optimal parameters. This means that our method achieves a minimum of the cost function (or effective learning). Numerical experiments demonstrate that learning is effective when using the proposed learning rate schedule in various situations.

Estimation of Overtopping Discharges with Deep Neural Network(DNN) Method

Korea Society of Coastal Disaster Prevention ◽ 10.20481/kscdp.2021.8.4.229 ◽ 2021 ◽ Vol 8 (4) ◽ pp. 229-236

Author(s): Changkyum Kim ◽ Insik Chun ◽ Byungcheol Oh

Keyword(s): Neural Network ◽ Artificial Intelligence ◽ Cost Function ◽ Deep Neural Network ◽ Multiple Linear Regression Model ◽ Predictive Performance ◽ Data Sets ◽ Network Training ◽ The Neural Network ◽ The Cost

An Artificial Intelligence (AI) study was conducted to calculate overtopping discharges for various coastal structures. The Deep Neural Network (DNN), one of the artificial intelligence methods, was employed in the study. The neural network was trained, validated and tested using the EurOtop database containing experimental data collected from all over the world. To improve the accuracy of the deep neural network results, all data were non-dimensionalized and max-min normalized as a preprocessing step. L2 regularization was also introduced in the cost function to secure the convergence of iterative learning, and the cost function was optimized using the RMSProp and Adam techniques. To compare the performance of the DNN, additional calculations based on the multiple linear regression model and EurOtop’s overtopping formulas were also carried out, using data sets that were not included in the network training. The results showed that the predictive performance of the AI technique was relatively superior to that of the two other methods.
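The two preprocessing and cost ingredients named above, max-min normalization and an L2 penalty, can be sketched in plain Python. This is a generic illustration, not the study's pipeline: the function names, the mean-squared-error data term, and the penalty weight `lam` are assumptions.

```python
def min_max_normalize(values):
    """Rescale values linearly to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def l2_regularized_mse(predictions, targets, weights, lam):
    """Mean squared error plus an L2 penalty lam * sum(w^2) on the weights."""
    mse = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
    penalty = lam * sum(w ** 2 for w in weights)
    return mse + penalty


normalized = min_max_normalize([2.0, 4.0, 6.0])
total_cost = l2_regularized_mse([1.0, 2.0], [1.0, 4.0], weights=[1.0, 2.0], lam=0.1)
print(normalized, total_cost)
```

The penalty grows with the squared size of the weights, so minimizing the combined cost discourages large weights as well as large prediction errors.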

An Effective Optimization Method for Machine Learning Based on ADAM

Applied Sciences ◽ 10.3390/app10031073 ◽ 2020 ◽ Vol 10 (3) ◽ pp. 1073 ◽ Cited By ~ 4

Author(s): Dokkyun Yi ◽ Jaehyun Ahn ◽ Sangmin Ji

Keyword(s): Machine Learning ◽ Cost Function ◽ Gradient Descent ◽ Local Minimum ◽ Optimization Method ◽ Activation Function ◽ Numerical Comparison ◽ First Derivative ◽ Artificial Neural Network Ann ◽ The Cost

A machine is taught by finding the minimum value of the cost function which is induced by learning data. Unfortunately, as the amount of learning increases, the non-linear activation function in the artificial neural network (ANN), the complexity of the artificial intelligence structures, and the cost function’s non-convex complexity all increase. We know that a non-convex function has local minima, and that the first derivative of the cost function is zero at a local minimum. Therefore, methods based on gradient descent optimization do not undergo further change once they fall into a local minimum, because they are based on the first derivative of the cost function. This paper introduces a novel optimization method to make machine learning more efficient. In other words, we construct an effective optimization method for non-convex cost functions. The proposed method solves the problem of falling into a local minimum by adding the cost function to the parameter update rule of the ADAM method. We prove the convergence of the sequences generated by the proposed method and show its superiority by numerical comparison with gradient descent methods (GD, ADAM, and AdaMax).

Two-Dimensional Symmetric Box Delivery Motion Prediction and Validation: Subtask-Based Optimization Method

Applied Sciences ◽ 10.3390/app10248798 ◽ 2020 ◽ Vol 10 (24) ◽ pp. 8798

Author(s): Yujiang Xiang ◽ Shadman Tahmid ◽ Paul Owens ◽ James Yang

Keyword(s): Cost Function ◽ Material Handling ◽ Optimization Method ◽ Joint Torque ◽ Two Dimensional ◽ Computationally Efficient ◽ Inverse Dynamic ◽ Manual Material Handling ◽ Research Outcome ◽ The Cost

Box delivery is a complicated manual material handling task which needs to consider the box weight, delivery speed, stability, and location. This paper presents a subtask-based inverse dynamic optimization formulation for determining the two-dimensional (2D) symmetric optimal box delivery motion. For the subtask-based formulation, the delivery task is divided into five subtasks: lifting, the first transition step, carrying, the second transition step, and unloading. To render a complete delivery task, each subtask is formulated as a separate optimization problem with appropriate boundary conditions. For carrying and lifting subtasks, the cost function is the sum of joint torque squared. In contrast, for transition subtasks, the cost function is the combination of joint discomfort and joint torque squared. Joint angle profiles are validated through experimental results using Pearson’s correlation coefficient (r) and root-mean-square-error (RMSE). Results show that the subtask-based approach is computationally efficient for complex box delivery motion simulation. This research outcome provides practical guidance to prevent injury risks in joint torque space for workers who deliver heavy objects in their daily jobs.
