direct search
Recently Published Documents


TOTAL DOCUMENTS

591
(FIVE YEARS 68)

H-INDEX

44
(FIVE YEARS 4)

2022 ◽  
pp. 44-60
Author(s):  
J J McKeown ◽  
D Meegan ◽  
D Sprevak

2021 ◽  
Vol 6 (4(62)) ◽  
Author(s):  
Anna Zhaldak ◽  
Mariia Krasovska

The object of research is the set of process stages used when applying hunting as a method of closing vacancies. These stages include: sources for finding candidates, ways of forming their interest, a telephone conversation serving as an interview, negotiations, and the compilation of statistics with direct transfer of the information to the company director. The study used such general scientific and specific research methods as analysis and synthesis, induction, deduction, as well as comparison, observation, and a systematic approach. These methods serve to determine the results and dynamics when recruiting strategies are changed or combined. With the help of comparison methods and a systematic approach, it was possible to determine the optimal strategy for closing the required number of vacancies in the future. Using the observation method, it was possible to examine the dynamics of indicators for each selection method separately or in different combinations with each other. Among the complex methods, analysis was used, which made it possible to understand the dynamics of the indicators and to draw conclusions from them for each option for implementing the methods. With the help of induction, on the basis of the set of conclusions about each option separately, a generalized conclusion was drawn about the further rationality of hunting as an effective method for businesses. Simulation made it possible to develop a strategy for the phased implementation of hunting based on direct search and an understanding of the difference between the two. As for theoretical methods, the research moved from the definitions and general provisions of hunting to a specific consideration of the method at the enterprise and its direct implementation. The results of the studies were: – a summary of the theoretical aspects of headhunting as an effective method of attracting staff; – an effective change in the dynamics of the enterprise's indicators during the introduction of hunting and its combination with direct search; – a developed strategy for the phased implementation of hunting to increase the effectiveness of the method.
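As a purely illustrative sketch of the kind of indicator dynamics compared in this study (the contact volumes and success probabilities below are hypothetical assumptions, not figures from the paper), one can simulate vacancy-closing strategies separately and in combination:

```python
import random

# Hypothetical weekly contact volumes and per-contact closing probabilities;
# these numbers are illustrative assumptions, not data from the study.
STRATEGIES = {
    "direct_search": {"contacts_per_week": 40, "p_close": 0.02},
    "hunting":       {"contacts_per_week": 8,  "p_close": 0.15},
}

def simulate(strategy_mix, weeks=12, seed=1):
    """Count vacancies closed when one or more strategies run in parallel."""
    rng = random.Random(seed)
    closed = 0
    for _ in range(weeks):
        for name in strategy_mix:
            s = STRATEGIES[name]
            closed += sum(rng.random() < s["p_close"]
                          for _ in range(s["contacts_per_week"]))
    return closed

# Compare each method alone and the combined strategy over one quarter.
for mix in (["direct_search"], ["hunting"], ["direct_search", "hunting"]):
    print(mix, simulate(mix))
```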


2021 ◽  
Author(s):  
Yiming Peng

Reinforcement Learning (RL) problems appear in diverse real-world applications and are gaining substantial attention in academia and industry. Policy Direct Search (PDS) is widely recognized as an effective approach to RL problems. However, existing PDS algorithms have some major limitations. First, many step-wise Policy Gradient Search (PGS) algorithms cannot effectively utilize informative historical gradients to accurately estimate policy gradients. Second, although evolutionary PDS algorithms do not rely on accurate policy gradient estimations and can explore learning environments effectively, they are not sample-efficient at learning policies in the form of deep neural networks. Third, existing PGS algorithms often diverge easily due to the lack of reliable and flexible techniques for value function learning. Fourth, existing PGS algorithms have not provided suitable mechanisms to learn proper state features automatically.

To address these limitations, the overall goal of this thesis is to develop effective policy direct search algorithms for tackling challenging RL problems through technical innovations in four key areas. First, the thesis aims to improve the accuracy of policy gradient estimation by utilizing historical gradients through a Primal-Dual Approximation technique. Second, the thesis aims to surpass state-of-the-art performance by properly balancing the exploration-exploitation trade-off via the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) and Proximal Policy Optimization (PPO). Third, the thesis seeks to stabilize value function learning via a self-organized Sandpile Model (SM) while generalizing the compatible condition to support flexible value function learning. Fourth, the thesis endeavors to develop innovative evolutionary feature learning techniques capable of automatically extracting useful state features so as to enhance various cutting-edge PGS algorithms.

The thesis explores the four key technical areas by studying policies of increasing complexity. We start from a simple linear policy representation and then proceed to a complex neural-network-based policy representation. Next, we consider a more complicated situation where policy learning is coupled with value function learning. Subsequently, we consider policies modeled as a concatenation of two interrelated networks, one for feature learning and one for action selection.

To achieve the first goal, this thesis proposes a new policy gradient learning framework in which a series of historical gradients are jointly exploited to obtain accurate policy gradient estimations via the Primal-Dual Approximation technique. Under this framework, three new PGS algorithms for step-wise policy training have been derived from three widely used PGS algorithms, and the convergence properties of these new algorithms have been theoretically analyzed. Empirical results on several benchmark control problems further show that the newly proposed algorithms can significantly outperform their base algorithms.

To achieve the second goal, this thesis develops a new sample-efficient evolutionary deep policy optimization algorithm based on CMA-ES and PPO. The algorithm has a layer-wise learning mechanism to improve computational efficiency in comparison to CMA-ES. Additionally, it uses a surrogate model based on a performance lower bound for fitness evaluation, significantly reducing the sample cost to the state-of-the-art level. More importantly, the best policy found by CMA-ES at every generation is further improved by PPO to properly balance exploration and exploitation. Experimental results confirm that the proposed algorithm outperforms various cutting-edge algorithms on many benchmark continuous control problems.

To achieve the third goal, this thesis develops new value function learning methods that are both reliable and flexible so as to further enhance the effectiveness of policy gradient search. Two Actor-Critic (AC) algorithms have been developed from a commonly used PGS algorithm, Regular Actor-Critic (RAC). The first adopts SM to stabilize value function learning; the second generalizes the logarithm function used by the compatible condition to provide a flexible family of new compatible functions. Experimental results show that, with the help of reliable and flexible value function learning, the newly developed algorithms are more effective than RAC on several benchmark control problems.

To achieve the fourth goal, this thesis develops innovative NeuroEvolution algorithms for automated feature learning to enhance various cutting-edge PGS algorithms. The newly developed algorithms not only extract useful state features but also learn good policies. Experimental analysis demonstrates that they achieve better performance on large-scale RL problems than both well-known PGS algorithms and NeuroEvolution techniques. Our experiments also confirm that the state features learned by NeuroEvolution on one RL task can easily be transferred to boost learning performance on similar but different tasks.
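To make the notion of policy direct search concrete, here is a minimal sketch, not the thesis's algorithm: a basic evolution-strategy loop over a linear policy on a toy point-mass task. The environment, population size, and noise scale are illustrative assumptions.

```python
import numpy as np

def rollout(theta, steps=50, seed=0):
    """Episode return of the linear policy a = theta @ s on a toy 2-D
    point-mass task: drive the state toward the origin."""
    rng = np.random.default_rng(seed)
    s = rng.standard_normal(2)
    ret = 0.0
    for _ in range(steps):
        a = theta @ s                    # linear policy: the simplest PDS setting
        s = s + 0.1 * a + 0.01 * rng.standard_normal(2)
        ret -= float(s @ s)              # penalize distance from the origin
    return ret

# Plain (mu, lambda) evolution strategy: perturb, evaluate by rollout return
# only (no gradients), and move toward the mean of the elite perturbations.
theta = np.zeros((2, 2))
sigma, population, elite = 0.1, 20, 5
for gen in range(30):
    noise = [sigma * np.random.standard_normal(theta.shape)
             for _ in range(population)]
    scored = sorted(noise, key=lambda e: rollout(theta + e), reverse=True)
    theta = theta + sum(scored[:elite]) / elite
print("final return:", rollout(theta))
```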


2021 ◽  
Vol 22 (11) ◽  
pp. 594-600
Author(s):  
V. P. Noskov ◽  
D. V. Gubernatorov

The topical problem of determining all six coordinates of the current position of a mobile robot (or unmanned aerial vehicle) from 3D range-finding images (point clouds) generated by an onboard 3D laser sensor when moving (flying) in an unknown environment is considered. An extreme navigation algorithm based on multidimensional optimization methods is proposed. The rules for calculating the difference between 3D images of the external environment, used to optimize the functional, are described. The form of the functional of the difference of 3D images has been investigated for different environments (premises, industrial-urban environments, rugged and wooded areas). Requirements for the characteristics of the sensor and the geometry of the external environment are formulated whose fulfillment ensures the correct formulation and solution of the extreme navigation problem. Optimal methods of scanning the surrounding space are described, and conditions are substantiated whose fulfillment ensures that the proposed algorithm solves the navigation problem in real time (at the rate of movement) when processing 3D images formed by modern 3D laser sensors. In particular, the dependence between the frequency of 3D image formation and the angular and linear velocities of motion is described, which ensures that the functional of the difference of 3D images falls within the multidimensional interval of unimodality and thus guarantees a direct search for the global minimum in real time. Various methods of direct search for the global minimum of the functional are tested, and the fastest ones for the case under consideration are selected. The accuracy of solving the navigation problem is estimated, and a method is proposed to reduce the accumulated error, based on using an older 3D image whose field of view intersects the current one to correct the computed values of the current coordinates. The proposed method, a modification of the reference image method, makes it possible to reduce the total error, which grows in proportion to the number of cycles of solving the extreme navigation problem, to values that ensure the autonomous operation of transport robots and UAVs in previously unprepared and unknown environments. The effectiveness of the proposed algorithms and of the developed software and hardware for extreme navigation is confirmed by field experiments carried out in various real environments.
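As a minimal illustration of one such navigation step (not the authors' implementation: the synthetic point cloud, the sum-of-squares difference functional, the assumption of known point correspondences, and the choice of Nelder-Mead are all simplifying assumptions), a 6-DoF pose can be recovered by direct, derivative-free search over the difference between two scans:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation

rng = np.random.default_rng(0)
reference = rng.uniform(-5, 5, size=(200, 3))   # stand-in for a reference 3D scan

true_pose = np.array([0.3, -0.2, 0.1, 0.05, 0.02, -0.04])  # [t_x, t_y, t_z, euler_xyz]

def transform(cloud, pose):
    """Apply a 6-DoF rigid transform: 3 translations + 3 Euler angles (rad)."""
    R = Rotation.from_euler("xyz", pose[3:]).as_matrix()
    return cloud @ R.T + pose[:3]

current = transform(reference, true_pose)       # the "next" scan after moving

def difference(pose):
    """Difference functional between scans; sum of squared residuals,
    assuming known point correspondences (a real scan matcher would not)."""
    return float(np.sum((transform(reference, pose) - current) ** 2))

# Direct (derivative-free) search for the minimum, started from the last pose.
result = minimize(difference, x0=np.zeros(6), method="Nelder-Mead",
                  options={"xatol": 1e-8, "fatol": 1e-10, "maxiter": 5000})
print("recovered pose:", np.round(result.x, 3))
```

Starting the search from the previous pose keeps the iterate inside the unimodality interval discussed above, which is what makes the direct search reliable in this setting.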


Author(s):  
Nataliya Gulayeva ◽  
Volodymyr Shylo ◽  
Mykola Glybovets

Introduction. As early as 1744, the great Leonhard Euler noted that nothing at all takes place in the universe in which some rule of maximum or minimum does not appear [12]. A great many of today's scientific and engineering problems are optimization problems by nature. Many different methods have been developed to solve them; their number is estimated in the hundreds and continues to grow. A number of approaches to classifying optimization methods by various criteria (e.g., the type of optimization strategy or the type of solution obtained) have been proposed, and narrower classifications of methods for specific types of optimization problems (e.g., combinatorial optimization problems or nonlinear programming problems) are also in use. The total number of known optimization method classes amounts to several hundred. At the same time, methods falling into classes far from each other often have many common properties and can be reduced to each other by rethinking certain characteristics. In view of the above, a pressing task of modern science is to develop a general approach to classifying optimization methods based on disclosing the basic principles of the search strategies involved, and to systematize existing optimization methods. The purpose is to show that genetic algorithms, usually classified as metaheuristic, population-based, simulation, etc., are inherently stochastic numerical methods of direct search. Results. Alternative statements of the optimization problem are given. An overview of existing classifications of optimization problems and of the basic methods to solve them is provided. The essence of the classification of optimization methods into symbolic (analytical) and numerical ones is described. It is shown that a genetic algorithm scheme can be represented as the scheme of a numerical method of direct search. A method to reduce a given optimization problem to one solvable by a genetic algorithm is described, and the class of problems that can be solved by genetic algorithms is outlined. Conclusions. Given the great number of methods for solving optimization problems and approaches to classifying them, it is necessary to work out a unified approach to classifying and systematizing optimization methods. Reducing the class of genetic algorithms to numerical methods of direct search is the first step in this direction. Keywords: mathematical programming problem, unconstrained optimization problem, constrained optimization problem, multimodal optimization problem, numerical methods, genetic algorithms, metaheuristic algorithms.
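In the spirit of that claim, here is an illustrative sketch (not the authors' formulation; the Rastrigin test function, real-valued encoding, and operator rates are assumptions) of a genetic algorithm written so that its direct-search character is visible: every decision uses only sampled function values, never derivatives.

```python
import numpy as np

def f(x):
    """Rastrigin objective to minimize; the GA consults only these function
    values -- the defining trait of a numerical direct search method."""
    return np.sum(x**2 - 10 * np.cos(2 * np.pi * x) + 10)

rng = np.random.default_rng(0)
dim, pop_size, generations = 5, 60, 200
pop = rng.uniform(-5.12, 5.12, size=(pop_size, dim))  # real-coded individuals

for _ in range(generations):
    fitness = np.array([f(ind) for ind in pop])
    # Tournament selection: compare pairs by function value alone.
    i, j = rng.integers(pop_size, size=(2, pop_size))
    parents = np.where((fitness[i] < fitness[j])[:, None], pop[i], pop[j])
    # Uniform crossover and sparse Gaussian mutation propose new trial points,
    # playing the role that a step rule plays in classical direct search.
    mask = rng.random((pop_size, dim)) < 0.5
    children = np.where(mask, parents, parents[rng.permutation(pop_size)])
    children += rng.normal(0, 0.1, size=children.shape) * (rng.random((pop_size, dim)) < 0.1)
    pop = children

best = min(pop, key=f)
print("best point:", np.round(best, 3), "value:", round(float(f(best)), 4))
```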


2021 ◽  
pp. 433-439
Author(s):  
Ekaterina Gribanova

This paper is devoted to solving inverse problems of simulation modeling stated in the form of an optimization problem. The article discusses the use of direct search methods, taking into account the specifics of the problem under consideration. Because these methods require a large number of computational experiments, two approximation-based algorithms are proposed for solving the problem. The first algorithm estimates the parameters of the function describing the dependence (linear or nonlinear) of the output variable on the input variables, and solves the inverse problem by minimizing the increments of the arguments. In the second algorithm, a linear dependence function is iteratively constructed from a data set generated by changing the input variables in given increments, and the inverse problem is again solved by minimizing the increments of the arguments. The classical inventory management model with a threshold strategy is considered as an example, and its inverse problem is solved using both the direct search and the approximation-based methods.
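A minimal sketch of the second idea (the black-box function and step size below are stand-ins, not the paper's inventory model): build a local linear model of the output from input perturbations in given increments, then take the smallest argument increments that move the output toward the target.

```python
import numpy as np

def simulate(x):
    """Stand-in for a black-box simulation model with a scalar output."""
    return float(x[0] ** 2 + 3.0 * x[1] + np.sin(x[2]))

def solve_inverse(x0, target, step=0.1, iters=20):
    """Iteratively linearize the model around x, then apply the least-norm
    increment dx satisfying g @ dx = residual (minimal change of arguments)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        y = simulate(x)
        if abs(target - y) < 1e-8:
            break
        # Coefficients of the local linear model from finite increments.
        g = np.array([(simulate(x + step * e) - y) / step
                      for e in np.eye(len(x))])
        dx = g * (target - y) / float(g @ g)   # least-norm solution
        x = x + dx
    return x

x = solve_inverse([1.0, 1.0, 0.5], target=10.0)
print(np.round(x, 4), "->", round(simulate(x), 4))
```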


Author(s):  
Khashayar Torabi Farsani ◽  
Maryam Dehghani ◽  
Roozbeh Abolpour ◽  
Navid Vafamand ◽  
Mohammad S. Javadi ◽  
...  
