Mirror descent algorithm on the indefinite control horizon
Abstract. We consider the problem of optimal control in a random environment in a minimax setting, as applied to data processing. The random environment is assumed to provide two data processing methods whose effectiveness is not known in advance. The goal of control is to find the optimal strategy for applying the processing methods so as to minimize losses. To solve this problem, we use the mirror descent algorithm, including its modifications for batch processing. Batch versions of the algorithm yield a significant gain in speed because batches can be processed in parallel. In the classical statement, the optimal strategy is sought on a fixed control horizon, whereas this article considers an indefinite control horizon. With an indefinite horizon, the control algorithm cannot use the value of the horizon when searching for an optimal strategy. Using numerical modeling, we study the operation of the mirror descent algorithm and its modifications on an indefinite control horizon and present the results obtained.
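For orientation, the setting described in the abstract can be illustrated with a minimal sketch of mirror descent for two processing methods. The abstract does not give the algorithm's details, so everything here is an assumption made for illustration: the Bernoulli loss model, the entropic mirror map, the importance-weighted loss estimate, the step-size schedule `beta = c * sqrt(t + 1)`, and the names `mirror_descent_bandit`, `probs`, `steps`, and `c`. The sketch does not cover the batch modifications. The point it shows is that the update depends only on the current step `t` and never on the total horizon.

```python
import math
import random


def mirror_descent_bandit(probs, steps, seed=0, c=1.0):
    """Entropic mirror descent for choosing between processing methods.

    probs: success probabilities of the methods (hidden from the learner).
    steps: number of simulated steps; the update rule never reads it.
    Returns the final mixed strategy and the accumulated loss.
    """
    rng = random.Random(seed)
    k = len(probs)
    zeta = [0.0] * k      # dual variable: accumulated loss estimates
    p = [1.0 / k] * k     # mixed strategy on the probability simplex
    total_loss = 0.0
    for t in range(1, steps + 1):
        # Pick a method according to the current mixed strategy.
        arm = rng.choices(range(k), weights=p)[0]
        # Assumed loss model: loss 1 on failure, 0 on success.
        loss = 0.0 if rng.random() < probs[arm] else 1.0
        total_loss += loss
        # Importance-weighted estimate of the loss vector (unbiased).
        zeta[arm] += loss / p[arm]
        # Assumed schedule: depends on the step t only, not on the horizon.
        beta = c * math.sqrt(t + 1)
        # Gibbs (softmax) step; subtracting the minimum avoids overflow.
        m = min(zeta)
        w = [math.exp(-(z - m) / beta) for z in zeta]
        s = sum(w)
        p = [x / s for x in w]
    return p, total_loss


if __name__ == "__main__":
    strategy, loss = mirror_descent_bandit((0.9, 0.1), steps=2000)
    print(strategy, loss)
```

Because `steps` only bounds the simulation loop, the same function can be stopped at any moment, which is the sense in which the sketch is horizon-free.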