Parameter estimation in quantum sensing based on deep reinforcement learning

2022
Vol 8 (1)
Author(s):
Tailong Xiao
Jianping Fan
Guihua Zeng

Abstract: Parameter estimation is a pivotal task in which quantum technologies can greatly enhance precision. We investigate time-dependent parameter estimation based on deep reinforcement learning, deriving the noise-free and noisy bounds of parameter estimation from a geometrical perspective. We propose a physically inspired, linear time-correlated control ansatz and a general, well-defined reward function integrated with the derived bounds to accelerate network training and quickly generate quantum control signals. Using the proposed scheme, we validate the performance of time-dependent and time-independent parameter estimation under noise-free and noisy dynamics. In particular, we evaluate the transferability of the scheme when the assumed parameter is shifted from the true value. The simulations showcase the robustness and sample efficiency of the scheme, which achieves state-of-the-art performance. Our work highlights the universality and global optimality of deep reinforcement learning over conventional methods in practical parameter estimation for quantum sensing.
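As a concrete illustration of a bound-referenced reward, the following sketch (an illustrative assumption, not the authors' code) scores an agent for single-qubit frequency estimation by the ratio of the achieved quantum Fisher information to a derived bound; `evolve`, `qfi`, and the finite-difference derivative are stand-ins for the paper's controlled dynamics and bounds.

```python
import numpy as np

SZ = np.array([[1, 0], [0, -1]], dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)

def evolve(omega, controls, dt=0.1):
    """Evolve |+> under H(t) = (omega*SZ + u(t)*SX)/2 with piecewise-constant controls."""
    psi = np.array([1.0, 1.0], dtype=complex) / np.sqrt(2)
    for u in controls:
        vals, vecs = np.linalg.eigh(0.5 * (omega * SZ + u * SX))  # H is Hermitian
        psi = (vecs * np.exp(-1j * vals * dt)) @ (vecs.conj().T @ psi)
    return psi

def qfi(omega, controls, eps=1e-6):
    """Pure-state quantum Fisher information via a finite-difference derivative:
    F = 4 * (<dpsi|dpsi> - |<psi|dpsi>|^2)."""
    psi = evolve(omega, controls)
    dpsi = (evolve(omega + eps, controls) - evolve(omega - eps, controls)) / (2 * eps)
    return float(4 * (np.vdot(dpsi, dpsi).real - abs(np.vdot(psi, dpsi)) ** 2))

def reward(omega, controls, bound):
    """Score a control trajectory by how closely its QFI saturates the bound."""
    return qfi(omega, controls) / bound

# Free evolution over total time T = 1.0 saturates the noise-free phase-estimation
# bound F <= T^2, so the reward is ~1; a noisy model would use the noisy bound.
print(reward(omega=1.0, controls=np.zeros(10), bound=1.0))
```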

2019
Vol 5 (1)
Author(s):
Han Xu
Junning Li
Liqiang Liu
Yu Wang
Haidong Yuan
...  

Abstract: Measurement and estimation of parameters are essential for science and engineering, where one of the main quests is to find systematic schemes that can achieve high precision. While conventional schemes for quantum parameter estimation focus on the optimization of the probe states and measurements, it has recently been realized that control during the evolution can significantly improve the precision. The identification of optimal controls, however, is often computationally demanding, as the optimal controls typically depend on the value of the parameter and therefore need to be re-calculated each time the estimate is updated. Here we show that reinforcement learning provides an efficient way to identify controls that improve the precision. We also demonstrate that reinforcement learning is highly generalizable: a neural network trained at one particular value of the parameter works for different values within a broad range. These desired features make reinforcement learning an efficient alternative to conventional optimal quantum control methods.
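A minimal REINFORCE-style loop in the same spirit (a hedged sketch; neither the paper's network nor its dynamics): a Gaussian policy over control amplitudes is trained at one parameter value and then reused, unchanged, at nearby values to probe the generalization described above. `achieved_fisher_info` is a placeholder objective standing in for a Fisher-information evaluation of the controlled evolution.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10                                   # control pulses per episode

def achieved_fisher_info(omega, controls):
    # Placeholder: peaks when the controls match a parameter-dependent optimum;
    # a real implementation would evaluate the quantum Fisher information.
    return -float(np.mean((controls - np.sin(omega)) ** 2))

def train(omega, iters=2000, lr=0.02, sigma=0.3):
    theta, baseline = np.zeros(T), 0.0   # mean control per step, reward baseline
    for _ in range(iters):
        actions = theta + sigma * rng.standard_normal(T)      # sample controls
        r = achieved_fisher_info(omega, actions)
        # REINFORCE with a running baseline; grad log pi = (a - theta) / sigma^2
        theta += lr * (r - baseline) * (actions - theta) / sigma**2
        baseline = 0.9 * baseline + 0.1 * r
    return theta

theta = train(omega=1.0)
# Generalization probe: reuse the policy trained at omega = 1.0 at shifted values.
for omega in (0.8, 1.0, 1.2):
    print(omega, achieved_fisher_info(omega, theta))
```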


2021
Vol 11 (1)
Author(s):
Faezeh Akhavizadegan
Javad Ansarifar
Lizhi Wang
Isaiah Huber
Sotirios V. Archontoulis

Abstract: The performance of crop models in simulating various aspects of the cropping system is sensitive to parameter calibration. Parameter estimation is challenging, especially for time-dependent parameters such as cultivar parameters with a lifespan of 2–3 years. Manual calibration of the parameters is time-consuming, requires expertise, and is prone to error. This research develops a new automated framework for estimating time-dependent parameters in crop models using a parallel Bayesian optimization algorithm. The approach integrates the power of optimization and machine learning with prior agronomic knowledge. To test the proposed time-dependent parameter estimation method, we simulated the historical yield increase (from 1985 to 2018) in 25 environments in the US Corn Belt with APSIM. We then compared the yield simulation results and nine parameter estimates from the proposed parallel Bayesian framework against standard Bayesian optimization and manual calibration. Results indicated that parameters calibrated using the proposed framework achieved an 11.6% reduction in prediction error over Bayesian optimization and a 52.1% reduction over manual calibration. We also trained nine machine learning models for yield prediction and found that none of them outperformed the proposed method in terms of root mean square error and R2. The most significant contribution of the new automated framework for time-dependent parameter estimation is its capability to find close-to-optimal parameters for the crop model. The proposed approach also produced explainable insight into trends in cultivar traits over 34 years (1985–2018).
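One flavor of the parallel step can be sketched as follows (an illustration, not the authors' framework): a Gaussian-process surrogate ranks random candidates by expected improvement, and the top q are evaluated concurrently. `run_apsim` is a hypothetical stand-in for an APSIM run that returns a yield-prediction error over the calibration environments.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def run_apsim(params):
    """Hypothetical objective: error of simulated vs. observed yield."""
    return float(np.sum((params - 0.3) ** 2))          # toy stand-in target

def expected_improvement(mu, sd, best):
    z = (best - mu) / np.maximum(sd, 1e-9)
    return (best - mu) * norm.cdf(z) + sd * norm.pdf(z)

dim, q = 9, 4                        # nine cultivar parameters, four workers
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(10, dim))                  # initial design
y = np.array([run_apsim(x) for x in X])

for _ in range(20):                                    # BO iterations
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    cand = rng.uniform(0, 1, size=(2000, dim))         # random candidate pool
    mu, sd = gp.predict(cand, return_std=True)
    batch = cand[np.argsort(-expected_improvement(mu, sd, y.min()))[:q]]
    with ThreadPoolExecutor(max_workers=q) as pool:    # parallel model runs
        new_y = list(pool.map(run_apsim, batch))
    X, y = np.vstack([X, batch]), np.concatenate([y, new_y])

print("best parameters:", X[y.argmin()], "error:", y.min())
```

A real deployment would replace the random candidate pool with the paper's acquisition strategy and constrain the parameters using the prior agronomic knowledge mentioned above.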


2021
Author(s):
Stav Belogolovsky
Philip Korsunsky
Shie Mannor
Chen Tessler
Tom Zahavy

Abstract: We consider the task of Inverse Reinforcement Learning in Contextual Markov Decision Processes (MDPs). In this setting, contexts, which define the reward and transition kernel, are sampled from a distribution. In addition, although the reward is a function of the context, it is not provided to the agent. Instead, the agent observes demonstrations from an optimal policy. The goal is to learn the reward mapping such that the agent acts optimally even when encountering previously unseen contexts, also known as zero-shot transfer. We formulate this problem as a non-differentiable convex optimization problem and propose a novel algorithm to compute its subgradients. Based on this scheme, we analyze several methods both theoretically, comparing their sample complexity and scalability, and empirically. Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts). Specifically, we present empirical experiments in a dynamic treatment regime, where the goal is to learn a reward function that explains the behavior of expert physicians based on recorded data of them treating patients diagnosed with sepsis.
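The subgradient scheme can be illustrated for a feature-based reward r(s) = w·φ(s) (an illustrative reconstruction, not the paper's exact objective): a subgradient of the margin-style loss at w is the gap between the feature expectations of the current greedy policy and those of the expert, so projected subgradient descent drives w toward rewards that explain the demonstrations.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, K, gamma = 5, 3, 4, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] is a distribution over s'
phi = rng.uniform(size=(S, K))               # reward features per state

def greedy_policy(w, iters=200):
    """Value iteration for the linear reward r(s) = w . phi(s)."""
    r, V = phi @ w, np.zeros(S)
    for _ in range(iters):
        Q = r[:, None] + gamma * np.einsum("saj,j->sa", P, V)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def feature_expectations(policy, start=0, horizon=80):
    """Discounted feature expectations, propagating the state distribution."""
    d, fe = np.zeros(S), np.zeros(K)
    d[start] = 1.0
    for t in range(horizon):
        fe += gamma**t * (d @ phi)
        d = d @ P[np.arange(S), policy]      # row s of this matrix is P[s, pi(s)]
    return fe

# The "expert" is summarized by the feature expectations of a policy optimal
# for some hidden true reward.
mu_expert = feature_expectations(greedy_policy(rng.standard_normal(K)))

w = np.zeros(K)
for t in range(1, 151):
    g = feature_expectations(greedy_policy(w)) - mu_expert   # a subgradient
    w -= g / np.sqrt(t)                                      # diminishing steps
    w /= max(1.0, np.linalg.norm(w))                         # project to unit ball
print("recovered reward weights:", w)
```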


2021
Author(s):
Amarildo Likmeta
Alberto Maria Metelli
Giorgia Ramponi
Andrea Tirinzoni
Matteo Giuliani
...  

Abstract: In real-world applications, inferring the intentions of expert agents (e.g., human operators) can be fundamental to understanding how possibly conflicting objectives are managed, helping to interpret the demonstrated behavior. In this paper, we discuss how inverse reinforcement learning (IRL) can be employed to retrieve the reward function implicitly optimized by expert agents acting in real applications. Scaling IRL to real-world cases has proved challenging, as typically only a fixed dataset of demonstrations is available and further interactions with the environment are not allowed. For this reason, we resort to a class of truly batch model-free IRL algorithms and present three application scenarios: (1) the high-level decision-making problem in highway driving, (2) inferring user preferences in a social network (Twitter), and (3) the management of water release in Lake Como. For each of these scenarios, we provide a formalization, experiments, and a discussion interpreting the obtained results.
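One representative member of this class can be sketched as follows (a reconstruction in the spirit of gradient-based batch IRL, not the authors' code): if the expert is near-optimal, the policy gradient of its own behavior must vanish under the true reward, so with a linear reward the problem reduces to finding simplex weights w minimizing ||G w||, where column i of G estimates the policy gradient under reward feature i, computable from the logged demonstrations alone.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N, D, K = 200, 3, 2   # logged trajectories, policy parameters, reward features

# From the fixed batch only: per-trajectory score functions (grad log-likelihood
# of the expert's behavioral policy) and per-feature discounted returns.
scores = rng.standard_normal((N, D))          # toy placeholders
feature_returns = rng.standard_normal((N, K))

# REINFORCE-style estimate of the policy gradient under each reward feature.
G = scores.T @ feature_returns / N            # shape (D, K)

res = minimize(
    lambda w: float(np.sum((G @ w) ** 2)),    # squared policy-gradient norm
    x0=np.full(K, 1.0 / K),
    bounds=[(0.0, 1.0)] * K,
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    method="SLSQP",
)
print("recovered reward weights:", res.x)
```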


Minerals
2021
Vol 11 (6)
pp. 587
Author(s):
Joao Pedro de Carvalho
Roussos Dimitrakopoulos

This paper presents a new truck dispatching policy approach that adapts to different mining complex configurations in order to deliver the supply material extracted by the shovels to the processors. The method aims to improve adherence to the operational plan and fleet utilization in a mining complex context. Several sources of operational uncertainty arising from the loading, hauling and dumping activities can influence the dispatching strategy. Given a fixed sequence of extraction of the mining blocks provided by the short-term plan, a discrete event simulator emulates the interactions arising from these mining operations. Repeated runs of this simulator, together with a reward function that assigns a score to each dispatching decision, generate the sample experiences used to train a deep Q-learning reinforcement learning model. The model learns from past dispatching experience, so that when a new task is required, a well-informed decision can be made quickly. The approach is tested at a copper–gold mining complex, characterized by uncertainties in equipment performance and geological attributes, and the results show improvements in terms of production targets, metal production, and fleet management.
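The learning loop can be sketched compactly (illustrative only; a linear Q-function stands in for the deep network, and `simulator_step` stubs the discrete event simulator): dispatch decisions are taken epsilon-greedily, logged to a replay buffer, and fitted by semi-gradient TD updates against a periodically synced target network.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)
N_FEAT, N_ACT = 6, 3           # state features, candidate dispatch destinations

def simulator_step(state, action):
    """Stub for the discrete event mining simulator: next state plus a reward
    scoring plan adherence / fleet utilization for this dispatching decision."""
    nxt = np.clip(state + 0.1 * rng.standard_normal(N_FEAT), 0, 1)
    return nxt, 1.0 - abs(state[action % N_FEAT] - 0.5)     # toy score

W = np.zeros((N_ACT, N_FEAT))  # online Q weights (stand-in for the deep net)
W_tgt = W.copy()               # target network
replay = deque(maxlen=10_000)
gamma, lr, eps = 0.95, 0.01, 0.1

state = rng.uniform(size=N_FEAT)
for step in range(5000):
    # epsilon-greedy dispatch decision
    a = rng.integers(N_ACT) if rng.random() < eps else int((W @ state).argmax())
    nxt, r = simulator_step(state, a)
    replay.append((state, a, r, nxt))
    # sample a minibatch and take semi-gradient TD steps
    for i in rng.integers(len(replay), size=32):
        s, a_b, r_b, s2 = replay[i]
        target = r_b + gamma * (W_tgt @ s2).max()
        W[a_b] += lr * (target - W[a_b] @ s) * s
    if step % 200 == 0:
        W_tgt = W.copy()       # periodic target-network sync
    state = nxt

print("greedy dispatch for current state:", int((W @ state).argmax()))
```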


2020
Vol 0 (0)
Author(s):
I-Chen Chen
Philip M. Westgate

Abstract: When observations are correlated, modeling the within-subject correlation structure using quantile regression for longitudinal data can be difficult unless a working independence structure is utilized. Although this approach ensures consistent estimators of the regression coefficients, it may result in less efficient regression parameter estimation when data are highly correlated. Therefore, several marginal quantile regression methods have been proposed to improve parameter estimation. In a longitudinal study, some of the covariates may change their values over time, yet time-dependent covariates have not been explored in the marginal quantile regression literature. We therefore propose an approach for marginal quantile regression in the presence of time-dependent covariates, which includes a strategy for selecting a working type of time dependency. In this manuscript, we demonstrate that the proposed method has the potential to improve power relative to the independence estimating equations approach, owing to its reduction in mean squared error.
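For contrast, the working-independence baseline the abstract refers to can be written in a few lines (a minimal sketch on simulated longitudinal data, not the authors' estimator): repeated measurements are pooled and a marginal quantile regression is fitted as if observations were independent, which is consistent but ignores the within-subject correlation the proposed method exploits.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_subj, n_time = 100, 4
subj = np.repeat(np.arange(n_subj), n_time)
time = np.tile(np.arange(n_time), n_subj)
b = np.repeat(rng.standard_normal(n_subj), n_time)       # shared subject effect
x = 0.5 * time + rng.standard_normal(subj.size)          # time-dependent covariate
y = 1.0 + 2.0 * x + b + rng.standard_normal(subj.size)   # correlated within subject

df = pd.DataFrame({"y": y, "x": x, "time": time, "subject": subj})
fit = smf.quantreg("y ~ x", df).fit(q=0.5)               # median regression
print(fit.params)   # consistent for the marginal median, but not efficient here
```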


Sensors
2021
Vol 21 (4)
pp. 1292
Author(s):
Neziha Akalin
Amy Loutfi

This article surveys reinforcement learning approaches in social robotics. Reinforcement learning is a framework for decision-making problems in which an agent interacts through trial and error with its environment to discover an optimal behavior. Since interaction is a key component of both reinforcement learning and social robotics, it is a well-suited approach for real-world interactions with physically embodied social robots. The scope of the paper focuses particularly on studies that include physical social robots and real-world human-robot interactions with users. We present a thorough analysis of reinforcement learning approaches in social robotics. In addition to the survey, we categorize existing reinforcement learning approaches based on the method used and the design of the reward mechanisms. Moreover, since communication capability is a prominent feature of social robots, we discuss and group the papers based on the communication medium used for reward formulation. Considering the importance of designing the reward function, we also provide a categorization of the papers based on the nature of the reward. This categorization comprises three major themes: interactive reinforcement learning, intrinsically motivated methods, and task-performance-driven methods. The paper also discusses the benefits and challenges of reinforcement learning in social robotics; how the surveyed papers are evaluated, including whether they use subjective or algorithmic measures; real-world reinforcement learning challenges and proposed solutions; and the points that remain to be explored, including approaches that have thus far received less attention. This paper thus aims to serve as a starting point for researchers interested in using and applying reinforcement learning methods in this particular research field.
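As a toy illustration of the three reward themes (not drawn from any surveyed paper; the weights and signals are hypothetical), a social robot's scalar reward can mix a task-performance term, an interactive human-feedback term, and an intrinsic novelty bonus:

```python
def social_reward(task_score, human_feedback, state_visits, w=(0.6, 0.3, 0.1)):
    """task_score: task-performance-driven term, e.g. progress toward a goal.
    human_feedback: interactive RL term in [-1, 1], e.g. from speech or affect.
    state_visits: visit count for the current state; 1/sqrt(1 + n) is a common
    intrinsic-motivation (novelty) bonus."""
    intrinsic = 1.0 / (1.0 + state_visits) ** 0.5
    return w[0] * task_score + w[1] * human_feedback + w[2] * intrinsic

print(social_reward(task_score=0.8, human_feedback=1.0, state_visits=3))
```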

