Sorting Objects from a Conveyor Belt Using POMDPs with Multiple-Object Observations and Information-Gain Rewards

Sensors ◽  
2020 ◽  
Vol 20 (9) ◽  
pp. 2481
Author(s):  
Ady-Daniel Mezei ◽  
Levente Tamás ◽  
Lucian Buşoniu

We consider a robot that must sort objects transported by a conveyor belt into different classes. Multiple observations must be performed before taking a decision on the class of each object, because the imperfect sensing sometimes detects the incorrect object class. The objective is to sort the sequence of objects in a minimal number of observation and decision steps. We describe this task in the framework of partially observable Markov decision processes, and we propose a reward function that explicitly takes into account the information gain of the viewpoint selection actions applied. The DESPOT algorithm is applied to solve the problem, automatically obtaining a sequence of observation viewpoints and class decision actions. Observations are made either only for the object on the first position of the conveyor belt or for multiple adjacent positions at once. The performance of the single- and multiple-position variants is compared, and the impact of including the information gain is analyzed. Real-life experiments with a Baxter robot and an industrial conveyor belt are provided.
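The idea of rewarding a viewpoint action by its information gain can be illustrated with a short sketch. The abstract does not give the reward formula, so everything below is an assumption for illustration only: the belief is a discrete distribution over object classes, information gain is measured as the reduction in Shannon entropy after a Bayes update, and the names `entropy`, `update_belief`, `information_gain_reward`, `weight`, and `step_cost` are hypothetical.

```python
import math


def entropy(belief):
    """Shannon entropy (in bits) of a discrete belief over object classes."""
    return -sum(p * math.log2(p) for p in belief if p > 0)


def update_belief(belief, likelihood):
    """Bayes update, where likelihood[c] = P(observation | class c)."""
    unnormalized = [b * l for b, l in zip(belief, likelihood)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]


def information_gain_reward(belief, likelihood, weight=1.0, step_cost=1.0):
    """Hypothetical reward: weighted entropy reduction minus a per-step cost.

    This is an illustrative assumption, not the reward used in the paper.
    """
    posterior = update_belief(belief, likelihood)
    gain = entropy(belief) - entropy(posterior)
    return posterior, weight * gain - step_cost


# Two classes, uniform prior, and a sensor that is right 80% of the time.
posterior, reward = information_gain_reward([0.5, 0.5], [0.8, 0.2], step_cost=0.0)
```

In this sketch, an observation that sharpens the belief earns a positive gain, while the step cost penalizes long observation sequences, which matches the stated objective of sorting in a minimal number of steps.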

2013 ◽  
Vol 785-786 ◽  
pp. 1403-1407
Author(s):  
Qing Yang Song ◽  
Xun Li ◽  
Shu Yu Ding ◽  
Zhao Long Ning

Many vertical handoff decision algorithms have not considered the impact of call dropping during the vertical handoff decision process. In addition, most current multi-attribute vertical handoff algorithms cannot dynamically predict users' specific circumstances. In this paper, we formulate the vertical handoff decision problem as a Markov decision process, with the objective of maximizing the expected total reward during the handoff procedure. A reward function is formulated to assess the service quality during each connection. The G1 and entropy methods are applied iteratively to obtain a stationary deterministic policy. Numerical results demonstrate the superiority of our proposed algorithm compared with the existing methods.


Sensors ◽  
2021 ◽  
Vol 21 (19) ◽  
pp. 6571
Author(s):  
Zhichao Jia ◽  
Qiang Gao ◽  
Xiaohong Peng

In recent years, machine learning for trading has been widely studied. In trading decisions, the direction and size of a position should be determined based on market conditions. However, there is no research so far that considers variable position sizes in models developed for trading purposes. In this paper, we propose a deep reinforcement learning model named LSTM-DDPG to make trading decisions with variable positions. Specifically, we consider the trading process as a Partially Observable Markov Decision Process, in which the long short-term memory (LSTM) network is used to extract market state features and the deep deterministic policy gradient (DDPG) framework is used to make trading decisions concerning the direction and variable size of the position. We test the LSTM-DDPG model on IF300 (index futures on the Chinese stock market) data, and the results show that LSTM-DDPG with variable positions performs better in terms of return and risk than models with fixed or few-level positions. In addition, the investment potential of the model is better exploited with a differential Sharpe ratio reward function than with a profit reward function.
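For readers unfamiliar with the differential Sharpe ratio, the sketch below computes it in its commonly cited form (usually attributed to Moody and Saffell), using exponential moving estimates of the first and second moments of the returns. Whether the paper uses exactly this form is an assumption; the function name `differential_sharpe` and the parameters `eta` and `eps` are hypothetical.

```python
def differential_sharpe(returns, eta=0.01, eps=1e-12):
    """Per-step differential Sharpe ratio for a sequence of returns.

    a and b are exponential moving estimates of E[R] and E[R^2].
    The reward is set to 0 while the variance estimate is (near) zero.
    """
    a, b = 0.0, 0.0
    rewards = []
    for r in returns:
        delta_a = r - a
        delta_b = r * r - b
        variance = b - a * a
        if variance > eps:
            d = (b * delta_a - 0.5 * a * delta_b) / variance ** 1.5
        else:
            d = 0.0  # not enough history to estimate risk yet
        rewards.append(d)
        # Update the moving estimates after computing the reward.
        a += eta * delta_a
        b += eta * delta_b
    return rewards


rewards = differential_sharpe([0.01, -0.02, 0.03])
```

Unlike a raw profit reward, this signal is positive when a return improves the running risk-adjusted performance and negative when it degrades it, so it can be used as a per-step reward during training.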


Author(s):  
YAODONG NI ◽  
ZHI-QIANG LIU

Partially observable Markov decision processes (POMDPs) are powerful for planning under uncertainty. However, it is usually impractical to employ a POMDP with exact parameters to model the real-life situation precisely, due to various reasons such as limited data for learning the model, inability of exact POMDPs to model dynamic situations, etc. In this paper, assuming that the parameters of POMDPs are imprecise but bounded, we formulate the framework of bounded-parameter partially observable Markov decision processes (BPOMDPs). A modified value iteration is proposed as a basic strategy for tackling parameter imprecision in BPOMDPs. In addition, we design the UL-based value iteration algorithm, in which each value backup is based on two sets of vectors called U-set and L-set. We propose four strategies for computing U-set and L-set. We analyze theoretically the computational complexity and the reward loss of the algorithm. The effectiveness and robustness of the algorithm are shown empirically.


2010 ◽  
Vol 6 (3) ◽  
pp. 33
Author(s):  
Robert J Petrella ◽  

It is widely recognised that hypertension is a major risk factor for the development of future cardiovascular (CV) events, which in turn are a major cause of morbidity and mortality. Blood pressure (BP) control with antihypertensive drugs has been shown to reduce the risk of CV events. Angiotensin-II receptor blockers (ARBs) are one such class of antihypertensive drugs and randomised controlled trials (RCTs) have shown ARB-based therapies to have effective BP-lowering properties. However, data obtained under these tightly controlled settings do not necessarily reflect actual experience in clinical practice. Real-life databases may offer alternative information that reflects an uncontrolled real-world setting and complements and expands on the findings of clinical trials. Recent analyses of practice-based real-life databases have shown ARB-based therapies to be associated with better persistence and adherence rates and with superior BP control than non-ARB-based therapies. Analyses of real-life databases also suggest that ARB-based therapies may be associated with a lower risk of CV events than other antihypertensive-drug-based therapies.


This five-volume survey of research on psychology is part of a series undertaken by the ICSSR since 1969, which covers various disciplines under social science. Volume One of this survey, Cognitive and Affective Processes, discusses the developments in the study of cognitive and affective processes within the Indian context. It offers an up-to-date assessment of theoretical developments and empirical studies in the rapidly evolving fields of cognitive science, applied cognition, and positive psychology. It also analyses how pedagogy responds to a shift in the practices of knowing and learning. Additionally, drawing upon insights from related fields, it proposes epithymetics (desire studies) as an upcoming field of research, and it investigates the impact of evolving cognitive and affective processes in Indian research and real-life contexts. The development of cognitive capability distinguishes human beings from other species, allows the creation and use of complex verbal symbols, facilitates imagination, and empowers them to function at an abstract level. However, much of the vitality characterizing human life is owed to diverse emotions and desires. This has made the study of cognition and affect frontier areas of psychology. With this in view, this volume focuses on delineating cognitive scientific contributions, cognition in the educational context, diverse applications of cognition, the psychology of desire, and positive psychology. The five chapters comprising this volume have approached the scholarly developments in the fields of cognition and affect in innovative ways, and have addressed basic as well as applied issues.


2020 ◽  
Vol 79 (Suppl 1) ◽  
pp. 1207.2-1207
Author(s):  
A. García Fernández ◽  
A. Briones-Figueroa ◽  
L. Calvo Sanz ◽  
Á. Andreu-Suárez ◽  
J. Bachiller-Corral ◽  
...  

Background: Biological therapy (BT) has changed the treatment and outlook of juvenile idiopathic arthritis (JIA) patients, but little is known about the best moment to start BT or the impact of prompt initiation.
Objectives: To analyze the response to BT of JIA patients according to the time when BT was started.
Methods: A retrospective, descriptive study was conducted on JIA patients followed up in a referral hospital who started BT up to 24 months after diagnosis, from 2000 to 2018. Disease activity was measured at 2 years after diagnosis according to the Wallace criteria for remission (absence of active arthritis, active uveitis, fever, rash, or any other manifestation attributable to JIA; normal CRP and ESR; PGA indicating no active disease) for at least 6 months.
Results: 55 JIA patients who started BT up to 24 months from diagnosis were analyzed. 69.1% were girls, with a median age at diagnosis of 8 years (IQR 3-13) and a median age at the start of BT of 9 years (IQR 3-13). Regarding JIA categories: 25.5% were Oligoarticular Persistent (OligP), 18.2% Systemic JIA (sJIA), 16.4% Enthesitis-Related Arthritis (ERA), 12.7% each Psoriatic Arthritis (APso) and Polyarticular RF- (PolyRF-), 5.5% each Oligoarticular Extended (OligE) and Polyarticular RF+ (PolyRF+), and 3.6% Undifferentiated (Und). 20% of patients had uveitis during follow-up. A conventional DMARD (cDMARD) was indicated in 83.6% of patients (95.7% methotrexate) at diagnosis [median 0 months, IQR (0-2.3)]. At the end of follow-up (2 years), only 30.9% of patients continued with cDMARDs. The main causes of discontinuation were adverse events (46.7%) and remission (36.7%). TNF inhibitors were prescribed in 81.8% of patients, and 18.2% of patients received two BTs during the first 2 years from diagnosis. 54.5% of BTs were indicated during the first 6 months from diagnosis, 27.3% from 7 to 12 months, 12.7% from 13 to 18 months, and 5.5% from 19 to 24 months. After 2 years from diagnosis, 78.2% of patients were in remission and 21.8% had active disease. Among patients with active disease, 75% had arthritis, 16.7% had uveitis, and 8.3% had both. Disease activity did not differ according to the presence of uveitis or cDMARD use. Regarding JIA categories, 66.7% of OligE, 57.1% of PolyRF- and 57.1% of APso patients were active at 2 years from diagnosis, a higher proportion than in the other categories (p=0.004). Patients in remission at 24 months from diagnosis started BT sooner than patients with active disease [95% CI (0.46-8.29), p=0.029]. The time when BT was started was correlated with disease activity at 2 years (K=0.294, p=0.029). In a ROC curve analysis, prescribing BT later than 7.5 months after diagnosis was associated with a higher probability of active disease at 2 years (sensitivity 0.67, specificity 0.63). Among patients in remission at 2 years, there was a correlation between prompt start of BT and less time to reach remission (K=-0.345, p=0.024). Patients with active disease at 2 years, regardless of the moment of BT initiation, required more BT during follow-up (p=0.002).
Conclusion: Prompt initiation of BT was correlated with a better outcome. JIA patients who started BT early after diagnosis had a higher probability of remission after 2 years. Starting BT after 7.5 months was correlated with a higher probability of active disease at 2 years. Active disease at 24 months was correlated with persistent active disease during follow-up.
Disclosure of Interests: None declared


2021 ◽  
Author(s):  
Stav Belogolovsky ◽  
Philip Korsunsky ◽  
Shie Mannor ◽  
Chen Tessler ◽  
Tom Zahavy

We consider the task of Inverse Reinforcement Learning in Contextual Markov Decision Processes (MDPs). In this setting, contexts, which define the reward and transition kernel, are sampled from a distribution. In addition, although the reward is a function of the context, it is not provided to the agent. Instead, the agent observes demonstrations from an optimal policy. The goal is to learn the reward mapping, such that the agent will act optimally even when encountering previously unseen contexts, also known as zero-shot transfer. We formulate this problem as a non-differential convex optimization problem and propose a novel algorithm to compute its subgradients. Based on this scheme, we analyze several methods both theoretically, where we compare the sample complexity and scalability, and empirically. Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts). Specifically, we present empirical experiments in a dynamic treatment regime, where the goal is to learn a reward function which explains the behavior of expert physicians based on recorded data of them treating patients diagnosed with sepsis.


2021 ◽  
Vol 13 (6) ◽  
pp. 3465
Author(s):  
Jordi Colomer ◽  
Dolors Cañabate ◽  
Brigita Stanikūnienė ◽  
Remigijus Bubnys

In the face of today’s global challenges, the practice and theory of contemporary education inevitably focuses on developing the competences that help individuals to find meaningfulness in their societal and professional life, to understand the impact of local actions on global processes and to enable them to solve real-life problems [...]


Author(s):  
Chaochao Lin ◽  
Matteo Pozzi

Optimal exploration of engineering systems can be guided by the principle of Value of Information (VoI), which accounts for the topological importance of components, their reliability and the management costs. For series systems, in most cases higher inspection priority should be given to unreliable components. For redundant systems such as parallel systems, analysis of one-shot decision problems shows that higher inspection priority should be given to more reliable components. This paper investigates the optimal exploration of redundant systems in long-term decision making with sequential inspection and repair. When the expected cumulative discounted cost is considered, it may become more efficient to give higher inspection priority to less reliable components, in order to preserve system redundancy. To investigate this problem, we develop a Partially Observable Markov Decision Process (POMDP) framework for sequential inspection and maintenance of redundant systems, where the VoI analysis is embedded in the optimal selection of exploratory actions. We investigate the use of alternative approximate POMDP solvers for parallel and more general systems, compare their computational complexity and performance, and show how the inspection priorities depend on the economic discount factor, the degradation rate, the inspection precision, and the repair cost.
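The VoI principle can be made concrete with a deliberately simplified one-shot example. The sketch below is not the paper's POMDP model: it assumes a single component, a perfect inspection, and only two actions (repair or do nothing). The function names and the cost parameters `repair_cost` and `failure_cost` are hypothetical.

```python
def expected_cost(p_fail, repair_cost, failure_cost):
    """Cost of the cheaper action: repair now, or do nothing and risk failure."""
    return min(repair_cost, p_fail * failure_cost)


def value_of_perfect_inspection(p_fail, repair_cost, failure_cost):
    """VoI = expected cost without inspection minus expected cost with it.

    Assumes the inspection reveals the component state without error.
    """
    prior_cost = expected_cost(p_fail, repair_cost, failure_cost)
    # Once the state is known, the best action is taken in each case.
    cost_if_failed = expected_cost(1.0, repair_cost, failure_cost)
    cost_if_intact = expected_cost(0.0, repair_cost, failure_cost)
    posterior_cost = p_fail * cost_if_failed + (1 - p_fail) * cost_if_intact
    return prior_cost - posterior_cost


voi = value_of_perfect_inspection(p_fail=0.2, repair_cost=10.0, failure_cost=100.0)
```

With these illustrative numbers, repairing blindly costs 10, whereas inspecting first means repairing only in the 20% of cases where the component has failed, so the inspection is worth paying for as long as it costs less than the resulting VoI. Extending this to redundant systems, imperfect inspections, and discounted long-term costs is what the POMDP formulation described above addresses.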


2021 ◽  
Vol 13 (9) ◽  
pp. 5284
Author(s):  
Timothy Van Renterghem ◽  
Francesco Aletta ◽  
Dick Botteldooren

The deployment of measures to mitigate sound during propagation outdoors is most often a compromise between the acoustic design, practical limitations, and visual preferences regarding the landscape. The current study of a raised berm next to a highway shows a number of common issues, such as the impact of the limited length of the noise shielding device, initially non-dominant sounds becoming noticeable, local drops in efficiency when the barrier is not fully continuous, and overall limited abatement efficiencies. Detailed assessments of both the objective and subjective effects of the intervention, carried out before and after deployment using the same methodology, showed that the more noise-sensitive persons in particular benefit from the noise abatement. After the highest exposure levels were reduced, their perception no longer differed from that of less noise-sensitive persons. People do react to spatial variation in exposure and abatement efficiency. Although level reductions might not be large in many real-life complex multi-source situations, they do improve the perception of the acoustic environment in the public space.

