Sorting Objects from a Conveyor Belt Using POMDPs with Multiple-Object Observations and Information-Gain Rewards

We consider a robot that must sort objects transported by a conveyor belt into different classes. Multiple observations must be performed before taking a decision on the class of each object, because the imperfect sensing sometimes detects the incorrect object class. The objective is to sort the sequence of objects in a minimal number of observation and decision steps. We describe this task in the framework of partially observable Markov decision processes, and we propose a reward function that explicitly takes into account the information gain of the viewpoint selection actions applied. The DESPOT algorithm is applied to solve the problem, automatically obtaining a sequence of observation viewpoints and class decision actions. Observations are made either only for the object on the first position of the conveyor belt or for multiple adjacent positions at once. The performance of the single- and multiple-position variants is compared, and the impact of including the information gain is analyzed. Real-life experiments with a Baxter robot and an industrial conveyor belt are provided.
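The idea of rewarding an observation by the information it yields can be sketched in a few lines. This is a minimal illustration, not the paper's reward function: it assumes a discrete belief over object classes, a Bayes update from a hypothetical sensor model, and information gain measured as the reduction in Shannon entropy. The 80% sensor accuracy and all function names are illustrative assumptions.

```python
import math


def entropy(belief):
    """Shannon entropy (in bits) of a discrete belief over classes."""
    return -sum(p * math.log2(p) for p in belief if p > 0.0)


def bayes_update(belief, likelihood):
    """Posterior over classes, given P(observation | class) for each class."""
    unnormalized = [b * l for b, l in zip(belief, likelihood)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the belief")
    return [u / total for u in unnormalized]


def information_gain(belief, likelihood):
    """Entropy reduction produced by a single observation."""
    return entropy(belief) - entropy(bayes_update(belief, likelihood))


# Hypothetical two-class sensor that reports the true class 80% of the time.
prior = [0.5, 0.5]
saw_class_0 = [0.8, 0.2]  # P(sensor reports "class 0" | true class)
posterior = bayes_update(prior, saw_class_0)
gain = information_gain(prior, saw_class_0)
```

A planner could add a term proportional to `gain` to the reward of each viewpoint action, so that viewpoints expected to sharpen the belief are preferred over uninformative ones.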


A Markov-Based Multi-Attribute Vertical Handoff Decision Algorithm

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.785-786.1403 ◽

2013 ◽

Vol 785-786 ◽

pp. 1403-1407

Author(s):

Qing Yang Song ◽

Xun Li ◽

Shu Yu Ding ◽

Zhao Long Ning

Keyword(s):

Markov Decision Process ◽

Decision Process ◽

Decision Problem ◽

Vertical Handoff ◽

Decision Algorithm ◽

Reward Function ◽

Total Reward ◽

Call Dropping ◽

Markov Decision ◽

The Impact

Many vertical handoff decision algorithms do not consider the impact of call dropping during the vertical handoff decision process. Moreover, most current multi-attribute vertical handoff algorithms cannot dynamically predict users’ specific circumstances. In this paper, we formulate the vertical handoff decision problem as a Markov decision process, with the objective of maximizing the expected total reward during the handoff procedure. A reward function is formulated to assess the service quality during each connection. The G1 and entropy methods are applied iteratively to obtain a stationary deterministic policy. Numerical results demonstrate the superiority of the proposed algorithm over existing methods.
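To make the notion of a stationary deterministic policy concrete, the sketch below runs standard value iteration on a toy two-network handoff problem. It is not the authors' algorithm (the G1 and entropy weighting steps are not reproduced); the states, rewards, and discount factor are hypothetical values chosen only for illustration.

```python
def q_value(transitions, values, state, action, gamma):
    """Expected discounted return of taking `action` in `state`."""
    return sum(
        prob * (reward + gamma * values[next_state])
        for prob, next_state, reward in transitions[state][action]
    )


def value_iteration(transitions, gamma=0.9, tol=1e-9):
    """Return optimal state values and a stationary deterministic policy."""
    n_states = len(transitions)
    values = [0.0] * n_states
    while True:
        new_values = [
            max(
                q_value(transitions, values, s, a, gamma)
                for a in range(len(transitions[s]))
            )
            for s in range(n_states)
        ]
        delta = max(abs(x - y) for x, y in zip(new_values, values))
        values = new_values
        if delta < tol:
            break
    policy = [
        max(
            range(len(transitions[s])),
            key=lambda a: q_value(transitions, values, s, a, gamma),
        )
        for s in range(n_states)
    ]
    return values, policy


# Hypothetical model: state 0 = low-quality network, state 1 = high-quality.
# Action 0 = stay, action 1 = hand off. Each entry is (prob, next_state, reward).
# A handoff earns no reward in the step where it happens (switching cost).
toy_model = [
    [[(1.0, 0, 1.0)], [(1.0, 1, 0.0)]],  # from state 0
    [[(1.0, 1, 2.0)], [(1.0, 0, 0.0)]],  # from state 1
]
values, policy = value_iteration(toy_model)
```

In this toy model the resulting policy hands off from the low-quality network and stays on the high-quality one, which is the kind of fixed state-to-action mapping the abstract refers to.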


LSTM-DDPG for Trading with Variable Positions

Sensors ◽

10.3390/s21196571 ◽

2021 ◽

Vol 21 (19) ◽

pp. 6571

Author(s):

Zhichao Jia ◽

Qiang Gao ◽

Xiaohong Peng

Keyword(s):

Short Term Memory ◽

Index Futures ◽

Reward Function ◽

Variable Position ◽

Policy Gradient ◽

Trading Decisions ◽

Markov Decision ◽

Market State ◽

Lstm Network ◽

Partially Observable

In recent years, machine learning for trading has been widely studied. Trading decisions should determine both the direction and the size of a position based on market conditions. However, there is no research so far that considers variable position sizes in models developed for trading purposes. In this paper, we propose a deep reinforcement learning model named LSTM-DDPG to make trading decisions with variable positions. Specifically, we consider the trading process as a Partially Observable Markov Decision Process, in which the long short-term memory (LSTM) network is used to extract market state features and the deep deterministic policy gradient (DDPG) framework is used to make trading decisions concerning the direction and variable size of the position. We test the LSTM-DDPG model on IF300 (index futures of the Chinese stock market) data, and the results show that LSTM-DDPG with variable positions performs better in terms of return and risk than models with fixed or few-level positions. In addition, a reward function based on the differential Sharpe ratio taps the investment potential of the model better than a profit-based reward function.
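The abstract does not spell out the differential Sharpe ratio, so the sketch below assumes the commonly used definition due to Moody and Saffell, which tracks exponential moving estimates of the first and second moments of the returns. The adaptation rate `eta`, the guard threshold `eps`, and the sample returns are illustrative assumptions, not values from the paper.

```python
def differential_sharpe(returns, eta=0.01, eps=1e-12):
    """Per-step differential Sharpe ratio for a sequence of returns.

    a and b are exponential moving estimates of the first and second
    moments of the returns; eta is the adaptation rate.
    """
    a = 0.0
    b = 0.0
    rewards = []
    for r in returns:
        delta_a = r - a
        delta_b = r * r - b
        variance = b - a * a
        if variance > eps:
            d = (b * delta_a - 0.5 * a * delta_b) / variance ** 1.5
        else:
            # Variance estimate not yet usable (e.g. the first step).
            d = 0.0
        rewards.append(d)
        # Update the moving estimates only after computing the reward.
        a += eta * delta_a
        b += eta * delta_b
    return rewards


# Hypothetical per-step returns of a trading agent.
sample_rewards = differential_sharpe([0.01, -0.02, 0.03], eta=0.1)
```

Because each value depends only on the current return and the running moment estimates, it can serve as a per-step reward signal, whereas the ordinary Sharpe ratio is defined over a whole trading period.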


BOUNDED-PARAMETER PARTIALLY OBSERVABLE MARKOV DECISION PROCESSES: FRAMEWORK AND ALGORITHM

International Journal of Uncertainty Fuzziness and Knowledge-Based Systems ◽

10.1142/s0218488513500396 ◽

2013 ◽

Vol 21 (06) ◽

pp. 821-863 ◽

Cited By ~ 2

Author(s):

YAODONG NI ◽

ZHI-QIANG LIU

Keyword(s):

Markov Decision Processes ◽

Real Life ◽

Decision Processes ◽

Iteration Algorithm ◽

Value Iteration ◽

Planning Under Uncertainty ◽

Markov Decision ◽

Real Life Situation ◽

Partially Observable Markov ◽

Partially Observable

Partially observable Markov decision processes (POMDPs) are powerful for planning under uncertainty. However, it is usually impractical to employ a POMDP with exact parameters to model the real-life situation precisely, due to various reasons such as limited data for learning the model, inability of exact POMDPs to model dynamic situations, etc. In this paper, assuming that the parameters of POMDPs are imprecise but bounded, we formulate the framework of bounded-parameter partially observable Markov decision processes (BPOMDPs). A modified value iteration is proposed as a basic strategy for tackling parameter imprecision in BPOMDPs. In addition, we design the UL-based value iteration algorithm, in which each value backup is based on two sets of vectors called U-set and L-set. We propose four strategies for computing U-set and L-set. We analyze theoretically the computational complexity and the reward loss of the algorithm. The effectiveness and robustness of the algorithm are shown empirically.


The Impact of Angiotensin-II Receptor Blockers on Cardiovascular Events in Hypertensive Patients – Evidence from Real-life Databases

European Cardiology Review ◽

10.15420/ecr.2010.6.3.33 ◽

2010 ◽

Vol 6 (3) ◽

pp. 33

Author(s):

Robert J Petrella ◽

Keyword(s):

Angiotensin Ii ◽

Antihypertensive Drugs ◽

Real Life ◽

Angiotensin Ii Receptor Blockers ◽

Angiotensin Ii Receptor ◽

Real World Setting ◽

Actual Experience ◽

Receptor Blockers ◽

Randomised Controlled ◽

The Impact

It is widely recognised that hypertension is a major risk factor for the development of future cardiovascular (CV) events, which in turn are a major cause of morbidity and mortality. Blood pressure (BP) control with antihypertensive drugs has been shown to reduce the risk of CV events. Angiotensin-II receptor blockers (ARBs) are one such class of antihypertensive drugs, and randomised controlled trials (RCTs) have shown ARB-based therapies to have effective BP-lowering properties. However, data obtained under these tightly controlled settings do not necessarily reflect actual experience in clinical practice. Real-life databases may offer alternative information that reflects an uncontrolled real-world setting and complements and expands on the findings of clinical trials. Recent analyses of practice-based real-life databases have shown ARB-based therapies to be associated with better persistence and adherence rates and with better BP control than non-ARB-based therapies. Analyses of real-life databases also suggest that ARB-based therapies may be associated with a lower risk of CV events than other antihypertensive-drug-based therapies.


Psychology: Volume 1

10.1093/oso/9780199498840.001.0001 ◽

2019 ◽

Keyword(s):

Positive Psychology ◽

Human Life ◽

Empirical Studies ◽

Real Life ◽

Human Beings ◽

Cognitive Capability ◽

Affective Processes ◽

Abstract Level ◽

The Impact ◽

Cognition And Affect

This five-volume survey of research on psychology is part of a series undertaken by the ICSSR since 1969, which covers various disciplines within the social sciences. Volume One of this survey, Cognitive and Affective Processes, discusses developments in the study of cognitive and affective processes within the Indian context. It offers an up-to-date assessment of theoretical developments and empirical studies in the rapidly evolving fields of cognitive science, applied cognition, and positive psychology. It also analyses how pedagogy responds to a shift in the practices of knowing and learning. Additionally, drawing upon insights from related fields, it proposes epithymetics (desire studies) as an upcoming field of research, and it investigates the impact of evolving cognitive and affective processes in Indian research and real-life contexts. The development of cognitive capability distinguishes human beings from other species: it allows the creation and use of complex verbal symbols, facilitates imagination, and empowers humans to function at an abstract level. However, much of the vitality characterizing human life is owed to diverse emotions and desires. This has made the study of cognition and affect a frontier area of psychology. With this in view, this volume focuses on delineating cognitive scientific contributions, cognition in the educational context, diverse applications of cognition, the psychology of desire, and positive psychology. The five chapters comprising this volume approach the scholarly developments in the fields of cognition and affect in innovative ways, and address basic as well as applied issues.


SAT0501 EARLY START OF BIOLOGICAL TREATMENT IN JUVENILE IDIOPHATIC ARTHRITIS: DOES A THERAPEUTIC WINDOW EXIST IN REAL LIFE?

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2020-eular.4611 ◽

2020 ◽

Vol 79 (Suppl 1) ◽

pp. 1207.2-1207

Author(s):

A. García Fernández ◽

A. Briones-Figueroa ◽

L. Calvo Sanz ◽

Á. Andreu-Suárez ◽

J. Bachiller-Corral ◽

...

Keyword(s):

Disease Activity ◽

Biological Treatment ◽

Real Life ◽

Therapeutic Window ◽

Systemic Jia ◽

Active Arthritis ◽

The Impact ◽

Active Patients ◽

Active Disease

Background: Biological therapy (BT) has changed the treatment and prospects of JIA patients, but little is known about the best moment to start BT or about the impact of prompt initiation.

Objectives: To analyze the response to BT of juvenile idiopathic arthritis (JIA) patients according to the time at which BT was started.

Methods: A retrospective, descriptive study was conducted on JIA patients followed up in a referral hospital who started BT up to 24 months after diagnosis, from 2000 to 2018. Disease activity was measured 2 years after diagnosis according to the Wallace criteria for remission (absence of active arthritis, active uveitis, fever, rash or any other manifestation attributable to JIA; normal CRP and ESR; PGA indicating no active disease) for at least 6 months.

Results: Fifty-five JIA patients who started BT up to 24 months from diagnosis were analyzed. Of these, 69.1% were girls; the median age at diagnosis was 8 years (IQR 3-13) and the median age at the start of BT was 9 years (IQR 3-13). Regarding JIA categories, 25.5% were oligoarticular persistent (OligP), 18.2% systemic JIA (sJIA), 16.4% enthesitis-related arthritis (ERA), 12.7% each psoriatic arthritis (APso) and polyarticular RF- (PolyRF-), 5.5% each oligoarticular extended (OligE) and polyarticular RF+ (PolyRF+), and 3.6% undifferentiated (Und). Uveitis occurred in 20% of patients during follow-up. A conventional DMARD (cDMARD) was indicated in 83.6% of patients (methotrexate in 95.7%) at diagnosis [median 0 months, IQR (0-2.3)]. At the end of follow-up (2 years), only 30.9% of patients continued with cDMARDs. The main causes of discontinuation were adverse events (46.7%) and remission (36.7%). TNF inhibitors were prescribed in 81.8% of patients, and 18.2% of patients received two BTs during the first 2 years from diagnosis.

Of the BTs, 54.5% were indicated during the first 6 months from diagnosis, 27.3% from 7 to 12 months, 12.7% from 13 to 18 months, and 5.5% from 19 to 24 months. Two years after diagnosis, 78.2% of patients were in remission and 21.8% had active disease. Among patients with active disease, 75% had arthritis, 16.7% had uveitis and 8.3% had both. Disease activity did not differ according to the presence of uveitis or the use of cDMARDs. Regarding JIA categories, 66.7% of OligE, 57.1% of PolyRF- and 57.1% of APso patients had active disease 2 years after diagnosis, compared with the other categories (p=0.004).

Patients in remission at 24 months from diagnosis had started BT sooner than patients with active disease [95% CI (0.46-8.29), p=0.029]. The time at which BT was started correlated with disease activity at 2 years (K=0.294, p=0.029). In a ROC curve analysis, prescribing BT later than 7.5 months after diagnosis was associated with a higher probability of active disease at 2 years (sensitivity 0.67, specificity 0.63). Among patients in remission at 2 years, a prompt start of BT correlated with a shorter time to reach remission (K=-0.345, p=0.024). Patients with active disease at 2 years, regardless of when BT was initiated, required more BTs during follow-up (p=0.002).

Conclusion: Prompt initiation of BT was correlated with a better outcome. JIA patients who started BT early after diagnosis had a higher probability of remission after 2 years. Starting BT later than 7.5 months after diagnosis was correlated with a higher probability of active disease at 2 years. Active disease at 24 months was correlated with persistent active disease during follow-up.

Disclosure of Interests: None declared


Inverse reinforcement learning in contextual MDPs

Machine Learning ◽

10.1007/s10994-021-05984-x ◽

2021 ◽

Author(s):

Stav Belogolovsky ◽

Philip Korsunsky ◽

Shie Mannor ◽

Chen Tessler ◽

Tom Zahavy

Keyword(s):

Reinforcement Learning ◽

Optimization Problem ◽

Decision Processes ◽

Inverse Reinforcement Learning ◽

Convex Optimization Problem ◽

Reward Function ◽

Dynamic Treatment Regime ◽

Markov Decision ◽

Dynamic Treatment ◽

Recorded Data

We consider the task of Inverse Reinforcement Learning in Contextual Markov Decision Processes (MDPs). In this setting, contexts, which define the reward and transition kernel, are sampled from a distribution. In addition, although the reward is a function of the context, it is not provided to the agent. Instead, the agent observes demonstrations from an optimal policy. The goal is to learn the reward mapping, such that the agent will act optimally even when encountering previously unseen contexts, also known as zero-shot transfer. We formulate this problem as a non-differentiable convex optimization problem and propose a novel algorithm to compute its subgradients. Based on this scheme, we analyze several methods both theoretically, where we compare the sample complexity and scalability, and empirically. Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts). Specifically, we present empirical experiments in a dynamic treatment regime, where the goal is to learn a reward function which explains the behavior of expert physicians based on recorded data of them treating patients diagnosed with sepsis.


Formulating Modes of Cooperative Learning for Education for Sustainable Development

Sustainability ◽

10.3390/su13063465 ◽

2021 ◽

Vol 13 (6) ◽

pp. 3465

Author(s):

Jordi Colomer ◽

Dolors Cañabate ◽

Brigita Stanikūnienė ◽

Remigijus Bubnys

Keyword(s):

Sustainable Development ◽

Real Life ◽

Education For Sustainable Development ◽

Professional Life ◽

Global Challenges ◽

Life Problems ◽

The Face ◽

Contemporary Education ◽

The Impact

In the face of today’s global challenges, the practice and theory of contemporary education inevitably focuses on developing the competences that help individuals to find meaningfulness in their societal and professional life, to understand the impact of local actions on global processes and to enable them to solve real-life problems [...]


Optimal adaptive inspection and maintenance for redundant systems

Proceedings of the Institution of Mechanical Engineers Part O Journal of Risk and Reliability ◽

10.1177/1748006x211020151 ◽

2021 ◽

pp. 1748006X2110201

Author(s):

Chaochao Lin ◽

Matteo Pozzi

Keyword(s):

Engineering Systems ◽

Discounted Cost ◽

Markov Decision ◽

Inspection And Maintenance ◽

And Performance ◽

Partially Observable ◽

Series Systems ◽

Selection Of ◽

Redundant Systems

Optimal exploration of engineering systems can be guided by the principle of Value of Information (VoI), which accounts for the topological importance of components, their reliability and the management costs. For series systems, in most cases higher inspection priority should be given to unreliable components. For redundant systems such as parallel systems, analysis of one-shot decision problems shows that higher inspection priority should be given to more reliable components. This paper investigates the optimal exploration of redundant systems in long-term decision making with sequential inspection and repair. When the expected cumulative discounted cost is considered, it may become more efficient to give higher inspection priority to less reliable components, in order to preserve system redundancy. To investigate this problem, we develop a Partially Observable Markov Decision Process (POMDP) framework for sequential inspection and maintenance of redundant systems, where the VoI analysis is embedded in the optimal selection of exploratory actions. We investigate the use of alternative approximate POMDP solvers for parallel and more general systems, compare their computational complexity and performance, and show how the inspection priorities depend on the economic discount factor, the degradation rate, the inspection precision, and the repair cost.
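As a rough illustration of the VoI principle, the sketch below computes the value of a perfect, cost-free inspection for a single component in a one-shot decision. This is a deliberate simplification of the setting described above (no system topology, no sequential decisions, no imperfect inspection), and the probabilities and costs are hypothetical.

```python
def expected_cost_without_inspection(p_fail, repair_cost, failure_cost):
    """Best expected cost when acting on the prior alone.

    The manager either repairs preventively (paying repair_cost) or does
    nothing and bears the expected failure cost.
    """
    return min(repair_cost, p_fail * failure_cost)


def expected_cost_with_perfect_inspection(p_fail, repair_cost, failure_cost):
    """Best expected cost when the true state is revealed before acting.

    A healthy component costs nothing; a failing one costs the cheaper of
    repairing it or letting it fail.
    """
    return p_fail * min(repair_cost, failure_cost)


def value_of_information(p_fail, repair_cost, failure_cost):
    """Expected saving from a perfect, cost-free inspection."""
    prior_cost = expected_cost_without_inspection(p_fail, repair_cost, failure_cost)
    posterior_cost = expected_cost_with_perfect_inspection(
        p_fail, repair_cost, failure_cost
    )
    return prior_cost - posterior_cost


# Hypothetical component: 30% failure probability, repair costs 10, failure costs 50.
voi = value_of_information(p_fail=0.3, repair_cost=10.0, failure_cost=50.0)
```

Under these assumptions the VoI is an upper bound on what it is worth paying for the inspection; ranking components by such values is one simple way to set inspection priorities in the one-shot case.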


Changes in the Soundscape of the Public Space Close to a Highway by a Noise Control Intervention

Sustainability ◽

10.3390/su13095284 ◽

2021 ◽

Vol 13 (9) ◽

pp. 5284

Author(s):

Timothy Van Renterghem ◽

Francesco Aletta ◽

Dick Botteldooren

Keyword(s):

Public Space ◽

Real Life ◽

Noise Abatement ◽

Acoustic Environment ◽

The Public ◽

Control Intervention ◽

Before And After ◽

Noise Shielding ◽

Visual Preferences ◽

The Impact

The deployment of measures to mitigate sound during propagation outdoors is most often a compromise between the acoustic design, practical limitations, and visual preferences regarding the landscape. The current study of a raised berm next to a highway shows a number of common issues, such as the impact of the limited length of the noise shielding device, initially non-dominant sounds becoming noticeable, local drops in efficiency where the barrier is not fully continuous, and overall limited abatement efficiencies. Detailed assessments of both the objective and subjective effects of the intervention, carried out before and after its deployment with the same methodology, showed that the more noise-sensitive persons in particular benefit from the noise abatement. Once the highest exposure levels were reduced, their perception no longer differed from that of less noise-sensitive persons. People do react to spatial variation in exposure and abatement efficiency. Although level reductions might not be large in many complex real-life multi-source situations, they do improve the perception of the acoustic environment in the public space.
