BAR — A Reinforcement Learning Agent for Bounding-Box Automated Refinement

2020 ◽  
Vol 34 (03) ◽  
pp. 2561-2568
Author(s):  
Morgane Ayle ◽  
Jimmy Tekli ◽  
Julia El-Zini ◽  
Boulos El-Asmar ◽  
Mariette Awad

Research has shown that deep neural networks can assist human workers throughout the industrial sector via different computer vision applications. However, such data-driven learning approaches require a very large number of labeled training images in order to generalize well and achieve high accuracies that meet industry standards. Gathering and labeling large amounts of images is both expensive and time consuming, especially for industrial use-cases. In this work, we introduce BAR (Bounding-box Automated Refinement), a reinforcement learning agent that learns to correct inaccurate bounding-boxes that are weakly generated by certain detection methods, or wrongly annotated by a human, using either an offline training method with Deep Reinforcement Learning (BAR-DRL) or an online one using Contextual Bandits (BAR-CB). Our agent limits human intervention to correcting or verifying a subset of bounding-boxes instead of re-drawing new ones. Results on a car industry-related dataset and on the PASCAL VOC dataset show a consistent increase of up to 0.28 in the Intersection-over-Union of bounding-boxes with their desired ground-truths, while saving 30%-82% of the human intervention time spent correcting or re-drawing inaccurate proposals.
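
As a rough illustration of the kind of IoU-driven refinement loop described above, the sketch below applies discrete translate/scale actions to a noisy box and rewards the IoU gain against the ground truth. The action set, step size, and reward shaping are assumptions for illustration, not the paper's exact design.

    # Hypothetical sketch of a box-refinement step: an agent applies a discrete
    # move to a noisy box and is rewarded by the IoU gain against the ground truth.
    # Action set, step size, and reward shaping here are illustrative assumptions.

    def iou(a, b):
        # boxes as (x1, y1, x2, y2)
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter)

    ACTIONS = {
        "left":   lambda b, s: (b[0] - s, b[1], b[2] - s, b[3]),
        "right":  lambda b, s: (b[0] + s, b[1], b[2] + s, b[3]),
        "up":     lambda b, s: (b[0], b[1] - s, b[2], b[3] - s),
        "down":   lambda b, s: (b[0], b[1] + s, b[2], b[3] + s),
        "wider":  lambda b, s: (b[0] - s, b[1], b[2] + s, b[3]),
        "taller": lambda b, s: (b[0], b[1] - s, b[2], b[3] + s),
    }

    def refine_step(box, gt, action, step=2.0):
        new_box = ACTIONS[action](box, step)
        reward = iou(new_box, gt) - iou(box, gt)   # positive if the move helps
        return new_box, reward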

Sensors ◽  
2020 ◽  
Vol 20 (7) ◽  
pp. 2145 ◽  
Author(s):  
Guoxu Liu ◽  
Joseph Christian Nouaze ◽  
Philippe Lyonel Touko Mbouembe ◽  
Jae Ho Kim

Automatic fruit detection is an essential capability for harvesting robots. However, complicated environmental conditions, such as illumination variation, branch and leaf occlusion, and tomato overlap, make fruit detection very challenging. In this study, an improved tomato detection model called YOLO-Tomato, based on YOLOv3, is proposed to deal with these problems. A dense architecture is incorporated into YOLOv3 to facilitate the reuse of features and help learn a more compact and accurate model. Moreover, the model replaces the traditional rectangular bounding box (R-Bbox) with a circular bounding box (C-Bbox) for tomato localization. The new bounding boxes match the tomatoes more precisely, and thus improve the Intersection-over-Union (IoU) calculation used for Non-Maximum Suppression (NMS); they also reduce the number of coordinates to be predicted. An ablation study demonstrated the efficacy of these modifications. YOLO-Tomato was compared to several state-of-the-art detection methods and achieved the best detection performance.
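
To make the C-Bbox idea concrete, the sketch below computes circle-circle IoU and uses it in a greedy NMS pass; the paper's exact parameterization and thresholds are not given in the abstract, so this is only an illustrative version.

    import math

    def circle_iou(c1, c2):
        # circles as (cx, cy, r)
        (x1, y1, r1), (x2, y2, r2) = c1, c2
        d = math.hypot(x2 - x1, y2 - y1)
        if d >= r1 + r2:
            inter = 0.0                              # disjoint circles
        elif d <= abs(r1 - r2):
            inter = math.pi * min(r1, r2) ** 2       # one circle inside the other
        else:
            a1 = r1**2 * math.acos((d**2 + r1**2 - r2**2) / (2 * d * r1))
            a2 = r2**2 * math.acos((d**2 + r2**2 - r1**2) / (2 * d * r2))
            a3 = 0.5 * math.sqrt((-d + r1 + r2) * (d + r1 - r2) *
                                 (d - r1 + r2) * (d + r1 + r2))
            inter = a1 + a2 - a3
        union = math.pi * (r1**2 + r2**2) - inter
        return inter / union

    def nms_circles(dets, iou_thr=0.5):
        # dets: list of (cx, cy, r, score); greedy NMS keeping highest scores
        dets = sorted(dets, key=lambda d: d[3], reverse=True)
        kept = []
        for d in dets:
            if all(circle_iou(d[:3], k[:3]) < iou_thr for k in kept):
                kept.append(d)
        return kept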


Author(s):  
Xiaoxiao Guo ◽  
Shiyu Chang ◽  
Mo Yu ◽  
Gerald Tesauro ◽  
Murray Campbell

Existing imitation learning approaches often require that complete demonstration data, including sequences of actions and states, are available. In this paper, we consider a more realistic and difficult scenario in which a reinforcement learning agent only has access to the state sequences of an expert, while the expert's actions are unobserved. We propose a novel tensor-based model to infer the unobserved actions underlying the expert state sequences. The policy of the agent is then optimized via a hybrid objective combining reinforcement learning and imitation learning. We evaluated our hybrid approach on an illustrative domain and on Atari games. The empirical results show that (1) the agents are able to leverage expert state sequences to learn faster than pure reinforcement learning baselines, (2) our tensor-based action inference model is advantageous compared to standard deep neural networks in inferring expert actions, and (3) the hybrid policy optimization objective is robust against noise in the expert state sequences.
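
A hedged sketch of a hybrid objective of this kind is shown below: a policy-gradient term on the agent's own rollouts plus an imitation term on actions inferred for expert states. The tensor-based inference model itself is abstracted away behind pre-computed inferred actions, and the weighting is an assumption.

    import torch
    import torch.nn.functional as F

    def hybrid_loss(policy_logits, taken_actions, returns,
                    expert_logits, inferred_expert_actions, imitation_weight=0.5):
        # REINFORCE-style term on the agent's own rollouts
        log_probs = F.log_softmax(policy_logits, dim=-1)
        chosen = log_probs.gather(1, taken_actions.unsqueeze(1)).squeeze(1)
        rl_loss = -(chosen * returns).mean()

        # Imitation term: match the policy to actions inferred for expert states
        il_loss = F.cross_entropy(expert_logits, inferred_expert_actions)

        return rl_loss + imitation_weight * il_loss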


2021 ◽  
Vol 13 (15) ◽  
pp. 2881
Author(s):  
Azam Karami ◽  
Karoll Quijano ◽  
Melba Crawford

Tassel counts provide valuable information related to flowering and yield prediction in maize, but are expensive and time-consuming to acquire via traditional manual approaches. High-resolution RGB imagery acquired by unmanned aerial vehicles (UAVs), coupled with advanced machine learning approaches, including deep learning (DL), provides a new capability for monitoring flowering. In this article, three state-of-the-art DL techniques are modified to improve their performance for this application and evaluated for tassel detection relative to Tasselnetv2+: CenterNet, which uses point annotation, and task-aware spatial disentanglement (TSD) and detecting objects with recursive feature pyramids and switchable atrous convolution (DetectoRS), which use bounding box annotation. The dataset for the experiments comprises RGB images of maize tassels from plant breeding experiments, which vary in size, complexity, and overlap. Results show that point annotations are more accurate and simpler to acquire than bounding boxes, and that bounding box-based approaches are more sensitive to the size of the bounding boxes and to the background than point-based approaches. Overall, CenterNet achieves high accuracy in comparison to the other techniques, but DetectoRS can better detect early-stage tassels. The results for these approaches were also more robust than those of Tasselnetv2+, which is sensitive to the number of tassels in the image.
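
For context, detectors in counting applications like this are often also compared by counting error; the sketch below computes standard count metrics (MAE, RMSE, R2). The specific metrics reported in the article are not listed in the abstract, so treat this purely as an illustration.

    import numpy as np

    def count_metrics(pred_counts, true_counts):
        # pred_counts, true_counts: per-image tassel counts from a detector vs. manual counts
        pred = np.asarray(pred_counts, dtype=float)
        true = np.asarray(true_counts, dtype=float)
        mae = np.mean(np.abs(pred - true))              # mean absolute error
        rmse = np.sqrt(np.mean((pred - true) ** 2))     # root mean squared error
        r2 = 1 - np.sum((true - pred) ** 2) / np.sum((true - true.mean()) ** 2)
        return {"MAE": mae, "RMSE": rmse, "R2": r2}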


2019 ◽  
Vol 4 (37) ◽  
pp. eaay6276 ◽  
Author(s):  
Xiao Li ◽  
Zachary Serlin ◽  
Guang Yang ◽  
Calin Belta

Growing interest in reinforcement learning approaches to robotic planning and control raises concerns about the predictability and safety of robot behaviors realized solely through learned control policies. In addition, formally defining reward functions for complex tasks is challenging, and faulty rewards are prone to exploitation by the learning agent. Here, we propose a formal methods approach to reinforcement learning that (i) provides a formal specification language that integrates high-level, rich task specifications with a priori, domain-specific knowledge; (ii) makes the reward generation process easily interpretable; (iii) guides the policy generation process according to the specification; and (iv) guarantees the satisfaction of the (critical) safety component of the specification. The main ingredients of our computational framework are a predicate temporal logic specifically tailored for robotic tasks and an automaton-guided, safe reinforcement learning algorithm based on control barrier functions. Although the proposed framework is quite general, we motivate it and illustrate it experimentally on a robotic cooking task, in which two manipulators work together to make hot dogs.
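
The safety guarantee in frameworks of this kind comes from filtering learned actions through a control barrier function (CBF). A minimal sketch for a 1D single integrator follows; the system, barrier, and gain are illustrative assumptions, and real robotic settings typically solve a quadratic program over many constraints.

    # 1D single integrator x' = u with safety set {x <= x_max} and barrier
    # h(x) = x_max - x. The learned action u_rl is clipped so that
    # h' + alpha * h >= 0 always holds, i.e. the safe set is forward invariant.

    def cbf_filter(x, u_rl, x_max=1.0, alpha=5.0):
        # h(x) = x_max - x and h'(x) = -u, so safety requires u <= alpha * (x_max - x)
        u_limit = alpha * (x_max - x)
        return min(u_rl, u_limit)

    # Example: near the boundary the filter overrides an unsafe learned action.
    print(cbf_filter(x=0.99, u_rl=2.0))   # -> 0.05, instead of the requested 2.0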


2021 ◽  
Vol 2050 (1) ◽  
pp. 012012
Author(s):  
Yifei Shen ◽  
Tian Liu ◽  
Wenke Liu ◽  
Ruiqing Xu ◽  
Zhuo Li ◽  
...  

Recommending stocks is very important for investment companies and investors. However, without enough analysts, no stock selection strategy can capture the dynamics of all S&P 500 stocks. Moreover, most existing recommendation strategies are based on predictive models that buy and hold stocks with high return potential, and they fail to recommend stocks from different industrial sectors to reduce risk. In this article, we propose a novel solution that recommends a stock portfolio from the S&P 500 index with reinforcement learning. Our basic idea is to construct a stock relation graph (RG), which provides rich relations among stocks and industrial sectors, to generate diversified recommendation results. To this end, we design a new method to explore high-quality stocks from the constructed relation graph with reinforcement learning. Specifically, the reinforcement learning agent jumps across industrial sectors and selects stocks based on feedback signals from the market. Finally, we apply portfolio allocation methods (i.e., mean-variance and minimum-variance) to test the validity of the recommendation. The empirical results show that portfolio allocation based on the selected stocks outperforms a long-term strategy on the S&P 500 Index in terms of cumulative returns.
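
As an illustration of the allocation step, the sketch below computes closed-form minimum-variance weights over the returns of whatever stocks the agent selected; the selection step and any no-short-selling constraint are omitted here and would be additional assumptions.

    import numpy as np

    def min_variance_weights(returns):
        # returns: (T, N) matrix of daily returns for the N selected stocks
        cov = np.cov(returns, rowvar=False)
        inv = np.linalg.pinv(cov)
        ones = np.ones(cov.shape[0])
        return inv @ ones / (ones @ inv @ ones)   # unconstrained closed form

    def cumulative_return(returns, weights):
        # compound the daily portfolio returns over the test period
        portfolio = returns @ weights
        return float(np.prod(1.0 + portfolio) - 1.0)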


Biomimetics ◽  
2021 ◽  
Vol 6 (1) ◽  
pp. 13
Author(s):  
Adam Bignold ◽  
Francisco Cruz ◽  
Richard Dazeley ◽  
Peter Vamplew ◽  
Cameron Foale

Interactive reinforcement learning methods utilise an external information source to evaluate decisions and accelerate learning. Previous work has shown that human advice can significantly improve a learning agent's performance. When evaluating reinforcement learning algorithms, it is common to repeat experiments as parameters are altered or to gain a sufficient sample size. In this regard, requiring human interaction every time an experiment is restarted is undesirable, particularly when the expense of doing so can be considerable. Additionally, reusing the same people for the experiments introduces bias, as they learn the behaviour of the agent and the dynamics of the environment. This paper presents a methodology for evaluating interactive reinforcement learning agents by employing simulated users, which allow human knowledge, bias, and interaction to be simulated. The use of simulated users allows the development and testing of reinforcement learning agents, and can provide indicative results of agent performance under defined human constraints. While simulated users are no replacement for actual humans, they do offer an affordable and fast alternative for evaluating assisted agents. We introduce a method for performing a preliminary evaluation utilising simulated users to show how performance changes depending on the type of user assisting the agent. Moreover, we describe how human interaction may be simulated, and present an experiment illustrating the applicability of simulated users in evaluating agent performance when assisted by different types of trainers. Experimental results show that the use of this methodology allows for greater insight into the performance of interactive reinforcement learning agents when advised by different users, and that simulated users with varying characteristics allow for evaluation of the impact of those characteristics on the behaviour of the learning agent.
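
A minimal sketch of a simulated user of the kind described is given below: advice is offered with a configurable availability and is optimal with a configurable accuracy. The parameter names and the uniform error model are assumptions, not the paper's exact simulation.

    import random

    class SimulatedUser:
        def __init__(self, availability=0.5, accuracy=0.8, n_actions=4, seed=0):
            self.availability = availability   # probability of giving any advice
            self.accuracy = accuracy           # probability the advice is optimal
            self.n_actions = n_actions
            self.rng = random.Random(seed)

        def advise(self, optimal_action):
            if self.rng.random() > self.availability:
                return None                    # user stays silent this step
            if self.rng.random() < self.accuracy:
                return optimal_action
            return self.rng.randrange(self.n_actions)   # noisy / mistaken advice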


2021 ◽  
Vol 2 (1) ◽  
pp. 1-25
Author(s):  
Yongsen Ma ◽  
Sheheryar Arshad ◽  
Swetha Muniraju ◽  
Eric Torkildson ◽  
Enrico Rantala ◽  
...  

In recent years, Channel State Information (CSI) measured by WiFi has been widely used for human activity recognition. In this article, we propose a deep learning design for location- and person-independent activity recognition with WiFi. The proposed design consists of three Deep Neural Networks (DNNs): a 2D Convolutional Neural Network (CNN) as the recognition algorithm, a 1D CNN as the state machine, and a reinforcement learning agent for neural architecture search. The recognition algorithm learns location- and person-independent features from different perspectives of the CSI data. The state machine learns temporal dependency information from the history of classification results. The reinforcement learning agent optimizes the neural architecture of the recognition algorithm using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM). The proposed design is evaluated in a lab environment with different WiFi device locations, antenna orientations, sitting/standing/walking locations and orientations, and multiple persons. It achieves 97% average accuracy when the testing devices and persons are not seen during training, and accuracies of 80% and 83% on two public datasets. The proposed design requires very little human effort for ground-truth labeling, feature engineering, signal processing, and tuning of learning parameters and hyperparameters.
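
For orientation, a minimal 2D-CNN recognizer over CSI tensors might look like the sketch below; the channel counts, kernel sizes, and number of classes are placeholders rather than the architecture found by the reinforcement learning search.

    import torch
    import torch.nn as nn

    class CSIRecognizer(nn.Module):
        # input shape assumed as (batch, antennas, subcarriers, time)
        def __init__(self, in_channels=3, n_classes=6):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.classifier = nn.Linear(32, n_classes)

        def forward(self, x):
            h = self.features(x).flatten(1)
            return self.classifier(h)

    # e.g. logits = CSIRecognizer()(torch.randn(8, 3, 30, 200))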


Plant Methods ◽  
2021 ◽  
Vol 17 (1) ◽  
Author(s):  
Hiranya Jayakody ◽  
Paul Petrie ◽  
Hugo Jan de Boer ◽  
Mark Whitty

Background: Stomata analysis using microscope imagery provides important insight into plant physiology, health, and the surrounding environmental conditions. Plant scientists are now able to conduct automated high-throughput analysis of stomata in microscope data; however, existing detection methods are sensitive to the appearance of stomata in the training images, thereby limiting general applicability. In addition, existing methods only generate bounding-boxes around detected stomata, which requires users to implement additional image processing steps to study stomata morphology. In this paper, we develop a fully automated, robust stomata detection algorithm which can also identify individual stomata boundaries regardless of the plant species, sample collection method, imaging technique, and magnification level.

Results: The proposed solution consists of three stages. First, the input image is pre-processed to remove any colour-space biases arising from different sample collection and imaging techniques. Then, a Mask R-CNN is applied to estimate individual stomata boundaries, with the feature pyramid network embedded in the Mask R-CNN used to identify stomata at different scales. Finally, a statistical filter is applied at the Mask R-CNN output to reduce the number of false positives generated by the network. The algorithm was tested using 16 datasets from 12 sources, containing over 60,000 stomata. For the first time in this domain, the proposed solution was tested against 7 microscope datasets never seen by the algorithm to show the generalisability of the solution. Results indicate that the proposed approach can detect stomata with a precision, recall, and F-score of 95.10%, 83.34%, and 88.61%, respectively. A separate test comparing estimated stomata boundary values with manually measured data showed that the proposed method has an IoU score of 0.70, a 7% improvement over the bounding-box approach.

Conclusions: The proposed method shows robust performance across multiple microscope image datasets of different quality and scale. This generalised stomata detection algorithm allows plant scientists to conduct stomata analysis while eliminating the need to re-label and re-train for each new dataset. The open-source code shared with this project can be directly deployed in Google Colab or any other TensorFlow environment.
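
The statistical filter at the output stage could, for example, reject detections whose mask area is an outlier within the image, as in the hypothetical sketch below; the modified z-score statistic and threshold are assumptions, not the paper's exact filter.

    import numpy as np

    def filter_detections(masks, scores, score_thr=0.5, z_thr=3.5):
        # masks: list of binary arrays; scores: array of confidence scores
        areas = np.array([m.sum() for m in masks], dtype=float)
        keep = np.asarray(scores) >= score_thr
        if not keep.any():
            return []
        med = np.median(areas[keep])
        mad = np.median(np.abs(areas[keep] - med)) + 1e-6
        z = 0.6745 * np.abs(areas - med) / mad     # modified z-score on mask area
        keep &= z <= z_thr
        return [m for m, k in zip(masks, keep) if k]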


Algorithms ◽  
2021 ◽  
Vol 14 (1) ◽  
pp. 26
Author(s):  
Yiran Xue ◽  
Rui Wu ◽  
Jiafeng Liu ◽  
Xianglong Tang

Existing crowd evacuation guidance systems require the manual design of models and input parameters, incurring a significant workload and a potential for errors. This paper proposes an end-to-end intelligent evacuation guidance method based on deep reinforcement learning and designs an interactive simulation environment based on the social force model. The agent can automatically learn a scene model and path planning strategy with only scene images as input, and directly output dynamic signage information. To address the "dimension disaster" (curse of dimensionality) encountered by the deep Q network (DQN) algorithm in crowd evacuation, this paper proposes a combined action-space DQN (CA-DQN) algorithm that groups Q network output layer nodes according to action dimensions, which significantly reduces network complexity and improves system practicality in complex scenes. In this paper, the evacuation guidance system is defined as a reinforcement learning agent and implemented with the CA-DQN method, which provides a novel approach to the evacuation guidance problem. The experiments demonstrate that the proposed method is superior to the static guidance method and on par with the manually designed model method.
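
The grouping idea can be illustrated with a Q-network that has one output head per action dimension, so the output size grows as the sum rather than the product of the per-sign action counts; layer sizes and the per-dimension argmax selection below are illustrative assumptions.

    import torch
    import torch.nn as nn

    class CombinedActionQNet(nn.Module):
        def __init__(self, state_dim, actions_per_dim):
            super().__init__()
            self.backbone = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU())
            self.heads = nn.ModuleList([nn.Linear(128, n) for n in actions_per_dim])

        def forward(self, state):
            h = self.backbone(state)
            return [head(h) for head in self.heads]   # one Q-vector per dimension

        def act(self, state):
            return [q.argmax(dim=-1) for q in self.forward(state)]

    # e.g. 4 signs with 3 directions each: 4 * 3 = 12 outputs instead of 3**4 = 81.
    # net = CombinedActionQNet(state_dim=64, actions_per_dim=[3, 3, 3, 3])
    # actions = net.act(torch.randn(1, 64))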


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Pankaj Rajak ◽  
Aravind Krishnamoorthy ◽  
Ankit Mishra ◽  
Rajiv Kalia ◽  
Aiichiro Nakano ◽  
...  

Predictive materials synthesis is the primary bottleneck in realizing functional and quantum materials. Strategies for synthesizing promising materials are currently identified by time-consuming trial and error, and there are no known predictive schemes for designing synthesis parameters. We use offline reinforcement learning (RL) to predict optimal synthesis schedules, i.e., time-sequences of reaction conditions such as temperatures and concentrations, for the synthesis of semiconducting monolayer MoS2 using chemical vapor deposition. The RL agent, trained on 10,000 computational synthesis simulations, learned threshold temperatures and chemical potentials for the onset of chemical reactions and predicted previously unknown synthesis schedules that produce well-sulfidized, crystalline, phase-pure MoS2. The model can be extended to multi-task objectives, such as predicting profiles for the synthesis of complex structures including multi-phase heterostructures, and can predict the long-time behavior of reacting systems, far beyond the domain of molecular dynamics simulations, making these predictions directly relevant to experimental synthesis.
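
As a rough illustration of learning from logged simulations, the sketch below runs batch Q-learning over pre-recorded transitions and rolls out a greedy synthesis schedule; the state discretization, reward definition, and transition function are assumptions for illustration only.

    import random
    from collections import defaultdict

    def offline_q_learning(transitions, actions, gamma=0.95, lr=0.1, epochs=50, seed=0):
        # transitions: list of (state, action, reward, next_state) logged from
        # synthesis simulations, e.g. state = discretized (temperature, chemical
        # potential) and reward = a crystallinity / phase-purity score.
        rng = random.Random(seed)
        q = defaultdict(float)                      # Q[(state, action)]
        for _ in range(epochs):
            rng.shuffle(transitions)
            for s, a, r, s_next in transitions:
                best_next = max(q[(s_next, a2)] for a2 in actions)
                q[(s, a)] += lr * (r + gamma * best_next - q[(s, a)])
        return q

    def greedy_schedule(q, start_state, actions, transition_fn, horizon=10):
        # Roll out the greedy policy to produce a synthesis schedule (action sequence).
        s, schedule = start_state, []
        for _ in range(horizon):
            a = max(actions, key=lambda a: q[(s, a)])
            schedule.append(a)
            s = transition_fn(s, a)
        return schedule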

