Improved Algorithms for Conservative Exploration in Bandits

2020 ◽  
Vol 34 (04) ◽  
pp. 3962-3969
Author(s):  
Evrard Garcelon ◽  
Mohammad Ghavamzadeh ◽  
Alessandro Lazaric ◽  
Matteo Pirotta

In many fields such as digital marketing, healthcare, finance, and robotics, it is common to have a well-tested and reliable baseline policy running in production (e.g., a recommender system). Nonetheless, the baseline policy is often suboptimal. In this case, it is desirable to deploy online learning algorithms (e.g., a multi-armed bandit algorithm) that interact with the system to learn a better/optimal policy under the constraint that during the learning process the performance is almost never worse than the performance of the baseline itself. In this paper, we study the conservative learning problem in the contextual linear bandit setting and introduce a novel algorithm, the Conservative Constrained LinUCB (CLUCB2). We derive regret bounds for CLUCB2 that match existing results and empirically show that it outperforms state-of-the-art conservative bandit algorithms in a number of synthetic and real-world problems. Finally, we consider a more realistic constraint where the performance is verified only at predefined checkpoints (instead of at every step) and show how this relaxed constraint favorably impacts the regret and empirical performance of CLUCB2.
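The difference between checking the constraint at every step and checking it only at checkpoints can be sketched as follows. This is a minimal illustration, not the CLUCB2 algorithm itself: it assumes the formulation common in the conservative bandit literature, in which the learner's cumulative reward must stay above a fraction (1 - alpha) of the baseline's cumulative reward. The function name `constraint_violations`, the parameter `alpha`, and the reward sequences are hypothetical.

```python
from itertools import accumulate


def constraint_violations(learner_rewards, baseline_rewards, alpha, checkpoints=None):
    """Return the 1-indexed steps where the conservative constraint fails.

    Assumed constraint (not taken from the abstract): the learner's cumulative
    reward must be at least (1 - alpha) times the baseline's cumulative reward.
    If `checkpoints` is None, every step is verified; otherwise only the
    listed steps are verified.
    """
    learner_cum = list(accumulate(learner_rewards))
    baseline_cum = list(accumulate(baseline_rewards))
    steps = range(1, len(learner_cum) + 1) if checkpoints is None else checkpoints
    return [
        t for t in steps
        if learner_cum[t - 1] < (1 - alpha) * baseline_cum[t - 1]
    ]


# Hypothetical rewards: the learner explores early and recovers later.
learner = [0.0, 0.0, 1.0, 1.0]
baseline = [0.5, 0.5, 0.5, 0.5]

every_step = constraint_violations(learner, baseline, alpha=0.5)
at_checkpoint = constraint_violations(learner, baseline, alpha=0.5, checkpoints=[4])
```

In this toy run the early exploration violates the per-step constraint, while the same trajectory satisfies the constraint when it is verified only at step 4, which is the sense in which the checkpoint constraint is more relaxed.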

Author(s):  
Anshuka Rangi ◽  
Massimo Franceschetti ◽  
Long Tran-Thanh

This work investigates the adversarial Bandits with Knapsack (BwK) learning problem, where a player repeatedly chooses to perform an action, pays the corresponding cost of the action, and receives a reward associated with the action. The player is constrained by the maximum budget that can be spent to perform the actions, and the rewards and the costs of these actions are assigned by an adversary. This setting is studied in terms of expected regret, defined as the difference between the total expected rewards per unit cost corresponding to the best fixed action and the total expected rewards per unit cost of the learning algorithm. We propose a novel algorithm, EXP3.BwK, and show that the expected regret of the algorithm is order optimal in the budget. We then propose another algorithm, EXP3++.BwK, which is order optimal in the adversarial BwK setting and incurs an almost optimal expected regret in the stochastic BwK setting, where the rewards and the costs are drawn from unknown underlying distributions. These results are then extended to a more general online learning setting by designing another algorithm, EXP3++.LwK, and providing its performance guarantees. Finally, we investigate the scenario where the costs of the actions are large and comparable to the budget. We show that, for the adversarial setting, the achievable regret bounds scale at least linearly with the maximum cost for any learning algorithm and are significantly worse than in the case of costs bounded by a constant, which is a common assumption in the BwK literature.


1999 ◽  
Vol 18 (3-4) ◽  
pp. 265-273
Author(s):  
Giovanni B. Garibotto

This paper provides an overview of advanced robotic technologies in the context of Postal Automation services. The main functional requirements of the application are briefly reviewed, together with the state of the art and newly emerging solutions. Image Processing and Pattern Recognition have always played a fundamental role in Address Interpretation and Mail Sorting, and the new challenge is off-line recognition of cursive handwriting, so that all kinds of addresses can be handled in a uniform way. In addition, advanced electromechanical and robotic solutions are extremely important for solving the problems of mail storage, transportation, and distribution, as well as for material handling and logistics. Finally, new Postal Automation services are briefly described, including the emerging services of hybrid mail and paper-to-electronic conversion.


2016 ◽  
Vol 2016 ◽  
pp. 1-10 ◽  
Author(s):  
Huaping Guo ◽  
Weimei Zhi ◽  
Hongbing Liu ◽  
Mingliang Xu

In recent years, the imbalanced learning problem has attracted increasing attention from both academia and industry. The problem concerns the performance of learning algorithms in the presence of data with severe class distribution skews. In this paper, we apply the well-known statistical model of logistic discrimination to this problem and propose a novel method to improve its performance. To fully account for class imbalance, we design a new cost function that takes into account the accuracies of both the positive class and the negative class as well as the precision of the positive class. Unlike traditional logistic discrimination, the proposed method learns its parameters by maximizing the proposed cost function. Experimental results show that, compared with other state-of-the-art methods, the proposed one performs significantly better on measures of recall, g-mean, f-measure, AUC, and accuracy.
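For readers unfamiliar with the evaluation measures named above, the sketch below computes recall, precision, g-mean, f-measure, and accuracy from a binary confusion matrix using their standard textbook definitions. It does not reproduce the paper's cost function; the function name `imbalance_metrics` and the example counts are hypothetical.

```python
import math


def imbalance_metrics(tp, fn, fp, tn):
    """Compute standard measures for imbalanced binary classification.

    tp/fn/fp/tn are the confusion-matrix counts, with the minority class
    treated as positive. Ratios with an empty denominator are reported as 0.0.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)          # accuracy on the positive class
    specificity = ratio(tn, tn + fp)     # accuracy on the negative class
    precision = ratio(tp, tp + fp)
    return {
        "recall": recall,
        "precision": precision,
        "g_mean": math.sqrt(recall * specificity),
        "f_measure": ratio(2 * precision * recall, precision + recall),
        "accuracy": ratio(tp + tn, tp + fn + fp + tn),
    }


# Hypothetical skewed data set: 10 positives, 90 negatives.
metrics = imbalance_metrics(tp=8, fn=2, fp=8, tn=82)
```

On these counts accuracy is high while precision is only 0.5, which illustrates why accuracy alone is a poor guide under class skew.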


2019 ◽  
Author(s):  
Jenny Bottek ◽  
Camille Soun ◽  
Julia K Volke ◽  
Akanksha Dixit ◽  
Stephanie Thiebes ◽  
...  

Macrophages perform essential functions during bacterial infections, such as phagocytosis of pathogens and elimination of neutrophils to reduce the spread of infection, inflammation, and tissue damage. The spatial distribution of macrophages is critical for responding to tissue-specific adaptations upon infection. Using a novel algorithm for correlative mass spectrometry imaging and state-of-the-art multiplex microscopy, we report here that macrophages within the urinary bladder are positioned in the connective tissue underneath the urothelium. Invading uropathogenic E. coli (UPEC) induced IL-6–dependent CX3CL1 expression by urothelial cells, facilitating relocation of macrophages from the connective tissue into the urothelium. These cells phagocytosed UPEC and eliminated neutrophils to maintain the barrier function of the urothelium, preventing persistent and recurrent urinary tract infection.


2021 ◽  
Author(s):  
Hadi Qovaizi

Modern state-of-the-art planners operate by generating a grounded transition system prior to performing search for a solution to a given planning task. Some tasks involve a significant number of objects or entail managing predicates and action schemas with a significant number of arguments. Hence, this instantiation procedure can exhaust all available memory and therefore prevent a planner from performing search to find a solution. This thesis explores this limitation by presenting a benchmark set of problems based on Organic Chemistry Synthesis that was submitted to the latest International Planning Competition (IPC-2018). This benchmark was constructed to gauge the performance of the competing planners given that instantiation is an issue. Furthermore, a novel algorithm, the Regression-Based Heuristic Planner (RBHP), is developed with the aim of averting this issue. RBHP was inspired by the retro-synthetic approach commonly used to solve organic synthesis problems efficiently. RBHP solves planning tasks by applying domain independent heuristics, computed by regression, and performing best-first search. In contrast to most modern planners, RBHP computes heuristics backwards by applying the goal-directed regression operator. However, the best-first search proceeds forward similar to other planners. The proposed planner is evaluated on a set of planning tasks included in previous International Planning Competitions (IPC) against a subset of the top scoring state-of-the-art planners submitted to the IPC-2018.


Author(s):  
Nurshazwani Muhamad Mahfuz ◽  
Marina Yusoff ◽  
Zakiah Ahmad

Clustering plays an important role as an unsupervised learning method in data analytics, supporting many real-world problems such as image segmentation, object recognition, and information retrieval. Traditional clustering techniques often struggle because outliers and noisy data lead to non-optimal results. This paper reviews single clustering methods that have been applied in various domains. The aim is to identify potentially suitable applications and aspects of the methods that could be improved. Three categories of single clustering methods are suggested, which should help researchers understand the clustering aspects and determine the requirements of a clustering method for a given application, based on the state of the art in previous research findings.


2019 ◽  
Vol 11 (4) ◽  
Author(s):  
Jari Haverinen ◽  
Niina Keränen ◽  
Petra Falkenbach ◽  
Anna Maijala ◽  
Timo Kolehmainen ◽  
...  

Health technology assessment (HTA) refers to the systematic evaluation of the properties, effects, and/or impacts of health technology. The main purpose of the assessment is to inform decision-makers in order to better support the introduction of new health technologies. New digital healthcare solutions such as mHealth, artificial intelligence (AI), and robotics have great potential to further develop healthcare services, but their introduction should follow the same criteria as that of other healthcare methods. They must provide evidence-based benefits and be safe to use, and their impacts on patients and organizations need to be clarified. The first objective of this study was to describe the state-of-the-art HTA methods for mHealth, AI, and robotics. The second objective was to evaluate the domains needed in the assessment. The final aim was to develop an HTA framework for digital healthcare services to support the introduction of novel technologies into Finnish healthcare. In this study, the state-of-the-art HTA methods were evaluated using a literature review and interviews. Some good practices already existed, but the overall picture showed that further development is still needed, especially in the AI and robotics fields. In cooperation with professionals, we identified the key aspects and domains that should be taken into account to make fast but comprehensive assessments. Based on this information, we created a new framework that supports the HTA process for digital healthcare services. The framework was named Digi-HTA.


2018 ◽  
Vol 8 (12) ◽  
pp. 2512 ◽  
Author(s):  
Ghouthi Boukli Hacene ◽  
Vincent Gripon ◽  
Nicolas Farrugia ◽  
Matthieu Arzel ◽  
Michel Jezequel

Deep learning-based methods have reached state-of-the-art performance, relying on a large quantity of available data and computational power. Such methods remain highly inappropriate when facing a major open machine learning problem: incrementally learning new classes and examples over time. Combining the outstanding performance of Deep Neural Networks (DNNs) with the flexibility of incremental learning techniques is a promising avenue of research. In this contribution, we introduce Transfer Incremental Learning using Data Augmentation (TILDA). TILDA is based on pre-trained DNNs as feature extractors, robust selection of feature vectors in subspaces using a nearest-class-mean based technique, majority votes, and data augmentation at both the training and the prediction stages. Experiments on challenging vision datasets demonstrate the ability of the proposed method to perform low-complexity incremental learning while achieving significantly better accuracy than existing incremental counterparts.
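To make the nearest-class-mean idea concrete, the sketch below shows a plain nearest-class-mean classifier that can absorb new examples and new classes one at a time. It is only an illustration of that single ingredient under simplifying assumptions: the feature vectors are hypothetical stand-ins for DNN features, and the subspace selection, majority voting, and data augmentation described in the abstract are not modeled. The class name `NearestClassMean` and its methods are hypothetical.

```python
import math


class NearestClassMean:
    """Incremental nearest-class-mean classifier over fixed-length vectors."""

    def __init__(self):
        self.sums = {}    # label -> per-dimension sum of feature vectors
        self.counts = {}  # label -> number of examples seen

    def partial_fit(self, features, label):
        """Add one example; unseen labels create a new class on the fly."""
        if label not in self.sums:
            self.sums[label] = [0.0] * len(features)
            self.counts[label] = 0
        self.sums[label] = [s + v for s, v in zip(self.sums[label], features)]
        self.counts[label] += 1

    def predict(self, features):
        """Return the label whose class mean is closest in Euclidean distance."""
        def distance_to_mean(label):
            n = self.counts[label]
            mean = [s / n for s in self.sums[label]]
            return math.dist(features, mean)

        return min(self.sums, key=distance_to_mean)


model = NearestClassMean()
model.partial_fit([0.0, 0.0], "a")
model.partial_fit([0.0, 2.0], "a")
model.partial_fit([10.0, 10.0], "b")
before = model.predict([1.0, 1.0])

# A new class arrives later; no retraining of earlier classes is needed.
model.partial_fit([1.0, 1.5], "c")
after = model.predict([1.0, 1.0])
```

Because only per-class sums and counts are stored, adding a class or an example costs time proportional to the feature dimension, which is one reason such classifiers suit low-complexity incremental learning.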


Entropy ◽  
2020 ◽  
Vol 22 (10) ◽  
pp. 1143
Author(s):  
Zhenwu Wang ◽  
Tielin Wang ◽  
Benting Wan ◽  
Mengjie Han

Multi-label classification (MLC) is a supervised learning problem where an object is naturally associated with multiple concepts because it can be described from various dimensions. How to exploit the resulting label correlations is the key issue in MLC problems. The classifier chain (CC) is a well-known MLC approach that can learn complex coupling relationships between labels. CC suffers from two obvious drawbacks: (1) label ordering is decided at random, although it usually has a strong effect on predictive performance; (2) all the labels are inserted into the chain, although some of them may carry irrelevant information that discriminates against the others. In this work, we propose a partial classifier chain method with feature selection (PCC-FS) that exploits the correlation between the label and feature spaces and thus solves the two previously mentioned problems simultaneously. In the PCC-FS algorithm, feature selection is performed by learning the covariance between the feature set and the label set, thus eliminating irrelevant features that can diminish classification performance. Couplings in the label set are extracted, and the coupled labels of each label are inserted simultaneously into the chain structure to execute the training and prediction activities. The experimental results on five metrics demonstrate that, in comparison to eight state-of-the-art MLC algorithms, the proposed method significantly improves on existing multi-label classification methods.


2012 ◽  
Vol 24 (9) ◽  
pp. 2508-2542 ◽  
Author(s):  
Farbound Tai ◽  
Hsuan-Tien Lin

We consider a hypercube view to perceive the label space of multilabel classification problems geometrically. The view allows us not only to unify many existing multilabel classification approaches but also to design a novel algorithm, principal label space transformation (PLST), that captures key correlations between labels before learning. The simple and efficient PLST relies only on singular value decomposition as the key step. We derive the theoretical guarantee of PLST and evaluate its empirical performance using real-world data sets. Experimental results demonstrate that PLST is faster than the traditional binary relevance approach and is superior to the modern compressive sensing approach in terms of both accuracy and efficiency.

