Unifying the Stochastic and the Adversarial Bandits with Knapsack

Author(s):  
Anshuka Rangi ◽  
Massimo Franceschetti ◽  
Long Tran-Thanh

This work investigates the adversarial Bandits with Knapsack (BwK) learning problem, where a player repeatedly chooses to perform an action, pays the corresponding cost of the action, and receives a reward associated with the action. The player is constrained by the maximum budget that can be spent to perform the actions, and the rewards and the costs of these actions are assigned by an adversary. This setting is studied in terms of expected regret, defined as the difference between the total expected rewards per unit cost corresponding to the best fixed action and the total expected rewards per unit cost of the learning algorithm. We propose a novel algorithm EXP3.BwK and show that the expected regret of the algorithm is order optimal in the budget. We then propose another algorithm EXP3++.BwK, which is order optimal in the adversarial BwK setting, and incurs an almost optimal expected regret in the stochastic BwK setting where the rewards and the costs are drawn from unknown underlying distributions. These results are then extended to a more general online learning setting, by designing another algorithm EXP3++.LwK and providing its performance guarantees. Finally, we investigate the scenario where the costs of the actions are large and comparable to the budget. We show that for the adversarial setting, the achievable regret bounds scale at least linearly with the maximum cost for any learning algorithm, and are significantly worse in comparison to the case of having costs bounded by a constant, which is a common assumption in the BwK literature.
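The budgeted interaction loop described above can be illustrated with a minimal sketch. It is not EXP3.BwK or EXP3++.BwK, whose details the abstract does not give; it is textbook EXP3 with a stopping rule added for the budget. The callback `pull`, the lower cost bound `c_min`, and the choice of feeding EXP3 the reward per unit cost rescaled by `c_min` are all assumptions made for illustration.

```python
import math
import random


def exp3_with_budget(n_arms, budget, pull, c_min, gamma=0.1, seed=0):
    """Textbook EXP3 that stops once the next action cannot be paid for.

    pull(arm) is a hypothetical callback returning (reward, cost) with
    reward in [0, 1] and cost in [c_min, 1]; all names are illustrative.
    """
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    total_reward, spent, history = 0.0, 0.0, []
    while True:
        total_w = sum(weights)
        # Mix the exponential-weights distribution with uniform exploration.
        probs = [(1 - gamma) * w / total_w + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward, cost = pull(arm)
        if spent + cost > budget:
            break  # budget exhausted: this last action is not counted
        spent += cost
        total_reward += reward
        history.append(arm)
        # Assumed gain: reward per unit cost, rescaled into [0, 1] by c_min.
        gain = c_min * reward / cost
        # Importance-weighted estimate, nonzero only for the played arm.
        weights[arm] *= math.exp(gamma * (gain / probs[arm]) / n_arms)
        # Rescale so the weights stay bounded over long runs.
        top = max(weights)
        weights = [w / top for w in weights]
    return total_reward, spent, history


def pull(arm):
    # Toy environment (assumption): fixed (reward, cost) per arm.
    return [(0.2, 0.5), (0.9, 0.5)][arm]


total, spent, history = exp3_with_budget(2, budget=100.0, pull=pull, c_min=0.5)
```

In this toy run every action costs 0.5, so the loop plays until the budget of 100 is used up, and the arm with the higher reward per unit cost ends up being played more often.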

2020 ◽  
Vol 34 (04) ◽  
pp. 3962-3969
Author(s):  
Evrard Garcelon ◽  
Mohammad Ghavamzadeh ◽  
Alessandro Lazaric ◽  
Matteo Pirotta

In many fields such as digital marketing, healthcare, finance, and robotics, it is common to have a well-tested and reliable baseline policy running in production (e.g., a recommender system). Nonetheless, the baseline policy is often suboptimal. In this case, it is desirable to deploy online learning algorithms (e.g., a multi-armed bandit algorithm) that interact with the system to learn a better/optimal policy under the constraint that during the learning process the performance is almost never worse than the performance of the baseline itself. In this paper, we study the conservative learning problem in the contextual linear bandit setting and introduce a novel algorithm, the Conservative Constrained LinUCB (CLUCB2). We derive regret bounds for CLUCB2 that match existing results and empirically show that it outperforms state-of-the-art conservative bandit algorithms in a number of synthetic and real-world problems. Finally, we consider a more realistic constraint where the performance is verified only at predefined checkpoints (instead of at every step) and show how this relaxed constraint favorably impacts the regret and empirical performance of CLUCB2.
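One common way to formalize such a constraint is to require that the learner's cumulative reward never fall below a fraction (1 - alpha) of the baseline's cumulative reward. The sketch below checks that condition either at every step or only at given checkpoints. The tolerance `alpha`, the function name, and the exact form of the inequality are assumptions; the abstract does not state how CLUCB2 defines or enforces its constraint.

```python
from itertools import accumulate


def violations(learner_rewards, baseline_rewards, alpha=0.1, checkpoints=None):
    """Return the 1-based steps at which the conservative condition fails.

    The condition checked (an assumption) is:
        cumulative learner reward >= (1 - alpha) * cumulative baseline reward.
    If checkpoints is None, every step is checked; otherwise only the
    listed steps are.
    """
    learner_cum = list(accumulate(learner_rewards))
    baseline_cum = list(accumulate(baseline_rewards))
    steps = range(1, len(learner_cum) + 1) if checkpoints is None else checkpoints
    return [
        t for t in steps
        if learner_cum[t - 1] < (1 - alpha) * baseline_cum[t - 1]
    ]


learner = [0.0, 1.0, 1.0, 1.0]   # explores badly at first, then recovers
baseline = [0.5, 0.5, 0.5, 0.5]

every_step = violations(learner, baseline)
at_checkpoints = violations(learner, baseline, checkpoints=[2, 4])
```

Here the learner breaks the condition only at the first step, so a check restricted to steps 2 and 4 reports no violation. This illustrates why a checkpoint-based constraint leaves more room for exploration.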


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Tianqi Tu ◽  
Xueling Wei ◽  
Yue Yang ◽  
Nianrong Zhang ◽  
Wei Li ◽  
...  

Background: Common subtypes seen in Chinese patients with membranous nephropathy (MN) include idiopathic membranous nephropathy (IMN) and hepatitis B virus-related membranous nephropathy (HBV-MN). However, the morphologic differences are not visible under the light microscope in certain renal biopsy tissues. Methods: We propose here a deep learning-based framework for processing hyperspectral images of renal biopsy tissue to define the difference between IMN and HBV-MN based on the component of their immune complex deposition. Results: The proposed framework can achieve an overall accuracy of 95.04% in classification, which also leads to better performance than support vector machine (SVM)-based algorithms. Conclusion: IMN and HBV-MN can be correctly separated via the deep learning framework using hyperspectral imagery. Our results suggest the potential of the deep learning algorithm as a new method to aid in the diagnosis of MN.


2021 ◽  
pp. 1-11
Author(s):  
Yanan Huang ◽  
Yuji Miao ◽  
Zhenjing Da

Methods for multi-modal English event detection under a single data source, and for isomorphic event detection across different English data sources based on transfer learning, still need to be improved. To improve the efficiency of English event detection across data sources, this paper proposes, on the basis of a transfer learning algorithm, multi-modal event detection under a single data source and isomorphic event detection for different data sources. Moreover, by stacking multiple classification models, the paper merges the individual features with one another, and it conducts adversarial training through the difference between the two classifiers to make the distributions of data from different sources more similar. To verify the proposed algorithm, a multi-source English event detection data set is collected. Finally, the data set is used to validate the proposed method and to compare it with the current mainstream transfer learning methods. Experimental analysis, convergence analysis, visual analysis and parameter evaluation demonstrate the effectiveness of the proposed algorithm.


2013 ◽  
Vol 2013 ◽  
pp. 1-11
Author(s):  
Zhicong Zhang ◽  
Kaishun Hu ◽  
Shuai Li ◽  
Huiyu Huang ◽  
Shaoyong Zhao

Chip attach is the bottleneck operation in semiconductor assembly. Chip attach scheduling is by nature an unrelated parallel machine scheduling problem with practical considerations, for example, machine-job qualification, sequence-dependent setup times, initial machine status, and engineering time. The major scheduling objective is to minimize the total weighted unsatisfied Target Production Volume in the schedule horizon. To apply the Q-learning algorithm, the scheduling problem is converted into a reinforcement learning problem by constructing an elaborate system state representation, actions, and a reward function. We select five heuristics as actions and prove the equivalence of the reward function and the scheduling objective function. We also conduct experiments with industrial datasets to compare the Q-learning algorithm, the five action heuristics, and the Largest Weight First (LWF) heuristic used in industry. Experiment results show that Q-learning is remarkably superior to the six heuristics. Compared with LWF, Q-learning reduces three performance measures, objective function value, unsatisfied Target Production Volume index, and unsatisfied job type index, by considerable amounts of 80.92%, 52.20%, and 31.81%, respectively.
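The idea of using dispatching heuristics as the action set of a Q-learning agent can be sketched with the standard tabular update rule. This is a generic illustration rather than the authors' implementation: the heuristic names, the state labels, and the parameter values are placeholders, since the abstract does not specify the five heuristics, the state representation, or the reward function.

```python
import random
from collections import defaultdict

# Placeholder names: the abstract does not list the five heuristics.
ACTIONS = ["heuristic_1", "heuristic_2", "heuristic_3", "heuristic_4", "heuristic_5"]


def make_q_table():
    """Q-table mapping each state to one value per heuristic, initialized to 0."""
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice of which heuristic to apply in this state."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    values = q[state]
    return max(ACTIONS, key=lambda a: values[a])


def q_update(q, state, action, reward, next_state, alpha=0.5, discount=0.9):
    """Standard Q-learning update:
    Q(s, a) += alpha * (r + discount * max_a' Q(s', a') - Q(s, a)).
    """
    best_next = max(q[next_state].values())
    target = reward + discount * best_next
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


q = make_q_table()
first = q_update(q, "s0", "heuristic_1", reward=1.0, next_state="s1")
second = q_update(q, "s2", "heuristic_2", reward=0.0, next_state="s0")
greedy = choose_action(q, "s0", epsilon=0.0, rng=random.Random(0))
```

Each scheduling decision then amounts to observing the current state, picking a heuristic with `choose_action`, letting that heuristic select the next job, and calling `q_update` with the resulting reward.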


1960 ◽  
Vol 40 (2) ◽  
pp. 225-234 ◽  
Author(s):  
J. W. Tanner ◽  
E. E. Gamble ◽  
W. E. Tossell

A comparative study was made in 1958 of the visual estimation and hand separation methods of determining botanical composition of two-component forage mixtures. The results indicated that there were positive significant correlations between the per cent legume values obtained by the two methods. The visual estimation method was less variable than the hand separation method and the precision per unit cost was greater. The differences between per cent legume values obtained by the two methods were influenced by the stage of maturity (medium or late hay) of the components and the cut (hay or aftermath). In this study, the difference was significant only in the medium aftermath cut. Individually, three observers showed some inconsistencies between estimates on the medium and late maturity groups and between the hay and aftermath cut. However, by averaging the three estimates to obtain a mean sample, these inconsistencies were minimized. Both methods were more precise in the aftermath pasture cut than in the hay. An additional observer increased precision of the visual estimate more than an additional replicate or sample. The greater precision resulting from additional replicates, samples, or observers increased at a decreasing rate. The number of replicates, samples, and observers required for specific degrees of precision and a specific cost were calculated. The experiment showed that the visual estimation method can be superior to the hand separation method as a means of determining botanical composition.


Author(s):  
Nicola Fanizzi

This paper presents an approach to ontology construction pursued through the induction of concept descriptions expressed in Description Logics. The author surveys the theoretical foundations of the standard representations for formal ontologies in the Semantic Web. After stating the learning problem in this peculiar context, a FOIL-like algorithm is presented that can be applied to learn DL concept descriptions. The algorithm performs a search through a space of candidate concept definitions by means of refinement operators. This process is guided by heuristics that are based on the available examples. The author discusses related theoretical aspects of learning with the inherent incompleteness underlying the semantics of this representation. The experimental evaluation of the system DL-Foil, which implements the learning algorithm, was carried out in two series of sessions on real ontologies from standard repositories for different domains expressed in diverse description logics.


2017 ◽  
Vol 33 (2) ◽  
pp. 233-269 ◽  
Author(s):  
Jennifer Cabrelli Amaro

This study tests the hypothesis that late first-language English / second-language Spanish learners (L1 English / L2 Spanish learners) acquire spirantization in stages according to the prosodic hierarchy (Zampini, 1997, 1998). In Spanish, voiced stops [b d g] surface after a pause or nasal stop, and continuants [β̞ ð̞ ɣ̞] surface postvocalically, among other contexts. We adopt an Optimality Theoretic analysis of the phenomenon that assumes that postvocalic continuants surface due to the ranking of prosodic positional faithfulness constraints below a markedness constraint that prohibits stops in postvocalic position. L1 English speakers are presumed to start with a ranking in which prosodic positional faithfulness outranks the markedness constraint. In line with the Gradual Learning Algorithm (Boersma and Hayes, 2001), gradual demotion of the relevant faithfulness constraints is predicted in L2 Spanish, extending the prosodic domain until continuants surface postvocalically across domains. A cross-section of 44 L1 English / L2 Spanish learners and a control group (n = 5) completed a recitation task, and data were analysed acoustically for manner of articulation and degree of constriction. Results partially align with Zampini’s impressionistic data: Learners first produce underlying stops as postvocalic approximants at the onset of the syllable (word-medial position), followed by the onset of the prosodic word (word-initial position). Unlike Zampini’s findings, there is no evidence for an intermediate stage of acquisition across the boundary of a word and its clitic. Advanced L2 learners produce continuants in postvocalic position at all applicable prosodic levels, which we take to indicate acquisition of the target ranking. We also examined whether learners’ postvocalic continuants are lenited to the same degree as the control group, and whether degree of lenition changes across development. The difference in degree of lenition between controls and learners lessens at higher levels of the prosodic hierarchy as acquisition progresses, and several advanced learners produce target-like segments across prosodic levels.


Author(s):  
YUESHENG HE ◽  
YUAN YAN TANG

Graphical avatars have gained popularity in many application domains such as three-dimensional (3D) animation movies and animated simulations for product design. However, methods for editing avatars' behaviors in a 3D graphical environment remain a challenging research topic. Since hand-crafted methods are time-consuming and inefficient, avatars need to act automatically. To achieve autonomous avatar behaviors, artificial intelligence should be used in this research area. In this paper, we present a novel approach to constructing a system of automatic avatars in 3D graphical environments based on machine learning techniques. A specific framework is created for controlling the behaviors of avatars, which includes classifying the differences among environments and using a hierarchical structure to describe the actions. Because the interactions between avatars and environments must be simulated after the environment has been classified, Reinforcement Learning is used to compute a policy that controls the avatar intelligently in the 3D environment across different situations. Our approach thus addresses problems such as where the levels of the missions are defined and how the learning algorithm is used to control the avatars. The main contributions of this paper are a hierarchical structure for controlling avatars automatically, a method for avatars to recognize their environment, and an approach for intelligently deriving the policy for avatars' actions.


2019 ◽  
Vol 23 (12) ◽  
pp. 29-33 ◽  
Author(s):  
A.Yu. Bryuchanov ◽  
I.A. Subbotin ◽  
E.V. Timofeev ◽  
A.F. Erk

The "Ecological and energy criterion for the effectiveness of the introduction of BAT" is proposed. This indicator expresses the ratio of the unit cost of the consumption of fuel and energy resources to the difference between the nitrogen emissions of the base technology and those of the compared technology. The use of this coefficient is illustrated by comparing poultry manure disposal technologies. The criterion will be useful for evaluating technologies simultaneously in terms of both energy and environmental indicators.
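The ratio described above can be written compactly as follows. The symbols are introduced here for illustration and do not come from the source: K denotes the criterion, C the unit cost of fuel and energy resource consumption, and E_base and E_new the nitrogen emissions of the base and the compared technology. The abstract does not state units, nor which technology's energy cost is meant.

```latex
K = \frac{C}{E_{\text{base}} - E_{\text{new}}}
```

Under this reading, a smaller K means that each unit of avoided nitrogen emission is obtained at a lower energy cost.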


2011 ◽  
Vol 187 ◽  
pp. 371-376
Author(s):  
Ping Zhang ◽  
Xiao Hong Hao ◽  
Heng Jie Li

To avoid overfitting and overtraining, and to solve the knowledge extraction problem in fuzzy neural network systems, the Ying Learning Dynamic Fuzzy Neural Network (YL-DFNN) algorithm is proposed. The Learning Set, based on K-VNN, is constituted from the message. Then its framework is designed and its stability is proved. Finally, simulation indicates that the novel algorithm is fast, compact, and capable of generalization.

