Unifying the Stochastic and the Adversarial Bandits with Knapsack

This work investigates the adversarial Bandits with Knapsack (BwK) learning problem, where a player repeatedly chooses to perform an action, pays the corresponding cost of the action, and receives a reward associated with the action. The player is constrained by the maximum budget that can be spent to perform the actions, and the rewards and the costs of these actions are assigned by an adversary. This setting is studied in terms of expected regret, defined as the difference between the total expected rewards per unit cost corresponding to the best fixed action and the total expected rewards per unit cost of the learning algorithm. We propose a novel algorithm EXP3.BwK and show that the expected regret of the algorithm is order optimal in the budget. We then propose another algorithm EXP3++.BwK, which is order optimal in the adversarial BwK setting, and incurs an almost optimal expected regret in the stochastic BwK setting where the rewards and the costs are drawn from unknown underlying distributions. These results are then extended to a more general online learning setting, by designing another algorithm EXP3++.LwK and providing its performance guarantees. Finally, we investigate the scenario where the costs of the actions are large and comparable to the budget. We show that for the adversarial setting, the achievable regret bounds scale at least linearly with the maximum cost for any learning algorithm, and are significantly worse in comparison to the case of having costs bounded by a constant, which is a common assumption in the BwK literature.
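The abstract does not spell out EXP3.BwK, so the following is only a minimal sketch of the setting it describes: the classic EXP3 exponential-weights update, run until a budget is exhausted. It is not the paper's algorithm. The callback `pull`, the exploration rate `gamma`, and the bounds on rewards and costs are assumptions made for illustration.

```python
import math
import random


def exp3_with_budget(n_arms, budget, pull, gamma=0.1, seed=0):
    """Classic EXP3 that stops once the budget cannot cover the next cost.

    `pull(arm)` is a hypothetical callback returning (reward, cost),
    with reward assumed in [0, 1] and cost assumed in (0, 1].
    """
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    total_reward, spent, history = 0.0, 0.0, []
    while True:
        total_w = sum(weights)
        # Mix the weight distribution with uniform exploration.
        probs = [(1 - gamma) * w / total_w + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward, cost = pull(arm)
        if spent + cost > budget:
            break  # the action is unaffordable, so the game ends
        spent += cost
        total_reward += reward
        history.append(arm)
        # Importance-weighted reward estimate for the chosen arm only.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        # Rescale so the weights stay bounded over long runs.
        top = max(weights)
        weights = [w / top for w in weights]
    return total_reward, spent, history
```

With two arms of equal cost and different rewards, the play counts in `history` should drift toward the better arm as the budget is spent.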


Improved Algorithms for Conservative Exploration in Bandits

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5812 ◽

2020 ◽

Vol 34 (04) ◽

pp. 3962-3969

Author(s):

Evrard Garcelon ◽

Mohammad Ghavamzadeh ◽

Alessandro Lazaric ◽

Matteo Pirotta

Keyword(s):

State Of The Art ◽

Digital Marketing ◽

Learning Problem ◽

Online Learning Algorithms ◽

Empirical Performance ◽

Regret Bounds ◽

Healthcare Finance ◽

And Robotics ◽

Novel Algorithm ◽

Real World Problems

In many fields such as digital marketing, healthcare, finance, and robotics, it is common to have a well-tested and reliable baseline policy running in production (e.g., a recommender system). Nonetheless, the baseline policy is often suboptimal. In this case, it is desirable to deploy online learning algorithms (e.g., a multi-armed bandit algorithm) that interact with the system to learn a better/optimal policy under the constraint that during the learning process the performance is almost never worse than the performance of the baseline itself. In this paper, we study the conservative learning problem in the contextual linear bandit setting and introduce a novel algorithm, the Conservative Constrained LinUCB (CLUCB2). We derive regret bounds for CLUCB2 that match existing results and empirically show that it outperforms state-of-the-art conservative bandit algorithms in a number of synthetic and real-world problems. Finally, we consider a more realistic constraint where the performance is verified only at predefined checkpoints (instead of at every step) and show how this relaxed constraint favorably impacts the regret and empirical performance of CLUCB2.
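The conservative constraint can be illustrated with a small check. This is a sketch of the general idea, not CLUCB2: it assumes a known, constant per-round baseline reward and uses a tolerance `alpha` (a parameter common in the conservative bandit literature but not named in the abstract). All function and parameter names are hypothetical.

```python
def choose_action(earned_so_far, baseline_reward, candidate_lower_bound,
                  rounds_played, alpha):
    """Return "candidate" if exploring keeps the constraint, else "baseline".

    The constraint assumed here: the learner's cumulative reward must stay
    at or above (1 - alpha) times what the baseline would have earned.
    """
    # Total the baseline would have earned after the next round.
    baseline_total = (rounds_played + 1) * baseline_reward
    # Pessimistic total for the learner if it plays the candidate now.
    pessimistic_total = earned_so_far + candidate_lower_bound
    if pessimistic_total >= (1 - alpha) * baseline_total:
        return "candidate"
    return "baseline"
```

Under the checkpoint variant mentioned in the abstract, the same test would be evaluated only at the predefined checkpoints rather than at every round.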


Deep learning-based framework for the distinction of membranous nephropathy: a new approach through hyperspectral imagery

BMC Nephrology ◽

10.1186/s12882-021-02421-y ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Tianqi Tu ◽

Xueling Wei ◽

Yue Yang ◽

Nianrong Zhang ◽

Wei Li ◽

...

Keyword(s):

Deep Learning ◽

Renal Biopsy ◽

Membranous Nephropathy ◽

Learning Algorithm ◽

Hyperspectral Imagery ◽

Chinese Patients ◽

Support Vector ◽

Deep Learning Algorithm ◽

The Difference ◽

Complex Deposition

Background: Common subtypes seen in Chinese patients with membranous nephropathy (MN) include idiopathic membranous nephropathy (IMN) and hepatitis B virus-related membranous nephropathy (HBV-MN). However, the morphologic differences are not visible under the light microscope in certain renal biopsy tissues. Methods: We propose here a deep learning-based framework for processing hyperspectral images of renal biopsy tissue to define the difference between IMN and HBV-MN based on the component of their immune complex deposition. Results: The proposed framework can achieve an overall accuracy of 95.04% in classification, which also leads to better performance than support vector machine (SVM)-based algorithms. Conclusion: IMN and HBV-MN can be correctly separated via the deep learning framework using hyperspectral imagery. Our results suggest the potential of the deep learning algorithm as a new method to aid in the diagnosis of MN.


Event detection of different English data sources based on transfer learning

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189798 ◽

2021 ◽

pp. 1-11

Author(s):

Yanan Huang ◽

Yuji Miao ◽

Zhenjing Da

Keyword(s):

Transfer Learning ◽

Event Detection ◽

Visual Analysis ◽

Learning Algorithm ◽

Data Sources ◽

Data Set ◽

Data Source ◽

Single Data Source ◽

The Difference ◽

Single Data

Methods for multi-modal English event detection under a single data source, and for isomorphic event detection across different English data sources based on transfer learning, still need improvement. To improve the efficiency of event detection on English data sources, this paper uses a transfer learning algorithm to propose multi-modal event detection under a single data source and isomorphic event detection across different data sources. By stacking multiple classification models, the paper fuses the features with one another and conducts adversarial training on the difference between the two classifiers, which brings the distributions of the different source data closer together. To verify the proposed algorithm, a multi-source English event detection data set is collected. Finally, the paper uses this data set to evaluate the proposed method and compare it with current mainstream transfer learning methods. Experimental analysis, convergence analysis, visual analysis, and parameter evaluation demonstrate the effectiveness of the proposed algorithm.


Chip Attach Scheduling in Semiconductor Assembly

Journal of Industrial Engineering ◽

10.1155/2013/295604 ◽

2013 ◽

Vol 2013 ◽

pp. 1-11

Author(s):

Zhicong Zhang ◽

Kaishun Hu ◽

Shuai Li ◽

Huiyu Huang ◽

Shaoyong Zhao

Keyword(s):

Objective Function ◽

Objective Function Value ◽

Learning Algorithm ◽

Parallel Machine Scheduling ◽

Volume Index ◽

Production Volume ◽

Learning Problem ◽

Reward Function ◽

Unrelated Parallel Machine Scheduling ◽

Target Production

Chip attach is the bottleneck operation in semiconductor assembly. Chip attach scheduling is by nature an unrelated parallel machine scheduling problem with practical issues such as machine-job qualification, sequence-dependent setup times, initial machine status, and engineering time. The major scheduling objective is to minimize the total weighted unsatisfied Target Production Volume over the schedule horizon. To apply the Q-learning algorithm, the scheduling problem is converted into a reinforcement learning problem by constructing an elaborate system state representation, actions, and reward function. We select five heuristics as actions and prove the equivalence of the reward function and the scheduling objective function. We also conduct experiments with industrial datasets to compare the Q-learning algorithm, the five action heuristics, and the Largest Weight First (LWF) heuristic used in industry. Experimental results show that Q-learning is remarkably superior to the six heuristics. Compared with LWF, Q-learning reduces three performance measures, objective function value, unsatisfied Target Production Volume index, and unsatisfied job type index, by 80.92%, 52.20%, and 31.81%, respectively.
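The idea of using heuristics as Q-learning actions can be sketched with the standard tabular update. The state encoding, reward values, learning rate, and discount factor below are assumptions; the abstract gives only the count of five heuristics, not the paper's state representation or reward function.

```python
from collections import defaultdict

N_HEURISTICS = 5  # the abstract selects five heuristics as actions


def q_update(q, state, action, reward, next_state, lr=0.1, discount=0.9):
    """Apply one standard tabular Q-learning update and return the new value."""
    best_next = max(q[(next_state, a)] for a in range(N_HEURISTICS))
    target = reward + discount * best_next
    q[(state, action)] += lr * (target - q[(state, action)])
    return q[(state, action)]


def greedy_heuristic(q, state):
    """Pick the index of the heuristic with the highest Q-value in `state`."""
    return max(range(N_HEURISTICS), key=lambda a: q[(state, a)])


# Usage: states are opaque hashable labels here (hypothetical encoding).
q_table = defaultdict(float)
q_update(q_table, "s0", 2, 1.0, "s1")
```

At each decision point the scheduler would look up the current state, choose a heuristic (greedily or with some exploration), let that heuristic pick the next job, and then update the table with the observed reward.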


DETERMINATION OF BOTANICAL COMPOSITION OF TWO-COMPONENT FORAGE MIXTURES

Canadian Journal of Plant Science ◽

10.4141/cjps60-033 ◽

1960 ◽

Vol 40 (2) ◽

pp. 225-234 ◽

Cited By ~ 2

Author(s):

J. W. Tanner ◽

E. E. Gamble ◽

W. E. Tossell

Keyword(s):

Unit Cost ◽

Estimation Method ◽

Separation Method ◽

Botanical Composition ◽

Visual Estimation ◽

Two Component ◽

Late Maturity ◽

The Difference ◽

Forage Mixtures

A comparative study was made in 1958 of the visual estimation and hand separation methods of determining botanical composition of two-component forage mixtures. The results indicated that there were positive significant correlations between the per cent legume values obtained by the two methods. The visual estimation method was less variable than the hand separation method and the precision per unit cost was greater. The differences between per cent legume values obtained by the two methods were influenced by the stage of maturity (medium or late hay) of the components and the cut (hay or aftermath). In this study, the difference was significant only in the medium aftermath cut. Individually, three observers showed some inconsistencies between estimates on the medium and late maturity groups and between the hay and aftermath cut. However, by averaging the three estimates to obtain a mean sample, these inconsistencies were minimized. Both methods were more precise in the aftermath pasture cut than in the hay. An additional observer increased precision of the visual estimate more than an additional replicate or sample. The greater precision resulting from additional replicates, samples, or observers increased at a decreasing rate. The number of replicates, samples, and observers required for specific degrees of precision and a specific cost were calculated. The experiment showed that the visual estimation method can be superior to the hand separation method as a means of determining botanical composition.


Concept Induction in Description Logics Using Information-Theoretic Heuristics

International Journal on Semantic Web and Information Systems ◽

10.4018/jswis.2011040102 ◽

2011 ◽

Vol 7 (2) ◽

pp. 23-44 ◽

Cited By ~ 6

Author(s):

Nicola Fanizzi

Keyword(s):

Semantic Web ◽

Experimental Evaluation ◽

Learning Algorithm ◽

Description Logics ◽

Learning Problem ◽

Information Theoretic ◽

Concept Induction ◽

Formal Ontologies ◽

Refinement Operators ◽

Theoretical Foundations

This paper presents an approach to ontology construction pursued through the induction of concept descriptions expressed in Description Logics. The author surveys the theoretical foundations of the standard representations for formal ontologies in the Semantic Web. After stating the learning problem in this peculiar context, a FOIL-like algorithm is presented that can be applied to learn DL concept descriptions. The algorithm performs a search through a space of candidate concept definitions by means of refinement operators. This process is guided by heuristics that are based on the available examples. The author discusses related theoretical aspects of learning with the inherent incompleteness underlying the semantics of this representation. The experimental evaluation of the system DL-Foil, which implements the learning algorithm, was carried out in two series of sessions on real ontologies from standard repositories for different domains expressed in diverse description logics.
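As a point of reference for the heuristics that guide such a refinement search, the classic FOIL information gain can be sketched as below. This is the textbook propositional form, not the DL-Foil heuristic itself, which the abstract does not specify; the function and argument names are hypothetical.

```python
import math


def foil_gain(pos_before, neg_before, pos_after, neg_after):
    """Classic FOIL gain for refining a candidate definition.

    pos_*/neg_* are the counts of positive and negative examples covered
    before and after the refinement. This simplified form assumes every
    positive covered after the refinement was also covered before it.
    """
    if pos_after == 0:
        return 0.0  # a refinement covering no positives gains nothing
    info_before = -math.log2(pos_before / (pos_before + neg_before))
    info_after = -math.log2(pos_after / (pos_after + neg_after))
    return pos_after * (info_before - info_after)
```

A refinement that keeps most positives while excluding most negatives scores highly, so a search that picks the highest-gain refinement at each step moves toward more precise concept descriptions.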


The role of prosodic structure in the L2 acquisition of Spanish stop lenition

Second Language Research ◽

10.1177/0267658316687356 ◽

2017 ◽

Vol 33 (2) ◽

pp. 233-269 ◽

Cited By ~ 5

Author(s):

Jennifer Cabrelli Amaro

Keyword(s):

Learning Algorithm ◽

Intermediate Stage ◽

Initial Position ◽

Control Group ◽

Advanced Learners ◽

Markedness Constraint ◽

L2 Spanish ◽

The Difference ◽

Positional Faithfulness ◽

L1 English

This study tests the hypothesis that late first-language English / second-language Spanish learners (L1 English / L2 Spanish learners) acquire spirantization in stages according to the prosodic hierarchy (Zampini, 1997, 1998). In Spanish, voiced stops [b d g] surface after a pause or nasal stop, and continuants [β̞ ð̞ ɣ̞] surface postvocalically, among other contexts. We adopt an Optimality Theoretic analysis of the phenomenon that assumes that postvocalic continuants surface due to the ranking of prosodic positional faithfulness constraints below a markedness constraint that prohibits stops in postvocalic position. L1 English speakers are presumed to start with a ranking in which prosodic positional faithfulness outranks the markedness constraint. In line with the Gradual Learning Algorithm (Boersma and Hayes, 2001), gradual demotion of the relevant faithfulness constraints is predicted in L2 Spanish, extending the prosodic domain until continuants surface postvocalically across domains. A cross-section of 44 L1 English / L2 Spanish learners and a control group (n = 5) completed a recitation task, and data were analysed acoustically for manner of articulation and degree of constriction. Results partially align with Zampini’s impressionistic data: Learners first produce underlying stops as postvocalic approximants at the onset of the syllable (word-medial position), followed by the onset of the prosodic word (word-initial position). Unlike Zampini’s findings, there is no evidence for an intermediate stage of acquisition across the boundary of a word and its clitic. Advanced L2 learners produce continuants in postvocalic position at all applicable prosodic levels, which we take to indicate acquisition of the target ranking. We also examined whether learners’ postvocalic continuants are lenited to the same degree as the control group, and whether degree of lenition changes across development. The difference in degree of lenition between controls and learners lessens at higher levels of the prosodic hierarchy as acquisition progresses, and several advanced learners produce target-like segments across prosodic levels.


AUTONOMOUS BEHAVIORS OF GRAPHICAL AVATARS BASED ON MACHINE LEARNING

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001412510020 ◽

2012 ◽

Vol 26 (02) ◽

pp. 1251002

Author(s):

YUESHENG HE ◽

YUAN YAN TANG

Keyword(s):

Machine Learning ◽

Hierarchical Structure ◽

Learning Algorithm ◽

Three Dimensional ◽

Research Area ◽

Machine Learning Techniques ◽

3D Animation ◽

Graphical Environment ◽

Novel Approach ◽

The Difference

Graphical avatars have gained popularity in many application domains, such as three-dimensional (3D) animated movies and animated simulations for product design. However, methods for editing avatars' behaviors in a 3D graphical environment remain a challenging research topic. Because hand-crafted methods are time-consuming and inefficient, automatic avatar actions are required, and achieving such autonomous behaviors calls for artificial intelligence. In this paper, we present a novel approach to constructing a system of automatic avatars in 3D graphical environments based on machine learning techniques. A specific framework is created for controlling the behaviors of avatars, which includes classifying the differences among environments and using a hierarchical structure to describe the actions. Because the interactions between avatars and environments must be simulated after the environment has been classified, Reinforcement Learning is used to compute a policy that controls the avatar intelligently across different situations in the 3D environment. Our approach thus addresses problems such as where the levels of the missions are defined and how the learning algorithm is used to control the avatars. The main contributions of this paper are a hierarchical structure for controlling avatars automatically, a method for avatars to recognize their environment, and an approach for deriving the policy of avatars' actions intelligently.


Ecological and Energy Indicator of the Implementation of the Best Available Technologies for the Disposal of Poultry Manure

Ecology and Industry of Russia ◽

10.18412/1816-0395-2019-12-29-33 ◽

2019 ◽

Vol 23 (12) ◽

pp. 29-33 ◽

Cited By ~ 1

Author(s):

A.Yu. Bryuchanov ◽

I.A. Subbotin ◽

E.V. Timofeev ◽

A.F. Erk

Keyword(s):

Unit Cost ◽

Poultry Manure ◽

Energy Criterion ◽

Environmental Indicators ◽

Energy Resources ◽

Nitrogen Emissions ◽

Best Available Technologies ◽

The Difference

An "Ecological and energy criterion for the effectiveness of the introduction of BAT" is proposed. This indicator expresses the ratio of the unit cost of the consumption of fuel and energy resources to the difference in the values of nitrogen emissions between the base technology and the compared technology. The use of this coefficient is illustrated by comparing poultry manure disposal technologies. The criterion will be useful for evaluating technologies simultaneously in terms of both energy and environmental indicators.
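The ratio described above can be written as a short function. This is a sketch of the stated definition only: the abstract gives no units or sign conventions, so the argument names, the units implied by the example values, and the handling of a zero difference are assumptions.

```python
def eco_energy_indicator(energy_unit_cost, base_nitrogen, compared_nitrogen):
    """Ratio of the unit cost of fuel and energy consumption to the
    difference in nitrogen emissions between two technologies.

    Units are not given in the abstract; any consistent pair works.
    """
    reduction = base_nitrogen - compared_nitrogen
    if reduction == 0:
        # Assumed behavior: the ratio is undefined without a difference.
        raise ValueError("technologies have equal nitrogen emissions")
    return energy_unit_cost / reduction


# Hypothetical numbers: a cost of 120 per unit of output and emissions
# falling from 10 to 4 give 120 / 6 = 20 cost units per unit of nitrogen.
example = eco_energy_indicator(120.0, 10.0, 4.0)
```

Under this reading, a lower value means less energy cost is paid for each unit of nitrogen emission avoided.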


Dynamic Controller Based on Ying Learning Algorithm for Medical Temperature System

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.187.371 ◽

2011 ◽

Vol 187 ◽

pp. 371-376

Author(s):

Ping Zhang ◽

Xiao Hong Hao ◽

Heng Jie Li

Keyword(s):

Neural Network ◽

Neural Networks ◽

Fuzzy Neural Network ◽

Learning Algorithm ◽

Fuzzy Neural Networks ◽

The Novel ◽

Fuzzy Neural ◽

Learning Set ◽

And Training ◽

Novel Algorithm

To avoid overfitting and overtraining and to solve the knowledge extraction problem in fuzzy neural network systems, the Ying Learning Dynamic Fuzzy Neural Network (YL-DFNN) algorithm is proposed. The Learning Set based on K-VNN is constituted from message. Then the framework is designed and its stability is proved. Finally, simulation indicates that the novel algorithm is fast, compact, and capable of generalization.
