Polynomial-Time Algorithms for Multiple-Arm Identification with Full-Bandit Feedback

We study the problem of stochastic multiple-arm identification, where an agent sequentially explores a size-[Formula: see text] subset of arms (also known as a super arm) from given [Formula: see text] arms and tries to identify the best super arm. Most work so far has considered the semi-bandit setting, where the agent can observe the reward of each pulled arm, or has assumed that each arm can be queried at each round. However, in real-world applications, it is costly or sometimes impossible to observe the rewards of individual arms. In this study, we tackle the full-bandit setting, where only a noisy observation of the sum of the rewards of a super arm is given at each pull. Although our problem can be regarded as an instance of best arm identification in linear bandits, a naive approach based on linear bandits is computationally infeasible since the number of super arms [Formula: see text] is exponential. To cope with this problem, we first design a polynomial-time approximation algorithm for a 0-1 quadratic programming problem arising in confidence ellipsoid maximization. Based on our approximation algorithm, we propose a bandit algorithm whose computation time is [Formula: see text](log [Formula: see text]), thereby achieving an exponential speedup over linear bandit algorithms. We provide a sample complexity upper bound that is still worst-case optimal. Finally, we conduct experiments on large-scale data sets with more than 10[Formula: see text] super arms, demonstrating the superiority of our algorithms in terms of both the computation time and the sample complexity.
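The difference between the two feedback models, and the reason enumerating super arms does not scale, can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function names, the Gaussian noise model, and the example values of n and k are assumptions made only for illustration.

```python
import math
import random


def num_super_arms(n: int, k: int) -> int:
    """Number of size-k subsets of n arms (the count of super arms)."""
    return math.comb(n, k)


def pull_semi_bandit(means, super_arm, noise_std, rng):
    """Semi-bandit feedback: one noisy reward per pulled arm."""
    return {i: means[i] + rng.gauss(0.0, noise_std) for i in super_arm}


def pull_full_bandit(means, super_arm, noise_std, rng):
    """Full-bandit feedback: only the noisy sum over the super arm."""
    total = sum(means[i] for i in super_arm)
    return total + rng.gauss(0.0, noise_std)


if __name__ == "__main__":
    rng = random.Random(0)                 # assumed seed, for reproducibility
    means = [0.1 * i for i in range(10)]   # hypothetical arm means
    super_arm = (7, 8, 9)                  # one size-3 super arm

    print(pull_semi_bandit(means, super_arm, 0.1, rng))  # three numbers
    print(pull_full_bandit(means, super_arm, 0.1, rng))  # a single number
    print(num_super_arms(10, 3))    # 120
    print(num_super_arms(100, 10))  # already far too many to enumerate
```

Under full-bandit feedback the agent receives a single scalar per pull, so per-arm rewards have to be inferred from sums over different super arms.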

Faculty Opinions recommendation of Comparative assessment of large-scale data sets of protein-protein interactions.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1006598.82257 ◽

2002 ◽

Author(s):

Rob Russell

Keyword(s):

Protein Interactions ◽

Large Scale ◽

Comparative Assessment ◽

Data Sets ◽

Protein Protein Interactions ◽

Large Scale Data ◽

Scale Data ◽

Large Scale Data Sets

Inapproximability and Polynomial-Time Approximation Algorithm for UET Tasks on Structured Processor Networks

Advances in Operations Research ◽

10.1155/2011/476939 ◽

2011 ◽

Vol 2011 ◽

pp. 1-20 ◽

Cited By ~ 1

Author(s):

M. Bouznif ◽

R. Giroudeau

Keyword(s):

Approximation Algorithm ◽

Large Class ◽

Bipartite Graph ◽

Polynomial Time ◽

Performance Guarantee ◽

Makespan Minimization ◽

Time Approximation ◽

Polynomial Time Approximation Algorithm ◽

Precedence Graph ◽

Polynomial Time Approximation

We investigate complexity and approximation results on processor networks where the communication delay depends on the distance between the processors performing tasks. We then prove that there is no heuristic with a performance guarantee smaller than 4/3 for makespan minimization for a precedence graph on a large class of processor networks like hypercube, grid, torus, and so forth, with a fixed diameter . We extend complexity results when the precedence graph is a bipartite graph. We also design an efficient polynomial-time -approximation algorithm for the makespan minimization on processor networks with diameter .

A Polynomial-Time Approximation Algorithm for Weighted Sum-Rate Maximization in UWB Networks

2008 IEEE International Conference on Communications ◽

10.1109/icc.2008.709 ◽

2008 ◽

Author(s):

G. Kim ◽

Q. Li ◽

R. Negi

Keyword(s):

Approximation Algorithm ◽

Polynomial Time ◽

Weighted Sum ◽

Time Approximation ◽

Polynomial Time Approximation Algorithm ◽

Rate Maximization ◽

Sum Rate Maximization ◽

Sum Rate ◽

Weighted Sum Rate ◽

Weighted Sum Rate Maximization

Self-Adaptive K-Means Based on a Covering Algorithm

Complexity ◽

10.1155/2018/7698274 ◽

2018 ◽

Vol 2018 ◽

pp. 1-16 ◽

Cited By ~ 1

Author(s):

Yiwen Zhang ◽

Yuanyuan Zhou ◽

Xing Guo ◽

Jintao Wu ◽

Qiang He ◽

...

Keyword(s):

Large Scale ◽

Clustering Algorithm ◽

Real Data ◽

Second Phase ◽

Data Sets ◽

Number Of Clusters ◽

Large Scale Data ◽

Long Time ◽

Two Phases ◽

Selection Of

The K-means algorithm is one of the ten classic algorithms in the area of data mining and has been studied by researchers in numerous fields for a long time. However, the value of the clustering number k in the K-means algorithm is not always easy to determine, and the selection of the initial centers is vulnerable to outliers. This paper proposes an improved K-means clustering algorithm called the covering K-means algorithm (C-K-means). The C-K-means algorithm can not only acquire efficient and accurate clustering results but also self-adaptively provide a reasonable number of clusters based on the data features. It includes two phases: the initialization of the covering algorithm (CA) and the Lloyd iteration of the K-means. The first phase executes the CA. CA self-organizes and recognizes the number of clusters k based on the similarities in the data, and it requires neither the number of clusters to be prespecified nor the initial centers to be manually selected. Therefore, it has a “blind” feature, that is, k is not preselected. The second phase performs the Lloyd iteration based on the results of the first phase. The C-K-means algorithm combines the advantages of CA and K-means. Experiments are carried out on the Spark platform, and the results verify the good scalability of the C-K-means algorithm. This algorithm can effectively solve the problem of large-scale data clustering. Extensive experiments on real data sets show that the C-K-means algorithm outperforms the existing algorithms in accuracy and efficiency under both sequential and parallel conditions.
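The second phase described in this abstract, the Lloyd iteration, can be sketched in a few lines of Python. This is a generic textbook Lloyd iteration, not the paper's implementation: the covering algorithm is not specified in the abstract, so the initial centers below are hand-picked placeholders for whatever the first phase would return, and the function name `lloyd` is an assumption.

```python
import math


def lloyd(points, init_centers, max_iter=100):
    """Plain Lloyd iteration: alternate assignment and mean update.

    `init_centers` stands in for the output of an initialization phase;
    the number of clusters is simply len(init_centers).
    """
    centers = [tuple(float(x) for x in c) for c in init_centers]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:
                new_centers.append(old)  # keep an empty cluster's center
                continue
            dim = len(old)
            new_centers.append(
                tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
            )
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    return centers, labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
    centers, labels = lloyd(pts, init_centers=[(0, 0), (10, 10)])
    print(centers)  # [(0.0, 0.5), (10.0, 10.5)]
    print(labels)   # [0, 0, 1, 1]
```

Because Lloyd's iteration only refines the centers it is given, both the number of clusters and the quality of the result depend on the initialization, which is the part the abstract assigns to the covering phase.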

Pattern Recognition in Large-Scale Data Sets: Application in Integrated Circuit Manufacturing

Big Data Analytics - Lecture Notes in Computer Science ◽

10.1007/978-3-319-03689-2_13 ◽

2013 ◽

pp. 185-196 ◽

Cited By ~ 1

Author(s):

Choudur K. Lakshminarayan ◽

Michael I. Baron

Keyword(s):

Pattern Recognition ◽

Integrated Circuit ◽

Large Scale ◽

Data Sets ◽

Integrated Circuit Manufacturing ◽

Large Scale Data ◽

Scale Data ◽

Large Scale Data Sets

Discovering Latent Class Labels for Multi-Label Learning

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/423 ◽

2020 ◽

Author(s):

Jun Huang ◽

Linchuan Xu ◽

Jing Wang ◽

Lei Feng ◽

Kenji Yamanishi

Keyword(s):

Large Scale ◽

Latent Class ◽

Training Data ◽

Data Sets ◽

Robust Learning ◽

Large Scale Data ◽

Novel Approach ◽

Fixed Set ◽

Class Labels ◽

Scale Data

Existing multi-label learning (MLL) approaches mainly assume all the labels are observed and construct classification models with a fixed set of target labels (known labels). However, in some real applications, multiple latent labels may exist outside this set and hide in the data, especially for large-scale data sets. Discovering and exploring the latent labels hidden in the data may not only find interesting knowledge but also help us to build a more robust learning model. In this paper, a novel approach named DLCL (i.e., Discovering Latent Class Labels for MLL) is proposed which can not only discover the latent labels in the training data but also predict new instances with the latent and known labels simultaneously. Extensive experiments show a competitive performance of DLCL against other state-of-the-art MLL approaches.

BSO-MV: An Optimized Multiview Clustering Approach for Items Recommendation in Social Networks

JUCS - Journal of Universal Computer Science ◽

10.3897/jucs.70341 ◽

2021 ◽

Vol 27 (7) ◽

pp. 667-692

Author(s):

Lamia Berkani ◽

Lylia Betit ◽

Louiza Belarif

Keyword(s):

Social Networks ◽

Large Scale ◽

Data Sets ◽

Large Scale Data ◽

Recommendation Algorithms ◽

Clustering Approach ◽

Real World Datasets ◽

Multiview Clustering ◽

Improving Accuracy

Clustering-based approaches have been demonstrated to be efficient and scalable to large-scale data sets. However, clustering-based recommender systems suffer from relatively low accuracy and coverage. To address these issues, we propose in this article an optimized multiview clustering approach for the recommendation of items in social networks. First, the selection of the initial medoids is optimized using the Bees Swarm optimization algorithm (BSO) in order to generate better partitions (i.e. refining the quality of medoids according to the objective function). Then, the multiview clustering (MV) is applied, where users are iteratively clustered from the views of both rating patterns and social information (i.e. friendships and trust). Finally, a framework is proposed for testing the different alternatives, namely: (1) the standard recommendation algorithms; (2) the clustering-based and the optimized clustering-based recommendation algorithms using BSO; and (3) the MV and the optimized MV (BSO-MV) algorithms. Experiments conducted on two real-world datasets demonstrate the effectiveness of the proposed BSO-MV algorithm in terms of improving accuracy, as it outperforms the existing related approaches and baselines.

A Real-Time Log Analyzer Based on MongoDB

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.571-572.497 ◽

2014 ◽

Vol 571-572 ◽

pp. 497-501 ◽

Cited By ~ 3

Author(s):

Qi Lv ◽

Wei Xie

Keyword(s):

Real Time ◽

Large Scale ◽

Performance Comparison ◽

Log Analysis ◽

Data Sets ◽

Time Data ◽

Real Time Analysis ◽

Large Scale Data ◽

Implementation Approach ◽

And Performance

Real-time log analysis on large-scale data is important for applications. Specifically, real-time refers to UI latency within 100 ms. Therefore, techniques that efficiently support real-time analysis over large log data sets are desired. MongoDB provides good query performance, an aggregation framework, and a distributed architecture, which make it suitable for real-time data query and massive log analysis. In this paper, a novel implementation approach for an event-driven file log analyzer is presented, and the performance of query, scan, and aggregation operations over MongoDB, HBase, and MySQL is compared. Our experimental results show that HBase delivers the most balanced performance across all operations, while MongoDB answers queries in less than 10 ms in some operations, making it the most suitable for real-time applications.

An Applicative Survey on Few-Shot Learning

Recent Patents on Engineering ◽

10.2174/1872212115666210715121344 ◽

2021 ◽

Vol 15 ◽

Author(s):

Jianwei Zhang ◽

Xubin Zhang ◽

Lei Lv ◽

Yining Di ◽

Wei Chen

Keyword(s):

Large Scale ◽

Representation Learning ◽

Language Models ◽

Data Sets ◽

Research Directions ◽

Large Scale Data ◽

Cross Domain ◽

Meta Learning ◽

Definition Of ◽

Future Work

Background: Learning discriminative representations from large-scale data sets has achieved breakthroughs in recent decades. However, it is still a thorny problem to generate representative embeddings from limited examples, for example, a class containing only one image. Recently, deep learning-based Few-Shot Learning (FSL) has been proposed. It tackles this problem by leveraging prior knowledge in various ways. Objective: In this work, we review recent advances of FSL from the perspective of high-dimensional representation learning. The results of the analysis can provide insights and directions for future work. Methods: We first present the definition of general FSL. Then we propose a general framework for the FSL problem and give the taxonomy under the framework. We survey two FSL directions: learning policy and meta-learning. Results: We review the advanced applications of FSL, including image classification, object detection, image segmentation, and other tasks, as well as the corresponding benchmarks to provide an overview of recent progress. Conclusion: FSL needs to be further studied in medical images, language models, and reinforcement learning in future work. In addition, cross-domain FSL, successive FSL, and associated FSL are more challenging and valuable research directions.
