scholarly journals Efficient and Robust High-Dimensional Linear Contextual Bandits

Author(s):  
Cheng Chen ◽  
Luo Luo ◽  
Weinan Zhang ◽  
Yong Yu ◽  
Yijiang Lian

The linear contextual bandits is a sequential decision-making problem where an agent decides among sequential actions given their corresponding contexts. Since large-scale data sets become more and more common, we study the linear contextual bandits in high-dimensional situations. Recent works focus on employing matrix sketching methods to accelerating contextual bandits. However, the matrix approximation error will bring additional terms to the regret bound. In this paper we first propose a novel matrix sketching method which is called Spectral Compensation Frequent Directions (SCFD). Then we propose an efficient approach for contextual bandits by adopting SCFD to approximate the covariance matrices. By maintaining and manipulating sketched matrices, our method only needs O(md) space and O(md) updating time in each round, where d is the dimensionality of the data and m is the sketching size. Theoretical analysis reveals that our method has better regret bounds than previous methods in high-dimensional cases. Experimental results demonstrate the effectiveness of our algorithm and verify our theoretical guarantees.

Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-16 ◽  
Author(s):  
Yiwen Zhang ◽  
Yuanyuan Zhou ◽  
Xing Guo ◽  
Jintao Wu ◽  
Qiang He ◽  
...  

The K-means algorithm is one of the ten classic algorithms in the area of data mining and has been studied by researchers in numerous fields for a long time. However, the value of the clustering number k in the K-means algorithm is not always easy to be determined, and the selection of the initial centers is vulnerable to outliers. This paper proposes an improved K-means clustering algorithm called the covering K-means algorithm (C-K-means). The C-K-means algorithm can not only acquire efficient and accurate clustering results but also self-adaptively provide a reasonable numbers of clusters based on the data features. It includes two phases: the initialization of the covering algorithm (CA) and the Lloyd iteration of the K-means. The first phase executes the CA. CA self-organizes and recognizes the number of clusters k based on the similarities in the data, and it requires neither the number of clusters to be prespecified nor the initial centers to be manually selected. Therefore, it has a “blind” feature, that is, k is not preselected. The second phase performs the Lloyd iteration based on the results of the first phase. The C-K-means algorithm combines the advantages of CA and K-means. Experiments are carried out on the Spark platform, and the results verify the good scalability of the C-K-means algorithm. This algorithm can effectively solve the problem of large-scale data clustering. Extensive experiments on real data sets show that the accuracy and efficiency of the C-K-means algorithm outperforms the existing algorithms under both sequential and parallel conditions.


Author(s):  
Jun Huang ◽  
Linchuan Xu ◽  
Jing Wang ◽  
Lei Feng ◽  
Kenji Yamanishi

Existing multi-label learning (MLL) approaches mainly assume all the labels are observed and construct classification models with a fixed set of target labels (known labels). However, in some real applications, multiple latent labels may exist outside this set and hide in the data, especially for large-scale data sets. Discovering and exploring the latent labels hidden in the data may not only find interesting knowledge but also help us to build a more robust learning model. In this paper, a novel approach named DLCL (i.e., Discovering Latent Class Labels for MLL) is proposed which can not only discover the latent labels in the training data but also predict new instances with the latent and known labels simultaneously. Extensive experiments show a competitive performance of DLCL against other state-of-the-art MLL approaches.


2021 ◽  
Vol 27 (7) ◽  
pp. 667-692
Author(s):  
Lamia Berkani ◽  
Lylia Betit ◽  
Louiza Belarif

Clustering-based approaches have been demonstrated to be efficient and scalable to large-scale data sets. However, clustering-based recommender systems suffer from relatively low accuracy and coverage. To address these issues, we propose in this article an optimized multiview clustering approach for the recommendation of items in social networks. First, the selection of the initial medoids is optimized using the Bees Swarm optimization algorithm (BSO) in order to generate better partitions (i.e. refining the quality of medoids according to the objective function). Then, the multiview clustering (MV) is applied, where users are iteratively clustered from the views of both rating patterns and social information (i.e. friendships and trust). Finally, a framework is proposed for testing the different alternatives, namely: (1) the standard recommendation algorithms; (2) the clustering-based and the optimized clustering-based recommendation algorithms using BSO; and (3) the MV and the optimized MV (BSO-MV) algorithms. Experimental results conducted on two real-world datasets demonstrate the effectiveness of the proposed BSO-MV algorithm in terms of improving accuracy, as it outperforms the existing related approaches and baselines.


2014 ◽  
Vol 571-572 ◽  
pp. 497-501 ◽  
Author(s):  
Qi Lv ◽  
Wei Xie

Real-time log analysis on large scale data is important for applications. Specifically, real-time refers to UI latency within 100ms. Therefore, techniques which efficiently support real-time analysis over large log data sets are desired. MongoDB provides well query performance, aggregation frameworks, and distributed architecture which is suitable for real-time data query and massive log analysis. In this paper, a novel implementation approach for an event driven file log analyzer is presented, and performance comparison of query, scan and aggregation operations over MongoDB, HBase and MySQL is analyzed. Our experimental results show that HBase performs best balanced in all operations, while MongoDB provides less than 10ms query speed in some operations which is most suitable for real-time applications.


2021 ◽  
Vol 15 ◽  
Author(s):  
Jianwei Zhang ◽  
Xubin Zhang ◽  
Lei Lv ◽  
Yining Di ◽  
Wei Chen

Background: Learning discriminative representation from large-scale data sets has made a breakthrough in decades. However, it is still a thorny problem to generate representative embedding from limited examples, for example, a class containing only one image. Recently, deep learning-based Few-Shot Learning (FSL) has been proposed. It tackles this problem by leveraging prior knowledge in various ways. Objective: In this work, we review recent advances of FSL from the perspective of high-dimensional representation learning. The results of the analysis can provide insights and directions for future work. Methods: We first present the definition of general FSL. Then we propose a general framework for the FSL problem and give the taxonomy under the framework. We survey two FSL directions: learning policy and meta-learning. Results: We review the advanced applications of FSL, including image classification, object detection, image segmentation and other tasks etc., as well as the corresponding benchmarks to provide an overview of recent progress. Conclusion: FSL needs to be further studied in medical images, language models, and reinforcement learning in future work. In addition, cross-domain FSL, successive FSL, and associated FSL are more challenging and valuable research directions.


Author(s):  
Vo Ngoc Phu ◽  
Vo Thi Ngoc Tran

Artificial intelligence (ARTINT) and information have been famous fields for many years. A reason has been that many different areas have been promoted quickly based on the ARTINT and information, and they have created many significant values for many years. These crucial values have certainly been used more and more for many economies of the countries in the world, other sciences, companies, organizations, etc. Many massive corporations, big organizations, etc. have been established rapidly because these economies have been developed in the strongest way. Unsurprisingly, lots of information and large-scale data sets have been created clearly from these corporations, organizations, etc. This has been the major challenges for many commercial applications, studies, etc. to process and store them successfully. To handle this problem, many algorithms have been proposed for processing these big data sets.


2022 ◽  
pp. 41-67
Author(s):  
Vo Ngoc Phu ◽  
Vo Thi Ngoc Tran

Machine learning (ML), neural network (NN), evolutionary algorithm (EA), fuzzy systems (FSs), as well as computer science have been very famous and very significant for many years. They have been applied to many different areas. They have contributed much to developments of many large-scale corporations, massive organizations, etc. Lots of information and massive data sets (MDSs) have been generated from these big corporations, organizations, etc. These big data sets (BDSs) have been the challenges of many commercial applications, researches, etc. Therefore, there have been many algorithms of the ML, the NN, the EA, the FSs, as well as computer science which have been developed to handle these massive data sets successfully. To support for this process, the authors have displayed all the possible algorithms of the NN for the large-scale data sets (LSDSs) successfully in this chapter. Finally, they have presented a novel model of the NN for the BDS in a sequential environment (SE) and a distributed network environment (DNE).


Sign in / Sign up

Export Citation Format

Share Document