Quadruply Stochastic Gradients for Large Scale Nonlinear Semi-Supervised AUC Optimization

Author(s): Wanli Shi, Bin Gu, Xiang Li, Xiang Geng, Heng Huang

Semi-supervised learning is pervasive in real-world applications, where only a few labeled instances are available and large numbers of instances remain unlabeled. Since AUC is an important model evaluation metric in classification, directly optimizing AUC in the semi-supervised learning scenario has drawn much attention in the machine learning community. Recently, it has been shown that one can find an unbiased solution for the semi-supervised AUC maximization problem without knowing the class prior distribution. However, this method scales poorly to nonlinear classification problems with kernels. To address this problem, in this paper we propose a novel scalable quadruply stochastic gradient algorithm (QSG-S2AUC) for nonlinear semi-supervised AUC optimization. In each iteration of the stochastic optimization process, our method randomly samples a positive instance, a negative instance, an unlabeled instance, and their random features to compute the gradient, and then updates the model using this quadruply stochastic gradient to approach the optimal solution. More importantly, we prove that QSG-S2AUC converges to the optimal solution at a rate of O(1/t), where t is the number of iterations. Extensive experimental results on a variety of benchmark datasets show that QSG-S2AUC is far more efficient than the existing state-of-the-art algorithms for semi-supervised AUC maximization, while retaining similar generalization performance.
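
The quadruple sampling idea is easy to illustrate. Below is a minimal NumPy sketch of one such update, assuming a squared pairwise surrogate for AUC and random Fourier features for an RBF kernel; the class name, loss, and step-size schedule are illustrative stand-ins, not the authors' exact algorithm (which also incorporates the sampled unlabeled instance into its unbiased risk estimator).

```python
import numpy as np

class QSGAUCSketch:
    """Model kept as a growing sum of random-feature atoms, one per step,
    in the spirit of the doubly/quadruply stochastic gradient framework."""

    def __init__(self, d, gamma=1.0, eta=0.5, seed=0):
        self.d, self.gamma, self.eta = d, gamma, eta
        self.rng = np.random.default_rng(seed)
        self.atoms = []  # (W, b, coefficient) triples, one per iteration

    def _phi(self, x, W, b):
        # Random Fourier features approximating an RBF kernel (Rahimi-Recht).
        return np.sqrt(2.0 / len(b)) * np.cos(W @ x + b)

    def predict(self, x):
        return sum(a @ self._phi(x, W, b) for W, b, a in self.atoms)

    def step(self, x_pos, x_neg, t, D=20):
        # Fresh random features each iteration: the extra source of
        # stochasticity on top of the sampled instances.
        W = self.rng.normal(scale=np.sqrt(2 * self.gamma), size=(D, self.d))
        b = self.rng.uniform(0, 2 * np.pi, size=D)
        margin = self.predict(x_pos) - self.predict(x_neg)
        g = -2.0 * (1.0 - margin)  # derivative of the (1 - margin)^2 surrogate
        zp, zn = self._phi(x_pos, W, b), self._phi(x_neg, W, b)
        self.atoms.append((W, b, -(self.eta / np.sqrt(t)) * g * (zp - zn)))
```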

2020, Vol. 34 (04), pp. 5734-5741
Author(s): Wanli Shi, Bin Gu, Xiang Li, Heng Huang

Semi-supervised ordinal regression (S2OR) problems are ubiquitous in real-world applications, where only a few ordered instances are labeled and massive numbers of instances remain unlabeled. Recent research has shown that directly optimizing the concordance index or AUC can impose a better ranking on the data than optimizing the traditional error rate in ordinal regression (OR) problems. In this paper, we propose an unbiased objective function for S2OR AUC optimization based on the ordinal binary decomposition approach. In addition, to handle large-scale kernelized learning problems, we propose a scalable algorithm called QS3ORAO using the doubly stochastic gradients (DSG) framework for functional optimization. Theoretically, we prove that our method can converge to the optimal solution at the rate of O(1/t), where t is the number of iterations for stochastic data sampling. Extensive experimental results on various benchmark and real-world datasets also demonstrate that our method is efficient and effective while retaining similar generalization performance.
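
The ordinal binary decomposition the method builds on is simple to state: an ordinal label with K levels is recoded as K-1 "is the label greater than k?" binary problems, each of which can then be handled by a (semi-supervised) AUC learner. A minimal sketch of the standard reduction, with illustrative names, not the authors' full QS3ORAO pipeline:

```python
import numpy as np

def ordinal_binary_decomposition(y, K):
    """Recode ordinal labels y in {1, ..., K} as K-1 binary tasks, the
    k-th asking "is y > k?"."""
    y = np.asarray(y)
    return np.stack([(y > k).astype(int) for k in range(1, K)], axis=1)

# Four instances on a 4-level ordinal scale -> three binary label columns.
print(ordinal_binary_decomposition([1, 2, 4, 3], K=4))
# [[0 0 0]
#  [1 0 0]
#  [1 1 1]
#  [1 1 0]]
```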


Sensors, 2020, Vol. 20 (3), pp. 697
Author(s): Ting Ma, Bo Qian, Dunbiao Niu, Enbin Song, Qingjiang Shi

This paper considers robust binary hypothesis testing between Gaussian distributions under a Bayesian optimality criterion in a wireless sensor network (WSN). The covariance matrix under each hypothesis is known, while the mean vector under each hypothesis drifts within an ellipsoidal uncertainty set. Because of limited bandwidth and energy, we seek a subset of p out of m sensors that achieves the best detection performance. In this setup, a minimax robust sensor selection problem is formulated to deal with the uncertainty in the distribution means. Following a popular approach, minimizing the maximum overall error probability with respect to the selection matrix is approximated by maximizing the minimum Chernoff distance between the distributions of the selected measurements under the null and alternative hypotheses. Then, we utilize Danskin's theorem to compute the gradient of the objective function of the converted maximization problem, and apply the orthogonal constraint-preserving gradient algorithm (OCPGA) to solve the relaxed maximization problem without 0/1 constraints. It is shown that the OCPGA can obtain a stationary point of the relaxed problem. Meanwhile, we provide the computational complexity of the OCPGA, which is much lower than that of the existing greedy algorithm. Finally, numerical simulations illustrate that, after the same projection and refinement phases, the OCPGA-based method can obtain better solutions than the greedy algorithm-based method but with up to 48.72% shorter runtimes. In particular, for small-scale problems, the OCPGA-based method is able to attain the globally optimal solution.
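
To make the constraint-preserving idea concrete, the sketch below performs one gradient step that keeps a relaxed selection matrix on the Stiefel manifold (orthonormal columns) via a tangent-space projection and a QR retraction. This is a generic orthogonality-preserving update standing in for the OCPGA step, whose exact form is given in the paper.

```python
import numpy as np

def orthogonality_preserving_step(S, grad, eta=0.1):
    """One gradient step keeping S (m x p, orthonormal columns) on the
    Stiefel manifold: tangent-space projection plus thin-QR retraction."""
    # Project the Euclidean gradient onto the tangent space at S.
    G = grad - S @ ((S.T @ grad + grad.T @ S) / 2)
    # Retract back onto the manifold with a thin QR factorization.
    Q, R = np.linalg.qr(S + eta * G)
    return Q * np.sign(np.diag(R))  # fix column signs for a canonical Q

# Sanity check: columns stay orthonormal after a step.
rng = np.random.default_rng(0)
S = np.linalg.qr(rng.normal(size=(8, 3)))[0]
S_next = orthogonality_preserving_step(S, rng.normal(size=(8, 3)))
print(np.allclose(S_next.T @ S_next, np.eye(3)))  # True
```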


2020
Author(s): Shreyas Sekar, Milan Vojnovic, Se-Young Yun

We study the canonical problem of maximizing a stochastic submodular function subject to a cardinality constraint, where the goal is to select a subset from a ground set of items with uncertain individual performances to maximize their expected group value. Although near-optimal algorithms have been proposed for this problem, practical concerns regarding scalability, compatibility with distributed implementation, and expensive oracle queries persist in large-scale applications. Motivated by online platforms that rely on individual item scores for content recommendation and team selection, we study a special class of algorithms that select items based solely on individual performance measures known as test scores. The central contribution of this work is a novel and systematic framework for designing test score–based algorithms for a broad class of naturally occurring utility functions. We introduce a new scoring mechanism that we refer to as replication test scores and prove that as long as the objective function satisfies a diminishing-returns condition, one can leverage these scores to compute solutions that are within a constant factor of the optimum. We then extend these scoring mechanisms to the more general stochastic submodular welfare-maximization problem, where the goal is to partition items into groups to maximize the sum of the expected group values. For this more difficult problem, we show that replication test scores can be used to develop an algorithm that approximates the optimal solution up to a logarithmic factor. The techniques presented in this work bridge the gap between the rigorous theoretical work on submodular optimization and simple, scalable heuristics that are useful in certain domains. In particular, our results establish that in many applications involving the selection and assignment of items, one can design algorithms that are intuitive and practically relevant with only a small loss in performance compared with the state-of-the-art approaches. This paper was accepted by Chung Piaw Teo, optimization.
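
As a toy illustration of the test-score idea, the sketch below estimates a replication-style score for each item by Monte Carlo (the expected utility of a group formed by k independent copies of that item's performance) and then keeps the k highest-scoring items. The sampling model and the max-utility objective are assumptions for the example, not the paper's general setting.

```python
import numpy as np

def replication_test_score(sample_item, f, i, k, n_mc=2000, rng=None):
    """Monte Carlo estimate of a replication-style test score for item i:
    the expected utility f of a group of k i.i.d. copies of the item."""
    if rng is None:
        rng = np.random.default_rng(0)
    draws = sample_item(i, (n_mc, k), rng)  # n_mc simulated groups of k replicas
    return np.mean([f(group) for group in draws])

# Example: item performances are exponential with different means, and the
# group utility is the best realized performance among selected items.
means = np.array([0.5, 1.0, 2.0, 0.8, 1.5])
sample = lambda i, size, rng: rng.exponential(means[i], size)
k = 2
scores = [replication_test_score(sample, np.max, i, k) for i in range(len(means))]
chosen = np.argsort(scores)[-k:]  # keep the k highest-scoring items
print(np.round(scores, 2), chosen)
```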


Author(s): Bin Gu, Zhouyuan Huo, Heng Huang

Pairwise learning is an important topic in the machine learning community, where the loss function involves pairs of samples (e.g., AUC maximization and metric learning). Existing pairwise learning algorithms do not achieve generality, scalability, and efficiency simultaneously. To address these challenging problems, in this paper we first analyze the relationship between statistical accuracy and the regularized empirical risk for pairwise losses. Based on this relationship, we propose a scalable and efficient adaptive doubly stochastic gradient algorithm (AdaDSG) for generalized regularized pairwise learning problems. More importantly, we prove that the overall computational cost of AdaDSG is O(n) to achieve statistical accuracy on the full training set of size n, which, to the best of our knowledge, is the best theoretical result for pairwise learning. The experimental results on a variety of real-world datasets not only confirm the effectiveness of our AdaDSG algorithm, but also show that AdaDSG has significantly better scalability and efficiency than existing pairwise learning algorithms.
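
The doubly stochastic structure is straightforward to sketch: each update draws a random pair of samples and a fresh block of random features, then steps along the gradient of a pairwise loss. The toy below uses a pairwise squared loss and a fixed-dimension parameter vector for brevity; AdaDSG's adaptive schedule and convergence machinery are in the paper.

```python
import numpy as np

def pairwise_dsg_step(w, X, y, t, rng, eta=0.5, gamma=1.0):
    """One doubly stochastic update for a pairwise squared loss: randomness
    from the sampled pair (i, j) and from fresh random Fourier features."""
    i, j = rng.integers(len(X), size=2)
    D, d = len(w), X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(D, d))
    b = rng.uniform(0, 2 * np.pi, size=D)
    phi = lambda x: np.sqrt(2.0 / D) * np.cos(W @ x + b)
    diff = phi(X[i]) - phi(X[j])
    resid = w @ diff - (y[i] - y[j])  # pairwise residual to drive to zero
    return w - (eta / np.sqrt(t)) * 2.0 * resid * diff

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
w = np.zeros(50)
for t in range(1, 501):
    w = pairwise_dsg_step(w, X, y, t, rng)
```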


Author(s): Nguyen N. Tran, Ha X. Nguyen

A capacity analysis for generally correlated wireless multi-hop multi-input multi-output (MIMO) channels is presented in this paper. The channel at each hop is spatially correlated, the source symbols are mutually correlated, and the additive Gaussian noises are colored. First, by invoking the Karush-Kuhn-Tucker conditions for the optimality of a convex program, we derive the source symbol covariance that maximizes the mutual information between the channel input and output when full channel knowledge is available at the transmitter. Second, we formulate the average mutual information maximization problem when only the channel statistics are available at the transmitter. Since this problem is intractable to solve analytically, a numerical interior-point method is employed to obtain the optimal solution. Furthermore, to reduce the computational complexity, an asymptotic closed-form solution is derived by maximizing an upper bound of the objective function. Simulation results show that the average mutual information obtained by the asymptotic design is very close to that obtained by the optimal design, at a substantially lower computational cost.
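
For intuition, the single-hop, white-noise special case of the first result is the classic water-filling solution: the KKT conditions for maximizing log det(I + H Q H^T / sigma^2) subject to tr(Q) <= P allocate power over the channel's eigenmodes up to a common water level. A sketch under those simplifying assumptions (the paper's multi-hop, correlated-noise setting generalizes this):

```python
import numpy as np

def waterfilling_covariance(H, P, noise_var=1.0):
    """KKT-derived input covariance for max log det(I + H Q H^T / noise_var)
    s.t. tr(Q) <= P: water-filling over the eigenmodes of H^T H."""
    lam, V = np.linalg.eigh(H.T @ H / noise_var)
    lam = np.maximum(lam, 1e-12)
    lo, hi = 0.0, P + np.max(1.0 / lam)  # bracket the water level mu
    for _ in range(100):                 # bisection on total allocated power
        mu = (lo + hi) / 2
        p = np.maximum(mu - 1.0 / lam, 0.0)
        lo, hi = (mu, hi) if p.sum() < P else (lo, mu)
    return V @ np.diag(p) @ V.T          # optimal input covariance Q

Q = waterfilling_covariance(np.array([[1.0, 0.2], [0.1, 0.5]]), P=2.0)
print(np.trace(Q))  # ~2.0: all power is used at the optimum
```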


Technologies, 2020, Vol. 9 (1), pp. 2
Author(s): Ashish Jaiswal, Ashwin Ramesh Babu, Mohammad Zaki Zadeh, Debapriya Banerjee, Fillia Makedon

Self-supervised learning has gained popularity because of its ability to avoid the cost of annotating large-scale datasets. It is capable of adopting self-defined pseudolabels as supervision and using the learned representations for several downstream tasks. Specifically, contrastive learning has recently become a dominant component in self-supervised learning for computer vision, natural language processing (NLP), and other domains. It aims at embedding augmented versions of the same sample close to each other while trying to push away embeddings from different samples. This paper provides an extensive review of self-supervised methods that follow the contrastive approach. The work explains commonly used pretext tasks in a contrastive learning setup, followed by the different architectures that have been proposed so far. Next, we present a performance comparison of different methods for multiple downstream tasks such as image classification, object detection, and action recognition. Finally, we conclude with the limitations of current methods and the further techniques and future directions needed to make meaningful progress.
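
The core contrastive objective many of the surveyed methods share (often called InfoNCE or NT-Xent) is compact enough to state directly. A minimal NumPy sketch, assuming two batches of embeddings from two augmented views of the same samples:

```python
import numpy as np

def info_nce(z1, z2, tau=0.5):
    """InfoNCE / NT-Xent sketch: row i of z1 and row i of z2 embed two
    augmented views of sample i (positives); all other rows in the batch
    act as negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau  # temperature-scaled cosine similarities
    # Cross-entropy with the diagonal (matching pairs) as the targets.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
print(info_nce(rng.normal(size=(4, 8)), rng.normal(size=(4, 8))))
```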


Author(s): Ruiyang Song, Kuang Xu

We propose and analyze a temporal concatenation heuristic for solving large-scale finite-horizon Markov decision processes (MDPs), which divides the MDP into smaller sub-problems along the time horizon and generates an overall solution by simply concatenating the optimal solutions from these sub-problems. As a “black box” architecture, temporal concatenation works with a wide range of existing MDP algorithms. Our main results characterize the regret of temporal concatenation compared to the optimal solution. We provide upper bounds for general MDP instances, as well as a family of MDP instances for which the upper bounds are tight. Together, our results demonstrate the potential of temporal concatenation for substantial speed-ups at the expense of some performance degradation.
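
A minimal tabular sketch of the heuristic, assuming time-homogeneous dynamics and rewards so that each sub-problem can be solved by standard backward induction with a zero terminal value; the function names are illustrative.

```python
import numpy as np

def backward_induction(P, R, H):
    """Solve a finite-horizon tabular MDP: P[a] is an S x S transition
    matrix, R is S x A, horizon H, zero terminal value."""
    S, A = R.shape
    V, policy = np.zeros(S), np.zeros((H, S), dtype=int)
    for t in reversed(range(H)):
        Q = R + np.stack([P[a] @ V for a in range(A)], axis=1)
        policy[t], V = Q.argmax(axis=1), Q.max(axis=1)
    return policy

def temporal_concatenation(P, R, H, pieces=2):
    """Cut the horizon into equal sub-horizons, solve each sub-MDP
    independently, and concatenate the resulting policies."""
    h = H // pieces
    return np.vstack([backward_induction(P, R, h) for _ in range(pieces)])
```

With time-homogeneous dynamics every piece solves the same sub-MDP, so one solve would suffice here; the benefit of the architecture shows up when rewards or dynamics vary across the horizon, or when the pieces are solved in parallel by different underlying MDP algorithms.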

