Projective Quadratic Regression for Online Learning

2020 ◽  
Vol 34 (04) ◽  
pp. 5093-5100
Author(s):  
Wenye Ma

This paper considers online convex optimization (OCO) problems, the paramount framework for online learning algorithm design. In the OCO setting, the loss function of the learning task is based on streaming data, which makes OCO a powerful tool for modelling large-scale applications such as online recommender systems. Meanwhile, real-world data are usually extremely high-dimensional due to modern feature engineering techniques, so quadratic regression is impractical. Factorization Machines and their variants are efficient models for capturing feature interactions with a low-rank matrix model, but they cannot fulfill the OCO setting due to their non-convexity. In this paper, we propose a projective quadratic regression (PQR) model. First, it captures the important second-order feature information. Second, it is a convex model, so the requirements of OCO are fulfilled and the globally optimal solution can be attained. Moreover, existing online optimization methods such as Online Gradient Descent (OGD) and Follow-The-Regularized-Leader (FTRL) can be applied directly. In addition, by choosing a proper hyper-parameter, we show that it has the same order of space and time complexity as the linear model and thus can handle high-dimensional data. Experimental results demonstrate the accuracy and efficiency of the proposed PQR model in comparison with state-of-the-art methods.
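As an illustration of the OGD update this abstract refers to, here is a minimal sketch applied to a generic convex loss; the squared loss, the 1/√t step size, and all names below are illustrative assumptions, not the PQR specifics.

```python
import numpy as np

def ogd(stream, dim, eta0=0.1):
    """Online Gradient Descent on a stream of (x, y) pairs.

    Squared loss and a 1/sqrt(t) step size are illustrative choices,
    not details taken from the PQR paper.
    """
    w = np.zeros(dim)
    for t, (x, y) in enumerate(stream, start=1):
        grad = (w @ x - y) * x          # gradient of 0.5 * (w.x - y)^2
        w -= eta0 / np.sqrt(t) * grad   # step size decays as 1/sqrt(t)
    return w

# toy usage: learn a linear target from a simulated stream
rng = np.random.default_rng(0)
w_true = rng.normal(size=5)
stream = ((x, w_true @ x) for x in rng.normal(size=(1000, 5)))
print(ogd(stream, dim=5))
```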

2021 ◽  
Vol 15 (3) ◽  
pp. 1-28
Author(s):  
Xueyan Liu ◽  
Bo Yang ◽  
Hechang Chen ◽  
Katarzyna Musial ◽  
Hongxu Chen ◽  
...  

Stochastic blockmodel (SBM) is a widely used statistical network representation model with good interpretability, expressiveness, generalization, and flexibility, and it has become prevalent and important in the field of network science over recent years. However, learning an optimal SBM for a given network is an NP-hard problem. This significantly limits the application of SBMs to large-scale networks because of the computational overhead of existing SBM models and their learning methods. Reducing the cost of SBM learning and making it scalable to large-scale networks, while maintaining the good theoretical properties of SBM, remains an unresolved problem. In this work, we address this challenging task from the novel perspective of model redefinition. We propose a redefined SBM with Poisson distribution, together with a block-wise learning algorithm, that can efficiently analyse large-scale networks. Extensive validation conducted on both artificial and real-world data shows that our proposed method significantly outperforms the state-of-the-art methods in terms of a reasonable trade-off between accuracy and scalability.
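For intuition about the model family, here is a minimal sketch of the generative side of a Poisson SBM; the uniform block assignments and the rate matrix below are illustrative assumptions, and the paper's block-wise learning algorithm is not reproduced.

```python
import numpy as np

def sample_poisson_sbm(n, k, rates, rng=None):
    """Sample an undirected network from a Poisson SBM.

    rates is a k x k symmetric matrix of expected edge counts between
    blocks; node-to-block assignments are drawn uniformly here, an
    illustrative simplification.
    """
    rng = rng or np.random.default_rng()
    z = rng.integers(k, size=n)               # block assignment per node
    lam = rates[z[:, None], z[None, :]]       # pairwise Poisson rates
    a = rng.poisson(lam)
    a = np.triu(a, 1)                         # keep upper triangle only
    return z, a + a.T                         # symmetrize, zero diagonal

rates = np.array([[3.0, 0.2], [0.2, 3.0]])   # assortative toy rates
z, adj = sample_poisson_sbm(n=100, k=2, rates=rates)
```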


2020 ◽  
Vol 2 (2) ◽  
pp. 96-136
Author(s):  
Navoneel Chakrabarty ◽  
Sanket Biswas

Imbalanced data refers to a problem in machine learning where the instances of the classes are unequally distributed. Performing a classification task on such data can often bias the model in favour of the majority class, and the bias is amplified for high-dimensional data. To address this problem, there exist many data mining techniques, such as over-sampling and under-sampling, which can reduce the data imbalance. The Synthetic Minority Oversampling Technique (SMOTe) provided one such state-of-the-art and popular solution to tackle class imbalance, even on high-dimensional data. In this work, a novel and consistent oversampling algorithm has been proposed that can further enhance classification performance, especially on binary imbalanced datasets. It is named NMOTe (Navo Minority Oversampling Technique), an upgraded and superior alternative to the existing techniques. A critical analysis and comprehensive overview of the literature has been done to get a deeper insight into the problem statement and to establish the need for an optimal solution. The performance of NMOTe on some standard datasets has been established in this work to provide a statistical understanding of why it edges out the existing state of the art to become the most robust technique for solving the two-class data imbalance problem.
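For context, a bare-bones sketch of the SMOTe-style interpolation that NMOTe builds on: synthetic minority points are generated on segments between a minority sample and one of its nearest minority neighbours. This is the classic technique, not the NMOTe algorithm itself, and all names below are illustrative.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=5, rng=None):
    """SMOTE-style synthesis: interpolate each picked minority sample
    toward one of its k nearest minority neighbours."""
    rng = rng or np.random.default_rng()
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)               # exclude self-matches
    nbrs = np.argsort(d, axis=1)[:, :k]       # k nearest neighbours
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = nbrs[i, rng.integers(k)]
        gap = rng.random()                    # interpolation factor in [0, 1]
        synth.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synth)
```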


2021 ◽  
pp. 1-10
Author(s):  
Lei Shu ◽  
Kun Huang ◽  
Wenhao Jiang ◽  
Wenming Wu ◽  
Hongling Liu

Using real-world data directly in machine learning tasks easily leads to poor generalization, since such data is usually high-dimensional and limited in quantity. By learning low-dimensional representations of high-dimensional data, feature selection can retain the features that are useful for machine learning tasks, and these features in turn allow models to be trained effectively. Hence, feature selection from high-dimensional data is a challenge. To address this issue, this paper proposes a novel feature selection method: a hybrid approach consisting of an autoencoder and Bayesian methods. First, Bayesian methods are embedded in the proposed autoencoder as a special hidden layer; this increases the precision of selecting non-redundant features. Then, the other hidden layers of the autoencoder are used for non-redundant feature selection. Finally, the proposed method is compared with mainstream feature selection approaches and outperforms them. We find that combining autoencoders with probabilistic correction methods is more effective for feature selection than stacking architectures or adding constraints to autoencoders. We also demonstrate that stacked autoencoders are more suitable for large-scale feature selection, whereas sparse autoencoders are beneficial when selecting a smaller number of features. The proposed method thus provides a theoretical reference for analyzing the optimality of feature selection.
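The paper's Bayesian hidden layer is not reproducible from the abstract; as a stand-in, here is a common heuristic for autoencoder-based feature ranking: train a small linear autoencoder and score each input feature by the norm of its encoder weights. All names and hyper-parameters below are illustrative assumptions.

```python
import numpy as np

def rank_features_by_autoencoder(X, h=16, lr=0.01, epochs=200, rng=None):
    """Train a one-hidden-layer linear autoencoder by gradient descent
    and rank input features by the L2 norm of their encoder weights."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    W = rng.normal(scale=0.1, size=(d, h))    # encoder weights
    V = rng.normal(scale=0.1, size=(h, d))    # decoder weights
    for _ in range(epochs):
        Z = X @ W                             # encode
        R = Z @ V - X                         # reconstruction residual
        gV = Z.T @ R / n                      # gradient w.r.t. decoder
        gW = X.T @ (R @ V.T) / n              # gradient w.r.t. encoder
        W -= lr * gW
        V -= lr * gV
    scores = np.linalg.norm(W, axis=1)        # per-feature importance
    return np.argsort(scores)[::-1]           # best features first
```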


Author(s):  
Tingting Ren ◽  
Xiuyi Jia ◽  
Weiwei Li ◽  
Shu Zhao

Label distribution learning (LDL) can be viewed as a generalization of multi-label learning. This novel paradigm focuses on the relative importance of different labels to a particular instance. Most previous LDL methods either ignore the correlation among labels or only exploit label correlations in a global way. In this paper, we utilize both the global and local relevance among labels to provide more information for model training and propose a novel label distribution learning algorithm. In particular, a label correlation matrix based on low-rank approximation is applied to capture the global label correlations. In addition, the label correlation among local samples is used to modify the label correlation matrix. Experimental results on real-world data sets show that the proposed algorithm outperforms state-of-the-art LDL methods.
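A minimal sketch of the global low-rank step the abstract mentions, assuming the correlation matrix is approximated via truncated SVD (the local-sample modification is omitted, and the function name is illustrative):

```python
import numpy as np

def low_rank_label_correlation(Y, rank):
    """Rank-r approximation of the label correlation matrix.

    Y is an n x c matrix of label distributions; by Eckart-Young the
    truncated SVD gives the best rank-r approximation in Frobenius norm.
    """
    C = np.corrcoef(Y, rowvar=False)             # c x c label correlations
    U, s, Vt = np.linalg.svd(C)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]  # best rank-r approximation
```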


2019 ◽  
Author(s):  
Diego Galeano ◽  
Alberto Paccanaro

Pair-input associations for drug side effects are obtained through expensive placebo-controlled experiments in human clinical trials. An important challenge in computational pharmacology is to predict missing associations given a few entries in the drug-side effect matrix, as these predictions can be used to direct further clinical trials. Here we introduce the Geometric Sparse Matrix Completion (GSMC) model for predicting drug side effects. Our high-rank matrix completion model learns non-negative sparse matrices of coefficients for drugs and side effects by imposing smoothness priors that exploit a set of pharmacological side information graphs, including information about drug chemical structures, drug interactions, molecular targets, and disease indications. Our learning algorithm is based on the diagonally rescaled gradient descent principle of non-negative matrix factorization. We prove that it converges to a globally optimal solution with a first-order rate of convergence. Experiments on large-scale side effect data from human clinical trials show that our method achieves better prediction performance than six state-of-the-art methods for side effect prediction while offering biological interpretability and favouring explainable predictions.
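The "diagonally rescaled gradient descent" the abstract cites is the principle behind the classic Lee-Seung multiplicative updates for non-negative matrix factorization; a minimal sketch of those updates follows. GSMC's smoothness priors and side-information graphs are not included, and all names are illustrative.

```python
import numpy as np

def nmf_multiplicative(X, r, iters=200, eps=1e-9, rng=None):
    """Lee-Seung multiplicative updates for X ~ W @ H with W, H >= 0.

    Each update is a gradient step rescaled by a diagonal factor,
    which keeps the factors non-negative automatically.
    """
    rng = rng or np.random.default_rng(0)
    n, m = X.shape
    W = rng.random((n, r))
    H = rng.random((r, m))
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)  # rescaled gradient step on H
        W *= (X @ H.T) / (W @ H @ H.T + eps)  # rescaled gradient step on W
    return W, H
```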


2019 ◽  
Vol 36 (04) ◽  
pp. 1950016
Author(s):  
Zhiyong Huang ◽  
Ziyan Luo ◽  
Naihua Xiu

Least-squares is a common and important method in linear regression. However, it often leads to overfitting when dealing with high-dimensional problems, and various regularization schemes that encode prior information for specific problems have been studied to make up for this deficiency. In the sense of Kendall's τ from the community of nonparametric analysis, we establish a new model in which ordinary least-squares is equipped with a perfect positive correlation constraint, sought to maintain the concordance of the rankings of the observations and the systematic components. By sorting the observations into ascending order, we reduce the perfect positive correlation constraint to a linear inequality system. The resulting linearly constrained least-squares problem, together with its dual problem, is shown to be solvable. In particular, we introduce a mild assumption on the observations and the measurement matrix which rules out the zero vector from the optimal solution set; this indicates that our proposed model is statistically meaningful. To handle large-scale instances, we propose an efficient alternating direction method of multipliers (ADMM) to solve the proposed model from the dual perspective. The effectiveness of our model compared to ordinary least-squares is evaluated in terms of the rank correlation coefficient between outputs and the systematic components, and the efficiency of our dual algorithm is demonstrated by comparison with three efficient solvers via CVX in terms of computation time, solution accuracy, and rank correlation coefficient.
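A sketch of the reduction the abstract describes: after sorting the rows by the observations, perfect positive rank correlation between y and Xβ amounts to asking that consecutive fitted values be non-decreasing, which is a linear inequality system. Tie handling is glossed over, the names are illustrative, and the ADMM solver itself is not reproduced; the matrix A below could be handed to any QP solver.

```python
import numpy as np

def concordance_constraints(X, y):
    """Build the linear inequality system A @ beta >= 0 encoding
    concordance between the rankings of y and X @ beta."""
    order = np.argsort(y)      # sort observations into ascending order
    Xs = X[order]
    A = Xs[1:] - Xs[:-1]       # (n-1) x d matrix of consecutive differences
    return A                   # feasible beta satisfies A @ beta >= 0
```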


2019 ◽  
Vol 29 (07) ◽  
pp. 2050112
Author(s):  
Renuka Kamdar ◽  
Priyanka Paliwal ◽  
Yogendra Kumar

The goal of providing faster and optimal solutions to complex, high-dimensional problems is pushing the technical envelope of new algorithms. While many approaches use centralized strategies, the concept of multi-agent systems (MAS) is creating a new option of distributed analyses for optimization problems. A novel learning algorithm for solving global numerical optimization problems is proposed. The proposed learning algorithm integrates a multi-agent system with the hybrid butterfly–particle swarm optimization (BFPSO) algorithm; it is thus named multi-agent-based BFPSO (MABFPSO). In order to obtain the optimal solution quickly, each agent competes and cooperates with its neighbors, and it can also learn by using its knowledge. Making use of these agent–agent interactions and the sensitivity and probability mechanisms of BFPSO, MABFPSO optimizes the value of the objective function. The designed MABFPSO algorithm is tested on specific benchmark functions. Simulations of the proposed algorithm have been performed for the optimization of functions of 2, 20 and 30 dimensions. Comparative simulation results with conventional PSO approaches demonstrate that the proposed algorithm is a potential candidate for optimization of both low- and high-dimensional functions. The optimization strategy is general and can be used to solve other power system optimization problems as well.
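For reference, the conventional PSO baseline the comparison is made against looks like the sketch below; the butterfly sensitivity/probability mechanism and the agent-grid interactions of MABFPSO are omitted, and all hyper-parameters are illustrative.

```python
import numpy as np

def pso(f, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, rng=None):
    """Plain particle swarm optimization minimizing f: R^dim -> R."""
    rng = rng or np.random.default_rng(0)
    x = rng.uniform(-5, 5, (n_particles, dim))   # initial positions
    v = np.zeros_like(x)                         # initial velocities
    pbest, pbest_val = x.copy(), np.apply_along_axis(f, 1, x)
    g = pbest[pbest_val.argmin()].copy()         # global best
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v
        val = np.apply_along_axis(f, 1, x)
        better = val < pbest_val                 # update personal bests
        pbest[better], pbest_val[better] = x[better], val[better]
        g = pbest[pbest_val.argmin()].copy()     # update global best
    return g

# toy usage: 20-dimensional sphere function
print(pso(lambda z: np.sum(z * z), dim=20))
```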


2012 ◽  
Vol 24 (12) ◽  
pp. 3371-3394 ◽  
Author(s):  
Guangcan Liu ◽  
Shuicheng Yan

We address the scalability issues in low-rank matrix learning problems. These problems usually resort to solving nuclear norm regularized optimization problems (NNROPs), which often suffer from high computational complexity with existing solvers, especially in large-scale settings. Based on the fact that the optimal solution matrix to an NNROP is often low-rank, we revisit the classic mechanism of low-rank matrix factorization, based on which we present an active subspace algorithm that efficiently solves NNROPs by transforming large-scale NNROPs into small-scale problems. The transformation is achieved by factorizing the large solution matrix into the product of a small orthonormal matrix (the active subspace) and another small matrix. Although such a transformation generally leads to nonconvex problems, we show that a suboptimal solution can be found by the augmented Lagrange alternating direction method. For the robust PCA (RPCA) (Candès, Li, Ma, & Wright, 2009) problem, a typical example of an NNROP, theoretical results verify the suboptimality of the solution produced by our algorithm. For general NNROPs, we empirically show that our algorithm significantly reduces the computational complexity without loss of optimality.
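To see where the cost comes from, here is the basic step standard NNROP solvers repeat on the full matrix: singular value thresholding, the proximal operator of the nuclear norm. Its full SVD is what the active-subspace method avoids by working with a small factor instead; this sketch is for scale intuition only and is not the paper's algorithm.

```python
import numpy as np

def svt(X, tau):
    """Proximal operator of tau * nuclear norm: soft-threshold the
    singular values. Costs a full SVD, O(min(n, m) * n * m)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s = np.maximum(s - tau, 0.0)               # shrink singular values
    return (U * s) @ Vt
```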


Author(s):  
Yuanyu Wan ◽  
Nan Wei ◽  
Lijun Zhang

By employing time-varying proximal functions, adaptive subgradient methods (ADAGRAD) have improved the regret bound and been widely used in online learning and optimization. However, ADAGRAD with full-matrix proximal functions (ADA-FULL) cannot deal with large-scale problems due to its impractical time and space complexities, even though it performs better when gradients are correlated. In this paper, we propose ADA-FD, an efficient variant of ADA-FULL based on a deterministic matrix sketching technique called frequent directions. Following ADA-FULL, we incorporate ADA-FD into both the primal-dual subgradient method and the composite mirror descent method to develop two efficient methods. By maintaining and manipulating low-rank matrices, at each iteration the space complexity is reduced from $O(d^2)$ to $O(\tau d)$ and the time complexity is reduced from $O(d^3)$ to $O(\tau^2 d)$, where $d$ is the dimensionality of the data and $\tau \ll d$ is the sketching size. Theoretical analysis reveals that the regret of our methods is close to that of ADA-FULL as long as the outer product matrix of the gradients is approximately low-rank. Experimental results show that ADA-FD is comparable to ADA-FULL and outperforms other state-of-the-art algorithms in online convex optimization as well as in training convolutional neural networks (CNNs).
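The frequent-directions primitive itself is standard and short; a minimal sketch follows, assuming the sketch size is smaller than the data dimensionality. How ADA-FD wires the sketch into ADAGRAD's proximal function is not shown.

```python
import numpy as np

def frequent_directions(G, ell):
    """Frequent-directions sketch of a row stream G (T x d) into
    B (ell x d), with the guarantee that the spectral norm of
    G.T @ G - B.T @ B is at most 2 * ||G||_F^2 / ell.  Assumes ell <= d.
    """
    T, d = G.shape
    B = np.zeros((ell, d))
    zero_rows = list(range(ell))
    for g in G:
        B[zero_rows.pop()] = g                 # fill an empty sketch row
        if not zero_rows:                      # sketch full: shrink it
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            delta = s[ell // 2] ** 2           # median squared singular value
            s = np.sqrt(np.maximum(s**2 - delta, 0.0))
            B = s[:, None] * Vt                # at least half the rows zero
            zero_rows = [i for i in range(ell) if not B[i].any()]
    return B
```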


Author(s):  
Dezhong Yao ◽  
Peilin Zhao ◽  
Tuan-Anh Nguyen Pham ◽  
Gao Cong

We investigate how to adopt dual random projection for high-dimensional similarity learning. For a high-dimensional similarity learning problem, projection is usually adopted to map high-dimensional features into a low-dimensional space in order to reduce the computational cost. However, dimensionality reduction methods sometimes yield unstable performance due to the suboptimality of the solution in the original space. In this paper, we propose a dual random projection framework for similarity learning that recovers the original optimal solution from the subspace optimal solution. Previous dual random projection methods usually make strong assumptions about the data, which must be low-rank or have a large margin; those assumptions limit the application of dual random projection to similarity learning. We therefore adopt a dual-sparse regularized random projection method that introduces a sparse regularizer into the reduced dual problem. As the original dual solution is sparse, applying a sparse regularizer in the reduced space relaxes the low-rank assumption. Experimental results show that our method enjoys higher effectiveness and efficiency than state-of-the-art solutions.
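A minimal sketch of the dual-random-projection idea, with ridge regression standing in for the similarity-learning objective: solve the dual in an m-dimensional random subspace, then map the dual variables back to recover a full-dimensional solution. The sparse regularizer of the reduced dual is omitted, and all names are illustrative.

```python
import numpy as np

def dual_random_projection_ridge(X, y, lam, m, rng=None):
    """Recover a full-dimensional ridge solution from a problem solved
    in an m-dimensional random subspace, via the dual variables."""
    rng = rng or np.random.default_rng(0)
    n, d = X.shape
    R = rng.normal(size=(d, m)) / np.sqrt(m)   # random projection matrix
    Z = X @ R                                  # n x m projected features
    K_hat = Z @ Z.T                            # approximates X @ X.T
    alpha = np.linalg.solve(K_hat + lam * np.eye(n), y)  # reduced dual
    return X.T @ alpha                         # recover w in original R^d
```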

