Stochastic sub-sampled Newton method with variance reduction

The extrapolation strategy raised by Nesterov, which can accelerate the convergence rate of gradient descent methods by orders of magnitude when dealing with smooth convex objective, has led to tremendous success in training machine learning tasks. In this paper, we theoretically study its strength in the convergence of individual iterates of general non-smooth convex optimization problems, which we name \textit{individual convergence}. We prove that Nesterov's extrapolation is capable of making the individual convergence of projected gradient methods optimal for general convex problems, which is now a challenging problem in the machine learning community. In light of this consideration, a simple modification of the gradient operation suffices to achieve optimal individual convergence for strongly convex problems, which can be regarded as making an interesting step towards the open question about SGD posed by Shamir \cite{shamir2012open}. Furthermore, the derived algorithms are extended to solve regularized non-smooth learning problems in stochastic settings. {\color{blue}They can serve as an alternative to the most basic SGD especially in coping with machine learning problems, where an individual output is needed to guarantee the regularization structure while keeping an optimal rate of convergence.} Typically, our method is applicable as an efficient tool for solving large-scale $l_1$-regularized hinge-loss learning problems. Several real experiments demonstrate that the derived algorithms not only achieve optimal individual convergence rates but also guarantee better sparsity than the averaged solution.

Download Full-text

The Strength of Nesterov's Extrapolation2019

10.36227/techrxiv.11653218 ◽

2020 ◽

Author(s):

Qing Tao

Keyword(s):

Machine Learning ◽

Large Scale ◽

Convergence Rates ◽

Optimization Problems ◽

Gradient Methods ◽

Learning Problems ◽

Smooth Convex ◽

Simple Modification ◽

Convex Problems ◽

Hinge Loss

The extrapolation strategy raised by Nesterov, which can accelerate the convergence rate of gradient descent methods by orders of magnitude when dealing with smooth convex objective, has led to tremendous success in training machine learning tasks. In this paper, we theoretically study its strength in the convergence of individual iterates of general non-smooth convex optimization problems, which we name \textit{individual convergence}. We prove that Nesterov's extrapolation is capable of making the individual convergence of projected gradient methods optimal for general convex problems, which is now a challenging problem in the machine learning community. In light of this consideration, a simple modification of the gradient operation suffices to achieve optimal individual convergence for strongly convex problems, which can be regarded as making an interesting step towards the open question about SGD posed by Shamir \cite{shamir2012open}. Furthermore, the derived algorithms are extended to solve regularized non-smooth learning problems in stochastic settings. {\color{blue}They can serve as an alternative to the most basic SGD especially in coping with machine learning problems, where an individual output is needed to guarantee the regularization structure while keeping an optimal rate of convergence.} Typically, our method is applicable as an efficient tool for solving large-scale $l_1$-regularized hinge-loss learning problems. Several real experiments demonstrate that the derived algorithms not only achieve optimal individual convergence rates but also guarantee better sparsity than the averaged solution.

Download Full-text

Application of spectral conjugate gradient methods for solving unconstrained optimization problems

An International Journal of Optimization and Control Theories & Applications (IJOCTA) ◽

10.11121/ijocta.01.2020.00859 ◽

2020 ◽

Vol 10 (2) ◽

pp. 198-205

Author(s):

Sulaiman Mohammed Ibrahim ◽

Usman Abbas Yakubu ◽

Mustafa Mamat

Keyword(s):

Unconstrained Optimization ◽

Conjugate Gradient ◽

Large Scale ◽

Optimization Problems ◽

Computational Cost ◽

Real Life ◽

Gradient Methods ◽

Nonlinear Problems ◽

Exact Line Search ◽

Unconstrained Optimization Problems

Conjugate gradient (CG) methods are among the most efficient numerical methods for solving unconstrained optimization problems. This is due to their simplicty and less computational cost in solving large-scale nonlinear problems. In this paper, we proposed some spectral CG methods using the classical CG search direction. The proposed methods are applied to real-life problems in regression analysis. Their convergence proof was establised under exact line search. Numerical results has shown that the proposed methods are efficient and promising.

Download Full-text

Entropy-Penalized Semidefinite Programming

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/157 ◽

2019 ◽

Cited By ~ 2

Author(s):

Mikhail Krechetov ◽

Jakub Marecek ◽

Yury Maximov ◽

Martin Takac

Keyword(s):

Machine Learning ◽

Time Complexity ◽

Optimization Problems ◽

Linear Time ◽

Broad Class ◽

Low Rank ◽

Learning Problems ◽

Unified Framework ◽

Gradient Computation ◽

Machine Learning Applications

Low-rank methods for semi-definite programming (SDP) have gained a lot of interest recently, especially in machine learning applications. Their analysis often involves determinant-based or Schatten-norm penalties, which are difficult to implement in practice due to high computational efforts. In this paper, we propose Entropy-Penalized Semi-Definite Programming (EP-SDP), which provides a unified framework for a broad class of penalty functions used in practice to promote a low-rank solution. We show that EP-SDP problems admit an efficient numerical algorithm, having (almost) linear time complexity of the gradient computation; this makes it useful for many machine learning and optimization problems. We illustrate the practical efficiency of our approach on several combinatorial optimization and machine learning problems.

Download Full-text

Prediction of vascular aging based on smartphone acquired PPG signals

Scientific Reports ◽

10.1038/s41598-020-76816-6 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Lorenzo Dall’Olio ◽

Nico Curti ◽

Daniel Remondini ◽

Yosef Safi Harb ◽

Folkert W. Asselbergs ◽

...

Keyword(s):

Machine Learning ◽

Large Scale ◽

Screening Tool ◽

Computational Cost ◽

Penalized Regression ◽

Prediction Performance ◽

Second Derivative ◽

Vascular Aging ◽

Non Invasive ◽

Potential Biomarkers

AbstractPhotoplethysmography (PPG) measured by smartphone has the potential for a large scale, non-invasive, and easy-to-use screening tool. Vascular aging is linked to increased arterial stiffness, which can be measured by PPG. We investigate the feasibility of using PPG to predict healthy vascular aging (HVA) based on two approaches: machine learning (ML) and deep learning (DL). We performed data preprocessing, including detrending, demodulating, and denoising on the raw PPG signals. For ML, ridge penalized regression has been applied to 38 features extracted from PPG, whereas for DL several convolutional neural networks (CNNs) have been applied to the whole PPG signals as input. The analysis has been conducted using the crowd-sourced Heart for Heart data. The prediction performance of ML using two features (AUC of 94.7%) – the a wave of the second derivative PPG and tpr, including four covariates, sex, height, weight, and smoking – was similar to that of the best performing CNN, 12-layer ResNet (AUC of 95.3%). Without having the heavy computational cost of DL, ML might be advantageous in finding potential biomarkers for HVA prediction. The whole workflow of the procedure is clearly described, and open software has been made available to facilitate replication of the results.

Download Full-text

Accelerated Variance Reduction Stochastic ADMM for Large-Scale Machine Learning

IEEE Transactions on Pattern Analysis and Machine Intelligence ◽

10.1109/tpami.2020.3000512 ◽

2020 ◽

pp. 1-1 ◽

Cited By ~ 1

Author(s):

Yuanyuan Liu ◽

Fanhua Shang ◽

Hongying Liu ◽

Lin Kong ◽

Jiao Licheng ◽

...

Keyword(s):

Machine Learning ◽

Large Scale ◽

Variance Reduction

Download Full-text

Δ-Quantum machine learning for medicinal chemistry

10.33774/chemrxiv-2021-fz6v7-v2 ◽

2021 ◽

Author(s):

Kenneth Atz ◽

Clemens Isert ◽

Markus N. A. Böcker ◽

José Jiménez-Luna ◽

Gisbert Schneider

Keyword(s):

Machine Learning ◽

Message Passing ◽

Density Functional ◽

Large Scale ◽

Molecular Design ◽

Low Cost ◽

Computational Cost ◽

Three Dimensional ◽

Biomolecular Systems ◽

Quantum Observables

Many molecular design tasks benefit from fast and accurate calculations of quantum-mechanical (QM) properties. However, the computational cost of QM methods applied to drug-like molecules currently renders large-scale applications of quantum chemistry challenging. Aiming to mitigate this problem, we developed DelFTa, an open-source toolbox for the prediction of electronic properties of drug-like molecules at the density functional (DFT) level of theory, using Δ-machine-learning. Δ-Learning corrects the prediction error (Δ) of a fast but inaccurate property calculation. DelFTa employs state-of-the-art three-dimensional message-passing neural networks trained on a large dataset of QM properties. It provides access to a wide array of quantum observables on the molecular, atomic and bond levels by predicting approximations to DFT values from a low-cost semiempirical baseline. Δ-Learning outperformed its direct-learning counterpart for most of the considered QM endpoints. The results suggest that predictions for non-covalent intra- and intermolecular interactions can be extrapolated to larger biomolecular systems. The software is fully open-sourced and features documented command-line and Python APIs.

Download Full-text

Distributed Gradient Methods for Convex Machine Learning Problems in Networks: Distributed Optimization

IEEE Signal Processing Magazine ◽

10.1109/msp.2020.2975210 ◽

2020 ◽

Vol 37 (3) ◽

pp. 92-101 ◽

Cited By ~ 3

Author(s):

Angelia Nedic

Keyword(s):

Machine Learning ◽

Distributed Optimization ◽

Gradient Methods ◽

Learning Problems

Download Full-text

Open-source Δ-quantum machine learning for medicinal chemistry

10.33774/chemrxiv-2021-fz6v7 ◽

2021 ◽

Author(s):

Kenneth Atz ◽

Clemens Isert ◽

Markus N. A. Böcker ◽

José Jiménez-Luna ◽

Gisbert Schneider

Keyword(s):

Machine Learning ◽

Open Source ◽

Density Functional ◽

Large Scale ◽

Molecular Design ◽

State Of The Art ◽

Computational Cost ◽

Quantum Mechanical ◽

Quantum Observables ◽

Graph Neural Networks

Certain molecular design tasks benefit from fast and accurate calculations of quantum-mechanical (QM) properties. However, the computational cost of QM methods applied to drug-like compounds currently makes large-scale applications of quantum chemistry challenging. In order to mitigate this problem, we developed DelFTa, an open-source toolbox for predicting small-molecule electronic properties at the density functional (DFT) level of theory, using the Δ-machine learning principle. DelFTa employs state-of-the-art E(3)-equivariant graph neural networks that were trained on the QMugs dataset of QM properties. It provides access to a wide array of quantum observables by predicting approximations to ωB97X-D/def2-SVP values from a GFN2-xTB semiempirical baseline. Δ-learning with DelFTa was shown to outperform direct DFT learning for most of the considered QM endpoints. The software is provided as open-source code with fully-documented command-line and Python APIs.

Download Full-text

Designing production-friendly machine learning

Proceedings of the VLDB Endowment ◽

10.14778/3484224.3484241 ◽

2021 ◽

Vol 14 (13) ◽

pp. 3420-3420

Author(s):

Matei Zaharia

Keyword(s):

Machine Learning ◽

Open Source ◽

Large Scale ◽

Question Answering ◽

Failure Modes ◽

Computational Cost ◽

Language Models ◽

Software Systems ◽

Resource Cost ◽

Low Computational Cost

Building production ML applications is difficult because of their resource cost and complex failure modes. I will discuss these challenges from two perspectives: the Stanford DAWN Lab and experience with large-scale commercial ML users at Databricks. I will then present two emerging ideas to help address these challenges. The first is "ML platforms", an emerging class of software systems that standardize the interfaces used in ML applications to make them easier to build and maintain. I will give a few examples, including the open-source MLflow system from Databricks [3]. The second idea is models that are more "production-friendly" by design. As a concrete example, I will discuss retrieval-based NLP models such as Stanford's ColBERT [1, 2] that query documents from an updateable corpus to perform tasks such as question-answering, which gives multiple practical advantages, including low computational cost, high interpretability, and very fast updates to the model's "knowledge". These models are an exciting alternative to large language models such as GPT-3.

Download Full-text