Parametric UMAP Embeddings for Representation and Semisupervised Learning

Abstract: UMAP is a nonparametric graph-based dimensionality reduction algorithm that uses applied Riemannian geometry and algebraic topology to find low-dimensional embeddings of structured data. The UMAP algorithm consists of two steps: (1) computing a graphical representation of a data set (a fuzzy simplicial complex) and (2) optimizing a low-dimensional embedding of the graph through stochastic gradient descent. Here, we extend the second step of UMAP to a parametric optimization over neural network weights, learning a parametric relationship between data and embedding. We first demonstrate that parametric UMAP performs comparably to its nonparametric counterpart while conferring the benefit of a learned parametric mapping (e.g., fast online embeddings for new data). We then explore UMAP as a regularization, constraining the latent distribution of autoencoders, parametrically varying global structure preservation, and improving classifier accuracy for semisupervised learning by capturing structure in unlabeled data.
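The second step described in the abstract can be illustrated with a minimal sketch. It assumes the commonly cited UMAP formulation, in which the low-dimensional similarity is q = 1 / (1 + a * d^(2b)) and the loss is a fuzzy-set cross-entropy between graph weights p and q. The linear `encode` function is a hypothetical stand-in for the neural network, and the values a = b = 1 are illustrative assumptions, not settings taken from the paper.

```python
import math

def encode(x, weights):
    """Hypothetical linear encoder standing in for the neural network: z = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def low_dim_similarity(z_i, z_j, a=1.0, b=1.0):
    """Embedding similarity q_ij = 1 / (1 + a * d^(2b)), d = Euclidean distance."""
    d2 = sum((u - v) ** 2 for u, v in zip(z_i, z_j))
    return 1.0 / (1.0 + a * d2 ** b)

def umap_cross_entropy(p, q, eps=1e-7):
    """Cross-entropy between graph weight p and similarity q (terms constant in q dropped)."""
    q = min(max(q, eps), 1 - eps)  # clip to keep the logarithms finite
    return -(p * math.log(q) + (1 - p) * math.log(1 - q))

def edge_loss(x_i, x_j, p_ij, weights):
    """Loss for one graph edge, computed on encoder outputs rather than free embeddings."""
    q = low_dim_similarity(encode(x_i, weights), encode(x_j, weights))
    return umap_cross_entropy(p_ij, q)
```

In the nonparametric setting the embedding coordinates themselves would be optimized; here the loss depends on them only through the encoder weights, which is what allows new data to be embedded with a single forward pass.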


Cloning Safe Driving Behavior for Self-Driving Cars using Convolutional Neural Networks

Recent Patents on Computer Science ◽

10.2174/2213275911666181106160002 ◽

2019 ◽

Vol 12 (2) ◽

pp. 120-127 ◽

Cited By ~ 5

Author(s):

Wael Farag

Keyword(s):

Gradient Descent ◽

Autonomous Driving ◽

Driving Behavior ◽

Training Data ◽

Stochastic Gradient Descent ◽

Data Set ◽

Safe Driving ◽

Processing Pipeline ◽

Self Driving Cars ◽

And Training

Background: In this paper, a Convolutional Neural Network (CNN) that learns safe driving behavior and smooth steering manoeuvring is proposed as an enabler of autonomous driving technologies. The training data is collected from a front-facing camera and the steering commands issued by an experienced driver driving in traffic as well as on urban roads. Methods: This data is then used to train the proposed CNN to perform what is called “Behavioral Cloning”. The proposed Behavior Cloning CNN is named “BCNet”, and its deep seventeen-layer architecture was selected after extensive trials. BCNet was trained using the Adam optimization algorithm, a variant of the Stochastic Gradient Descent (SGD) technique. Results: The paper goes through the development and training process in detail and shows the image processing pipeline used in the development. Conclusion: After extensive simulations, the proposed approach proved successful in cloning the driving behavior embedded in the training data set.
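The Adam update mentioned in the Methods can be sketched as follows. This is a generic illustration on a toy quadratic, not the BCNet training code. The hyperparameters (learning rate, beta1 = 0.9, beta2 = 0.999, eps = 1e-8) are the commonly used defaults and are assumptions here, since the abstract does not state them. The names `adam_step` and `minimize_quadratic` are hypothetical.

```python
import math

def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; returns new (params, m, v). t is the 1-based step count."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g       # running mean of gradients
        v_i = beta2 * v_i + (1 - beta2) * g * g   # running mean of squared gradients
        m_hat = m_i / (1 - beta1 ** t)            # bias correction for the zero init
        v_hat = v_i / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v

def minimize_quadratic(steps=2000, lr=0.05):
    """Toy problem: minimize f(w) = (w0 - 3)^2 + (w1 + 1)^2 with Adam."""
    w, m, v = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
    for t in range(1, steps + 1):
        grads = [2 * (w[0] - 3), 2 * (w[1] + 1)]
        w, m, v = adam_step(w, grads, m, v, t, lr=lr)
    return w
```

In a behavioral cloning setting the gradients would come from backpropagating a steering-prediction loss over mini-batches of camera frames; only the per-parameter update rule is shown here.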


Deep Convolutional Spiking Neural Networks for Image Classification

10.18122/td.1782.boisestate ◽

2021 ◽

Author(s):

Ruthvik Vaila

Keyword(s):

Neural Network ◽

Neural Networks ◽

Artificial Neural Networks ◽

Gradient Descent ◽

Stochastic Gradient ◽

Spiking Neural Networks ◽

Stochastic Gradient Descent ◽

Data Set ◽

Learning Capabilities ◽

Artificial Neural

Spiking neural networks are biologically plausible counterparts of artificial neural networks. Artificial neural networks are usually trained with stochastic gradient descent (SGD), whereas spiking neural networks are trained with bioinspired spike timing dependent plasticity (STDP). Spiking networks could potentially help in reducing power usage owing to their binary activations. In this work, we use unsupervised STDP in the feature extraction layers of a neural network with instantaneous neurons to extract meaningful features. The extracted binary feature vectors are then classified using classification layers containing neurons with binary activations. Gradient descent (backpropagation) is used only on the output layer to perform training for classification. Surrogate gradients are proposed to perform backpropagation with binary gradients. The accuracies obtained for MNIST and the balanced EMNIST data set compare favorably with other approaches. The effect of the stochastic gradient descent (SGD) approximations on the learning capabilities of our network is also explored. We also study catastrophic forgetting and its effect on spiking neural networks (SNNs). For the experiments regarding catastrophic forgetting, in the classification sections of the network we use a modified synaptic intelligence, which we refer to as the cost per synapse metric, as a regularizer to immunize the network against catastrophic forgetting in a Single-Incremental-Task (SIT) scenario. In the catastrophic forgetting experiments, we use the MNIST and EMNIST handwritten digit data sets, divided into five and ten incremental subtasks respectively. We also examine the behavior of the spiking neural network and empirically study the effect of various hyperparameters on its learning capabilities using the software tool SPYKEFLOW that we developed. We employ the MNIST, EMNIST and NMNIST data sets to produce our results.


Stochastic gradient descent for hybrid quantum-classical optimization

Quantum ◽

10.22331/q-2020-08-31-314 ◽

2020 ◽

Vol 4 ◽

pp. 314 ◽

Cited By ~ 2

Author(s):

Ryan Sweke ◽

Frederik Wilde ◽

Johannes Jakob Meyer ◽

Maria Schuld ◽

Paul K. Fährmann ◽

...

Keyword(s):

Gradient Descent ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Expectation Values ◽

Data Set ◽

Doubly Stochastic ◽

Learning Tasks ◽

Value Estimation ◽

Near Term ◽

Classical Optimization

Within the context of hybrid quantum-classical optimization, gradient descent based optimizers typically require the evaluation of expectation values with respect to the outcome of parameterized quantum circuits. In this work, we explore the consequences of the prior observation that estimation of these quantities on quantum hardware results in a form of stochastic gradient descent optimization. We formalize this notion, which allows us to show that in many relevant cases, including VQE, QAOA and certain quantum classifiers, estimating expectation values with k measurement outcomes results in optimization algorithms whose convergence properties can be rigorously understood, for any value of k. In fact, even using single measurement outcomes for the estimation of expectation values is sufficient. Moreover, in many settings the required gradients can be expressed as linear combinations of expectation values (originating, e.g., from a sum over local terms of a Hamiltonian, a parameter shift rule, or a sum over data-set instances), and we show that in these cases k-shot expectation value estimation can be combined with sampling over terms of the linear combination to obtain "doubly stochastic" gradient descent optimizers. For all algorithms we prove convergence guarantees, providing a framework for the derivation of rigorous optimization results in the context of near-term quantum devices. Additionally, we numerically explore these methods on benchmark VQE, QAOA and quantum-enhanced machine learning tasks and show that treating the stochastic settings as hyper-parameters allows for state-of-the-art results with significantly fewer circuit executions and measurements.
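The idea that finite-shot estimation turns gradient descent into SGD can be illustrated with a classical simulation. The sketch below assumes a single-qubit toy circuit, RY(theta) applied to |0> followed by a Z measurement, for which the exact expectation value is cos(theta). The circuit, the function names and the step sizes are illustrative assumptions and are not taken from the paper's benchmarks.

```python
import math
import random

def sample_z(theta, shots, rng):
    """Average of `shots` simulated Z measurements on RY(theta)|0>.
    Each outcome is +1 with probability (1 + cos(theta)) / 2, otherwise -1."""
    p_plus = (1 + math.cos(theta)) / 2
    total = sum(1 if rng.random() < p_plus else -1 for _ in range(shots))
    return total / shots

def k_shot_gradient(theta, shots, rng):
    """Parameter-shift estimate of d<Z>/dtheta built from k-shot expectation values.
    With exact expectation values this equals -sin(theta)."""
    plus = sample_z(theta + math.pi / 2, shots, rng)
    minus = sample_z(theta - math.pi / 2, shots, rng)
    return (plus - minus) / 2

def run_sgd(shots=1, steps=3000, lr=0.02, seed=0):
    """Minimize <Z>(theta) = cos(theta) using k-shot gradient estimates."""
    rng = random.Random(seed)
    theta = 1.0
    for _ in range(steps):
        theta -= lr * k_shot_gradient(theta, shots, rng)
    return theta
```

Because each k-shot average is an unbiased estimate of the expectation value, the parameter-shift combination is an unbiased estimate of the gradient even for shots=1, which is the property that makes the SGD analysis applicable.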


Variance Counterbalancing for Stochastic Large-scale Learning

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213020500104 ◽

2020 ◽

Vol 29 (05) ◽

pp. 2050010

Author(s):

Pola Lydia Lagari ◽

Lefteri H. Tsoukalas ◽

Isaac E. Lagaris

Keyword(s):

Gradient Descent ◽

Large Scale ◽

Performance Enhancement ◽

Mean Squared Error ◽

Large Data ◽

Random Sets ◽

Stochastic Gradient Descent ◽

Step Size ◽

Data Set ◽

Acceleration Techniques

Stochastic Gradient Descent (SGD) is perhaps the most frequently used method for large-scale training. A common example is training a neural network over a large data set, which amounts to minimizing the corresponding mean squared error (MSE). Since the convergence of SGD is rather slow, acceleration techniques based on the notion of “Mini-Batches” have been developed. All of them, however, mimic SGD in imposing diminishing step sizes as a means to inhibit large variations in the MSE objective. In this article, we introduce random sets of mini-batches instead of individual mini-batches. We employ an objective function that minimizes the average MSE and its variance over these sets, thereby eliminating the need for systematic step size reduction. This approach permits the use of state-of-the-art optimization methods that are far more efficient than gradient descent, and yields a significant performance enhancement.
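An objective of the kind described, the average MSE over a random set of mini-batches plus a penalty on its variance across that set, might be sketched as follows. The abstract does not give the exact form, so the additive combination and the weight `lam` are assumptions, and the one-parameter linear model is only a placeholder for a neural network.

```python
import random

def mse(w, batch):
    """Mean squared error of the 1-D linear model y = w * x on one mini-batch."""
    return sum((w * x - y) ** 2 for x, y in batch) / len(batch)

def counterbalanced_objective(w, batches, lam=1.0):
    """Average mini-batch MSE plus lam times its variance across the set of batches."""
    losses = [mse(w, b) for b in batches]
    mean = sum(losses) / len(losses)
    var = sum((loss - mean) ** 2 for loss in losses) / len(losses)
    return mean + lam * var

def draw_batches(data, n_batches, batch_size, rng):
    """Draw a random set of mini-batches (no repeats within a single batch)."""
    return [rng.sample(data, batch_size) for _ in range(n_batches)]
```

For a fixed set of batches this objective is a deterministic function of the parameters, so it could in principle be handed to a standard optimizer with a fixed step size or a line search.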


Abstract 14131: Proof of Concept: Interpretation of EKG With Image Recognition and Convolutional Neural Networks

Circulation ◽

10.1161/circ.142.suppl_3.14131 ◽

2020 ◽

Vol 142 (Suppl_3) ◽

Author(s):

Subrat Das ◽

Matthew Epland ◽

Jiang Yu ◽

Ranjit Suri

Keyword(s):

Image Recognition ◽

Gradient Descent ◽

Confusion Matrix ◽

Stochastic Gradient Descent ◽

Training Set ◽

Data Set ◽

Computing Power ◽

The Past ◽

Novel Approach ◽

Validation Set

Introduction: EKGs are the cornerstone of management in cardiovascular diseases. There have been multiple efforts to computerize EKG interpretation with algorithms, which unfortunately are machine specific and proprietary. We propose the development of an image recognition model which can be used to read EKG strips (which use standard notations) and hence be used universally. Method: A convolutional neural network (CNN) was trained to classify 12-lead EKGs into seven clinically important diagnostic classes (Figure 1a). Pre-labeled EKG recordings (6-60s) from a publicly available data set on PhysioNet were used to construct the images. The EKG images displayed the 12 channel traces, of 2.5s each, on a consistent 4x3 grid at a resolution of 800x800 pixels (Figure 1a). The data set (23,336 images) was divided into training, tuning, and validation sets containing 70%, 15%, and 15% of the images, respectively. An austere variation of the MobileNetV3 model was trained from the ground up on the labeled training set. Stochastic gradient descent (SGD) was used to minimize the cross-entropy loss. Training was halted when the tuning loss had not improved from its previous minimum by 0.05% over the past 10 epochs. Results: The model trained over 52 epochs of batches of 32 images. The model’s accuracy was tested using the validation set (which was not used for development of the model) and reported as a confusion matrix (Figure 1b). The accuracy per class varies from 69% to 91%. Conclusion: We used a labeled data set of EKG images to develop a CNN model to predict seven different diagnostic classes with good accuracy. This is a novel approach that treats EKG interpretation as an image recognition problem and thus makes it possible to create diagnostic algorithms that are not dependent on proprietary voltage signals generated by commercial EKG machines. With the addition of more images to the data set and higher computing power, we are confident that we can achieve enhanced accuracy.
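The halting rule in the Method can be written as a small function. This is one reading of the sentence, assuming that "improved" means a relative drop of at least 0.05% below the best tuning loss seen so far and that the patience counter resets on each such improvement. The function name and its return convention are hypothetical.

```python
def stopping_epoch(tuning_losses, patience=10, min_rel_improvement=0.0005):
    """Return the 1-based epoch at which training would halt, or None if it never does.

    An epoch counts as an improvement only if its loss is at least
    `min_rel_improvement` (0.05%) below the best loss seen so far.
    """
    best = float("inf")
    epochs_since_best = 0
    for epoch, loss in enumerate(tuning_losses, start=1):
        if loss < best * (1 - min_rel_improvement):
            best = loss
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:
                return epoch
    return None
```

Monitoring the tuning set rather than the validation set keeps the reported validation accuracy independent of the stopping decision, which matches the three-way split described above.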


Parallel Implementation on FPGA of Support Vector Machines Using Stochastic Gradient Descent

Electronics ◽

10.3390/electronics8060631 ◽

2019 ◽

Vol 8 (6) ◽

pp. 631 ◽

Cited By ~ 7

Author(s):

Felipe F. Lopes ◽

João Canas Ferreira ◽

Marcelo A. C. Fernandes

Keyword(s):

Support Vector Machines ◽

Gradient Descent ◽

Parallel Implementation ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Data Set ◽

Viable Solution ◽

Vector Machines ◽

Field Programmable

Sequential Minimal Optimization (SMO) is the traditional training algorithm for Support Vector Machines (SVMs). However, SMO does not scale well with the size of the training set. For that reason, Stochastic Gradient Descent (SGD) algorithms, which have better scalability, are a better option for massive data mining applications. Furthermore, even with the use of SGD, training times can become extremely large depending on the data set. For this reason, accelerators such as Field-programmable Gate Arrays (FPGAs) are used. This work describes an implementation in hardware, using FPGA, of a fully parallel SVM using Stochastic Gradient Descent. The proposed FPGA implementation of an SVM with SGD presents speedups of more than 10,000× relative to software implementations running on a quad-core processor and up to 319× compared to state-of-the-art FPGA implementations while requiring fewer hardware resources. The results show that the proposed architecture is a viable solution for highly demanding problems such as those present in big data analysis.
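For readers unfamiliar with SGD-trained SVMs, a minimal software sketch of a linear SVM trained by SGD on the L2-regularized hinge loss is given below. It is a generic textbook formulation and not the parallel FPGA architecture described in the paper. The learning rate, regularization weight and epoch count are illustrative assumptions.

```python
import random

def train_linear_svm_sgd(samples, labels, epochs=50, lr=0.01, lam=0.01, seed=0):
    """SGD on lam/2 * ||w||^2 + max(0, 1 - y * (w.x + b)), with labels y in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    b = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a fresh random order each epoch
        for i in order:
            x, y = samples[i], labels[i]
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            # The regularization term contributes lam * w at every step
            w = [wj - lr * lam * wj for wj in w]
            if margin < 1:
                # The hinge term contributes -y * x only when the margin is violated
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b

def predict(w, b, x):
    """Classify x by the sign of the decision function."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

Each update touches a single sample and costs time linear in the number of features, which is the scalability property the abstract contrasts with SMO.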


State-of-the-Art CNN Optimizer for Brain Tumor Segmentation in Magnetic Resonance Images

Brain Sciences ◽

10.3390/brainsci10070427 ◽

2020 ◽

Vol 10 (7) ◽

pp. 427

Author(s):

Muhammad Yaqub ◽

Jinchao Feng ◽

M. Sultan Zia ◽

Kaleem Arshid ◽

Kebin Jia ◽

...

Keyword(s):

Comparative Analysis ◽

Magnetic Resonance ◽

Gradient Descent ◽

State Of The Art ◽

Magnetic Resonance Images ◽

Learning Rate ◽

Stochastic Gradient Descent ◽

Data Set ◽

Strong Argument ◽

Adaptive Momentum

Brain tumors have become a leading cause of death around the globe. The main reason for this epidemic is the difficulty of conducting a timely diagnosis of the tumor. Fortunately, magnetic resonance images (MRI) are utilized to diagnose tumors in most cases. The performance of a Convolutional Neural Network (CNN) depends on many factors (i.e., weight initialization, optimization, batches and epochs, learning rate, activation function, loss function, and network topology), data quality, and specific combinations of these model attributes. When we deal with a segmentation or classification problem, utilizing a single optimizer is considered weak testing or validity unless the selection of that optimizer is backed up by a strong argument. Therefore, optimizer selection processes are considered important to validate the usage of a single optimizer in these decision problems. In this paper, we provide a comprehensive comparative analysis of popular CNN optimizers to benchmark segmentation performance. In detail, we perform a comparative analysis of 10 different state-of-the-art gradient descent-based optimizers, namely Adaptive Gradient (Adagrad), Adaptive Delta (AdaDelta), Stochastic Gradient Descent (SGD), Adaptive Momentum (Adam), Cyclic Learning Rate (CLR), Adaptive Max Pooling (Adamax), Root Mean Square Propagation (RMS Prop), Nesterov Adaptive Momentum (Nadam), and Nesterov accelerated gradient (NAG) for CNN. The experiments were performed on the BraTS2015 data set. The Adam optimizer had the best accuracy of 99.2% in enhancing the CNN's ability in classification and segmentation.


Linear Support Vector Machine (SVM) with Stochastic Gradient Descent (SGD) training and multinomial Naive Bayes (NB) in News Classification

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v7i4.360363 ◽

2019 ◽

Vol 7 (4) ◽

pp. 360-363

Author(s):

Feroz Ahmed ◽

Shabina Ghafir

Keyword(s):

Support Vector Machine ◽

Gradient Descent ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Linear Support Vector Machine


Comparison of SVM, RF and SGD Methods for Determination of Programmer's Performance Classification Model in Social Media Activities

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i2.1770 ◽

2020 ◽

Vol 4 (2) ◽

pp. 329-335

Author(s):

Rusydi Umar ◽

Imam Riadi ◽

Purwono

Keyword(s):

Social Media ◽

Gradient Descent ◽

Classification Model ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Svm Algorithm ◽

Vector Machines ◽

Performance Patterns ◽

A Company

The failure of most startups in Indonesia is caused by team performance that is not solid and competent. Programmers are an integral part of a startup team. The development of social media can be used as a strategic tool for recruiting the best programmer candidates in a company. This strategic tool takes the form of an automatic classification system for social media posts from prospective programmers. The classification results are expected to predict the performance pattern of each candidate, labeled as good or bad performance. The classification method with the best accuracy needs to be chosen in order to obtain an effective strategic tool, so a comparison of several methods is needed. This study compares classification methods including the Support Vector Machines (SVM) algorithm, Random Forest (RF) and Stochastic Gradient Descent (SGD). The classification results show that accuracy with k = 10 cross-validation reaches 81.3% for the SVM algorithm, 74.4% for RF, and 80.1% for SGD, so the SVM method is chosen as the model for classifying programmer performance from social media activities.
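The k = 10 cross-validation protocol used for the comparison can be sketched generically. The helper names and the `fit` / `predict` callback interface below are hypothetical, and the study's actual classifiers and features are not reproduced. The sketch assumes there are at least k samples, so that no fold is empty.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is held out exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def cross_val_accuracy(fit, predict, X, y, k=10, seed=0):
    """Mean held-out accuracy over k folds.

    `fit(X_train, y_train)` returns a model; `predict(model, x)` returns a label.
    """
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        model = fit([X[j] for j in train], [y[j] for j in train])
        correct = sum(predict(model, X[j]) == y[j] for j in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)
```

Passing the same `seed` for every method keeps the folds identical across SVM, RF and SGD, so accuracy differences reflect the classifiers rather than the split.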


Stochastic gradient descent training for L1-regularized log-linear models with cumulative penalty

Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - ACL-IJCNLP '09 ◽

10.3115/1687878.1687946 ◽

2009 ◽

Cited By ~ 45

Author(s):

Yoshimasa Tsuruoka ◽

Jun'ichi Tsujii ◽

Sophia Ananiadou

Keyword(s):

Gradient Descent ◽

Linear Models ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Log Linear
