Large-Scale Machine Learning with Stochastic Gradient Descent Léon Bottou

2011 ◽  
pp. 33-42 ◽  
Symmetry ◽  
2021 ◽  
Vol 13 (9) ◽  
pp. 1652
Author(s):  
Wanida Panup ◽  
Rabian Wangkeeree

In this paper, we propose a stochastic gradient descent algorithm, called stochastic gradient descent method-based generalized pinball support vector machine (SG-GPSVM), to solve data classification problems. This approach was developed by replacing the hinge loss function in the conventional support vector machine (SVM) with a generalized pinball loss function. We show that SG-GPSVM is convergent and that it approximates the conventional generalized pinball support vector machine (GPSVM). Further, the symmetric kernel method was adopted to evaluate the performance of SG-GPSVM as a nonlinear classifier. Our suggested algorithm surpasses existing methods in terms of noise insensitivity, resampling stability, and accuracy for large-scale data scenarios, according to the experimental results.


2021 ◽  
Vol 7 ◽  
pp. e712
Author(s):  
Babacar Gaye ◽  
Dezheng Zhang ◽  
Aziguli Wulamu

The satisfaction of employees is very important for any organization to make sufficient progress in production and to achieve its goals. Organizations try to keep their employees satisfied by making their policies according to employees’ demands which help to create a good environment for the collective. For this reason, it is beneficial for organizations to perform staff satisfaction surveys to be analyzed, allowing them to gauge the levels of satisfaction among employees. Sentiment analysis is an approach that can assist in this regard as it categorizes sentiments of reviews into positive and negative results. In this study, we perform experiments for the world’s big six companies and classify their employees’ reviews based on their sentiments. For this, we proposed an approach using lexicon-based and machine learning based techniques. Firstly, we extracted the sentiments of employees from text reviews and labeled the dataset as positive and negative using TextBlob. Then we proposed a hybrid/voting model named Regression Vector-Stochastic Gradient Descent Classifier (RV-SGDC) for sentiment classification. RV-SGDC is a combination of logistic regression, support vector machines, and stochastic gradient descent. We combined these models under a majority voting criteria. We also used other machine learning models in the performance comparison of RV-SGDC. Further, three feature extraction techniques: term frequency-inverse document frequency (TF-IDF), bag of words, and global vectors are used to train learning models. We evaluated the performance of all models in terms of accuracy, precision, recall, and F1 score. The results revealed that RV-SGDC outperforms with a 0.97 accuracy score using the TF-IDF feature due to its hybrid architecture.


Author(s):  
L. S. Koriashkina ◽  
H. V. Symonets

Purpose. Detecting toxic comments on YouTube video hosting under training videos by classifying unstructured text using a combination of machine learning methods. Methodology. To work with the specified type of data, machine learning methods were used for cleaning, normalizing, and presenting textual data in a form acceptable for processing on a computer. Directly to classify comments as “toxic”, we used a logistic regression classifier, a linear support vector classification method without and with a learning method – stochastic gradient descent, a random forest classifier and a gradient enhancement classifier. In order to assess the work of the classifiers, the methods of calculating the matrix of errors, accuracy, completeness and F-measure were used. For a more generalized assessment, a cross-validation method was used. Python programming language. Findings. Based on the assessment indicators, the most optimal methods were selected – support vector machine (Linear SVM), without and with the training method using stochastic gradient descent. The described technologies can be used to analyze the textual comments under any training videos to detect toxic reviews. Also, the approach can be useful for identifying unwanted or even aggressive information on social networks or services where reviews are provided. Originality. It consists in a combination of methods for preprocessing a specific type of text, taking into account such features as the possibility of having a timecode, emoji, links, and the like, as well as in the adaptation of classification methods of machine learning for the analysis of Russian-language comments. Practical value. It is about optimizing (simplification) the comment analysis process. The need for this processing is due to the growing volumes of text data, especially in the field of education through quarantine conditions and the transition to distance learning. The volume of educational Internet content already needs to automate the processing and analysis of feedback, over time this need will only grow.


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Jinhuan Duan ◽  
Xianxian Li ◽  
Shiqi Gao ◽  
Zili Zhong ◽  
Jinyan Wang

With the vigorous development of artificial intelligence technology, various engineering technology applications have been implemented one after another. The gradient descent method plays an important role in solving various optimization problems, due to its simple structure, good stability, and easy implementation. However, in multinode machine learning system, the gradients usually need to be shared, which will cause privacy leakage, because attackers can infer training data with the gradient information. In this paper, to prevent gradient leakage while keeping the accuracy of the model, we propose the super stochastic gradient descent approach to update parameters by concealing the modulus length of gradient vectors and converting it or them into a unit vector. Furthermore, we analyze the security of super stochastic gradient descent approach and demonstrate that our algorithm can defend against the attacks on the gradient. Experiment results show that our approach is obviously superior to prevalent gradient descent approaches in terms of accuracy, robustness, and adaptability to large-scale batches. Interestingly, our algorithm can also resist model poisoning attacks to a certain extent.


2014 ◽  
Vol 687-691 ◽  
pp. 1342-1345 ◽  
Author(s):  
Jie Ding ◽  
Li Peng Zhu ◽  
Bin Hu ◽  
Ren Long Hang ◽  
Yu Bao Sun

With the rapid advance of data collection and storage technique, it is easy to acquire tens of millions or even billions of data sets. How to explore and exploit the useful or interesting information for human beings from these data sets has become an urgent issue. Traditional k-means clustering algorithm has been widely used in data mining community. First, randomly initialize k clustering centres. Then, all instances are classified into k different classes according to their distances to clustering centres. Lastly, update the clustering centres by the mean of its corresponding constituent instances. This whole process will be iterated until convergence. Obviously, at each iteration, distance matrix from all instances to k clustering centres must be calculated which will cost so much time when encounter large scale data sets. To address this issue, in this paper, we proposed a fast optimization algorithm based on stochastic gradient descent (SGD). At each iteration, randomly choose an instance, search its corresponding clustering centre and then update it immediately. Experimental results show that our proposed method achieves a competitive clustering results with less time cost.


Sign in / Sign up

Export Citation Format

Share Document