Linear Support Vector Machine (SVM) with Stochastic Gradient Descent (SGD) training and multinomial Nave Bayes (NB) in News Classification

In this paper, we propose a stochastic gradient descent algorithm, called stochastic gradient descent method-based generalized pinball support vector machine (SG-GPSVM), to solve data classification problems. This approach was developed by replacing the hinge loss function in the conventional support vector machine (SVM) with a generalized pinball loss function. We show that SG-GPSVM is convergent and that it approximates the conventional generalized pinball support vector machine (GPSVM). Further, the symmetric kernel method was adopted to evaluate the performance of SG-GPSVM as a nonlinear classifier. Our suggested algorithm surpasses existing methods in terms of noise insensitivity, resampling stability, and accuracy for large-scale data scenarios, according to the experimental results.

Download Full-text

Perbandingan Teknik Klasifikasi Dalam Data Mining Untuk Bank Direct Marketing

Jurnal Teknologi Informasi dan Ilmu Komputer ◽

10.25126/jtiik.201855958 ◽

2018 ◽

Vol 5 (5) ◽

pp. 567 ◽

Cited By ~ 2

Author(s):

Irvi Oktanisa ◽

Ahmad Afif Supianto

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Random Forest ◽

Gradient Descent ◽

Naive Bayes ◽

Direct Marketing ◽

Naïve Bayes ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Support Vector

Klasifikasi merupakan teknik dalam data mining untuk mengelompokkan data berdasarkan keterikatan data terhadap data sampel. Pada penelitian ini, kami melakukan perbandingan 9 teknik klasifikasi untuk mengklasifikasi respon pelanggan pada dataset Bank Direct Marketing. Perbandingan teknik klasifikasi ini dilakukan untuk mengetahui model dalam teknik klasfikasi yang paling efektif untuk mengklasifikasi target pada dataset Bank Direct Marketing. Teknik klasifikasi yang digunakan yaitu Support Vector Machine, AdaBoost, Naïve Bayes, Constant, KNN, Tree, Random Forest, Stochastic Gradient Descent, dan CN2 Rule. Proses klasifikasi diawali dengan preprocessing data untuk melakukan penghilangan missing value dan pemilihan fitur pada dataset. Pada tahap evaluasi digunakan teknik 10 fold cross validation. Setelah dilakukan pengujian, didapatkan bahwa hasil klasifikasi menunjukkan akurasi terbaik diperoleh oleh model Tree, Constant, Naive Bayes, dan Stochastic Gardient Descent. Kemudian diikuti oleh model Random Forest, K-Nearest Neighbor, CN-2 Rule, AdaBoost dan Support Vector Machine. Dari keempat model yang menunjukkan hasil akurasi terbaik, untuk kasus ini Stochastic Gradient Descent terpilih sebagai model yang memiliki akurasi terbaik dengan nilai akurasi sebesar 0,972 dan hasil visualisasi yang dihasilkan lebih jelas untuk mengklasifikasi target pada dataset Bank Direct Marketing. AbstractClassification is a technique in data mining to classify data based on the attachment of data to the sample data.. In this paper, we present the comparison of 9 classification techniques performed to classify customer response on the dataset of Bank Direct Marketing. The techniques performed to find out the effectiveness model in the classification technique used to classify targets on the dataset of Bank Direct Marketing. The techniques used are Support Vector Machine, AdaBoost, Naïve Bayes, Constant, KNN, Tree, Random Forest, Stochastic Gradient Descent, and CN2 Rule. The classification process begins with preprocessing data to perform missing value omissions and feature selection on the dataset. Cross validation technique, with k value is 10, used in the evaluation stage. After testing, it was found that the classification results showed the best accuracy obtained when using the Tree model, Constant, Naive Bayes and Stochastic Gradient Descent. Afterwards the Random Forest model, K-Nearest Neighbor, CN-2 Rule, AdaBoost, and Support Vector Machine are followed. Of the four models with the high accuracy results, in this case Stochastic Gradient Descent was selected as the best accuracy model with an accuracy value of 0.972 and resulting visualization more clearly to classify targets on the dataset of Bank Direct Marketing.

Download Full-text

Sentiment classification for employees reviews using regression vector- stochastic gradient descent classifier (RV-SGDC)

PeerJ Computer Science ◽

10.7717/peerj-cs.712 ◽

2021 ◽

Vol 7 ◽

pp. e712

Author(s):

Babacar Gaye ◽

Dezheng Zhang ◽

Aziguli Wulamu

Keyword(s):

Machine Learning ◽

Gradient Descent ◽

Stochastic Gradient ◽

Sentiment Classification ◽

Majority Voting ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Hybrid Architecture ◽

Accuracy Score ◽

Learning Models

The satisfaction of employees is very important for any organization to make sufficient progress in production and to achieve its goals. Organizations try to keep their employees satisfied by making their policies according to employees’ demands which help to create a good environment for the collective. For this reason, it is beneficial for organizations to perform staff satisfaction surveys to be analyzed, allowing them to gauge the levels of satisfaction among employees. Sentiment analysis is an approach that can assist in this regard as it categorizes sentiments of reviews into positive and negative results. In this study, we perform experiments for the world’s big six companies and classify their employees’ reviews based on their sentiments. For this, we proposed an approach using lexicon-based and machine learning based techniques. Firstly, we extracted the sentiments of employees from text reviews and labeled the dataset as positive and negative using TextBlob. Then we proposed a hybrid/voting model named Regression Vector-Stochastic Gradient Descent Classifier (RV-SGDC) for sentiment classification. RV-SGDC is a combination of logistic regression, support vector machines, and stochastic gradient descent. We combined these models under a majority voting criteria. We also used other machine learning models in the performance comparison of RV-SGDC. Further, three feature extraction techniques: term frequency-inverse document frequency (TF-IDF), bag of words, and global vectors are used to train learning models. We evaluated the performance of all models in terms of accuracy, precision, recall, and F1 score. The results revealed that RV-SGDC outperforms with a 0.97 accuracy score using the TF-IDF feature due to its hybrid architecture.

Download Full-text

APPLICATION OF MACHINE LEARNING ALGORITHMS FOR PROCESSING COMMENTS FROM THE YOUTUBE VIDEO HOSTING UNDER TRAINING VIDEOS

Science and Transport Progress Bulletin of Dnipropetrovsk National University of Railway Transport ◽

10.15802/stp2020/225264 ◽

2021 ◽

pp. 33-42

Author(s):

L. S. Koriashkina ◽

H. V. Symonets

Keyword(s):

Machine Learning ◽

Gradient Descent ◽

Russian Language ◽

Stochastic Gradient ◽

Machine Learning Algorithms ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods ◽

Gradient Enhancement

Purpose. Detecting toxic comments on YouTube video hosting under training videos by classifying unstructured text using a combination of machine learning methods. Methodology. To work with the specified type of data, machine learning methods were used for cleaning, normalizing, and presenting textual data in a form acceptable for processing on a computer. Directly to classify comments as “toxic”, we used a logistic regression classifier, a linear support vector classification method without and with a learning method – stochastic gradient descent, a random forest classifier and a gradient enhancement classifier. In order to assess the work of the classifiers, the methods of calculating the matrix of errors, accuracy, completeness and F-measure were used. For a more generalized assessment, a cross-validation method was used. Python programming language. Findings. Based on the assessment indicators, the most optimal methods were selected – support vector machine (Linear SVM), without and with the training method using stochastic gradient descent. The described technologies can be used to analyze the textual comments under any training videos to detect toxic reviews. Also, the approach can be useful for identifying unwanted or even aggressive information on social networks or services where reviews are provided. Originality. It consists in a combination of methods for preprocessing a specific type of text, taking into account such features as the possibility of having a timecode, emoji, links, and the like, as well as in the adaptation of classification methods of machine learning for the analysis of Russian-language comments. Practical value. It is about optimizing (simplification) the comment analysis process. The need for this processing is due to the growing volumes of text data, especially in the field of education through quarantine conditions and the transition to distance learning. The volume of educational Internet content already needs to automate the processing and analysis of feedback, over time this need will only grow.

Download Full-text

Analyzing Fake News Based on Machine Learning Algorithms

Intelligent Systems and Computer Technology - Advances in Parallel Computing ◽

10.3233/apc200146 ◽

2020 ◽

Author(s):

Pawar A B ◽

Jawale M A ◽

Kyatanavar D N

Keyword(s):

Language Processing ◽

Gradient Descent ◽

Human Life ◽

Stochastic Gradient ◽

Machine Learning Algorithms ◽

Stochastic Gradient Descent ◽

Gradient Boosting ◽

Support Vector ◽

Fake News ◽

Processing Techniques

Usages of Natural Language Processing techniques in the field of detection of fake news is analyzed in this research paper. Fake news are misleading concepts spread by invalid resources can provide damages to human-life, society. To carry out this analysis work, dataset obtained from web resource OpenSources.co is used which is mainly part of Signal Media. The document frequency terms as TF-IDF of bi-grams used in correlation with PCFG (Probabilistic Context Free Grammar) on a set of 11,000 documents extracted as news articles. This set tested on classification algorithms namely SVM (Support Vector Machines), Stochastic Gradient Descent, Bounded Decision Trees, Gradient Boosting algorithm with Random Forests. In experimental analysis, found that combination of Stochastic Gradient Descent with TF-IDF of bi-grams gives an accuracy of 77.2% in detecting fake contents, which observes with PCFGs having slight recalling defects

Download Full-text

Stochastic gradient descent support vector clustering

2015 2nd National Foundation for Science and Technology Development Conference on Information and Computer Science (NICS) ◽

10.1109/nics.2015.7302228 ◽

2015 ◽

Cited By ~ 1

Author(s):

Tung Pham ◽

Hang Dang ◽

Trung Le ◽

Hoang-Thai Le

Keyword(s):

Gradient Descent ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Support Vector Clustering ◽

Vector Clustering

Download Full-text

Fast Hyperparameter Tuning for Support Vector Machines with Stochastic Gradient Descent

Machine Learning, Optimization, and Data Science - Lecture Notes in Computer Science ◽

10.1007/978-3-030-64580-9_40 ◽

2020 ◽

pp. 481-493

Author(s):

Marcin Orchel ◽

Johan A. K. Suykens

Keyword(s):

Support Vector Machines ◽

Gradient Descent ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Vector Machines

Download Full-text

Large-scale support vector regression with budgeted stochastic gradient descent

International Journal of Machine Learning and Cybernetics ◽

10.1007/s13042-018-0832-7 ◽

2018 ◽

Vol 10 (6) ◽

pp. 1529-1541

Author(s):

Zongxia Xie ◽

Yingda Li

Keyword(s):

Support Vector Regression ◽

Gradient Descent ◽

Large Scale ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Support Vector

Download Full-text

A Modified Support Vector Machine Classifiers Using Stochastic Gradient Descent with Application to Leukemia Cancer Type Dataset

Baghdad Science Journal ◽

10.21123/bsj.2020.17.4.1255 ◽

2020 ◽

Vol 17 (4) ◽

pp. 1255

Author(s):

Ghadeer Jasim Mahdi

Keyword(s):

Gradient Descent ◽

Nearest Neighbor ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Cancer Type ◽

K Nearest Neighbor ◽

Modified Method ◽

Optimal Hyperplane ◽

Cancer Types

Support vector machines (SVMs) are supervised learning models that analyze data for classification or regression. For classification, SVM is widely used by selecting an optimal hyperplane that separates two classes. SVM has very good accuracy and extremally robust comparing with some other classification methods such as logistics linear regression, random forest, k-nearest neighbor and naïve model. However, working with large datasets can cause many problems such as time-consuming and inefficient results. In this paper, the SVM has been modified by using a stochastic Gradient descent process. The modified method, stochastic gradient descent SVM (SGD-SVM), checked by using two simulation datasets. Since the classification of different cancer types is important for cancer diagnosis and drug discovery, SGD-SVM is applied for classifying the most common leukemia cancer type dataset. The results that are gotten using SGD-SVM are much accurate than other results of many studies that used the same leukemia datasets.

Download Full-text