Comparison of SVM, RF and SGD Methods for Determination of Programmer's Performance Classification Model in Social Media Activities

2020 ◽  
Vol 4 (2) ◽  
pp. 329-335
Author(s):  
Rusydi Umar ◽  
Imam Riadi ◽  
Purwono

The failure of most startups in Indonesia is caused by teams that are neither solid nor competent. Programmers are an integral part of a startup team. The growth of social media can be leveraged as a strategic tool for recruiting the best programmer candidates for a company. This strategic tool takes the form of a system that automatically classifies the social media posts of prospective programmers. The classification results are expected to predict the performance pattern of each candidate, labelled as good or bad performance. The classification method with the best accuracy must be chosen to obtain an effective strategic tool, so a comparison of several methods is needed. This study compares classification methods including the Support Vector Machine (SVM), Random Forest (RF), and Stochastic Gradient Descent (SGD) algorithms. With k = 10 cross-validation, the SVM algorithm reaches 81.3% accuracy, RF 74.4%, and SGD 80.1%, so SVM is chosen as the model for classifying programmer performance from social media activity.
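The comparison described above can be sketched with scikit-learn's cross-validation utilities. This is a hypothetical illustration only: the synthetic data below stands in for the paper's (unavailable) social media features, and the model settings are assumptions, not the authors' configuration.

```python
# Sketch of a k=10 cross-validated accuracy comparison of SVM, RF, and SGD,
# on synthetic placeholder data (not the paper's social media dataset).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

models = {
    "SVM": SVC(kernel="linear"),
    "RF": RandomForestClassifier(random_state=0),
    "SGD": SGDClassifier(random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```

On real data, the model with the highest mean cross-validated accuracy would be selected, as the abstract describes for SVM.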

Electronics ◽  
2019 ◽  
Vol 8 (6) ◽  
pp. 631 ◽  
Author(s):  
Felipe F. Lopes ◽  
João Canas Ferreira ◽  
Marcelo A. C. Fernandes

Sequential Minimal Optimization (SMO) is the traditional training algorithm for Support Vector Machines (SVMs). However, SMO does not scale well with the size of the training set. For that reason, Stochastic Gradient Descent (SGD) algorithms, which have better scalability, are a better option for massive data mining applications. Furthermore, even with SGD, training times can become extremely long depending on the data set. For this reason, accelerators such as Field-Programmable Gate Arrays (FPGAs) are used. This work describes a fully parallel hardware implementation, on an FPGA, of an SVM trained with Stochastic Gradient Descent. The proposed FPGA implementation of an SVM with SGD presents speedups of more than 10,000× relative to software implementations running on a quad-core processor and up to 319× compared to state-of-the-art FPGA implementations, while requiring fewer hardware resources. The results show that the proposed architecture is a viable solution for highly demanding problems such as those present in big data analysis.
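The per-sample hinge-loss update that such hardware parallelizes can be sketched in software. This is a minimal NumPy sketch of linear-SVM training with SGD, not the authors' FPGA design; the learning rate, regularization constant, and toy data are assumptions for illustration.

```python
# Minimal software sketch of linear-SVM training via stochastic gradient
# descent on the regularized hinge loss. Each per-sample update below is
# the kind of operation an FPGA implementation executes in parallel.
import numpy as np

def sgd_svm(X, y, lam=0.01, epochs=20, lr=0.1, seed=0):
    """Train a linear SVM; y must be in {-1, +1}. Returns weights w, bias b."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:                       # hinge loss is active
                w += lr * (y[i] * X[i] - lam * w)
                b += lr * y[i]
            else:                                # only the regularizer acts
                w -= lr * lam * w
    return w, b

# Toy linearly separable data
X = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1, 1, -1, -1])
w, b = sgd_svm(X, y)
preds = np.sign(X @ w + b)
```

The inner loop touches one sample at a time, which is why SGD scales to data sets where SMO's pairwise working-set optimization becomes impractical.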


2020 ◽  
Vol 15 (3) ◽  
pp. 393-409
Author(s):  
Raluca Dana Caplescu ◽  
Ana-Maria Panaite ◽  
Daniel Traian Pele ◽  
Vasile Alecsandru Strat

The recent increase in peer-to-peer lending has prompted the development of models that separate good clients from bad ones in order to mitigate risks both for lenders and for the platforms. The rapidly growing body of literature provides several comparisons between various models. Among the most frequently employed are logistic regression, Support Vector Machines, neural networks and decision tree-based models. Of these, logistic regression has proved to be a strong candidate, both because of its good performance and because of its high explainability. The present paper compares four pairs of models (for imbalanced and under-sampled data) meant to predict charged-off clients by optimizing the F1 score. We found that, if the data is balanced, Logistic Regression, both simple and with Stochastic Gradient Descent, outperforms LightGBM and K-Nearest Neighbors in optimizing the F1 score. We chose this metric as it balances the interests of the lenders and those of the platform. Loan term, debt-to-income ratio and number of accounts were found to be important, positively related predictors of the risk of charge-off. At the other end of the spectrum, by far the strongest impact on charge-off probability is that of the FICO score. The final number of features retained by the two models differs greatly because, although both models use Lasso for feature selection, Stochastic Gradient Descent Logistic Regression applies stronger regularization. The analysis was performed in Python (numpy, pandas, sklearn and imblearn).


PeerJ ◽  
2015 ◽  
Vol 3 ◽  
pp. e1455 ◽  
Author(s):  
Meizhen Lv ◽  
Ang Li ◽  
Tianli Liu ◽  
Tingshao Zhu

Introduction. Suicide has become a serious worldwide epidemic. Early detection of individual suicide risk in the population is important for reducing suicide rates. Traditional methods are ineffective at identifying suicide risk in time, suggesting a need for novel techniques. This paper proposes detecting suicide risk on social media using a Chinese suicide dictionary. Methods. To build the Chinese suicide dictionary, eight researchers were recruited to select initial words from 4,653 posts published on Sina Weibo (the largest social media service provider in China) and from two Chinese sentiment dictionaries (HowNet and NTUSD). Then, another three researchers were recruited to filter out irrelevant words. Finally, the remaining words were expanded using a corpus-based method. After building the Chinese suicide dictionary, we tested its performance in identifying suicide risk on Weibo. First, we compared dictionary-based identifications with expert ratings, both for detecting suicidal expression in Weibo posts and for evaluating individual levels of suicide risk. Second, to differentiate between individuals with high and non-high scores on a self-rating measure of suicide risk (the Suicidal Possibility Scale, SPS), we built Support Vector Machine (SVM) models on the Chinese suicide dictionary and on the Simplified Chinese Linguistic Inquiry and Word Count (SCLIWC) program, respectively, and compared the classification performance of the two types of SVM models. Results and Discussion. Dictionary-based identifications were significantly correlated with expert ratings in terms of both detecting suicidal expression (r = 0.507) and evaluating individual suicide risk (r = 0.455).
For the differentiation between individuals with high and non-high scores on the SPS, the Chinese suicide dictionary (t1: F1 = 0.48; t2: F1 = 0.56) produced more accurate identification than SCLIWC (t1: F1 = 0.41; t2: F1 = 0.48) over different observation windows. Conclusions. This paper confirms that, using social media, it is possible to implement real-time monitoring of individual suicide risk in the population. The results of this study may be useful for improving Chinese suicide prevention programs and may be insightful for other countries.
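The dictionary-based pipeline described above, score each post by its dictionary hits and feed those scores to an SVM, can be sketched in miniature. The lexicon terms, posts, and labels below are hypothetical English placeholders, not entries from the actual Chinese suicide dictionary or real Weibo data.

```python
# Illustrative sketch: dictionary-hit counts as SVM features.
# All terms, posts, and labels are hypothetical placeholders.
from sklearn.svm import SVC

lexicon = {"hopeless", "goodbye", "burden"}    # placeholder dictionary terms

def dict_features(post):
    """Return [raw hit count, hit density] for one post."""
    tokens = post.lower().split()
    hits = sum(tok in lexicon for tok in tokens)
    return [hits, hits / max(len(tokens), 1)]

posts = ["feeling hopeless today goodbye",
         "lovely weather for a walk",
         "i am a burden goodbye",
         "great match last night"]
labels = [1, 0, 1, 0]                          # 1 = flagged by expert raters

X = [dict_features(p) for p in posts]
clf = SVC(kernel="linear").fit(X, labels)
```

In the actual study the features would come from the full Chinese dictionary (and, in the comparison condition, from SCLIWC categories) computed over each observation window.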


2020 ◽  
Author(s):  
Gong Yue-hong ◽  
Yang Tie-jun ◽  
Liang Yi-tao ◽  
Ge Hong-yi ◽  
Chen Liang

Mould is a common phenomenon in stored wheat. First, mould decreases the quality of wheat kernels. Second, the mycotoxins metabolized by moulds are very harmful to humans. Therefore, fast and accurate examination of wheat mould is vitally important for evaluating storage quality and subsequent processing safety. Existing methods for examining wheat mould rely mainly on chemical analysis, which involves complex and lengthy pretreatment processes, and the auxiliary chemical materials used may pollute the environment. To improve the determination of wheat mould, this paper proposes a green, nondestructive determination method based on biophotons. The specific implementation process is as follows: first, the ultra-weak luminescence of healthy and mouldy wheat samples is measured repeatedly by a biophotonic analyser; then, the approximate entropy and multiscale approximate entropy are separately introduced as the main classification features. Finally, the classification performance is tested using a support vector machine (SVM). The ROC curve of the newly established classification model shows that the highest recognition rate reaches 93.6%, which shows that the proposed classification model is feasible and promising for detecting wheat mould.
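Approximate entropy, the main classification feature named above, can be computed compactly in NumPy. This is a generic sketch of the standard ApEn definition, not the authors' implementation; the parameters m = 2 and r = 0.2 × std follow common convention and are assumptions here.

```python
# Approximate entropy (ApEn) of a 1-D signal: a regularity measure
# often used as a feature for classifiers such as SVMs.
import numpy as np

def approx_entropy(x, m=2, r_factor=0.2):
    x = np.asarray(x, dtype=float)
    r = r_factor * x.std()            # tolerance as a fraction of std

    def phi(m):
        n = len(x) - m + 1
        # Embed the series into overlapping windows of length m.
        windows = np.array([x[i:i + m] for i in range(n)])
        # Chebyshev distance between every pair of windows.
        dist = np.max(np.abs(windows[:, None] - windows[None, :]), axis=2)
        counts = (dist <= r).sum(axis=1) / n    # self-match included
        return np.mean(np.log(counts))

    return phi(m) - phi(m + 1)

rng = np.random.default_rng(0)
regular = np.sin(np.linspace(0, 8 * np.pi, 200))  # predictable signal
noisy = rng.standard_normal(200)                  # irregular signal
```

A regular (predictable) signal yields a lower ApEn than an irregular one, which is what makes the measure useful for separating structured luminescence traces from noise-like ones.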


2021 ◽  
Vol 7 ◽  
pp. e665
Author(s):  
Salim Sazzed

The presence of abusive and vulgar language in social media has become an issue of increasing concern in recent years. However, research on the prevalence and identification of vulgar language has remained largely unexplored in low-resource languages such as Bengali. In this paper, we provide the first comprehensive analysis of the presence of vulgarity in Bengali social media content. We develop two benchmark corpora consisting of 7,245 reviews collected from YouTube and manually annotate them into vulgar and non-vulgar categories. The manual annotation reveals the ubiquity of vulgar and swear words in Bengali social media content (i.e., in the two corpora), ranging from 20% to 34%. To automatically identify vulgarity, we employ various approaches, such as classical machine learning (CML) classifiers, a Stochastic Gradient Descent (SGD) optimizer, a deep learning (DL) based architecture, and lexicon-based methods. We find that, although small in size, the swear/vulgar lexicon is effective at identifying vulgar language, owing to the high presence of some swear terms in Bengali social media. We observe that the performance of machine learning (ML) classifiers is affected by the class distribution of the dataset. The DL-based BiLSTM (Bidirectional Long Short-Term Memory) model yields the highest recall scores for identifying vulgarity in both datasets (i.e., in both the original and class-balanced settings). In addition, the analysis reveals that vulgarity is highly correlated with negative sentiment in social media comments.
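The lexicon-based method mentioned above amounts to flagging a comment as vulgar when it contains any lexicon term. A minimal sketch, with hypothetical English placeholder terms standing in for the actual Bengali lexicon entries and made-up example comments:

```python
# Illustrative lexicon-based vulgarity detection.
# The lexicon terms and comments are hypothetical placeholders.
swear_lexicon = {"darn", "heck", "dang"}

def is_vulgar(comment):
    """Flag a comment if any whitespace-delimited token is in the lexicon."""
    return any(tok in swear_lexicon for tok in comment.lower().split())

comments = ["what the heck is this video",
            "very informative, thank you",
            "darn that was bad",
            "subscribed, great content"]
flags = [is_vulgar(c) for c in comments]
```

As the abstract notes, even a small lexicon can perform well when a handful of swear terms account for most of the vulgar usage, though it cannot catch misspellings or terms outside the list the way trained classifiers can.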

