Comparison of SVM, RF and SGD Methods for Determination of Programmer's Performance Classification Model in Social Media Activities

2020 ◽  
Vol 4 (2) ◽  
pp. 329-335
Author(s):  
Rusydi Umar ◽  
Imam Riadi ◽  
Purwono

The failure of most startups in Indonesia is caused by teams that are neither solid nor competent. Programmers are an integral part of a startup team. The growth of social media can be leveraged as a strategic tool for recruiting the best programmer candidates for a company. This strategic tool takes the form of a system that automatically classifies the social media posts of prospective programmers. The classification results are expected to predict the performance pattern of each candidate, labelled as good or bad performance. The classification method with the best accuracy must be chosen to obtain an effective strategic tool, so a comparison of several methods is needed. This study compares classification methods including the Support Vector Machine (SVM), Random Forest (RF), and Stochastic Gradient Descent (SGD) algorithms. With k = 10 cross-validation, the SVM algorithm reaches 81.3% accuracy, RF 74.4%, and SGD 80.1%, so SVM is chosen as the model for classifying programmer performance from social media activity.
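The comparison described above can be sketched with scikit-learn's cross-validation utilities. This is a hypothetical illustration only: the synthetic data below stands in for the paper's (unavailable) social media features, and the model settings are assumptions, not the authors' configuration.

```python
# Sketch of a k=10 cross-validated accuracy comparison of SVM, RF, and SGD,
# on synthetic placeholder data (not the paper's social media dataset).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

models = {
    "SVM": SVC(kernel="linear"),
    "RF": RandomForestClassifier(random_state=0),
    "SGD": SGDClassifier(random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```

On real data, the model with the highest mean cross-validated accuracy would be selected, as the abstract describes for SVM.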

Electronics ◽  
2019 ◽  
Vol 8 (6) ◽  
pp. 631 ◽  
Author(s):  
Felipe F. Lopes ◽  
João Canas Ferreira ◽  
Marcelo A. C. Fernandes

Sequential Minimal Optimization (SMO) is the traditional training algorithm for Support Vector Machines (SVMs). However, SMO does not scale well with the size of the training set. For that reason, Stochastic Gradient Descent (SGD) algorithms, which have better scalability, are a better option for massive data mining applications. Furthermore, even with SGD, training times can become extremely long depending on the data set. For this reason, accelerators such as Field-Programmable Gate Arrays (FPGAs) are used. This work describes a fully parallel hardware implementation, on an FPGA, of an SVM trained with Stochastic Gradient Descent. The proposed FPGA implementation of an SVM with SGD presents speedups of more than 10,000× relative to software implementations running on a quad-core processor and up to 319× compared to state-of-the-art FPGA implementations, while requiring fewer hardware resources. The results show that the proposed architecture is a viable solution for highly demanding problems such as those present in big data analysis.
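The per-sample hinge-loss update that such hardware parallelizes can be sketched in software. This is a minimal NumPy sketch of linear-SVM training with SGD, not the authors' FPGA design; the learning rate, regularization constant, and toy data are assumptions for illustration.

```python
# Minimal software sketch of linear-SVM training via stochastic gradient
# descent on the regularized hinge loss. Each per-sample update below is
# the kind of operation an FPGA implementation executes in parallel.
import numpy as np

def sgd_svm(X, y, lam=0.01, epochs=20, lr=0.1, seed=0):
    """Train a linear SVM; y must be in {-1, +1}. Returns weights w, bias b."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:                       # hinge loss is active
                w += lr * (y[i] * X[i] - lam * w)
                b += lr * y[i]
            else:                                # only the regularizer acts
                w -= lr * lam * w
    return w, b

# Toy linearly separable data
X = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1, 1, -1, -1])
w, b = sgd_svm(X, y)
preds = np.sign(X @ w + b)
```

The inner loop touches one sample at a time, which is why SGD scales to data sets where SMO's pairwise working-set optimization becomes impractical.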


2020 ◽  
Vol 15 (3) ◽  
pp. 393-409
Author(s):  
Raluca Dana Caplescu ◽  
Ana-Maria Panaite ◽  
Daniel Traian Pele ◽  
Vasile Alecsandru Strat

The recent increase in peer-to-peer lending has prompted the development of models that separate good clients from bad ones in order to mitigate risks both for lenders and for the platforms. The rapidly growing body of literature provides several comparisons between various models. Among the most frequently employed are logistic regression, Support Vector Machines, neural networks and decision tree-based models. Of these, logistic regression has proved to be a strong candidate, both because of its good performance and because of its high explainability. The present paper compares four pairs of models (for imbalanced and under-sampled data) meant to predict charged-off clients by optimizing the F1 score. We found that, if the data is balanced, Logistic Regression, both simple and with Stochastic Gradient Descent, outperforms LightGBM and K-Nearest Neighbors in optimizing the F1 score. We chose this metric as it balances the interests of the lenders and those of the platform. Loan term, debt-to-income ratio and number of accounts were found to be important, positively related predictors of the risk of charge-off. At the other end of the spectrum, by far the strongest impact on charge-off probability is that of the FICO score. The final number of features retained by the two models differs greatly because, although both models use Lasso for feature selection, Stochastic Gradient Descent Logistic Regression applies stronger regularization. The analysis was performed in Python (numpy, pandas, sklearn and imblearn).


PeerJ ◽  
2015 ◽  
Vol 3 ◽  
pp. e1455 ◽  
Author(s):  
Meizhen Lv ◽  
Ang Li ◽  
Tianli Liu ◽  
Tingshao Zhu

Introduction. Suicide has become a serious worldwide epidemic. Early detection of individual suicide risk in the population is important for reducing suicide rates. Traditional methods are ineffective at identifying suicide risk in time, suggesting a need for novel techniques. This paper proposes detecting suicide risk on social media using a Chinese suicide dictionary. Methods. To build the Chinese suicide dictionary, eight researchers were recruited to select initial words from 4,653 posts published on Sina Weibo (the largest social media service provider in China) and from two Chinese sentiment dictionaries (HowNet and NTUSD). Then, another three researchers were recruited to filter out irrelevant words. Finally, the remaining words were expanded using a corpus-based method. After building the Chinese suicide dictionary, we tested its performance in identifying suicide risk on Weibo. First, we compared dictionary-based identifications with expert ratings, both for detecting suicidal expression in Weibo posts and for evaluating individual levels of suicide risk. Second, to differentiate between individuals with high and non-high scores on a self-rating measure of suicide risk (the Suicidal Possibility Scale, SPS), we built Support Vector Machine (SVM) models on the Chinese suicide dictionary and on the Simplified Chinese Linguistic Inquiry and Word Count (SCLIWC) program, respectively, and compared the classification performance of the two types of SVM models. Results and Discussion. Dictionary-based identifications were significantly correlated with expert ratings in terms of both detecting suicidal expression (r = 0.507) and evaluating individual suicide risk (r = 0.455).
For the differentiation between individuals with high and non-high scores on the SPS, the Chinese suicide dictionary (t1: F1 = 0.48; t2: F1 = 0.56) produced more accurate identification than SCLIWC (t1: F1 = 0.41; t2: F1 = 0.48) over different observation windows. Conclusions. This paper confirms that, using social media, it is possible to implement real-time monitoring of individual suicide risk in the population. The results of this study may be useful for improving Chinese suicide prevention programs and may be insightful for other countries.
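The dictionary-based pipeline described above, score each post by its dictionary hits and feed those scores to an SVM, can be sketched in miniature. The lexicon terms, posts, and labels below are hypothetical English placeholders, not entries from the actual Chinese suicide dictionary or real Weibo data.

```python
# Illustrative sketch: dictionary-hit counts as SVM features.
# All terms, posts, and labels are hypothetical placeholders.
from sklearn.svm import SVC

lexicon = {"hopeless", "goodbye", "burden"}    # placeholder dictionary terms

def dict_features(post):
    """Return [raw hit count, hit density] for one post."""
    tokens = post.lower().split()
    hits = sum(tok in lexicon for tok in tokens)
    return [hits, hits / max(len(tokens), 1)]

posts = ["feeling hopeless today goodbye",
         "lovely weather for a walk",
         "i am a burden goodbye",
         "great match last night"]
labels = [1, 0, 1, 0]                          # 1 = flagged by expert raters

X = [dict_features(p) for p in posts]
clf = SVC(kernel="linear").fit(X, labels)
```

In the actual study the features would come from the full Chinese dictionary (and, in the comparison condition, from SCLIWC categories) computed over each observation window.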


2020 ◽  
Author(s):  
Gong Yue-hong ◽  
Yang Tie-jun ◽  
Liang Yi-tao ◽  
Ge Hong-yi ◽  
Chen Liang

Mould is a common phenomenon in stored wheat. First, mould decreases the quality of wheat kernels. Second, the mycotoxins metabolized by moulds are very harmful to humans. Therefore, fast and accurate examination of wheat mould is vitally important for evaluating storage quality and subsequent processing safety. Existing methods for examining wheat mould rely mainly on chemical analysis, which involves complex and lengthy pretreatment processes, and the auxiliary chemical materials used may pollute the environment. To improve the determination of wheat mould, this paper proposes a green, nondestructive determination method based on biophotons. The specific implementation process is as follows: first, the ultra-weak luminescence of healthy and mouldy wheat samples is measured repeatedly by a biophotonic analyser; then, the approximate entropy and multiscale approximate entropy are separately introduced as the main classification features. Finally, the classification performance is tested using a support vector machine (SVM). The ROC curve of the newly established classification model shows that the highest recognition rate reaches 93.6%, which shows that the proposed classification model is feasible and promising for detecting wheat mould.
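Approximate entropy, the main classification feature named above, can be computed compactly in NumPy. This is a generic sketch of the standard ApEn definition, not the authors' implementation; the parameters m = 2 and r = 0.2 × std follow common convention and are assumptions here.

```python
# Approximate entropy (ApEn) of a 1-D signal: a regularity measure
# often used as a feature for classifiers such as SVMs.
import numpy as np

def approx_entropy(x, m=2, r_factor=0.2):
    x = np.asarray(x, dtype=float)
    r = r_factor * x.std()            # tolerance as a fraction of std

    def phi(m):
        n = len(x) - m + 1
        # Embed the series into overlapping windows of length m.
        windows = np.array([x[i:i + m] for i in range(n)])
        # Chebyshev distance between every pair of windows.
        dist = np.max(np.abs(windows[:, None] - windows[None, :]), axis=2)
        counts = (dist <= r).sum(axis=1) / n    # self-match included
        return np.mean(np.log(counts))

    return phi(m) - phi(m + 1)

rng = np.random.default_rng(0)
regular = np.sin(np.linspace(0, 8 * np.pi, 200))  # predictable signal
noisy = rng.standard_normal(200)                  # irregular signal
```

A regular (predictable) signal yields a lower ApEn than an irregular one, which is what makes the measure useful for separating structured luminescence traces from noise-like ones.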


2021 ◽  
Vol 7 ◽  
pp. e665
Author(s):  
Salim Sazzed

The presence of abusive and vulgar language in social media has become an issue of increasing concern in recent years. However, research on the prevalence and identification of vulgar language has remained largely unexplored in low-resource languages such as Bengali. In this paper, we provide the first comprehensive analysis of the presence of vulgarity in Bengali social media content. We develop two benchmark corpora consisting of 7,245 reviews collected from YouTube and manually annotate them into vulgar and non-vulgar categories. The manual annotation reveals the ubiquity of vulgar and swear words in Bengali social media content (i.e., in the two corpora), ranging from 20% to 34%. To automatically identify vulgarity, we employ various approaches, such as classical machine learning (CML) classifiers, a Stochastic Gradient Descent (SGD) optimizer, a deep learning (DL) based architecture, and lexicon-based methods. We find that, although small in size, the swear/vulgar lexicon is effective at identifying vulgar language, owing to the high presence of some swear terms in Bengali social media. We observe that the performance of machine learning (ML) classifiers is affected by the class distribution of the dataset. The DL-based BiLSTM (Bidirectional Long Short-Term Memory) model yields the highest recall scores for identifying vulgarity in both datasets (i.e., in both the original and class-balanced settings). In addition, the analysis reveals that vulgarity is highly correlated with negative sentiment in social media comments.
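The lexicon-based method mentioned above amounts to flagging a comment as vulgar when it contains any lexicon term. A minimal sketch, with hypothetical English placeholder terms standing in for the actual Bengali lexicon entries and made-up example comments:

```python
# Illustrative lexicon-based vulgarity detection.
# The lexicon terms and comments are hypothetical placeholders.
swear_lexicon = {"darn", "heck", "dang"}

def is_vulgar(comment):
    """Flag a comment if any whitespace-delimited token is in the lexicon."""
    return any(tok in swear_lexicon for tok in comment.lower().split())

comments = ["what the heck is this video",
            "very informative, thank you",
            "darn that was bad",
            "subscribed, great content"]
flags = [is_vulgar(c) for c in comments]
```

As the abstract notes, even a small lexicon can perform well when a handful of swear terms account for most of the vulgar usage, though it cannot catch misspellings or terms outside the list the way trained classifiers can.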

