Creating a Chinese suicide dictionary for identifying suicide risk on social media

PeerJ ◽  
2015 ◽  
Vol 3 ◽  
pp. e1455 ◽  
Author(s):  
Meizhen Lv ◽  
Ang Li ◽  
Tianli Liu ◽  
Tingshao Zhu

Introduction. Suicide has become a serious worldwide epidemic. Early detection of individual suicide risk in the population is important for reducing suicide rates. Traditional methods are ineffective in identifying suicide risk in time, suggesting a need for novel techniques. This paper proposes to detect suicide risk on social media using a Chinese suicide dictionary. Methods. To build the Chinese suicide dictionary, eight researchers were recruited to select initial words from 4,653 posts published on Sina Weibo (the largest social media service provider in China) and two Chinese sentiment dictionaries (HowNet and NTUSD). Then, another three researchers were recruited to filter out irrelevant words. Finally, the remaining words were further expanded using a corpus-based method. After building the Chinese suicide dictionary, we tested its performance in identifying suicide risk on Weibo. First, we compared dictionary-based identifications with expert ratings, both in detecting suicidal expression in Weibo posts and in evaluating individual levels of suicide risk. Second, to differentiate between individuals with high and non-high scores on a self-rating measure of suicide risk (Suicidal Possibility Scale, SPS), we built Support Vector Machines (SVM) models on the Chinese suicide dictionary and the Simplified Chinese Linguistic Inquiry and Word Count (SCLIWC) program, respectively. We then compared the classification performance of the two types of SVM models. Results and Discussion. Dictionary-based identifications were significantly correlated with expert ratings in terms of both detecting suicidal expression (r = 0.507) and evaluating individual suicide risk (r = 0.455). For the differentiation between individuals with high and non-high scores on SPS, the Chinese suicide dictionary (t1: F1 = 0.48; t2: F1 = 0.56) produced a more accurate identification than SCLIWC (t1: F1 = 0.41; t2: F1 = 0.48) on different observation windows. Conclusions. This paper confirms that, using social media, it is possible to implement real-time monitoring of individual suicide risk in the population. Results of this study may be useful to improve Chinese suicide prevention programs and may be insightful for other countries.
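
The comparison between dictionary-based identifications and expert ratings can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the dictionary entries and weights, the English placeholder tokens (real Weibo posts would first need Chinese word segmentation), and the expert ratings. It shows only the general idea of scoring posts against a weighted word list and computing a Pearson correlation, not the authors' actual procedure.

```python
import math

# Hypothetical weighted dictionary; the paper's real entries are not listed here.
SUICIDE_DICT = {"hopeless": 2.0, "die": 3.0, "goodbye": 1.0}


def score_post(tokens, dictionary=SUICIDE_DICT):
    """Sum the weights of all dictionary words that occur in a tokenized post."""
    return sum(dictionary.get(token, 0.0) for token in tokens)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up posts (already tokenized) and made-up expert ratings.
posts = [
    ["i", "feel", "hopeless"],
    ["nice", "weather", "today"],
    ["i", "want", "to", "die", "goodbye"],
    ["hopeless", "and", "ready", "to", "die"],
]
expert_ratings = [1, 0, 4, 3]

dictionary_scores = [score_post(post) for post in posts]
r = pearson_r(dictionary_scores, expert_ratings)
```

A correlation computed this way plays the same role as the r values reported in the abstract, although the numbers produced by this toy data carry no meaning.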

2020 ◽  
Vol 4 (2) ◽  
pp. 329-335
Author(s):  
Rusydi Umar ◽  
Imam Riadi ◽  
Purwono

The failure of most startups in Indonesia is caused by team performance that is not solid and competent. Programmers are an integral profession in a startup team. The development of social media can be used as a strategic tool for recruiting the best programmer candidates in a company. This strategic tool takes the form of an automatic classification system for social media posts from prospective programmers. The classification results are expected to predict the performance pattern of each candidate with a label of good or bad performance. To obtain an effective tool, the classification method with the best accuracy must be chosen, which requires a comparison of several methods. This study compares the Support Vector Machines (SVM), Random Forest (RF) and Stochastic Gradient Descent (SGD) classification methods. With 10-fold cross-validation, accuracy reaches 81.3% for SVM, 74.4% for RF, and 80.1% for SGD, so the SVM method is chosen as the model for classifying programmer performance from social media activity.
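
The 10-fold cross-validation used to compare the classifiers can be sketched in plain Python. This is an illustration under stated assumptions, not the study's code: the data are made up, the folds are contiguous and unshuffled for reproducibility, and a majority-class predictor (`fit_majority`, `predict_majority`, both hypothetical names) stands in for SVM, RF, or SGD, any of which could be plugged in through the same `fit`/`predict` pair.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_accuracy(fit, predict, X, y, k=10):
    """Average held-out accuracy over k folds."""
    accuracies = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


# Stand-in classifier: always predicts the most frequent training label.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model


# Made-up data: every fourth candidate is labeled "bad".
X = list(range(20))
y = ["bad" if i % 4 == 0 else "good" for i in range(20)]
acc = cross_val_accuracy(fit_majority, predict_majority, X, y, k=10)
```

Running the same function once per candidate classifier and comparing the averaged accuracies mirrors the kind of comparison the abstract reports.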


2018 ◽  
Vol 5 (9) ◽  
pp. 180160 ◽  
Author(s):  
Jian-hong Yang ◽  
Huai-ying Fang ◽  
Ren-cheng Zhang ◽  
Kai Yang

Arc faults in low-voltage electrical circuits are the main hidden cause of electric fires. Accurate identification of arc faults is essential for safe power consumption. In this paper, a detection algorithm for arc faults is tested in a low-voltage circuit. With capacitance coupling and a logarithmic detector, the high-frequency radiation characteristics of arc faults can be extracted. A rapid method for computing the current waveform slope characteristics of an arc fault provides another characteristic. Current waveform periodic integral characteristics can be extracted according to asymmetries of the arc faults. These three characteristics are used to develop a detection algorithm for arc faults based on multi-information fusion and support vector machine learning models. The tests indicated that for series arc faults with single and combination loads and for parallel arc faults between metallic contacts and along carbonization paths, the recognition algorithm could effectively avoid the problems of crosstalk and signal loss during arc fault detection.


2021 ◽  
Vol 39 (4) ◽  
pp. 1190-1197
Author(s):  
Y. Ibrahim ◽  
E. Okafor ◽  
B. Yahaya

Manual grid-search tuning of machine learning hyperparameters is very time-consuming. Hence, to curb this problem, we propose the use of a genetic algorithm (GA) for the selection of optimal radial-basis-function based support vector machine (RBF-SVM) hyperparameters: the regularization parameter C and the cost-factor γ. The resulting optimal parameters were used during the training of face recognition models. To train the models, we independently extracted features from the ORL face image dataset using local binary patterns (handcrafted) and deep learning architectures (pretrained variants of VGGNet). The resulting features were passed as input to either linear-SVM or optimized RBF-SVM. The results show that the models from optimized RBF-SVM combined with deep learning or handcrafted features yielded performances that surpass models obtained from linear-SVM combined with the aforementioned features in most of the data splits. The study demonstrated that it is profitable to optimize the hyperparameters of an SVM to obtain the best classification performance. Keywords: Face Recognition, Feature Extraction, Local Binary Patterns, Transfer Learning, Genetic Algorithm and Support Vector Machines.
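
A genetic search over the two hyperparameters can be sketched as follows. This is a generic GA written for illustration, not the authors' implementation: the selection scheme, mutation size, search bounds, and population settings are all assumptions, and `toy_fitness` is a hypothetical stand-in for the cross-validated accuracy that a real run would obtain by training an RBF-SVM for each candidate. The search works in log10 space, a common convention for C and γ.

```python
import math
import random


def toy_fitness(log_c, log_gamma):
    """Hypothetical stand-in for validation accuracy of an RBF-SVM.

    Peaks at log10(C) = 1 and log10(gamma) = -2 by construction.
    """
    return math.exp(-((log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2) / 4.0)


def clip(value, low, high):
    """Keep a mutated gene inside its search bounds."""
    return max(low, min(high, value))


def genetic_search(fitness, bounds, pop_size=20, generations=30,
                   mutation_sd=0.3, seed=0):
    """Maximize fitness with truncation selection, uniform crossover,
    Gaussian mutation, and one-individual elitism."""
    rng = random.Random(seed)
    population = [tuple(rng.uniform(lo, hi) for lo, hi in bounds)
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda ind: fitness(*ind), reverse=True)
        parents = ranked[:pop_size // 2]   # keep the fitter half as parents
        children = [ranked[0]]             # elitism: best survives unchanged
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            child = tuple(
                clip(rng.choice((m, f)) + rng.gauss(0.0, mutation_sd), lo, hi)
                for m, f, (lo, hi) in zip(mother, father, bounds)
            )
            children.append(child)
        population = children
    return max(population, key=lambda ind: fitness(*ind))


# Assumed search ranges for log10(C) and log10(gamma).
BOUNDS = [(-3.0, 3.0), (-5.0, 1.0)]
best_log_c, best_log_gamma = genetic_search(toy_fitness, BOUNDS)
best_c, best_gamma = 10 ** best_log_c, 10 ** best_log_gamma
```

Because each fitness evaluation in a real setting means training and validating an SVM, the population size and number of generations directly determine the total training budget.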


2016 ◽  
Vol 2016 ◽  
pp. 1-9 ◽  
Author(s):  
Abbas Akkasi ◽  
Ekrem Varoğlu ◽  
Nazife Dimililer

Named Entity Recognition (NER) from text constitutes the first step in many text mining applications. The most important preliminary step for NER systems using machine learning approaches is tokenization, where raw text is segmented into tokens. This study proposes an enhanced rule-based tokenizer, ChemTok, which utilizes rules extracted mainly from the training data set. The main novelty of ChemTok is the use of the extracted rules to merge the tokens split in the previous steps, thus producing longer and more discriminative tokens. ChemTok is compared to the tokenization methods utilized by ChemSpot and tmChem. Support Vector Machines and Conditional Random Fields are employed as the learning algorithms. The experimental results show that the classifiers trained on the output of ChemTok outperform all classifiers trained on the output of the other two tokenizers in terms of classification performance and the number of incorrectly segmented entities.


2012 ◽  
Vol 9 (3) ◽  
pp. 33-43 ◽  
Author(s):  
Paulo Gaspar ◽  
Jaime Carbonell ◽  
José Luís Oliveira

Summary. Classifying biological data is a common task in the biomedical context. Predicting the class of new, unknown information allows researchers to gain insight and make decisions based on the available data. Also, using classification methods often implies choosing the best parameters to obtain optimal class separation, and the number of parameters might be large in biological datasets. Support Vector Machines provide a well-established and powerful classification method to analyse data and find the minimal-risk separation between different classes. Finding that separation strongly depends on the available feature set and the tuning of hyper-parameters. Techniques for feature selection and SVM parameter optimization are known to improve classification accuracy, and the literature on them is extensive. In this paper we review the strategies that are used to improve the classification performance of SVMs and perform our own experimentation to study the influence of features and hyper-parameters in the optimization process, using several known kernels.


2013 ◽  
Vol 22 (01) ◽  
pp. 1250038 ◽  
Author(s):  
PEERAPON VATEEKUL ◽  
SAREEWAN DENDAMRONGVIT ◽  
MIROSLAV KUBAT

In “multi-label domains,” where the same example can simultaneously belong to two or more classes, it is customary to induce a separate binary classifier for each class, and then use them all in parallel. As a result, some of these classifiers are induced from imbalanced training sets where one class outnumbers the other, a circumstance known to hurt some machine learning paradigms. In the case of Support Vector Machines (SVM), this suboptimal behavior is explained by the fact that SVM seeks to minimize error rate, a criterion that is misleading in domains of this type. This is why several research groups have studied mechanisms to readjust the bias of SVM's hyperplane. The best of these achieves very good classification performance at the price of impractically high computational costs. We propose here an improvement where these costs are reduced to a small fraction without significantly impairing classification.
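
Why error rate misleads on imbalanced training sets can be shown with a small worked example. The data below are made up (5 positives among 100 examples), and the classifier is a deliberately degenerate one that always predicts the majority class; the sketch only illustrates the general point and does not reproduce the paper's experiments.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples classified correctly (1 minus the error rate)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up imbalanced labels: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100  # degenerate classifier ignoring the rare class

acc = accuracy(y_true, always_negative)   # 0.95
f1 = f1_score(y_true, always_negative)    # 0.0
```

The degenerate classifier has a low error rate yet never finds a single positive example, which is the behavior that motivates readjusting the hyperplane's bias.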


Author(s):  
Muhammet Sinan Basarslan ◽  
Fatih Kayaalp

Social media has become an important part of our everyday life due to the widespread use of the Internet. Of the social media services, Twitter is among the most used around the world. People share their opinions by writing tweets about numerous subjects, such as politics, sports, and the economy. Millions of tweets per day create a huge dataset, which has drawn the attention of data scientists to these data for sentiment analysis. Sentiment analysis aims to identify the social media posts of users about a specific topic and categorize them as positive, negative or neutral. This study investigates the effect of the type of text representation on the performance of sentiment analysis. Two datasets were used in the experiments. The first is a set of user reviews of movies from IMDB, labeled by Kotzias, and the second is a set of English-language tweets about health topics posted in 2019, collected using the Twitter API. The Python programming language was used both for implementing the classification models with the Naïve Bayes (NB), Support Vector Machines (SVM) and Artificial Neural Networks (ANN) algorithms, and for categorizing the sentiments as positive, negative and neutral. Feature extraction from the datasets was performed using the Term Frequency-Inverse Document Frequency (TF-IDF) and Word2Vec (W2V) modeling techniques. The success percentages of the classification algorithms were compared at the end. According to the experimental results, the Artificial Neural Network had the best accuracy on both datasets.
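
The TF-IDF representation mentioned above can be sketched in a few lines. This uses one textbook variant (relative term frequency multiplied by the natural log of N/df), which is an assumption; the study does not state which weighting it used, and common libraries apply smoothed variants that give different numbers. The tiny corpus is made up, and documents are assumed to be non-empty and already tokenized.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    tf  = count of term in document / document length
    idf = ln(number of documents / number of documents containing term)
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Made-up mini corpus of tokenized reviews.
corpus = [
    ["good", "movie"],
    ["bad", "movie"],
]
vectors = tf_idf(corpus)
```

In this variant a term that appears in every document, such as "movie" here, receives a weight of zero, while terms that distinguish documents, such as "good" and "bad", receive positive weights. The resulting vectors are what a classifier such as NB, SVM, or ANN would take as input.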


2020 ◽  
Vol 10 (19) ◽  
pp. 6979
Author(s):  
Minho Ryu ◽  
Kichun Lee

Support vector machines (SVMs) are a well-known classifier due to their superior classification performance. They are defined by a hyperplane, which separates two classes with the largest margin. In the computation of the hyperplane, however, it is necessary to solve a quadratic programming problem. The storage cost of a quadratic programming problem grows with the square of the number of training sample points, and the time complexity is proportional to the cube of the number in general. Thus, it is worth studying how to reduce the training time of SVMs without compromising the performance to prepare for sustainability in large-scale SVM problems. In this paper, we proposed a novel data reduction method for reducing the training time by combining decision trees and relative support distance. We applied a new concept, relative support distance, to select good support vector candidates in each partition generated by the decision trees. The selected support vector candidates improved the training speed for large-scale SVM problems. In experiments, we demonstrated that our approach significantly reduced the training time while maintaining good classification performance in comparison with existing approaches.
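
The quadratic storage and cubic time growth described above can be made concrete with simple arithmetic. The sketch assumes a dense kernel matrix stored in full with 8-byte floating-point entries and a training time exactly proportional to the cube of the sample count; practical solvers use caching and decomposition, so these figures are upper-bound style illustrations rather than measurements.

```python
def kernel_matrix_bytes(n_samples, bytes_per_entry=8):
    """Memory for a dense n x n kernel matrix (assumed 8-byte entries)."""
    return n_samples ** 2 * bytes_per_entry


def relative_training_time(scale_factor):
    """How much longer training takes when n grows by scale_factor,
    assuming time proportional to n cubed."""
    return scale_factor ** 3


small = kernel_matrix_bytes(10_000)    # 800,000,000 bytes, about 0.8 GB
large = kernel_matrix_bytes(100_000)   # 80,000,000,000 bytes, about 80 GB
slowdown = relative_training_time(10)  # 1000 times longer for 10x the data
```

Under these assumptions, halving the number of training points cuts storage to a quarter and training time to an eighth, which is why selecting a small set of good support vector candidates pays off.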

