Identifying Chinese Microblog Users With High Suicide Probability Using Internet-Based Profile and Linguistic Features: Classification Model

Background: Traditional offline assessment of suicide probability is time consuming, and it is difficult to convince at-risk individuals to participate. Identifying individuals with high suicide probability through online social media offers efficiency and the potential to reach hidden individuals, yet little research has focused on this specific field.
Objective: The objective of this study was to apply two classification models, Simple Logistic Regression (SLR) and Random Forest (RF), to examine the feasibility and effectiveness of identifying microblog users in China with high suicide probability through profile and linguistic features extracted from Internet-based data.
Methods: A total of 909 Chinese microblog users completed an Internet survey. Participants scoring one SD above the sample mean on the total Suicide Probability Scale (SPS) score were labeled as high-risk individuals, and the same rule was applied separately to each of the four subscale scores. Profile and linguistic features were fed into two machine learning algorithms (SLR and RF) to train models that identify high-risk individuals in general suicide probability and in each of its four dimensions. Models were trained and tested by 5-fold cross-validation, in which both the training and test sets were generated from the whole sample under a stratified random sampling rule. Three classic performance metrics (Precision, Recall, and F1 measure) and a specifically defined metric, “Screening Efficiency,” were adopted to evaluate model effectiveness.
Results: Classification performance was generally comparable between SLR and RF. At their best, the classification models retrieved over 70% of the labeled high-risk individuals in overall suicide probability as well as in the four dimensions. Screening Efficiency of most models ranged from 1/4 to 1/2. Precision of the models was generally below 30%.
Conclusions: Individuals in China with high suicide probability are recognizable from profile and text-based information on microblogs. Although there is still much room to improve the performance of the classification models, this study may shed light on preliminary screening of at-risk individuals via machine learning algorithms, which can work side by side with expert scrutiny to increase the efficiency of large-scale surveillance of suicide probability on online social media.
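
The labeling rule and the classic evaluation metrics described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the population SD, treats "one SD above the mean" as a strict threshold, and uses a seeded shuffle for stratified fold assignment; all function names are hypothetical. “Screening Efficiency” is not reproduced because its definition is not given here.

```python
import random
from statistics import mean, pstdev


def label_high_risk(scores):
    """Label 1 (high risk) if a score lies more than one SD above the sample mean.

    Assumption: population SD and a strict ">" threshold; the abstract does not specify.
    """
    threshold = mean(scores) + pstdev(scores)
    return [1 if s > threshold else 0 for s in scores]


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds that preserve the class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)  # random sampling within each stratum
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal members of this class round-robin
    return folds


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive (high-risk) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Usage: 10 high-risk and 40 other users; each fold gets the same share of positives.
labels = [1] * 10 + [0] * 40
folds = stratified_folds(labels)
print([sum(labels[i] for i in fold) for fold in folds])  # [2, 2, 2, 2, 2]
```

Because the high-risk label covers only the upper tail of the score distribution, positives are a minority; stratification keeps every test fold from ending up with too few of them.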

Sentiment Analysis on Movie Reviews Using Twitter

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2020.9326 ◽

2020 ◽

Vol 17 (7) ◽

pp. 2869-2875

Author(s):

Sajay Thomas Samuel ◽

Booma Poolan Marikannan

Keyword(s):

Machine Learning ◽

Social Media ◽

Sentiment Analysis ◽

Learning Algorithm ◽

Instant Messaging ◽

Machine Learning Algorithms ◽

Depth Information ◽

Implementation Phase ◽

Online Social Media ◽

Past Data

Machine learning can help people perform complex tasks and solve problems because it uses historical data to learn patterns and make predictions. This research addresses the problem of movie reviews on social media, specifically Twitter: it gathers tweets containing movie reviews and displays a rating based on the sentiment of the tweets. Twitter is an online social media website where people from all walks of life communicate by tweeting short updates within a 280-character limit. Twitter continues to grow as a business and has become one of the biggest platforms for communication and instant messaging. Because of its large number of users, voluminous amounts of data are available that can be used for in-depth insights and for extracting sentiment from tweets. Today, many applications use sentiment analysis in various fields, for example to gain insights about a particular brand or product. Performing sentiment analysis with traditional methods can be time consuming and very complex. The aim of this research is to investigate the domain of sentiment analysis and incorporate a machine learning algorithm to create a system that retrieves and displays the ratings of a particular movie. The machine learning algorithms used are the Naïve Bayes classifier and SVM; the algorithm with better accuracy will be chosen for the implementation phase.
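
As an illustration of the Naïve Bayes approach mentioned above, the following sketch trains a multinomial Naïve Bayes classifier with Laplace (add-one) smoothing on whitespace-tokenized text. It is not the authors' system: the toy training tweets, the "pos"/"neg" class names, and the star-rating mapping in the hypothetical `movie_rating` helper are assumptions made for illustration.

```python
import math
from collections import Counter


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over bag-of-words tokens with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.word_counts[c].update(doc.lower().split())
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, doc, c):
        """Unnormalized log P(c | doc) = log P(c) + sum of log P(word | c)."""
        score = math.log(self.doc_counts[c] / sum(self.doc_counts.values()))
        denom = self.totals[c] + self.alpha * len(self.vocab)
        for w in doc.lower().split():
            if w in self.vocab:  # words never seen in training are skipped
                score += math.log((self.word_counts[c][w] + self.alpha) / denom)
        return score

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_posterior(doc, c))


def movie_rating(model, tweets):
    """Hypothetical mapping: share of tweets predicted 'pos', scaled to 1-5 stars."""
    positive = sum(1 for t in tweets if model.predict(t) == "pos")
    return round(1 + 4 * positive / len(tweets), 1)


model = NaiveBayesSentiment().fit(
    ["great movie loved it", "awesome film great acting",
     "boring movie hated it", "terrible plot awful acting"],
    ["pos", "pos", "neg", "neg"],
)
print(model.predict("great acting"), movie_rating(model, ["great acting", "boring awful"]))  # pos 3.0
```

An SVM would replace the probabilistic scoring with a learned decision boundary over the same kind of bag-of-words features; the abstract states that whichever of the two is more accurate is kept.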

Acoustic and language analysis of speech for suicide ideation among US veterans

10.1101/2020.07.08.20147504 ◽

2020 ◽

Author(s):

Anas Belouali ◽

Samir Gupta ◽

Vaibhav Sourirajan ◽

Jiawei Yu ◽

Nathaniel Allen ◽

...

Keyword(s):

Machine Learning ◽

Suicidal Ideation ◽

High Risk ◽

Suicide Ideation ◽

Statistical Significance ◽

Real Life ◽

Mobile App ◽

Machine Learning Algorithms ◽

Self Report ◽

Linguistic Features

U.S. veterans are 1.5 times more likely to die by suicide than Americans who never served in the military. Given such high rates, there is an urgent need for innovative approaches to objective and clinically applicable assessments that detect individuals at high risk. We hypothesize that the speech of suicidal veterans has a range of distinctive acoustic and linguistic features. The purpose of this work is to build an automated machine learning and natural language processing tool to screen for suicidality. Veterans made 588 narrative audio recordings via a mobile app in a real-life setting. In addition, veterans completed self-report psychiatric scales and questionnaires. Recordings were analyzed to extract voice characteristics, including prosodic, phonation, and glottal features. The recordings were also transcribed to extract textual features for linguistic analysis. We evaluated the acoustic and linguistic features using both statistical significance testing and ensemble feature selection. We also examined the performance of different machine learning algorithms on multiple combinations of features to classify suicidal and non-suicidal recordings. A Random Forest classifier correctly identified suicidal ideation in veterans based on the combined set of acoustic and linguistic features of speech with 86% sensitivity, 70% specificity, and an area under the receiver operating characteristic curve (AUC) of 80%. Speech analysis of audio collected from veterans in everyday settings using smartphones is a promising approach for suicidal ideation detection. A machine learning classifier may eventually help clinicians identify and monitor high-risk veterans.

Acoustic and language analysis of speech for suicidal ideation among US veterans

BioData Mining ◽

10.1186/s13040-021-00245-y ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Anas Belouali ◽

Samir Gupta ◽

Vaibhav Sourirajan ◽

Jiawei Yu ◽

Nathaniel Allen ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Suicidal Ideation ◽

High Risk ◽

Real Life ◽

Mobile App ◽

Machine Learning Algorithms ◽

Self Report ◽

Linguistic Features ◽

US Veterans

Background: Screening for suicidal ideation in high-risk groups such as U.S. veterans is crucial for early detection and suicide prevention. Currently, screening is based on clinical interviews or self-report measures, and both approaches rely on subjects to disclose their suicidal thoughts. Innovative approaches are necessary to develop objective and clinically applicable assessments. Speech has been investigated as an objective marker for understanding various mental states, including suicidal ideation. In this work, we developed a machine learning and natural language processing classifier based on speech markers to screen for suicidal ideation in US veterans.
Methodology: Veterans submitted 588 narrative audio recordings via a mobile app in a real-life setting. In addition, participants completed self-report psychiatric scales and questionnaires. Recordings were analyzed to extract voice characteristics, including prosodic, phonation, and glottal features. The recordings were also transcribed to extract textual features for linguistic analysis. We evaluated the acoustic and linguistic features using both statistical significance testing and ensemble feature selection. We also examined the performance of different machine learning algorithms on multiple combinations of features to classify suicidal and non-suicidal recordings.
Results: A combined set of 15 acoustic and linguistic features of speech was identified by the ensemble feature selection. A Random Forest classifier using the selected set of features correctly identified suicidal ideation in veterans with 86% sensitivity, 70% specificity, and an area under the receiver operating characteristic curve (AUC) of 80%.
Conclusions: Speech analysis of audio collected from veterans in everyday settings using smartphones offers a promising approach for suicidal ideation detection. A machine learning classifier may eventually help clinicians identify and monitor high-risk veterans.
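
The reported metrics can be made concrete with a short sketch. Sensitivity and specificity come from the confusion matrix at a fixed decision threshold, while the AUC can be computed as the probability that a randomly chosen positive recording receives a higher classifier score than a randomly chosen negative one (the Mann-Whitney formulation, with ties counted as one half). The toy labels, scores, and 0.5 threshold are hypothetical; this is not the authors' evaluation code.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0]                            # 1 = suicidal, 0 = non-suicidal
scores = [0.9, 0.4, 0.5, 0.1]                    # hypothetical classifier scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold
print(sensitivity_specificity(y_true, y_pred), roc_auc(y_true, scores))  # (0.5, 0.5) 0.75
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality over all thresholds, which is why it is useful to report both. The pairwise loop is quadratic, which is fine for a few hundred recordings.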

Feature-Based Comparative Study of Machine Learning Algorithms for Credibility Analysis of Online Social Media Content

10.1007/978-981-16-2641-8_2 ◽

2021 ◽

pp. 13-25

Author(s):

Utkarsh Sharma ◽

Shishir Kumar

Keyword(s):

Machine Learning ◽

Social Media ◽

Comparative Study ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Media Content ◽

Online Social Media ◽

Feature Based

Machine-Learning-Based Radiomics MRI Model for Survival Prediction of Recurrent Glioblastomas Treated with Bevacizumab

Diagnostics ◽

10.3390/diagnostics11071263 ◽

2021 ◽

Vol 11 (7) ◽

pp. 1263

Author(s):

Samy Ammari ◽

Raoul Sallé de Chou ◽

Tarek Assi ◽

Mehdi Touat ◽

Emilie Chouzenoux ◽

...

Keyword(s):

Machine Learning ◽

Therapeutic Option ◽

Binary Classification ◽

Progression Free Survival ◽

Recurrent Glioblastoma ◽

Machine Learning Algorithms ◽

Survival Prediction ◽

Classification Models ◽

Angiogenic Therapy ◽

Recurrent GBM

Anti-angiogenic therapy with bevacizumab is a widely used therapeutic option for recurrent glioblastoma (GBM). Nevertheless, the therapeutic response remains highly heterogeneous among GBM patients, with discordant outcomes. Recent data have shown that radiomics, an advanced imaging analysis method, can help predict both prognosis and response to therapy in a multitude of solid tumours. The objective of this study was to identify novel biomarkers, extracted from MRI and clinical data, that could predict overall survival (OS) and progression-free survival (PFS) in GBM patients treated with bevacizumab, using machine-learning algorithms. In a cohort of 194 recurrent GBM patients (age range 18–80), radiomics data from pre-treatment T2 FLAIR and gadolinium-injected MRI images, along with clinical features, were analysed. Binary classification models for OS at 9, 12, and 15 months were evaluated. The classification models successfully stratified OS: the AUCs were 0.78, 0.85, and 0.76 on the test sets (0.79, 0.82, and 0.87 on the training sets) for the 9-, 12-, and 15-month endpoints, respectively. Regression models yielded a C-index of 0.64 (0.74) for OS and 0.57 (0.69) for PFS. These results suggest that radiomics could assist in building a predictive model for treatment selection in recurrent GBM patients.
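
The C-index reported for the regression models measures how well predicted risk orders patients by survival time. A minimal sketch of Harrell's concordance index follows; it assumes that higher scores mean higher risk, skips pairs with tied survival times for simplicity, and uses hypothetical toy data. It is not the authors' implementation.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ordered correctly by predicted risk.

    A pair (i, j) is comparable when patient i had the event (events[i] is true)
    and times[i] < times[j]. It is concordant when risk_scores[i] > risk_scores[j];
    tied risk scores count as 0.5. Pairs with tied times are skipped here.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical survival times in months; events: 1 = death observed, 0 = censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
print(harrell_c_index(times, events, [0.9, 0.6, 0.7, 0.1]))  # 0.8
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives a reference point for the values of 0.64 (OS) and 0.57 (PFS) quoted above.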

Hybrid features prediction model of movie quality using Multi-machine learning techniques for effective business resource planning

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-201844 ◽

2021 ◽

Vol 40 (5) ◽

pp. 9361-9382 ◽

Author(s):

Naeem Iqbal ◽

Rashid Ahmad ◽

Faisal Jamil ◽

Do-Hyeun Kim

Keyword(s):

Machine Learning ◽

Social Media ◽

Resource Planning ◽

Experimental Results ◽

Quality Prediction ◽

Classification Models ◽

Hybrid Features ◽

Social Media Data ◽

Media Data

Quality prediction plays an essential role in the business outcome of a product, and because of this business interest the concept has been studied extensively in recent years. With advances in machine learning (ML) techniques and the advent of robust and sophisticated ML algorithms, it has become necessary to analyze the factors influencing the success of movies. This paper presents a hybrid features prediction model based on pre-release and social media data features, using multiple ML techniques to predict the quality of pre-release movies for effective business resource planning. This study aims to integrate pre-release and social media data features to form a hybrid features-based movie quality prediction (MQP) model. The proposed model comprises two experimental models: (i) predicting movie quality using the original set of features, and (ii) developing a subset of features based on principal component analysis to predict the movie success class. This work implements different ML-based classification models, namely Decision Tree (DT), Support Vector Machines with linear and quadratic kernels (L-SVM and Q-SVM), Logistic Regression (LR), Bagged Tree (BT), and Boosted Tree (BOT), to predict movie quality. Several performance measures are used to evaluate the proposed classification models: Accuracy (AC), Precision (PR), Recall (RE), and F-Measure (FM). The experimental results show that the BT and BOT classifiers achieved higher accuracy than the other classifiers (DT, LR, L-SVM, and Q-SVM), at 90.1% and 89.7%, respectively, which demonstrates the efficiency of the proposed MQP model compared with other state-of-the-art techniques. The proposed work is also compared with existing prediction models, and the experimental results indicate that the proposed MQP model performs slightly better than these models.
The experimental results will help the movie industry plan business resources effectively, such as investment, number of screens, and release dates.
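
The second experimental model reduces the feature set with principal component analysis before classification. A minimal NumPy sketch of that step is shown below, computing the components from the singular value decomposition of the mean-centered feature matrix. The toy matrix is hypothetical, and the number of components to keep is left to the caller, since the abstract does not state the authors' criterion.

```python
import numpy as np


def pca_transform(X, n_components):
    """Project the rows of X onto the top principal components.

    Returns the projected data and the fraction of total variance that each
    retained component explains.
    """
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained = (s**2 / np.sum(s**2))[:n_components]
    return X_centered @ components.T, explained


# Hypothetical movie features: the second column nearly duplicates the first.
X = [[1.0, 2.0, 0.5], [2.0, 4.1, 0.4], [3.0, 6.0, 0.6], [4.0, 7.9, 0.5]]
Z, ratio = pca_transform(X, n_components=1)
print(Z.shape, ratio.round(3))
```

Keeping only the leading components removes redundancy among highly correlated features, such as the near-duplicate columns here, at the cost of making the classifier inputs less directly interpretable.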

Multivariate Analysis for the Classification of Chocolate According to its Percentage of Cocoa by Using Terahertz Time-Domain Spectroscopy (THz-TDS)

Proceedings ◽

10.3390/foods_2020-08029 ◽

2020 ◽

Vol 70 (1) ◽

pp. 109

Author(s):

Jimy Oblitas ◽

Jorge Ruiz

Keyword(s):

Machine Learning ◽

Time Domain ◽

Electromagnetic Pulse ◽

Machine Learning Algorithms ◽

Classification Models ◽

Terahertz Time Domain Spectroscopy ◽

Time Domain Spectroscopy ◽

SVM Algorithm ◽

Classification Of Images

Terahertz time-domain spectroscopy is a useful technique for determining some physical characteristics of materials and is based on the selective frequency absorption of a broad-spectrum electromagnetic pulse. To investigate the potential of this technology to classify cocoa percentages in chocolates, the terahertz spectra (0.5–10 THz) of five chocolate samples (50%, 60%, 70%, 80%, and 90% cocoa) were examined. The acquired data matrices were analyzed in MATLAB 2019b, from which the dielectric function and the absorbance curves were obtained, and the data were classified using 24 mathematical classification models. The best result, a differentiation of around 93%, was obtained by a Gaussian SVM model with a kernel scale of 0.35 and a one-against-one multiclass method. It was concluded that combining the processing and classification of images obtained from terahertz time-domain spectroscopy with machine learning algorithms can successfully classify chocolates with different percentages of cocoa.
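
The best-performing configuration combines a Gaussian kernel with one-against-one multiclass coding. The sketch below illustrates both ideas under stated assumptions. The kernel scale follows what is assumed here to be MATLAB's convention of dividing the predictors by the scale before applying exp(-||x - z||^2). Each of the C(5, 2) = 10 pairwise decisions is made by a hypothetical stand-in (kernel similarity to a class prototype) rather than a trained SVM, and the two-value prototype "spectra" are made-up toy data.

```python
import math
from collections import Counter
from itertools import combinations


def gaussian_kernel(x, z, kernel_scale=0.35):
    """exp(-||x/s - z/s||^2): inputs are divided by the kernel scale s first."""
    sq_dist = sum(((a - b) / kernel_scale) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist)


def one_vs_one_predict(x, classes, pairwise_decide):
    """Run one binary decision per class pair; the class with the most votes wins."""
    votes = Counter(pairwise_decide(a, b, x) for a, b in combinations(classes, 2))
    return max(classes, key=lambda c: votes[c])


def make_prototype_decider(prototypes, kernel_scale=0.35):
    """Hypothetical stand-in for a trained binary SVM on each class pair."""
    def decide(a, b, x):
        ka = gaussian_kernel(x, prototypes[a], kernel_scale)
        kb = gaussian_kernel(x, prototypes[b], kernel_scale)
        return a if ka >= kb else b
    return decide


cocoa_classes = [50, 60, 70, 80, 90]
prototypes = {c: [c / 100, 1 - c / 100] for c in cocoa_classes}  # toy features
decide = make_prototype_decider(prototypes)
print(len(list(combinations(cocoa_classes, 2))),
      one_vs_one_predict([0.62, 0.39], cocoa_classes, decide))  # 10 60
```

With five cocoa classes, one-against-one coding trains ten binary models, each seeing only two classes; a smaller kernel scale makes the kernel more local, so similarity drops off faster with distance between spectra.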

Prediction of social media effects on students’ academic performance using Machine Learning Algorithms (MLAs)

Journal of Computers in Education ◽

10.1007/s40692-021-00201-z ◽

2021 ◽

Author(s):

Isaac Kofi Nti ◽

Samuel Akyeramfo-Sam ◽

Bright Bediako-Kyeremeh ◽

Sylvester Agyemang

Keyword(s):

Machine Learning ◽

Social Media ◽

Academic Performance ◽

Media Effects ◽

Learning Algorithms ◽

Machine Learning Algorithms

Classification of unlabeled online media

Scientific Reports ◽

10.1038/s41598-021-85608-5 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Sakthi Kumar Arul Prakash ◽

Conrad Tucker

Keyword(s):

Social Media ◽

Real World ◽

Graphical Model ◽

Ground Truth ◽

Classification Problem ◽

Machine Learning Algorithms ◽

Social Media Networks ◽

Online Social Media ◽

Wide Range

This work investigates the ability to classify misinformation in online social media networks in a manner that avoids the need for ground truth labels. Rather than treating classification as a task for humans or machine learning algorithms, this work leverages user–user and user–media (i.e., media likes) interactions to infer the type of information (fake vs. authentic) being spread, without needing to know the actual details of the information itself. To study the inception and evolution of user–user and user–media interactions over time, we create an experimental platform that mimics the functionality of real-world social media networks. We develop a graphical model that considers the evolution of this network topology to model the uncertainty (entropy) propagation when fake and authentic media disseminate across the network. The creation of a real-world social media network enables a wide range of hypotheses to be tested pertaining to users, their interactions with other users, and their interactions with media content. The discovery that the entropy of user–user and user–media interactions approximates fake and authentic media likes enables us to classify fake media in an unsupervised manner.
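
The central quantity in this approach is the entropy of interactions. As a minimal, hypothetical illustration (not the authors' graphical model), the Shannon entropy of a distribution of likes over media items can be computed as follows: likes concentrated on one item give zero entropy, while likes spread evenly over many items give high entropy. The media identifiers are made up.

```python
import math
from collections import Counter


def interaction_entropy(interactions):
    """Shannon entropy, in bits, of how interactions are spread across targets."""
    counts = Counter(interactions)
    total = sum(counts.values())
    probs = [n / total for n in counts.values()]
    return sum(-p * math.log2(p) for p in probs)


concentrated = ["m1", "m1", "m1", "m1"]  # every like goes to one media item
spread = ["m1", "m2", "m3", "m4"]        # likes spread evenly over four items
print(interaction_entropy(concentrated), interaction_entropy(spread))
```

Tracking this value as likes accumulate over time is one simple way to see how uncertainty in a network's interactions evolves; the study's model goes further by propagating entropy through the evolving network topology.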

Data-driven inferences of agency-level risk and response communication on COVID-19 through social media-based interactions

Journal of Emergency Management ◽

10.5055/jem.0589 ◽

2021 ◽

Vol 19 (7) ◽

pp. 59-82

Author(s):

Md Ashraf Ahmed, PhD Candidate ◽

Arif Mohaimin Sadri, PhD ◽

M. Hadi Amini, PhD, DEng

Keyword(s):

Public Health ◽

Social Media ◽

Information Dissemination ◽

Topic Model ◽

Face Mask ◽

Community Response ◽

Machine Learning Algorithms ◽

Data Driven ◽

Contact Tracing ◽

Online Social Media

Risk perception and risk-averting behaviors of public agencies during the emergence and spread of COVID-19 can be retrieved through online social media (Twitter), and such interactions can be echoed in other information outlets. This study collected time-sensitive online social media data and used data-driven methods to analyze patterns of health risk communication by public health and emergency agencies during the emergence and spread of the novel coronavirus. The major focus is on understanding how policy-making agencies communicate risk and response information through social media during a pandemic, and how this communication influences community response (e.g., timing of lockdown and timing of reopening) and disease outbreak indicators (i.e., number of confirmed cases and number of deaths). Twitter data from six major public organizations (1,000–4,500 tweets per organization) were collected from February 21, 2020 to June 6, 2020. Several machine learning methods, including dynamic topic modeling and sentiment analysis, were applied over time to identify topic dynamics across the timeline of the pandemic. The organizations emphasized various topics at different points in the timeline, e.g., the importance of wearing face masks, home quarantine, understanding the symptoms, social distancing and contact tracing, emerging community transmission, lack of personal protective equipment, COVID-19 testing and medical supplies, the effect of tobacco, pandemic stress management, increasing hospitalization rates, the upcoming hurricane season, use of convalescent plasma for COVID-19 treatment, maintaining hygiene, and the role of healthcare podcasts. The findings can help emergency management, policymakers, and public health agencies identify targeted information dissemination policies for publics with diverse needs, based on how local, federal, and international agencies reacted to COVID-19.
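
Dynamic topic models are considerably more involved than can be shown here, but the basic idea of tracking what agencies talk about over time can be illustrated with a much simpler stand-in: bucketing dated tweets by calendar week (starting on Monday) and counting mentions of chosen terms. The tweets and terms below are hypothetical, and this is not the method used in the study.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta


def weekly_term_counts(tweets, terms):
    """Count term mentions per week; tweets are (date, text) pairs."""
    buckets = defaultdict(Counter)
    for day, text in tweets:
        week_start = day - timedelta(days=day.weekday())  # Monday of that week
        words = text.lower().split()
        for term in terms:
            buckets[week_start][term] += words.count(term)
    return dict(sorted(buckets.items()))


tweets = [
    (date(2020, 4, 6), "Wear a mask in public"),
    (date(2020, 4, 8), "mask supplies and testing sites"),
    (date(2020, 4, 14), "New testing sites open"),
]
for week, counts in weekly_term_counts(tweets, ["mask", "testing"]).items():
    print(week, dict(counts))
```

A topic model replaces the fixed term list with word distributions learned from the data, and a dynamic topic model additionally lets those distributions drift from one time slice to the next.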
