Detecting Recovery Problems Just in Time: Application of Automated Linguistic Analysis and Supervised Machine Learning to an Online Substance Abuse Forum (Preprint)

2018
Author(s):
Rachel Kornfield
Prathusha K Sarma
Dhavan V Shah
Fiona McTavish
Gina Landucci
...

BACKGROUND Online discussion forums allow those in addiction recovery to seek help through text-based messages, including when facing triggers to drink or use drugs. Trained staff (or “moderators”) may participate within these forums to offer guidance and support when participants are struggling but must expend considerable effort to continually review new content. Demands on moderators limit the scalability of evidence-based digital health interventions. OBJECTIVE Automated identification of recovery problems could allow moderators to engage in more timely and efficient ways with participants who are struggling. This paper aimed to investigate whether computational linguistics and supervised machine learning can be applied to successfully flag, in real time, those discussion forum messages that moderators find most concerning. METHODS Training data came from a trial of a mobile phone-based health intervention for individuals in recovery from alcohol use disorder, with human coders labeling discussion forum messages according to whether or not authors mentioned problems in their recovery process. Linguistic features of these messages were extracted via several computational techniques: (1) a Bag-of-Words approach, (2) the dictionary-based Linguistic Inquiry and Word Count program, and (3) a hybrid approach combining the most important features from both Bag-of-Words and Linguistic Inquiry and Word Count. These features were applied within binary classifiers leveraging several methods of supervised machine learning: support vector machines, decision trees, and boosted decision trees. Classifiers were evaluated in data from a later deployment of the recovery support intervention. 
RESULTS To distinguish recovery problem disclosures, the Bag-of-Words approach relied on domain-specific language, including words explicitly linked to substance use and mental health (“drink,” “relapse,” “depression,” and so on), whereas the Linguistic Inquiry and Word Count approach relied on language characteristics such as tone, affect, insight, and the presence of quantifiers, time references, and pronouns. A boosted decision tree classifier utilizing features from both Bag-of-Words and Linguistic Inquiry and Word Count performed best in identifying problems disclosed within the discussion forum, achieving 88% sensitivity and 82% specificity in a separate cohort of patients in recovery. CONCLUSIONS Differences in language use can distinguish messages disclosing recovery problems from other message types. Incorporating machine learning models based on language use allows real-time flagging of concerning content so that trained staff may engage more efficiently and focus their attention on time-sensitive issues.
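As a rough illustration of the two ideas in this abstract, the sketch below turns a message into Bag-of-Words counts and computes sensitivity and specificity from binary labels. It is not the authors' pipeline: the function names, the sample message, and the label vectors are hypothetical, and only the three vocabulary words are taken from the abstract.

```python
import re
from collections import Counter

# Words quoted in the abstract; everything else below is a hypothetical example.
VOCABULARY = ["drink", "relapse", "depression"]


def bag_of_words(message, vocabulary):
    """Return how often each vocabulary word occurs in the message."""
    tokens = re.findall(r"[a-z']+", message.lower())
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def sensitivity_specificity(y_true, y_pred):
    """Compute (sensitivity, specificity) for labels where 1 = problem disclosed."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical forum message and hypothetical coder/classifier labels.
features = bag_of_words(
    "I had a drink last night and I fear a relapse. One drink!", VOCABULARY
)
human_labels = [1, 1, 1, 0, 0, 0, 0, 1]
model_labels = [1, 1, 0, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(human_labels, model_labels)
```

A feature vector like `features` would be the input to a classifier such as the boosted decision trees named in the abstract; the classifier itself is omitted here.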

2021
Vol 11 (15)
pp. 6728
Author(s):
Muhammad Asfand Hafeez
Muhammad Rashid
Hassan Tariq
Zain Ul Abideen
Saud S. Alotaibi
...

Classification and regression are the major applications of machine learning algorithms, which are widely used to solve problems in numerous domains of engineering and computer science. Different classifiers based on the optimization of the decision tree have been proposed; however, the field is still evolving. This paper presents a novel and robust classifier based on decision tree and tabu search algorithms. To improve performance, the proposed algorithm constructs multiple decision trees while employing a tabu search algorithm to consistently monitor the leaf and decision nodes in the corresponding decision trees. Additionally, the tabu search algorithm is responsible for balancing the entropy of the corresponding decision trees. For training the model, we used the clinical data of COVID-19 patients to predict whether a patient is suffering from the disease. The experimental results were obtained using our proposed classifier based on the built-in scikit-learn library in Python. An extensive performance comparison against conventional supervised machine learning algorithms is presented using Big O and statistical analysis. Moreover, a performance comparison to optimized state-of-the-art classifiers is also presented. The achieved accuracy of 98%, the required execution time of 55.6 ms, and the area under the receiver operating characteristic curve (AUROC) of 0.95 for the proposed method indicate that the proposed classifier is suitable for large datasets.


2019
Vol 8 (11)
pp. e298111473
Author(s):
Hugo Kenji Rodrigues Okada
Andre Ricardo Nascimento das Neves
Ricardo Shitsuka

Decision trees are data structures or computational methods that enable nonparametric supervised machine learning and are used in classification and regression tasks. The aim of this paper is to present a comparison between the decision tree induction algorithms C4.5 and CART. A quantitative study is performed in which the two methods are compared in terms of operation and complexity. The experiments showed practically equal hit percentages; in execution time for tree induction, however, the CART algorithm was approximately 46.24% slower than C4.5, and it was considered to be more effective.
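As background not stated in the abstract: C4.5 chooses splits with an entropy-based criterion (gain ratio), while CART commonly uses Gini impurity for classification. The minimal sketch below computes both impurity measures for a list of class labels; the function names and the sample labels are hypothetical, and it does not reproduce either induction algorithm or the paper's experiments.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy in bits, the impurity measure underlying C4.5's criterion."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity, the measure CART commonly uses for classification."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


# Hypothetical node contents: an evenly mixed node and a pure node.
mixed = ["yes", "yes", "no", "no"]
pure = ["yes", "yes", "yes", "yes"]

mixed_entropy, mixed_gini = entropy(mixed), gini(mixed)
pure_entropy, pure_gini = entropy(pure), gini(pure)
```

Both measures are zero for a pure node and maximal for an evenly mixed one, which is why the two algorithms often pick similar splits even though their criteria differ.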


2018
Vol 5 (suppl_1)
pp. S618-S618
Author(s):
Philip Zachariah
Elioth Mirsha Sanabria Buenaventura
Jianfang Liu
Bevin Cohen
David Yao
...

Sensors
2020
Vol 20 (3)
pp. 800
Author(s):
Irshad Khan
Seonhwa Choi
Young-Woo Kwon

Detecting earthquakes in real time using smartphones or IoT devices is a challenging task, not only because of the hard real-time constraint but also because earthquake signals resemble non-earthquake signals (i.e., noise or other activities). Moreover, the variety of human activities makes detection more difficult when a smartphone is used as an earthquake-detecting sensor. To that end, in this article, we leverage a machine learning technique with earthquake features rather than traditional seismic methods. First, we split the detection task into two categories: static environment and dynamic environment. Then, we experimentally evaluate different features and propose the most appropriate machine learning model and features for the static environment to tackle the issue of noisy components and detect earthquakes in real time with lower false alarm rates. The proposed model shows promising results not only on the given dataset but also on unseen data, pointing to the generalization characteristics of the model. Finally, we demonstrate that the proposed model can also be used in the dynamic environment if it is trained with a different dataset.


2015
Vol 23 (e1)
pp. e2-e10
Author(s):
Sean Barnes
Eric Hamrock
Matthew Toerper
Sauleh Siddiqui
Scott Levin

Abstract Objective Hospitals are challenged to provide timely patient care while maintaining high resource utilization. This has prompted hospital initiatives to increase patient flow and minimize non-value-added care time. Real-time demand capacity management (RTDC) is one such initiative whereby clinicians convene each morning to predict patients able to leave the same day and prioritize their remaining tasks for early discharge. Our objective is to automate and improve these discharge predictions by applying supervised machine learning methods to readily available health information. Materials and Methods The authors use supervised machine learning methods to predict patients’ likelihood of discharge by 2 p.m. and by midnight each day for an inpatient medical unit. Using data collected over 8000 patient stays and 20 000 patient days, the predictive performance of the model is compared to clinicians using sensitivity, specificity, Youden’s Index (i.e., sensitivity + specificity – 1), and aggregate accuracy measures. Results The model compared to clinician predictions demonstrated significantly higher sensitivity (P < .01), lower specificity (P < .01), and a comparable Youden Index (P > .10). Early discharges were less predictable than midnight discharges. The model was more accurate than clinicians in predicting the total number of daily discharges and capable of ranking patients closest to future discharge. Conclusions There is potential to use readily available health information to predict daily patient discharges with accuracies comparable to clinician predictions. This approach may be used to automate and support daily RTDC predictions aimed at improving patient flow.
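The Youden's Index formula quoted in this abstract (sensitivity + specificity – 1) can be sketched as follows. The function names and the confusion-matrix counts are hypothetical and are not taken from the study's data.

```python
def rates_from_counts(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of actual discharges that were predicted
    specificity = tn / (tn + fp)  # share of non-discharges correctly ruled out
    return sensitivity, specificity


def youden_index(sensitivity, specificity):
    """Youden's Index as defined in the abstract: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


# Hypothetical counts for one day of discharge predictions.
sens, spec = rates_from_counts(tp=40, fn=10, tn=30, fp=20)
j = youden_index(sens, spec)  # approximately 0.4 for these counts
```

The index weights sensitivity and specificity equally, so a predictor with higher sensitivity and lower specificity, as reported for the model against clinicians, can still end up with a comparable index.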


2019
Vol 41 (1)
pp. 37-52
Author(s):
Tongxin Sun
Bu Zhong

A computer-aided semantic analysis (using Linguistic Inquiry and Word Count [LIWC]) examined how newspaper coverage of air pollution from 2014 to 2017 may affect the public agenda in four cities: Hong Kong, London, Pittsburgh, and Tianjin. Results show that, after controlling for real-time air quality, an agenda-setting effect was found in Hong Kong, London, and Pittsburgh, but not in Tianjin. Tianjin’s reports also contained more future-framed words but fewer present-framed words than those of the other cities.


2020
Vol 12 (6)
pp. 970
Author(s):
Claudia Corradino
Gaetana Ganci
Annalisa Cappello
Giuseppe Bilotta
Sonia Calvari
...

Detecting, locating and characterizing volcanic eruptions at an early stage provides the best means to plan and mitigate against potential hazards. Here, we present an automatic system which is able to recognize and classify the main types of eruptive activity occurring at Mount Etna by exploiting infrared images acquired using thermal cameras installed around the volcano. The system employs a machine learning approach based on a Decision Tree tool and a Bag of Words-based classifier. The Decision Tree provides information on the visibility level of the monitored area, while the Bag of Words-based classifier detects the onset of eruptive activity and recognizes the eruption type as either explosion and/or lava flow or plume degassing/ash. Applied in real-time to each image of each of the thermal cameras placed around Etna, the proposed system provides two outputs, namely, visibility level and recognized eruptive activity status. By merging these outcomes, the monitored phenomena can be fully described from different perspectives to acquire more in-depth information in real time and in an automatic way.

