Predicting Cardiovascular Risk Using Social Media Data: Performance Evaluation of Machine-Learning Models

JMIR Cardio ◽  
10.2196/24473 ◽  
2021 ◽  
Vol 5 (1) ◽  
pp. e24473
Author(s):  
Anietie U Andy ◽  
Sharath C Guntuku ◽  
Srinath Adusumalli ◽  
David A Asch ◽  
Peter W Groeneveld ◽  
...  

Background Current atherosclerotic cardiovascular disease (ASCVD) predictive models have limitations; thus, efforts are underway to improve the discriminatory power of ASCVD models. Objective We sought to evaluate the discriminatory power of social media posts to predict the 10-year risk for ASCVD as compared to that of pooled cohort risk equations (PCEs). Methods We consented patients receiving care in an urban academic emergency department to share access to their Facebook posts and electronic medical records (EMRs). We retrieved Facebook status updates up to 5 years prior to study enrollment for all consenting patients. We identified patients (N=181) who had no prior history of coronary heart disease, had an ASCVD score in their EMR, and had more than 200 words in their Facebook posts. Using Facebook posts from these patients, we applied a machine-learning model to predict 10-year ASCVD risk scores. Using a machine-learning model and a psycholinguistic dictionary, Linguistic Inquiry and Word Count, we evaluated if language from posts alone could predict differences in risk scores and the association of certain words with risk categories, respectively. Results The machine-learning model predicted the 10-year ASCVD risk scores for the categories <5%, 5%-7.4%, 7.5%-9.9%, and ≥10% with area under the curve (AUC) values of 0.78, 0.57, 0.72, and 0.61, respectively. The machine-learning model distinguished between low risk (<10%) and high risk (>10%) with an AUC of 0.69. Additionally, the machine-learning model predicted the ASCVD risk score with Pearson r=0.26. Using Linguistic Inquiry and Word Count, patients with higher ASCVD scores were more likely to use words associated with sadness (r=0.32). Conclusions Language used on social media can provide insights about an individual’s ASCVD risk and inform approaches to risk modification.
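The two evaluation measures reported above, AUC and Pearson r, can be illustrated with a minimal sketch. The toy risk values, the model scores, and the function names roc_auc and pearson_r are hypothetical, and the high-risk cutoff is assumed to be 10% or more; none of this reproduces the study's model or data.

```python
from math import sqrt


def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy data: reference 10-year risk (%) and model-predicted risk (%).
true_risk = [2.0, 4.5, 6.0, 8.0, 11.0, 15.0]
predicted = [3.0, 6.5, 4.0, 12.0, 9.0, 14.0]

# Assumed binarisation: high risk means a reference score of 10% or more.
high_risk = [1 if r >= 10 else 0 for r in true_risk]

auc = roc_auc(high_risk, predicted)
r = pearson_r(true_risk, predicted)
print(f"AUC (low vs high risk): {auc:.3f}, Pearson r: {r:.3f}")
```

A per-category AUC such as those listed for the four risk bands would follow the same pattern, with each band in turn treated as the positive class.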


2020 ◽  
Author(s):  
Anietie Andy ◽  
Sharath Guntuku ◽  
Srinath Adusumalli ◽  
David Asch ◽  
Peter Groeneveld ◽  
...  

BACKGROUND Current atherosclerotic cardiovascular disease (ASCVD) predictive models have limitations, and efforts are underway to improve the discriminatory power of ASCVD models. OBJECTIVE We sought to evaluate the discriminatory power of using social media posts to predict 10-year risk for ASCVD as compared to the pooled cohort risk equations (PCEs). METHODS We consented patients receiving care in an urban academic emergency department to share access to their Facebook posts and electronic medical records (EMRs). We retrieved Facebook status updates up to 5 years prior to study enrollment for all consenting patients. We identified patients (n=181) who had no prior history of coronary heart disease, had an ASCVD score in their EMR, and had more than 200 words in their Facebook posts. Using Facebook posts from these patients, we applied a machine learning (ML) model to predict 10-year ASCVD risk scores. Using an ML model and a psycholinguistic dictionary, Linguistic Inquiry and Word Count (LIWC), we evaluated if language from posts alone could predict differences in risk scores and the association of certain words with risk categories, respectively. RESULTS An ML model predicted the 10-year ASCVD risk scores for these categories: <5%, 5%-7.4%, 7.5%-9.9%, and ≥10%, with AUCs of 0.78, 0.57, 0.72, and 0.61, respectively. An ML model distinguished between low risk (<10%) and high risk (>10%) with an AUC of 0.69. Additionally, an ML model predicted the ASCVD risk score with Pearson’s r = 0.26. Using LIWC, patients with higher ASCVD scores were more likely to use words associated with sadness (Pearson’s r = 0.32). CONCLUSIONS Language used on social media can provide insights about an individual’s ASCVD risk and inform approaches to risk modification.



2021 ◽  
pp. 1-13
Author(s):  
C S Pavan Kumar ◽  
L D Dhinesh Babu

Sentiment analysis is widely used to retrieve the hidden sentiments in medical discussions over online social networking platforms such as Twitter, Facebook, and Instagram. People often convey their feelings about their medical problems on social media platforms. Practitioners and health care workers have started to observe these discussions to assess the impact of health-related issues among the people. This helps in providing better care to improve the quality of life. Dementia is a serious disease in western countries like the United States of America and the United Kingdom, and the respective governments are providing facilities to the affected people. There is much chatter over social media platforms concerning patients’ care, healthy measures to follow to avoid the disease, and how to check for early indications. This chatter has to be carefully monitored to help officials take necessary precautions for the betterment of the affected. A novel feature engineering architecture that involves feature-split for sentiment analysis of medical chatter over online social networks is proposed, with a pipeline that can be used with any machine learning model. The proposed model uses a fuzzy membership function to refine the outputs. The sentiment score obtained by the machine learning model is subjected to fuzzification and defuzzification by using the trapezoid membership function and the center of sums method, respectively. Three datasets are considered for comparison of the proposed and the regular model. The proposed approach delivered better results than the regular approach and proved to be an effective approach for sentiment analysis of medical discussions over online social networks.
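The fuzzification and defuzzification step can be sketched roughly as follows. The score range of -1 to 1, the three fuzzy sets and their breakpoints, and the names trapezoid, fuzzify, and center_of_sums are all assumptions made for illustration; the abstract does not give the authors' actual parameters.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 1 on [b, c], linear on (a, b) and (c, d), 0 elsewhere."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


# Hypothetical fuzzy sets over a sentiment score in [-1, 1].
SETS = {
    "negative": (-1.0, -1.0, -0.6, -0.1),
    "neutral": (-0.4, -0.1, 0.1, 0.4),
    "positive": (0.1, 0.6, 1.0, 1.0),
}


def fuzzify(score):
    """Map a crisp sentiment score to a membership degree in each fuzzy set."""
    return {name: trapezoid(score, *params) for name, params in SETS.items()}


def center_of_sums(degrees, lo=-1.0, hi=1.0, steps=2000):
    """Defuzzify by the center of sums method on a uniform grid.

    Each set is clipped at its membership degree and the clipped sets are
    summed (not max-combined), so overlapping regions count once per set.
    """
    num = den = 0.0
    for k in range(steps + 1):
        x = lo + (hi - lo) * k / steps
        total = sum(min(degrees[name], trapezoid(x, *SETS[name])) for name in SETS)
        num += x * total
        den += total
    return num / den if den else 0.0


degrees = fuzzify(0.25)            # partly neutral, partly positive
refined = center_of_sums(degrees)  # crisp score after defuzzification
print(degrees, round(refined, 3))
```

Summing the clipped sets rather than taking their maximum is what distinguishes center of sums from the centroid of the aggregated set.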



2020 ◽  
Vol 1 (2) ◽  
pp. 61-66
Author(s):  
Febri Astiko ◽  
Achmad Khodar

This study aims to design a machine learning model for sentiment analysis of Indosat Ooredoo service reviews on the social media platform Twitter, using the Naive Bayes algorithm to classify reviews as positive or negative. The sentiment analysis uses machine learning to derive patterns and a model that can be reused to predict labels for new data.
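A Naive Bayes sentiment classifier of the kind described can be sketched in a few lines. This is a multinomial variant with add-one smoothing, which is an assumption since the abstract does not name the variant; the class name, the whitespace tokenisation, and the English toy reviews are hypothetical and are not the study's data.

```python
from collections import Counter, defaultdict
from math import log


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = log(n_docs / total_docs)  # log prior
            for word in text.lower().split():
                if word in self.vocab:  # skip words never seen in training
                    score += log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labelled reviews, invented for this sketch.
train_texts = [
    "signal is fast and stable",
    "great service and cheap package",
    "network is slow and always down",
    "bad signal and expensive package",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("fast and cheap service"))
print(model.predict("slow network and bad signal"))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied together.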



2020 ◽  
Author(s):  
Athira B ◽  
Josette Jones ◽  
Sumam Mary Idicula ◽  
Anand Kulanthaivel ◽  
Sunandan Chakraborty ◽  
...  

BACKGROUND Over the last few decades, the widespread influence of social media has affected all walks of life. The healthcare sector is a significant beneficiary of the reports and pronouncements that appear on social media. Although physicians and other health professionals are the final decision-makers, advice and recommendations from fellow patients play a consequential role. In light of this trend, the present paper explores the topics that patients diagnosed with breast cancer, as well as survivors, discuss on online forums. OBJECTIVE The study examines the online forum of Breast Cancer.org (BCO), automatically maps discussion entries to formal topics, and proposes a machine learning model to characterize the topics in the health-related discussion, so as to elicit meaningful deliberations. The study of these messages thereby draws conclusions about what matters to the patients. METHODS Posts from a few randomly selected forums were annotated manually. To explore the topics of breast cancer patients and survivors, 736 posts were selected for semantic annotation. The entire process was then automated using machine learning models from the category of supervised learning algorithms, and the effectiveness of the algorithms used was compared. RESULTS The method could classify posts into the following 8 high-level topics: writing medication reviews, explaining the adverse effects of medication, clinician knowledge, various treatment options, seeking and supporting various matters, diagnostic procedures, financial issues, and implications in everyday life. Among four different models, the Ensembled Neural Network (ENN) achieved a promising F1-score of 83.4%. CONCLUSIONS The research was able to sort all the posts into a set of 8 named classes. Supported by an efficient scheme for encoding text as vectors, the current machine learning models are shown to give impressive performance in modelling the annotation process.
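For readers unfamiliar with the F1-score quoted above, the following sketch computes per-class and macro-averaged F1 for a multi-class topic classifier. Whether the study reports a macro, micro, or weighted average is not stated, so macro averaging is an assumption here, and the shortened topic labels and predictions are invented.

```python
def f1_per_class(y_true, y_pred):
    """F1 for each class: 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical annotations and predictions over three shortened topic labels.
y_true = ["treatment", "treatment", "financial", "diagnosis", "diagnosis", "financial"]
y_pred = ["treatment", "financial", "financial", "diagnosis", "treatment", "financial"]

per_class = f1_per_class(y_true, y_pred)
overall = macro_f1(y_true, y_pred)
print(per_class, round(overall, 3))
```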



2021 ◽  
pp. 656-669
Author(s):  
David Langley ◽  
Caoimhe Reidy ◽  
Mark Towey ◽  
Manisha ◽  
Denis Dennehy


In recent years, digital platform forums where questions and answers are discussed have been attracting a growing number of users. Many discussions on these forums are repetitive in nature. Quora provided a dataset of such duplicate questions as a competition on Kaggle. The dataset requires many modifications before machine learning models can be trained on it with good accuracy. These modifications include feature extraction, vectorization, and tokenization, after which the data is ready for training the desired models. Analyzing each model after prediction yields plenty of information about its efficiency and many other factors. This information is then compared across models and helps in choosing the best one. The models can later be combined and used as a single model with the best accuracy. In this paper, a machine learning model that predicts duplicate questions is proposed.
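The preprocessing steps named above (tokenization, vectorization, feature extraction) might look roughly like this for a pair of questions. The specific features, Jaccard overlap, bag-of-words cosine similarity, and token-count difference, are illustrative choices and are not taken from the paper; the function names and example questions are hypothetical.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a, b):
    """Share of distinct tokens that the two token lists have in common."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity between bag-of-words count vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def pair_features(q1, q2):
    """Feature dictionary for one question pair, ready to feed a classifier."""
    t1, t2 = tokenize(q1), tokenize(q2)
    return {
        "jaccard": jaccard(t1, t2),
        "cosine": cosine(t1, t2),
        "len_diff": abs(len(t1) - len(t2)),
    }


features = pair_features("How do I learn Python?", "How do I learn Python quickly?")
print(features)
```

Rows of such features, one per question pair, could then be passed to any binary classifier that predicts whether the pair is a duplicate.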



Author(s):  
K. Bret Staudt Willet ◽  
Brooks D. Willet

Twitter has become a hub for many different types of educational conversations, denoted by hashtags and organized by a variety of affinities. Researchers have described these educational conversations on Twitter as sites for teacher professional development. Here, we studied #Edchat—one of the oldest and busiest Twitter educational hashtags—to examine the content of contributions for evidence of professional purposes. We collected tweets containing the text “#edchat” from October 1, 2017 to June 5, 2018, resulting in a dataset of 1,228,506 unique tweets from 196,263 different contributors. Through initial human-coded content analysis, we sorted a stratified random sample of 1,000 tweets into four inductive categories: tweets demonstrating evidence of different professional purposes related to (a) self, (b) others, (c) mutual engagement, and (d) everything else. We found 65% of the tweets in our #Edchat sample demonstrated purposes related to others, 25% demonstrated purposes related to self, and 4% of tweets demonstrated purposes related to mutual engagement. Our initial method was too time intensive—it would be untenable to collect tweets from 339 known Twitter education hashtags and conduct human-coded content analysis of each. Therefore, we are developing a scalable machine-learning model—a multiclass logistic regression classifier using an input matrix of features such as tweet types, keywords, sentiment, word count, hashtags, hyperlinks, and tweet metadata. The anticipated product of this research—a successful, generalizable machine learning model—would help educators and researchers quickly evaluate Twitter educational hashtags to determine where they might want to engage.
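As a rough sketch of the input side of such a classifier, the code below turns a tweet into a numeric feature row and maps it to class probabilities with a softmax, which is the prediction step of multiclass logistic regression. The chosen features, the regular expressions, the function names, and the all-zero weights (standing in for an untrained model) are assumptions; the authors' actual feature matrix is not specified beyond the list in the abstract.

```python
import re
from math import exp

CATEGORIES = ["self", "others", "mutual engagement", "everything else"]


def tweet_features(text, is_retweet=False):
    """Build one feature row: bias, word count, hashtag count, link count, retweet flag."""
    return [
        1.0,
        float(len(text.split())),
        float(len(re.findall(r"#\w+", text))),
        float(len(re.findall(r"https?://\S+", text))),
        1.0 if is_retweet else 0.0,
    ]


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(features, weights):
    """Multiclass logistic regression prediction: one weight row per class."""
    return softmax([sum(w * x for w, x in zip(row, features)) for row in weights])


row = tweet_features("Join us tonight for #edchat on assessment https://example.com")

# Untrained placeholder weights: every category is equally likely.
weights = [[0.0] * len(row) for _ in CATEGORIES]
probs = predict_proba(row, weights)
print(row, dict(zip(CATEGORIES, probs)))
```

In practice the weights would be fitted to the human-coded sample, and sentiment and keyword features would be added as further columns.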


