Twitter Discussions and Emotions About the COVID-19 Pandemic: Machine Learning Approach

Jia Xue; Junxiang Chen; Ran Hu; Chen Chen; Chengda Zheng; Yue Su; Tingshao Zhu

doi:10.2196/20550

Journal of Medical Internet Research ◽

10.2196/20550 ◽

2020 ◽

Vol 22 (11) ◽

pp. e20550

Author(s):

Jia Xue ◽

Junxiang Chen ◽

Ran Hu ◽

Chen Chen ◽

Chengda Zheng ◽

...

Keyword(s):

Public Health ◽

Machine Learning ◽

Latent Dirichlet Allocation ◽

The United States ◽

Response Monitoring ◽

Learning Approach ◽

Learning Approaches ◽

Public Response ◽

The Public ◽

Machine Learning Approach

Background: It is important to measure the public response to the COVID-19 pandemic. Twitter is an important data source for infodemiology studies involving public response monitoring.

Objective: The objective of this study is to examine COVID-19–related discussions, concerns, and sentiments using tweets posted by Twitter users.

Methods: We analyzed 4 million Twitter messages related to the COVID-19 pandemic using a list of 20 hashtags (eg, “coronavirus,” “COVID-19,” “quarantine”) from March 7 to April 21, 2020. We used a machine learning approach, Latent Dirichlet Allocation (LDA), to identify popular unigrams and bigrams, salient topics and themes, and sentiments in the collected tweets.

Results: Popular unigrams included “virus,” “lockdown,” and “quarantine.” Popular bigrams included “COVID-19,” “stay home,” “corona virus,” “social distancing,” and “new cases.” We identified 13 discussion topics and categorized them into 5 different themes: (1) public health measures to slow the spread of COVID-19, (2) social stigma associated with COVID-19, (3) COVID-19 news, cases, and deaths, (4) COVID-19 in the United States, and (5) COVID-19 in the rest of the world. Across all identified topics, the dominant sentiments for the spread of COVID-19 were anticipation that measures can be taken, followed by mixed feelings of trust, anger, and fear related to different topics. The public tweets revealed a significant feeling of fear when people discussed new COVID-19 cases and deaths compared to other topics.

Conclusions: This study showed that Twitter data and machine learning approaches can be leveraged for an infodemiology study, enabling research into evolving public discussions and sentiments during the COVID-19 pandemic. As the situation rapidly evolves, several topics are consistently dominant on Twitter, such as confirmed cases and death rates, preventive measures, health authorities and government policies, COVID-19 stigma, and negative psychological reactions (eg, fear). Real-time monitoring and assessment of Twitter discussions and concerns could provide useful data for public health emergency responses and planning. Pandemic-related fear, stigma, and mental health concerns are already evident and may continue to influence public trust when a second wave of COVID-19 occurs or there is a new surge of the current pandemic.
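
The unigram and bigram counts mentioned in the abstract can be illustrated with a minimal sketch. The tweets, the stop list, and the helper names `tokenize` and `ngram_counts` below are hypothetical and are not taken from the study, whose actual preprocessing pipeline is not described here.

```python
import re
from collections import Counter

# Hypothetical toy tweets; the study's corpus is not reproduced here.
tweets = [
    "Stay home and keep social distancing #COVID19",
    "New cases reported today, stay home everyone",
    "Social distancing slows the virus",
]

# Tiny illustrative stop list (an assumption, not the study's list).
STOPWORDS = {"and", "the", "a", "to", "of"}

def tokenize(text):
    """Lowercase, drop URLs, keep alphanumeric tokens, remove stopwords."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    tokens = re.findall(r"[a-z0-9]+", text)
    return [t for t in tokens if t not in STOPWORDS]

def ngram_counts(docs, n):
    """Count n-grams (as tuples of tokens) over all documents."""
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        # zip over n shifted copies of the token list yields consecutive n-grams
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts

unigrams = ngram_counts(tweets, 1)
bigrams = ngram_counts(tweets, 2)
print(unigrams.most_common(3))
print(bigrams.most_common(3))
```

Because stopwords are removed before the n-grams are formed, a bigram may span a dropped word; whether that is desirable depends on the analysis.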

Twitter Discussions and Emotions About the COVID-19 Pandemic: Machine Learning Approach (Preprint)

10.2196/preprints.20550 ◽

2020 ◽

Author(s):

Jia Xue ◽

Junxiang Chen ◽

Ran Hu ◽

Chen Chen ◽

Chengda Zheng ◽

...

Keyword(s):

Public Health ◽

Machine Learning ◽

Latent Dirichlet Allocation ◽

The United States ◽

Response Monitoring ◽

Learning Approach ◽

Learning Approaches ◽

Public Response ◽

The Public ◽

Machine Learning Approach

A machine learning approach to open public comments for policymaking

Information Polity ◽

10.3233/ip-200256 ◽

2020 ◽

Vol 25 (4) ◽

pp. 433-448 ◽

Cited By ~ 1

Author(s):

Alex Ingrams

Keyword(s):

Machine Learning ◽

Latent Dirichlet Allocation ◽

Public Information ◽

Statistical Modelling ◽

The United States ◽

Digital Data ◽

Airport Security ◽

Learning Approach ◽

Machine Learning Approach ◽

Proposed Regulation

In this paper, the author argues that the conflict between the copious amount of digital data processed by public organisations and the need for policy-relevant insights to aid public participation constitutes a ‘public information paradox’. Machine learning (ML) approaches may offer one solution to this paradox through algorithms that transparently collect and use statistical modelling to provide insights for policymakers. Such an approach is tested in this paper. The test involves applying an unsupervised machine learning approach with latent Dirichlet allocation (LDA) analysis of thousands of public comments submitted to the United States Transportation Security Administration (TSA) on a 2013 proposed regulation for the use of new full body imaging scanners in airport security terminals. The analysis results in salient topic clusters that could be used by policymakers to understand large amounts of text such as in an open public comments process. The results are compared with the actual final proposed TSA rule, and the author reflects on new questions raised for transparency by the implementation of ML in open rule-making processes.
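
To make the idea of LDA topic clusters concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, written in plain Python. It is not the author's implementation: the toy comments, the hyperparameters `alpha` and `beta`, the iteration count, and the function names `lda_gibbs` and `top_words` are all assumptions chosen for illustration.

```python
import random

def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized docs (lists of words)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]       # tokens per (doc, topic)
    topic_word = [[0] * V for _ in range(num_topics)]  # tokens per (topic, word)
    topic_total = [0] * num_topics                     # tokens per topic
    assign = []                                        # topic of every token
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return vocab, doc_topic, topic_word

def top_words(vocab, topic_word, n=3):
    """Return the n most frequent words assigned to each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]

# Hypothetical, already-tokenized public comments.
comments = [
    "scanner privacy image privacy body".split(),
    "privacy image body scanner rights".split(),
    "radiation health risk scanner".split(),
    "health radiation exposure risk".split(),
]
vocab, doc_topic, topic_word = lda_gibbs(comments, num_topics=2)
print(top_words(vocab, topic_word))
```

On a corpus this small the sampled topics are unstable; the sketch is meant only to show the mechanics of assigning each word to a topic and reading off the most frequent words per topic.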

Mol2vec: Unsupervised Machine Learning Approach with Chemical Intuition

10.26434/chemrxiv.5513581.v1 ◽

2017 ◽

Author(s):

Sabrina Jaeger ◽

Simone Fulle ◽

Samo Turk

Keyword(s):

Machine Learning ◽

Language Processing ◽

Supervised Machine Learning ◽

Learning Approach ◽

Learning Approaches ◽

Unsupervised Machine Learning ◽

Feature Representations ◽

Machine Learning Approach ◽

The Individual ◽

Vector Representations

Inspired by natural language processing techniques, we here introduce Mol2vec, which is an unsupervised machine learning approach to learn vector representations of molecular substructures. Similarly to the Word2vec models, where vectors of closely related words are in close proximity in the vector space, Mol2vec learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Compounds can finally be encoded as vectors by summing up the vectors of the individual substructures and, for instance, fed into supervised machine learning approaches to predict compound properties. The underlying substructure vector embeddings are obtained by training an unsupervised machine learning approach on a so-called corpus of compounds that consists of all available chemical matter. The resulting Mol2vec model is pre-trained once, yields dense vector representations, and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. The prediction capabilities are demonstrated on several compound property and bioactivity data sets and compared with results obtained for Morgan fingerprints as the reference compound representation. Mol2vec can be easily combined with ProtVec, which employs the same Word2vec concept on protein sequences, resulting in a proteochemometric approach that is alignment independent and can thus also be easily used for proteins with low sequence similarities.
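
The step of encoding a compound by summing its substructure vectors can be sketched as follows. The 3-dimensional embeddings, the substructure identifiers, and the function `encode_compound` are hypothetical; real embeddings would come from a trained model, and skipping unseen substructures is an assumption made here for simplicity rather than the behaviour described in the abstract.

```python
import math

# Hypothetical 3-dimensional embeddings keyed by made-up substructure identifiers.
substructure_vectors = {
    "sub_a": [0.2, -0.1, 0.5],
    "sub_b": [0.3, 0.4, -0.5],
    "sub_c": [-0.1, 0.2, 0.1],
}

def encode_compound(substructures, vectors, dim=3):
    """Encode a compound as the element-wise sum of its substructure vectors."""
    total = [0.0] * dim
    for sub in substructures:
        vec = vectors.get(sub)
        if vec is None:
            continue  # assumption: substructures without an embedding are skipped
        total = [t + x for t, x in zip(total, vec)]
    return total

def cosine(u, v):
    """Cosine similarity, i.e. how closely two vectors point in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

compound_1 = encode_compound(["sub_a", "sub_b"], substructure_vectors)
compound_2 = encode_compound(["sub_a", "sub_c", "sub_unknown"], substructure_vectors)
print(compound_1, compound_2, cosine(compound_1, compound_2))
```

The resulting fixed-length vectors could then serve as input features for any supervised model.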

Predictors of remission from body dysmorphic disorder after internet-delivered cognitive behavior therapy: a machine learning approach

10.31234/osf.io/eqcdx ◽

2019 ◽

Author(s):

Oskar Flygare ◽

Jesper Enander ◽

Erik Andersson ◽

Brjánn Ljótsson ◽

Volen Z Ivanov ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forests ◽

Clinical Utility ◽

Body Dysmorphic Disorder ◽

Prediction Models ◽

Behavioral Therapy ◽

Learning Approach ◽

Learning Approaches ◽

Machine Learning Approach

**Background:** Previous attempts to identify predictors of treatment outcomes in body dysmorphic disorder (BDD) have yielded inconsistent findings. One way to increase precision and clinical utility could be to use machine learning methods, which can incorporate multiple non-linear associations in prediction models. **Methods:** This study used a random forests machine learning approach to test if it is possible to reliably predict remission from BDD in a sample of 88 individuals that had received internet-delivered cognitive behavioral therapy for BDD. The random forest models were compared to traditional logistic regression analyses. **Results:** Random forests correctly identified 78% of participants as remitters or non-remitters at post-treatment. The accuracy of prediction was lower in subsequent follow-ups (68%, 66% and 61% correctly classified at 3-, 12- and 24-month follow-ups, respectively). Depressive symptoms, treatment credibility, working alliance, and initial severity of BDD were among the most important predictors at the beginning of treatment. By contrast, the logistic regression models did not identify consistent and strong predictors of remission from BDD. **Conclusions:** The results provide initial support for the clinical utility of machine learning approaches in the prediction of outcomes of patients with BDD. **Trial registration:** ClinicalTrials.gov ID: NCT02010619.

Validation of an Internationally Derived Patient Severity Phenotype to Support COVID-19 Analytics from Electronic Health Record Data

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocab018 ◽

2021 ◽

Author(s):

Jeffrey G Klann ◽

Griffin M Weber ◽

Hossein Estiri ◽

Bertrand Moal ◽

Paul Avillach ◽

...

Keyword(s):

Machine Learning ◽

Electronic Health Record ◽

Chart Review ◽

Learning Approach ◽

Health Record ◽

Learning Approaches ◽

Electronic Health Record Data ◽

Icu Admission ◽

Machine Learning Approach ◽

Electronic Health

Introduction: The Consortium for Clinical Characterization of COVID-19 by EHR (4CE) is an international collaboration addressing COVID-19 with federated analyses of electronic health record (EHR) data.

Objective: We sought to develop and validate a computable phenotype for COVID-19 severity.

Methods: Twelve 4CE sites participated. First, we developed an EHR-based severity phenotype consisting of six code classes, and we validated it on patient hospitalization data from the 12 4CE clinical sites against the outcomes of ICU admission and/or death. We also piloted an alternative machine-learning approach and compared selected predictors of severity to the 4CE phenotype at one site.

Results: The full 4CE severity phenotype had a pooled sensitivity of 0.73 and a specificity of 0.83 for the combined outcome of ICU admission and/or death. The sensitivity of individual code categories for acuity varied widely, by up to 0.65 across sites. At one pilot site, the expert-derived phenotype had a mean AUC of 0.903 (95% CI: 0.886, 0.921), compared to an AUC of 0.956 (95% CI: 0.952, 0.959) for the machine-learning approach. Billing codes were poor proxies of ICU admission, with as low as 49% precision and recall compared to chart review.

Discussion: We developed a severity phenotype using 6 code classes that proved resilient to coding variability across international institutions. In contrast, machine-learning approaches may overfit hospital-specific orders. Manual chart review revealed discrepancies even in the gold-standard outcomes, possibly due to heterogeneous pandemic conditions.

Conclusion: We developed an EHR-based severity phenotype for COVID-19 in hospitalized patients and validated it at 12 international sites.
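
As a reminder of how the reported metrics are defined, the sketch below computes sensitivity, specificity, and precision from binary labels. The ten patient labels are invented for illustration and have no connection to the 4CE data; the function names are likewise hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 means severe."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def sensitivity(tp, fp, tn, fn):
    """Share of truly severe patients flagged by the phenotype (also called recall)."""
    return tp / (tp + fn)

def specificity(tp, fp, tn, fn):
    """Share of non-severe patients correctly left unflagged."""
    return tn / (tn + fp)

def precision(tp, fp, tn, fn):
    """Share of flagged patients who are truly severe."""
    return tp / (tp + fp)

# Hypothetical outcome (ICU admission and/or death) and phenotype flags.
outcome   = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
phenotype = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

counts = confusion_counts(outcome, phenotype)
print(counts, sensitivity(*counts), specificity(*counts), precision(*counts))
```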

Analysis of Machine Learning Approach for the modemodel in SWC Mapping in Automotive Systems

Embedded Selforganising Systems ◽

10.14464/ess.v7i1.447 ◽

2021 ◽

Vol 7 (1) ◽

pp. 16-19

Author(s):

Owes Khan ◽

Geri Shahini ◽

Wolfram Hardt

Keyword(s):

Machine Learning ◽

Autonomous Driving ◽

Learning Approach ◽

Software Components ◽

Control Mechanisms ◽

Learning Approaches ◽

Software Applications ◽

Automotive Systems ◽

Machine Learning Approach ◽

Development Processes

Automotive technologies are becoming ever more digital. Highly autonomous driving, together with digital E/E control mechanisms, includes thousands of software applications, which are called software components. Together with industry requirements and rigorous software development processes, mapping components as a software pool becomes very difficult. This article analyses and discusses the possibilities of integrating machine learning approaches into our previously introduced concept of mapping software components through a common software pool.

Assessing the Heterogeneity of Complaints Related to Tinnitus and Hyperacusis from an Unsupervised Machine Learning Approach: An Exploratory Study

Audiology and Neurotology ◽

10.1159/000504741 ◽

2020 ◽

Vol 25 (4) ◽

pp. 174-189 ◽

Cited By ~ 1

Author(s):

Guillaume Palacios ◽

Arnaud Noreña ◽

Alain Londero

Keyword(s):

Machine Learning ◽

Statistical Analysis ◽

Language Processing ◽

Exploratory Study ◽

Latent Dirichlet Allocation ◽

Suicide Attempts ◽

Real Life ◽

Supervised Machine Learning ◽

Learning Approach ◽

Machine Learning Approach

Introduction: Subjective tinnitus (ST) and hyperacusis (HA) are common auditory symptoms that may become incapacitating in a subgroup of patients, who then seek medical advice. Both conditions can result from many different mechanisms, and as a consequence, patients may report a vast repertoire of associated symptoms and comorbidities that can dramatically reduce quality of life and even lead to suicide attempts in the most severe cases. The present exploratory study aims to investigate patients’ symptoms and complaints through an in-depth statistical analysis of patients’ natural narratives in a real-life environment in which, thanks to the anonymization of contributions and peer-to-peer interaction, the wording is assumed to be free of self-limitation and self-censorship.

Methods: We applied a purely statistical, unsupervised machine learning approach to the analysis of patients’ verbatim posts exchanged on an Internet forum. After automated data extraction, the dataset was preprocessed to make it suitable for statistical analysis. We used a variant of the Latent Dirichlet Allocation (LDA) algorithm to reveal clusters of symptoms and complaints of HA patients (topics). The probability distribution of words within a topic uniquely characterizes it. The log-likelihood of the LDA model converged after 2,000 iterations. Several statistical parameters were tested for topic modeling and for the word relevance factor within each topic.

Results: Despite a rather small dataset, this exploratory study demonstrates that patients’ free-text posts available on the Internet constitute valuable material for machine learning and statistical analysis aimed at categorizing ST/HA complaints. The LDA model with K = 15 topics seems to be the most relevant in terms of relative weights and correlations, with the capability to individualize subgroups of patients displaying specific characteristics. The study of the relevance factor may be useful to unveil weak but important signals that are present in patients’ narratives.

Discussion/Conclusion: We claim that the unsupervised LDA approach would make it possible to gain knowledge of the patterns of ST- and HA-related complaints and of patient-centered domains of interest. The merits and limitations of the LDA algorithms are compared with other natural language processing methods and with more conventional methods of qualitative analysis of patients’ output. Future directions and research topics emerging from this innovative algorithmic analysis are proposed.
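
The abstract does not define its "relevance factor". Assuming it refers to the term relevance measure commonly used with LDA visualisation (Sievert and Shirley), where relevance = λ·log p(w|topic) + (1 − λ)·log(p(w|topic) / p(w)), a minimal sketch looks like this. The probabilities are invented and the function names are hypothetical.

```python
import math

# Hypothetical word probabilities within one topic, p(w | topic).
topic_probs = {"tinnitus": 0.4, "noise": 0.3, "sleep": 0.2, "anxiety": 0.1}
# Hypothetical overall word probabilities in the corpus, p(w).
corpus_probs = {"tinnitus": 0.4, "noise": 0.2, "sleep": 0.05, "anxiety": 0.2}

def relevance(word, lam):
    """Weighted mix of in-topic log probability and log lift (assumed definition)."""
    p_topic = topic_probs[word]
    lift = p_topic / corpus_probs[word]
    return lam * math.log(p_topic) + (1 - lam) * math.log(lift)

def rank_words(lam):
    """Order the topic's words from most to least relevant for a given lambda."""
    return sorted(topic_probs, key=lambda w: relevance(w, lam), reverse=True)

print(rank_words(1.0))  # ranks purely by in-topic probability
print(rank_words(0.0))  # ranks purely by lift, favouring topic-specific words
```

Lowering λ pushes rare but topic-specific words up the ranking, which is one way that weak signals of the kind mentioned in the abstract could surface.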

Diagnosing malaria from some symptoms: a machine learning approach and public health implications

Health and Technology ◽

10.1007/s12553-020-00488-5 ◽

2020 ◽

Author(s):

Hilary I. Okagbue ◽

Pelumi E. Oguntunde ◽

Emmanuela C. M. Obasi ◽

Patience I. Adamu ◽

Abiodun A. Opanuga

Keyword(s):

Public Health ◽

Machine Learning ◽

Learning Approach ◽

Health Implications ◽

Machine Learning Approach

Effectiveness of Machine Learning Approaches Towards Credibility Assessment of Crowdfunding Projects for Reliable Recommendations

Applied Sciences ◽

10.3390/app10249062 ◽

2020 ◽

Vol 10 (24) ◽

pp. 9062

Author(s):

Wafa Shafqat ◽

Yung-Cheol Byun ◽

Namje Park

Keyword(s):

Machine Learning ◽

Latent Dirichlet Allocation ◽

Short Term Memory ◽

Research Work ◽

Learning Approaches ◽

Credibility Assessment ◽

User Interests ◽

Machine Learning Approach ◽

Hybrid Machine ◽

Numeric Data

Recommendation systems aim to decipher user interests, preferences, and behavioral patterns automatically. However, it becomes harder to make trustworthy and reliable recommendations to users, especially when their hard-earned money is at risk. The credibility of a recommendation is of paramount importance in crowdfunding project recommendations. This research work devises a hybrid machine learning-based approach for credible crowdfunding project recommendations by incorporating backers’ sentiments and other influential features. The proposed model has four modules: a feature extraction module, a hybrid LDA-LSTM (latent Dirichlet allocation and long short-term memory) based latent topics evaluation module, a credibility formulation module, and a recommendation module. The credibility analysis correlates the project creator’s proficiency, reviewers’ sentiments, and their influence to estimate a project’s authenticity level, which makes our model robust to unauthentic and untrustworthy projects and profiles. The recommendation module selects the projects that match the user’s interests and have the highest credibility scores, and recommends them. The proposed recommendation method harnesses numeric data and sentiment expressions linked with comments, backers’ preferences, profile data, and the creator’s credibility for quantitative examination of several alternative projects. The evaluation shows that credibility assessment based on the hybrid machine learning approach yields better results (98% accuracy) than existing recommendation models. We have also evaluated our credibility assessment technique on different categories of projects, i.e., suspended, canceled, delivered, and never delivered projects, with 93%, 84%, 58%, and 93% of projects, respectively, accurately classified into our desired range of credibility.

Retinal Area Segmentation using Adaptive Superpixalation and its Classification using RBFN

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v6i6.pp2674-2681 ◽

2016 ◽

Vol 6 (6) ◽

pp. 2674

Author(s):

Nimisha Singh ◽

Rana Gill

Keyword(s):

Machine Learning ◽

Retinal Disease ◽

Learning Approach ◽

Learning Approaches ◽

Retinal Area ◽

Original Image ◽

Feature Generation ◽

Medical Field ◽

Machine Learning Approach ◽

Image Pattern

Retinal disease is an important issue in the medical field. Diagnosing the disease requires detecting the true retinal area. Artefacts such as eyelids and eyelashes appear alongside the retinal part, so removing them is a major task for better diagnosis of disease in the retinal part. In this paper, we propose a segmentation method and use machine learning approaches to detect the true retinal part. Preprocessing is performed on the original image using Gamma Normalization, which enhances the image and brings out detailed information. Segmentation is then performed on the Gamma Normalized image by the Superpixel method. Superpixels group pixels into different regions based on compactness and regional size. Superpixels are used to reduce the complexity of the image processing task and provide a suitable primitive image pattern. Feature generation is then performed, and a machine learning approach helps to extract the true retinal area. The experimental evaluation yields improved results, with an accuracy of 96%.
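
The abstract does not give the formula used for Gamma Normalization. Assuming it means the standard power-law transform on intensities scaled to the range 0 to 1, a minimal sketch on a grayscale image stored as nested lists might look like this. The function name `gamma_normalize`, the gamma value, and the pixel values are all illustrative.

```python
def gamma_normalize(image, gamma, max_value=255):
    """Apply out = max_value * (in / max_value) ** gamma to every pixel.

    A gamma below 1 brightens dark regions; a gamma above 1 darkens them.
    """
    return [
        [round(max_value * (pixel / max_value) ** gamma) for pixel in row]
        for row in image
    ]

# Hypothetical 2x3 grayscale patch with 8-bit intensities.
patch = [
    [0, 64, 255],
    [16, 128, 200],
]
brightened = gamma_normalize(patch, gamma=0.5)
print(brightened)
```

With `gamma=1` the transform leaves the image unchanged, which is a quick sanity check on the implementation.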
