COVID-19 Public Sentiment Insights and Machine Learning for Tweets Classification

Author(s):  
Jim Samuel ◽  
G. G. Md. Nawaz Ali ◽  
Md. Mokhlesur Rahman ◽  
Ek Esawi ◽  
Yana Samuel

Along with the Coronavirus pandemic, another crisis has manifested itself in the form of mass fear and panic phenomena, fuelled by incomplete and often inaccurate information. There is therefore a tremendous need to address and better understand COVID-19's informational crisis and gauge public sentiment, so that appropriate messaging and policy decisions can be implemented. In this research article, we identify public sentiment associated with the pandemic using Coronavirus-specific Tweets and R statistical software, along with its sentiment analysis packages. We demonstrate insights into the progression of fear-sentiment over time as COVID-19 approached peak levels in the United States, using descriptive textual analytics supported by necessary textual data visualizations. Furthermore, we provide a methodological overview of two essential machine learning classification methods, in the context of textual analytics, and compare their effectiveness in classifying Coronavirus Tweets of varying lengths. We observe a strong classification accuracy of 91% for short Tweets with the Naive Bayes method. We also observe that the logistic regression classification method provides a reasonable accuracy of 74% with shorter Tweets, and both methods showed relatively weaker performance for longer Tweets. This research provides insights into Coronavirus fear-sentiment progression and outlines associated methods, implications, limitations and opportunities.

Information ◽  
2020 ◽  
Vol 11 (6) ◽  
pp. 314 ◽  
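The abstract above compares two classifiers on labelled Tweets; the study itself was carried out in R with its sentiment analysis packages. The following is a minimal sketch of that comparison written in Python with scikit-learn rather than the authors' R toolchain, using hypothetical example Tweets and labels:

```python
# Minimal sketch (not the authors' R code): comparing Naive Bayes and
# logistic regression on short labelled Tweets with scikit-learn.
# The example Tweets and labels below are hypothetical placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

tweets = [
    "so scared of this virus",                # fear
    "stay safe everyone, we got this",        # no fear
    "panic buying everywhere, terrifying",    # fear
    "grateful for healthcare workers today",  # no fear
]
labels = ["fear", "no_fear", "fear", "no_fear"]

# Bag-of-words features, as is typical for short-text classification.
X = CountVectorizer().fit_transform(tweets)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.5, random_state=42, stratify=labels)

for name, model in [("Naive Bayes", MultinomialNB()),
                    ("Logistic regression", LogisticRegression(max_iter=1000))]:
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))
```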

Author(s):  
Kusumanchi Naga Sireesha and Padala Srinivasa Reddy

Along with the Coronavirus pandemic, another crisis has manifested itself in the form of mass fear and panic phenomena, fuelled by incomplete and often inaccurate information. There is therefore a tremendous need to address and better understand COVID-19's informational crisis. The widespread use of social networking sites such as Twitter speeds up the sharing of information and opinions on community events and health crises, and COVID-19 has been one of Twitter's trending topics. Messages posted on Twitter are called Tweets. In this paper, we identify public sentiment associated with the pandemic using Coronavirus-specific Tweets and Python, along with its sentiment analysis packages. We provide an overview of two essential machine learning classification methods, in the context of textual analytics, and compare their effectiveness in classifying Coronavirus Tweets of varying lengths. This research provides insights into Coronavirus fear-sentiment progression, associated methods, limitations, and opportunities. In this project, we designed a sentiment analysis system that identifies the sentiment of a Tweet and classifies it into one of five classes: "Extremely Positive", "Positive", "Neutral", "Negative" and "Extremely Negative".
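The abstract states that Python sentiment analysis packages were used to assign one of five classes to each Tweet, but does not name the package or its thresholds. A minimal sketch, assuming VADER and hypothetical cut-offs on its compound score:

```python
# Illustrative sketch only: the paper does not name its sentiment package;
# VADER is assumed here, and the thresholds below are hypothetical.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

def five_class_label(tweet: str, analyzer=SentimentIntensityAnalyzer()) -> str:
    """Map VADER's compound score (-1..1) onto five sentiment classes."""
    score = analyzer.polarity_scores(tweet)["compound"]
    if score >= 0.6:
        return "Extremely Positive"
    if score >= 0.2:
        return "Positive"
    if score > -0.2:
        return "Neutral"
    if score > -0.6:
        return "Negative"
    return "Extremely Negative"

print(five_class_label("Vaccines bring real hope, wonderful news!"))
print(five_class_label("Terrified by the rising case counts."))
```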


2020 ◽  
Vol 27 (8) ◽  
pp. 1891-1912
Author(s):  
Hengqin Wu ◽  
Geoffrey Shen ◽  
Xue Lin ◽  
Minglei Li ◽  
Boyu Zhang ◽  
...  

Purpose: This study proposes an approach to solve the fundamental problem in using query-based methods (i.e. search engines and patent retrieval tools) to screen patents of information and communication technology in construction (ICTC). The fundamental problem is that ICTC incorporates various techniques and thus cannot be simply represented by man-made queries. To investigate this concern, this study develops a binary classifier by utilizing deep learning and NLP techniques to automatically identify whether a patent is relevant to ICTC, thus accurately screening a corpus of ICTC patents. Design/methodology/approach: This study employs NLP techniques to convert the textual data of patents into numerical vectors. Then, a supervised deep learning model is developed to learn the relations between the input vectors and outputs. Findings: The validation results indicate that (1) the proposed approach has a better performance in screening ICTC patents than traditional machine learning methods; (2) besides the United States Patent and Trademark Office (USPTO), which provides structured and well-written patents, the approach could also accurately screen patents from the Derwent Innovations Index (DIX), in which patents are written in different genres. Practical implications: This study contributes a specific collection of ICTC patents, which is not provided by the patent offices. Social implications: The proposed approach contributes an alternative manner of gathering a corpus of patents for domains like ICTC that neither exist as a searchable classification in patent offices nor are accurately represented by man-made queries. Originality/value: A deep learning model with two layers of neurons is developed to learn the non-linear relations between the input features and outputs, providing better performance than traditional machine learning models. This study uses advanced NLP techniques, lemmatization and part-of-speech (POS) tagging, to process the textual data of ICTC patents.
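As a rough illustration of the pipeline described above (patent text converted to numerical vectors and fed to a small neural network), the sketch below uses TF-IDF features and scikit-learn's MLPClassifier with two hidden layers; the lemmatization/POS step is omitted and the example patents and labels are hypothetical, so this is not the authors' implementation:

```python
# Hedged sketch of a text-to-vector + two-hidden-layer classifier pipeline,
# not the study's code. Example texts and labels are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

patent_texts = [
    "A wireless sensor network for monitoring concrete curing on site",
    "BIM-integrated communication platform for construction scheduling",
    "A chemical admixture composition for high-strength concrete",
    "A pharmaceutical compound for treating hypertension",
]
is_ictc = [1, 1, 0, 0]  # 1 = relevant to ICT in construction, 0 = not

classifier = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0),
)
classifier.fit(patent_texts, is_ictc)
print(classifier.predict(["IoT platform for tracking construction equipment"]))
```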


2020 ◽  
Author(s):  
Arielle Selya ◽  
Drake Anshutz ◽  
Emily Griese ◽  
Tess L Weber ◽  
Benson Hsu ◽  
...  

Abstract Background: Diabetes is common and an economic burden in the United States. In this study, a machine learning predictive model was developed to predict unplanned medical visits among patients with diabetes. Methods: Data were drawn from electronic medical records (EMRs) from a large healthcare organization in the Northern Plains region of the US, from adult (≥18 years old) patients with type 1 or type 2 diabetes who received care at least once during the 3-year period. A variety of machine learning classification models were run using standard EMR variables as predictors (age, body mass index (BMI), systolic blood pressure (BP), diastolic BP, low-density lipoprotein (LDL), high-density lipoprotein (HDL), glycohemoglobin (A1C), smoking status, number of diagnoses and number of prescriptions). The best-performing model after cross-validation testing was analyzed to identify the strongest predictors. Results: The best-performing model was a radial-basis support vector machine, which achieved a prediction accuracy (average of sensitivity and specificity) of 66.2%. This outperformed a conventional logistic regression by 1.5 percentage points. High BP and low HDL were identified as the strongest predictors, such that eliminating these from the model decreased its overall prediction accuracy by 1.9 and 1.8 percentage points, respectively. Conclusion: Our machine learning predictive model more accurately predicted unplanned medical visits among patients with diabetes, relative to conventional models. Post-hoc analysis of the model was used for hypothesis generation, namely that HDL and BP are the strongest contributors to unplanned medical visits among patients with diabetes. In this way, this predictive model can be used in moving from prediction to implementation and improved diabetes care management in clinical settings.
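The reported prediction accuracy is the average of sensitivity and specificity, i.e. balanced accuracy. Below is a minimal sketch of the best-performing model type (an RBF-kernel support vector machine) on synthetic stand-ins for the EMR variables; it is not the study's data or code:

```python
# Minimal sketch under stated assumptions (synthetic data, not the study's
# EMR): an RBF-kernel SVM scored with balanced accuracy, i.e. the average
# of sensitivity and specificity mentioned above.
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Columns stand in for age, BMI, systolic/diastolic BP, LDL, HDL, A1C, etc.
X = rng.normal(size=(500, 10))
y = rng.integers(0, 2, size=500)  # 1 = had an unplanned medical visit

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
scores = cross_val_score(model, X, y, cv=5, scoring="balanced_accuracy")
print("Cross-validated balanced accuracy:", scores.mean())
```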


2020 ◽  
Vol 12 (01) ◽  
pp. 2050003
Author(s):  
Ahmed Lasisi ◽  
Pengyu Li ◽  
Jian Chen

Highway-rail grade crossing (HRGC) accidents continue to be a major source of transportation casualties in the United States. This can be attributed to increased road and rail operations and/or a lack of adequate safety programs based on comprehensive HRGC accident analysis, among other reasons. The focus of this study is to predict HRGC accidents in a given rail network based on a machine learning analysis of a similar network with cognate attributes. This study is an improvement on past studies that either attempt to predict accidents at a given HRGC or spatially analyze HRGC accidents for a particular rail line. In this study, a case for a hybrid machine learning and geographic information systems (GIS) approach is presented in a large rail network. The study involves the collection and wrangling of relevant data from various sources; exploratory analysis; and supervised machine learning (classification and regression) of HRGC data from 2008 to 2017 in California. The models developed from this analysis were used to make binary predictions (98.9% accuracy and a 0.9838 Receiver Operating Characteristic (ROC) score) and quantitative estimations of HRGC casualties in a similar network over the next 10 years. While results are spatially presented in GIS, this novel hybrid application of machine learning and GIS to HRGC accident analysis will help stakeholders proactively reduce casualties by addressing the major accident causes identified in this study. This paper concludes with a Systems-Action-Management (SAM) approach based on text analysis of HRGC accident risk reports from the Federal Railroad Administration.
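A hedged sketch of the supervised classification step (binary accident prediction evaluated with an ROC score); the specific algorithm, features and GIS integration used in the study are not reproduced here, and the data below are synthetic:

```python
# Illustrative sketch only (synthetic features; the paper does not name the
# exact classifier here): a supervised model making binary accident
# predictions for grade crossings, evaluated with an ROC score as above.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
# Placeholder crossing attributes: traffic volume, train counts, warning
# device type, etc., encoded as numeric features.
X = rng.normal(size=(1000, 8))
y = rng.integers(0, 2, size=1000)  # 1 = accident occurred at the crossing

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
model = RandomForestClassifier(n_estimators=200, random_state=1).fit(X_train, y_train)
print("ROC AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```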


2018 ◽  
Vol 21 ◽  
pp. 45-48
Author(s):  
Shilpa Balan ◽  
Sanchita Gawand ◽  
Priyanka Purushu

Cybersecurity plays a vital role in protecting people's privacy and data. In recent times, there have been several issues relating to cyber fraud, data breaches and cyber theft. Many people in the United States have been victims of identity theft. Thus, an understanding of cybersecurity plays an important role in protecting their information and devices. As the adoption of smart devices and social networking increases, cybersecurity awareness needs to be spread. This research aims to build a machine learning classification model to determine the awareness of cybersecurity among the general public in the United States. We were able to attain a good F-measure score when evaluating the performance of the classification model built for this study.


2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Michelle A. Worthington ◽  
Amar Mandavia ◽  
Randall Richardson-Vejlgaard

Abstract Background: Recent research has identified a number of pre-traumatic, peri-traumatic and post-traumatic psychological and ecological factors that put an individual at increased risk of developing PTSD following a life-threatening event. While these factors have been found to be associated with PTSD in univariate analyses, the complex interactions of these risk factors and how they contribute to individual trajectories of the illness are not yet well understood. In this study, we examine the impact of prior trauma, psychopathology, sociodemographic characteristics, and community and environmental information on PTSD onset in a nationally representative sample of adults in the United States, using machine learning methods to establish the relative contributions of each variable. Methods: Individual risk factors identified in Wave 1 of the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) were combined with community-level data for the years concurrent with the NESARC Wave 1 (n = 43,093) and Wave 2 (n = 34,653) surveys. Machine learning feature selection and classification analyses were used at the national level to create models using individual- and community-level variables that would best predict the new onset of PTSD at Wave 2. Results: Our classification algorithms yielded 89.7 to 95.6% accuracy for predicting new onset of PTSD at Wave 2. A prior diagnosis of DSM-IV-TR Borderline Personality Disorder, Major Depressive Disorder or Anxiety Disorder conferred the greatest relative influence on a new diagnosis of PTSD. Distal risk factors such as prior psychiatric diagnosis accounted for significantly greater relative risk than proximal factors (such as adverse event exposure). Conclusions: Our findings show that a machine learning classification approach can successfully integrate large numbers of known risk factors for PTSD into stronger models that account for high-dimensional interactions and collinearity between variables. We discuss the implications of these findings as they pertain to the targeted mobilization of emergency mental health resources. These findings also inform the creation of a more comprehensive risk assessment profile for the likelihood of developing PTSD following an extremely adverse event.
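A minimal sketch of the general workflow described above (feature selection followed by classification to predict new PTSD onset); the NESARC variables and the study's specific algorithms are not reproduced, and the data below are synthetic placeholders:

```python
# Hedged sketch of feature selection + classification on synthetic data;
# the survey variables and the study's exact algorithms are assumptions here.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 50))    # individual- and community-level predictors
y = rng.integers(0, 2, size=2000)  # 1 = new PTSD diagnosis at Wave 2

model = make_pipeline(
    SelectKBest(f_classif, k=15),  # keep the most informative predictors
    GradientBoostingClassifier(random_state=2),
)
print(cross_val_score(model, X, y, cv=5, scoring="accuracy").mean())
```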


Author(s):  
Jeremy M. Gernand

The safety of mining in the United States has improved significantly over the past few decades, although it remains one of the more dangerous occupations. Following the Sago mine disaster in January 2006, federal legislation (the Mine Improvement and New Emergency Response (MINER) Act of 2006) tightened regulations and sought to strengthen the authority and safety inspection practices of the Mine Safety and Health Administration (MSHA). While penalties and inspection frequency have increased, understanding of what types of inspection findings are most indicative of serious future incidents is limited. The most effective safety management and oversight could be accomplished through a thorough understanding of what types of infractions or safety inspection findings are most indicative of serious future personnel injuries. However, given the large number of potentially different and unique inspection findings, varied mine characteristics, and types of specific safety incidents, this question is complex in terms of the large number of potentially relevant input parameters. New regulations rely on increasing the frequency and severity of infraction penalties to encourage mining operations to improve worker safety, but without knowledge of which specific infractions may truly be signaling a dangerous work environment. This paper seeks to answer the question: what types of inspection findings are most indicative of serious future incidents for specific types of mining operations? This analysis utilizes publicly available MSHA databases of cited infractions and reportable incidents. These inspection results are used to train machine learning Classification and Regression Tree (CART) and Random Forest (RF) models that divide the mines into peer groups based on their recent infractions and other defining characteristics, with the aim of predicting whether or not a fatal or serious disabling injury is more likely to occur in the following 12-month period. With these characteristics available, additional scrutiny may be appropriately directed at those mining operations at greatest risk of experiencing a worker fatality or disabling injury in the near future. Increased oversight and attention on the mines where workers are at greatest risk may more effectively reduce the likelihood of worker deaths and injuries than increased penalties and inspection frequency alone.
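As an illustration of the modelling approach named above, the sketch below trains a CART (a single decision tree) and a Random Forest on synthetic stand-ins for mine inspection features; it is not the author's implementation or data:

```python
# Minimal sketch, assuming synthetic inspection data: CART and Random Forest
# models predicting whether a mine reports a fatal or disabling injury in
# the following 12 months, as the analysis above describes.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
# Placeholder features: counts of cited infractions by type, mine size, etc.
X = rng.poisson(lam=2.0, size=(800, 12)).astype(float)
y = rng.integers(0, 2, size=800)  # 1 = serious injury in next 12 months

for name, model in [("CART", DecisionTreeClassifier(max_depth=5, random_state=3)),
                    ("Random Forest", RandomForestClassifier(n_estimators=300, random_state=3))]:
    scores = cross_val_score(model, X, y, cv=5)
    print(name, scores.mean())
```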

