COVID-19 Public Sentiment Insights and Machine Learning for Tweets Classification

Abstract: Along with the Coronavirus pandemic, another crisis has manifested itself in the form of mass fear and panic phenomena, fueled by incomplete and often inaccurate information. There is therefore a tremendous need to address and better understand COVID-19’s informational crisis and gauge public sentiment, so that appropriate messaging and policy decisions can be implemented. In this research article, we identify public sentiment associated with the pandemic using Coronavirus specific Tweets and R statistical software, along with its sentiment analysis packages. We demonstrate insights into the progress of fear-sentiment over time as COVID-19 approached peak levels in the United States, using descriptive textual analytics supported by necessary textual data visualizations. Furthermore, we provide a methodological overview of two essential machine learning (ML) classification methods, in the context of textual analytics, and compare their effectiveness in classifying Coronavirus Tweets of varying lengths. We observe a strong classification accuracy of 91% for short Tweets, with the Naïve Bayes method. We also observe that the logistic regression classification method provides a reasonable accuracy of 74% with shorter Tweets, and both methods showed relatively weaker performance for longer Tweets. This research provides insights into Coronavirus fear sentiment progression, and outlines associated methods, implications, limitations and opportunities.
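To make the Naïve Bayes step concrete, the following is a minimal sketch of a multinomial Naïve Bayes text classifier written with the Python standard library only. It is not the authors' implementation (the abstract states they worked in R); the whitespace tokenizer, the Laplace smoothing, the two labels and the six toy Tweets are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train_naive_bayes(samples):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in samples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior for the given text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[label][tok] + 1
            score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, invented for this sketch only.
TOY_TWEETS = [
    ("scared of the virus spreading", "negative"),
    ("panic buying everywhere so worried", "negative"),
    ("fear and panic in the city", "negative"),
    ("grateful for nurses and doctors", "positive"),
    ("hope recovery numbers keep rising", "positive"),
    ("thankful my family is safe", "positive"),
]

model = train_naive_bayes(TOY_TWEETS)
print(predict(model, "so much panic and fear"))
print(predict(model, "thankful and grateful"))
```

Working in log space keeps the product of many small word probabilities from underflowing.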

COVID-19 Public Sentiment Insights and Machine Learning for Tweets Classification

Information ◽

10.3390/info11060314 ◽

2020 ◽

Vol 11 (6) ◽

pp. 314 ◽

Cited By ~ 17

Author(s):

Jim Samuel ◽

G. G. Md. Nawaz Ali ◽

Md. Mokhlesur Rahman ◽

Ek Esawi ◽

Yana Samuel

Keyword(s):

Machine Learning ◽

The United States ◽

Classification Methods ◽

Reasonable Accuracy ◽

Bayes Method ◽

Inaccurate Information ◽

Public Sentiment ◽

Research Article ◽

Textual Data ◽

Data Visualizations

COVID-19 Public Sentiment Insights and Machine Learning for Tweets Classification

10.31234/osf.io/sw2dn ◽

2020 ◽

Cited By ~ 2

Author(s):

Jim Samuel ◽

Md. Mokhlesur Rahman ◽

G.G.M.N. Ali ◽

Ek Esawi ◽

Y. Samuel

Keyword(s):

Machine Learning ◽

The United States ◽

Classification Methods ◽

Reasonable Accuracy ◽

Bayes Method ◽

Inaccurate Information ◽

Public Sentiment ◽

Research Article ◽

Textual Data ◽

Data Visualizations

Along with the Coronavirus pandemic, another crisis has manifested itself in the form of mass fear and panic phenomena, fueled by incomplete and often inaccurate information. There is therefore a tremendous need to address and better understand COVID-19's informational crisis and gauge public sentiment, so that appropriate messaging and policy decisions can be implemented. In this research article, we identify public sentiment associated with the pandemic using Coronavirus specific Tweets and R statistical software, along with its sentiment analysis packages. We demonstrate insights into the progress of fear-sentiment over time as COVID-19 approached peak levels in the United States, using descriptive textual analytics supported by necessary textual data visualizations. Furthermore, we provide a methodological overview of two essential machine learning (ML) classification methods, in the context of textual analytics, and compare their effectiveness in classifying Coronavirus Tweets of varying lengths. We observe a strong classification accuracy of 91% for short Tweets, with the Naïve Bayes method. We also observe that the logistic regression classification method provides a reasonable accuracy of 74% with shorter Tweets, and both methods showed relatively weaker performance for longer Tweets. This research provides insights into Coronavirus fear sentiment progression, and outlines associated methods, implications, limitations and opportunities.

COVID-19 Public Sentiment Insights and Machine Learning for Tweets Classification

10.20944/preprints202005.0015.v1 ◽

2020 ◽

Cited By ~ 1

Author(s):

Jim Samuel ◽

G. G. Md. Nawaz Ali ◽

Md. Mokhlesur Rahman ◽

Ek Esawi ◽

Yana Samuel

Keyword(s):

Machine Learning ◽

The United States ◽

Reasonable Accuracy ◽

Bayes Method ◽

Machine Learning Classification ◽

Inaccurate Information ◽

Public Sentiment ◽

Research Article ◽

Textual Data ◽

Data Visualizations

Along with the Coronavirus pandemic, another crisis has manifested itself in the form of mass fear and panic phenomena, fuelled by incomplete and often inaccurate information. There is therefore a tremendous need to address and better understand COVID-19's informational crisis and gauge public sentiment, so that appropriate messaging and policy decisions can be implemented. In this research article, we identify public sentiment associated with the pandemic using Coronavirus specific Tweets and R statistical software, along with its sentiment analysis packages. We demonstrate insights into the progress of fear-sentiment over time as COVID-19 approached peak levels in the United States, using descriptive textual analytics supported by necessary textual data visualizations. Furthermore, we provide a methodological overview of two essential machine learning classification methods, in the context of textual analytics, and compare their effectiveness in classifying Coronavirus Tweets of varying lengths. We observe a strong classification accuracy of 91% for short Tweets, with the Naive Bayes method. We also observe that the logistic regression classification method provides a reasonable accuracy of 74% with shorter Tweets, and both methods showed relatively weaker performance for longer Tweets. This research provides insights into Coronavirus fear sentiment progression, and outlines associated methods, implications, limitations and opportunities.
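The logistic regression method named in the abstract can be sketched in the same spirit. The example below is a hedged illustration only: it trains a binary logistic regression with plain stochastic gradient descent on word-count features, using the Python standard library. The four-word vocabulary, the toy Tweets, the learning rate and the epoch count are all hypothetical choices, not values taken from the paper.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def featurize(text, vocab):
    """Represent a text as counts of each vocabulary word."""
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in vocab]


def train_logistic_regression(X, y, lr=0.5, epochs=200):
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for features, target in zip(X, y):
            pred = sigmoid(bias + sum(w * x for w, x in zip(weights, features)))
            error = pred - target  # gradient of the log loss w.r.t. the score
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def predict_proba(text, vocab, weights, bias):
    """Probability that the text belongs to class 1 (here: fear)."""
    features = featurize(text, vocab)
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


# Hypothetical vocabulary and toy data; label 1 = fear, label 0 = no fear.
VOCAB = ["fear", "panic", "hope", "safe"]
TOY_TWEETS = [
    ("fear and panic", 1),
    ("panic everywhere", 1),
    ("hope and safe", 0),
    ("safe at home", 0),
]

X = [featurize(text, VOCAB) for text, _ in TOY_TWEETS]
y = [label for _, label in TOY_TWEETS]
weights, bias = train_logistic_regression(X, y)

print(predict_proba("so much fear", VOCAB, weights, bias))
print(predict_proba("feeling safe", VOCAB, weights, bias))
```

Unlike Naïve Bayes, this model learns one weight per feature jointly, so it does not assume that words are conditionally independent given the class.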

COVID19 Sentiment Analysis using Machine Learning Classification Algorithms

International Journal for Modern Trends in Science and Technology - RTT2020 ◽

10.46501/ijmtst0709003 ◽

2021 ◽

Vol 7 (09) ◽

pp. 13-18

Author(s):

Kusumanchi Naga Sireesha and Padala Srinivasa Reddy

Keyword(s):

Machine Learning ◽

Social Networking ◽

Sentiment Analysis ◽

Social Networking Sites ◽

Classification Algorithms ◽

Classification Methods ◽

Machine Learning Classification ◽

Inaccurate Information ◽

Public Sentiment ◽

Analysis System

Along with the Coronavirus pandemic, another crisis has manifested itself in the form of mass fear and panic phenomena, fuelled by incomplete and often inaccurate information. There is therefore a tremendous need to address and better understand COVID-19’s informational crisis. The widespread use of social networking sites such as Twitter speeds up the sharing of information and opinions about community events and health crises, and COVID-19 has been one of Twitter's trending topics. Messages posted on Twitter are called Tweets. In this paper, we identify public sentiment associated with the pandemic using Coronavirus-specific Tweets and Python, along with its sentiment analysis packages. We provide an overview of two essential machine learning classification methods, in the context of textual analytics, and compare their effectiveness in classifying Coronavirus Tweets of varying lengths. This research provides insights into Coronavirus fear sentiment progression, associated methods, limitations, and different opportunities. In this project, we have designed a sentiment analysis system that identifies the sentiment of a tweet and classifies it into one of five classes: “Extremely Positive”, “Positive”, “Neutral”, “Negative” and “Extremely Negative”.
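The abstract does not say how the five classes are assigned. One simple possibility, assumed here purely for illustration, is to bucket a polarity score in the range [-1, 1] using fixed cut-offs. The function name `bucket_sentiment` and the thresholds 0.2 and 0.6 are hypothetical and are not taken from the paper.

```python
def bucket_sentiment(score, strong=0.6, weak=0.2):
    """Map a polarity score in [-1, 1] to one of five sentiment classes.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie in [-1, 1]")
    if score <= -strong:
        return "Extremely Negative"
    if score <= -weak:
        return "Negative"
    if score < weak:
        return "Neutral"
    if score < strong:
        return "Positive"
    return "Extremely Positive"


for value in (-0.9, -0.3, 0.0, 0.3, 0.9):
    print(value, bucket_sentiment(value))
```

A trained five-way classifier would replace these fixed cut-offs with learned decision boundaries; the sketch only shows what the label space looks like.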

Screening patents of ICT in construction using deep learning and NLP techniques

Engineering Construction & Architectural Management ◽

10.1108/ecam-09-2019-0480 ◽

2020 ◽

Vol 27 (8) ◽

pp. 1891-1912

Author(s):

Hengqin Wu ◽

Geoffrey Shen ◽

Xue Lin ◽

Minglei Li ◽

Boyu Zhang ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Fundamental Problem ◽

The United States ◽

Learning Model ◽

Content Type ◽

Patent Retrieval ◽

Part Of Speech ◽

Textual Data ◽

Deep Learning Model

Purpose: This study proposes an approach to solve the fundamental problem in using query-based methods (i.e. search engines and patent retrieval tools) to screen patents of information and communication technology in construction (ICTC). The fundamental problem is that ICTC incorporates various techniques and thus cannot be simply represented by man-made queries. To investigate this concern, this study develops a binary classifier by utilizing deep learning and NLP techniques to automatically identify whether a patent is relevant to ICTC, thus accurately screening a corpus of ICTC patents. Design/methodology/approach: This study employs NLP techniques to convert the textual data of patents into numerical vectors. Then, a supervised deep learning model is developed to learn the relations between the input vectors and outputs. Findings: The validation results indicate that (1) the proposed approach performs better in screening ICTC patents than traditional machine learning methods; (2) besides the United States Patent and Trademark Office (USPTO), which provides structured and well-written patents, the approach could also accurately screen patents from the Derwent Innovations Index (DIX), in which patents are written in different genres. Practical implications: This study contributes a specific collection of ICTC patents, which is not provided by the patent offices. Social implications: The proposed approach contributes an alternative way of gathering a corpus of patents for domains like ICTC that neither exist as a searchable classification in patent offices nor are accurately represented by man-made queries. Originality/value: A deep learning model with two layers of neurons is developed to learn the non-linear relations between the input features and outputs, providing better performance than traditional machine learning models. This study uses the advanced NLP techniques of lemmatization and part-of-speech (POS) tagging to process the textual data of ICTC patents.
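The step of converting patent text into numerical vectors can be illustrated with a plain bag-of-words count vector. This is a simplification assumed for the sketch: the study itself mentions lemmatization and POS processing, which are omitted here, and the two example documents are invented.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Assign each distinct token a fixed index, in alphabetical order."""
    vocab = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(vocab)}


def vectorize(text, vocab_index):
    """Turn a text into a count vector; unknown words are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocab_index)
    for tok, n in counts.items():
        idx = vocab_index.get(tok)
        if idx is not None:
            vector[idx] = n
    return vector


# Hypothetical patent-like snippets, invented for this sketch.
DOCS = [
    "A sensor network for monitoring concrete curing",
    "Method for brewing coffee",
]

vocab_index = build_vocabulary(DOCS)
print(vectorize("sensor sensor for drones", vocab_index))
```

Vectors of this kind have one dimension per vocabulary word and can be fed to a classifier as input features.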

Review of classification studies for machine learning in the development of intelligent management decision support systems

Technology of technosphere safety ◽

10.25257/tts.2020.3.89.20-29 ◽

2020 ◽

Vol 89 ◽

pp. 20-29

Author(s):

Sh. K. Kadiev ◽

R. Sh. Khabibulin ◽

P. P. Godlevskiy ◽

V. L. Semikov ◽

...

Keyword(s):

Machine Learning ◽

Decision Support ◽

Mathematical Models ◽

Decision Support Systems ◽

Support Systems ◽

Management Decision ◽

Classification Methods ◽

Advantages And Disadvantages ◽

Intelligent Management ◽

Management Decision Support

Introduction. This paper gives an overview of research on classification as a machine learning method. Articles containing mathematical models and algorithms for classification were selected. The use of classification in intelligent management decision support systems in various subject areas is also relevant. Goal and objectives. The purpose of the study is to analyze papers on classification as a machine learning method. To achieve this goal, the following tasks must be solved: 1) to identify the most used classification methods in machine learning; 2) to highlight the advantages and disadvantages of each of the selected methods; 3) to analyze the possibility of using classification methods in intelligent management decision support systems for forecasting, preventing and eliminating emergencies. Methods. To obtain the results, general scientific and special methods of scientific knowledge were used: analysis, synthesis, generalization, as well as the classification method. Results and discussion. The analysis identified studies that have a mathematical formulation and available software implementations. The issues of classification in the implementation of machine learning in the development of intelligent decision support systems are considered. Conclusion. The analysis revealed that a sufficient number of algorithms are used to perform classification when sorting the acquired knowledge within a subject area. Implementing accurate classification is one of the fundamental problems in the development of management decision support systems, including those for fire and emergency prevention and response. Timely and effective decision-making by operational shift officials in disaster management is also relevant. Key words: decision support, analysis, classification, machine learning, algorithm, mathematical models.

A Database for Counterfeit Electronics and Automatic Defect Detection Based on Image Processing and Machine Learning

ISTFA 2016: Conference Proceedings from the 42nd International Symposium for Testing and Failure Analysis ◽

10.31399/asm.cp.istfa2016p0580 ◽

2016 ◽

Author(s):

Navid Asadizanjani ◽

Sachin Gattigowda ◽

Mark Tehranipoor ◽

Domenic Forte ◽

Nathan Dunn

Keyword(s):

Machine Learning ◽

Image Processing ◽

Integrated Circuits ◽

Web Application ◽

The United States ◽

Global Market ◽

Online Database ◽

Ongoing Effort ◽

Time Period ◽

Physical Defects

Abstract: Counterfeiting is an increasing concern for businesses and governments as greater numbers of counterfeit integrated circuits (IC) infiltrate the global market. There is an ongoing effort in experimental and national labs inside the United States to detect and prevent such counterfeits in the most efficient time period. However, there is still a missing piece to automatically detect and properly keep record of detected counterfeit ICs. Here, we introduce a web application database that allows users to share previous examples of counterfeits through an online database and to obtain statistics regarding the prevalence of known defects. We also investigate automated techniques based on image processing and machine learning to detect different physical defects and to determine whether or not an IC is counterfeit.

Personalized stratification of back to work risk amidst COVID-19: A machine learning approach (Preprint)

10.2196/preprints.22030 ◽

2020 ◽

Author(s):

Carson Lam ◽

Jacob Calvert ◽

Gina Barnes ◽

Emily Pellegrini ◽

Anna Lynn-Palevsky ◽

...

Keyword(s):

Machine Learning ◽

High Risk ◽

Learning Algorithm ◽

Severe Disease ◽

High Specificity ◽

Population Level ◽

The United States ◽

Health Condition ◽

Available P ◽

Severe Illness

BACKGROUND: In the wake of COVID-19, the United States has developed a three-stage plan to outline the parameters to determine when states may reopen businesses and ease travel restrictions. The guidelines also identify subpopulations of Americans that should continue to stay at home due to being at high risk for severe disease should they contract COVID-19. These guidelines were based on population-level demographics, rather than individual-level risk factors. As such, they may misidentify individuals at high risk for severe illness and who should therefore not return to work until vaccination or widespread serological testing is available. OBJECTIVE: This study evaluated a machine learning algorithm for the prediction of serious illness due to COVID-19 using inpatient data collected from electronic health records. METHODS: The algorithm was trained to identify patients for whom a diagnosis of COVID-19 was likely to result in hospitalization, and compared against four U.S. policy-based criteria: age over 65, having a serious underlying health condition, age over 65 or having a serious underlying health condition, and age over 65 and having a serious underlying health condition. RESULTS: This algorithm identified 80% of patients at risk for hospitalization due to COVID-19, versus at most 62% that are identified by government guidelines. The algorithm also achieved a high specificity of 95%, outperforming government guidelines. CONCLUSIONS: This algorithm may help to enable a broad reopening of the American economy while ensuring that patients at high risk for serious disease remain home until vaccination and testing become available.
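The two headline figures can be related to a confusion matrix. Reading the "80% of patients identified" as sensitivity (an interpretation, not a statement from the abstract), the sketch below shows how sensitivity and specificity are computed. The patient counts are hypothetical, chosen only so that the arithmetic reproduces 80% and 95%; they are not the study's data.

```python
def sensitivity(true_pos, false_neg):
    """Share of truly at-risk patients that the model flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of not-at-risk patients that the model correctly leaves unflagged."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts: 100 hospitalized patients, 1000 non-hospitalized.
TRUE_POS, FALSE_NEG = 80, 20
TRUE_NEG, FALSE_POS = 950, 50

print(f"sensitivity = {sensitivity(TRUE_POS, FALSE_NEG):.2f}")
print(f"specificity = {specificity(TRUE_NEG, FALSE_POS):.2f}")
```

High sensitivity limits the number of at-risk patients who are missed, while high specificity limits the number of low-risk people who are told to stay home unnecessarily.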

Race and Gender

The Oxford Handbook of Ethics of AI ◽

10.1093/oxfordhb/9780190067397.013.16 ◽

2020 ◽

pp. 251-269 ◽

Cited By ~ 2

Author(s):

Timnit Gebru

Keyword(s):

Machine Learning ◽

Language Processing ◽

The United States ◽

Error Rates ◽

Political Factors ◽

Recidivism Rates ◽

Race And Gender ◽

Decision Tools ◽

And Gender ◽

Technical Solutions

This chapter discusses the role of race and gender in artificial intelligence (AI). The rapid permeation of AI into society has not been accompanied by a thorough investigation of the sociopolitical issues that cause certain groups of people to be harmed rather than advantaged by it. For instance, recent studies have shown that commercial automated facial analysis systems have much higher error rates for dark-skinned women, while having minimal errors on light-skinned men. Moreover, a 2016 ProPublica investigation uncovered that machine learning–based tools that assess crime recidivism rates in the United States are biased against African Americans. Other studies show that natural language–processing tools trained on news articles exhibit societal biases. While many technical solutions have been proposed to alleviate bias in machine learning systems, a holistic and multifaceted approach must be taken. This includes standardization bodies determining what types of systems can be used in which scenarios, making sure that automated decision tools are created by people from diverse backgrounds, and understanding the historical and political factors that disadvantage certain groups who are subjected to these tools.

Descriptors of Cytochrome Inhibitors and Useful Machine Learning Based Methods for the Design of Safer Drugs

Pharmaceuticals ◽

10.3390/ph14050472 ◽

2021 ◽

Vol 14 (5) ◽

pp. 472

Author(s):

Tyler C. Beck ◽

Kyle R. Beck ◽

Jordan Morningstar ◽

Menny M. Benjamin ◽

Russell A. Norris

Keyword(s):

United States ◽

Machine Learning ◽

Drug Interactions ◽

The United States ◽

Structural Features ◽

Physiochemical Properties ◽

Drug Dosing ◽

Therapeutic Outcomes ◽

Cyp Inhibition ◽

Cyp Inhibitors

Roughly 2.8% of annual hospitalizations are a result of adverse drug interactions in the United States, representing more than 245,000 hospitalizations. Drug–drug interactions commonly arise from major cytochrome P450 (CYP) inhibition. Various approaches are routinely employed to reduce the incidence of adverse interactions, such as altering drug dosing schemes and/or minimizing the number of drugs prescribed; however, a reduction in the number of medications often cannot be achieved without impacting therapeutic outcomes. Nearly 80% of drugs fail in development due to pharmacokinetic issues, underscoring the importance of examining cytochrome interactions during preclinical drug design. In this review, we examined the physiochemical and structural properties of small molecule inhibitors of CYPs 3A4, 2D6, 2C19, 2C9, and 1A2. Although CYP inhibitors tend to have distinct physiochemical properties and structural features, these descriptors alone are insufficient to predict major cytochrome inhibition probability and affinity. Machine learning-based in silico approaches may be employed as a more robust and accurate way of predicting CYP inhibition. These various approaches are highlighted in the review.
