scholarly journals COVID-19 Public Sentiment Insights and Machine Learning for Tweets Classification

Author(s):  
Jim Samuel ◽  
G. G. Md. Nawaz Ali ◽  
Md. Mokhlesur Rahman ◽  
Ek Esawi ◽  
Yana Samuel

AbstractAlong with the Coronavirus pandemic, another crisis has manifested itself in the form of mass fear and panic phenomena, fueled by incomplete and often inaccurate information. There is therefore a tremendous need to address and better understand COVID-19’s informational crisis and gauge public sentiment, so that appropriate messaging and policy decisions can be implemented. In this research article, we identify public sentiment associated with the pandemic using Coronavirus specific Tweets and R statistical software, along with its sentiment analysis packages. We demonstrate insights into the progress of fear-sentiment over time as COVID-19 approached peak levels in the United States, using descriptive textual analytics supported by necessary textual data visualizations. Furthermore, we provide a methodological overview of two essential machine learning (ML) classification methods, in the context of textual analytics, and compare their effectiveness in classifying Coronavirus Tweets of varying lengths. We observe a strong classification accuracy of 91% for short Tweets, with the Naïve Bayes method. We also observe that the logistic regression classification method provides a reasonable accuracy of 74% with shorter Tweets, and both methods showed relatively weaker performance for longer Tweets. This research provides insights into Coronavirus fear sentiment progression, and outlines associated methods, implications, limitations and opportunities.

Information ◽  
2020 ◽  
Vol 11 (6) ◽  
pp. 314 ◽  
Author(s):  
Jim Samuel ◽  
G. G. Md. Nawaz Ali ◽  
Md. Mokhlesur Rahman ◽  
Ek Esawi ◽  
Yana Samuel

Along with the Coronavirus pandemic, another crisis has manifested itself in the form of mass fear and panic phenomena, fueled by incomplete and often inaccurate information. There is therefore a tremendous need to address and better understand COVID-19’s informational crisis and gauge public sentiment, so that appropriate messaging and policy decisions can be implemented. In this research article, we identify public sentiment associated with the pandemic using Coronavirus specific Tweets and R statistical software, along with its sentiment analysis packages. We demonstrate insights into the progress of fear-sentiment over time as COVID-19 approached peak levels in the United States, using descriptive textual analytics supported by necessary textual data visualizations. Furthermore, we provide a methodological overview of two essential machine learning (ML) classification methods, in the context of textual analytics, and compare their effectiveness in classifying Coronavirus Tweets of varying lengths. We observe a strong classification accuracy of 91% for short Tweets, with the Naïve Bayes method. We also observe that the logistic regression classification method provides a reasonable accuracy of 74% with shorter Tweets, and both methods showed relatively weaker performance for longer Tweets. This research provides insights into Coronavirus fear sentiment progression, and outlines associated methods, implications, limitations and opportunities.


2020 ◽  
Author(s):  
Jim Samuel ◽  
Md. Mokhlesur Rahman ◽  
G.G.M.N. Ali ◽  
Ek Esawi ◽  
Y. Samuel

Along with the Coronavirus pandemic, another crisis has manifested itself in the form of mass fear and panic phenomena, fueled by incomplete and often inaccurate information. There is therefore a tremendous need to address and better understand COVID-19's informational crisis and gauge public sentiment, so that appropriate messaging and policy decisions can be implemented. In this research article, we identify public sentiment associated with the pandemic using Coronavirus specific Tweets and R statistical software, along with its sentiment analysis packages. We demonstrate insights into the progress of fear-sentiment over time as COVID-19 approached peak levels in the United States, using descriptive textual analytics supported by necessary textual data visualizations. Furthermore, we provide a methodological overview of two essential machine learning (ML) classification methods, in the context of textual analytics, and compare their effectiveness in classifying Coronavirus Tweets of varying lengths. We observe a strong classification accuracy of 91\% for short Tweets, with the Na\"ive Bayes method. We also observe that the logistic regression classification method provides a reasonable accuracy of 74\% with shorter Tweets, and both methods showed relatively weaker performance for longer Tweets. This research provides insights into Coronavirus fear sentiment progression, and outlines associated methods, implications, limitations and opportunities


Author(s):  
Jim Samuel ◽  
G. G. Md. Nawaz Ali ◽  
Md. Mokhlesur Rahman ◽  
Ek Esawi ◽  
Yana Samuel

Along with the Coronavirus pandemic, another crisis has manifested itself in the form of mass fear and panic phenomena, fuelled by incomplete and often inaccurate information. There is therefore a tremendous need to address and better understand COVID-19's informational crisis and gauge public sentiment, so that appropriate messaging and policy decisions can be implemented. In this research article, we identify public sentiment associated with the pandemic using Coronavirus specific Tweets and R statistical software, along with its sentiment analysis packages. We demonstrate insights into the progress of fear-sentiment over time as COVID-19 approached peak levels in the United States, using descriptive textual analytics supported by necessary textual data visualizations. Furthermore, we provide a methodological overview of two essential machine learning classification methods, in the context of textual analytics, and compare their effectiveness in classifying Coronavirus Tweets of varying lengths. We observe a strong classification accuracy of 91% for short Tweets, with the Naive Bayes method. We also observe that the logistic regression classification method provides a reasonable accuracy of 74% with shorter Tweets, and both methods showed relatively weaker performance for longer Tweets. This research provides insights into Coronavirus fear sentiment progression, and outlines associated methods, implications, limitations and opportunities.


Author(s):  
Kusumanchi Naga Sireesha and Padala Srinivasa Reddy

Along with the Coronavirus pandemic, another crisis has manifested itself in the form of mass fear and panic phenomena, fuelled by incomplete and often inaccurate information. There is therefore a tremendous need to address and better understand COVID-19’s informational crisis. The diverse use of social networking sites, like Twitter, speeds up the process of sharing information and having views on community events and health crises COVID-19 has been one of Twitter's trending areas. The Twitter messages created via Twitter are named Tweets. In this paper, we identify public sentiment associated with the pandemic using Coronavirus-specific Tweets and Python, along with its sentiment analysis packages. We provide an overview of two essential machine learning classification methods, in the context of textual analytics, and compare their effectiveness in classifying Coronavirus Tweets of varying lengths. This research provides insights into Coronavirus fear sentiment progression, associated methods, limitations, and different opportunities. In this project, we have designed a Sentiment analysis System that would identify the sentiment of a tweet and classify it into one of the five classes they include:”ExtremelyPositive”,“Positive”,”Neutral”, ”Negative” and “Extremely Negative”.


2020 ◽  
Vol 27 (8) ◽  
pp. 1891-1912
Author(s):  
Hengqin Wu ◽  
Geoffrey Shen ◽  
Xue Lin ◽  
Minglei Li ◽  
Boyu Zhang ◽  
...  

PurposeThis study proposes an approach to solve the fundamental problem in using query-based methods (i.e. searching engines and patent retrieval tools) to screen patents of information and communication technology in construction (ICTC). The fundamental problem is that ICTC incorporates various techniques and thus cannot be simply represented by man-made queries. To investigate this concern, this study develops a binary classifier by utilizing deep learning and NLP techniques to automatically identify whether a patent is relevant to ICTC, thus accurately screening a corpus of ICTC patents.Design/methodology/approachThis study employs NLP techniques to convert the textual data of patents into numerical vectors. Then, a supervised deep learning model is developed to learn the relations between the input vectors and outputs.FindingsThe validation results indicate that (1) the proposed approach has a better performance in screening ICTC patents than traditional machine learning methods; (2) besides the United States Patent and Trademark Office (USPTO) that provides structured and well-written patents, the approach could also accurately screen patents form Derwent Innovations Index (DIX), in which patents are written in different genres.Practical implicationsThis study contributes a specific collection for ICTC patents, which is not provided by the patent offices.Social implicationsThe proposed approach contributes an alternative manner in gathering a corpus of patents for domains like ICTC that neither exists as a searchable classification in patent offices, nor is accurately represented by man-made queries.Originality/valueA deep learning model with two layers of neurons is developed to learn the non-linear relations between the input features and outputs providing better performance than traditional machine learning models. This study uses advanced NLP techniques lemmatization and part-of-speech POS to process textual data of ICTC patents. This study contributes specific collection for ICTC patents which is not provided by the patent offices.


2020 ◽  
Vol 89 ◽  
pp. 20-29
Author(s):  
Sh. K. Kadiev ◽  
◽  
R. Sh. Khabibulin ◽  
P. P. Godlevskiy ◽  
V. L. Semikov ◽  
...  

Introduction. An overview of research in the field of classification as a method of machine learning is given. Articles containing mathematical models and algorithms for classification were selected. The use of classification in intelligent management decision support systems in various subject areas is also relevant. Goal and objectives. The purpose of the study is to analyze papers on the classification as a machine learning method. To achieve the objective, it is necessary to solve the following tasks: 1) to identify the most used classification methods in machine learning; 2) to highlight the advantages and disadvantages of each of the selected methods; 3) to analyze the possibility of using classification methods in intelligent systems to support management decisions to solve issues of forecasting, prevention and elimination of emergencies. Methods. To obtain the results, general scientific and special methods of scientific knowledge were used - analysis, synthesis, generalization, as well as the classification method. Results and discussion thereof. According to the results of the analysis, studies with a mathematical formulation and the availability of software developments were identified. The issues of classification in the implementation of machine learning in the development of intelligent decision support systems are considered. Conclusion. The analysis revealed that enough algorithms were used to perform the classification while sorting the acquired knowledge within the subject area. The implementation of an accurate classification is one of the fundamental problems in the development of management decision support systems, including for fire and emergency prevention and response. Timely and effective decision by officials of operational shifts for the disaster management is also relevant. Key words: decision support, analysis, classification, machine learning, algorithm, mathematical models.


Author(s):  
Navid Asadizanjani ◽  
Sachin Gattigowda ◽  
Mark Tehranipoor ◽  
Domenic Forte ◽  
Nathan Dunn

Abstract Counterfeiting is an increasing concern for businesses and governments as greater numbers of counterfeit integrated circuits (IC) infiltrate the global market. There is an ongoing effort in experimental and national labs inside the United States to detect and prevent such counterfeits in the most efficient time period. However, there is still a missing piece to automatically detect and properly keep record of detected counterfeit ICs. Here, we introduce a web application database that allows users to share previous examples of counterfeits through an online database and to obtain statistics regarding the prevalence of known defects. We also investigate automated techniques based on image processing and machine learning to detect different physical defects and to determine whether or not an IC is counterfeit.


2020 ◽  
Author(s):  
Carson Lam ◽  
Jacob Calvert ◽  
Gina Barnes ◽  
Emily Pellegrini ◽  
Anna Lynn-Palevsky ◽  
...  

BACKGROUND In the wake of COVID-19, the United States has developed a three stage plan to outline the parameters to determine when states may reopen businesses and ease travel restrictions. The guidelines also identify subpopulations of Americans that should continue to stay at home due to being at high risk for severe disease should they contract COVID-19. These guidelines were based on population level demographics, rather than individual-level risk factors. As such, they may misidentify individuals at high risk for severe illness and who should therefore not return to work until vaccination or widespread serological testing is available. OBJECTIVE This study evaluated a machine learning algorithm for the prediction of serious illness due to COVID-19 using inpatient data collected from electronic health records. METHODS The algorithm was trained to identify patients for whom a diagnosis of COVID-19 was likely to result in hospitalization, and compared against four U.S policy-based criteria: age over 65, having a serious underlying health condition, age over 65 or having a serious underlying health condition, and age over 65 and having a serious underlying health condition. RESULTS This algorithm identified 80% of patients at risk for hospitalization due to COVID-19, versus at most 62% that are identified by government guidelines. The algorithm also achieved a high specificity of 95%, outperforming government guidelines. CONCLUSIONS This algorithm may help to enable a broad reopening of the American economy while ensuring that patients at high risk for serious disease remain home until vaccination and testing become available.


Author(s):  
Timnit Gebru

This chapter discusses the role of race and gender in artificial intelligence (AI). The rapid permeation of AI into society has not been accompanied by a thorough investigation of the sociopolitical issues that cause certain groups of people to be harmed rather than advantaged by it. For instance, recent studies have shown that commercial automated facial analysis systems have much higher error rates for dark-skinned women, while having minimal errors on light-skinned men. Moreover, a 2016 ProPublica investigation uncovered that machine learning–based tools that assess crime recidivism rates in the United States are biased against African Americans. Other studies show that natural language–processing tools trained on news articles exhibit societal biases. While many technical solutions have been proposed to alleviate bias in machine learning systems, a holistic and multifaceted approach must be taken. This includes standardization bodies determining what types of systems can be used in which scenarios, making sure that automated decision tools are created by people from diverse backgrounds, and understanding the historical and political factors that disadvantage certain groups who are subjected to these tools.


2021 ◽  
Vol 14 (5) ◽  
pp. 472
Author(s):  
Tyler C. Beck ◽  
Kyle R. Beck ◽  
Jordan Morningstar ◽  
Menny M. Benjamin ◽  
Russell A. Norris

Roughly 2.8% of annual hospitalizations are a result of adverse drug interactions in the United States, representing more than 245,000 hospitalizations. Drug–drug interactions commonly arise from major cytochrome P450 (CYP) inhibition. Various approaches are routinely employed in order to reduce the incidence of adverse interactions, such as altering drug dosing schemes and/or minimizing the number of drugs prescribed; however, often, a reduction in the number of medications cannot be achieved without impacting therapeutic outcomes. Nearly 80% of drugs fail in development due to pharmacokinetic issues, outlining the importance of examining cytochrome interactions during preclinical drug design. In this review, we examined the physiochemical and structural properties of small molecule inhibitors of CYPs 3A4, 2D6, 2C19, 2C9, and 1A2. Although CYP inhibitors tend to have distinct physiochemical properties and structural features, these descriptors alone are insufficient to predict major cytochrome inhibition probability and affinity. Machine learning based in silico approaches may be employed as a more robust and accurate way of predicting CYP inhibition. These various approaches are highlighted in the review.


Sign in / Sign up

Export Citation Format

Share Document