Tracking COVID-19 vaccine hesitancy and logistical challenges: A machine learning approach

PLoS ONE ◽  
2021 ◽  
Vol 16 (6) ◽  
pp. e0252332
Author(s):  
Shantanu Dutta ◽  
Ashok Kumar ◽  
Moumita Dutta ◽  
Caolan Walsh

In this study, we use an effective word embedding model (word2vec) to systematically track "vaccine hesitancy" and "logistical challenges" associated with the COVID-19 vaccines in the USA. To that end, we use news articles from reputable media sources and create dictionaries to estimate different aspects of vaccine hesitancy and logistical challenges. Using machine learning and natural language processing techniques, we developed (i) three sub-dictionaries that indicate vaccine hesitancy, and (ii) another dictionary for logistical challenges associated with vaccine production and distribution. The vaccine hesitancy dictionaries capture three aspects: (a) general vaccine-related concerns, mistrust, skepticism, and hesitancy; (b) discussions of symptoms and side effects; and (c) discussions of vaccine-related physical effects. The dictionary on logistical challenges includes words and phrases related to the production, storage, and distribution of vaccines. Our results show that over time, as vaccine developers completed their phase trials and obtained approval for their respective vaccines, the number of vaccine-related news articles increased sharply. Accordingly, we also see a sharp increase in vaccine hesitancy topics in news articles. In January 2021, however, the vaccine hesitancy score decreased, which should give some relief to health administrators and regulators. Our findings further show that as effective COVID-19 vaccines move closer to broad deployment, new logistical challenges continue to arise, even in recent months.
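
As an illustration of the general technique (not the authors' actual pipeline), the sketch below uses gensim's Word2Vec to expand a seed list into a hesitancy dictionary and score articles against it; the toy corpus, seed terms, and similarity threshold are all assumptions.

```python
from gensim.models import Word2Vec  # assumes gensim >= 4.0

# Toy corpus of tokenized news sentences (illustrative; the paper uses full articles).
corpus = [
    ["vaccine", "hesitancy", "mistrust", "among", "adults"],
    ["reports", "of", "side", "effects", "fuel", "skepticism"],
    ["cold", "storage", "and", "distribution", "challenges"],
    ["public", "mistrust", "and", "skepticism", "about", "vaccines"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=3,
                 min_count=1, workers=1, seed=1)

# Expand a seed list for a "general hesitancy" sub-dictionary.
seeds = ["hesitancy", "mistrust", "skepticism"]
candidates = model.wv.most_similar(positive=seeds, topn=10)
hesitancy_dict = set(seeds) | {w for w, sim in candidates if sim > 0.3}

def hesitancy_score(tokens):
    """Share of an article's tokens that appear in the hesitancy dictionary."""
    return sum(t in hesitancy_dict for t in tokens) / max(len(tokens), 1)

print(hesitancy_score(["growing", "mistrust", "and", "hesitancy", "over", "vaccines"]))
```

With a real corpus, the per-article scores could then be aggregated by month to produce the kind of time series the study reports.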

2017 ◽  
Author(s):  
Sabrina Jaeger ◽  
Simone Fulle ◽  
Samo Turk

Inspired by natural language processing techniques, we here introduce Mol2vec, an unsupervised machine learning approach to learn vector representations of molecular substructures. Similar to Word2vec models, where vectors of closely related words lie in close proximity in the vector space, Mol2vec learns vector representations of molecular substructures that point in similar directions for chemically related substructures. Compounds can finally be encoded as vectors by summing up the vectors of the individual substructures and, for instance, fed into supervised machine learning approaches to predict compound properties. The underlying substructure vector embeddings are obtained by training an unsupervised machine learning approach on a so-called corpus of compounds that consists of all available chemical matter. The resulting Mol2vec model is pre-trained once, yields dense vector representations, and overcomes drawbacks of common compound feature representations such as sparseness and bit collisions. The prediction capabilities are demonstrated on several compound property and bioactivity data sets and compared with results obtained for Morgan fingerprints as a reference compound representation. Mol2vec can easily be combined with ProtVec, which employs the same Word2vec concept on protein sequences, resulting in a proteochemometric approach that is alignment-independent and can thus also easily be used for proteins with low sequence similarities.
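
A minimal sketch of the compound-encoding idea described above, assuming each compound is already tokenized into a "sentence" of substructure identifiers (in Mol2vec these come from the Morgan algorithm via RDKit); the identifiers and hyperparameters below are placeholders, not the published settings.

```python
import numpy as np
from gensim.models import Word2Vec  # assumes gensim >= 4.0

# Each compound is a "sentence" of substructure identifiers (made-up placeholders here).
compound_sentences = [
    ["847433064", "2246728737", "864942730"],
    ["2246728737", "864942730", "3218693969"],
    ["847433064", "3218693969", "98513984"],
]

model = Word2Vec(sentences=compound_sentences, vector_size=100,
                 window=10, min_count=1, sg=1, workers=1, seed=1)

def compound_vector(substructures, model):
    """Encode a compound by summing the vectors of its known substructures."""
    vecs = [model.wv[s] for s in substructures if s in model.wv]
    return np.sum(vecs, axis=0) if vecs else np.zeros(model.vector_size)

X = np.vstack([compound_vector(s, model) for s in compound_sentences])
# X can now be fed into any supervised learner to predict compound properties.
```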


IoT ◽  
2020 ◽  
Vol 1 (2) ◽  
pp. 218-239 ◽  
Author(s):  
Ravikumar Patel ◽  
Kalpdrum Passi

In the derived approach, an analysis is performed on Twitter data for the 2014 World Cup soccer tournament held in Brazil to detect the sentiment of people throughout the world using machine learning techniques. By filtering and analyzing the data with natural language processing techniques, sentiment polarity was calculated based on the emotion words detected in the user tweets. The dataset is normalized for use by machine learning algorithms and prepared using natural language processing techniques such as word tokenization, stemming and lemmatization, part-of-speech (POS) tagging, named entity recognition (NER), and parsing to extract emotions from the text of each tweet. This approach is implemented using the Python programming language and the Natural Language Toolkit (NLTK). A derived algorithm extracts emotional words using WordNet with the POS of each word that carries meaning in the current context, and assigns sentiment polarity using the SentiWordNet dictionary or a lexicon-based method. The resulting polarity assignments are further analyzed using naïve Bayes, support vector machine (SVM), K-nearest neighbor (KNN), and random forest machine learning algorithms and visualized on the Weka platform. Naïve Bayes gives the best accuracy of 88.17%, whereas random forest gives the best area under the receiver operating characteristic curve (AUC) of 0.97.
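
The token-level polarity scoring described here can be sketched with NLTK and SentiWordNet roughly as follows; the POS mapping, first-sense lookup, and tie-breaking are illustrative assumptions rather than the paper's derived algorithm.

```python
import nltk
from nltk import word_tokenize, pos_tag
from nltk.corpus import sentiwordnet as swn, wordnet as wn

# One-time downloads (resource names may vary slightly across NLTK versions).
for pkg in ["punkt", "averaged_perceptron_tagger", "wordnet", "sentiwordnet"]:
    nltk.download(pkg, quiet=True)

def to_wordnet_pos(penn_tag):
    """Map Penn Treebank tags to the WordNet POS tags used by SentiWordNet."""
    if penn_tag.startswith("J"):
        return wn.ADJ
    if penn_tag.startswith("V"):
        return wn.VERB
    if penn_tag.startswith("N"):
        return wn.NOUN
    if penn_tag.startswith("R"):
        return wn.ADV
    return None

def tweet_polarity(text):
    """Sum positive minus negative scores of the first SentiWordNet sense per token."""
    score = 0.0
    for word, tag in pos_tag(word_tokenize(text.lower())):
        wn_pos = to_wordnet_pos(tag)
        if wn_pos is None:
            continue
        synsets = list(swn.senti_synsets(word, wn_pos))
        if synsets:
            score += synsets[0].pos_score() - synsets[0].neg_score()
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(tweet_polarity("What a brilliant goal, amazing match!"))
```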


2020 ◽  
Vol 25 (4) ◽  
pp. 174-189 ◽  
Author(s):  
Guillaume  Palacios ◽  
Arnaud Noreña ◽  
Alain Londero

Introduction: Subjective tinnitus (ST) and hyperacusis (HA) are common auditory symptoms that may become incapacitating in a subgroup of patients who thereby seek medical advice. Both conditions can result from many different mechanisms, and as a consequence, patients may report a vast repertoire of associated symptoms and comorbidities that can dramatically reduce quality of life and even lead to suicide attempts in the most severe cases. The present exploratory study investigates patients' symptoms and complaints through an in-depth statistical analysis of patients' natural narratives in a real-life environment in which, thanks to the anonymization of contributions and the peer-to-peer interaction, the wording used can be assumed to be free of self-limitation and self-censorship. Methods: We applied a purely statistical, unsupervised machine learning approach to the analysis of patients' verbatim contributions exchanged on an Internet forum. After automated data extraction, the dataset was preprocessed to make it suitable for statistical analysis. We used a variant of the Latent Dirichlet Allocation (LDA) algorithm to reveal clusters of symptoms and complaints of HA patients (topics). The probability distribution of words within a topic uniquely characterizes it. The log-likelihood of the LDA model converged after 2,000 iterations. Several statistical parameters were tested for topic modeling and for the word relevance factor within each topic. Results: Despite a rather small dataset, this exploratory study demonstrates that patients' free speech available on the Internet constitutes valuable material for machine learning and statistical analysis aimed at categorizing ST/HA complaints. The LDA model with K = 15 topics appears to be the most relevant in terms of relative weights and correlations, with the capability to individualize subgroups of patients displaying specific characteristics. The study of the relevance factor may be useful for unveiling weak but important signals present in patients' narratives. Discussion/Conclusion: We argue that the unsupervised LDA approach makes it possible to gain knowledge of the patterns of ST- and HA-related complaints and of patient-centered domains of interest. The merits and limitations of the LDA algorithm are compared with other natural language processing methods and with more conventional methods of qualitative analysis of patient output. Future directions and research topics emerging from this innovative algorithmic analysis are proposed.
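
For readers unfamiliar with LDA, the sketch below shows the general topic-modeling workflow using scikit-learn's standard variational implementation (not the specific LDA variant used in the study, and assuming scikit-learn >= 1.0); the toy posts and number of topics are assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy forum posts; the study used anonymized patient verbatim from an Internet forum.
posts = [
    "ringing in my ears gets worse with loud sounds",
    "earplugs help a little but the hyperacusis is exhausting",
    "poor sleep and anxiety make the tinnitus louder",
    "white noise at night helps me sleep through the ringing",
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(posts)

# The paper retained K = 15 topics; a toy corpus needs far fewer.
lda = LatentDirichletAllocation(n_components=2, max_iter=50, random_state=0)
lda.fit(X)

# Inspect the top words per topic, analogous to examining word relevance within each topic.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[-5:][::-1]]
    print(f"topic {k}: {', '.join(top)}")
```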


Author(s):  
Anurag Langan

Grading student answers is a tedious and time-consuming task. One study found that, on average, around 25% of a teacher's time is spent scoring students' answer sheets. This time could be put to much better use if computer technology could score the answers instead. This system aims to grade student answers using the natural language processing techniques and machine learning algorithms available today.
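
The abstract does not specify a grading method; one simple baseline, shown here purely as an assumption, is to score a student answer by its TF-IDF cosine similarity to a reference answer.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def grade_answer(student_answer, reference_answer, max_marks=5):
    """Score a student answer by TF-IDF cosine similarity to a reference answer."""
    tfidf = TfidfVectorizer(stop_words="english")
    vectors = tfidf.fit_transform([reference_answer, student_answer])
    similarity = cosine_similarity(vectors[0], vectors[1])[0, 0]
    return round(similarity * max_marks, 1)

reference = "Photosynthesis converts light energy into chemical energy stored in glucose."
student = "Plants use light to make glucose, storing the energy chemically."
print(grade_answer(student, reference))
```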


2021 ◽  
Vol 9 (2) ◽  
pp. 313-317
Author(s):  
Vanitha Kakollu et al.

Today we have large amounts of textual data to process, and classifying such text is a task of natural language processing. The basic goal is to identify whether a text is positive or negative; this process is also called opinion mining. In this paper, we consider three different data sets and perform sentiment analysis to find the test accuracy. There are three cases: (1) if the text contains more positive than negative content, the overall result leans positive; (2) if the text contains more negative than positive content, the overall result leans negative; (3) if the amounts of positive and negative content are nearly equal, the output is neutral. Sentiment analysis involves several steps, such as term extraction, feature selection, and sentiment classification. The key focus of this paper is sentiment analysis comparing the machine learning approach and the lexicon-based approach and their respective accuracy/loss graphs.
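
A minimal sketch of the comparison described above, contrasting a machine learning classifier with a simple lexicon count on the same texts; the toy data, word lists, and classifier choice are illustrative assumptions, not the paper's datasets or models.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

texts = ["great product, love it", "terrible service, never again",
         "really happy with the quality", "awful experience, waste of money",
         "excellent support team", "bad packaging and late delivery"]
labels = [1, 0, 1, 0, 1, 0]  # 1 = positive, 0 = negative

# Machine learning approach: TF-IDF features + naive Bayes.
X_tr, X_te, y_tr, y_te = train_test_split(texts, labels, test_size=0.33, random_state=0)
ml_model = make_pipeline(TfidfVectorizer(), MultinomialNB())
ml_model.fit(X_tr, y_tr)
print("ML accuracy:", accuracy_score(y_te, ml_model.predict(X_te)))

# Lexicon-based approach: count matches against small positive/negative word lists.
POS = {"great", "love", "happy", "excellent"}
NEG = {"terrible", "awful", "bad", "waste", "never"}

def lexicon_label(text):
    tokens = text.lower().split()
    return 1 if sum(t in POS for t in tokens) >= sum(t in NEG for t in tokens) else 0

print("Lexicon accuracy:", accuracy_score(labels, [lexicon_label(t) for t in texts]))
```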


Author(s):  
Gleb Danilov ◽  
Alexandra Kosyrkova ◽  
Maria Shults ◽  
Semen Melchenko ◽  
Tatyana Tsukanova ◽  
...  

Technologies for labeling unstructured medical text are expected to be in high demand as interest in artificial intelligence and natural language processing grows in the medical domain. Our study aimed to assess the agreement between experts who retrospectively judged the presence of pulmonary embolism (PE) in neurosurgical cases based on electronic health records, and to assess the utility of a machine learning approach to automate this process. We observed a moderate agreement between 3 independent raters on PE detection (Light's kappa = 0.568, p = 0). Labeling sentences with the method we proposed earlier might improve the machine learning results (accuracy = 0.97, ROC AUC = 0.98), even in those cases on which the 3 independent raters could not agree. Medical text labeling techniques might be more efficient when strict rules and semi-automated approaches are implemented. Machine learning might be a good option for unstructured text labeling when the reliability of the textual data is properly addressed. This project was supported by RFBR grant 18-29-22085.
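
Light's kappa is commonly computed as the mean of pairwise Cohen's kappa values across raters; a minimal sketch under that assumption, with made-up PE ratings, is shown below.

```python
from itertools import combinations
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Binary PE judgments (1 = PE present) from three raters on the same cases (toy data).
ratings = {
    "rater_1": [1, 0, 0, 1, 0, 1, 0, 0],
    "rater_2": [1, 0, 1, 1, 0, 0, 0, 0],
    "rater_3": [1, 0, 0, 1, 0, 1, 1, 0],
}

# Light's kappa: mean of Cohen's kappa over all rater pairs.
pairwise = [cohen_kappa_score(ratings[a], ratings[b])
            for a, b in combinations(ratings, 2)]
print("Light's kappa:", round(float(np.mean(pairwise)), 3))
```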


Author(s):  
Mathias-Felipe de-Lima-Santos ◽  
Wilson Ceron

In recent years, the news media has been greatly disrupted by the potential of technologically driven approaches to the creation, production, and distribution of news products and services. Artificial intelligence (AI) has emerged from the realm of science fiction and has become a very real tool that can help society address many issues, including the challenges faced by the news industry. The ubiquity of computing has become apparent and has demonstrated the different approaches that can be achieved using AI. We analyzed the news industry's AI adoption based on seven subfields of AI: (i) machine learning; (ii) computer vision (CV); (iii) speech recognition; (iv) natural language processing (NLP); (v) planning, scheduling, and optimization; (vi) expert systems; and (vii) robotics. Our findings suggest that three subfields are more developed in the news media: machine learning; computer vision; and planning, scheduling, and optimization. Other areas have not been fully deployed in the journalistic field. Most AI news projects rely on funds from tech companies such as Google, which limits AI's potential to a small number of players in the news industry. We conclude by providing examples of how these subfields are being developed in journalism and present an agenda for future research.


Author(s):  
Charan Lokku

Abstract: To counter fraudulent job postings on the internet, we aim to minimize the number of such frauds through a machine learning approach that predicts the chances of a job being fake, so that candidates can stay alert and make informed decisions if required. The model uses NLP to analyze the sentiments and patterns in the job posting and a TF-IDF vectorizer for feature extraction. In this model, we use the Synthetic Minority Oversampling Technique (SMOTE) to balance the data, and for classification we use random forest to predict the output with high accuracy; it runs efficiently even on large datasets, enhances the accuracy of the model, and prevents overfitting. The final model takes in any relevant job posting data and produces a result determining whether the job is real or fake. Keywords: Natural Language Processing (NLP), Term Frequency-Inverse Document Frequency (TF-IDF), Synthetic Minority Oversampling Technique (SMOTE), Random Forest.
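
A hedged sketch of the pipeline the abstract describes (TF-IDF features, SMOTE oversampling, random forest), assuming scikit-learn plus the imbalanced-learn package; the toy postings and hyperparameters are placeholders, not the authors' settings.

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Toy job postings; real data would be full posting text with many more samples.
postings = ["work from home, earn $5000 weekly, no experience, pay registration fee",
            "software engineer, 3+ years python, competitive salary and benefits",
            "data analyst, sql and excel required, onsite role with health insurance",
            "urgent hiring, send bank details to receive your starter kit",
            "senior accountant, cpa preferred, full-time position in downtown office",
            "instant income guaranteed, just forward this message and deposit cash"]
labels = [1, 0, 0, 1, 0, 1]  # 1 = fake, 0 = real

X_tr, X_te, y_tr, y_te = train_test_split(postings, labels, test_size=0.33,
                                          random_state=0, stratify=labels)

# TF-IDF features -> SMOTE oversampling (applied only when fitting) -> random forest.
model = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("smote", SMOTE(random_state=0, k_neighbors=1)),  # k_neighbors lowered for the toy set
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
])
model.fit(X_tr, y_tr)
print(model.predict(X_te))
```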

