Evaluation of Twitter data for an emerging crisis: an application to the first wave of COVID-19 in the UK

2021, Vol 11 (1)
Author(s): I Kit Cheng, Johannes Heyl, Nisha Lad, Gabriel Facini, Zara Grout

Abstract
In the absence of nationwide mass testing for an emerging health crisis, alternative approaches could provide necessary information efficiently to aid policy makers and health bodies when dealing with a pandemic. The following work presents a methodology by which Twitter data surrounding the first wave of the COVID-19 pandemic in the UK is harvested and analysed using two main approaches. The first is an investigation into localized outbreak predictions by developing a prototype early-warning system using the distribution of total tweet volume. The temporal lag between the rises in the number of COVID-19-related tweets and deaths officially reported by Public Health England (PHE) is observed to be 6–27 days for various UK cities, which matches the temporal lag values found in the literature. To better understand the topics of discussion and attitudes of people surrounding the pandemic, the second approach is an in-depth behavioural analysis assessing the public opinion and response to government policies such as the introduction of face-coverings. Using topic modelling, nine distinct topics are identified within the corpus of COVID-19 tweets, with themes ranging from retail to government bodies. Sentiment analysis on a subset of mask-related tweets revealed sentiment spikes corresponding to major news and announcements. A Named Entity Recognition (NER) algorithm is trained and applied in a semi-supervised manner to recognise tweets containing location keywords within the unlabelled corpus, achieving a precision of 81.6%. Overall, these approaches allowed extraction of temporal trends relating to PHE case numbers, popular locations in relation to the use of face-coverings, and attitudes towards face-coverings, vaccines and the national ‘Test and Trace’ scheme.
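As a side note on the early-warning approach, the lag between a daily tweet-volume series and the corresponding PHE death counts can be estimated with a simple lagged cross-correlation. The sketch below is a minimal illustration under assumed daily-aggregated inputs; the `best_lag` helper, the column names, and the 40-day search window are hypothetical and do not reproduce the authors' implementation.

```python
import numpy as np
import pandas as pd

def best_lag(tweets: pd.Series, deaths: pd.Series, max_lag: int = 40) -> int:
    """Return the lag (in days) at which daily tweet volume correlates most
    strongly with later reported deaths; both series share a daily DatetimeIndex."""
    tweets, deaths = tweets.align(deaths, join="inner")
    lags = list(range(max_lag + 1))
    # corr(tweets(t), deaths(t + lag)): shift deaths backwards by `lag` days
    corrs = [tweets.corr(deaths.shift(-lag)) for lag in lags]
    return lags[int(np.nanargmax(corrs))]

# Hypothetical usage with daily-aggregated data for one UK city:
# lag = best_lag(city_tweets["count"], phe_deaths["daily_deaths"])
# print(f"Tweet volume leads reported deaths by ~{lag} days")
```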

2018
Author(s): Devanshu Jain, Maria Kustikova, Mayank Darbari, Rishabh Gupta, Stephen Mayhew

2020
Author(s): Shintaro Tsuji, Andrew Wen, Naoki Takahashi, Hongjian Zhang, Katsuhiko Ogasawara, ...

BACKGROUND: Named entity recognition (NER) plays an important role in extracting descriptive features when mining free-text radiology reports. However, the performance of existing NER tools is limited because the number of entities they can recognize depends on dictionary lookup; in particular, the recognition of compound terms is complicated because they follow a wide variety of patterns.

OBJECTIVE: The objective of the study is to develop and evaluate an NER tool for compound terms, based on RadLex, for mining free-text radiology reports.

METHODS: We leveraged the clinical Text Analysis and Knowledge Extraction System (cTAKES) to develop customized pipelines using both RadLex and SentiWordNet (a general-purpose dictionary, GPD). We manually annotated 400 radiology reports for compound terms (CTs) in noun phrases and used them as the gold standard for the performance evaluation (precision, recall, and F-measure). Additionally, we created a compound-term-enhanced dictionary (CtED) by analyzing false negatives (FNs) and false positives (FPs), and applied it to another 100 radiology reports for validation. We also evaluated the stem terms of compound terms by defining two measures: an occurrence ratio (OR) and a matching ratio (MR).

RESULTS: The F-measure of cTAKES+RadLex+GPD was 32.2% (precision 92.1%, recall 19.6%), and that of the pipeline combined with the CtED was 67.1% (precision 98.1%, recall 51.0%). The OR indicated that the stem terms "effusion", "node", "tube", and "disease" were used frequently, but the pipeline still fell short of capturing CTs. The MR showed that 71.9% of stem terms matched those in ontologies, and that RadLex improved the MR by about 22% over the cTAKES default dictionary. The OR and MR revealed that the characteristics of stem terms have the potential to help generate synonymous phrases using ontologies.

CONCLUSIONS: We developed a RadLex-based customized pipeline for parsing radiology reports and demonstrated that the CtED and stem-term analysis have the potential to improve dictionary-based NER performance toward expanding vocabularies.
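To make the dictionary-lookup idea concrete, the sketch below shows a greedy longest-match pass over a report sentence, the basic mechanism by which compound terms can be matched as whole phrases rather than as their stem terms alone; the toy dictionary and the `match_terms` helper are illustrative assumptions and are not part of the cTAKES/RadLex pipeline.

```python
from typing import List, Tuple

# Toy stand-in for RadLex plus a compound-term-enhanced dictionary (CtED).
DICTIONARY = {"pleural effusion", "effusion", "lymph node", "chest tube", "tube"}
MAX_PHRASE_LEN = 4  # longest compound term (in tokens) we attempt to match

def match_terms(text: str) -> List[Tuple[int, int, str]]:
    """Greedy longest-first dictionary lookup; returns (start, end, phrase) token spans."""
    tokens = [t.strip(".,;:").lower() for t in text.split()]
    spans, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in DICTIONARY:
                spans.append((i, i + n, phrase))
                i += n
                break
        else:
            i += 1  # no dictionary phrase starts here; move on
    return spans

print(match_terms("Small left pleural effusion and enlarged lymph node."))
# [(2, 4, 'pleural effusion'), (6, 8, 'lymph node')]
```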


Author(s): Aditya Kiran Brahma, Prathyush Potluri, Meghana Kanapaneni, Sumanth Prabhu, Sundeep Teki

Data, 2021, Vol 6 (7), pp. 71
Author(s): Gonçalo Carnaz, Mário Antunes, Vitor Beires Nogueira

Criminal investigation collects and analyzes the facts related to a crime, from which investigators can deduce evidence to be used in court. It is a multidisciplinary and applied science, which includes interviews, interrogations, evidence collection, preservation of the chain of custody, and other investigation methods and techniques. These techniques produce both digital and paper documents that have to be carefully analyzed to identify correlations and interactions among suspects, places, license plates, and other entities mentioned in the investigation. The computerized processing of these documents is a helping hand to criminal investigation, as it allows the automatic identification of entities and their relations, some of which are difficult to identify manually. A wide set of dedicated tools exists, but they share a major limitation: they are unable to process criminal reports in the Portuguese language, as an annotated corpus for that purpose does not exist. This paper presents an annotated corpus, composed of a collection of anonymized crime-related documents extracted from official and open sources. The dataset was produced as the result of an exploratory initiative to collect crime-related data from websites and conditioned-access police reports. The dataset was evaluated, and a mean precision of 0.808, a recall of 0.722, and an F1-score of 0.733 were obtained for the classification of the annotated named entities present in the crime-related documents. This corpus can be employed to benchmark Machine Learning (ML) and Natural Language Processing (NLP) methods and tools to detect and correlate entities in the documents; examples include sentence detection, named-entity recognition, and identification of terms related to the criminal domain.
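As an illustration of how such an annotated corpus can be used for benchmarking, the sketch below computes exact-match precision, recall, and F1 over entity spans; the span format, the entity labels, and the example numbers are hypothetical and are not taken from the paper.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (char_start, char_end, entity_label)

def prf(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over entity spans."""
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical gold/predicted spans for one anonymized sentence:
gold = {(0, 10, "PESSOA"), (25, 32, "LOCAL"), (40, 48, "MATRICULA")}
pred = {(0, 10, "PESSOA"), (25, 32, "LOCAL")}
print(prf(gold, pred))  # ≈ (1.0, 0.667, 0.8)
```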

