From SARS to COVID-19: A Bibliometric Study on Emerging Infectious Diseases with Natural Language Processing Technologies

Author(s):  
Yinjun Hu ◽  
Mengmeng Chen ◽  
Qian Wang ◽  
Yue Zhu ◽  
Bei Wang ◽  
...  

Abstract [Background] On January 7, 2020, Chinese scientists identified the novel coronavirus later named "COVID-19", which aroused worldwide concern. Many research works addressed the emerging, rapidly evolving epidemic. This study aimed to analyze the research literature on SARS, MERS, and COVID-19 to retrieve important information for virologists, epidemiologists, and policy makers. [Methods] We collected data from multiple sources and compared bibliometric indices among COVID-19, Severe Acute Respiratory Syndrome (SARS), and Middle East Respiratory Syndrome (MERS) up to March 25, 2020. To obtain data of comparable quantity and scale, the volume of search results was balanced by limiting the range of publication years. For further analysis, we extracted 1,480 documents from 1,671 candidates with natural language processing technologies. [Results] In total, 13,945 research publications across 7 datasets were selected for analysis. Unlike other topics, research interest in an epidemic tends to peak in the first year of the outbreak. The document type distributions of SARS, MERS, and COVID-19 are nearly the same (less than a 6-point difference for each type); however, research quality grew notably across the three epidemics (Field-Weighted Citation Impact scores of 3.68, 6.63, and 11.35, respectively). Asian countries show less international collaboration (under 35.1%) than Western countries (over 49.5%), which deserves as much attention as the research itself. [Conclusions] We found that research interest in an epidemic generally peaks in the first year after the outbreak; the peak for MERS, however, appeared in the third year because of its resurgence in 2015. Although research quality has improved, especially on COVID-19, research on epidemics that did not originate in one's own country should not be neglected.
Another important strategy for enhancing epidemic prevention in China and other Asian countries is to continue strengthening international collaboration.
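The Field-Weighted Citation Impact figures above can be illustrated with a minimal sketch. FWCI is conventionally defined as a publication's actual citation count divided by the expected citation count for publications of the same field, document type, and year; the citation numbers below are hypothetical, not taken from the study.

```python
def fwci(actual_citations, expected_citations):
    """Field-Weighted Citation Impact: actual citations divided by the
    expected (world-average) citations for publications of the same
    field, document type, and year. A score above 1.0 means the work
    is cited more than expected for its cohort."""
    if expected_citations <= 0:
        raise ValueError("expected citations must be positive")
    return actual_citations / expected_citations

# Hypothetical example: a paper with 34 citations in a cohort averaging
# 3.0 citations scores above 11, comparable to the COVID-19 corpus
# score reported above.
print(round(fwci(34, 3.0), 2))  # 11.33
```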

2018 ◽  
Vol 11 (3) ◽  
pp. 1-25
Author(s):  
Leonel Figueiredo de Alencar ◽  
Bruno Cuconato ◽  
Alexandre Rademaker

ABSTRACT: One of the prerequisites for many natural language processing technologies is the availability of large lexical resources. This paper reports on MorphoBr, an ongoing project aiming at building a comprehensive full-form lexicon for morphological analysis of Portuguese. A first version of the resource is already freely available online under an open-source, free software license. MorphoBr combines analogous free resources, correcting several thousand errors and gaps and systematically adding new entries. In comparison to the integrated resources, lexical entries in MorphoBr follow a more user-friendly format, which can be straightforwardly compiled into finite-state transducers for morphological analysis, e.g. in the context of syntactic parsing with a grammar in the LFG formalism using the XLE system. MorphoBr results from a combination of computational techniques. Errors and the more obvious gaps in the integrated resources were automatically corrected with scripts. However, MorphoBr's main contribution is the expansion of the inventory of nouns and adjectives, carried out by systematically modeling diminutive formation in the paradigm of finite-state morphology. This allowed MorphoBr to significantly outperform analogous resources in the coverage of diminutives. The first evaluation results show MorphoBr to be a promising initiative that will directly contribute to the development of more robust natural language processing tools and applications that depend on wide-coverage morphological analysis.
KEYWORDS: computational linguistics; natural language processing; morphological analysis; full-form lexicon; diminutive formation.
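The systematic modeling of diminutive formation can be sketched, outside a true finite-state formalism, with a couple of simplified generation rules; the rules below are illustrative assumptions and not MorphoBr's actual grammar, which handles stress, gender, and many exceptions.

```python
def diminutives(noun):
    """Generate Portuguese diminutive candidates with two simplified
    patterns: -inho/-inha replacing a final unstressed -o/-a, and
    -zinho/-zinha appended to the full form. A real finite-state
    grammar also models stress, gender, and exceptions, so this
    sketch both misses forms and overgenerates."""
    if noun.endswith("o"):
        return [noun[:-1] + "inho", noun + "zinho"]
    if noun.endswith("a"):
        return [noun[:-1] + "inha", noun + "zinha"]
    # consonant- or stressed-vowel-final forms: append -zinho only
    # (gender agreement is ignored in this sketch)
    return [noun + "zinho"]

print(diminutives("livro"))  # ['livrinho', 'livrozinho']
print(diminutives("casa"))   # ['casinha', 'casazinha']
```

In a finite-state setting, each such rule would be compiled into a transducer and composed with the full-form lexicon, which is what allows a resource like MorphoBr to enumerate diminutives exhaustively rather than rule by rule.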


Author(s):  
Sungkyu Park ◽  
Sungwon Han ◽  
Jeongwook Kim ◽  
Mir Majid Molaie ◽  
Hoang Dieu Vu ◽  
...  

BACKGROUND The novel coronavirus disease (hereafter COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has caused a global pandemic. During this time, a plethora of information regarding COVID-19, containing both false information (misinformation) and accurate information, circulated on social media. The World Health Organization has declared a need to fight not only the pandemic but also the infodemic (a portmanteau of information and pandemic). In this context, it is critical to analyze the quality and veracity of information shared on social media and the evolution of discussions on major topics regarding COVID-19. OBJECTIVE This research characterizes risk communication patterns by analyzing public discourse on the novel coronavirus in four Asian countries that suffered outbreaks of varying degrees of severity: South Korea, Iran, Vietnam, and India. METHODS We collect tweets on COVID-19 posted in the four Asian countries from the start of their respective outbreaks in January 2020 until March 2020. We consult with locals and use relevant keywords from the local languages, following each country's tweet conventions. We then apply a natural language processing (NLP) method to learn topics automatically in an unsupervised fashion. Finally, we qualitatively label the extracted topics to capture their semantic meanings. RESULTS We find that the official phases of the epidemic, as announced by the governments of the studied countries, do not align well with the online attention paid to COVID-19. Motivated by this misalignment, we develop a new NLP method to identify transitions in topic phases and compare the identified topics across the four countries. We examine the time lag between social media attention and confirmed patient counts, and we confirm an inverse relationship between the tweet count and topic diversity.
CONCLUSIONS Through the current research, we observe similarities and differences in the social media discourse on the pandemic in different Asian countries. We observe that once the daily tweet count hits its peak, the successive tweet count trend tends to decrease for all countries. This phenomenon aligns with the dynamics of the issue-attention cycle, an existing construct from communication theory conceptualizing how an issue rises and falls from public attention. Little work has been performed to identify topics in online risk communication by collectively considering temporal tweet trends in different countries. In this regard, if a critical piece of misinformation can be detected at an early stage in one country, it can be reported to prevent the spread of misinformation in other countries. Therefore, this work can help social media services, social media communicators, journalists, policymakers, and medical professionals fight the infodemic on a global scale. CLINICALTRIAL N/A
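The inverse relationship between tweet volume and topic diversity can be quantified with Shannon entropy over the distribution of topic labels in a given period; the topic labels below are hypothetical, and the paper's own unsupervised topic model is a separate method.

```python
import math
from collections import Counter

def topic_diversity(topic_labels):
    """Shannon entropy (in bits) of the topic-label distribution for
    one day's tweets; higher entropy means discussion is spread more
    evenly across topics."""
    counts = Counter(topic_labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical days: a low-volume day spread across many topics versus
# a peak day dominated by a single topic.
quiet_day = ["masks", "travel", "schools", "vaccine", "economy", "masks"]
peak_day = ["lockdown"] * 9 + ["masks"]
print(topic_diversity(quiet_day) > topic_diversity(peak_day))  # True
```

Tracking this entropy day by day alongside tweet counts is one simple way to observe the narrowing of discussion during attention peaks that the abstract describes.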


2002 ◽  
Vol 9 (5) ◽  
pp. 131-148
Author(s):  
HIROMI ITOH OZAKU ◽  
MASAO UTIYAMA ◽  
MASAKI MURATA ◽  
KIYOTAKA UCHIMOTO ◽  
HITOSHI ISAHARA

2010 ◽  
Vol 31 (3) ◽  
pp. 439-462 ◽  
Author(s):  
NICHOLAS D. DURAN ◽  
CHARLES HALL ◽  
PHILIP M. MCCARTHY ◽  
DANIELLE S. MCNAMARA

ABSTRACT: The words people use and the way they use them can reveal a great deal about their mental states when they attempt to deceive. The challenge for researchers is how to reliably distinguish the linguistic features that characterize these hidden states. In this study, we use a natural language processing tool called Coh-Metrix to evaluate deceptive and truthful conversations that occur within a context of computer-mediated communication. Coh-Metrix is unique in that it tracks linguistic features based on cognitive and social factors that are hypothesized to influence deception. The results from Coh-Metrix are compared to linguistic features reported in previous independent research, which used a natural language processing tool called Linguistic Inquiry and Word Count. The comparison reveals converging and contrasting alignment for several linguistic features and establishes new insights on deceptive language and its use in conversation.
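At its core, LIWC-style feature tracking of the kind compared here reduces to counting category words as a share of total words; the toy category lists below are invented for illustration, whereas the real Linguistic Inquiry and Word Count tool uses large validated lexicons.

```python
import re

# Toy word-category dictionaries in the spirit of LIWC-style counting;
# real tools use far larger, psychometrically validated lexicons.
CATEGORIES = {
    "first_person": {"i", "me", "my", "mine", "myself"},
    "negations": {"no", "not", "never", "none"},
}

def category_rates(text):
    """Return each category's share of total words: a common
    deception cue in this literature is, e.g., a lower rate of
    first-person pronouns in deceptive text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {name: sum(w in vocab for w in words) / len(words)
            for name, vocab in CATEGORIES.items()}

print(category_rates("I never said that, it was not my idea"))
```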


2015 ◽  
Vol 1 (1) ◽  
Author(s):  
Keith W. Kintigh

Abstract: To address archaeology’s most pressing substantive challenges, researchers must discover, access, and extract information contained in the reports and articles that codify so much of archaeology’s knowledge. These efforts will require application of existing and emerging natural language processing technologies to extensive digital corpora. Automated classification can enable development of metadata needed for the discovery of relevant documents. Although it is even more technically challenging, automated extraction of and reasoning with information from texts can provide urgently needed access to contextualized information within documents. Effective automated translation is needed for scholars to benefit from research published in other languages.
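The automated classification step described above, assigning documents to metadata categories so they can be discovered, can be sketched with a minimal multinomial naive Bayes classifier; the report snippets and category labels below are invented for illustration.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal multinomial naive Bayes for routing report snippets to
    metadata categories (e.g., 'ceramics' vs. 'architecture')."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        def log_prob(label):
            counts = self.word_counts[label]
            total = sum(counts.values())
            score = math.log(self.label_counts[label]
                             / sum(self.label_counts.values()))
            for w in doc.lower().split():
                # Laplace smoothing so unseen words don't zero out a label
                score += math.log((counts[w] + 1) / (total + len(self.vocab)))
            return score
        return max(self.label_counts, key=log_prob)

# Invented training snippets standing in for passages from site reports.
docs = ["sherd rim ceramic vessel", "ceramic sherd decorated",
        "wall masonry room structure", "structure doorway masonry"]
labels = ["ceramics", "ceramics", "architecture", "architecture"]
model = NaiveBayes().fit(docs, labels)
print(model.predict("decorated ceramic rim"))  # ceramics
```

In practice the payoff is metadata at scale: labels predicted this way can be attached to thousands of unindexed reports, making them discoverable by topic.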


Author(s):  
Alexey Kolesnikov ◽  
Pavel Kikin ◽  
Giovanni Niko ◽  
Elena Komissarova

Modern natural language processing technologies make it possible to work with texts without being a specialist in linguistics. Popular data processing platforms for developing and applying linguistic models make it possible to integrate such models into widely used geographic information systems, which significantly expands their functionality and improves the accuracy of standard geocoding. The article compares the most popular methods, and the software implementing them, on the problem of extracting geographical names from plain text. This task is an extended version of geocoding: the result also includes the coordinates of the point features of interest, but there is no need to first extract addresses or geographical names from the text separately. In computational linguistics, this problem is solved by named entity recognition (NER) methods. Among modern approaches, the authors chose rule-based algorithms, maximum entropy models, and convolutional neural networks. The selected algorithms and methods were evaluated not only for the accuracy of finding geographical objects in text, but also for how easily the underlying rules or models can be refined with one's own text corpora. Reports on technological violations, accidents, and incidents at facilities of the heat and power complex of the Ministry of Energy of the Russian Federation were selected as test data for the above methods and software solutions. The article also presents a method for improving the quality of named entity recognition by further training a neural network model on a specialized text corpus.
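A rule-based baseline of the kind compared in the article can be sketched as gazetteer matching against plain text; the place list, coordinates, and report sentence below are illustrative, and a production system would load thousands of entries and handle inflected forms.

```python
import re

# Illustrative gazetteer: place name -> (lat, lon).
GAZETTEER = {
    "Novosibirsk": (55.03, 82.92),
    "Moscow": (55.75, 37.62),
    "Saint Petersburg": (59.93, 30.36),
}

def extract_places(text):
    """Rule-based named entity recognition for geographic names:
    match gazetteer entries (longest name first, on word boundaries)
    and return each hit with its text offset and coordinates, so the
    output extends plain geocoding with located point features."""
    hits = []
    for name in sorted(GAZETTEER, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            hits.append((name, m.start(), GAZETTEER[name]))
    return sorted(hits, key=lambda h: h[1])

report = "An incident at the substation near Novosibirsk was reported to Moscow."
print(extract_places(report))
```

Such rules are trivial to refine with a domain corpus (just extend the gazetteer), which is exactly the maintainability criterion the article weighs against statistical and neural models.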

