deepMINE - Natural Language Processing based Automatic Literature Mining and Research Summarization for Early-Stage Comprehension in Pandemic Situations specifically for COVID-19

2020
Author(s): Bhrugesh Joshi, Vishvajit Bakarola, Parth Shah, Ramar Krishnamurthy

The recent pandemic caused by the Novel Coronavirus (nCOV-2019) from Wuhan, China, has created a large-scale public health emergency. It demands novel research on vaccines to fight the pandemic, repurposing of existing drugs, phylogenetic analysis to identify the virus's origin and determine its similarity to other known viruses, and more. The very first task for the research community is to analyze the wide variety of existing related research articles, which is very time-consuming in a situation where every minute counts in saving hundreds of human lives. Fully manual processing lowers the efficiency of information mining even further. We have developed a completely automatic literature mining system that delivers efficient and fast mining of existing biomedical literature databases. With the help of modern deep learning algorithms, our system also summarizes important research articles, making critical research easier and faster to comprehend. The system currently scans nearly 146,115,136 English words from 29,315 research articles in no more than 1.5 seconds with multiple search keywords. Our article presents the criticality of literature mining, especially in pandemic situations, together with the implementation and online deployment of the system.
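Multi-keyword search over tens of thousands of articles in about a second is typically made possible by an inverted index, which maps each word to the documents containing it so that a query only touches the relevant postings. The abstract does not say how deepMINE is implemented; the sketch below is a generic inverted index with AND semantics over keywords, and the tokenizer and the toy documents are illustrative assumptions.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    # Simplifying assumption: lowercase and split on anything non-alphanumeric.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each token to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index: dict[str, set[str]], keywords: list[str]) -> set[str]:
    """Return IDs of documents containing every keyword (AND semantics)."""
    postings = [index.get(k.lower(), set()) for k in keywords]
    if not postings:
        return set()
    return set.intersection(*postings)

# Toy, hypothetical documents standing in for article texts.
docs = {
    "a1": "Remdesivir inhibits SARS-CoV-2 replication in vitro",
    "a2": "The spike protein binds the ACE2 receptor",
    "a3": "ACE2 expression in patients treated with remdesivir",
}
index = build_index(docs)
hits = search(index, ["remdesivir", "ace2"])  # only "a3" mentions both
```

Building the index is a one-time linear pass; each query then costs roughly the size of the posting lists involved rather than the size of the whole corpus.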

2017
Vol 15 (05)
pp. 1740005
Author(s): Dongdong Sun, Minghui Wang, Ao Li

Due to the importance of post-translational modifications (PTMs) in human health and disease, PTMs are regularly reported in the biomedical literature. However, the continuing rapid expansion of this literature poses a huge challenge for researchers and database curators. There is therefore a pressing need to help them identify relevant PTM information more efficiently with a text mining system. So far, only a few web servers are available for mining information on a very limited number of PTMs, and they rely on simple pattern matching or predefined rules. In our work, to help researchers and database curators easily find and retrieve PTM information from available text, we have developed a text mining tool called MPTM, which extracts and organizes valuable knowledge about 11 common PTMs from PubMed abstracts using relations extracted from dependency parse trees and a heuristic algorithm. It is the first web server to provide a literature mining service for hydroxylation, myristoylation and GPI-anchor. The tool is also used to find new publications on PTMs in PubMed and to uncover potential PTM information through large-scale text analysis. MPTM analyzes text sentences to identify protein names, including substrates and protein-interacting enzymes, and automatically associates them with UniProtKB protein entries. To facilitate further investigation, it also retrieves PTM-related information, such as human diseases, Gene Ontology terms and organisms, from the input text and related databases. In addition, an online database (MPTMDB) with extracted PTM information and a local MPTM Lite package are provided on the MPTM website. MPTM is freely available online at http://bioinformatics.ustc.edu.cn/mptm/ and the source code is hosted on GitHub: https://github.com/USTC-HILAB/MPTM.
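To make "relations extracted from dependency parse trees" concrete: once a sentence is parsed, the grammatical subject and object of a PTM trigger verb are natural candidates for the enzyme and the substrate. The sketch below is not MPTM's algorithm; the trigger lexicon is hypothetical, and the dependency edges (Universal Dependencies-style labels such as `nsubj` and `obj`) are written by hand, whereas a real pipeline would obtain them from a parser.

```python
from dataclasses import dataclass
from typing import Optional

# An edge is (head_index, relation, dependent_index) over a token list.
Edge = tuple[int, str, int]

# Hypothetical trigger lexicon mapping verb forms to PTM types.
PTM_TRIGGERS = {
    "phosphorylates": "phosphorylation",
    "acetylates": "acetylation",
    "methylates": "methylation",
}

@dataclass
class PTMEvent:
    ptm_type: str
    enzyme: Optional[str]
    substrate: Optional[str]

def extract_events(tokens: list[str], edges: list[Edge]) -> list[PTMEvent]:
    events = []
    for i, tok in enumerate(tokens):
        ptm = PTM_TRIGGERS.get(tok.lower())
        if ptm is None:
            continue
        # Collect the trigger verb's direct dependents, keyed by relation label.
        deps = {rel: tokens[d] for h, rel, d in edges if h == i}
        # Subject of the verb -> enzyme; direct object -> substrate.
        events.append(PTMEvent(ptm, deps.get("nsubj"), deps.get("obj")))
    return events

tokens = ["ERK2", "phosphorylates", "ELK1", "in", "vitro"]
edges = [(1, "nsubj", 0), (1, "obj", 2), (1, "obl", 4), (4, "case", 3)]
events = extract_events(tokens, edges)
```

Working on dependency relations rather than word order lets the same rule cover sentences where modifiers separate the verb from its arguments, which is one reason such approaches can go beyond simple pattern matching.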


2021
Vol 7
Author(s): Md Nizamul Hoque Mojumder, Md Ashraf Ahmed, Arif Mohaimin Sadri

The outbreak and emergence of the novel coronavirus (COVID-19) pandemic affected every aspect of human activity, especially the transportation sector. Many cities adopted unprecedented lockdown strategies that resulted in significant restrictions on nonessential mobility; hence, transportation network companies (TNCs) have experienced major shifts in their operations. Millions of people in the USA alone filed for unemployment in the early stage of the COVID-19 outbreak, many of them belonging to self-employed groups such as Uber/Lyft drivers. In these unprecedented circumstances, both drivers and passengers experienced overwhelming challenges that might prolong the recovery process. The goal of this study is to understand the risks, responses, and challenges associated with ridesharing (TNCs, drivers, and passengers) during the COVID-19 pandemic. To this end, large-scale crowdsourced data were collected from online ridesharing forums (i.e., Uber Drivers) from the emergence of COVID-19 onward (January 25–May 10, 2020). Word bigrams, word frequency heatmaps, and topic models are among the natural language processing and text-mining techniques used to preprocess the data and classify risk perception, risk-taking, or risk-averting behaviors associated with ridesharing during a major disease outbreak. Results indicate higher levels of concern about economic disruption, availability of stimulus checks, new employment opportunities, hospitalization, the pandemic, personal hygiene, and staying at home. In addition, unprecedented challenges due to unemployment, and the risks and uncertainties of the personal protective actions required to prevent disease spread through ride sharing, are among the major themes of the forum interactions.
The proposed text-based analysis of ridesharing risk communication dynamics during this pandemic will help identify unobserved factors that inadvertently affect the TNCs as well as the users (drivers and passengers), and identify more efficient strategies and alternatives for the forthcoming “new normal” of the current pandemic and future ones. The study will also guide us toward understanding how online social interaction outlets can be designed and implemented more effectively during a major crisis, and how such platforms can be leveraged to provide guidelines during emergencies that minimize disease transmission through shared travel.
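Word frequencies and word bigrams, two of the techniques named above, reduce to counting tokens and adjacent token pairs across posts; a frequency heatmap is then a visualization of such counts. A minimal sketch follows, assuming a small hand-picked stopword list and made-up forum posts (neither comes from the study). Note that removing stopwords before pairing joins words that were not adjacent in the raw text, a common simplification. Topic modeling is not shown.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not the study's list).
STOPWORDS = {"the", "a", "an", "and", "to", "of", "i", "is", "for", "my", "in", "on"}

def tokens(post: str) -> list[str]:
    """Lowercase, split into words, and drop stopwords."""
    return [w for w in re.findall(r"[a-z']+", post.lower()) if w not in STOPWORDS]

def word_and_bigram_counts(posts: list[str]) -> tuple[Counter, Counter]:
    words, bigrams = Counter(), Counter()
    for post in posts:
        toks = tokens(post)
        words.update(toks)
        # Pair each token with its successor to form bigrams.
        bigrams.update(zip(toks, toks[1:]))
    return words, bigrams

# Made-up posts for illustration only.
posts = [
    "Waiting for the stimulus check to arrive",
    "No riders today, stimulus check still not here",
    "Staying at home until the stimulus check comes",
]
words, bigrams = word_and_bigram_counts(posts)
top_bigram = bigrams.most_common(1)[0]
```

Here the bigram ("stimulus", "check") appears in all three posts, which is the kind of signal the study's reported concern about stimulus checks would surface.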


2021
Author(s): Dong Liu, Chi Kong Tse, Rosa H. M. Chan, Choujun Zhan

Approval of emergency use of the Novel Coronavirus Disease 2019 (COVID-19) vaccines in many countries has brought hope of ending the COVID-19 pandemic sooner. Considering the limited vaccine supply in the early stage of COVID-19 vaccination programs in most countries, a highly relevant question is: who should get vaccinated first? In this article we propose a network information-driven vaccination strategy in which a small number of people in a network (population) are categorized into priority groups according to a few key network properties. Using a network-based SEIR model to simulate pandemic progression, we compare the network information-driven vaccination strategy with a random vaccination strategy. Results on both large-scale synthesized networks and real social networks demonstrate that the network information-driven strategy can significantly reduce the cumulative number of infected individuals and lead to more rapid containment of the pandemic. The results provide insight for policymakers designing an effective early-stage vaccination plan.
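The comparison described above can be sketched as a discrete-time stochastic SEIR process on a contact network, run once with a network-informed vaccination set and once with a random one. The abstract does not list the "few key network properties" used; this sketch substitutes node degree as a single stand-in property, uses an Erdős–Rényi random graph, and treats vaccinated nodes as fully immune. All parameter values are illustrative assumptions.

```python
import random

def random_graph(n, p, rng):
    """Erdős–Rényi G(n, p) graph as an adjacency dict of sets."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def simulate_seir(adj, vaccinated, beta, sigma, gamma, n_seeds, steps, rng):
    """Synchronous stochastic SEIR on a network; returns cumulative infections.

    beta: per-infectious-neighbor, per-step transmission probability (S -> E)
    sigma: per-step probability E -> I; gamma: per-step probability I -> R.
    Vaccinated nodes are assumed fully immune, so they start in R.
    """
    state = {v: "R" if v in vaccinated else "S" for v in adj}
    susceptible = [v for v in adj if state[v] == "S"]
    for v in rng.sample(susceptible, n_seeds):
        state[v] = "I"
    ever_infected = n_seeds
    for _ in range(steps):
        new_state = dict(state)
        for v, s in state.items():
            if s == "S":
                k = sum(1 for u in adj[v] if state[u] == "I")
                # Escape each of k infectious contacts independently.
                if k and rng.random() < 1 - (1 - beta) ** k:
                    new_state[v] = "E"
                    ever_infected += 1
            elif s == "E" and rng.random() < sigma:
                new_state[v] = "I"
            elif s == "I" and rng.random() < gamma:
                new_state[v] = "R"
        state = new_state
    return ever_infected

def degree_priority(adj, k):
    """Stand-in network-informed strategy: vaccinate the k highest-degree nodes."""
    return set(sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k])

def random_priority(adj, k, rng):
    return set(rng.sample(list(adj), k))

rng = random.Random(42)
g = random_graph(300, 0.03, rng)
k = 30
params = dict(beta=0.05, sigma=0.3, gamma=0.1, n_seeds=5, steps=100)
by_degree = simulate_seir(g, degree_priority(g, k), rng=rng, **params)
at_random = simulate_seir(g, random_priority(g, k, rng), rng=rng, **params)
```

A single run is noisy; a fair comparison would average `simulate_seir` over many seeds and graphs before concluding that one strategy reduces cumulative infections.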


Author(s): M. Narayanaswamy, K. E. Ravikumar, Z. Z. Hu, K. Vijay-Shanker, C. H. Wu

Protein posttranslational modification (PTM) is a fundamental biological process, yet few text mining systems currently focus on PTM information extraction. A rule-based text mining system, RLIMS-P (Rule-based LIterature Mining System for Protein Phosphorylation), was recently developed by our group to extract protein substrates, kinases and phosphorylated residues/sites from MEDLINE abstracts. This chapter covers the evaluation and benchmarking of RLIMS-P and highlights some novel and unique features of the system. The extraction patterns of RLIMS-P capture a range of lexical, syntactic and semantic constraints found in sentences expressing phosphorylation information. RLIMS-P also has a second phase that combines information extracted from different sentences. This is an important feature, since the kinase, substrate and phosphorylation site are often not all mentioned in the same sentence. Small modifications to the phosphorylation extraction rules have also allowed us to develop systems for extracting two other PTMs, acetylation and methylation; a thorough evaluation of these two systems has yet to be completed. Finally, an online version of RLIMS-P with enhanced functionality, namely phosphorylation annotation ranking, evidence tagging, and protein entity mapping, has been developed and is publicly accessible.
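The two phases described above, sentence-level pattern extraction followed by merging partial records across sentences, can be illustrated with two toy surface patterns. These regular expressions are hypothetical and far simpler than RLIMS-P's lexical, syntactic and semantic constraints; merging on an identical substrate string is likewise an assumption made for illustration.

```python
import re

# Hypothetical patterns: "X phosphorylates Y" and "Y is phosphorylated at/on Ser/Thr/Tyr N".
AGENT_PATTERN = re.compile(r"(?P<kinase>\w+) phosphorylates (?P<substrate>\w+)")
SITE_PATTERN = re.compile(
    r"(?P<substrate>\w+) is phosphorylated (?:at|on) (?P<site>(?:Ser|Thr|Tyr)-?\d+)"
)

def extract(sentences: list[str]) -> list[dict]:
    # Phase 1: each sentence yields partial records.
    partial = []
    for s in sentences:
        for m in AGENT_PATTERN.finditer(s):
            partial.append({"kinase": m["kinase"], "substrate": m["substrate"]})
        for m in SITE_PATTERN.finditer(s):
            partial.append({"substrate": m["substrate"], "site": m["site"]})
    # Phase 2: merge partial records that share the same substrate.
    merged: dict[str, dict] = {}
    for rec in partial:
        merged.setdefault(rec["substrate"], {"substrate": rec["substrate"]}).update(rec)
    return list(merged.values())

sentences = [
    "PKA phosphorylates CREB in neurons.",
    "CREB is phosphorylated at Ser133 after stimulation.",
]
records = extract(sentences)
```

Neither sentence alone yields a complete kinase, substrate and site triple; only the merge step produces one, which is why the abstract stresses the cross-sentence phase.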


2021
Author(s): Ziheng Zhang, Feng Han, Hongjian Zhang, Tomohiro Aoki, Katsuhiko Ogasawara

BACKGROUND: Biomedical terms extracted using Word2vec, the most popular word embedding model in recent years, serve as the foundation for various natural language processing (NLP) applications, such as biomedical information retrieval, relation extraction, and recommendation systems.
OBJECTIVE: This study examines how changes in the ratio of biomedical-domain to general-domain data in the training corpus affect the extraction of similar biomedical terms using Word2vec.
METHODS: We downloaded the abstracts of 214,892 articles from PubMed Central (PMC) and the 3.9 GB Billion Word (BW) benchmark corpus from the computer science community. The datasets were preprocessed and grouped into 11 corpora based on the ratio of BW to PMC, ranging from 0:10 to 10:0, and a Word2vec model was trained on each corpus. The cosine similarities between biomedical terms obtained from each Word2vec model were then compared.
RESULTS: The models trained on both BW and PMC data outperformed the model trained only on medical data. The similarity between the biomedical terms extracted by the Word2vec model increased when the ratio of biomedical-domain to general-domain data was between 3:7 and 5:5.
CONCLUSIONS: This study allows NLP researchers to apply Word2vec on a better-informed basis and to increase the similarity of extracted biomedical terms, improving their effectiveness in NLP applications such as biomedical information extraction.
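The experimental design involves two mechanical pieces: building corpora at BW:PMC ratios from 0:10 to 10:0, and comparing terms by cosine similarity, cos(u, v) = u·v / (‖u‖‖v‖). The sketch below assumes that every mixed corpus has the same total number of sentences, drawn by random sampling; the abstract does not say how the corpora were sized. Training itself is omitted; a library such as gensim (its `Word2Vec` class) would be one way to fit a model per corpus.

```python
import math
import random

def mix_corpora(general, biomedical, general_parts, total, seed=0):
    """Sample `total` sentences with general:biomedical = general_parts:(10 - general_parts).

    Assumption: corpus size is held fixed across ratios and sentences are sampled
    without replacement, so each source must contain enough sentences.
    """
    rng = random.Random(seed)
    n_general = round(total * general_parts / 10)
    n_bio = total - n_general
    return rng.sample(general, n_general) + rng.sample(biomedical, n_bio)

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Placeholder sentences: "g*" stands for Billion Word text, "b*" for PMC text.
general = [f"g{i}" for i in range(10)]
biomedical = [f"b{i}" for i in range(10)]
# Eleven corpora, BW:PMC = 0:10, 1:9, ..., 10:0.
corpora = [mix_corpora(general, biomedical, k, total=10) for k in range(11)]
```

Holding the total size fixed isolates the effect of the domain ratio from the effect of simply having more data, which is one reasonable reading of the study design.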


2014
Vol 687-691
pp. 1149-1152
Author(s): Jing Peng, Hong Min Sun

The volume of biomedical literature is growing rapidly, and biomedical literature mining is becoming essential. To improve the performance of biomedical literature mining, an approach for article processing during text preprocessing is proposed. The approach combines Web counts and corpus counts to overcome the limitations of noisy Web data. Experiments showed that the combination models performed best compared with the pure Web and pure corpus models. We achieve a best precision of 89.1% on all article forms and 88.7% on the article-loss class.
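One common way to combine two count sources is linear interpolation of their normalized frequencies, score(c) = λ·P_web(c) + (1 − λ)·P_corpus(c). The abstract does not give its combination model, so this is a swapped-in generic technique. It also assumes, as the "article loss class" wording suggests but does not state outright, that "article" refers to English determiners (a/an/the) chosen for a slot in context. All counts and the weight λ below are hypothetical.

```python
def interpolated_scores(candidates, web_counts, corpus_counts, lam=0.5):
    """Score each candidate by lam * P_web + (1 - lam) * P_corpus.

    Counts are normalized within each source so that large but noisy Web
    counts cannot dominate the smaller corpus counts on raw magnitude alone.
    """
    web_total = sum(web_counts.get(c, 0) for c in candidates) or 1
    corpus_total = sum(corpus_counts.get(c, 0) for c in candidates) or 1
    return {
        c: lam * web_counts.get(c, 0) / web_total
        + (1 - lam) * corpus_counts.get(c, 0) / corpus_total
        for c in candidates
    }

def choose_article(candidates, web_counts, corpus_counts, lam=0.5):
    scores = interpolated_scores(candidates, web_counts, corpus_counts, lam)
    return max(scores, key=scores.get)

# Hypothetical counts for "___ protein kinase" with the slot filled by
# "the", "a", or no article (""); real counts would come from the Web and a corpus.
candidates = ["the", "a", ""]
web_counts = {"the": 900, "a": 700, "": 400}
corpus_counts = {"the": 30, "a": 60, "": 10}
```

With λ = 1 (Web only) the Web counts favor "the", while the default λ = 0.5 lets the corpus counts tip the choice to "a"; tuning λ on held-out data is how such a combination would usually be calibrated.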


Names
2021
Vol 69 (3)
pp. 16-27
Author(s): Rogelio Nazar, Irene Renau, Nicolas Acosta, Hernan Robledo, Maha Soliman, ...

This paper presents a series of methods for automatically determining the gender of proper names, based on their co-occurrence with words and grammatical features in a large corpus. Although the results obtained were for Spanish given names, the method presented here can be easily replicated and used for names in other languages. Most methods reported in the literature use pre-existing lists of first names that require costly manual processing and tend to become quickly outdated. Instead, we propose using corpora. Doing so offers the possibility of obtaining real and up-to-date name-gender links. To test the effectiveness of our method, we explored various machine-learning methods as well as another method based on simple frequency of co-occurrence. The latter produced the best results: 93% precision and 88% recall on a database of ca. 10,000 mixed names. Our method can be applied to a variety of natural language processing tasks such as information extraction, machine translation, anaphora resolution or large-scale delivery or email correspondence, among others.
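The best-performing method above relies on simple co-occurrence frequency: count how often a name appears near masculine versus feminine cues and assign the majority gender. The sketch below uses a small hypothetical list of Spanish cue words and a fixed token window; the paper's actual features, window and thresholds are not given in the abstract. Precision is computed over the names that received a label and recall over all gold names, which is one common convention and an assumption here.

```python
import re
from collections import Counter

# Hypothetical Spanish gender cues (not the paper's feature set).
MASC = {"don", "señor", "él", "hijo", "nacido"}
FEM = {"doña", "señora", "ella", "hija", "nacida"}

def gender_votes(sentences, names, window=3):
    """Count masculine/feminine cues within `window` tokens of each known name."""
    votes = {name: Counter() for name in names}
    for sentence in sentences:
        toks = re.findall(r"\w+", sentence)
        for i, tok in enumerate(toks):
            if tok not in votes:
                continue
            context = [t.lower() for t in toks[max(0, i - window):i] + toks[i + 1:i + 1 + window]]
            votes[tok]["m"] += sum(t in MASC for t in context)
            votes[tok]["f"] += sum(t in FEM for t in context)
    return votes

def classify(votes):
    """Assign the majority gender; ties (including no evidence) stay unlabeled."""
    return {n: ("m" if c["m"] > c["f"] else "f") for n, c in votes.items() if c["m"] != c["f"]}

def precision_recall(predicted, gold):
    correct = sum(1 for n, g in predicted.items() if gold.get(n) == g)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

sentences = [
    "La señora Carmen llegó ayer",
    "Don Jorge vino con su hijo",
    "Ella dijo que Carmen vendría",
    "Andrea vive en Lima",
]
gold = {"Carmen": "f", "Jorge": "m", "Andrea": "f"}
predicted = classify(gender_votes(sentences, gold))
precision, recall = precision_recall(predicted, gold)
```

"Andrea" has no cue in its window and is left unlabeled, which lowers recall but not precision; this mirrors the precision/recall trade-off the abstract reports.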


Author(s): Prashant Srivastava, Saptarshi Bej, Kristina Yordanova, Olaf Wolkenhauer

For any molecule, network, or process of interest, keeping up with new publications is becoming increasingly difficult. For many cellular processes, the number of molecules and interactions that need to be considered can be very large. Automated mining of publications can support large-scale molecular interaction maps and database curation. Text mining and Natural Language Processing (NLP)-based techniques are finding applications in mining the biological literature, handling problems such as Named Entity Recognition (NER) and Relationship Extraction (RE). Both rule-based and machine learning (ML)-based NLP approaches have been popular in this context, with multiple research and review articles examining the scope of such models in Biological Literature Mining (BLM). In this review article, we explore self-attention-based models, a special type of neural network (NN)-based architecture that has recently revitalized the field of NLP, as applied to biological texts. We cover self-attention models published from 2019 onwards that operate at either the sentence level or the abstract level, in the context of molecular interaction extraction. We conduct a comparative study of the models in terms of their architecture. Moreover, we discuss some limitations in the field of BLM that point to opportunities for the extraction of molecular interactions from biological text.
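The core operation shared by the self-attention models surveyed above is scaled dot-product attention, softmax(QKᵀ/√d_k)·V, where queries, keys and values are linear projections of the token representations. The sketch below is a single attention head without masking, multiple heads or positional encoding; the random inputs stand in for the token embeddings of one sentence, and all dimensions are illustrative assumptions.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a (seq_len, d_model) input."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Subtract the row maximum before exponentiating for numerical stability.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row is a softmax
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 8, 4
x = rng.normal(size=(seq_len, d_model))  # stand-in for one sentence's token embeddings
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)
```

Each output row is a weighted mix of every token's value vector, so, for example, a protein mention can draw directly on a distant interaction verb in the same sentence; sentence-level and abstract-level models differ mainly in how long the attended sequence is.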

