Tagging Scientific Publications Using Wikipedia and Natural Language Processing Tools

Author(s):  
Michał Łopuszyński ◽  
Łukasz Bolikowski
2020 ◽  
Vol 34 (08) ◽  
pp. 13369-13381
Author(s):  
Shivashankar Subramanian ◽  
Ioana Baldini ◽  
Sushma Ravichandran ◽  
Dmitriy A. Katz-Rogozhnikov ◽  
Karthikeyan Natesan Ramamurthy ◽  
...  

More than 200 generic drugs approved by the U.S. Food and Drug Administration for non-cancer indications have shown promise for treating cancer. Due to their long history of safe patient use, low cost, and widespread availability, repurposing of these drugs represents a major opportunity to rapidly improve outcomes for cancer patients and reduce healthcare costs. In many cases, there is already evidence of efficacy for cancer, but trying to manually extract such evidence from the scientific literature is intractable. In this emerging applications paper, we introduce a system to automate non-cancer generic drug evidence extraction from PubMed abstracts. Our primary contribution is to define the natural language processing pipeline required to obtain such evidence, comprising the following modules: querying, filtering, cancer type entity extraction, therapeutic association classification, and study type classification. Using the subject matter expertise on our team, we create our own datasets for these specialized domain-specific tasks. We obtain promising performance in each of the modules by utilizing modern language processing techniques and plan to treat them as baseline approaches for future improvement of individual components.
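
The five modules named in this abstract can be pictured as a chain of small functions. The following is only a minimal sketch of that structure: every function name, keyword list, and rule is a hypothetical placeholder chosen for illustration, and simple keyword matching stands in for the trained models and curated datasets the authors describe.

```python
from dataclasses import dataclass

# Hypothetical placeholder vocabularies, NOT the authors' data or models.
CANCER_TERMS = ("breast cancer", "glioblastoma", "melanoma")
STUDY_CUES = {
    "randomized": "clinical trial",
    "in vitro": "preclinical",
    "mice": "preclinical",
}


@dataclass
class Evidence:
    drug: str
    cancer_types: list
    association: str
    study_type: str


def query(abstracts, drug):
    """Querying: keep abstracts that mention the drug name."""
    return [a for a in abstracts if drug.lower() in a.lower()]


def is_relevant(abstract):
    """Filtering: keep abstracts that mention at least one cancer term."""
    text = abstract.lower()
    return any(term in text for term in CANCER_TERMS)


def extract_cancer_types(abstract):
    """Cancer type entity extraction (keyword lookup as a stand-in)."""
    text = abstract.lower()
    return [term for term in CANCER_TERMS if term in text]


def classify_association(abstract):
    """Therapeutic association classification (rule-based stand-in)."""
    text = abstract.lower()
    if "no effect" in text or "did not" in text:
        return "ineffective"
    if "reduced" in text or "inhibited" in text:
        return "effective"
    return "inconclusive"


def classify_study_type(abstract):
    """Study type classification (first matching cue wins)."""
    text = abstract.lower()
    for cue, label in STUDY_CUES.items():
        if cue in text:
            return label
    return "other"


def run_pipeline(abstracts, drug):
    """Run the five stages in order and collect one record per abstract."""
    results = []
    for abstract in query(abstracts, drug):
        if not is_relevant(abstract):
            continue
        results.append(
            Evidence(
                drug=drug,
                cancer_types=extract_cancer_types(abstract),
                association=classify_association(abstract),
                study_type=classify_study_type(abstract),
            )
        )
    return results
```

Keeping each stage as a separate function mirrors the abstract's stated plan to treat the modules as baselines that can be improved individually.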


Author(s):  
Toluwase Asubiaro

This study investigated whether foreign and Nigerian authors of scientific publications on natural language processing (NLP) of Nigerian languages differ in the number of articles, datasets, and computer code they deposited in digital archives. Relevant articles were systematically retrieved from Google, Web of Science, and Scopus. Authorship type and data archiving information were extracted from the full text of the relevant publications. The results show that papers with foreign authorship were deposited in non-commercial repositories more often (80.4%) than papers with Nigerian authorship (55.3%). Similarly, a few papers with foreign authorship deposited research data (19.1%) and computer code (10.4%), while none of the papers with Nigerian authorship did. The study recommends that librarians in Nigeria raise awareness of the benefits of digital archiving and open science.


Author(s):  
Nidheesh Melethadathil ◽  
Jaap Heringa ◽  
Bipin Nair ◽  
Shyam Diwakar

With the rapid growth in the number of scientific publications in domains such as neuroscience and medicine, visually interlinking documents in online databases such as PubMed to indicate the context of query results can improve the multi-disciplinary relevance of the search results. Translational medicine and systems biology rely on studies relating basic sciences to applications, often spanning multiple disciplinary domains. This paper focuses on the design and development of a new scientific document visualization platform, which allows inferring translational aspects in biosciences within published articles using machine learning and natural language processing (NLP) methods. From online databases, this software platform extracted relationship connections between multiple sub-domains within neuroscience, derived from abstracts related to a user query. In our current implementation, the document visualization platform employs two clustering algorithms, namely Suffix Tree Clustering (STC) and LINGO. Clustering quality was improved by mapping top-ranked cluster labels derived from the UMLS Metathesaurus using a scoring function. To avoid non-clustered documents, an iterative scheme called auto-clustering was developed, which allowed documents left uncategorized during the initial grouping process to be mapped to relevant clusters. The efficacy of this document clustering and visualization platform was evaluated by expert-based validation of clustering results obtained with unique search terms. Compared to normal clustering, auto-clustering demonstrated better efficacy by generating larger numbers of unique and relevant cluster labels. Using this implementation, a Parkinson’s disease systems theory model was developed, and studies based on user queries related to neuroscience and oncology have been showcased as applications.


2020 ◽  
pp. 3-17
Author(s):  
Peter Nabende

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are limited studies on Natural Language Processing applications for many indigenous East African languages. To help close this knowledge gap, this paper evaluates the application of well-established machine translation methods to one heavily under-resourced indigenous East African language, Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba, including both rule-based and data-driven methods. We then apply a state-of-the-art data-driven machine translation method to learn models for automating translation between Lumasaaba and English using a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation model architecture leads to consistently better BLEU scores than the recurrent neural network-based models. Moreover, the automatically generated translations can be comprehended to a reasonable extent and usually relate to the source language input.
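
For readers unfamiliar with the BLEU metric mentioned in this abstract, the following is a simplified, single-reference, sentence-level sketch: the geometric mean of clipped n-gram precisions multiplied by a brevity penalty. It is an illustration only. The abstract does not state which BLEU implementation or settings the author used, the function names are hypothetical, and the sketch applies no smoothing, so any n-gram order with zero matches yields a score of 0.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against one reference (whitespace tokens)."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(clipped / total))

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))

    return brevity_penalty * math.exp(sum(log_precisions) / max_n)
```

Reported BLEU scores are normally computed over a whole test corpus with an established toolkit, so values from this sketch are not comparable with published results.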

