Tagging Scientific Publications Using Wikipedia and Natural Language Processing Tools

More than 200 generic drugs approved by the U.S. Food and Drug Administration for non-cancer indications have shown promise for treating cancer. Due to their long history of safe patient use, low cost, and widespread availability, repurposing of these drugs represents a major opportunity to rapidly improve outcomes for cancer patients and reduce healthcare costs. In many cases, there is already evidence of efficacy for cancer, but trying to manually extract such evidence from the scientific literature is intractable. In this emerging applications paper, we introduce a system to automate non-cancer generic drug evidence extraction from PubMed abstracts. Our primary contribution is to define the natural language processing pipeline required to obtain such evidence, comprising the following modules: querying, filtering, cancer type entity extraction, therapeutic association classification, and study type classification. Using the subject matter expertise on our team, we create our own datasets for these specialized domain-specific tasks. We obtain promising performance in each of the modules by utilizing modern language processing techniques and plan to treat them as baseline approaches for future improvement of individual components.
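The five modules named in the abstract above can be pictured as a simple chain. The following is a minimal sketch only: the keyword lists, the `Evidence` record, and every function name are hypothetical stand-ins, and plain string matching replaces whatever trained models the authors actually use.

```python
from dataclasses import dataclass, field

# Hypothetical keyword lists; the real modules would be learned models.
CANCER_TERMS = ("breast cancer", "glioblastoma", "colorectal cancer")
STUDY_CUES = {"randomized": "clinical trial", "in vitro": "preclinical", "mice": "preclinical"}


@dataclass
class Evidence:
    drug: str
    abstract: str
    cancer_types: list = field(default_factory=list)
    association: str = "unknown"
    study_type: str = "unknown"


def query(abstracts, drug):
    """Querying: keep abstracts that mention the drug (stand-in for a PubMed search)."""
    return [Evidence(drug, a) for a in abstracts if drug.lower() in a.lower()]


def is_relevant(ev):
    """Filtering: keep abstracts that mention cancer in some form."""
    text = ev.abstract.lower()
    return "cancer" in text or "tumor" in text or any(t in text for t in CANCER_TERMS)


def extract_cancer_types(ev):
    """Cancer type entity extraction: dictionary lookup instead of a trained tagger."""
    text = ev.abstract.lower()
    ev.cancer_types = [t for t in CANCER_TERMS if t in text]
    return ev


def classify_association(ev):
    """Therapeutic association classification: crude cue-word rule."""
    text = ev.abstract.lower()
    if "no effect" in text:
        ev.association = "ineffective"
    elif "inhibited" in text or "reduced" in text:
        ev.association = "effective"
    return ev


def classify_study_type(ev):
    """Study type classification: first matching cue wins."""
    text = ev.abstract.lower()
    for cue, label in STUDY_CUES.items():
        if cue in text:
            ev.study_type = label
            break
    return ev


def run_pipeline(abstracts, drug):
    """Chain the five modules in the order the abstract lists them."""
    results = []
    for ev in query(abstracts, drug):
        if is_relevant(ev):
            results.append(classify_study_type(classify_association(extract_cancer_types(ev))))
    return results
```

Calling `run_pipeline(abstracts, "metformin")` on a list of abstract strings returns one `Evidence` record per abstract that survives querying and filtering. Keeping each module as a separate function mirrors the abstract's stated plan to improve individual components independently.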

Digital Archiving by Nigerian and Foreign Authors in a Low Resource Context: A Content Analysis of Publications on Natural Language Processing of Nigerian Languages

Proceedings of the Annual Conference of CAIS / Actes du congrès annuel de l'ACSI ◽

10.29173/cais1175 ◽

2020 ◽

Author(s):

Toluwase Asubiaro

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Web Of Science ◽

Open Science ◽

Digital Archives ◽

Scientific Publications ◽

Digital Archiving ◽

Data Archiving ◽

Computer Codes

This study investigated whether foreign and Nigerian authors of scientific publications on natural language processing (NLP) of Nigerian languages differ in the number of articles, datasets and computer codes they deposited in digital archives. Relevant articles were systematically retrieved from Google, Web of Science and Scopus. Authorship type and data archiving information were extracted from the full text of the relevant publications. The results show that a larger share of papers with foreign authorship (80.4%) than of papers with Nigerian authorship (55.3%) were deposited in non-commercial repositories. Similarly, a few papers with foreign authorship deposited research data (19.1%) and computer codes (10.4%), while none of the papers with Nigerian authorship did. The study recommends that librarians in Nigeria raise awareness of the benefits of digital archiving and open science.

Towards robust tags for scientific publications from natural language processing tools and Wikipedia

International Journal on Digital Libraries ◽

10.1007/s00799-014-0132-0 ◽

2014 ◽

Vol 16 (1) ◽

pp. 25-36 ◽

Cited By ~ 1

Author(s):

Michał Łopuszyński ◽

Łukasz Bolikowski

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Scientific Publications

Mining Inter-Relationships in Online Scientific Articles and its Visualization: Natural Language Processing for Systems Biology Modeling

International Journal of Online and Biomedical Engineering (iJOE) ◽

10.3991/ijoe.v15i02.9432 ◽

2019 ◽

Vol 15 (02) ◽

pp. 39

Author(s):

Nidheesh Melethadathil ◽

Jaap Heringa ◽

Bipin Nair ◽

Shyam Diwakar

Keyword(s):

Natural Language Processing ◽

Systems Biology ◽

Natural Language ◽

Language Processing ◽

Clustering Algorithms ◽

Scientific Publications ◽

User Query ◽

Online Databases ◽

Clustering Quality ◽

Document Visualization

With the rapid growth in the number of scientific publications in domains such as neuroscience and medicine, visually interlinking documents in online databases such as PubMed to indicate the context of query results can improve the multi-disciplinary relevance of the search results. Translational medicine and systems biology rely on studies relating basic sciences to applications, often spanning multiple disciplinary domains. This paper focuses on the design and development of a new scientific document visualization platform, which allows inferring translational aspects in biosciences within published articles using machine learning and natural language processing (NLP) methods. From online databases, this software platform extracted relationship connections between multiple sub-domains within neuroscience, derived from abstracts related to a user query. In the current implementation, the document visualization platform employs two clustering algorithms, namely Suffix Tree Clustering (STC) and LINGO. Clustering quality was improved by mapping top-ranked cluster labels derived from a UMLS Metathesaurus using a scoring function. To avoid non-clustered documents, an iterative scheme called auto-clustering was developed, which maps documents left uncategorized during the initial grouping process to relevant clusters. The efficacy of this document clustering and visualization platform was evaluated by expert-based validation of clustering results obtained with unique search terms. Compared to normal clustering, auto-clustering demonstrated better efficacy by generating larger numbers of unique and relevant cluster labels. Using this implementation, a Parkinson’s disease systems theory model was developed, and studies based on user queries related to neuroscience and oncology have been showcased as applications.

Tagging Scientific Publications Using Wikipedia and Natural Language Processing Tools

Communications in Computer and Information Science - Theory and Practice of Digital Libraries -- TPDL 2013 Selected Workshops ◽

10.1007/978-3-319-14226-5_3 ◽

2014 ◽

pp. 16-27

Author(s):

Michał Łopuszyński ◽

Łukasz Bolikowski

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Scientific Publications

AN ANALYSIS OF SCIENTIFIC PUBLICATIONS ON "DECISION SUPPORT SYSTEMS" AND "BUSINESS INTELLIGENCE" REGARDING RELATED CONCEPTS USING NATURAL LANGUAGE PROCESSING TOOLS

10.12948/ie2019.03.03 ◽

2019 ◽

Cited By ~ 1

Author(s):

Daniel HOMOCIANU ◽

Alexandru SIRETEANU ◽

Octavian DOSPINESCU ◽

Dinu AIRINEI

Keyword(s):

Natural Language Processing ◽

Decision Support ◽

Natural Language ◽

Decision Support Systems ◽

Language Processing ◽

Business Intelligence ◽

Support Systems ◽

Scientific Publications

Panorama do tema qualidade de vida nas publicações científicas em oncologia nos últimos 10 anos: um estudo de análise de redes e processamento de linguagem natural / Overview of the quality of life theme in scientific publications in oncology in the last 10 years: a network analysis and natural language processing study

Brazilian Journal of Health Review ◽

10.34119/bjhrv4n2-415 ◽

2021 ◽

Vol 4 (2) ◽

pp. 9103-9731

Author(s):

Bruno Santos Wance De Souza ◽

Diego Luis Pereira De Oliveira ◽

Hugo Azevedo Bergamaschi ◽

André Luiz Lopes De Azevedo ◽

Lucas de Jesus Matias ◽

...

Keyword(s):

Quality Of Life ◽

Natural Language Processing ◽

Network Analysis ◽

Natural Language ◽

Language Processing ◽

Scientific Publications

Natural Language Processing and Enhanced Clinical Decision Making Radiology and VINCI

PsycEXTRA Dataset ◽

10.1037/e615572012-015 ◽

2012 ◽

Author(s):

Eliot Siegel

Keyword(s):

Decision Making ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Clinical Decision Making ◽

Clinical Decision

Natural Language Processing in the Clinical Setting

PsycEXTRA Dataset ◽

10.1037/e615572012-013 ◽

2012 ◽

Author(s):

Thomas H. Payne

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Clinical Setting

A Review and evaluation of Machine Translation methods for Lumasaaba

Journal of Digital Science ◽

10.33847/2686-8296.2.1_1 ◽

2020 ◽

pp. 3-17

Author(s):

Peter Nabende

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Machine Translation ◽

Language Processing ◽

Research Area ◽

Data Driven ◽

East African ◽

Data Set ◽

African Languages ◽

Translation Methods

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are limited studies on Natural Language Processing applications for many indigenous East African languages. To help close this knowledge gap, this paper evaluates the application of well-established machine translation methods to one heavily under-resourced indigenous East African language, Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba, including both rule-based and data-driven methods. We then apply a state-of-the-art data-driven machine translation method to learn models for automating translation between Lumasaaba and English using a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation model architecture leads to consistently better BLEU scores than the recurrent neural network-based models. Moreover, the automatically generated translations can be comprehended to a reasonable extent and are usually associated with the source language input.
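For readers unfamiliar with the BLEU metric mentioned in the abstract above, the following is a minimal sketch of sentence-level BLEU. It assumes a single reference, whitespace tokenization, and no smoothing; the abstract does not state the paper's own evaluation setup, so this is illustrative rather than a reproduction. The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against one reference, without smoothing."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Modified precision: clip each candidate n-gram count by its reference count.
        matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any zero precision gives a zero score
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: penalize candidates that are not longer than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)
```

A candidate identical to its reference scores 1.0, and a candidate sharing no words with the reference scores 0.0. Published BLEU figures are normally computed at corpus level with a standard tokenizer, so scores from this sketch are not comparable to those reported in papers.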
