Mining Inter-Relationships in Online Scientific Articles and its Visualization: Natural Language Processing for Systems Biology Modeling

<strong>With the rapid growth in the number of scientific publications in domains such as neuroscience and medicine, visually interlinking documents in online databases such as PubMed to indicate the context of query results can improve the multidisciplinary relevance of search results. Translational medicine and systems biology rely on studies relating basic sciences to applications, often spanning multiple disciplinary domains. This paper focuses on the design and development of a new scientific document visualization platform, which allows translational aspects in biosciences to be inferred from published articles using machine learning and natural language processing (NLP) methods. From online databases, this software platform effectively extracted relationship connections between multiple sub-domains within neuroscience, derived from abstracts related to the user query. In our current implementation, the document visualization platform employs two clustering algorithms, namely Suffix Tree Clustering (STC) and LINGO. Clustering quality was improved by mapping top-ranked cluster labels derived from a UMLS-Metathesaurus using a scoring function. To avoid non-clustered documents, an iterative scheme called auto-clustering was developed; it allowed documents left uncategorized during the initial grouping process to be mapped to relevant clusters. The efficacy of this document clustering and visualization platform was evaluated by expert-based validation of clustering results obtained with unique search terms. Compared to normal clustering, auto-clustering demonstrated better efficacy by generating larger numbers of unique and relevant cluster labels. Using this implementation, a Parkinson’s disease systems theory model was developed, and studies based on user queries related to neuroscience and oncology have been showcased as applications.</strong>
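The iterative idea behind auto-clustering can be illustrated with a minimal sketch. This is not the platform's implementation and uses neither STC nor LINGO: it assumes an initial clustering already exists and, on each pass, attaches every still-unassigned document to the cluster whose vocabulary it overlaps most, repeating until a pass assigns nothing. The function names, the `min_overlap` parameter, and the sample documents are all hypothetical.

```python
def tokens(text):
    """Lowercased whitespace tokens of a document, as a set."""
    return set(text.lower().split())


def auto_cluster(docs, clusters, unassigned, min_overlap=1):
    """Iteratively map unassigned documents onto existing clusters.

    docs:       dict of doc_id -> text
    clusters:   dict of label -> list of doc_ids (initial grouping)
    unassigned: doc_ids the initial grouping left out
    Returns (clusters, still_unassigned).
    """
    clusters = {label: list(ids) for label, ids in clusters.items()}
    pending = list(unassigned)
    while pending:
        # Cluster vocabulary is rebuilt once per pass from current members.
        vocab = {
            label: set().union(*(tokens(docs[d]) for d in ids))
            for label, ids in clusters.items()
        }
        still_pending = []
        for doc_id in pending:
            doc_tokens = tokens(docs[doc_id])
            best = max(
                vocab,
                key=lambda label: (len(doc_tokens & vocab[label]), label),
                default=None,
            )
            if best is not None and len(doc_tokens & vocab[best]) >= min_overlap:
                clusters[best].append(doc_id)
            else:
                still_pending.append(doc_id)
        if len(still_pending) == len(pending):
            break  # a full pass made no progress
        pending = still_pending
    return clusters, pending


# Hypothetical abstracts, reduced to a few keywords each.
sample_docs = {
    1: "dopamine neurons parkinson",
    2: "parkinson dopamine levodopa",
    3: "levodopa dyskinesia",
    4: "dyskinesia rating scale",
    5: "tumor angiogenesis",
}
result = auto_cluster(sample_docs, {"parkinson": [1, 2]}, [3, 4, 5])
```

In this toy run, document 3 joins the cluster on the first pass, which widens the cluster vocabulary enough for document 4 to join on the second; document 5 shares no terms and stays unassigned. A real system would need a stricter relevance criterion to limit topic drift.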


Natural language processing methods for knowledge management—Applying document clustering for fast search and grouping of engineering documents

Concurrent Engineering ◽

10.1177/1063293x20982973 ◽

2021 ◽

pp. 1063293X2098297

Author(s):

Ivar Örn Arnarsson ◽

Otto Frost ◽

Emil Gustavsson ◽

Mats Jirstrand ◽

Johan Malmqvist

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Domain Knowledge ◽

Clustering Algorithms ◽

Document Clustering ◽

Unstructured Data ◽

Free Text ◽

Engineering Change ◽

Engineering Documents

Product development companies collect data in the form of Engineering Change Requests for logged design issues, tests, and product iterations. These documents are rich in unstructured data (e.g. free text). Previous research shows that product developers find current IT systems lack the capability to accurately retrieve relevant documents containing unstructured data. In this research, we demonstrate a method using Natural Language Processing and document clustering algorithms to find structurally or contextually related documents in databases of Engineering Change Request documents. The aim is to radically decrease the time needed to search for related engineering documents, organize search results, and create labeled clusters from these documents by utilizing Natural Language Processing algorithms. A domain knowledge expert at the case company evaluated the results and confirmed that the algorithms we applied found relevant document clusters for the queries tested.
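As a rough illustration of retrieving related free-text documents, the sketch below ranks documents by TF-IDF cosine similarity. The abstract does not state which algorithms the authors used, so this is a generic stand-in under that assumption; the function names and the sample change-request texts are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    n_docs = len(docs)
    vectors = []
    for toks in tokenized:
        term_freq = Counter(toks)
        vectors.append({
            # Smoothed inverse document frequency, always positive.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related(docs, query_index, top_k=2):
    """Indices of the top_k documents most similar to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine(vectors[query_index], vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    return [i for _, i in sorted(scores, reverse=True)[:top_k]]


# Hypothetical change-request texts.
ecr_texts = [
    "bracket crack near weld on rear axle",
    "weld crack found on rear axle bracket during test",
    "update wiring harness connector for dashboard",
    "dashboard connector loose after vibration test",
]
closest_to_first = related(ecr_texts, 0, top_k=1)
```

Grouping documents whose pairwise similarity exceeds a threshold would be one simple way to turn such rankings into clusters; labeling those clusters is a separate step not shown here.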


A Natural Language Processing System for Extracting Evidence of Drug Repurposing from Scientific Publications

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i08.7052 ◽

2020 ◽

Vol 34 (08) ◽

pp. 13369-13381

Author(s):

Shivashankar Subramanian ◽

Ioana Baldini ◽

Sushma Ravichandran ◽

Dmitriy A. Katz-Rogozhnikov ◽

Karthikeyan Natesan Ramamurthy ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Generic Drugs ◽

Low Cost ◽

Processing System ◽

Drug Repurposing ◽

Cancer Type ◽

Entity Extraction ◽

Scientific Publications

More than 200 generic drugs approved by the U.S. Food and Drug Administration for non-cancer indications have shown promise for treating cancer. Due to their long history of safe patient use, low cost, and widespread availability, repurposing of these drugs represents a major opportunity to rapidly improve outcomes for cancer patients and reduce healthcare costs. In many cases, there is already evidence of efficacy for cancer, but trying to manually extract such evidence from the scientific literature is intractable. In this emerging applications paper, we introduce a system to automate non-cancer generic drug evidence extraction from PubMed abstracts. Our primary contribution is to define the natural language processing pipeline required to obtain such evidence, comprising the following modules: querying, filtering, cancer type entity extraction, therapeutic association classification, and study type classification. Using the subject matter expertise on our team, we create our own datasets for these specialized domain-specific tasks. We obtain promising performance in each of the modules by utilizing modern language processing techniques and plan to treat them as baseline approaches for future improvement of individual components.
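The module sequence named in the abstract (querying, filtering, cancer type entity extraction, therapeutic association classification, study type classification) can be sketched as a chain of small functions. The paper reports using modern language processing techniques for these modules; the keyword rules below are placeholder assumptions used only to show how the stages connect, and every lexicon, cue list, and sample abstract is hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical lexicons standing in for trained models.
CANCER_TYPES = ("breast cancer", "colorectal cancer", "glioma")
NEGATIVE_CUES = ("no effect", "did not")
POSITIVE_CUES = ("reduced", "inhibited")
STUDY_CUES = {
    "randomized": "clinical trial",
    "in vitro": "preclinical",
    "cohort": "observational",
}


@dataclass
class Evidence:
    drug: str
    text: str
    cancer_types: list = field(default_factory=list)
    association: str = "inconclusive"
    study_type: str = "unknown"


def query(abstracts, drug):
    """Stage 1: keep abstracts that mention the drug."""
    return [Evidence(drug, a) for a in abstracts if drug.lower() in a.lower()]


def filter_relevant(items):
    """Stage 2: drop abstracts with no cancer-related wording."""
    return [e for e in items if any(
        cue in e.text.lower() for cue in ("cancer", "tumor") + CANCER_TYPES)]


def extract_cancer_types(item):
    """Stage 3: lexicon lookup in place of entity extraction."""
    item.cancer_types = [c for c in CANCER_TYPES if c in item.text.lower()]
    return item


def classify_association(item):
    """Stage 4: cue words in place of a therapeutic association classifier."""
    text = item.text.lower()
    if any(cue in text for cue in NEGATIVE_CUES):
        item.association = "not effective"
    elif any(cue in text for cue in POSITIVE_CUES):
        item.association = "effective"
    return item


def classify_study_type(item):
    """Stage 5: cue words in place of a study type classifier."""
    text = item.text.lower()
    item.study_type = next(
        (label for cue, label in STUDY_CUES.items() if cue in text), "unknown")
    return item


def run_pipeline(abstracts, drug):
    items = filter_relevant(query(abstracts, drug))
    return [classify_study_type(classify_association(extract_cancer_types(e)))
            for e in items]


# Hypothetical abstracts for illustration only.
sample_abstracts = [
    "Metformin reduced tumor growth in a randomized trial of breast cancer patients.",
    "Metformin improves glycemic control in type 2 diabetes.",
    "Aspirin use and colorectal cancer risk in a cohort study.",
]
evidence = run_pipeline(sample_abstracts, "metformin")
```

Keeping each stage as a separate function mirrors the abstract's stated plan to treat the modules as baselines that can be improved individually.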


Tagging Scientific Publications Using Wikipedia and Natural Language Processing Tools

Communications in Computer and Information Science - Theory and Practice of Digital Libraries -- TPDL 2013 Selected Workshops ◽

10.1007/978-3-319-08425-1_3 ◽

2014 ◽

pp. 16-27 ◽

Cited By ~ 2

Author(s):

Michał Łopuszyński ◽

Łukasz Bolikowski

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Scientific Publications


Digital Archiving by Nigerian and Foreign Authors in a Low Resource Context: A Content Analysis of Publications on Natural Language Processing of Nigerian Languages

Proceedings of the Annual Conference of CAIS / Actes du congrès annuel de l'ACSI ◽

10.29173/cais1175 ◽

2020 ◽

Author(s):

Toluwase Asubiaro

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Web Of Science ◽

Open Science ◽

Digital Archives ◽

Scientific Publications ◽

Digital Archiving ◽

Data Archiving ◽

Computer Codes

This study investigated whether there is a difference in the number of articles, datasets and computer codes that foreign and Nigerian authors of scientific publications on natural language processing (NLP) of Nigerian languages deposited in digital archives. Relevant articles were systematically retrieved from Google, Web of Science and Scopus. Authorship type and data archiving information were extracted from the full text of the relevant publications. Results show that a larger share of papers with foreign authorship (80.4%) than of papers with Nigerian authorship (55.3%) were published in non-commercial repositories. Similarly, few papers with foreign authorship deposited research data (19.1%) and computer codes (10.4%), while none of the papers with Nigerian authorship did. It was recommended that librarians in Nigeria should create awareness of the benefits of digital archiving and open science.
