DISNET: A framework for extracting phenotypic disease information from public sources

2018 ◽  
Author(s):  
Gerardo Lagunes-García ◽  
Alejandro Rodríguez-González ◽  
Lucía Prieto-Santamaría ◽  
Eduardo P. García del Valle ◽  
Massimiliano Zanin ◽  
...  

Abstract: Within the global endeavour of improving population health, one major challenge is the increasingly high cost associated with drug development. Drug repositioning, i.e. finding new uses for existing drugs, is a promising alternative; yet, its effectiveness has hitherto been hindered by our limited knowledge about diseases and their relationships. In this paper, we present DISNET (disnet.ctb.upm.es), a web-based system designed to extract knowledge from signs and symptoms retrieved from medical databases, and to enable the creation of customisable disease networks. We here present the main features of the DISNET system. We describe how information on diseases and their phenotypic manifestations is extracted from Wikipedia, PubMed and Mayo Clinic; specifically, texts from these sources are processed through a combination of text mining and natural language processing techniques. We further present a validation of the processing performed by the system; and describe, with some simple use cases, how a user can interact with it and extract information that could be used for subsequent analyses.
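
As a rough illustration of the extraction step described here, a minimal Python sketch is shown below: it matches a small, invented symptom vocabulary against a disease description. DISNET's actual pipeline is far richer (it maps free text to standard medical vocabularies), so treat this as a toy stand-in, not the system's method.

```python
import re

# Illustrative symptom vocabulary; DISNET itself resolves text to
# medical concepts with much richer NLP. This is a toy stand-in.
SYMPTOM_VOCABULARY = {"fever", "cough", "fatigue", "headache", "rash"}

def extract_symptoms(text: str) -> set[str]:
    """Return vocabulary terms found in a disease description."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return SYMPTOM_VOCABULARY & tokens

article = ("Patients typically present with fever, persistent cough "
           "and marked fatigue; a rash may appear later.")
print(sorted(extract_symptoms(article)))  # ['cough', 'fatigue', 'fever', 'rash']
```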

PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e8580 ◽  
Author(s):  
Gerardo Lagunes-García ◽  
Alejandro Rodríguez-González ◽  
Lucía Prieto-Santamaría ◽  
Eduardo P. García del Valle ◽  
Massimiliano Zanin ◽  
...  

Background: Within the global endeavour of improving population health, one major challenge is the identification and integration of medical knowledge spread across several information sources. Creating a comprehensive dataset of diseases and their clinical manifestations from public sources is an appealing approach: it allows one not only to complement and merge medical knowledge but also to extend it, interconnecting existing data and relating diseases to each other. In this paper, we present DISNET (http://disnet.ctb.upm.es/), a web-based system designed to periodically extract knowledge about signs and symptoms from medical databases, and to enable the creation of customisable disease networks. Methods: We here present the main features of the DISNET system. We describe how information on diseases and their phenotypic manifestations is extracted from the Wikipedia and PubMed websites; specifically, texts from these sources are processed through a combination of text mining and natural language processing techniques. Results: We further present the validation of our system on Wikipedia and PubMed texts, reporting the corresponding accuracy. The final output is a comprehensive symptom-disease dataset, shared with free access through the system's API. We finally describe, with some simple use cases, how a user can interact with the system and extract information that could be used for subsequent analyses. Discussion: DISNET allows retrieving knowledge about the signs, symptoms and diagnostic tests associated with a disease. It is not limited to a specific disease category (it covers all the categories that the selected information sources offer) or to clinical diagnosis terms. It further allows tracking the evolution of those terms through time, thus offering an opportunity to analyse and observe the progress of human knowledge on diseases. We further discuss the validation of the system, which suggests it is accurate enough to be used to extract diseases and diagnostically relevant terms. At the same time, the evaluation also revealed that improvements could be introduced to enhance the system's reliability.
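
Since the resulting symptom-disease dataset is shared through the system's API, a hedged sketch of how such a query might be issued from Python follows; the endpoint path and parameter names below are hypothetical placeholders, so consult the DISNET site for the actual API contract.

```python
import requests

BASE = "http://disnet.ctb.upm.es/api"  # hypothetical path; see the site docs

def get_disease_terms(disease: str) -> dict:
    # Hypothetical endpoint and parameter names, for illustration only.
    resp = requests.get(f"{BASE}/diseases", params={"name": disease}, timeout=10)
    resp.raise_for_status()
    return resp.json()  # expected: phenotypic terms linked to the disease
```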


2011 ◽  
pp. 503-521
Author(s):  
Flavius Frasincar ◽  
Jethro Borsje ◽  
Leonard Levering

This article proposes Hermes, a Semantic Web-based framework for building personalized news services. It makes use of ontologies for knowledge representation, natural language processing techniques for semantic text analysis, and semantic query languages for specifying the wanted information. Hermes is supported by an implementation of the framework, the Hermes News Portal, a tool that gives users personalized online access to news items. The Hermes framework and its associated implementation aim to advance the state of the art of semantic approaches to personalized news services by employing Semantic Web standards, exploiting domain information, using a word sense disambiguation procedure, and being able to express temporal constraints for the desired news items.
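
As a small, self-contained illustration of the kind of semantic querying Hermes relies on, the sketch below builds a toy news graph with rdflib and selects items with SPARQL; the namespace and schema are invented for the example, not the actual Hermes ontology.

```python
from rdflib import Graph, Literal, Namespace, RDF

# Toy knowledge base: news items annotated with ontology concepts,
# queried with SPARQL. Schema is illustrative only.
EX = Namespace("http://example.org/news#")
g = Graph()
g.add((EX.item1, RDF.type, EX.NewsItem))
g.add((EX.item1, EX.mentions, EX.Apple))
g.add((EX.item1, EX.headline, Literal("Apple unveils new chip")))

# Select headlines of news items mentioning a concept of interest.
results = g.query("""
    PREFIX ex: <http://example.org/news#>
    SELECT ?headline WHERE {
        ?item a ex:NewsItem ;
              ex:mentions ex:Apple ;
              ex:headline ?headline .
    }
""")
for row in results:
    print(row.headline)  # Apple unveils new chip
```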


2019 ◽  
Vol 28 (02) ◽  
pp. 1950008
Author(s):  
Aleš Horák ◽  
Vít Baisa ◽  
Adam Rambousek ◽  
Vít Suchomel

This paper describes a new system for semi-automatically building, extending and managing a terminological thesaurus: a multilingual terminology dictionary enriched with relationships between the terms themselves to form a thesaurus. The system radically enhances the workflow of current terminology expert groups, where most editing decisions still come from introspection. The presented system supplements the lexicographic process with natural language processing techniques, which are seamlessly integrated into the thesaurus editing environment. The system's methodology and the resulting thesaurus are closely connected to new domain corpora in the six languages involved, which are used both for term usage examples and for the automatic extraction of new candidate terms. The terminological thesaurus is now accessible via a web-based application, which (a) presents rich detailed information on each term, (b) visualizes term relations, and (c) displays real-life usage examples of each term in domain-related documents and in context-based similar terms. Furthermore, the specialized corpora are used to detect candidate translations of terms from the central language (Czech) to the other languages (English, French, German, Russian and Slovak), as well as to detect broader Czech terms, which help to place new terms in the thesaurus hierarchy. The project has been realized as a terminological thesaurus of land surveying, but the presented tools and methodology are reusable for other terminology domains.
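
A minimal sketch of automatic candidate-term extraction, one of the NLP steps described above, is shown below: it ranks words that are unusually frequent in a domain corpus relative to a general reference corpus. The smoothed frequency ratio is a toy stand-in for the statistical association measures a production system would use.

```python
from collections import Counter

def candidate_terms(domain_tokens, reference_tokens, top_n=5):
    """Rank words by how much more frequent they are in the domain
    corpus than in a general reference corpus (add-one smoothing).
    Real term extraction adds POS filters and multi-word units."""
    dom, ref = Counter(domain_tokens), Counter(reference_tokens)
    dom_total, ref_total = sum(dom.values()), sum(ref.values())
    score = lambda w: (dom[w] / dom_total) / ((ref[w] + 1) / ref_total)
    return sorted(dom, key=score, reverse=True)[:top_n]

domain = "cadastre parcel boundary survey parcel geodetic datum".split()
general = "the boundary of the city survey shows the parcel".split()
print(candidate_terms(domain, general))  # domain-specific words rank first
```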


2021 ◽  
Author(s):  
Nathan Ji ◽  
Yu Sun

The digital age gives us access to a multitude of information and of mediums in which to interpret it. Much of the time, people find interpreting such information difficult because the medium is not as user-friendly as it could be. This project examines how one can identify specific information in a given text based on a question, with the aim of streamlining one's ability to determine the relevance of a given text to one's objective. The project achieved an overall 80% success rate over 10 articles, with three questions asked per article. This success rate indicates that the approach is likely applicable to those asking content-level questions about an article.
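
The abstract does not detail its method, but a baseline for this kind of question-driven text search can be sketched as follows: score each sentence by its word overlap with the question and return the best match. This bag-of-words heuristic is an assumption for illustration, not the project's actual pipeline.

```python
import re

def best_sentence(question: str, text: str) -> str:
    """Pick the sentence sharing the most words with the question --
    a simple bag-of-words relevance baseline."""
    q_words = set(re.findall(r"\w+", question.lower()))
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return max(sentences,
               key=lambda s: len(q_words & set(re.findall(r"\w+", s.lower()))))

text = ("The study ran for six months. It enrolled 120 participants. "
        "Funding came from a university grant.")
print(best_sentence("How many participants were enrolled?", text))
# -> It enrolled 120 participants.
```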


Author(s):  
Ramin Sabbagh ◽  
Farhad Ameri

The descriptions of the capabilities of manufacturing companies can be found in multiple locations, including company websites, legacy system databases, and ad hoc documents and spreadsheets. These capability descriptions are often represented in natural language. To unlock the value of unstructured capability information and learn from it, advanced quantitative methods supported by machine learning and natural language processing techniques are needed. This research proposes a multi-step unsupervised learning methodology using K-means clustering and topic modeling techniques in order to build clusters of suppliers based on their capabilities, extract and organize the manufacturing capability terminology, and discover nontrivial patterns in manufacturing capability corpora. The capability data is extracted either directly from the websites of manufacturing firms or from their profiles in e-sourcing portals and directories. The feature extraction and dimensionality reduction process in this work is supported by N-gram extraction and Latent Semantic Analysis (LSA). The proposed clustering method is validated experimentally on a dataset of 150 capability descriptions collected from web-based sourcing directories, such as the Thomas Net directory for manufacturing companies. The results of the experiment show that the proposed method creates supplier clusters with high accuracy.
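
The described pipeline (N-gram features, LSA for dimensionality reduction, K-means for clustering) maps naturally onto scikit-learn; the sketch below reproduces it in miniature on invented capability descriptions.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Toy capability descriptions standing in for the 150 collected ones.
docs = [
    "CNC machining and precision milling of aluminum parts",
    "five axis CNC milling and turning services",
    "sheet metal stamping and laser cutting",
    "laser cutting and metal stamping for enclosures",
]

# N-gram TF-IDF features -> LSA (truncated SVD) -> K-means, mirroring
# the paper's feature-extraction and clustering steps in miniature.
pipeline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    TruncatedSVD(n_components=2, random_state=0),
    KMeans(n_clusters=2, n_init=10, random_state=0),
)
labels = pipeline.fit_predict(docs)
print(labels)  # e.g. [0 0 1 1]: machining vs. stamping/cutting suppliers
```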


Sensi Journal ◽  
2020 ◽  
Vol 6 (2) ◽  
pp. 236-246
Author(s):  
Ilamsyah Ilamsyah ◽  
Yulianto Yulianto ◽  
Tri Vita Febriani

A correct and appropriate system for receiving and transferring goods is needed by the company. At PDAM Tirta Kerta Raharja, Tangerang Regency, the process of receiving and transferring goods from the central warehouse to the branch warehouse is currently done manually, which is ineffective and inaccurate: the Head of Subdivision uses paper documents as the submission media, namely receipt documents (PPBP) and goods mutation documents (MPPW). The Head of Subdivision enters the receipt and mutation data manually, which takes a relatively long time because, whenever a transfer of goods is requested, the inventory in the central warehouse must first be checked. It is therefore necessary to design a web-based, database-backed information system for the receipt and transfer of goods from the central warehouse to the branch warehouse, so that the process becomes more effective, efficient and accurate. Such a system makes it easier for the Head of Subdivision to input receipt and transfer data and to control stock inventory periodically. Data were collected through observation, interviews and a study of the literature from various previous studies. The system analysis uses the Waterfall method, which aims to solve the problem step by step; the design uses object-oriented visual modeling with UML; and the implementation uses PHP with MySQL as the database.
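
The actual system is built with PHP and MySQL; purely as an illustration of the stock check that precedes a transfer approval, here is a minimal Python/SQLite sketch with a hypothetical table layout (not PDAM's actual schema).

```python
import sqlite3

# Hypothetical central-warehouse stock table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE central_stock (item TEXT PRIMARY KEY, qty INTEGER)")
conn.execute("INSERT INTO central_stock VALUES ('pipe_50mm', 40)")

def transfer(item: str, qty: int) -> bool:
    """Approve a transfer only if the central warehouse holds enough
    stock, decrementing it in the same statement."""
    cur = conn.execute(
        "UPDATE central_stock SET qty = qty - ? WHERE item = ? AND qty >= ?",
        (qty, item, qty))
    conn.commit()
    return cur.rowcount == 1

print(transfer("pipe_50mm", 25))  # True: stock is sufficient
print(transfer("pipe_50mm", 25))  # False: only 15 left
```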


2018 ◽  
Vol 23 (3) ◽  
pp. 175-191
Author(s):  
Anneke Annassia Putri Siswadi ◽  
Avinanta Tarigan

To fulfil prospective students' information needs about student admission, Gunadarma University already offers many kinds of services, all of which are time-limited: a website, a book, a registration desk, the Media Information Center, and a question-answering website (UG-Pedia). A service is needed that can serve prospective students anytime and anywhere. Therefore, this research develops UGLeo, a web-based QA intelligent chatbot application for Gunadarma University's student admission portal. UGLeo is developed in the MegaHAL style, which implements the Markov chain method. In this research, some modifications are made to the MegaHAL style, namely to the structure of the natural language processing and the structure of the database. The accuracy of UGLeo's replies is 65%; to increase this accuracy, improvements remain to be applied to the UGLeo system, both in the natural language processing and in the MegaHAL style.
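
The Markov chain method behind the MegaHAL style can be sketched in a few lines: learn which words follow which, then random-walk the chain to generate a reply. The real MegaHAL uses higher-order forward and backward models and scores candidate replies, so this shows only the core idea.

```python
import random
from collections import defaultdict

def build_chain(corpus: str) -> dict:
    """Map each word to the words that follow it in the training text."""
    chain = defaultdict(list)
    words = corpus.split()
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return chain

def generate(chain: dict, seed: str, length: int = 8) -> str:
    """Random-walk the chain from a seed word to produce a reply."""
    word, out = seed, [seed]
    for _ in range(length):
        followers = chain.get(word)
        if not followers:
            break
        word = random.choice(followers)
        out.append(word)
    return " ".join(out)

chain = build_chain("admission opens in june admission requires a form "
                    "a form is available online")
print(generate(chain, "admission"))  # e.g. "admission requires a form is ..."
```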


Information ◽  
2021 ◽  
Vol 12 (5) ◽  
pp. 204
Author(s):  
Charlyn Villavicencio ◽  
Julio Jerison Macrohon ◽  
X. Alphonse Inbaraj ◽  
Jyh-Horng Jeng ◽  
Jer-Guang Hsieh

A year into the COVID-19 pandemic and one of the longest recorded lockdowns in the world, the Philippines received its first delivery of COVID-19 vaccines on 1 March 2021 through the WHO's COVAX initiative. A month into the inoculation of frontline health professionals and other priority groups, the authors of this study gathered data on the sentiment of Filipinos regarding the Philippine government's efforts, using the social networking site Twitter. Natural language processing techniques were applied to understand the general sentiment, which can help the government analyse its response. The tweets were annotated and used to train a Naïve Bayes model that classifies English- and Filipino-language tweets into positive, neutral and negative polarities through the RapidMiner data science software. The results yielded an accuracy of 81.77%, which exceeds the accuracy of recent sentiment analysis studies using Twitter data from the Philippines.
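
The study trains its classifier in RapidMiner; an equivalent Naïve Bayes setup can be sketched with scikit-learn on a few invented tweets (the texts and labels below are illustrative, not the study's data).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy labelled tweets standing in for the annotated corpus.
tweets = [
    "grateful for the vaccine rollout", "vaccines arrived, great news",
    "rollout is too slow and confusing", "no supply again, disappointing",
    "vaccination site opens tomorrow", "schedule posted on the website",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

# Bag-of-words features feeding a multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(tweets, labels)
print(model.predict(["slow rollout, no vaccine supply"]))  # likely ['negative']
```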


Entropy ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. 664
Author(s):  
Nikos Kanakaris ◽  
Nikolaos Giarelis ◽  
Ilias Siachos ◽  
Nikos Karacapilidis

We consider the prediction of future research collaborations as a link prediction problem applied to a scientific knowledge graph. To the best of our knowledge, this is the first work on the prediction of future research collaborations that combines structural and textual information of a scientific knowledge graph through a purposeful integration of graph algorithms and natural language processing techniques. Our work: (i) investigates whether the integration of unstructured textual data into a single knowledge graph affects the performance of a link prediction model, (ii) studies the effect of previously proposed graph kernel-based approaches on the performance of an ML model for the link prediction problem, and (iii) proposes a three-phase pipeline that enables the exploitation of structural and textual information, as well as of pre-trained word embeddings. We benchmark the proposed approach against classical link prediction algorithms using accuracy, recall, and precision as our performance metrics. Finally, we empirically test our approach with various feature combinations for the link prediction problem. Our experiments with the new COVID-19 Open Research Dataset demonstrate a significant improvement in the above performance metrics for the prediction of future research collaborations.
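
The core idea of combining structural and textual signals for link prediction can be sketched as follows: for each candidate author pair, compute one graph feature (common neighbours) and one text feature (vocabulary overlap), then fit a classifier. The toy graph, texts and labels below are invented; the paper's pipeline uses graph kernels and pre-trained word embeddings instead.

```python
import networkx as nx
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy co-authorship graph with a text snippet per author.
G = nx.Graph([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")])
texts = {"a": "graph mining", "b": "graph kernels", "c": "graph embeddings",
         "d": "protein folding", "e": "protein design"}

def features(u, v):
    # Structural signal: shared neighbours; textual signal: Jaccard
    # overlap of the two authors' vocabularies.
    common = len(list(nx.common_neighbors(G, u, v)))
    tu, tv = set(texts[u].split()), set(texts[v].split())
    return [common, len(tu & tv) / len(tu | tv)]

pairs = [("a", "b"), ("a", "c"), ("d", "e"), ("a", "e"), ("b", "d"), ("a", "d")]
y = [1, 1, 1, 0, 0, 0]  # 1 = pair collaborated
X = np.array([features(u, v) for u, v in pairs])
clf = LogisticRegression().fit(X, y)
print(clf.predict_proba(np.array([features("b", "c")]))[0, 1])  # high for b-c
```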

