Semantic-Based Indexing Approaches for Medical Document Clustering Using Cognitive Search

Author(s):  
Logeswari Shanmugam ◽  
Premalatha K.

Biomedical literature is the primary repository of biomedical knowledge, and PubMed is the most comprehensive database for collecting, organizing, and analyzing this textual knowledge. The high dimensionality of natural language text makes text data noisy and sparse in the vector space. Hence, data preprocessing and feature selection are important steps in text processing. Ontologies select the meaningful terms that are semantically associated with the concepts in a document, which reduces the dimensionality of the original text. In this chapter, semantic-based indexing approaches are proposed with cognitive search, which makes use of a domain ontology to extract relevant information from big and diverse data sets for users.
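The dimensionality reduction described in this abstract can be illustrated with a minimal sketch. The ontology below is a hypothetical three-concept toy, and `concept_index` is an illustrative helper, not the chapter's indexing method: each word is mapped to a concept identifier, and words the ontology does not know are dropped.

```python
from collections import Counter

# Hypothetical mini-ontology (assumption): surface term -> concept identifier.
ONTOLOGY = {
    "tumor": "C:Neoplasm",
    "tumour": "C:Neoplasm",
    "neoplasm": "C:Neoplasm",
    "insulin": "C:Insulin",
    "glucose": "C:Glucose",
}

def concept_index(text):
    """Count ontology concepts in text; words outside the ontology are dropped."""
    tokens = text.lower().split()
    return Counter(ONTOLOGY[t] for t in tokens if t in ONTOLOGY)

doc = "the tumour and the neoplasm responded while insulin regulated glucose"
index = concept_index(doc)

# Distinct raw words versus distinct concepts: the concept space is smaller.
vocab_size = len(set(doc.split()))
concept_count = len(index)
```

In this toy document, synonyms such as "tumour" and "neoplasm" collapse into one concept, so the document vector has fewer dimensions than the raw vocabulary.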

2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Lejun Gong ◽  
Xingxing Zhang ◽  
Tianyin Chen ◽  
Li Zhang

Identifying disease-relevant entities is an important task in mining unstructured text data from the biomedical literature to acquire biomedical knowledge. Autism spectrum disorder (ASD) is a neurological and developmental disorder characterized by deficits in communication and social interaction and by repetitive behaviour. However, this kind of disease remains poorly understood to date. This study uses machine learning to identify disease-associated entities in a collection of text data on molecular mechanisms related to ASD. Entities related to the disease are extracted from the biomedical literature on autism by using deep learning with a bidirectional long short-term memory (BiLSTM) and conditional random field (CRF) model. Compared with previous work, the approach is promising for identifying entities related to disease. The proposed approach, which covers five types of molecular entities, is evaluated on the GENIA corpus and obtains an F-score of 76.81%. The work has extracted 9146 proteins, 145 RNAs, 7680 DNAs, 1058 cell-types, and 981 cell-lines from the autism biomedical literature after removing repeated molecular entities. Finally, we perform GO and KEGG analyses of the test dataset. This study could serve as a reference for further studies on the etiology of disease on the basis of molecular mechanisms and provide a way to explore disease genetic information.
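For readers unfamiliar with the F-score reported above, the following is a minimal sketch of how an entity-level F-score is commonly computed. The abstract does not say which matching criterion was used; exact match on span and type is an assumption here, and the spans are made-up examples, not data from the study.

```python
def f_score(predicted, gold):
    """Entity-level precision, recall and F1 over (start, end, type) spans.

    Assumption: a prediction counts only if span and type both match exactly.
    """
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Made-up spans for illustration only.
gold = [(0, 2, "protein"), (5, 6, "DNA"), (9, 11, "cell_type"), (14, 15, "RNA")]
pred = [(0, 2, "protein"), (5, 6, "RNA"), (9, 11, "cell_type")]
p, r, f1 = f_score(pred, gold)
```

Here two of three predictions are correct and two of four gold entities are found, so the harmonic mean falls between the two rates.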


Author(s):  
Poonam Jatwani ◽  
Pradeep Tomar ◽  
Vandana Dhingra

Web documents display information in the form of natural language text, which is not understandable by machines. Searching for specific information in a sea of web documents has become very challenging, because a search returns many unwanted, non-relevant documents along with the relevant ones. To retrieve relevant information, semantic knowledge can be stored in a domain-specific ontology, which helps in understanding the user's need. Intensive research has been going on in the field of text processing to develop ontologies using NLP techniques. The proposed technique is another effort in this direction. In this method, we use the Stanford parser to extract syntactic structure; it performs tokenization of the text, parsing, and morphological analysis. Semantic rules are defined manually to identify valid concepts and the relations among them. Once concepts, properties, and relationships among concepts are identified, the extracted information is visualized in the form of an ontology.


2018 ◽  
Author(s):  
Albert Moreira ◽  
Raul Alonso-Calvo ◽  
Alberto Muñoz ◽  
Jose Crespo

BACKGROUND The Internet and social media are an enormous source of information. Health social networks and online collaborative environments enable users to create shared content that can afterwards be discussed. While social media discussions of health-related matters constitute a potential source of knowledge, characterizing the relevance of contributions from different users is a challenging task. OBJECTIVE The aim of this paper is to present a methodology designed for quantifying the relevant information provided by different participants in clinical online discussions. METHODS A set of key indicators for different aspects of clinical conversations and for specific clinical contributions within a discussion has been defined. These indicators make use of biomedical knowledge extraction based on standard terminologies and ontologies. They allow measuring the relevance of the information contributed by each participant in the clinical conversation. RESULTS The proposed indicators have been applied to two discussions extracted from PatientsLikeMe, as well as to two real clinical cases from the Sanar collaborative discussion system. Results obtained from the indicators in the tested cases have been compared with clinical expert opinions to check the indicators' validity. CONCLUSIONS The methodology has been successfully used for describing participant interactions in real clinical cases belonging to a collaborative clinical case discussion tool and in a conversation from a health social network.


Database ◽  
2021 ◽  
Vol 2021 ◽  
Author(s):  
Valerio Arnaboldi ◽  
Jaehyoung Cho ◽  
Paul W Sternberg

Finding relevant information from newly published scientific papers is becoming increasingly difficult due to the pace at which articles are published every year as well as the increasing amount of information per paper. Biocuration and model organism databases provide a map for researchers to navigate through the complex structure of the biomedical literature by distilling knowledge into curated and standardized information. In addition, scientific search engines such as PubMed and text-mining tools such as Textpresso allow researchers to easily search for specific biological aspects from newly published papers, facilitating knowledge transfer. However, digesting the information returned by these systems, often a large number of documents, still requires considerable effort. In this paper, we present Wormicloud, a new tool that summarizes scientific articles in a graphical way through word clouds. This tool is aimed at facilitating the discovery of new experimental results not yet curated by model organism databases and is designed for both researchers and biocurators. Wormicloud is customized for the Caenorhabditis elegans literature and provides several advantages over existing solutions, including being able to perform full-text searches through Textpresso, which provides more accurate results than other existing literature search engines. Wormicloud is integrated through direct links from gene interaction pages in WormBase. Additionally, it allows analysis on the gene sets obtained from literature searches with other WormBase tools such as SimpleMine and Gene Set Enrichment. Database URL: https://wormicloud.textpressolab.com


2011 ◽  
Vol 84 (8) ◽  
Author(s):  
Tracy Holsclaw ◽  
Ujjaini Alam ◽  
Bruno Sansó ◽  
Herbie Lee ◽  
Katrin Heitmann ◽  
...  

1993 ◽  
Vol 72 (2) ◽  
pp. 517-518 ◽  
Author(s):  
Philip Langer ◽  
Verne Keenan

Research on the effects of sentence-order feedback on text processing has shown that agreement between the order of original text and either (1) the order of reconstructed text or (2) recall of text does not influence amount of recall. Students' processing of text is a function of too many uncertain variables to permit endorsements of simple association models of instructional assistance.


2017 ◽  
Vol 6 (2) ◽  
pp. 12
Author(s):  
Abhith Pallegar

The objective of the paper is to elucidate how interconnected biological systems can be better mapped and understood using the rapidly growing area of Big Data. We can harness network efficiencies by analyzing diverse medical data and probe how we can effectively lower the economic cost of finding cures for rare diseases. Most rare diseases are due to genetic abnormalities, and many forms of cancer develop due to genetic mutations. Finding cures for rare diseases requires us to understand the biology and biological processes of the human body. In this paper, we explore what the historical shift of focus from pharmacology to biotechnology means for accelerating biomedical solutions. With biotechnology playing a leading role in the field of medical research, we explore how network efficiencies can be harnessed by strengthening the existing knowledge base. Studying rare or orphan diseases provides rich observable statistical data that can be leveraged for finding solutions. Network effects can be gained by working with diverse data sets, which enables us to generate the highest-quality medical knowledge with the fewest resources. This paper examines gene manipulation technologies like Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) that can prevent genetic diseases. We further explore the role of the emerging field of Big Data in analyzing large quantities of medical data with the rapid growth of computing power and some of the network efficiencies gained from this endeavor.


F1000Research ◽  
2014 ◽  
Vol 3 ◽  
pp. 146 ◽  
Author(s):  
Guanming Wu ◽  
Eric Dawson ◽  
Adrian Duong ◽  
Robin Haw ◽  
Lincoln Stein

High-throughput experiments are routinely performed in modern biological studies. However, extracting meaningful results from massive experimental data sets is a challenging task for biologists. Projecting data onto pathway and network contexts is a powerful way to unravel patterns embedded in seemingly scattered large data sets and assist knowledge discovery related to cancer and other complex diseases. We have developed a Cytoscape app called “ReactomeFIViz”, which utilizes a highly reliable gene functional interaction network and human curated pathways from Reactome and other pathway databases. This app provides a suite of features to assist biologists in performing pathway- and network-based data analysis in a biologically intuitive and user-friendly way. Biologists can use this app to uncover network and pathway patterns related to their studies, search for gene signatures from gene expression data sets, reveal pathways significantly enriched by genes in a list, and integrate multiple genomic data types into a pathway context using probabilistic graphical models. We believe our app will give researchers substantial power to analyze intrinsically noisy high-throughput experimental data to find biologically relevant information.


2019 ◽  
Vol 18 (2) ◽  
pp. 390-415
Author(s):  
Andrei Vorobev ◽  
Gulnara Vorobeva ◽  
Nafisa Yusupova

As is known, the problem of monitoring the parameters of the geomagnetic field and its variations is today solved mainly by a network of magnetic observatories and variational stations. However, a significant obstacle in processing and analyzing the data thus obtained, along with their spatial anisotropy, is omissions in the data or inconsistency with the established format. The heterogeneity and anomalousness of the data rule out, or at least significantly complicate, their automatic integration and the application of frequency analysis tools to them. Known solutions for the integration of heterogeneous geomagnetic data are mainly based on the consolidation model and only partially solve the problem. The resulting data sets, as a rule, do not meet the requirements for real-time information systems and may include outliers. Omissions in the time series of geomagnetic data are eliminated by excluding missing or anomalous values from the final sample, which can obviously lead to the loss of relevant information, violation of the discretization step, and heterogeneity of the time series. The paper proposes an approach to creating an integrated space of geomagnetic data based on a combination of consolidation and federalization models. The approach includes preliminary processing of the original time series with an optionally available procedure for their recovery and verification. It is oriented toward the use of cloud computing technologies, a hierarchical format, and fast processing of large amounts of data and, as a result, provides users with better and more homogeneous data.
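The point about excluding missing values can be made concrete with a small sketch. It contrasts dropping gaps, which breaks the uniform sampling step, with restoring them. Linear interpolation is used here only as a stand-in; the abstract does not specify the paper's recovery procedure, and `fill_gaps` and the sample values are hypothetical.

```python
def fill_gaps(samples):
    """Linearly interpolate interior None values in an evenly sampled series.

    Leading or trailing gaps are left as None because they have only one
    known neighbour.
    """
    filled = list(samples)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = filled[left] + weight * (filled[right] - filled[left])
    return filled

# Made-up readings with two consecutive missing samples.
series = [10.0, None, None, 16.0, 17.0]

dropped = [v for v in series if v is not None]   # shorter series, uneven step
restored = fill_gaps(series)                     # same length, uniform step
```

Dropping the gaps shortens the series and places samples that were three steps apart next to each other, whereas the restored series keeps one value per time step.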


2020 ◽  
Author(s):  
Annika Tjuka ◽  
Robert Forkel ◽  
Johann-Mattis List

Psychologists and linguists have collected a great diversity of data for word and concept properties. In psychology, many studies accumulate norms and ratings such as word frequencies or age-of-acquisition often for a large number of words. Linguistics, on the other hand, provides valuable insights into relations of word meanings. We present a collection of those data sets for norms, ratings, and relations that cover different languages: ‘NoRaRe.’ To enable a comparison between the diverse data types, we established workflows that facilitate the expansion of the database. A web application allows convenient access to the data (https://digling.org/norare/). Furthermore, a software API ensures consistent data curation by providing tests to validate the data sets. The NoRaRe collection is linked to the database curated by the Concepticon project (https://concepticon.clld.org) which offers a reference catalog of unified concept sets. The link between words in the data sets and the Concepticon concept sets makes a cross-linguistic comparison possible. In three case studies, we test the validity of our approach, the accuracy of our workflow, and the applicability of our database. The results indicate that the NoRaRe database can be applied for the study of word properties across multiple languages. The data can be used by psychologists and linguists to benefit from the knowledge rooted in both research disciplines.

