Semantic-Based Indexing Approaches for Medical Document Clustering Using Cognitive Search

Advances in Social Networking and Online Communities - Cognitive Social Mining Applications in Data Analytics and Forensics ◽

10.4018/978-1-5225-7522-1.ch003 ◽

2019 ◽

pp. 41-64

Author(s):

Logeswari Shanmugam ◽

Premalatha K.

Keyword(s):

Text Processing ◽

Relevant Information ◽

Biomedical Literature ◽

Data Sets ◽

Original Text ◽

Biomedical Knowledge ◽

Text Data ◽

Natural Language Text ◽

Diverse Data ◽

Medical Document

Biomedical literature is the primary repository of biomedical knowledge, and PubMed is the most comprehensive database for collecting, organizing and analyzing this textual knowledge. The high dimensionality of natural language text makes text data noisy and sparse in the vector space. Hence, data preprocessing and feature selection are important steps in text processing. Ontologies are used to select the meaningful terms in a document that are semantically associated with domain concepts, which reduces the dimensionality of the original text. In this chapter, semantic-based indexing approaches with cognitive search are proposed; they use a domain ontology to extract relevant information for users from large and diverse data sets.
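As a rough illustration of how ontology-based term selection shrinks the feature space, the sketch below keeps only tokens that a toy ontology recognizes and counts them by concept, so synonyms such as "tumor" and "neoplasm" collapse into one dimension. The ontology, its concept IDs and the tokenizer are hypothetical placeholders, not the chapter's actual ontology or indexing method.

```python
import re
from collections import Counter

# Hypothetical toy ontology: surface term -> concept identifier.
# A real system would draw these from a domain ontology; these IDs are made up.
TOY_ONTOLOGY = {
    "tumor": "C:NEOPLASM",
    "tumour": "C:NEOPLASM",
    "neoplasm": "C:NEOPLASM",
    "insulin": "C:INSULIN",
    "diabetes": "C:DIABETES",
}

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def concept_vector(text: str, ontology: dict[str, str]) -> Counter:
    """Drop tokens the ontology does not know; count the rest by concept."""
    return Counter(ontology[t] for t in tokenize(text) if t in ontology)

doc = "The tumour grew; insulin levels in diabetes patients with a neoplasm were measured."
raw_vocab = set(tokenize(doc))
concepts = concept_vector(doc, TOY_ONTOLOGY)
print(len(raw_vocab), "raw token types ->", len(concepts), "concept dimensions")  # 13 -> 3
```

Mapping to concepts rather than surface words is what lets two differently worded documents land on the same indexed dimension.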

Recognition of Disease Genetic Information from Unstructured Text Data Based on BiLSTM-CRF for Molecular Mechanisms

Security and Communication Networks ◽

10.1155/2021/6635027 ◽

2021 ◽

Vol 2021 ◽

pp. 1-8

Author(s):

Lejun Gong ◽

Xingxing Zhang ◽

Tianyin Chen ◽

Li Zhang

Keyword(s):

Genetic Information ◽

Molecular Mechanisms ◽

Short Term Memory ◽

Conditional Random Field ◽

Autism Spectrum ◽

Biomedical Literature ◽

Repetitive Behaviour ◽

Biomedical Knowledge ◽

Text Data ◽

Unstructured Text

Recognizing disease-relevant entities is an important task in mining unstructured text from the biomedical literature to acquire biomedical knowledge. Autism spectrum disorder (ASD) is a neurological and developmental disorder characterized by deficits in communication and social interaction and by repetitive behaviour. However, the disease remains poorly understood to date. This study uses machine learning to identify disease-associated entities in a collection of texts, with the aim of exploring the molecular mechanisms related to ASD. Disease-related entities are extracted from the autism literature using a deep learning model that combines a bidirectional long short-term memory (BiLSTM) network with a conditional random field (CRF). Compared with previous works, the approach is promising for identifying disease-related entities. The proposed approach, which covers five types of molecular entities, is evaluated on the GENIA corpus and obtains an F-score of 76.81%. After removing duplicate molecular entities, the work extracted 9146 proteins, 145 RNAs, 7680 DNAs, 1058 cell types, and 981 cell lines from the autism literature. Finally, we perform GO and KEGG analyses of the test dataset. This study could serve as a reference for further studies on the etiology of the disease based on molecular mechanisms and provides a way to explore disease genetic information.
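In a BiLSTM-CRF tagger, the BiLSTM produces per-token tag scores (emissions) and the CRF layer picks the best-scoring tag sequence under learned transition scores. The sketch below shows only that decoding step (Viterbi) for a simple O/B/I tag set; the tag set, the scores and the -10 penalty are illustrative assumptions, and the BiLSTM itself and the paper's five entity types are not modeled.

```python
from typing import Sequence

def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> list[int]:
    """Return the highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][j]:   score of tag j at token t (e.g. produced by a BiLSTM)
    transitions[i][j]: score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag
    backptrs = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backptrs.append(ptrs)
    # Trace back from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    return path[::-1]

TAGS = ["O", "B", "I"]
NEG = -10.0  # large penalty standing in for a forbidden transition
TRANS = [[0.0, 0.0, NEG],   # from O: O -> I is invalid in BIO tagging
         [0.0, 0.0, 0.0],   # from B
         [0.0, 0.0, 0.0]]   # from I
EM = [[0.6, 0.5, 0.0],      # token 1: O narrowly beats B on its own
      [0.0, 0.0, 1.0]]      # token 2: strongly looks like I

greedy = [max(range(3), key=lambda j: row[j]) for row in EM]
print([TAGS[k] for k in greedy])              # ['O', 'I'] (invalid BIO)
print([TAGS[k] for k in viterbi(EM, TRANS)])  # ['B', 'I']
```

The example shows why the CRF layer helps: per-token argmax yields an invalid O-then-I sequence, while joint decoding with transition scores recovers a well-formed entity span.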

Design of Relation Extraction Framework to develop Knowledge Base

International Journal of Sensors Wireless Communications and Control ◽

10.2174/2210327909666190408125818 ◽

2019 ◽

Vol 09 ◽

Author(s):

Poonam Jatwani ◽

Pradeep Tomar ◽

Vandana Dhingra

Keyword(s):

Text Processing ◽

Syntactic Structure ◽

Relation Extraction ◽

Relevant Information ◽

Semantic Knowledge ◽

Specific Information ◽

Web Documents ◽

Natural Language Text ◽

Domain Specific ◽

Language Text

Web documents present information as natural language text, which machines cannot readily understand. Searching for specific information in the sea of web documents has become very challenging, because searches return many irrelevant documents along with the relevant ones. To retrieve relevant information, semantic knowledge can be stored in a domain-specific ontology, which helps in understanding the user's needs. Intensive research is under way in text processing to develop ontologies using NLP techniques, and the proposed technique is another effort in this direction. To extract syntactic structure, the method uses the Stanford parser, which performs tokenization, parsing and morphological analysis of the text. Semantic rules are defined manually to identify valid concepts and the relations among them. Once the concepts, their properties and the relationships among them are identified, the extracted information is visualized in the form of an ontology.
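The manually defined semantic rules can be pictured as small pattern matchers over parser output. The sketch below assumes the parse is already available as POS-tagged tokens plus dependency arcs labeled in the style of Universal Dependencies (`nsubj`, `obj`); the example sentences, arcs and both rules are hypothetical and are not the paper's rule set or the Stanford parser's API.

```python
from collections import defaultdict

# Hypothetical parser output for "Hospital employs doctors." and "Doctors treat patients."
# Tokens are (index, word, POS tag); arcs are (head index, label, dependent index).
SENTENCES = [
    ([(1, "hospital", "NN"), (2, "employs", "VBZ"), (3, "doctors", "NNS")],
     [(2, "nsubj", 1), (2, "obj", 3)]),
    ([(1, "doctors", "NNS"), (2, "treat", "VBP"), (3, "patients", "NNS")],
     [(2, "nsubj", 1), (2, "obj", 3)]),
]

def extract(sentences):
    """Apply two hand-written rules (both illustrative assumptions):
    R1: every noun (POS tag starting with NN) is a candidate concept.
    R2: a verb with both an nsubj and an obj yields (subject, verb, object).
    """
    concepts, relations = set(), set()
    for tokens, arcs in sentences:
        words = {i: (w, pos) for i, w, pos in tokens}
        for w, pos in words.values():
            if pos.startswith("NN"):
                concepts.add(w)
        roles = defaultdict(dict)  # head index -> {dependency label: dependent index}
        for head, label, dep in arcs:
            roles[head][label] = dep
        for verb, r in roles.items():
            if "nsubj" in r and "obj" in r:
                relations.add((words[r["nsubj"]][0], words[verb][0], words[r["obj"]][0]))
    return concepts, relations

concepts, relations = extract(SENTENCES)
print(sorted(concepts))
print(sorted(relations))
```

The resulting concept set and subject-verb-object triples are the kind of material that could then be serialized as ontology classes and object properties.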

Measuring Relevant Information in Health Social Network Conversations and Clinical Diagnosis Cases (Preprint)

10.2196/preprints.11475 ◽

2018 ◽

Author(s):

Albert Moreira ◽

Raul Alonso-Calvo ◽

Alberto Muñoz ◽

Jose Crespo

Keyword(s):

Social Media ◽

Social Network ◽

Relevant Information ◽

Online Discussions ◽

Biomedical Knowledge ◽

Expert Opinions ◽

Health Related ◽

Health Social Network ◽

Discussion System ◽

Clinical Cases

BACKGROUND The Internet and social media are an enormous source of information. Health social networks and online collaborative environments enable users to create shared content that can later be discussed. While social media discussions of health-related matters are a potential source of knowledge, characterizing the relevance of contributions from different users is a challenging task.

OBJECTIVE The aim of this paper is to present a methodology for quantifying the relevant information provided by different participants in clinical online discussions.

METHODS A set of key indicators has been defined for different aspects of clinical conversations and for specific clinical contributions within a discussion. These indicators make use of biomedical knowledge extraction based on standard terminologies and ontologies, and they allow measuring the relevance of the information contributed by each participant in a clinical conversation.

RESULTS The proposed indicators have been applied to two discussions extracted from PatientsLikeMe and to two real clinical cases from the Sanar collaborative discussion system. The results obtained for the tested cases have been compared with clinical expert opinions to check the validity of the indicators.

CONCLUSIONS The methodology has been successfully used to describe participant interactions in real clinical cases from a collaborative clinical case discussion tool and in a conversation from a health social network.
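To make the idea of a per-participant relevance indicator concrete, the sketch below computes one hypothetical indicator: each participant's share of all terminology matches in a thread. The tiny term set stands in for a standard terminology, and neither it nor the indicator is taken from the paper, whose actual indicators are not specified here.

```python
import re
from collections import Counter

# Hypothetical stand-in for a standard clinical terminology.
TERMINOLOGY = {"metformin", "hba1c", "neuropathy", "insulin"}

def relevance_shares(messages):
    """messages: list of (participant, text) pairs.
    Returns each participant's share of all terminology matches in the
    discussion (a hypothetical indicator, not the paper's definition)."""
    counts = Counter()
    for who, text in messages:
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        counts[who] += sum(t in TERMINOLOGY for t in tokens)
    total = sum(counts.values())
    return {who: (c / total if total else 0.0) for who, c in counts.items()}

thread = [
    ("patient", "My HbA1c is high even on metformin."),
    ("peer", "Hang in there!"),
    ("clinician", "Consider insulin and screen for neuropathy; recheck HbA1c."),
]
print(relevance_shares(thread))  # {'patient': 0.4, 'peer': 0.0, 'clinician': 0.6}
```

A raw term count like this ignores correctness and context, which is one reason the paper validates its indicators against clinical expert opinion.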

Wormicloud: a new text summarization tool based on word clouds to explore the C. elegans literature

Database ◽

10.1093/database/baab015 ◽

2021 ◽

Vol 2021 ◽

Author(s):

Valerio Arnaboldi ◽

Jaehyoung Cho ◽

Paul W Sternberg

Keyword(s):

Search Engines ◽

Complex Structure ◽

Model Organism ◽

Gene Interaction ◽

Relevant Information ◽

Biomedical Literature ◽

C Elegans ◽

Word Clouds ◽

Scientific Papers ◽

Model Organism Databases

Finding relevant information in newly published scientific papers is becoming increasingly difficult due to the pace at which articles are published every year as well as the increasing amount of information per paper. Biocuration and model organism databases provide a map for researchers to navigate through the complex structure of the biomedical literature by distilling knowledge into curated and standardized information. In addition, scientific search engines such as PubMed and text-mining tools such as Textpresso allow researchers to easily search for specific biological aspects in newly published papers, facilitating knowledge transfer. However, digesting the information returned by these systems, often a large number of documents, still requires considerable effort. In this paper, we present Wormicloud, a new tool that summarizes scientific articles in a graphical way through word clouds. This tool is aimed at facilitating the discovery of new experimental results not yet curated by model organism databases and is designed for both researchers and biocurators. Wormicloud is customized for the Caenorhabditis elegans literature and provides several advantages over existing solutions, including the ability to perform full-text searches through Textpresso, which provides more accurate results than other existing literature search engines. Wormicloud is integrated through direct links from gene interaction pages in WormBase. Additionally, it allows analysis of the gene sets obtained from literature searches with other WormBase tools such as SimpleMine and Gene Set Enrichment. Database URL: https://wormicloud.textpressolab.com
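A word cloud is essentially a term-frequency summary mapped to font sizes. The sketch below counts non-stopword tokens across a few abstracts and scales the top terms linearly between a minimum and maximum font size. The stopword list, the tokenizer, the linear scaling and the sample sentences are all assumptions for illustration; Wormicloud's own weighting scheme is not described in the abstract.

```python
import re
from collections import Counter

# Illustrative stopword list; a real tool would use a much larger one.
STOPWORDS = {"the", "of", "in", "and", "a", "is", "to", "by", "for", "with"}

def word_weights(texts, top_k=5, min_size=10, max_size=40):
    """Return {term: font size} for the top_k most frequent non-stopword terms,
    scaled linearly between min_size and max_size."""
    counts = Counter(
        w for t in texts for w in re.findall(r"[a-z][a-z0-9-]*", t.lower())
        if w not in STOPWORDS
    )
    top = counts.most_common(top_k)
    if not top:
        return {}
    hi, lo = top[0][1], top[-1][1]
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {w: min_size + (c - lo) * (max_size - min_size) / span for w, c in top}

abstracts = [
    "daf-16 regulates lifespan in worms.",
    "Insulin signaling and daf-16 control dauer formation.",
    "daf-16 and daf-2 modulate lifespan.",
]
print(word_weights(abstracts, top_k=2))  # {'daf-16': 40.0, 'lifespan': 10.0}
```

The token pattern keeps hyphens so that gene names such as daf-16 survive as single terms, which matters for a literature-specific summary.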

Nonparametric reconstruction of the dark energy equation of state from diverse data sets

Physical Review D ◽

10.1103/physrevd.84.083501 ◽

2011 ◽

Vol 84 (8) ◽

Cited By ~ 43

Author(s):

Tracy Holsclaw ◽

Ujjaini Alam ◽

Bruno Sansó ◽

Herbie Lee ◽

Katrin Heitmann ◽

...

Keyword(s):

Dark Energy ◽

Equation Of State ◽

Energy Equation ◽

Data Sets ◽

Diverse Data

Feedback, Concordance, and Text Comprehension

Psychological Reports ◽

10.2466/pr0.1993.72.2.517 ◽

1993 ◽

Vol 72 (2) ◽

pp. 517-518 ◽

Cited By ~ 1

Author(s):

Philip Langer ◽

Verne Keenan

Keyword(s):

Text Comprehension ◽

Text Processing ◽

Original Text ◽

Association Models ◽

Uncertain Variables ◽

Sentence Order

Research on the effects of sentence-order feedback on text processing has shown that agreement between the order of the original text and either (1) the order of the reconstructed text or (2) the order in which the text is recalled does not influence the amount recalled. Students' processing of text depends on too many uncertain variables to permit the endorsement of simple association models of instructional assistance.
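One standard way to quantify agreement between the original sentence order and a reconstructed or recalled order is Kendall's tau over sentence pairs. The sketch below is an assumption about how such concordance could be measured, not the measure used in the study; sentences missing from the recall are simply ignored.

```python
from itertools import combinations

def kendall_tau(original, recalled):
    """Kendall's tau between the original sentence order and the recalled order,
    computed over the sentences present in both (no repeated sentences assumed).
    Returns 1.0 for identical order, -1.0 for fully reversed order."""
    pos = {s: i for i, s in enumerate(recalled)}
    shared = [s for s in original if s in pos]
    concordant = discordant = 0
    for x, y in combinations(shared, 2):  # x precedes y in the original text
        if pos[x] < pos[y]:
            concordant += 1
        else:
            discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / pairs if pairs else 0.0

# Sentences 2 and 3 swapped: 5 concordant pairs, 1 discordant pair.
print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))
```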

Future of Medical Research in Rare Diseases and Cancers: Shift from Pharma to Biotech and the Golden Age of Medical Advancement

Cancer and Clinical Oncology ◽

10.5539/cco.v6n2p12 ◽

2017 ◽

Vol 6 (2) ◽

pp. 12

Author(s):

Abhith Pallegar

Keyword(s):

Big Data ◽

Medical Research ◽

Rare Diseases ◽

Network Effects ◽

Medical Knowledge ◽

Economic Cost ◽

Medical Data ◽

Data Sets ◽

Leading Role ◽

Diverse Data

The objective of the paper is to elucidate how interconnected biological systems can be better mapped and understood using the rapidly growing field of Big Data. By analyzing diverse medical data, we can harness network efficiencies and investigate how to lower the economic cost of finding cures for rare diseases. Most rare diseases are caused by genetic abnormalities, and many forms of cancer develop from genetic mutations. Finding cures for rare diseases requires us to understand the biology and biological processes of the human body. In this paper, we explore what the historical shift of focus from pharmacology to biotechnology means for accelerating biomedical solutions. With biotechnology playing a leading role in medical research, we explore how network efficiencies can be harnessed by strengthening the existing knowledge base. Studying rare or orphan diseases provides rich, observable statistical data that can be leveraged to find solutions. Network effects can be gained by working with diverse data sets, which enables us to generate the highest-quality medical knowledge with the fewest resources. This paper examines gene manipulation technologies such as Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) that can prevent diseases of genetic origin. We further explore the role of the emerging field of Big Data in analyzing large quantities of medical data as computing power grows rapidly, along with some of the network efficiencies gained from this endeavor.

ReactomeFIViz: the Reactome FI Cytoscape app for pathway and network-based data analysis

F1000Research ◽

10.12688/f1000research.4431.1 ◽

2014 ◽

Vol 3 ◽

pp. 146 ◽

Cited By ~ 2

Author(s):

Guanming Wu ◽

Eric Dawson ◽

Adrian Duong ◽

Robin Haw ◽

Lincoln Stein

Keyword(s):

Experimental Data ◽

Data Analysis ◽

Graphical Models ◽

High Throughput ◽

Interaction Network ◽

Large Data ◽

Relevant Information ◽

Data Sets ◽

Data Types ◽

Biological Studies

High-throughput experiments are routinely performed in modern biological studies. However, extracting meaningful results from massive experimental data sets is a challenging task for biologists. Projecting data onto pathway and network contexts is a powerful way to unravel patterns embedded in seemingly scattered large data sets and to assist knowledge discovery related to cancer and other complex diseases. We have developed a Cytoscape app called “ReactomeFIViz”, which utilizes a highly reliable gene functional interaction network and human-curated pathways from Reactome and other pathway databases. This app provides a suite of features to assist biologists in performing pathway- and network-based data analysis in a biologically intuitive and user-friendly way. Biologists can use this app to uncover network and pathway patterns related to their studies, search for gene signatures in gene expression data sets, reveal pathways significantly enriched by genes in a list, and integrate multiple genomic data types into a pathway context using probabilistic graphical models. We believe our app will give researchers substantial power to analyze intrinsically noisy high-throughput experimental data to find biologically relevant information.
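"Pathways significantly enriched by genes in a list" is commonly assessed with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The sketch below computes that p-value from four counts using only the standard library; it illustrates the general technique and is not taken from ReactomeFIViz, whose exact statistics and multiple-testing correction are not stated in the abstract. The example numbers are made up.

```python
from math import comb

def enrichment_p(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background, K: genes annotated to the pathway,
    n: genes in the user's list, k: genes in both the list and the pathway.
    """
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

# Toy numbers: 20 background genes, a 5-gene pathway, a 4-gene list, 3 overlapping.
print(enrichment_p(20, 5, 4, 3))  # (10*15 + 5*1) / 4845, about 0.032
```

For genome-scale backgrounds the same formula applies; in practice the p-values across many pathways would also need a multiple-testing correction.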

Conception of Geomagnetic Data Integrated Space

SPIIRAS Proceedings ◽

10.15622/sp.18.2.390-415 ◽

2019 ◽

Vol 18 (2) ◽

pp. 390-415

Author(s):

Andrei Vorobev ◽

Gulnara Vorobeva ◽

Nafisa Yusupova

Keyword(s):

Time Series ◽

Relevant Information ◽

Data Sets ◽

Spatial Anisotropy ◽

Automatic Integration ◽

Final Sample ◽

Geomagnetic Data ◽

Consolidation Model ◽

Discretization Step ◽

Original Time

As is well known, the geomagnetic field and the parameters of its variations are currently monitored mainly by a network of magnetic observatories and variation stations. However, processing and analyzing the data thus obtained is significantly hindered not only by their spatial anisotropy but also by gaps and by inconsistencies with the established format. The heterogeneity and anomalous values of the data rule out, or at least significantly complicate, their automatic integration and the application of frequency analysis tools. Known solutions for integrating heterogeneous geomagnetic data are mainly based on the consolidation model and only partially solve the problem. The resulting data sets, as a rule, do not meet the requirements of real-time information systems and may include outliers; moreover, gaps in geomagnetic time series are eliminated by excluding missing or anomalous values from the final sample, which can lead to the loss of relevant information, violation of the discretization step, and heterogeneity of the time series. The paper proposes an approach to creating an integrated space of geomagnetic data based on a combination of the consolidation and federation models, including preprocessing of the original time series with an optional procedure for their recovery and verification. The approach is oriented toward cloud computing technologies, a hierarchical data format and fast processing of large amounts of data, and as a result it provides users with higher-quality and more homogeneous data.
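The abstract's objection to dropping missing or anomalous values is that it breaks the discretization step. The sketch below shows one simple alternative under stated assumptions: keep the regular time grid, linearly interpolate short interior gaps, and leave longer gaps explicitly marked as missing. Linear interpolation, the `max_gap` threshold and integer timestamps are assumptions; the paper's recovery and verification procedure is not described in the abstract.

```python
def fill_gaps(samples, step, max_gap):
    """samples: dict {integer time: value}; a value of None marks an anomalous
    reading that has been rejected. Returns (time, value) pairs on a regular
    grid from the first to the last time. Interior runs of at most max_gap
    missing points are linearly interpolated; longer runs stay None."""
    t0, t1 = min(samples), max(samples)
    grid = list(range(t0, t1 + 1, step))
    vals = [samples.get(t) for t in grid]
    i = 0
    while i < len(vals):
        if vals[i] is None:
            j = i
            while j < len(vals) and vals[j] is None:
                j += 1
            # vals[i:j] is a run of missing points bounded by vals[i-1] and vals[j].
            if i > 0 and j < len(vals) and (j - i) <= max_gap:
                a, b = vals[i - 1], vals[j]
                for k in range(i, j):
                    vals[k] = a + (b - a) * (k - i + 1) / (j - i + 1)
            i = j
        else:
            i += 1
    return list(zip(grid, vals))

# Minute-resolution readings with a missing sample, a rejected outlier and a long gap.
readings = {0: 10.0, 1: 12.0, 3: 16.0, 4: None, 5: 20.0, 9: 30.0}
for t, v in fill_gaps(readings, step=1, max_gap=2):
    print(t, v)
```

Because every grid point is kept, the series still has a constant step and remains usable by frequency analysis tools, while unrecoverable stretches stay visibly flagged instead of silently disappearing.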

Linking Norms, Ratings, and Relations of Words and Concepts Across Multiple Language Varieties

10.31234/osf.io/tgw3z ◽

2020 ◽

Author(s):

Annika Tjuka ◽

Robert Forkel ◽

Johann-Mattis List

Keyword(s):

Web Application ◽

Age Of Acquisition ◽

Data Curation ◽

Data Sets ◽

Data Types ◽

Word Meanings ◽

Language Varieties ◽

Diverse Data ◽

Multiple Languages ◽

Word Frequencies

Psychologists and linguists have collected a great diversity of data on word and concept properties. In psychology, many studies accumulate norms and ratings, such as word frequencies or age-of-acquisition ratings, often for a large number of words. Linguistics, on the other hand, provides valuable insights into relations between word meanings. We present a collection of such data sets for norms, ratings, and relations that cover different languages: ‘NoRaRe.’ To enable a comparison between the diverse data types, we established workflows that facilitate the expansion of the database. A web application allows convenient access to the data (https://digling.org/norare/). Furthermore, a software API ensures consistent data curation by providing tests to validate the data sets. The NoRaRe collection is linked to the database curated by the Concepticon project (https://concepticon.clld.org), which offers a reference catalog of unified concept sets. The link between words in the data sets and the Concepticon concept sets makes cross-linguistic comparison possible. In three case studies, we test the validity of our approach, the accuracy of our workflow, and the applicability of our database. The results indicate that the NoRaRe database can be applied to the study of word properties across multiple languages. The data can be used by psychologists and linguists to benefit from the knowledge rooted in both research disciplines.
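Linking each data set's words to shared concept sets is what turns language-specific norms into comparable records. The sketch below joins two hypothetical norm data sets on a shared concept label; the labels, words and numbers are invented placeholders (not real Concepticon identifiers or real norms), and the function is not part of the NoRaRe API.

```python
# Hypothetical norm data sets: word -> (concept-set label, value).
# Labels and values are invented for illustration only.
english_aoa = {"hand": ("HAND", 3.2), "water": ("WATER", 2.8), "freedom": ("FREEDOM", 9.1)}
german_freq = {"Hand": ("HAND", 120.5), "Wasser": ("WATER", 98.0)}

def link(ds_a, ds_b):
    """Join two norm data sets on their shared concept-set labels.
    Returns {concept: (word_a, value_a, word_b, value_b)}."""
    by_concept_b = {c: (w, v) for w, (c, v) in ds_b.items()}
    linked = {}
    for w, (c, v) in ds_a.items():
        if c in by_concept_b:
            wb, vb = by_concept_b[c]
            linked[c] = (w, v, wb, vb)
    return linked

for concept, row in sorted(link(english_aoa, german_freq).items()):
    print(concept, row)
```

Words with no counterpart in the other data set (here "freedom") drop out of the join, which mirrors why coverage of the shared concept catalog limits how much cross-linguistic comparison is possible.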