eGARD: Extracting associations between genomic anomalies and drug responses from text

2017 ◽  
Author(s):  
A. S. M. Ashique Mahmood ◽  
Shruti Rao ◽  
Peter McGarvey ◽  
Cathy Wu ◽  
Subha Madhavan ◽  
...  

Abstract Tumor molecular profiling plays an integral role in identifying genomic anomalies, which may help personalize cancer treatments, improve patient outcomes and minimize risks associated with different therapies. However, critical information regarding the evidence of clinical utility of such anomalies is largely buried in the biomedical literature. It is becoming prohibitive for biocurators, clinical researchers and oncologists to keep up with the rapidly growing volume and breadth of information, especially information that describes therapeutic implications of biomarkers and is therefore relevant for treatment selection. To improve and speed up the process of manually reviewing and extracting relevant information from the literature, we have developed a natural language processing (NLP)-based text mining (TM) system called eGARD (extracting Genomic Anomalies association with Response to Drugs). The system relies on the syntactic structure of sentences, coupled with various textual features, to extract relations between genomic anomalies and drug response from MEDLINE abstracts. It achieved precision, recall and F-measure of up to 0.95, 0.86 and 0.90, respectively, on annotated evaluation datasets created in-house and obtained externally from PharmGKB. Additionally, the system extracted information that helps determine the confidence level of each extraction, supporting prioritization of curation. Such a system will enable clinical researchers to explore the use of published markers to stratify patients upfront for ‘best-fit’ therapies and to readily generate hypotheses for new clinical trials.
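The reported figures can be checked for consistency with a short sketch. It rests on two assumptions the abstract does not state: that the F-measure is the balanced F1 score (the harmonic mean of precision and recall), and that the three maxima come from the same evaluation. The function name `f1_score` is illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Maximum precision and recall reported for eGARD (assumed to come from one run).
print(round(f1_score(0.95, 0.86), 2))  # prints 0.9
```

Under these assumptions, the reported F-measure of 0.90 is consistent with the precision and recall values.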

2015 ◽  
Vol 33 (28_suppl) ◽  
pp. 82-82
Author(s):  
Sangeeta Aggarwal ◽  
Mingfeng Liu ◽  
Rishi Sharma ◽  
Fred Yang ◽  
Ankit Gupta ◽  
...  

82 Background: Patients undergoing breast cancer treatment often report Symptoms of Cognitive Deficit (SCD). Many of them share their experiences on online forums, which contain millions of freely shared messages that can be used to analyze these SCD. Unfortunately, this data is unstructured, making it difficult to analyze. In this project we organize this data using methods from Big Data Science (BDS) and analyze it by creating a Decision Support System (DSS): an interface that patients and providers can use to understand how SCD are associated with specific types – hormonal only (HT), chemo only (CT), or both (CT/HT) – of breast cancer therapies. Methods: We collected 3.5 million unique messages from 20 unrestricted breast cancer forums that provide clinically relevant information. We next built custom ontologies for breast cancer treatments, SCD, and supportive therapies. Then, we created a DSS using methods from BDS, including topic modeling, information retrieval, and natural language processing, to extract the relevant data from these messages. We also used token windows and co-occurrence-based algorithms to associate treatments with SCD and supportive therapies. To use this system, a user provides disease-related parameters and the treatment. The DSS then returns the percentage of messages discussing SCD for a similar cohort of patients and the percentage of messages that discuss supportive therapies for each of these SCD. Results: We found 15719 messages that had a strong association of SCD with treatments. 3355 messages were from HT patients, 5740 messages were from CT patients, and 9095 messages were from CT/HT patients. Among HT patients, 28.18% of those taking aromatase inhibitors and 19.20% of those taking tamoxifen associated SCD with HT. Among CT patients, 35.26% of those receiving taxane-containing chemo associated SCD with CT. SCD worsened during HT for CT/HT patients. Supportive therapy: 80 messages found Vitamin B12 and B6 useful, 65 suggested Acetyl-L-Carnitine, and 50 suggested playing word games. Conclusions: Using methods from BDS, our DSS reliably associates SCD with HT, CT and CT/HT, and suggests supportive therapies. More research is needed to evaluate the role of supportive therapy for SCD.
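The token-window co-occurrence step can be sketched as follows. This is a minimal illustration, not the study's implementation. The small treatment and symptom vocabularies stand in for the custom ontologies, and the window size of 5 tokens is an assumption.

```python
import re
from collections import Counter

# Hypothetical stand-ins for the study's custom treatment and SCD ontologies.
TREATMENTS = {"tamoxifen", "anastrozole", "paclitaxel"}
SYMPTOMS = {"forgetful", "forgetfulness", "confusion"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def window_cooccurrences(messages, window=5):
    """Count (treatment, symptom) pairs whose tokens lie within `window` tokens of each other."""
    counts = Counter()
    for message in messages:
        tokens = tokenize(message)
        for i, token in enumerate(tokens):
            if token not in TREATMENTS:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for other in tokens[lo:hi]:
                if other in SYMPTOMS:
                    counts[(token, other)] += 1
    return counts


# Made-up forum-style messages for illustration.
messages = [
    "Since starting tamoxifen I feel so forgetful every day",
    "Paclitaxel week 3, no confusion so far",
    "Confusion is bad today. I walked the dog, made lunch, and then took my tamoxifen",
]
print(window_cooccurrences(messages))
```

The first two messages each yield one pair. The third yields none, because "tamoxifen" occurs 14 tokens after "confusion". The second message also shows a limitation of plain windows: "no confusion" is still counted as an association. For that reason, such counts usually need negation handling or manual review.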


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Wahiba Ben Abdessalem Karaa ◽  
Eman H. Alkhammash ◽  
Aida Bchir

Extracting the relations between medical concepts is very valuable in the medical domain. Scientists need to extract relevant information and semantic relations between medical concepts, including protein–protein, gene–protein, drug–drug, and drug–disease relations. These relations can be extracted from the biomedical literature available in various databases. This study examines the extraction of semantic relations that can occur between diseases and drugs. The findings will help specialists make sound decisions when administering a medication to a patient and keep them continuously up to date in their field. The objective of this work is to identify different features related to drugs and diseases in medical texts by applying Natural Language Processing (NLP) techniques and the UMLS ontology. A Support Vector Machine classifier uses these features to extract valuable semantic relationships among text entities. The main contribution of this research is the combination of two components. The first is a proposed NLP technique that takes advantage of the UMLS ontology to extract correct and adequate features (frequency, lexical, morphological, syntactic, and semantic features). The second is a Support Vector Machine with a polynomial kernel function. These features are used to pinpoint the relations between drugs and diseases. The proposed approach was evaluated on a standard corpus extracted from MEDLINE. It considerably improves performance and outperforms similar works, especially in F-score for the most important relation, “cure,” which reaches 98.19%. Its accuracy is higher than that of all existing works for all relations.
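The polynomial kernel computes K(x, y) = (γ⟨x, y⟩ + c)^d. The sketch below applies it to hypothetical binary feature vectors. The study does not report its degree, γ or c, so the defaults here are assumptions. scikit-learn's `SVC(kernel="poly")` uses the same form through its `degree`, `gamma` and `coef0` parameters.

```python
def polynomial_kernel(x, y, degree=2, gamma=1.0, coef0=1.0):
    """Polynomial kernel K(x, y) = (gamma * <x, y> + coef0) ** degree."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


def gram_matrix(vectors, **kernel_args):
    """Pairwise kernel values, the matrix an SVM solver works with."""
    return [[polynomial_kernel(u, v, **kernel_args) for v in vectors] for u in vectors]


# Hypothetical binary features for three drug-disease sentence pairs, e.g.
# [cue word "treat" present, drug precedes disease, semantic-type match, negation present].
features = [
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
]
print(gram_matrix(features))
```

With degree 2 and a nonzero `coef0`, the kernel implicitly scores pairwise feature conjunctions, such as a cue word together with a semantic-type match, without building those product features explicitly.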


2016 ◽  
Vol 23 (4) ◽  
pp. 758-765 ◽  
Author(s):  
Safa Fathiamini ◽  
Amber M Johnson ◽  
Jia Zeng ◽  
Alejandro Araya ◽  
Vijaykumar Holla ◽  
...  

Abstract Introduction Genomic profiling information is frequently available to oncologists, enabling targeted cancer therapy. Because clinically relevant information is rapidly emerging in the literature and elsewhere, there is a need for informatics technologies to support targeted therapies. To this end, we have developed a system for Automated Identification of Molecular Effects of Drugs, to help biomedical scientists curate this literature to facilitate decision support. Objectives To create an automated system to identify assertions in the literature concerning drugs targeting genes with therapeutic implications, and to characterize the challenges inherent in automating this process in rapidly evolving domains. Methods We used subject-predicate-object triples (semantic predications) and co-occurrence relations generated by applying the SemRep Natural Language Processing system to MEDLINE abstracts and ClinicalTrials.gov descriptions. We applied customized semantic queries to find drugs targeting genes of interest. The results were manually reviewed by a team of experts. Results Compared to a manually curated set of relationships, recall, precision, and F2 were 0.39, 0.21, and 0.33, respectively, which represents a 3- to 4-fold improvement over a publicly available set of predications (SemMedDB) alone. On review of ostensibly false positive results, 26% were considered relevant additions to the reference set, and an additional 61% were considered relevant for review. Adding co-occurrence data improved results for drugs in early development, but not for their better-established counterparts. Conclusions Precision medicine poses unique challenges for biomedical informatics systems that help domain experts find answers to their research questions. Further research is required to improve the performance of such systems, particularly for drugs in development.
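F2 is the F-beta score with beta = 2, which weights recall more heavily than precision. Plugging in the reported precision and recall reproduces the stated F2 and allows comparison with the balanced F1. The helper name `f_beta` is illustrative.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """F-beta score: beta > 1 favours recall, beta < 1 favours precision."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0
    return (1 + b2) * precision * recall / denom


precision, recall = 0.21, 0.39  # values reported in the abstract
print(round(f_beta(precision, recall, 2), 2))  # F2: 0.33
print(round(f_beta(precision, recall, 1), 2))  # F1, for comparison: 0.27
```

Because recall (0.39) exceeds precision (0.21), F2 comes out higher than F1. This matches the study's choice of a recall-oriented metric for a curation-support setting, where experts review the candidates anyway.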


Author(s):  
Mario Jojoa Acosta ◽  
Gema Castillo-Sánchez ◽  
Begonya Garcia-Zapirain ◽  
Isabel de la Torre Díez ◽  
Manuel Franco-Martín

The use of artificial intelligence in health care has grown quickly. Here we present work applying Natural Language Processing techniques to analyze the sentiment of users who answered two questions from the CSQ-8 questionnaire in raw Spanish free text. Their responses relate to mindfulness, a novel technique used to control stress and anxiety caused by different factors in daily life. We offered an online course applying this method in order to improve the quality of life of health care professionals during the COVID-19 pandemic. We also evaluated the satisfaction level of the participants, with a view to establishing strategies to improve future experiences. To perform this task automatically, we used NLP models such as swivel embeddings, neural networks, and transfer learning to classify the inputs into three categories: negative, neutral, and positive. Because of the limited amount of data available (86 records for the first question and 68 for the second), transfer learning techniques were required. Users could write text of any length, and our approach attained maximum accuracies of 93.02% and 90.53%, respectively, based on ground truth labeled by three experts. Finally, we proposed a complementary analysis, using a graphical text representation based on word frequency, to help researchers identify relevant information about the opinions with an objective approach to sentiment. The main conclusion of this work is that applying NLP techniques with transfer learning to small amounts of data can achieve sufficient accuracy in the sentiment analysis and text classification stages.


Database ◽  
2021 ◽  
Vol 2021 ◽  
Author(s):  
Valerio Arnaboldi ◽  
Jaehyoung Cho ◽  
Paul W Sternberg

Abstract Finding relevant information in newly published scientific papers is becoming increasingly difficult, owing to the pace at which articles are published every year as well as the increasing amount of information per paper. Biocuration and model organism databases provide a map for researchers to navigate the complex structure of the biomedical literature by distilling knowledge into curated and standardized information. In addition, scientific search engines such as PubMed and text-mining tools such as Textpresso allow researchers to easily search newly published papers for specific biological aspects, facilitating knowledge transfer. However, digesting the information returned by these systems (often a large number of documents) still requires considerable effort. In this paper, we present Wormicloud, a new tool that summarizes scientific articles graphically through word clouds. This tool is aimed at facilitating the discovery of new experimental results not yet curated by model organism databases and is designed for both researchers and biocurators. Wormicloud is customized for the Caenorhabditis elegans literature and provides several advantages over existing solutions. These include the ability to perform full-text searches through Textpresso, which provides more accurate results than other existing literature search engines. Wormicloud is integrated through direct links from gene interaction pages in WormBase. Additionally, it allows gene sets obtained from literature searches to be analyzed with other WormBase tools such as SimpleMine and Gene Set Enrichment. Database URL: https://wormicloud.textpressolab.com
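Word clouds scale each term by how often it occurs. The counting step behind such a display can be sketched as below. This is not Wormicloud's code: the stop-word list, the tokenizer and the toy sentences are assumptions made for illustration, and rendering is left to a plotting library.

```python
import re
from collections import Counter

# Illustrative stop-word list; Wormicloud's actual preprocessing is not described here.
STOP_WORDS = {"the", "of", "and", "in", "a", "to", "is", "by", "for", "with", "that", "via"}


def word_frequencies(abstracts, top_n=10):
    """Return the `top_n` most frequent non-stop-word tokens across the abstracts."""
    counts = Counter()
    for text in abstracts:
        # Tokens of two or more characters; hyphens and digits are kept so
        # gene names such as "daf-16" survive as single tokens.
        for token in re.findall(r"[a-z][a-z0-9-]+", text.lower()):
            if token not in STOP_WORDS:
                counts[token] += 1
    return counts.most_common(top_n)


# Toy sentences standing in for search results.
abstracts = [
    "daf-16 regulates lifespan in C. elegans",
    "Loss of daf-16 shortens lifespan",
    "daf-2 mutants extend lifespan via daf-16",
]
print(word_frequencies(abstracts, top_n=2))
```

`Counter.most_common` orders tied counts by first occurrence. In this example, "daf-16" is therefore listed before "lifespan", even though both appear three times.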


2014 ◽  
Vol 40 (2) ◽  
pp. 469-510 ◽  
Author(s):  
Khaled Shaalan

As more and more Arabic textual information becomes available through the Web in homes and businesses, via Internet and Intranet services, there is an urgent need for technologies and tools to process the relevant information. Named Entity Recognition (NER) is an Information Extraction task that has become an integral part of many other Natural Language Processing (NLP) tasks, such as Machine Translation and Information Retrieval. Arabic NER has begun to receive attention in recent years. The characteristics and peculiarities of Arabic, a member of the Semitic language family, make NER a challenge. A well-performing Arabic NER component improves the overall performance of the NLP system that relies on it. This article describes in detail the recent increase in interest and progress in Arabic NER research. The importance of the NER task is demonstrated, the main characteristics of the Arabic language are highlighted, and aspects of standardization in annotating named entities are illustrated. Moreover, the different Arabic linguistic resources are presented and the approaches used in the Arabic NER field are explained. The features of common tools used in Arabic NER are described, and standard evaluation metrics are illustrated. In addition, the state of the art in Arabic NER research is reviewed. Finally, we present our conclusions. Throughout the presentation, illustrative examples are used for clarification.


10.29007/nwj8 ◽  
2019 ◽  
Author(s):  
Sebastien Carré ◽  
Victor Dyseryn ◽  
Adrien Facon ◽  
Sylvain Guilley ◽  
Thomas Perianin

Cache timing attacks are serious security threats that exploit cache memories to steal secret information. We believe that identifying a sequence of operations from a set of cache-timing measurements is a non-trivial step when building an attack. We present a recurrent neural network model able to automatically retrieve a sequence of function calls from cache timings. Inspired by natural language processing, our model is able to learn from partially labelled data. We use the model to carry out an end-to-end automated attack on OpenSSL ECDSA on the secp256k1 curve. Unlike most prior research, we did not need human processing of the traces to retrieve relevant information.


Circulation ◽  
2018 ◽  
Vol 138 (Suppl_2) ◽  
Author(s):  
Lynda Knight ◽  
Todd Sweberg ◽  
Paul Mullan ◽  
Anita Sen ◽  
Matthew Braga ◽  
...  

Background: The American Heart Association (AHA) recommends high-quality CPR to promote optimal patient outcomes. Few reports compare team members’ perceptions of CPR quality with quantitative CPR data during actual pediatric CPR. Hypothesis: Self-reported team perception of CPR performance will not match quantitative CPR metrics using AHA BLS guideline criteria. Methods: Prospective data were drawn from an international pediatric resuscitation collaborative (pediRES-Q) from February 2016 to August 2017. A modified Team Emergency Assessment Measure framework for qualitative content analysis was used to assess data from “hot” debriefings (held soon after the arrest). The analysis was performed by language processing experts blinded to CPR data. Events lacking either a reported perception of CPR or quantitative CPR data were excluded. Comments regarding CPR perception were grouped as either Plus perceptions of performance (PPP) or Delta perceptions of performance (DPP). Grouped events were matched and compared to quantitative CPR data on chest compression (CC) fraction (CCF), rate, and depth, as collected by CPR-recording defibrillators. Compliance with AHA BLS guidelines was defined as events with mean CCF >60%, CC rate 100–120/min, and CC depth of ~4 cm (3.6–4.4 cm) for infants <1 yo and 5 to ≤6 cm for children 1–18 yo. Results: Of 227 arrests, 108 (48%) hot debriefings were reported. Reported CPR perceptions with paired quantitative CPR data were available for 53/108 (49%) events: 32/53 (60%) PPP and 21/53 (39%) DPP. Event CPR metric summaries (median [IQR]) for PPP were: CCF 0.87 [0.77, 0.93]; CC rate 116/min [108.5, 120]; CC depth for age <1 yo 2.35 [2.01, 3.0] cm and for >1 yo 4.2 [3.3, 5.05] cm. For DPP: CCF 0.79 [0.69, 0.92]; CC rate 118/min [109, 129]; CC depth for <1 yo 2.03 [1.95, 2.2] cm and for >1 yo 3.93 [3.3, 5.06] cm. Of PPP events, 28/32 (87%) met guideline criteria for CCF, 25/32 (78%) for CC rate, 6/32 (19%) for CC depth, and 4/32 (12%) for all 3 categories. Of DPP events, 17/21 (80%) met guideline criteria for CCF, 15/21 (71%) for CC rate, 3/21 (15%) for CC depth, and 2/21 (9%) for all 3 categories. Conclusions: Self-reported team perception of CPR quality does not match quantitative CPR metrics using AHA guideline criteria, whether CPR was perceived positively or not. Compression depth was the main reason for non-compliance.
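The compliance definition in the Methods is a simple rule set. A minimal sketch might look like the following. It assumes inclusive bounds for the rate and depth ranges and applies the depth criterion by age group as stated. The `EventMetrics` container and the example child's age are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EventMetrics:
    """Per-event means from a CPR-recording defibrillator (hypothetical container)."""
    ccf: float           # chest compression fraction, 0 to 1
    rate_per_min: float  # mean compression rate
    depth_cm: float      # mean compression depth
    age_years: float     # patient age; the criteria below cover pediatric patients only


def guideline_compliance(event: EventMetrics) -> dict:
    """Check each AHA BLS criterion as defined in the abstract.

    CCF uses the strict > 60% threshold; rate and depth bounds are assumed inclusive.
    """
    ccf_ok = event.ccf > 0.60
    rate_ok = 100 <= event.rate_per_min <= 120
    if event.age_years < 1:
        depth_ok = 3.6 <= event.depth_cm <= 4.4  # infants: ~4 cm
    else:
        depth_ok = 5.0 <= event.depth_cm <= 6.0  # children 1-18 years
    return {"ccf": ccf_ok, "rate": rate_ok, "depth": depth_ok,
            "all": ccf_ok and rate_ok and depth_ok}


# PPP medians for children older than one year, with a hypothetical age of 5.
print(guideline_compliance(EventMetrics(ccf=0.87, rate_per_min=116, depth_cm=4.2, age_years=5)))
```

Applied to the PPP medians, the example event passes the CCF and rate criteria but fails on depth (4.2 cm). This echoes the conclusion that depth drives non-compliance. Medians are used here only for illustration; the study assessed each event individually.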


Author(s):  
Logeswari Shanmugam ◽  
Premalatha K.

Biomedical literature is the primary repository of biomedical knowledge, and PubMed is the foremost database for collecting, organizing and analyzing this textual knowledge. The high dimensionality of natural language text makes text data noisy and sparse in the vector space. Hence, data preprocessing and feature selection are important steps in text processing. Ontologies select meaningful terms that are semantically associated with the concepts in a document, reducing the dimensionality of the original text. In this chapter, we propose semantic-based indexing approaches with cognitive search. These approaches use a domain ontology to extract relevant information from big and diverse data sets for users.
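The dimensionality-reduction idea of mapping many surface terms onto fewer ontology concepts can be sketched with a toy dictionary. The concept identifiers and synonym lists below are hypothetical placeholders, not entries from a real ontology. Matching is plain case-insensitive whole-phrase search.

```python
import re

# Hypothetical toy ontology: several surface terms map to one concept identifier.
ONTOLOGY = {
    "heart attack": "CONCEPT:MI",
    "myocardial infarction": "CONCEPT:MI",
    "high blood pressure": "CONCEPT:HTN",
    "hypertension": "CONCEPT:HTN",
}


def index_concepts(text):
    """Return the sorted concept identifiers whose terms occur in `text`."""
    lowered = text.lower()
    found = set()
    for term, concept in ONTOLOGY.items():
        # Word boundaries stop a term from matching inside a longer word.
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.add(concept)
    return sorted(found)


sentence = "Hypertension and high blood pressure raise the risk of a heart attack."
print(index_concepts(sentence))
```

Three surface terms in the sentence collapse to two concept features. The index is therefore keyed on concepts rather than on every distinct phrasing.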

