Constituent vs Dependency Parsing-Based RDF Model Generation from Dengue Patients’ Case Sheets

Author(s):  
Runumi Devi ◽  
Deepti Mehrotra ◽  
Sana Ben Abdallah Ben Lamine

Electronic Health Record (EHR) systems in healthcare organisations are primarily maintained in isolation from each other, which makes interoperability of the unstructured (text) data stored in these systems a challenge in the healthcare domain. Different applications may describe similar information using different terminologies; this can be avoided by transforming the content into the Resource Description Framework (RDF) model, which is interoperable across organisations. RDF requires a document's contents to be translated into a repository of triplets (subject, predicate, object) known as RDF statements. Natural Language Processing (NLP) techniques can help extract actionable insights from these text data and create triplets for RDF model generation. This paper discusses two NLP-based approaches for generating RDF models from unstructured patient documents: a dependency structure-based parser and a constituent (phrase) structure-based parser. Models generated by both approaches are evaluated on two aspects: the exhaustiveness of the represented knowledge and the model generation time. The precision measure is used to compute the models' exhaustiveness in terms of the number of facts that are transformed into RDF representations.
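As a minimal sketch of the dependency-based route (not the authors' implementation), one might pair a dependency parser with an RDF library in Python; the sentence, namespace, and subject/verb/object heuristic below are all illustrative assumptions:

```python
import spacy
from rdflib import Graph, Literal, Namespace

# Hypothetical namespace for illustration only
EX = Namespace("http://example.org/dengue#")

# Small English pipeline with a dependency parser (must be installed)
nlp = spacy.load("en_core_web_sm")

def sentence_to_triples(sentence):
    """Naive heuristic: the nsubj becomes the subject, the root verb the
    predicate, and a dobj/attr the object. Real case-sheet text would
    need far more robust extraction rules."""
    doc = nlp(sentence)
    triples = []
    for token in doc:
        if token.dep_ == "ROOT":
            subj = [w for w in token.lefts if w.dep_ in ("nsubj", "nsubjpass")]
            obj = [w for w in token.rights if w.dep_ in ("dobj", "attr", "acomp")]
            if subj and obj:
                triples.append((subj[0].text, token.lemma_, obj[0].text))
    return triples

g = Graph()
for s, p, o in sentence_to_triples("The patient reports high fever."):
    g.add((EX[s.lower()], EX[p], Literal(o)))

print(g.serialize(format="turtle"))
```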

2017 ◽  
Vol 8 (4) ◽  
pp. 66-76
Author(s):  
Aderonke A. Oni ◽  
Efosa Carroll Idemudia ◽  
Babafemi O. Odusote

Governments worldwide are using technology to provide effective and efficient services and to improve the lives of all citizens. To date, few studies have used unstructured text data to investigate the factors that influence the adoption and usage of government mobile apps. To address this gap, we developed a Natural Language Processing model to sequentially analyze unstructured text data collected from MetricsCat's website. Our text analysis shows that among the most influential factors in why users adopt and use government apps are the quality of the app, its usefulness, whether it is informative, and whether it remains up to date. Our research offers practical and research implications for key government officials and designers of mobile apps.
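A rough sketch of one corpus-based step in such an analysis, surfacing frequently mentioned candidate factors from review text; the sample reviews are invented stand-ins, not the MetricsCat data:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Invented sample reviews standing in for the collected text data
reviews = [
    "Very useful app, always up to date and informative.",
    "Poor quality, crashes constantly and is never updated.",
    "Informative and useful for checking government services.",
]

# Unigrams and bigrams, with English stop words removed
vec = CountVectorizer(ngram_range=(1, 2), stop_words="english")
counts = vec.fit_transform(reviews)

# Rank terms by total frequency across the corpus
totals = counts.sum(axis=0).A1
ranked = sorted(zip(vec.get_feature_names_out(), totals), key=lambda t: -t[1])
print(ranked[:10])  # candidate "factor" terms such as 'useful', 'informative'
```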


Author(s):  
Runumi Devi ◽  
Deepti Mehrotra ◽  
Hajer Baazaoui-Zghal

The automatic extraction of triplets from unstructured patient records and their transformation into Resource Description Framework (RDF) models has remained a huge challenge, yet it would provide significant benefit to applications such as knowledge discovery, machine interoperability, and ontology design in the healthcare domain. This article describes an approach that extracts semantics (triplets) from dengue patient case-sheets and clinical reports and transforms them into an RDF model. A Text2Ontology framework is used for extracting relations from text and was found to have limited capability. A TypedDependency parsing-based algorithm is therefore designed for extracting RDF facts from patients' case-sheets and converting them into RDF models. A mapping-driven semantifying approach is also designed for mapping the clinical details extracted from patients' reports to their corresponding triplet components and generating the RDF models. The exhaustiveness of the generated RDF models is measured by the number of axioms generated with respect to the facts available.
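The mapping-driven semantifying step might look roughly like the following sketch; the field names, predicates, and namespace are hypothetical, not taken from the article:

```python
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/clinical#")  # hypothetical vocabulary

# Hypothetical mapping from extracted report fields to RDF predicates
FIELD_TO_PREDICATE = {
    "platelet_count": EX.hasPlateletCount,
    "ns1_antigen": EX.hasNS1AntigenResult,
    "igm": EX.hasIgMResult,
}

def semantify(patient_id, extracted_fields):
    """Map extracted clinical key/value pairs to RDF triples."""
    g = Graph()
    patient = EX[f"patient/{patient_id}"]
    for field, value in extracted_fields.items():
        predicate = FIELD_TO_PREDICATE.get(field)
        if predicate is not None:
            g.add((patient, predicate, Literal(value)))
    return g

g = semantify("P001", {"platelet_count": 85000, "ns1_antigen": "positive"})
print(g.serialize(format="turtle"))
```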


Author(s):  
Jose L. Martinez-Rodriguez ◽  
Ivan Lopez-Arevalo ◽  
Jaime I. Lopez-Veyna ◽  
Ana B. Rios-Alvarado ◽  
Edwin Aldana-Bobadilla

One of the goals of data scientists and curators is to get the information contained in text organized and integrated in a way that can be easily consumed by people and machines. A starting point for this goal is a model to represent the information, one that makes it possible to obtain knowledge semantically (e.g., using reasoners and inference rules). In this sense, the Semantic Web is focused on representing information through the Resource Description Framework (RDF) model, in which the triple (subject, predicate, object) is the basic unit of information. In this context, the natural language processing (NLP) field has been a cornerstone in identifying the elements that can be represented by Semantic Web triples. However, existing approaches for deriving RDF triples from text use diverse techniques and tasks, which complicates the understanding of the process for non-expert users. This chapter discusses the main concepts involved in representing information through the Semantic Web and NLP fields.
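To make the "triple as the basic unit" concrete, here is a minimal illustration with rdflib; the facts and namespace are invented for the example:

```python
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/")  # illustrative namespace

g = Graph()
# The (subject, predicate, object) triple as the basic unit of information
g.add((EX.Barcelona, EX.locatedIn, EX.Spain))
g.add((EX.Barcelona, EX.population, Literal(1620000)))

# Once facts live in an RDF graph, they can be queried declaratively
results = g.query("""
    SELECT ?city
    WHERE { ?city <http://example.org/locatedIn> <http://example.org/Spain> }
""")
for row in results:
    print(row.city)
```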


2021 ◽  
Vol 54 (2) ◽  
pp. 1-37
Author(s):  
Dhivya Chandrasekaran ◽  
Vijay Mago

Estimating the semantic similarity between text data is one of the challenging open research problems in the field of Natural Language Processing (NLP). The versatility of natural language makes it difficult to define rule-based methods for determining semantic similarity. To address this issue, various semantic similarity methods have been proposed over the years. This survey article traces the evolution of such methods, from traditional NLP techniques such as kernel-based methods to the most recent work on transformer-based models, categorizing them by their underlying principles as knowledge-based, corpus-based, deep neural network-based, and hybrid methods. By discussing the strengths and weaknesses of each method, this survey provides a comprehensive view of existing systems for new researchers to experiment with and to develop innovative ideas for addressing the problem of semantic similarity.
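A small sketch of the corpus-based family, comparing TF-IDF vectors with cosine similarity; the sentences are invented, and the comment notes the limitation that motivates embedding-based methods:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "The patient was admitted with a high fever.",
    "A person with elevated temperature was hospitalised.",
    "The stock market closed higher today.",
]

# Corpus-based baseline: TF-IDF vectors compared with cosine similarity.
# Surface-form methods miss the fever/temperature paraphrase that
# embedding-based (e.g., transformer) methods are designed to capture.
tfidf = TfidfVectorizer().fit_transform(sentences)
print(cosine_similarity(tfidf[0], tfidf[1]))  # low despite similar meaning
print(cosine_similarity(tfidf[0], tfidf[2]))  # low, correctly dissimilar
```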


2021 ◽  
Vol 39 (28_suppl) ◽  
pp. 324-324
Author(s):  
Isaac S. Chua ◽  
Elise Tarbi ◽  
Jocelyn H. Siegel ◽  
Kate Sciacca ◽  
Anne Kwok ◽  
...  

Background: Delivering goal-concordant care to patients with advanced cancer requires identifying eligible patients who would benefit from goals of care (GOC) conversations; training clinicians to have these conversations; conducting conversations in a timely manner; and documenting GOC conversations so that they can be readily accessed by care teams. We used an existing, locally developed electronic cancer care clinical pathways system to guide oncologists toward these conversations.

Methods: To identify eligible patients, pathways directors from 12 oncology disease centers identified therapeutic decision nodes for each pathway that corresponded to a predicted life expectancy of ≤1 year. When oncologists selected one of these pre-identified pathway nodes, the decision was captured in a relational database. For these patients, we sought evidence of GOC documentation within the electronic health record by extracting coded data from the advance care planning (ACP) module, a designated area within the electronic health record for clinicians to document GOC conversations. We also used rule-based natural language processing (NLP) to capture free-text GOC documentation within these same patients' progress notes. A domain expert reviewed all progress notes identified by NLP to confirm the presence of GOC documentation.

Results: In a pilot sample obtained between March 20 and September 25, 2020, we identified a total of 21 pathway nodes conveying a poor prognosis, representing 91 unique patients with advanced cancer. Among these patients, the mean age was 62 (SD 13.8) years; 55 (60.4%) were female, and 69 (75.8%) were non-Hispanic White. The cancers most represented were thoracic (32 [35.2%]), breast (31 [34.1%]), and head and neck (13 [14.3%]). Within the 3 months leading up to the pathways decision date, 62 (68.1%) patients had any GOC documentation. Twenty-one (23.1%) patients had documentation in both the ACP module and NLP-identified progress notes; 5 (5.5%) had documentation in the ACP module only; and 36 (39.6%) had documentation in progress notes only. Twenty-two unique clinicians used the ACP module, of whom 1 (4.5%) was an oncologist and 21 (95.5%) were palliative care clinicians.

Conclusions: Approximately two-thirds of patients had any GOC documentation. A total of 26 (28.6%) patients had GOC documentation in the ACP module, and only 1 oncologist documented using the ACP module, where care teams can most easily retrieve GOC information. These findings provide an important baseline for future quality improvement efforts (e.g., implementing serious illness communication training, increasing support around ACP module utilization, and incorporating behavioral nudges) to enhance oncologists' ability to conduct and document timely, high-quality GOC conversations.
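Rule-based NLP of this kind often reduces to curated keyword and pattern matching over note text, as in this hedged sketch; the patterns are invented examples, not the study's rule set, and flagged notes would still go to expert review as described above:

```python
import re

# Invented keyword patterns; a production rule set would be clinician-curated
GOC_PATTERNS = [
    r"goals? of care",
    r"\bGOC\b",
    r"code status",
    r"advance care planning",
    r"\bcomfort[- ]focused\b",
]
GOC_REGEX = re.compile("|".join(GOC_PATTERNS), flags=re.IGNORECASE)

def has_goc_documentation(note_text):
    """Flag a progress note that appears to contain a GOC conversation."""
    return GOC_REGEX.search(note_text) is not None

note = "Long discussion re: goals of care; patient prefers comfort-focused care."
print(has_goc_documentation(note))  # True; flagged notes go to expert review
```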


Libri ◽  
2021 ◽  
Vol 71 (4) ◽  
pp. 375-387
Author(s):  
Seungmin Lee

A pidgin metadata framework, based on the concept of pidgin metadata, is proposed to address the limitations of existing approaches to metadata interoperability and to achieve more reliable interoperability. The framework consists of three hierarchically structured layers that reflect the semantic and structural characteristics of various metadata. Layer 1 performs both an external function, serving as an anchor for semantic association between metadata elements, and an internal function, providing semantic categories that can encompass detailed elements. Layer 2 is an arbitrary layer composed of substantial elements from existing metadata; it associates different metadata elements that describe the same or similar aspects of information resources with the semantic categories of Layer 1. Layer 3 implements the semantic relationships between Layer 1 and Layer 2 through Resource Description Framework syntax. With this structure, the pidgin metadata framework can establish criteria for semantic connection between different elements and fully reflect the complexity and heterogeneity of various metadata. It is also expected to provide a bibliographic environment that achieves more reliable metadata interoperability than existing approaches by securing communication between metadata.
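One possible reading of the layered design in RDF, sketched with rdflib; the namespaces, the anchor property, and the use of rdfs:subPropertyOf are this sketch's assumptions, not the paper's specification:

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

# Hypothetical namespaces; not taken from the paper
L1 = Namespace("http://example.org/pidgin/layer1#")
DC = Namespace("http://purl.org/dc/elements/1.1/")
MODS = Namespace("http://www.loc.gov/mods/v3#")  # illustrative only

g = Graph()
# Layer 1: a broad semantic category acting as the anchor property
g.add((L1.titleInformation, RDF.type, RDF.Property))
# Layer 3 expresses the Layer 2 -> Layer 1 associations in RDF: different
# metadata elements describing the same aspect share one semantic category
g.add((DC.title, RDFS.subPropertyOf, L1.titleInformation))
g.add((MODS.titleInfo, RDFS.subPropertyOf, L1.titleInformation))

print(g.serialize(format="turtle"))
```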


2015 ◽  
Vol 12 (2) ◽  
pp. 104-118 ◽  
Author(s):  
Frank T. Bergmann ◽  
Nicolas Rodriguez ◽  
Nicolas Le Novère

Several standard formats have been proposed to describe models, simulations, data, and other essential information in a consistent fashion. These constitute the separate components required to reproduce a given published scientific result.

The Open Modeling EXchange format (OMEX) supports the exchange of all the information necessary for a modeling and simulation experiment in biology. An OMEX file is a ZIP container that includes a manifest file, an optional metadata file, and the files describing the model. The manifest is an XML file listing all files included in the archive and their types. The metadata file provides additional information about the archive and its content. Although any format can be used, we recommend an XML serialization of the Resource Description Framework.

Together with the other standard formats from the Computational Modeling in Biology Network (COMBINE), OMEX is the basis of the COMBINE Archive. The content of a COMBINE Archive consists of files encoded in COMBINE standards whenever possible, but it may include additional files defined by an Internet Media Type. The COMBINE Archive facilitates the reproduction of modeling and simulation experiments in biology by embedding all the relevant information in one file. Having all the information stored and exchanged at once also helps in building activity logs and audit trails.
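Since an OMEX file is just a ZIP container with a manifest, assembling one is mechanically simple, as in this sketch; the format IRIs are written from memory of the COMBINE specifications and should be checked against the spec, and the model content is a placeholder:

```python
import zipfile

# Minimal manifest listing each file and its type, per the OMEX convention
# (namespace and format IRIs are assumptions; verify against the spec)
MANIFEST = """<?xml version="1.0" encoding="UTF-8"?>
<omexManifest xmlns="http://identifiers.org/combine.specifications/omex-manifest">
  <content location="." format="http://identifiers.org/combine.specifications/omex"/>
  <content location="./manifest.xml"
           format="http://identifiers.org/combine.specifications/omex-manifest"/>
  <content location="./model.xml"
           format="http://identifiers.org/combine.specifications/sbml"/>
</omexManifest>
"""

MODEL = "<sbml><!-- model content elided --></sbml>"

# An OMEX file is a ZIP container with the manifest at its root
with zipfile.ZipFile("experiment.omex", "w") as archive:
    archive.writestr("manifest.xml", MANIFEST)
    archive.writestr("model.xml", MODEL)
```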

