Context-Based Loose Information Structure for Medical Free Text Document

Author(s):  
Tadamasa Takemura ◽  
Kazuya Okamoto ◽  
Hyogyong Kim ◽  
Masahiro Hirose ◽  
Tomohiro Kuroda ◽  
...  
2015 ◽  
Vol 10 (1) ◽  
pp. 249-259 ◽  
Author(s):  
Graham A Parton ◽  
Steven Donegan ◽  
Stephen Pascoe ◽  
Ag Stephens ◽  
Spiros Ventouras ◽  
...  

ISO19156 Observations and Measurements (O&M) provides a standardised framework for organising information about the collection of environmental data. Here we describe the implementation of a specialisation of O&M for environmental data, the Metadata Objects for Linking Environmental Sciences (MOLES3). MOLES3 provides support for organising information about data and for user navigation around data holdings. The implementation described here, “CEDA-MOLES”, also supports data management functions for the Centre for Environmental Data Archival, CEDA. The previous iteration of MOLES (MOLES2) saw active use over five years before being replaced by CEDA-MOLES in late 2014. During that period important lessons were learnt both about the information needed and about how to design and maintain the necessary information systems. In this paper we review the problems encountered in MOLES2; how and why CEDA-MOLES was developed and engineered; the migration of information holdings from MOLES2 to CEDA-MOLES; and, finally, provide an early assessment of MOLES3 (as implemented in CEDA-MOLES) and its limitations. Key drivers for the MOLES3 development included the necessity for improved data provenance, for further structured information to support ISO19115 discovery metadata export (for EU INSPIRE compliance), and for appropriate fixed landing pages for Digital Object Identifiers (DOIs) in the presence of evolving datasets. Key lessons learned included the importance of minimising information structure in free-text fields, and the necessity to support as much agility in the information infrastructure as possible without compromising maintainability, both for those using the systems internally and externally (e.g. citing into the information infrastructure) and for those responsible for the systems themselves. The migration itself needed to ensure continuity of service and traceability of archived assets.


Author(s):  
Wichor M. Bramer ◽  
Gerdien B. De Jonge ◽  
Melissa L. Rethlefsen ◽  
Frans Mast ◽  
Jos Kleijnen

Creating search strategies for systematic reviews, finding the best balance between sensitivity and specificity, and translating search strategies between databases is challenging. Several methods describe standards for systematic search strategies, but a consistent approach for creating an exhaustive search strategy has not yet been described in enough detail to be fully replicable. The authors have established a method that describes, step by step, the process of developing a systematic search strategy as needed for a systematic review. This method describes how single-line search strategies can be prepared in a text document by typing search syntax (such as field codes, parentheses, and Boolean operators) before copying and pasting search terms (keywords and free-text synonyms) that are found in the thesaurus. To help ensure term completeness, we developed a novel optimization technique that is mainly based on comparing the results retrieved by thesaurus terms with those retrieved by the free-text search words to identify potentially relevant candidate search terms. Macros in Microsoft Word have been developed to convert syntaxes between databases and interfaces almost automatically. This method helps information specialists in developing librarian-mediated searches for systematic reviews, as well as medical and health care practitioners who are searching for evidence to answer clinical questions. The described method can be used to create complex and comprehensive search strategies for different databases and interfaces, such as those that are needed when searching for relevant references for systematic reviews, and will assist both information specialists and practitioners when they are searching the biomedical literature.
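The core of that approach is keeping the whole strategy as a single line of plain text whose field tags can be rewritten mechanically for each target interface. The short Python sketch below illustrates this idea only; the tag mappings and function name are invented for illustration and are not the authors' Microsoft Word macros.

# Minimal sketch of translating a single-line search strategy between
# database interfaces: the strategy stays as one text line and field tags
# are rewritten per target interface. The mappings below are simplified,
# illustrative assumptions, not the authors' actual macros.

# Hypothetical mapping of PubMed field tags to Ovid-style equivalents.
TAG_MAP_PUBMED_TO_OVID = {
    "[tiab]": ".ti,ab.",
    "[ti]": ".ti.",
    "[mh]": "/",          # subject heading suffix (simplified)
}

def translate_pubmed_to_ovid(strategy: str) -> str:
    """Rewrite field tags in a single-line PubMed strategy for an Ovid interface."""
    out = strategy
    for pubmed_tag, ovid_tag in TAG_MAP_PUBMED_TO_OVID.items():
        out = out.replace(pubmed_tag, ovid_tag)
    # Proximity operators and truncation differences are ignored in this sketch.
    return out

if __name__ == "__main__":
    pubmed = '("heart failure"[tiab] OR "cardiac failure"[tiab]) AND exercise[mh]'
    print(translate_pubmed_to_ovid(pubmed))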


Author(s):  
Samir Malakar ◽  
Dheeraj Mohanta ◽  
Ram Sarkar ◽  
Mita Nasipuri

For developing a high-quality Optical Character Recognition (OCR) system, removal of noise from the document image is a crucially important step, and filtering plays a significant role in making this possible. Mean and median filters, the two well-known statistical filtering techniques, are commonly used, but they may sometimes fail to produce noise-free images or may introduce distortions on the characters in the form of gulfs or capes. In the work reported here, we have developed a new filtering technique, called Middle of Modal Class (MMC), for smoothing the input images. This filtering technique is applicable to both noisy and noise-free text document images at the same time. We have also compared our results with those of the mean and median filters, and have achieved better results.
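As a rough illustration of the MMC idea described above (binning the intensities in a pixel's neighbourhood, locating the most populated bin, and replacing the pixel by that bin's midpoint), here is a minimal Python sketch; the window size and bin width are arbitrary choices for the example, not values from the paper.

# Sketch of a Middle-of-Modal-Class (MMC) style smoothing filter.
import numpy as np

def mmc_filter(img: np.ndarray, window: int = 3, bin_width: int = 16) -> np.ndarray:
    """Apply an MMC-style smoothing filter to a grayscale image (uint8)."""
    pad = window // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    n_bins = int(np.ceil(256 / bin_width))
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            patch = padded[i:i + window, j:j + window].ravel()
            hist, _ = np.histogram(patch, bins=n_bins, range=(0, 256))
            modal = int(hist.argmax())                       # most populated class
            out[i, j] = modal * bin_width + bin_width // 2   # middle of modal class
    return out

if __name__ == "__main__":
    noisy = (np.random.rand(32, 32) * 255).astype(np.uint8)
    print(mmc_filter(noisy).shape)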


Author(s):  
M. M. Rufai ◽  
A. O. Afolabi ◽  
O. D. Fenwa ◽  
F. A. Ajala

Aims: To evaluate the performance of the Improved Latent Semantic Analysis (ILSA), Latent Semantic Analysis (LSA) and Non-Negative Matrix Factorization (NMF) algorithms in an electronic assessment application using the metrics term similarity, precision, recall and F-measure, mean divergence, assessment accuracy, and adequacy of semantic representation. Methodology: The three algorithms were separately applied in developing an electronic assessment application. One hundred students' responses to a test question in an introductory artificial intelligence course were used. Performance was measured using the following metrics: term similarity, precision, recall and F-measure, mean divergence, and assessment accuracy. Results: ILSA outperformed LSA and NMF with an assessment accuracy of 96.64, a mean divergence from manual scores of 0.03, and recall, precision and F-measure values of 0.83, 0.85 and 0.87, respectively. Conclusion: The research evaluated the performance of an improved algorithm, ILSA, for electronic assessment of free-text documents, using adequacy of semantic representation, retrieval quality and assessment accuracy as performance metrics. The results obtained from the experimental designs show the adequacy of the improved algorithm in semantic representation, better retrieval quality and improved assessment accuracy.
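For readers unfamiliar with the evaluation metrics named above, the following small Python sketch shows how precision, recall, F-measure and mean divergence from manual scores are typically computed; the term sets and score values are invented for illustration.

# Illustrative computation of precision, recall, F-measure against a manually
# selected reference set, and mean divergence of automatic from manual scores.
def precision_recall_f1(retrieved: set, relevant: set) -> tuple:
    tp = len(retrieved & relevant)
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

def mean_divergence(auto_scores: list, manual_scores: list) -> float:
    return sum(abs(a - m) for a, m in zip(auto_scores, manual_scores)) / len(auto_scores)

if __name__ == "__main__":
    retrieved = {"agent", "environment", "state", "reward"}   # terms found automatically
    relevant = {"agent", "environment", "reward", "policy"}   # terms a human marked relevant
    print(precision_recall_f1(retrieved, relevant))
    print(mean_divergence([0.82, 0.90, 0.75], [0.80, 0.95, 0.70]))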


2017 ◽  
Vol 56 (03) ◽  
pp. 230-237 ◽  
Author(s):  
Peter Krücken ◽  
Wolf Mueller ◽  
Kerstin Denecke ◽  
Stefan Kropf

Summary. Background: Clinical information is often stored as free text, e.g. in discharge summaries or pathology reports. These documents are semi-structured using section headers, numbered lists, items and classification strings. However, it is still challenging to retrieve relevant documents since keyword searches applied on complete unstructured documents result in many false positive retrieval results. Objectives: We are concentrating on the processing of pathology reports as an example for unstructured clinical documents. The objective is to transform reports semi-automatically into an information structure that enables improved access and retrieval of relevant data. The data is expected to be stored in a standardized, structured way to make it accessible for queries that are applied to specific sections of a document (section-sensitive queries) and for information reuse. Methods: Our processing pipeline comprises information modelling, section boundary detection and section-sensitive queries. For enabling a focused search in unstructured data, documents are automatically structured and transformed into a patient information model specified through openEHR archetypes. The resulting XML-based pathology electronic health records (PEHRs) are queried by XQuery and visualized by XSLT in HTML. Results: Pathology reports (PRs) can be reliably structured into sections by a keyword-based approach. The information modelling using openEHR allows saving time in the modelling process since many archetypes can be reused. The resulting standardized, structured PEHRs allow accessing relevant data by retrieving data matching user queries. Conclusions: Mapping unstructured reports into a standardized information model is a practical solution for better access to data. Archetype-based XML enables section-sensitive retrieval and visualisation by well-established XML techniques. Focussing the retrieval to particular sections has the potential of saving retrieval time and improving the accuracy of the retrieval.
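A compact sketch of the keyword-based section boundary detection and section-sensitive access described above is given below in Python; the section header list and XML element names are illustrative simplifications, not the openEHR archetypes or XQuery/XSLT pipeline used by the authors.

# Sketch: split a free-text pathology report at known section headers and
# serialise the result as simple XML so that queries can target one section.
import re
import xml.etree.ElementTree as ET

SECTION_HEADERS = ["clinical information", "macroscopy", "microscopy", "diagnosis"]

def split_sections(report: str) -> ET.Element:
    pattern = re.compile(r"^(%s)\s*:" % "|".join(SECTION_HEADERS),
                         re.IGNORECASE | re.MULTILINE)
    root = ET.Element("pathology_report")
    matches = list(pattern.finditer(report))
    for idx, m in enumerate(matches):
        end = matches[idx + 1].start() if idx + 1 < len(matches) else len(report)
        section = ET.SubElement(root, m.group(1).lower().replace(" ", "_"))
        section.text = report[m.end():end].strip()
    return root

if __name__ == "__main__":
    text = "Clinical information: suspected lesion.\nMicroscopy: ...\nDiagnosis: benign nevus."
    root = split_sections(text)
    # Section-sensitive access, e.g. only the diagnosis section:
    print(root.findtext("diagnosis"))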


Author(s):  
Rufai Mohammed Mutiu ◽  
A. O. Afolabi ◽  
O. D. Fenwa ◽  
F. A. Ajala

Latent Semantic Analysis (LSA) is a statistical approach designed to capture the semantic content of a document, which forms the basis for its application in electronic assessment of free-text documents in an examination context. The students' submitted answers are transformed into a Document Term Matrix (DTM) and approximated using SVD-LSA for noise reduction. However, it has been shown that LSA still retains remnants of noise in its semantic representation, which ultimately affects the accuracy of the assessment result when compared to human grading. In this work, the LSA model is formulated as an optimization problem using Non-negative Matrix Factorization (NMF) with Ant Colony Optimization (ACO). The factors of LSA are used to initialize the NMF factors for quick convergence. ACO iteratively searches for the values of the decision variables in NMF that minimize the objective function and uses these values to construct a reduced DTM. The results obtained show a better approximation of the DTM representation and an improved assessment result of 91.35% accuracy, a mean divergence of 0.0865 from human grading, and a Pearson correlation coefficient of 0.632, which is better than existing results.
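The following Python sketch outlines the overall pipeline described above: a document-term matrix is approximated by truncated SVD (LSA) and the resulting factors seed an NMF whose factors are then refined. Because the ACO search over the NMF decision variables is too involved for a few lines, plain multiplicative NMF updates stand in for it here purely for illustration.

# Sketch: LSA (truncated SVD) factors used to initialise NMF, then refined.
import numpy as np

def lsa_init(dtm: np.ndarray, k: int):
    u, s, vt = np.linalg.svd(dtm, full_matrices=False)
    w = np.abs(u[:, :k] * s[:k])   # clip signs so the NMF start is non-negative
    h = np.abs(vt[:k, :])
    return w, h

def nmf_refine(dtm, w, h, iters: int = 200, eps: float = 1e-9):
    # Multiplicative updates stand in for the ACO search in this illustration.
    for _ in range(iters):
        h *= (w.T @ dtm) / (w.T @ w @ h + eps)
        w *= (dtm @ h.T) / (w @ h @ h.T + eps)
    return w, h

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dtm = rng.random((20, 50))           # e.g. 20 student answers x 50 terms
    w, h = nmf_refine(dtm, *lsa_init(dtm, k=5))
    print(np.linalg.norm(dtm - w @ h))   # reconstruction error of the reduced DTM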


2018 ◽  
Author(s):  
Goksel Misirli ◽  
Renee Taylor ◽  
Angel Goni-Moreno ◽  
James Alastair McLaughlin ◽  
Chris Myers ◽  
...  

Standard representation of data is key for the reproducibility of designs in synthetic biology. The Synthetic Biology Open Language (SBOL) has already emerged as a data standard to represent genetic circuit designs, and it is based on capturing data using graphs. The language's syntax is specified in a free-text document that is accessible to humans only. Here, we provide SBOL-OWL, an ontology for a machine-understandable definition of SBOL. This ontology acts as a semantic layer for genetic circuit designs. As a result, computational tools can understand the meaning of design entities in addition to parsing structured SBOL data. SBOL-OWL not only describes how genetic circuits can be constructed computationally, but also facilitates the use of several existing Semantic Web tools for synthetic biology. Here, we demonstrate some of these features, for example, to validate designs and check for inconsistencies. Through the use of SBOL-OWL, queries are simplified and become more intuitive. Moreover, existing reasoners can be used to infer information about genetic circuit designs that cannot be directly retrieved using existing querying mechanisms. This ontological representation of the SBOL standard provides a new perspective on the verification, representation and querying of information about synthetic genetic circuits and is important for incorporating complex design information via the integration of biological ontologies.
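As a small illustration of the kind of machine-readable querying such an ontology-backed representation enables, the Python sketch below loads an RDF-serialised design with rdflib and runs a SPARQL query; the file name is hypothetical, and the use of the SBOL 2 namespace and these particular properties is an assumption made for the example, not a demonstration of SBOL-OWL itself.

# Sketch: query an RDF-serialised genetic circuit design with SPARQL.
from rdflib import Graph

g = Graph()
g.parse("genetic_circuit.xml", format="xml")   # hypothetical SBOL/RDF design file

# List every component definition and its display identifier, if present.
query = """
PREFIX sbol: <http://sbols.org/v2#>
SELECT ?cd ?name WHERE {
    ?cd a sbol:ComponentDefinition .
    OPTIONAL { ?cd sbol:displayId ?name . }
}
"""
for row in g.query(query):
    print(row.cd, row.name)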


1994 ◽  
Vol 33 (05) ◽  
pp. 454-463 ◽  
Author(s):  
A. M. van Ginneken ◽  
J. van der Lei ◽  
J. H. van Bemmel ◽  
P. W. Moorman

Abstract: Clinical narratives in patient records are usually recorded in free text, limiting the use of this information for research, quality assessment, and decision support. This study focuses on the capture of clinical narratives in a structured format by supporting physicians with structured data entry (SDE). We analyzed and made explicit which requirements SDE should meet to be acceptable for the physician on the one hand, and generate unambiguous patient data on the other. Starting from these requirements, we found that in order to support SDE, the knowledge on which it is based needs to be made explicit: we refer to this knowledge as descriptional knowledge. We articulate the nature of this knowledge, and propose a model in which it can be formally represented. The model allows the construction of specific knowledge bases, each representing the knowledge needed to support SDE within a circumscribed domain. Data entry is made possible through a general entry program, of which the behavior is determined by a combination of user input and the content of the applicable domain knowledge base. We clarify how descriptional knowledge is represented, modeled, and used for data entry to achieve SDE, which meets the proposed requirements.
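To make the notion of a knowledge-base-driven general entry program more concrete, here is a toy Python sketch (entirely invented, not the authors' model of descriptional knowledge): a small domain knowledge base declares which findings and attribute values are sanctioned, and the entry routine accepts only data the knowledge base allows, yielding unambiguous structured records instead of free text.

# Toy illustration of structured data entry driven by a domain knowledge base.
DOMAIN_KB = {
    "cough": {"duration": ["days", "weeks", "months"], "productive": ["yes", "no"]},
    "fever": {"severity": ["mild", "moderate", "high"]},
}

def enter_finding(finding: str, **attributes: str) -> dict:
    """Validate a structured entry against the domain knowledge base."""
    allowed = DOMAIN_KB[finding]
    for attr, value in attributes.items():
        if value not in allowed.get(attr, []):
            raise ValueError(f"'{value}' is not a sanctioned value for {finding}.{attr}")
    return {"finding": finding, **attributes}

if __name__ == "__main__":
    print(enter_finding("cough", duration="weeks", productive="no"))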


1992 ◽  
Vol 31 (04) ◽  
pp. 268-274 ◽  
Author(s):  
W. Gaus ◽  
J. G. Wechsler ◽  
P. Janowitz ◽  
J. Tudyka ◽  
W. Kratzer ◽  
...  

Abstract: A system using structured reporting of findings was developed for the preparation of medical reports and for clinical documentation purposes in upper abdominal sonography, and evaluated in the course of routine use. The evaluation focussed on the following parameters: completeness and correctness of the entered data, the proportion of free text, the validity and objectivity of the documentation, user acceptance, and time required. The completeness in the case of two clinically relevant parameters could be compared with an already existing database containing freely dictated reports. The results confirmed the hypothesis that, for the description of results of a technical examination, structured data reporting is a viable alternative to free-text dictation. For the application evaluated, there is even evidence of the superiority of a structured approach. The system can be put to use in related areas of application.

