Templated Text Synthesis for Expert-Guided Multi-Label Extraction from Radiology Reports

2021 ◽  
Vol 3 (2) ◽  
pp. 299-317
Author(s):  
Patrick Schrempf ◽  
Hannah Watson ◽  
Eunsoo Park ◽  
Maciej Pajak ◽  
Hamish MacKinnon ◽  
...  

Training medical image analysis models traditionally requires large amounts of expertly annotated imaging data, which is time-consuming and expensive to obtain. One solution is to automatically extract scan-level labels from radiology reports. Previously, we showed that, by extending BERT with a per-label attention mechanism, we can train a single model to perform automatic extraction of many labels in parallel. However, if we rely on pure data-driven learning, the model sometimes fails to learn critical features or learns the correct answer via simplistic heuristics (e.g., that “likely” indicates positivity), and thus fails to generalise to rarer cases which have not been learned or where the heuristics break down (e.g., “likely represents prominent VR space or lacunar infarct”, which indicates uncertainty over two differential diagnoses). In this work, we propose template creation for data synthesis, which enables us to inject expert knowledge about unseen entities from medical ontologies, and to teach the model rules on how to label difficult cases, by producing relevant training examples. Using this technique alongside domain-specific pre-training for our underlying BERT architecture, i.e., PubMedBERT, we improve F1 micro from 0.903 to 0.939 and F1 macro from 0.512 to 0.737 on an independent test set for 33 labels in head CT reports for stroke patients. Our methodology offers a practical way to combine domain knowledge with machine learning for text classification tasks.
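The template-synthesis idea can be illustrated with a minimal sketch. The findings, certainty phrases, and template wording below are hypothetical stand-ins (the paper's actual templates and label set are not reproduced here); the point is the mechanism: expanding slot-filled templates into labelled training examples, including examples that encode the differential-diagnosis rule the abstract mentions.

```python
import itertools

# Hypothetical slot values; real templates would draw findings from a
# medical ontology and cover all 33 report labels.
FINDINGS = {"acute infarct": "ischaemia", "subdural haematoma": "haemorrhage"}
CERTAINTY = {"definite evidence of": "positive",
             "no evidence of": "negative",
             "possible": "uncertain"}

def synthesise_examples():
    """Expand template/slot combinations into (text, label, value) triples."""
    examples = []
    # Single-finding templates: one certainty phrase per finding.
    for finding, label in FINDINGS.items():
        for phrase, value in CERTAINTY.items():
            examples.append((f"There is {phrase} {finding}.", label, value))
    # Differential-diagnosis rule: two findings joined by "or" should be
    # labelled "uncertain" for BOTH labels, not "positive" for either.
    for (f1, l1), (f2, l2) in itertools.combinations(FINDINGS.items(), 2):
        text = f"Likely represents {f1} or {f2}."
        examples.append((text, l1, "uncertain"))
        examples.append((text, l2, "uncertain"))
    return examples
```

Synthetic examples generated this way would be mixed into the real training reports, letting the model see rule-governed cases that are rare in natural data.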

2022 ◽  
Author(s):  
Jakob Nikolas Kather ◽  
Narmin Ghaffari Laleh ◽  
Sebastian Foersch ◽  
Daniel Truhn

The text-guided diffusion model GLIDE (Guided Language to Image Diffusion for Generation and Editing) is the state of the art in text-to-image generative artificial intelligence (AI). GLIDE has rich representations, but medical applications of this model have not been systematically explored. If GLIDE had useful medical knowledge, it could be used for medical image analysis tasks, a domain in which AI systems are still highly engineered towards a single use case. Here we show that the publicly available GLIDE model has reasonably strong representations of key topics in cancer research and oncology, in particular the general style of histopathology images and multiple facets of diseases, pathological processes and laboratory assays. However, GLIDE seems to lack useful representations of the style and content of radiology data. Our findings demonstrate that domain-agnostic generative AI models can learn relevant medical concepts without explicit training. Thus, GLIDE and similar models might be useful for medical image processing tasks in the future, particularly with additional domain-specific fine-tuning.


Algorithms ◽  
2021 ◽  
Vol 14 (7) ◽  
pp. 212
Author(s):  
Youssef Skandarani ◽  
Pierre-Marc Jodoin ◽  
Alain Lalande

Deep learning methods are the de facto solutions to a multitude of medical image analysis tasks. Cardiac MRI segmentation is one such application, which, like many others, requires a large amount of annotated data so that a trained network can generalize well. Unfortunately, having medical experts manually curate a large number of images is both slow and expensive. In this paper, we set out to explore whether expert knowledge is a strict requirement for the creation of annotated data sets on which machine learning can successfully be trained. To do so, we gauged the performance of three segmentation models, namely U-Net, Attention U-Net, and ENet, trained with different loss functions on expert and non-expert ground truth for cardiac cine-MRI segmentation. Evaluation was done with classic segmentation metrics (Dice index and Hausdorff distance) as well as clinical measurements, such as the ventricular ejection fractions and the myocardial mass. The results reveal that the generalization performance of a segmentation neural network trained on non-expert ground truth data is, for all practical purposes, as good as that of one trained on expert ground truth data, particularly when the non-expert receives a decent level of training, highlighting an opportunity for the efficient and cost-effective creation of annotations for cardiac data sets.
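The two families of evaluation measures the abstract names can be sketched briefly. This is a generic illustration, not the paper's evaluation code: the Dice index compares two binary masks, and the ejection fraction is the standard clinical formula from end-diastolic and end-systolic volumes (the volume values below are made up).

```python
import numpy as np

def dice_index(pred, truth):
    """Dice similarity coefficient between two binary segmentation masks."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    inter = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    # Convention: two empty masks are a perfect match.
    return 2.0 * inter / denom if denom else 1.0

def ejection_fraction(edv, esv):
    """Ventricular ejection fraction (%) from end-diastolic and
    end-systolic volumes: EF = (EDV - ESV) / EDV * 100."""
    return 100.0 * (edv - esv) / edv
```

Clinical measures such as the ejection fraction can agree between expert and non-expert masks even when the voxel-level Dice index differs, which is why the paper reports both kinds of metric.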


Author(s):  
Uga Sproģis ◽  
Matīss Rikters

We present the Latvian Twitter Eater Corpus - a set of tweets in the narrow domain related to food, drinks, eating and drinking. The corpus has been collected over a time span of more than 8 years and includes over 2 million tweets accompanied by additional useful data. We also separate out two sub-corpora: question-and-answer tweets and sentiment-annotated tweets. We analyse the contents of the corpus and demonstrate use cases for the sub-corpora by training domain-specific question-answering and sentiment-analysis models using the data from the corpus.
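Separating a question sub-corpus from a tweet collection can be done with a simple surface heuristic, as sketched below. The criterion shown (presence of a question mark) is an assumption for illustration; the corpus authors' actual selection rules may differ.

```python
def split_subcorpora(tweets):
    """Split a list of tweet records into a question sub-corpus and the rest.
    Heuristic sketch: a tweet containing '?' is treated as a question."""
    questions = [t for t in tweets if "?" in t["text"]]
    rest = [t for t in tweets if "?" not in t["text"]]
    return questions, rest
```

Question tweets paired with their replies could then serve as training data for a domain-specific question-answering model, and a manually labelled subset as data for sentiment analysis.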


2017 ◽  
Author(s):  
Marilena Oita ◽  
Antoine Amarilli ◽  
Pierre Senellart

Deep Web databases, whose content is presented as dynamically-generated Web pages hidden behind forms, have mostly been left unindexed by search engine crawlers. In order to automatically explore this mass of information, many current techniques assume the existence of domain knowledge, which is costly to create and maintain. In this article, we present a new perspective on form understanding and deep Web data acquisition that does not require any domain-specific knowledge. Unlike previous approaches, we do not perform the various steps in the process (e.g., form understanding, record identification, attribute labeling) independently but integrate them to achieve a more complete understanding of deep Web sources. Through information extraction techniques and using the form itself for validation, we reconcile input and output schemas in a labeled graph which is further aligned with a generic ontology. The impact of this alignment is threefold: first, the resulting semantic infrastructure associated with the form can assist Web crawlers when probing the form for content indexing; second, attributes of response pages are labeled by matching known ontology instances, and relations between attributes are uncovered; and third, we enrich the generic ontology with facts from the deep Web.


2016 ◽  
Author(s):  
Maia A. Smith ◽  
Cydney Nielsen ◽  
Fong Chun Chan ◽  
Andrew McPherson ◽  
Andrew Roth ◽  
...  

Inference of clonal dynamics and tumour evolution has fundamental importance in understanding the major clinical endpoints in cancer: development of treatment resistance, relapse and metastasis. DNA sequencing technology has made measuring clonal dynamics through mutation analysis accessible at scale, facilitating computational inference of informative patterns of interest. However, currently no tools allow biomedical experts to meaningfully interact with the often complex and voluminous dataset to inject domain knowledge into the inference process. We developed an interactive, web-based visual analytics software suite called E-scape which supports dynamically linked, multi-faceted views of cancer evolution data. Developed using R and the JavaScript D3.js library, the suite includes three tools: TimeScape and MapScape for visualizing population dynamics over time and space, respectively, and CellScape for visualizing evolution at single-cell resolution. The tool suite integrates phylogenetic, clonal prevalence, mutation and imaging data to generate intuitive, dynamically linked views of data which update in real time as a function of user actions. The system supports visualization of both point mutations and copy number alterations, rendering how mutations distribute across clones in both bulk and single-cell experiment data in multiple representations including phylogenies, heatmaps, growth trajectories, spatial distributions and mutation tables. E-scape is open source and is freely available to the community at large.


Author(s):  
Mohan Sridharan ◽  
Tiago Mota

Our architecture uses non-monotonic logical reasoning with incomplete commonsense domain knowledge, and incremental inductive learning, to guide the construction of deep network models from a small number of training examples. Experimental results in the context of a robot reasoning about the partial occlusion of objects and the stability of object configurations in simulated images indicate an improvement in reliability and a reduction in computational effort in comparison with an architecture based just on deep networks.


2005 ◽  
Vol 19 (2) ◽  
pp. 57-77 ◽  
Author(s):  
Gregory J. Gerard

Most database textbooks on conceptual modeling do not cover domain-specific patterns. The texts emphasize notation, apparently assuming that notation enables individuals to correctly model domain-specific knowledge acquired from experience. However, the domain knowledge acquired may not aid in the construction of conceptual models if it is not structured to support conceptual modeling. This study uses the Resources Events Agents (REA) pattern as an example of a domain-specific pattern that can be encoded as a knowledge structure for conceptual modeling of accounting information systems (AIS), and tests its effects on the accuracy of conceptual modeling in a familiar business setting. Fifty-three undergraduate and forty-six graduate students completed recall tasks designed to measure REA knowledge structure. The accuracy of participants' conceptual models was positively related to REA knowledge structure. Results suggest it is insufficient to know only conceptual modeling notation, because structured knowledge of domain-specific patterns reduces design errors.
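The structure of the REA pattern can be made concrete with a small sketch. The entity and event names below are illustrative examples, not taken from the study's materials; the pattern itself pairs economic events (which affect resources) with the agents who participate in them, and links give-and-take events through duality.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    name: str          # what is exchanged (e.g., inventory, cash)

@dataclass
class Agent:
    name: str          # who participates (e.g., customer, clerk)

@dataclass
class EconomicEvent:
    name: str
    resource: Resource  # stock-flow: the resource the event affects
    provider: Agent     # agent giving up the resource
    recipient: Agent    # agent receiving the resource

# Duality: a revenue-cycle "give" event is paired with a "take" event.
sale = EconomicEvent("Sale", Resource("Inventory"),
                     Agent("Sales Clerk"), Agent("Customer"))
receipt = EconomicEvent("Cash Receipt", Resource("Cash"),
                        Agent("Customer"), Agent("Cashier"))
duality = (sale, receipt)
```

A modeler who has internalized this structure can map a narrative business case onto resources, events, and agents directly, which is the knowledge-structure effect the study measures.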


Author(s):  
Sebastian Günther

Internal DSLs are a special kind of DSL that uses an existing programming language as its host. In this chapter, the author explains an iterative development process for internal DSLs. The goals of this process are: (1) to give developers a familiar environment in which they can use known and proven development steps, techniques, tools, and host languages, (2) to provide a set of repeatable, iterative steps that support the continuous adaptation and evolution of the domain knowledge and the DSL implementation, and (3) to apply design principles that help to develop DSLs with essential properties and to use host-language-independent design patterns to plan and communicate the design and implementation of the DSL. The process consists of three development steps (analysis, language design, and language implementation) and applies four principles: open form, agile and test-driven development, design pattern knowledge, and design principle knowledge.
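What "using an existing programming language as the host" means in practice can be shown with a minimal sketch (the recipe domain and method names are invented for illustration). Plain host-language constructs, here Python method chaining, give the syntax of a small embedded language.

```python
class Recipe:
    """Tiny internal DSL embedded in Python: chained method calls read
    like domain statements while remaining ordinary host-language code."""
    def __init__(self, name):
        self.name = name
        self.steps = []

    def add(self, ingredient, amount):
        self.steps.append(f"add {amount} {ingredient}")
        return self   # returning self enables the fluent, chainable style

    def bake(self, minutes):
        self.steps.append(f"bake {minutes} min")
        return self

# A DSL "program": reads like a recipe, executes as normal Python.
bread = Recipe("bread").add("flour", "500g").add("water", "300ml").bake(40)
```

Because the DSL is just host-language code, the familiar toolchain (editor, debugger, test framework) applies unchanged, which is exactly goal (1) of the process described above.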


Science ◽  
2018 ◽  
Vol 362 (6419) ◽  
pp. 1140-1144 ◽  
Author(s):  
David Silver ◽  
Thomas Hubert ◽  
Julian Schrittwieser ◽  
Ioannis Antonoglou ◽  
Matthew Lai ◽  
...  

The game of chess is the longest-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. By contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go by reinforcement learning from self-play. In this paper, we generalize this approach into a single AlphaZero algorithm that can achieve superhuman performance in many challenging games. Starting from random play and given no domain knowledge except the game rules, AlphaZero convincingly defeated a world champion program in the games of chess and shogi (Japanese chess), as well as Go.
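The tabula-rasa self-play loop at the core of this approach can be sketched generically. This is a structural illustration only: the stand-in random move choice below replaces AlphaZero's MCTS-guided network policy, and the game interface (the function parameters) is hypothetical.

```python
import random

def self_play_episode(initial, legal_moves, apply_move, is_terminal, outcome):
    """Skeleton of one self-play game: play to the end with the current
    policy (here uniform random, standing in for the MCTS-guided network)
    and label every visited (state, move) pair with the final outcome z,
    producing training targets for the value and policy heads."""
    state, history = initial, []
    while not is_terminal(state):
        move = random.choice(legal_moves(state))   # policy stand-in
        history.append((state, move))
        state = apply_move(state, move)
    z = outcome(state)                             # +1 / 0 / -1 in practice
    return [(s, m, z) for s, m in history]
```

Repeating such episodes, training the network on the collected triples, and then generating new games with the improved network is the reinforcement loop that needs no domain knowledge beyond the game rules.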


Author(s):  
Gour C. Karmakar ◽  
Laurence Dooley ◽  
Mahbubhur Rahman Syed

This chapter provides a comprehensive overview of fuzzy logic-based image segmentation techniques. Fuzzy image segmentation techniques outperform conventional techniques because they can evaluate imprecise data and are more robust in noisy environments. Fuzzy clustering methods need the number of clusters to be set prior to segmentation and are sensitive to the initialization of cluster centers. Fuzzy rule-based segmentation techniques can incorporate domain expert knowledge, manipulate numerical as well as linguistic data, and draw partial inferences using fuzzy IF-THEN rules; they have been applied intensively in medical imaging. These rules are, however, application-domain specific and very difficult to define, either manually or automatically, in a way that can complete the segmentation alone. Fuzzy geometry and thresholding-based image segmentation techniques are best suited to bimodal images; they can be applied to multimodal images, but they do not produce good results for images that contain a significant number of overlapping pixels between background and foreground regions. A few techniques based on fuzzy integrals and soft computing have been published and appear to offer considerable promise.
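The fuzzy clustering family the chapter surveys is typified by fuzzy c-means, whose membership update can be sketched as below for 1-D intensity data. This is the standard FCM membership formula, u_ik = 1 / Σ_j (d_ik / d_ij)^(2/(m-1)), not code from the chapter; note how the cluster centers must be supplied up front, which is the initialization sensitivity mentioned above.

```python
import numpy as np

def fcm_memberships(data, centers, m=2.0):
    """Fuzzy c-means membership update for 1-D intensities.
    Returns an (N, C) matrix: each row gives a pixel's degrees of
    belonging to the C clusters, and sums to 1."""
    # Distances from each pixel to each center (epsilon avoids 0/0).
    d = np.abs(data[:, None] - centers[None, :]) + 1e-12
    power = 2.0 / (m - 1.0)
    # u[i, k] = 1 / sum_j (d[i, k] / d[i, j]) ** power
    u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** power, axis=2)
    return u
```

Unlike hard thresholding, these graded memberships let a pixel belong partially to both background and foreground, which is what makes fuzzy methods better behaved on noisy or overlapping intensity distributions.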

