Knowledge Discovery and Data Mining of Free Text Radiology Reports

Data mining is the analysis or observation of large data sets with the aim of finding unexpected relationships and summarizing the data in ways that are easier to understand and more useful to the data owner. Data mining is the core process in Knowledge Discovery in Databases (KDD). Here, data mining methods are used to analyze borrowers' credit payment data. From the resulting patterns in borrowers' credit payments, one can identify the credit parameters that are interrelated and that most strongly influence credit installment payments. Keywords: data mining, outlier, multicollinearity, ANOVA

Developing a RadLex-based Named Entity Recognition Tool for Mining Textual Radiology Reports (Preprint)

10.2196/preprints.25378 ◽ 2020

Author(s): Shintaro Tsuji ◽ Andrew Wen ◽ Naoki Takahashi ◽ Hongjian Zhang ◽ Katsuhiko Ogasawara ◽ ...

Keyword(s): Named Entity Recognition ◽ Noun Phrases ◽ General Purpose ◽ Entity Recognition ◽ Free Text ◽ Clinical Text ◽ Named Entity ◽ Radiology Reports ◽ Two Measures ◽ F Measure

BACKGROUND Named entity recognition (NER) plays an important role in extracting descriptive features when mining free-text radiology reports. However, the performance of existing NER tools is limited because the entities they can recognize depend on dictionary lookup. Recognizing compound terms is especially difficult because they occur in a wide variety of patterns.
OBJECTIVE The objective of this study was to develop and evaluate an NER tool that handles compound terms, using RadLex, for mining free-text radiology reports.
METHODS We leveraged the clinical Text Analysis and Knowledge Extraction System (cTAKES) to develop customized pipelines using both RadLex and SentiWordNet (a general-purpose dictionary, GPD). We manually annotated 400 radiology reports for compound terms (Cts) in noun phrases and used them as the gold standard for performance evaluation (precision, recall, and F-measure). We also created a compound-term-enhanced dictionary (CtED) by analyzing false negatives (FNs) and false positives (FPs), and applied it to another 100 radiology reports for validation. Finally, we evaluated the stem terms of compound terms by defining two measures: an occurrence ratio (OR) and a matching ratio (MR).
RESULTS The F-measure of cTAKES+RadLex+GPD was 32.2% (precision 92.1%, recall 19.6%), and that of the pipeline combined with the CtED was 67.1% (precision 98.1%, recall 51.0%). The OR indicated that the stem terms "effusion", "node", "tube", and "disease" were used frequently, but the capture of Cts was still lacking. The MR showed that 71.9% of stem terms matched those of the ontologies, and that RadLex improved the MR by about 22% over the cTAKES default dictionary. The OR and MR suggested that the characteristics of stem terms could help generate synonymous phrases using ontologies.
CONCLUSIONS We developed a RadLex-based customized pipeline for parsing radiology reports and demonstrated that the CtED and stem term analysis have the potential to improve dictionary-based NER performance by expanding vocabularies.
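
The evaluation measures named in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation code: it assumes that "F-measure" means the balanced F-measure (F1), that terms are compared by exact string match, and the example term sets are hypothetical.

```python
from typing import Set, Tuple


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(gold: Set[str], predicted: Set[str]) -> Tuple[float, float, float]:
    """Compare predicted terms against gold-standard terms by exact match."""
    tp = len(gold & predicted)   # true positives: terms found and correct
    fp = len(predicted - gold)   # false positives: terms found but not in gold
    fn = len(gold - predicted)   # false negatives: gold terms that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_measure(precision, recall)


# Hypothetical example: a dictionary that knows only two of four compound terms.
gold_terms = {"pleural effusion", "lymph node", "chest tube", "interstitial lung disease"}
predicted_terms = {"pleural effusion", "lymph node"}

p, r, f = evaluate(gold_terms, predicted_terms)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

The toy example mirrors the pattern in the reported results: a dictionary lookup rarely returns a wrong term (high precision) but misses compound terms it does not contain (low recall), which pulls the F-measure down. Passing the reported precision and recall pairs to `f_measure` gives roughly 0.32 and 0.67, in line with the reported F-measures.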

The AI Delusion

10.1093/oso/9780198824305.001.0001 ◽ 2018 ◽ Cited By ~ 5

Author(s): Gary Smith

Keyword(s): Data Mining ◽ Knowledge Discovery ◽ Industrial Revolution ◽ The Real ◽ Intelligent Machines ◽ Black Boxes ◽ Real Danger ◽ The Way

We live in an incredible period in history. The Computer Revolution may be even more life-changing than the Industrial Revolution. We can do things with computers that could never be done before, and computers can do things for us that could never be done before. But our love of computers should not cloud our thinking about their limitations. We are told that computers are smarter than humans and that data mining can identify previously unknown truths, or make discoveries that will revolutionize our lives. Our lives may well be changed, but not necessarily for the better. Computers are very good at discovering patterns, but are useless in judging whether the unearthed patterns are sensible because computers do not think the way humans think. We fear that super-intelligent machines will decide to protect themselves by enslaving or eliminating humans. But the real danger is not that computers are smarter than us, but that we think computers are smarter than us and, so, trust computers to make important decisions for us. The AI Delusion explains why we should not be intimidated into thinking that computers are infallible, that data-mining is knowledge discovery, and that black boxes should be trusted.

Multi-task weak supervision enables anatomically-resolved abnormality detection in whole-body FDG-PET/CT

Nature Communications ◽ 10.1038/s41467-021-22018-1 ◽ 2021 ◽ Vol 12 (1)

Author(s): Sabri Eyuboglu ◽ Geoffrey Angus ◽ Bhavik N. Patel ◽ Anuj Pareek ◽ Guido Davidzon ◽ ...

Keyword(s): FDG PET ◽ Location Estimation ◽ Supervised Machine Learning ◽ Whole Body ◽ Free Text ◽ Abnormality Detection ◽ Weak Supervision ◽ Radiology Reports ◽ PET CT ◽ FDG PET CT

Computational decision support systems could provide clinical value in whole-body FDG-PET/CT workflows. However, the limited availability of labeled data, combined with the large size of PET/CT imaging exams, makes it challenging to apply existing supervised machine learning systems. Leveraging recent advancements in natural language processing, we describe a weak supervision framework that extracts imperfect, yet highly granular, regional abnormality labels from free-text radiology reports. Our framework automatically labels each region in a custom ontology of anatomical regions, providing a structured profile of the pathologies in each imaging exam. Using these generated labels, we then train an attention-based, multi-task CNN architecture to detect and estimate the location of abnormalities in whole-body scans. We demonstrate empirically that our multi-task representation is critical for strong performance on rare abnormalities with limited training data. The representation also contributes to more accurate mortality prediction from imaging data, suggesting the potential utility of our framework beyond abnormality detection and location estimation.
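
To make the idea of per-region weak labels concrete, here is a minimal Python sketch. It is a deliberately simplified stand-in that uses keyword rules, not the natural language processing approach the abstract refers to. The region ontology, abnormality cues, negation cues, and the `weak_labels` function are all hypothetical and chosen only for illustration.

```python
import re
from typing import Dict

# Hypothetical mini-ontology: region name -> terms that refer to it.
REGION_TERMS = {
    "chest": ["lung", "pulmonary", "mediastinal"],
    "liver": ["liver", "hepatic"],
    "skeleton": ["bone", "osseous", "vertebra"],
}
# Hypothetical cue phrases; a real system would need far richer handling.
ABNORMAL_CUES = ["hypermetabolic", "increased uptake", "lesion", "mass"]
NEGATION_PATTERN = re.compile(r"\b(no|without|negative for)\b")


def weak_labels(report: str) -> Dict[str, int]:
    """Return a noisy 0/1 abnormality label for every region in the ontology."""
    labels = {region: 0 for region in REGION_TERMS}
    # Split the lowercased report into sentences at ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", report.lower()):
        if NEGATION_PATTERN.search(sentence):
            continue  # crude negation handling: skip the whole sentence
        if not any(cue in sentence for cue in ABNORMAL_CUES):
            continue  # no abnormality cue in this sentence
        for region, terms in REGION_TERMS.items():
            if any(term in sentence for term in terms):
                labels[region] = 1
    return labels


report = (
    "Hypermetabolic mass in the right lung. "
    "No abnormal uptake in the liver. "
    "Osseous structures are unremarkable."
)
print(weak_labels(report))
```

Labels produced this way are imperfect by design, which is the point of weak supervision: they are cheap to generate for every region of every exam and can then serve as noisy training targets for an image model.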

Particularities of data mining in medicine: lessons learned from patient medical time series data analysis

EURASIP Journal on Wireless Communications and Networking ◽ 10.1186/s13638-019-1582-2 ◽ 2019 ◽ Vol 2019 (1) ◽ Cited By ~ 2

Author(s): Shadi Aljawarneh ◽ Aurea Anguera ◽ John William Atwood ◽ Juan A. Lara ◽ David Lizcano

Keyword(s): Data Mining ◽ Time Series ◽ Knowledge Discovery ◽ Time Series Data ◽ Medical Patient ◽ Lessons Learned ◽ Physiological Signals ◽ Knowledge Discovery In Databases ◽ Series Data ◽ Data Mining Techniques

Nowadays, large amounts of data are generated in the medical domain. Various physiological signals generated by different organs can be recorded to extract interesting information about patients’ health. The analysis of physiological signals is a hard task that requires specific approaches such as the Knowledge Discovery in Databases process. Applying such a process in the domain of medicine has a series of implications and difficulties, especially regarding the application of data mining techniques to data, mainly time series, gathered from medical examinations of patients. The goal of this paper is to describe the lessons learned and the experience gathered by the authors in applying data mining techniques to real medical patient data, including time series. In this research, we carried out an exhaustive case study working on data from two medical fields: stabilometry (15 professional basketball players, 18 elite ice skaters) and electroencephalography (100 healthy patients, 100 epileptic patients). We applied a previously proposed knowledge discovery framework for classification purposes, obtaining good results in terms of classification accuracy (greater than 99% in both fields). The good results obtained in our research are the groundwork for the lessons learned and recommendations made in this position paper, which is intended as a guide for experts who face similar medical data mining projects.
