Thyroid Ultrasound Reports: Will the Thyroid Imaging, Reporting, and Data System Improve Natural Language Processing Capture of Critical Thyroid Nodule Features?

2020, Vol 256, pp. 557-563
Author(s): Kallie J. Chen, Priya H. Dedhia, Joseph R. Imbus, David F. Schneider

Author(s): Priya H. Dedhia, Kallie Chen, Yiqiang Song, Eric LaRose, Joseph R. Imbus, ...

Abstract
Objective: Natural language processing (NLP) systems convert unstructured text into analyzable data. Here, we describe the performance of NLP in capturing granular details on nodules from thyroid ultrasound (US) reports and reveal critical issues with reporting language.
Methods: We iteratively developed NLP tools using the clinical Text Analysis and Knowledge Extraction System (cTAKES) and thyroid US reports from 2007 to 2013. We incorporated nine nodule features for NLP extraction. Next, we evaluated the precision, recall, and accuracy of our NLP tools using a separate set of US reports from an academic medical center (A) and a regional health care system (B) during the same period. Two physicians manually annotated each test-set report. A third physician then adjudicated discrepancies. The adjudicated "gold standard" was then used to evaluate NLP performance on the test set.
Results: A total of 243 thyroid US reports contained 6,405 data elements. Inter-annotator agreement for all elements was 91.3%. Compared with the gold standard, overall recall of the NLP tool was 90%. NLP recall for thyroid lobe or isthmus characteristics was: laterality 96% and size 95%. NLP accuracy for nodule characteristics was: laterality 92%, size 92%, calcifications 76%, vascularity 65%, echogenicity 62%, contents 76%, and borders 40%. NLP recall for presence or absence of lymphadenopathy was 61%. Reporting style accounted for 18% of errors. For example, the word "heterogeneous" interchangeably referred to nodule contents or echogenicity. While nodule dimensions and laterality were often described, US reports described contents, echogenicity, vascularity, calcifications, borders, and lymphadenopathy only 46, 41, 17, 15, 9, and 41% of the time, respectively. Most nodule characteristics were equally likely to be described at hospital A and hospital B.
Conclusions: NLP can automate extraction of critical information from thyroid US reports. However, ambiguous and incomplete reporting language hinders the performance of NLP systems regardless of institutional setting. Standardized or synoptic thyroid US reports could improve NLP performance.
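The evaluation described in this abstract compares NLP output with an adjudicated gold standard. The following is a minimal sketch of how precision, recall, and accuracy could be computed for extracted data elements. The `(report_id, feature)` key layout, the `evaluate` function, and the sample values are all hypothetical illustrations; this is not the authors' cTAKES pipeline or their exact metric definitions.

```python
from typing import Dict, Optional, Tuple

# Hypothetical key layout: (report_id, feature_name). None means "not described".
Key = Tuple[str, str]


def evaluate(gold: Dict[Key, Optional[str]],
             pred: Dict[Key, Optional[str]]) -> Dict[str, float]:
    """Compare extracted values with gold-standard annotations."""
    keys = set(gold) | set(pred)
    # True positive: gold has a value and the extractor produced the same value.
    tp = sum(1 for k in keys
             if gold.get(k) is not None and gold.get(k) == pred.get(k))
    n_pred = sum(1 for k in keys if pred.get(k) is not None)
    n_gold = sum(1 for k in keys if gold.get(k) is not None)
    # Agreement also counts elements that both sides mark as absent.
    agree = sum(1 for k in keys if gold.get(k) == pred.get(k))
    return {
        "precision": tp / n_pred if n_pred else 0.0,
        "recall": tp / n_gold if n_gold else 0.0,
        "accuracy": agree / len(keys) if keys else 0.0,
    }


# Made-up example: "heterogeneous" is extracted as echogenicity although the
# annotators recorded "hypoechoic", mimicking the ambiguity noted above.
gold = {
    ("r1", "laterality"): "left",
    ("r1", "size"): "1.2 cm",
    ("r1", "echogenicity"): "hypoechoic",
    ("r1", "borders"): None,
}
pred = {
    ("r1", "laterality"): "left",
    ("r1", "size"): "1.2 cm",
    ("r1", "echogenicity"): "heterogeneous",
    ("r1", "borders"): None,
}
metrics = evaluate(gold, pred)
print(metrics)
```

In this sketch a wrong value lowers both precision and recall, which is one common convention; other definitions are possible.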


2021, Vol 8 (8), pp. 385-391
Author(s): Kania Difa Parama Citta, Sahudi Sahudi, Iskandar Ali

Background: Thyroid cancer is the endocrine malignancy with the highest incidence. Many radiological modalities are used to help diagnose thyroid carcinoma, one of which is ultrasonography (USG), which can support the diagnosis of thyroid malignancy. The Thyroid Imaging, Reporting and Data System (TI-RADS) is a classification of thyroid ultrasound readings that categorizes thyroid nodules by cancer risk in order to differentiate benign from malignant nodules. Several previous studies at Dr. Soetomo Hospital addressed the diagnosis of thyroid carcinoma, but their results were not meaningful and the laboratory examinations required substantial funding. The aim of this study is to develop a relatively easy and inexpensive method, based on the TI-RADS classification, to assist in the preoperative diagnosis of follicular thyroid carcinoma. The hope is for a method or modality that is easier, cheaper, accurate, and minimally invasive in predicting follicular thyroid carcinoma.
Methods: In this cross-sectional study, we included patients with a thyroid mass who underwent treatment in the Surgery Department, Dr. Soetomo Teaching Hospital, between January 2012 and December 2020. We used the patients' medical records to collect the necessary clinical data. The inclusion criteria were patients with a single thyroid nodule who underwent thyroid ultrasound and were diagnosed with follicular nodular carcinoma by histopathology examination. A total of 53 patients were included for further analysis. Ethical approval was obtained from the Ethics Committee of Dr. Soetomo Teaching Hospital (Surabaya, Indonesia).
Results: Of the 53 research subjects, most were older than 50 years (28 patients, 52.8%); the remaining 25 patients (47.2%) fell outside this age group. Nodule diameters of less than 5 cm and more than 5 cm were almost equally common among the 53 patients: 27 (50.9%) and 26 (49.1%), respectively. For the TIRADS score obtained from the medical records, most patients (32, 60.4%) had a score greater than TR 4, and the remaining 21 (39.6%) had a score less than TR 4. The authors then calculated the odds ratio (OR) of each variable for follicular carcinoma, obtaining 1.012 for age, 1.111 for nodule size, and 3.520 for TIRADS score.
Conclusion: There is a correlation between the TIRADS score and the incidence of follicular thyroid carcinoma.
Keywords: Thyroid cancer, TIRADS, Follicular Thyroid Carcinoma.
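The abstract reports OR values without showing how they were derived. As a hedged illustration only, the sketch below computes an odds ratio from a 2x2 table and adds an approximate 95% confidence interval using Woolf's log method, which is an addition here and not something the abstract mentions. The function name and all counts are hypothetical and are not taken from the study.

```python
import math
from typing import Tuple


def odds_ratio(a: int, b: int, c: int, d: int) -> Tuple[float, float, float]:
    """Odds ratio with an approximate 95% CI (Woolf's log method).

    a: exposed, outcome present     b: exposed, outcome absent
    c: unexposed, outcome present   d: unexposed, outcome absent
    All four cells must be non-zero for this simple formula.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive")
    or_value = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_value) - 1.96 * se_log_or)
    upper = math.exp(math.log(or_value) + 1.96 * se_log_or)
    return or_value, lower, upper


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
or_value, ci_low, ci_high = odds_ratio(a=20, b=10, c=5, d=10)
print(f"OR = {or_value:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

An OR close to 1, such as the values reported for age and nodule size, indicates little association, whereas a confidence interval that excludes 1 would suggest a statistically detectable one.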


2020, pp. 3-17
Author(s): Peter Nabende

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are limited studies on Natural Language Processing applications for many indigenous East African languages. As a contribution to closing this knowledge gap, this paper evaluates the application of well-established machine translation methods to one heavily under-resourced indigenous East African language, Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba, including both rule-based and data-driven methods. We then apply a state-of-the-art data-driven machine translation method to learn models for automating translation between Lumasaaba and English using a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation model architecture leads to consistently better BLEU scores than recurrent neural network-based models. Moreover, the automatically generated translations can be comprehended to a reasonable extent and are usually associated with the source language input.
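For readers unfamiliar with BLEU, the sketch below shows a simplified sentence-level variant: the geometric mean of clipped n-gram precisions multiplied by a brevity penalty, with a single reference and no smoothing. It is an illustration under those assumptions, not the evaluation setup used in the paper, which would normally rely on a corpus-level implementation. The function names and example sentences are hypothetical.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Simplified sentence-level BLEU: one reference, no smoothing."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


reference = "the children are walking to school today"
perfect = simple_bleu(reference, reference)
partial = simple_bleu("the children are walking to the market today", reference)
unrelated = simple_bleu("completely different words appear here", reference)
print(perfect, partial, unrelated)
```

Because no smoothing is applied, short or weak translations often score exactly zero at the sentence level, which is one reason corpus-level BLEU is usually reported instead.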


Diabetes, 2019, Vol 68 (Supplement 1), pp. 1243-P
Author(s): JIANMIN WU, FRITHA J. MORRISON, ZHENXIANG ZHAO, XUANYAO HE, MARIA SHUBINA, ...

Author(s): Pamela Rogalski, Eric Mikulin, Deborah Tihanyi

In 2018, we overheard many CEEA-ACEG members stating that they have "found their people"; this led us to wonder what makes this evolving community unique. Using cultural historical activity theory to view the proceedings of CEEA-ACEG 2004-2018 in comparison with the geographically and intellectually adjacent ASEE, we used both machine-driven (Natural Language Processing, NLP) and human-driven (literature review of the proceedings) methods. Here, we hoped to build on surveys, most recently by Nelson and Brennan (2018), to understand, beyond what members say about themselves, what makes the CEEA-ACEG community distinct, where it has come from, and where it is going. Engaging in the two methods of data collection quickly diverted our focus from an analysis of the data themselves to the characteristics of the data in terms of cultural historical activity theory. Our preliminary findings point to some unique characteristics of machine- and human-driven results, with the former, as might be expected, focusing on the micro-level (words and language patterns) and the latter on the macro-level (ideas and concepts). NLP generated data within the realms of "community" and "division of labour", while the review of proceedings centred on "subject" and "object"; both found "instruments," although NLP did so with greater granularity. With this new understanding of the relative strengths of each method, we have a revised framework for addressing our original question.


2020
Author(s): Vadim V. Korolev, Artem Mitrofanov, Kirill Karpov, Valery Tkachenko

The main advantage of modern natural language processing methods is the ability to turn an amorphous, human-readable task into a strict mathematical form. This makes it possible to extract chemical data and insights from articles and to find new semantic relations. We propose a universal engine for processing chemical and biological texts. We successfully tested it on various use cases and applied it to the search for a therapeutic agent for COVID-19 by analyzing the PubMed archive.

