Automated histopathological evaluation of pterygium using artificial intelligence

2022 ◽  
pp. bjophthalmol-2021-320141
Author(s):  
Jong Hoon Kim ◽  
Young Jae Kim ◽  
Yeon Jeong Lee ◽  
Joon Young Hyon ◽  
Sang Beom Han ◽  
...  

Purpose: This study aimed to evaluate the efficacy of a new automated method for the evaluation of histopathological images of pterygium using artificial intelligence.
Methods: In-house software for automated grading of histopathological images was developed. Histopathological images of pterygium (400 images from 40 patients) were analysed using the newly developed software. Manual grading (I–IV), labelled according to an established scoring system, served as the ground truth for training the four-grade classification models. Region-of-interest segmentation, achieved by a combination of expectation-maximisation and k-nearest neighbours, was performed before grade classification. Fifty-five radiomic features extracted from each image were analysed with feature selection methods to identify the significant features. Five classifiers were evaluated for their ability to predict quantitative grading.
Results: Among the classifier models applied for automated grading in this study, the bagging tree showed the best performance, with a 75.9% true positive rate (TPR) and 75.8% positive predictive value (PPV) in internal validation. In external validation, the method also demonstrated reproducibility, with an 81.3% TPR and 82.0% PPV averaged over the four classification grades.
Conclusions: Our newly developed automated method may be a reliable approach to quantitative histopathological evaluation of pterygium.
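The reported metrics, TPR and PPV averaged over the four grades, can be computed directly from a multi-class confusion matrix. The sketch below is not the authors' software; the confusion-matrix counts are invented purely for illustration.

```python
import numpy as np

def per_grade_tpr_ppv(conf):
    """Per-class true positive rate (recall) and positive predictive
    value (precision) from a confusion matrix whose rows are true
    grades and columns are predicted grades."""
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)
    tpr = tp / conf.sum(axis=1)   # TP / (TP + FN), per grade
    ppv = tp / conf.sum(axis=0)   # TP / (TP + FP), per grade
    return tpr, ppv

# Hypothetical 4-grade (I-IV) confusion matrix, for illustration only.
cm = [[8, 2, 0, 0],
      [1, 7, 2, 0],
      [0, 2, 7, 1],
      [0, 0, 1, 9]]
tpr, ppv = per_grade_tpr_ppv(cm)
mean_tpr = tpr.mean()   # macro-averaged TPR over the four grades
mean_ppv = ppv.mean()   # macro-averaged PPV over the four grades
```

Averaging per-grade values, as in the abstract's "average of four classification grades", weights every grade equally regardless of how many images it contains.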

Thorax ◽  
2020 ◽  
Vol 75 (4) ◽  
pp. 306-312 ◽  
Author(s):  
David R Baldwin ◽  
Jennifer Gustafson ◽  
Lyndsey Pickup ◽  
Carlos Arteta ◽  
Petr Novotny ◽  
...  

Background: Estimation of the risk of malignancy in pulmonary nodules detected by CT is central to clinical management. The use of artificial intelligence (AI) offers an opportunity to improve risk prediction. Here we compare the performance of an AI algorithm, the lung cancer prediction convolutional neural network (LCP-CNN), with that of the Brock University model, recommended in UK guidelines.
Methods: A dataset of incidentally detected pulmonary nodules measuring 5–15 mm was collected retrospectively from three UK hospitals for use in a validation study. The ground truth diagnosis for each nodule was based on histology (required for any cancer), resolution, stability or (for pulmonary lymph nodes only) expert opinion. There were 1397 nodules in 1187 patients, of which 234 nodules in 229 (19.3%) patients were cancer. Model discrimination and performance statistics at predefined score thresholds were compared between the Brock model and the LCP-CNN.
Results: The area under the curve for the LCP-CNN was 89.6% (95% CI 87.6 to 91.5), compared with 86.8% (95% CI 84.3 to 89.1) for the Brock model (p≤0.005). Using the LCP-CNN, 24.5% of nodules scored below the lowest cancer nodule score, compared with 10.9% using the Brock score. At the predefined thresholds, the LCP-CNN gave one false negative (0.4% of cancers), whereas the Brock model gave six (2.5%), while specificity statistics were similar between the two models.
Conclusion: The LCP-CNN score has better discrimination and allows a larger proportion of benign nodules to be identified without missing cancers than the Brock model. This has the potential to substantially reduce the proportion of surveillance CT scans required and thus save significant resources.
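Two of the statistics above, discrimination (AUC) and the fraction of nodules scoring below the lowest cancer score, follow mechanically from a list of risk scores and ground-truth labels. A minimal sketch, with toy scores rather than study data:

```python
import numpy as np

def auc_mann_whitney(scores, labels):
    """AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen cancer scores higher than a randomly chosen
    benign nodule (ties count half)."""
    s = np.asarray(scores, float)
    y = np.asarray(labels, bool)
    pos, neg = s[y], s[~y]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def rule_out_fraction(scores, labels):
    """Fraction of all nodules scoring strictly below the lowest
    cancer score, i.e. the band that can be ruled out without
    missing any cancer."""
    s = np.asarray(scores, float)
    y = np.asarray(labels, bool)
    return float((s < s[y].min()).mean())

# Toy data: 1 = cancer, 0 = benign.
scores = [0.05, 0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
labels = [0,    0,   0,   0,   1,   0,   1]
auc = auc_mann_whitney(scores, labels)
frac = rule_out_fraction(scores, labels)
```

The rule-out fraction is the quantity behind the 24.5% vs 10.9% comparison: a model that pushes more benign nodules below its lowest cancer score spares more patients surveillance imaging.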


Author(s):  
Mohamed Estai ◽  
Marc Tennant ◽  
Dieter Gebauer ◽  
Andrew Brostek ◽  
Janardhan Vignarajan ◽  
...  

Objective: This study aimed to evaluate an automated detection system to detect and classify permanent teeth on orthopantomogram (OPG) images using convolutional neural networks (CNNs). Methods: In total, 591 digital OPGs were collected from patients older than 18 years. Three qualified dentists performed individual tooth labelling on the images to generate the ground truth annotations. A three-step procedure, relying upon CNNs, was proposed for automated detection and classification of teeth. Firstly, U-Net, a type of CNN, performed preliminary segmentation of tooth regions, i.e. detection of regions of interest (ROIs), on panoramic images. Secondly, Faster R-CNN, an advanced object detection architecture, identified each tooth within the ROI determined by the U-Net. Thirdly, a VGG-16 architecture classified each tooth into 32 categories, and a tooth number was assigned. A total of 17,135 teeth cropped from 591 radiographs were used to train and validate the tooth detection and tooth numbering modules. 90% of OPG images were used for training, and the remaining 10% were used for validation. 10-fold cross-validation was performed to measure performance. The intersection over union (IoU), F1 score, precision, and recall (i.e. sensitivity) were used as metrics to evaluate the performance of the resultant CNNs. Results: The ROI detection module had an IoU of 0.70. The tooth detection module achieved a recall of 0.99 and a precision of 0.99. The tooth numbering module had a recall, precision and F1 score of 0.98. Conclusion: The resultant automated method achieved high performance for automated tooth detection and numbering from OPG images. Deep learning can be helpful in the automatic filing of dental charts in general dentistry and forensic medicine.
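The evaluation metrics named above (IoU for region overlap, and F1 from precision and recall) have compact standard definitions; a minimal sketch, independent of the authors' pipeline:

```python
def iou(box_a, box_b):
    """Intersection over union for axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # overlap width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))   # overlap height
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

For example, two unit-offset 2x2 boxes overlap in a 1x1 square, giving IoU = 1/7, and when precision and recall are both 0.99 (as for the tooth detection module), F1 is also 0.99.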


Author(s):  
Shaoxu Wu ◽  
Xiong Chen ◽  
Jiexin Pan ◽  
Wen Dong ◽  
Xiayao Diao ◽  
...  

Background: Cystoscopy plays an important role in bladder cancer (BCa) diagnosis and treatment, but its sensitivity needs improvement. Artificial intelligence has shown promise in endoscopy, but few cystoscopic applications have been reported. We report a Cystoscopy Artificial Intelligence Diagnostic System (CAIDS) for BCa diagnosis.
Methods: In total, 69,204 images from 10,729 consecutive patients from six hospitals were collected and divided into training, internal validation, and external validation sets. The CAIDS was built using a pyramid scene parsing network and transfer learning. A subset (n = 260) of the validation sets was used for a performance comparison between the CAIDS and urologists for complex lesion detection. The diagnostic accuracy, sensitivity, specificity, positive and negative predictive values, and 95% confidence intervals (CIs) were calculated using the Clopper-Pearson method.
Results: The diagnostic accuracy of the CAIDS was 0.977 (95% CI = 0.974–0.979) in the internal validation set and 0.990 (95% CI = 0.979–0.996), 0.982 (95% CI = 0.974–0.988), 0.978 (95% CI = 0.959–0.989), and 0.991 (95% CI = 0.987–0.994) in the different external validation sets. In the CAIDS versus urologists comparison, the CAIDS showed high accuracy and sensitivity (accuracy = 0.939, 95% CI = 0.902–0.964; sensitivity = 0.954, 95% CI = 0.902–0.983) with a short latency of 12 s, markedly more accurate and faster than the expert urologists.
Conclusions: The CAIDS achieved accurate BCa detection with a short latency. The CAIDS may provide many clinical benefits, from increasing diagnostic accuracy for BCa, even for commonly misdiagnosed cases such as flat cancerous tissue (carcinoma in situ), to reducing the operation time for cystoscopy.
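The Clopper-Pearson method used for the CIs above is the exact binomial interval. A self-contained sketch (not the authors' code) that finds the interval limits by bisection on the binomial CDF, with illustrative counts of 95 successes out of 100:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided CI for a binomial proportion.
    The limits solve P(X >= k | p_lo) = alpha/2 and
    P(X <= k | p_hi) = alpha/2, found here by bisection."""
    def solve(pred):
        lo, hi = 0.0, 1.0
        for _ in range(60):                 # bisect to ~1e-18 precision
            mid = (lo + hi) / 2
            if pred(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p) > 1 - alpha / 2)
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p) > alpha / 2)
    return lower, upper

# Illustrative example: 95 correct diagnoses out of 100 cases.
lo95, hi95 = clopper_pearson(95, 100)
```

In practice one would use a library routine (e.g. `scipy.stats.binomtest(...).proportion_ci(method="exact")`), but the bisection form makes the definition of the interval explicit.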


Author(s):  
Yosef S. Razin ◽  
Jack Gale ◽  
Jiaojiao Fan ◽  
Jaznae’ Smith ◽  
Karen M. Feigh

This paper evaluates Banks et al.'s Human-AI Shared Mental Model theory by examining how a self-driving vehicle's hazard assessment facilitates shared mental models. Participants were asked to affirm the vehicle's assessment of road objects as either hazards or mistakes in real time as behavioral and subjective measures were collected. The baseline performance of the AI was purposefully low (<50%) to examine how the human's shared mental model might lead to inappropriate compliance. Results indicated that while the participants' true positive rate was high, overall performance was reduced by a large false positive rate, indicating that participants were indeed being influenced by the AI's faulty assessments, despite full transparency as to the ground truth. Both performance and compliance were directly affected by frustration and by mental and even physical demands. Dispositional factors such as faith in other people's cooperativeness and in technology companies were also significant. Thus, our findings strongly supported the theory that shared mental models play a measurable role in performance and compliance, in a complex interplay with trust.


2021 ◽  
Vol 11 ◽  
Author(s):  
Yilei Zhao ◽  
Meibao Feng ◽  
Minhong Wang ◽  
Liang Zhang ◽  
Meirong Li ◽  
...  

Purpose: This study established and verified a radiomics model for preoperative prediction of the Ki67 index of gastrointestinal stromal tumors (GISTs).
Materials and Methods: A total of 344 patients with GISTs from three hospitals were divided into a training set and an external validation set. The tumor region of interest was delineated on enhanced computed tomography (CT) images to extract radiomic features. The Boruta algorithm was used for dimensionality reduction of the features, and the random forest algorithm was used to construct the model for radiomics prediction of the Ki67 index. The receiver operating characteristic (ROC) curve was used to evaluate the model's performance and generalization ability.
Results: After dimensionality reduction, a feature subset of 21 radiomic features was generated. The radiomics model had an area under the curve (AUC) of 0.835 (95% confidence interval (CI): 0.761–0.908) in the training set and 0.784 (95% CI: 0.691–0.874) in the external validation cohort.
Conclusion: The radiomics model of this study has the potential to predict the Ki67 index of GISTs preoperatively.
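The Boruta algorithm screens features by comparing each feature's importance against "shadow" features, permuted copies that carry no signal. A minimal sketch of that idea on synthetic data, using |Pearson correlation| as a stand-in importance score (the study used random-forest importance, and a real analysis would use a Boruta implementation such as `boruta_py`):

```python
import numpy as np

rng = np.random.default_rng(0)

def shadow_feature_screen(X, y, importance, rng):
    """Boruta-style screening sketch: keep features whose importance
    exceeds the maximum importance achieved by any permuted 'shadow'
    copy of the features."""
    shadows = rng.permuted(X, axis=0)   # shuffle each column independently
    threshold = max(importance(shadows[:, j], y) for j in range(X.shape[1]))
    return [j for j in range(X.shape[1]) if importance(X[:, j], y) > threshold]

def abs_corr(x, y):
    """Stand-in importance score: absolute Pearson correlation."""
    return abs(np.corrcoef(x, y)[0, 1])

# Synthetic data: feature 0 is informative, features 1-4 are pure noise.
n = 400
X = rng.normal(size=(n, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=n) > 0).astype(float)
selected = shadow_feature_screen(X, y, abs_corr, rng)
```

Full Boruta iterates this comparison over many shadow rounds with statistical tests; the single-round version above only conveys the core shadow-threshold idea behind reducing 1500+ candidate radiomic features to a small subset.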


2021 ◽  
Vol 6 (1) ◽  
pp. e000898
Author(s):  
Andrea Peroni ◽  
Anna Paviotti ◽  
Mauro Campigotto ◽  
Luis Abegão Pinto ◽  
Carlo Alberto Cutolo ◽  
...  

Objective: To develop and test a deep learning (DL) model for semantic segmentation of the anatomical layers of the anterior chamber angle (ACA) in digital gonio-photographs.
Methods and analysis: We used a pilot dataset of 274 ACA sector images, annotated by expert ophthalmologists to delineate five anatomical layers: iris root, ciliary body band, scleral spur, trabecular meshwork and cornea. Narrow depth of field and peripheral vignetting prevented clinicians from annotating part of each image with sufficient confidence, introducing a degree of subjectivity and feature correlation into the ground truth. To overcome these limitations, we present a DL model designed and trained to perform two tasks simultaneously: (1) maximise segmentation accuracy within the annotated region of each frame and (2) identify a region of interest (ROI) based on local image informativeness. Moreover, our calibrated model provides interpretable results by returning pixel-wise classification uncertainty through Monte Carlo dropout.
Results: The model was trained and validated in a 5-fold cross-validation experiment on ~90% of the available data, achieving ~91% average segmentation accuracy within the annotated part of each ground truth image of the hold-out test set. An appropriate ROI was successfully identified in all test frames. The uncertainty estimation module correctly located inaccuracies and errors in the segmentation outputs.
Conclusion: The proposed model improves on the only previously published work on gonio-photograph segmentation and may be a valid support for automatic processing of these images to evaluate local tissue morphology. Uncertainty estimation is expected to facilitate acceptance of this system in clinical settings.
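Monte Carlo dropout estimates uncertainty by running the network several times with dropout active and measuring the spread of the predictions. A minimal numpy sketch with a stand-in stochastic "model" (not the paper's network); the predictive entropy of the averaged per-pixel class probabilities serves as the uncertainty map:

```python
import numpy as np

def mc_dropout_uncertainty(predict, x, n_samples=20):
    """Monte Carlo dropout sketch: run the stochastic model repeatedly,
    average the per-pixel class probabilities, and return the argmax
    labels plus predictive entropy as a pixel-wise uncertainty map."""
    probs = np.stack([predict(x) for _ in range(n_samples)])  # (T, H, W, C)
    mean = probs.mean(axis=0)
    entropy = -(mean * np.log(mean + 1e-12)).sum(axis=-1)
    return mean.argmax(axis=-1), entropy

# Stand-in stochastic model: fixed logits plus dropout-like noise,
# then a softmax over the class axis.
rng = np.random.default_rng(1)
def noisy_predict(x):
    logits = x + rng.normal(scale=0.5, size=x.shape)
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# A 1x2 "image" with 2 classes: pixel 0 is confident, pixel 1 ambiguous.
x = np.array([[[4.0, 0.0], [0.1, 0.0]]])
labels, unc = mc_dropout_uncertainty(noisy_predict, x, n_samples=200)
```

The ambiguous pixel ends up with higher entropy than the confident one, which is exactly the behaviour the paper exploits to flag likely segmentation errors.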


2020 ◽  
Author(s):  
Mohammed Ibrahim ◽  
Susan Gauch ◽  
Omar Salman ◽  
Mohammed Alqahatani

Background: Clear language makes communication easier between any two parties. A layman may have difficulty communicating with a professional due to not understanding the specialized terms common to the domain. In healthcare, it is rare to find a layman knowledgeable in medical jargon, which can lead to poor understanding of their condition and/or treatment. To bridge this gap, several professional vocabularies and ontologies have been created to map laymen's medical terms to professional medical terms and vice versa.
Objective: Many of these vocabularies are built manually or semi-automatically, requiring large investments of time and human effort, and consequently they grow slowly. In this paper, we present an automatic method to enrich laymen's vocabularies that has the benefit of being applicable to vocabularies in any domain.
Methods: Our entirely automatic approach uses machine learning, specifically Global Vectors for Word Embeddings (GloVe), on a corpus collected from a social media healthcare platform to extend and enhance consumer health vocabularies (CHVs). Our approach further improves the CHV by incorporating synonyms and hyponyms from the WordNet ontology. The basic GloVe and our novel algorithms incorporating WordNet were evaluated using two laymen datasets from the National Library of Medicine (NLM): the Open-Access Consumer Health Vocabulary (OAC CHV) and the MedlinePlus Healthcare Vocabulary.
Results: GloVe was able to find new laymen terms with an F-score of 48.44%. Furthermore, our enhanced GloVe approach outperformed basic GloVe with an average F-score of 61%, a relative improvement of 25%.
Conclusions: This paper presents an automatic approach to enriching consumer health vocabularies using GloVe word embeddings and an auxiliary lexical source, WordNet.
Our approach was evaluated using healthcare text downloaded from MedHelp.org, a healthcare social media platform, against two standard laymen vocabularies, OAC CHV and MedlinePlus. We used the WordNet ontology to expand the healthcare corpus by including synonyms, hyponyms, and hypernyms for each CHV layman term occurrence in the corpus. Given a seed term selected from a concept in the ontology, we measured our algorithms' ability to automatically extract synonyms for those terms that appeared in the ground truth concept. We found that enhanced GloVe outperformed GloVe with a relative improvement of 25% in the F-score.
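The core retrieval step in this kind of embedding-based enrichment is a cosine-similarity nearest-neighbour query: given a seed term's vector, rank every other vocabulary term by similarity and propose the top candidates. A toy sketch with hand-made vectors (real work would load GloVe vectors trained on the MedHelp corpus):

```python
import numpy as np

def nearest_terms(vocab, vectors, seed, k=2):
    """Rank candidate related terms by cosine similarity to a seed
    term's embedding and return the top-k (excluding the seed)."""
    V = np.asarray(vectors, float)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)   # unit-normalise rows
    sims = V @ V[vocab.index(seed)]                    # cosine similarities
    order = [i for i in np.argsort(-sims) if vocab[i] != seed]
    return [vocab[i] for i in order[:k]]

# Tiny hand-made embedding space, for illustration only.
vocab = ["hypertension", "high blood pressure", "fracture", "broken bone"]
vecs = [[0.90, 0.10, 0.00],
        [0.85, 0.15, 0.05],
        [0.00, 0.90, 0.40],
        [0.05, 0.88, 0.42]]
candidates = nearest_terms(vocab, vecs, "hypertension", k=1)
```

The WordNet step in the paper then augments the corpus (and hence the learned vectors) with synonyms, hyponyms, and hypernyms, so that queries like the one above surface more laymen synonyms for each professional seed term.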


2021 ◽  
pp. 096228022110605
Author(s):  
Luigi Lavazza ◽  
Sandro Morasca

Receiver Operating Characteristic curves have been widely used to represent the performance of diagnostic tests. The corresponding area under the curve, widely used to evaluate their performance quantitatively, has been criticized in several respects. Several proposals have been introduced to improve on the area under the curve by taking into account only specific regions of the Receiver Operating Characteristic space, that is, the plane to which Receiver Operating Characteristic curves belong. For instance, a region of interest can be delimited by setting specific thresholds for the true positive rate or the false positive rate. Different ways of setting the borders of the region of interest may result in completely different, even opposing, evaluations. In this paper, we present a method to define a region of interest in a rigorous and objective way, and compute a partial area under the curve that can be used to evaluate the performance of diagnostic tests. The method was originally conceived in the Software Engineering domain to evaluate the performance of methods that estimate the defectiveness of software modules. We compare this method with previous proposals. Our method allows the definition of regions of interest by setting acceptability thresholds on any kind of performance metric, and not just false positive rate and true positive rate: for instance, the region of interest can be determined by imposing that φ (also known as the Matthews Correlation Coefficient) is above a given threshold. We also show how to delimit the region of interest corresponding to acceptable costs, whenever the individual cost of false positives and false negatives is known. Finally, we demonstrate the effectiveness of the method by applying it to the Wisconsin Breast Cancer Data. We provide Python and R packages supporting the presented method.
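The idea of a metric-defined region of interest can be illustrated concretely: sweep the score threshold to trace the ROC curve, keep only the operating points where the chosen metric (here MCC) meets an acceptability threshold, and integrate the curve over that region. This sketch is an assumption-laden illustration of the concept, not the authors' released Python package:

```python
import numpy as np

def roc_curve_counts(scores, labels):
    """FPR, TPR and confusion counts at every descending-score cutoff."""
    order = np.argsort(-np.asarray(scores, float))
    y = np.asarray(labels)[order]
    P, N = y.sum(), len(y) - y.sum()
    tp, fp = np.cumsum(y), np.cumsum(1 - y)
    return fp / N, tp / P, tp, fp, P - tp, N - fp

def mcc(tp, fp, fn, tn):
    """Matthews Correlation Coefficient, elementwise over count arrays."""
    num = tp * tn - fp * fn
    den = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return np.where(den > 0, num / np.where(den > 0, den, 1.0), 0.0)

def partial_auc_above_mcc(scores, labels, phi_min):
    """Trapezoidal area under the ROC curve restricted to operating
    points whose MCC is at least phi_min."""
    fpr, tpr, tp, fp, fn, tn = roc_curve_counts(scores, labels)
    keep = mcc(tp, fp, fn, tn) >= phi_min
    f, t = fpr[keep], tpr[keep]
    if len(f) < 2:
        return 0.0
    return float(np.sum((t[1:] + t[:-1]) / 2 * np.diff(f)))

# Toy scores: 1 = positive (e.g. defective module), 0 = negative.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
pauc = partial_auc_above_mcc(scores, labels, 0.7)
```

Only the thresholds achieving MCC ≥ 0.7 contribute to the area, so the resulting partial AUC rewards a classifier only over its acceptable operating range.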


2021 ◽  
Vol 108 (Supplement_6) ◽  
Author(s):  
S Ganesananthan ◽  
S Ganesananthan ◽  
B S Simpson ◽  
J M Norris

Aim: Detection of suspected bladder cancer at diagnostic cystoscopy is challenging and depends on clinician skill. Artificial intelligence (AI) algorithms, specifically machine learning and deep learning, have shown promise in the accurate classification of pathological images in various specialties. However, the utility of AI for urothelial cancer diagnosis is unknown. Here, we aimed to systematically review the extant literature in this field and quantitatively summarise the role of these algorithms in bladder cancer detection.
Method: The EMBASE, PubMed and CENTRAL databases were searched up to December 22nd 2020, in accordance with the PRISMA guidelines, for studies that evaluated AI algorithms for cystoscopic diagnosis of bladder cancer. Random-effects meta-analysis was performed to summarise eligible studies. Risk of bias was assessed using the QUADAS-2 tool.
Results: Five of 6715 studies met the criteria for inclusion. Pooled sensitivity and specificity were 0.93 (95% CI 0.89–0.95) and 0.93 (95% CI 0.80–0.89), respectively. Pooled positive and negative likelihood ratios were 14 (95% CI 4.3–44) and 0.08 (95% CI 0.05–0.11), respectively. The pooled diagnostic odds ratio was 182 (95% CI 61–546). The summary AUC value was 0.95 (95% CI 0.93–0.97). No significant publication bias was noted.
Conclusions: AI algorithms performed very well in the detection of bladder cancer in this pooled analysis, with high sensitivity and specificity. However, as with other clinical uses of AI, further external validation through deployment in real clinical situations is essential to assess the true applicability of this novel technology.
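The likelihood ratios and diagnostic odds ratio reported above are simple functions of sensitivity and specificity. Note that the abstract's pooled values (14, 0.08, 182) come from random-effects pooling of per-study estimates, so the point-estimate arithmetic below only approximately reproduces them:

```python
def likelihood_ratios(sens, spec):
    """Positive/negative likelihood ratios and the diagnostic odds
    ratio implied by a sensitivity/specificity pair.
    PLR = sens / (1 - spec); NLR = (1 - sens) / spec; DOR = PLR / NLR."""
    plr = sens / (1 - spec)
    nlr = (1 - sens) / spec
    return plr, nlr, plr / nlr

# Using the pooled point estimates from the abstract (sens = spec = 0.93).
plr, nlr, dor = likelihood_ratios(0.93, 0.93)
```

With sensitivity and specificity both 0.93, this gives PLR ≈ 13.3, NLR ≈ 0.075 and DOR ≈ 177, consistent with the pooled estimates of 14, 0.08 and 182.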

