Developing Deep Learning Models for the Classification of Pediatric Elbow Radiographic Abnormalities of Comparable Performance to Physicians: Strategies for Model Optimization With Small Sized Development Sets

Author(s):  
Mark B. TAN ◽  
Russ Y. CHUA ◽  
Qiao FAN ◽  
Marielle V. FORTIER ◽  
Pearlly P. CHANG

Abstract Background To compare the performance of an AI model, built with strategies designed to overcome small sized development sets, with that of pediatric ER physicians at a classification triage task of pediatric elbow radiographs. Methods 1,314 pediatric elbow lateral radiographs (mean age: 8.2 years) were retrospectively retrieved, binomially classified based on their annotation as normal or abnormal (with pathology), and randomly partitioned into a development set (993 images), tuning set (109 images), second tuning set (100 images) and test set (112 images). The AI model was trained on the development set and utilized the EfficientNet B1 compound scaling network architecture and online augmentations. Its performance on the test set was compared to a group of five physicians (inter-rater agreement: fair). Statistical analysis: AUC of AI model - DeLong method. Performance of AI model and physician groups - McNemar test. Results Accuracy of the model on the test set - 0.804 (95% CI, 0.718 - 0.873), AUROC - 0.872 (95% CI, 0.831 - 0.947). AI model performance compared to the physician group on the test set - sensitivity 0.790 (95% CI 0.684 to 0.895) vs 0.649 (95% CI 0.525 to 0.773), p value 0.088; specificity 0.818 (95% CI 0.716 to 0.920) vs 0.873 (95% CI 0.785 to 0.961), p value 0.439. Conclusions The AI model for elbow radiograph triage designed with strategies to optimize performance for a small sized development set showed comparable performance to physicians.
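
As a worked illustration of the metrics named in this abstract, the sketch below computes sensitivity, specificity and accuracy from confusion-matrix counts and an exact two-sided McNemar p-value from discordant-pair counts. The abstract does not report the confusion matrix; the counts used here (45 true positives, 12 false negatives, 45 true negatives, 10 false positives) are an inference, chosen only because they sum to the 112 test images and reproduce the reported rates to two decimals. The function names and the discordant counts passed to `mcnemar_exact` are hypothetical, and the study's own statistical procedure may differ in detail.

```python
from math import comb


def triage_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b and c are the discordant-pair counts: cases one reader got right and
    the other got wrong, in each direction.
    """
    n = b + c
    if n == 0:
        return 1.0
    # Under the null hypothesis each discordant pair is a fair coin flip.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Assumed counts (not reported in the abstract), consistent with 112 test images.
sens, spec, acc = triage_metrics(tp=45, fn=12, tn=45, fp=10)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")

# Hypothetical discordant counts, purely to show the call.
print(f"McNemar exact p = {mcnemar_exact(b=3, c=11):.3f}")
```

The exact (binomial) form is used here because discordant counts on a test set of this size are small; a chi-square approximation would be an alternative for larger samples.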

Sensors ◽  
2020 ◽  
Vol 20 (8) ◽  
pp. 2381
Author(s):  
Dan Li ◽  
Kaifeng Zhang ◽  
Zhenbo Li ◽  
Yifei Chen

The statistical data of different kinds of behaviors of pigs can reflect their health status. However, behavior statistics for pigs have traditionally been obtained by people watching videos and recording what they see. In order to reduce labor and time consumption, this paper proposes a pig behavior recognition network, a spatiotemporal convolutional network based on the SlowFast network architecture, for behavior classification into five categories. Firstly, a pig behavior recognition video dataset (PBVD-5) was built by cutting short clips from 3 months of continuous video recording; it comprises five categories of pig behavior: feeding, lying, motoring, scratching and mounting. Subsequently, a SlowFast network based spatiotemporal convolutional network for the pig’s multi-behavior recognition (PMB-SCN) was proposed. Variant architectures of the PMB-SCN were implemented, and the optimal architecture was compared with the state-of-the-art single stream 3D convolutional network on our dataset. Our 3D pig behavior recognition network showed a top-1 accuracy of 97.63% and a views accuracy of 96.35% on the test set of PBVD and a top-1 accuracy of 91.87% and a views accuracy of 84.47% on a new test set collected from a completely different pigsty. The experimental results showed that this network generalizes well and opens the possibility of subsequent simultaneous pig detection and behavior recognition.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Bayu Adhi Nugroho

Abstract A common problem found in real-world medical image classification is the inherent imbalance of the positive and negative patterns in the dataset, where positive patterns are usually rare. Moreover, in the classification of multiple classes with a neural network, a training pattern is treated as a positive pattern in one output node and negative in all the remaining output nodes. In this paper, the weights of a training pattern in the loss function are designed based not only on the number of the training patterns in the class but also on the different nodes, where one of them treats this training pattern as positive and the others treat it as negative. We propose a combined approach of a weights calculation algorithm for deep network training and the training optimization from the state-of-the-art deep network architecture for the thorax diseases classification problem. Experimental results on the Chest X-Ray image dataset demonstrate that this new weighting scheme improves classification performance, and that the training optimization from EfficientNet improves performance further. We compare the aggregate method with several results from previous studies of thorax disease classification to provide a fair comparison against the proposed method.
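
The idea of weighting each training pattern by class frequency and by whether an output node sees it as positive or negative can be sketched as a weighted binary cross-entropy. This is a generic class-balanced loss, not the paper's weighting algorithm, which the abstract does not spell out. The function names and the choice of weights (positives weighted by the fraction of negatives, and the reverse) are assumptions made for illustration.

```python
import math


def class_balance_weights(labels):
    """Per-node (w_pos, w_neg) weights from a multi-label 0/1 matrix.

    labels: list of rows, one row per sample, one 0/1 entry per output node.
    Assumed scheme: positives are weighted by the share of negatives for
    that node, negatives by the share of positives.
    """
    n = len(labels)
    n_nodes = len(labels[0])
    weights = []
    for c in range(n_nodes):
        pos = sum(row[c] for row in labels)
        neg = n - pos
        weights.append((neg / n, pos / n))
    return weights


def weighted_bce(probs, targets, weights, eps=1e-7):
    """Mean weighted binary cross-entropy over the output nodes of one sample."""
    total = 0.0
    for p, t, (w_pos, w_neg) in zip(probs, targets, weights):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(w_pos * t * math.log(p) + w_neg * (1 - t) * math.log(1 - p))
    return total / len(probs)


# Toy label matrix: two output nodes, each positive in 1 of 4 samples.
toy_labels = [[1, 0], [0, 0], [0, 1], [0, 0]]
toy_weights = class_balance_weights(toy_labels)
print(toy_weights)
print(weighted_bce([0.1, 0.2], [1, 0], toy_weights))
```

With these weights, missing a rare positive costs more than an equally confident error on a common negative, which is the behaviour a class-imbalance weighting is meant to produce.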


2021 ◽  
Vol 09 (03) ◽  
pp. E388-E394
Author(s):  
Francesco Cocomazzi ◽  
Marco Gentile ◽  
Francesco Perri ◽  
Antonio Merla ◽  
Fabrizio Bossa ◽  
...  

Abstract Background and study aims The Paris classification of superficial colonic lesions has been widely adopted, but a simplified description that subgroups the shape into pedunculated, sessile/flat and depressed lesions has been proposed recently. The aim of this study was to evaluate the accuracy and inter-rater agreement among 13 Western endoscopists for the two classification systems. Methods Seventy video clips of superficial colonic lesions were classified according to the two classifications, and their size estimated. The interobserver agreement for each classification was assessed using both Cohen κ and AC1 statistics. Accuracy was taken as the concordance between the standard morphology definition and that made by participants. Sensitivity analyses investigated agreement between trainees (T) and staff members (SM), simple or mixed lesions, distinct lesion phenotypes, and for laterally spreading tumors (LSTs). Results Overall, the interobserver agreement for the Paris classification was substantial (κ = 0.61; AC1 = 0.66), with 79.3 % accuracy. Between SM and T, the values were superimposable. For size estimation, the agreement was 0.48 by the κ-value, and 0.50 by AC1. For single or mixed lesions, κ-values were 0.60 and 0.43, respectively; corresponding AC1 values were 0.68 and 0.57. Evaluating the several different polyp subtypes separately, agreement differed significantly when analyzed by the κ statistic (0.08–0.12) or the AC1 statistic (0.59–0.71). Analyses of LSTs provided a κ-value of 0.50 and an AC1 score of 0.62, with 77.6 % accuracy. The simplified classification outperformed the Paris classification: κ = 0.68, AC1 = 0.82, accuracy = 91.6 %. Conclusions Agreement is often measured with Cohen’s κ, but we documented higher levels of agreement when analyzed with the AC1 statistic. The level of agreement was substantial for the Paris classification, and almost perfect for the simplified system.
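
The gap between κ and AC1 reported for individual polyp subtypes can be illustrated with a two-rater sketch. The study involved 13 raters, so its statistics are presumably multi-rater generalisations; the version below is the simpler two-rater form of Cohen's κ and Gwet's AC1, and the ratings are invented toy data. Both statistics share the same observed agreement and differ only in how chance agreement is estimated.

```python
from collections import Counter


def observed_agreement(r1, r2):
    """Fraction of items on which the two raters give the same label."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def cohen_kappa(r1, r2):
    """Cohen's kappa for two raters (needs at least two observed categories)."""
    n = len(r1)
    c1, c2 = Counter(r1), Counter(r2)
    po = observed_agreement(r1, r2)
    # Chance agreement: product of the two raters' marginal proportions.
    pe = sum((c1[k] / n) * (c2[k] / n) for k in set(r1) | set(r2))
    return (po - pe) / (1 - pe)


def gwet_ac1(r1, r2):
    """Gwet's AC1 for two raters (needs at least two observed categories)."""
    n = len(r1)
    cats = set(r1) | set(r2)
    c1, c2 = Counter(r1), Counter(r2)
    po = observed_agreement(r1, r2)
    # Chance agreement: based on the average prevalence of each category.
    pi = [(c1[k] + c2[k]) / (2 * n) for k in cats]
    pe = sum(p * (1 - p) for p in pi) / (len(cats) - 1)
    return (po - pe) / (1 - pe)


# Toy data (hypothetical): 20 lesions, one category strongly dominant.
rater_a = ["sessile"] * 19 + ["depressed"]
rater_b = ["sessile"] * 18 + ["depressed", "sessile"]
print(f"observed agreement = {observed_agreement(rater_a, rater_b):.2f}")
print(f"kappa = {cohen_kappa(rater_a, rater_b):.3f}")
print(f"AC1   = {gwet_ac1(rater_a, rater_b):.3f}")
```

When one category dominates, κ's chance-agreement term approaches the observed agreement and κ collapses toward zero even though the raters agree on most items, whereas AC1 stays high. This is one plausible reading of the subtype-level discrepancy in the abstract, not a reconstruction of its analysis.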


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Ashwath Radhachandran ◽  
Anurag Garikipati ◽  
Nicole S. Zelin ◽  
Emily Pellegrini ◽  
Sina Ghandian ◽  
...  

Abstract Background Acute heart failure (AHF) is associated with significant morbidity and mortality. Effective patient risk stratification is essential to guiding hospitalization decisions and the clinical management of AHF. Clinical decision support systems can be used to improve predictions of mortality made in emergency care settings for the purpose of AHF risk stratification. In this study, several models for the prediction of seven-day mortality among AHF patients were developed by applying machine learning techniques to retrospective patient data from 236,275 total emergency department (ED) encounters, 1881 of which were considered positive for AHF and were used for model training and testing. The models used varying subsets of age, sex, vital signs, and laboratory values. Model performance was compared to the Emergency Heart Failure Mortality Risk Grade (EHMRG) model, a commonly used system for prediction of seven-day mortality in the ED with similar (or, in some cases, more extensive) inputs. Model performance was assessed in terms of area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity. Results When trained and tested on a large academic dataset, the best-performing model and EHMRG demonstrated test set AUROCs of 0.84 and 0.78, respectively, for prediction of seven-day mortality. Given only measurements of respiratory rate, temperature, mean arterial pressure, and FiO2, one model produced a test set AUROC of 0.83. Neither a logistic regression comparator nor a simple decision tree outperformed EHMRG. Conclusions A model using only the measurements of four clinical variables outperforms EHMRG in the prediction of seven-day mortality in AHF. With these inputs, the model could not be replaced by logistic regression or reduced to a simple decision tree without significant performance loss. 
In ED settings, this minimal-input risk stratification tool may assist clinicians in making critical decisions about patient disposition by providing early and accurate insights into individual patients’ risk profiles.
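
AUROC, the headline metric in this abstract, can be read as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case. The sketch below computes it directly from that definition by pairwise comparison, counting ties as half. The scores and labels are made-up toy values and the function name is hypothetical; the quadratic loop is meant for clarity, not for large datasets.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    scores: predicted risk for each case; labels: 1 for the event
    (e.g. death within seven days), 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical scores).
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
print(auroc(toy_scores, toy_labels))
```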


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Tuan D. Pham

Abstract Automated analysis of physiological time series is utilized for many clinical applications in medicine and life sciences. Long short-term memory (LSTM) is a deep recurrent neural network architecture used for classification of time-series data. Here time–frequency and time–space properties of time series are introduced as a robust tool for LSTM processing of long sequential data in physiology. Based on classification results obtained from two databases of sensor-induced physiological signals, the proposed approach has the potential for (1) achieving very high classification accuracy, (2) saving tremendous time for data learning, and (3) being cost-effective and user-comfortable for clinical trials by reducing multiple wearable sensors for data recording.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Olga Majewska ◽  
Charlotte Collins ◽  
Simon Baker ◽  
Jari Björne ◽  
Susan Windisch Brown ◽  
...  

Abstract Background Recent advances in representation learning have enabled large strides in natural language understanding; however, verbal reasoning remains a challenge for state-of-the-art systems. External sources of structured, expert-curated verb-related knowledge have been shown to boost model performance in different Natural Language Processing (NLP) tasks where accurate handling of verb meaning and behaviour is critical. The costliness and time required for manual lexicon construction have been a major obstacle to porting the benefits of such resources to NLP in specialised domains, such as biomedicine. To address this issue, we combine a neural classification method with expert annotation to create BioVerbNet. This new resource comprises 693 verbs assigned to 22 top-level and 117 fine-grained semantic-syntactic verb classes. We make this resource available complete with semantic roles and VerbNet-style syntactic frames. Results We demonstrate the utility of the new resource in boosting model performance in document- and sentence-level classification in biomedicine. We apply an established retrofitting method to harness the verb class membership knowledge from BioVerbNet and transform a pretrained word embedding space by pulling together verbs belonging to the same semantic-syntactic class. The BioVerbNet knowledge-aware embeddings surpass the non-specialised baseline by a significant margin on both tasks. Conclusion This work introduces the first large, annotated semantic-syntactic classification of biomedical verbs, providing a detailed account of the annotation process, the key differences in verb behaviour between the general and biomedical domain, and the design choices made to accurately capture the meaning and properties of verbs used in biomedical texts. The demonstrated benefits of leveraging BioVerbNet in text classification suggest the resource could help systems better tackle challenging NLP tasks in biomedicine.
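
The step of "pulling together verbs belonging to the same semantic-syntactic class" can be sketched as a simple iterative retrofitting update, in which each verb vector is repeatedly moved toward the average of its classmates while staying anchored to its original position. The abstract does not say which retrofitting method was used, so the update rule below is an assumption. The verbs, vectors and class are invented toy data, not BioVerbNet content.

```python
def retrofit(embeddings, classes, iterations=10, alpha=1.0):
    """Pull vectors of same-class words together (assumed update rule).

    embeddings: dict word -> list of floats (the pretrained vectors).
    classes: iterable of word collections; words sharing a class are neighbours.
    alpha: weight that anchors each word to its original vector.
    """
    original = {w: list(v) for w, v in embeddings.items()}
    current = {w: list(v) for w, v in embeddings.items()}

    # Build the neighbour list of every word from class membership.
    neighbours = {w: [] for w in embeddings}
    for members in classes:
        present = [w for w in members if w in embeddings]
        for w in present:
            neighbours[w].extend(x for x in present if x != w)

    for _ in range(iterations):
        for w, nbrs in neighbours.items():
            if not nbrs:
                continue  # words outside every class keep their vector
            beta = 1.0 / len(nbrs)  # neighbour weights sum to 1
            updated = []
            for d in range(len(original[w])):
                nbr_sum = sum(current[x][d] for x in nbrs)
                updated.append((alpha * original[w][d] + beta * nbr_sum) / (alpha + 1.0))
            current[w] = updated
    return current


# Toy vectors (hypothetical): two verbs in one class, one verb outside it.
toy_vectors = {"inhibit": [0.0, 0.0], "suppress": [2.0, 0.0], "express": [5.0, 5.0]}
toy_classes = [["inhibit", "suppress"]]
print(retrofit(toy_vectors, toy_classes))
```

The anchor term `alpha` keeps the retrofitted space close to the pretrained one, so class knowledge refines the embeddings rather than replacing them.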


Neurosurgery ◽  
2020 ◽  
Vol 67 (Supplement_1) ◽  
Author(s):  
Syed M Adil ◽  
Lefko T Charalambous ◽  
Kelly R Murphy ◽  
Shervin Rahimpour ◽  
Stephen C Harward ◽  
...  

Abstract INTRODUCTION Opioid misuse persists as a public health crisis affecting approximately one in four Americans.1 Spinal cord stimulation (SCS) is a neuromodulation strategy to treat chronic pain, with one goal being decreased opioid consumption. Accurate prognostication about SCS success is key in optimizing surgical decision making for both physicians and patients. Deep learning, using neural network models such as the multilayer perceptron (MLP), enables accurate prediction of non-linear patterns and has widespread applications in healthcare. METHODS The IBM MarketScan® (IBM) database was queried for all patients ≥ 18 years old undergoing SCS from January 2010 to December 2015. Patients were categorized into opioid dose groups as follows: No Use, ≤ 20 morphine milligram equivalents (MME), 20–50 MME, 50–90 MME, and >90 MME. We defined “opiate weaning” as moving into a lower opioid dose group (or remaining in the No Use group) during the 12 months following permanent SCS implantation. After pre-processing, there were 62 predictors spanning demographics, comorbidities, and pain medication history. We compared an MLP with four hidden layers to a logistic regression (LR) model with L1 regularization. Model performance was assessed using area under the receiver operating characteristic curve (AUC) with 5-fold nested cross-validation. RESULTS Ultimately, 6,124 patients were included, of whom 77% had used opioids for >90 days within the 1-year pre-SCS and 72% had used >5 types of medications during the 90 days prior to SCS. The mean age was 56 ± 13 years old. Collectively, 2,037 (33%) patients experienced opiate weaning. The AUC was 0.74 for the MLP and 0.73 for the LR model. CONCLUSION To our knowledge, we present the first use of deep learning to predict opioid weaning after SCS. Model performance was slightly better than regularized LR. Future efforts should focus on optimization of neural network architecture and hyperparameters to further improve model performance. 
Models should also be calibrated and externally validated on an independent dataset. Ultimately, such tools may assist both physicians and patients in predicting opioid dose reduction after SCS.


Information ◽  
2021 ◽  
Vol 12 (6) ◽  
pp. 248
Author(s):  
Simone Leonardi ◽  
Giuseppe Rizzo ◽  
Maurizio Morisio

In social media, users are spreading misinformation easily and without fact checking. In principle, they do not have a malicious intent, but their sharing leads to a socially dangerous diffusion mechanism. The motivations behind this behavior have been linked to a wide variety of social and personal outcomes, but these users are not easily identified. The existing solutions show how the analysis of linguistic signals in social media posts combined with the exploration of network topologies is effective in this field. These applications have some limitations, such as focusing solely on the fake news shared and not understanding the typology of the user spreading them. In this paper, we propose a computational approach to extract features from the social media posts of these users to recognize who is a fake news spreader for a given topic. Thanks to the CoAID dataset, we start the analysis with 300 K users engaged on an online micro-blogging platform; then, we enrich the dataset by extending it to a collection of more than 1 M share actions and their associated posts on the platform. The proposed approach processes a batch of Twitter posts authored by users of the CoAID dataset and turns them into a high-dimensional matrix of features, which are then exploited by a deep neural network architecture based on transformers to perform user classification. We demonstrate the effectiveness of our work by comparing the precision, recall, and F1 score of our model with different configurations and with a baseline classifier. We obtained an F1 score of 0.8076, an improvement of 4% over the state of the art.


Cancers ◽  
2021 ◽  
Vol 13 (7) ◽  
pp. 1615
Author(s):  
Ines P. Nearchou ◽  
Hideki Ueno ◽  
Yoshiki Kajiwara ◽  
Kate Lillard ◽  
Satsuki Mochizuki ◽  
...  

The categorisation of desmoplastic reaction (DR) present at the colorectal cancer (CRC) invasive front into mature, intermediate or immature type has been previously shown to have high prognostic significance. However, the lack of an objective and reproducible methodology for assessing DR has been a major hurdle to its clinical translation. In this study, a deep learning algorithm was trained to automatically classify immature DR on haematoxylin and eosin digitised slides of stage II and III CRC cases (n = 41). When assessing the classifier’s performance on a test set of patient samples (n = 40), a Dice score of 0.87 for the segmentation of myxoid stroma was reported. The classifier was then applied to the full cohort of 528 stage II and III CRC cases, which was then divided into a training (n = 396) and a test set (n = 132). Automatically classed DR was shown to have superior prognostic significance over the manually classed DR in both the training and test cohorts. The findings demonstrated that deep learning algorithms could be applied to assist pathologists in the detection and classification of DR in CRC in an objective, standardised and reproducible manner.
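
The Dice score quoted for the myxoid stroma segmentation measures overlap between a predicted mask and a reference mask: twice the number of pixels both masks mark, divided by the total marked in each. A minimal sketch on flattened binary masks follows. The masks are toy values, the function name is hypothetical, and returning 1.0 when both masks are empty is a convention assumed here, not something stated in the abstract.

```python
def dice_score(pred, truth):
    """Dice coefficient between two flattened binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * intersection / total


# Toy masks (hypothetical): 1 marks a pixel labelled as myxoid stroma.
predicted = [1, 1, 0, 0, 1, 0]
reference = [1, 0, 0, 0, 1, 1]
print(f"Dice = {dice_score(predicted, reference):.3f}")
```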


2021 ◽  
Vol 09 (06) ◽  
pp. E955-E964
Author(s):  
Ganggang Mu ◽  
Yijie Zhu ◽  
Zhanyue Niu ◽  
Shigang Ding ◽  
Honggang Yu ◽  
...  

Abstract Background and study aims Endoscopy plays a crucial role in diagnosis of gastritis. Endoscopists have low accuracy in diagnosing atrophic gastritis with white-light endoscopy (WLE). High-risk factors (such as atrophic gastritis [AG]) for carcinogenesis demand early detection. Deep learning (DL)-based gastritis classification with WLE rarely has been reported. We built a system for improving the accuracy of diagnosis of AG with WLE to assist with this common gastritis diagnosis and help lessen endoscopist fatigue. Methods We collected a total of 8141 endoscopic images of common gastritis, other gastritis, and non-gastritis in 4587 cases and built a DL-based system constructed with UNet++ and ResNet-50. A system was developed to sort common gastritis images layer by layer: The first layer included non-gastritis/common gastritis/other gastritis, the second layer contained AG/non-atrophic gastritis, and the third layer included atrophy/intestinal metaplasia and erosion/hemorrhage. The convolutional neural networks were tested with three separate test sets. Results Rates of accuracy for classifying non-atrophic gastritis/AG, atrophy/intestinal metaplasia, and erosion/hemorrhage were 88.78 %, 87.40 %, and 93.67 % in the internal test set, 91.23 %, 85.81 %, and 92.70 % in the external test set, and 95.00 %, 92.86 %, and 94.74 % in the video set, respectively. The hit ratio with the segmentation model was 99.29 %. The accuracy for detection of non-gastritis/common gastritis/other gastritis was 93.6 %. Conclusions The system had decent specificity and accuracy in classification of gastritis lesions. DL has great potential in WLE gastritis classification for assisting with achieving accurate diagnoses after endoscopic procedures.

