A highly accurate delta check method using deep learning for detection of sample mix-up in the clinical laboratory

Author(s):  
Rui Zhou ◽  
Yu-fang Liang ◽  
Hua-Li Cheng ◽  
Wei Wang ◽  
Da-wei Huang ◽  
...  

Abstract
Objectives: Delta check (DC) is widely used for detecting sample mix-up. Owing to inadequate error detection and a high false-positive rate, implementing DC in real-world settings is labor-intensive and rarely capable of absolute detection of sample mix-ups. The aim of this study was to develop a highly accurate DC method, based on a designed deep learning model, to detect sample mix-up.
Methods: A total of 22 routine hematology test items were adopted for the study. The hematology test results, collected from two hospital laboratories, were independently divided into training, validation, and test sets. From six mainstream algorithms evaluated, the Deep Belief Network (DBN) was selected to learn error-free and artificially (intentionally) mixed sample results. The model's analytical performance was evaluated using the training and test sets, and its clinical validity was evaluated by comparison with three well-recognized statistical methods.
Results: When the accuracy of our model on the training set reached 0.931 at the 22nd epoch, the corresponding accuracy on the validation set was 0.922. The loss values for the training and validation sets showed a similar trend over time. The accuracy on the test set was 0.931, and the area under the receiver operating characteristic curve was 0.977. The DBN outperformed the three comparator statistical methods: the accuracy of the DBN and of the revised weighted delta check (RwCDI) was 0.931 and 0.909, respectively, and the DBN performed significantly better than RCV and EDC. Across all test items, the absolute-difference form of DC yielded higher accuracy than the relative-difference form for every method.
Conclusions: The findings indicate that inputting a group of hematology test items provides more comprehensive information for the accurate detection of sample mix-up by machine learning (ML) than a single-test-item input.
The DC method based on DBN demonstrated highly effective sample mix-up identification performance in real-world clinical settings.
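As a toy illustration of the two delta-check modes compared above, the sketch below contrasts the absolute and relative difference between consecutive results for a single test item and flags a possible mix-up when a limit is exceeded. The function names and limits are hypothetical, not the study's; the DBN model itself is not reproduced here.

```python
def delta_absolute(current, previous):
    """Absolute delta check: magnitude of change between consecutive results."""
    return abs(current - previous)

def delta_relative(current, previous):
    """Relative delta check: percent change from the previous result."""
    return abs(current - previous) / previous * 100

def flag_mixup(current, previous, limit, mode="absolute"):
    """Flag a possible sample mix-up when the chosen delta exceeds a limit."""
    if mode == "absolute":
        delta = delta_absolute(current, previous)
    else:
        delta = delta_relative(current, previous)
    return delta > limit
```

In practice a laboratory would apply item-specific limits per analyte; the study's finding is that the absolute-difference form was the more accurate input representation across methods.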

Cancers ◽  
2021 ◽  
Vol 14 (1) ◽  
pp. 12
Author(s):  
Jose M. Castillo T. ◽  
Muhammad Arif ◽  
Martijn P. A. Starmans ◽  
Wiro J. Niessen ◽  
Chris H. Bangma ◽  
...  

The computer-aided analysis of prostate multiparametric MRI (mpMRI) could improve significant-prostate-cancer (PCa) detection. Various deep-learning- and radiomics-based methods for significant-PCa segmentation or classification have been reported in the literature. To assess the generalizability of these methods' performance, the use of various external data sets is crucial. While deep-learning and radiomics approaches have been compared on the same single-center data set, a comparison of both approaches on data sets from different centers and different scanners is lacking. The goal of this study was to compare the performance of a deep-learning model with that of a radiomics model for significant-PCa diagnosis across various patient cohorts. We included data from two consecutive patient cohorts from our own center (n = 371 patients) and two external sets, one a publicly available patient cohort (n = 195 patients) and the other containing data from patients of two hospitals (n = 79 patients). For all patients, mpMRI scans, radiologist tumor delineations, and pathology reports were collected. During training, one of our patient cohorts (n = 271 patients) was used for both deep-learning- and radiomics-model development, and the three remaining cohorts (n = 374 patients) were kept as unseen test sets. The models' performances were assessed in terms of the area under the receiver-operating-characteristic curve (AUC). Whereas internal cross-validation showed a higher AUC for the deep-learning approach, the radiomics model obtained AUCs of 0.88, 0.91 and 0.65 on the independent test sets, compared to AUCs of 0.70, 0.73 and 0.44 for the deep-learning model.
Our radiomics model that was based on delineated regions resulted in a more accurate tool for significant-PCa classification in the three unseen test sets when compared to a fully automated deep-learning model.
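The AUC used above to compare the two models can be computed directly from labels and scores as a Mann-Whitney statistic, the probability that a randomly chosen positive case is scored above a randomly chosen negative one. A minimal sketch, not the authors' code:

```python
def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Counts, over all positive/negative pairs, how often the positive
    case receives the higher score (ties count half).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

This pairwise formulation makes clear why an AUC of 0.44, as reported for the deep-learning model on one external set, is worse than chance (0.5) on that cohort.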


2021 ◽  
Author(s):  
Peng Zhang ◽  
Fan Lin ◽  
Fei Ma ◽  
Yuting Chen ◽  
Daowen Wang ◽  
...  

Summary
Background: With the increasing demand for atrial fibrillation (AF) screening, clinicians spend a significant amount of time identifying AF signals in the massive electrocardiogram (ECG) data produced by long-term dynamic ECG monitoring. In this study, we aimed to reduce clinicians' workload and promote AF screening by using artificial intelligence (AI) to automatically detect AF episodes and identify AF patients in 24 h Holter recordings.
Methods: We used a total of 22,979 Holter recordings (24 h) from 22,757 adult patients and established accurate annotations for AF by cardiologists. First, a randomized clinical cohort of 3,000 recordings (1,500 AF and 1,500 non-AF) from 3,000 patients recorded between April 2012 and May 2020 was collected and randomly divided into training, validation and test sets (10:1:4). Then, a deep-learning-based AI model was developed to automatically detect AF episodes using RR intervals and was evaluated on the test set. Based on the episode detection results, AF patients were automatically identified using a criterion of at least one AF episode of 6 min or longer. Finally, the clinical effectiveness of the model was verified with an independent real-world test set of 19,979 recordings (1,006 AF and 18,973 non-AF) from 19,757 consecutive patients recorded between June 2020 and January 2021.
Findings: Our model achieved high performance for AF episode detection in both test sets (sensitivity: 0.992 and 0.972; specificity: 0.997 and 0.997, respectively). It also achieved high performance for AF patient identification in both test sets (sensitivity: 0.993 and 0.994; specificity: 0.990 and 0.973, respectively). Moreover, it obtained superior and consistent performance on an external public database.
Interpretation: Our AI model can automatically identify AF in long-term ECG recordings with high accuracy.
This cost-effective strategy may promote AF screening by improving diagnostic effectiveness and reducing clinical workload.
Research in context
Evidence before this study: We searched Google Scholar and PubMed for research articles on artificial-intelligence-based diagnosis of atrial fibrillation (AF) published in English between Jan 1, 2016 and Aug 1, 2021, using the search terms "deep learning" OR "deep neural network" OR "machine learning" OR "artificial intelligence" AND "atrial fibrillation". We found that most previous deep learning models for AF detection were trained and validated on benchmark datasets (such as the PhysioNet database, the Massachusetts Institute of Technology Beth Israel Hospital AF database or the Long-Term AF database), which contained fewer than 100 patients or only short ECG segments (30–60 s). Our search did not identify any articles that explored deep neural networks for AF detection in a large real-world dataset of 24 h Holter recordings, nor any that could automatically identify patients with AF in 24 h Holter recordings.
Added value of this study: First, long-term Holter monitoring is the main method of AF screening; however, most previous studies of automatic AF detection were tested mainly on short ECG recordings. This work focused on 24 h Holter recordings and achieved high accuracy in detecting AF episodes. Second, AF episode detection does not automatically translate into AF patient identification in 24 h Holter recordings, since there is at present no well-recognized criterion for automatically identifying AF patients. We therefore established a criterion that identifies AF patients by at least one AF episode of 6 min or longer, as this condition has been linked to a significantly increased risk of thromboembolism. Using this criterion, our method identified AF patients with high accuracy. Finally, and more importantly, our model was trained on a randomized clinical dataset and tested on an independent real-world clinical dataset, showing great potential for clinical application. We did not exclude rare or special cases from the real-world dataset, so as not to inflate our AF detection performance. To the best of our knowledge, this is the first study to automatically identify both AF episodes and AF patients in 24 h Holter recordings from a large real-world clinical dataset.
Implications of all the available evidence: Our deep learning model automatically identified AF patients with high accuracy in 24 h Holter recordings and was verified on real-world data; it can therefore be embedded into the Holter analysis system and deployed at the clinical level to assist the decision making of the Holter analysis system and clinicians. This approach can help improve the efficiency of AF screening and reduce the cost of AF diagnosis. In addition, our RR-interval-based model achieved comparable or better performance than the raw-ECG-based method and can be widely applied to medical devices that collect heartbeat information, including not only multi-lead and single-lead Holter devices but also other wearable devices that can reliably measure heartbeat signals.
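The patient-level criterion described above (at least one AF episode of 6 min or longer) can be sketched from per-beat RR intervals and per-beat AF labels. The function names and the way episodes are grouped are illustrative assumptions, not the paper's implementation:

```python
def af_episodes(rr_intervals_s, beat_is_af):
    """Group consecutive AF-labelled beats into episodes and return
    each episode's duration in seconds (sum of its RR intervals)."""
    durations, current = [], 0.0
    for rr, is_af in zip(rr_intervals_s, beat_is_af):
        if is_af:
            current += rr
        elif current > 0:
            durations.append(current)
            current = 0.0
    if current > 0:
        durations.append(current)
    return durations

def is_af_patient(rr_intervals_s, beat_is_af, min_episode_s=360):
    """Apply the study's criterion: at least one AF episode of >= 6 min."""
    return any(d >= min_episode_s for d in af_episodes(rr_intervals_s, beat_is_af))
```

The 360 s threshold mirrors the 6 min criterion the authors chose for its association with thromboembolic risk.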


2020 ◽  
Author(s):  
Stephen Charles Van Hedger ◽  
Ingrid Johnsrude ◽  
Laura Batterink

Listeners are adept at extracting regularities from the environment, a process known as statistical learning (SL). SL has generally been assumed to be a form of "context-free" learning that occurs independently of prior knowledge, and SL experiments typically involve exposing participants to presumed novel regularities, such as repeating nonsense words. However, recent work has called this assumption into question, demonstrating that learners' previous language experience can considerably influence SL performance. In the present experiment, we tested whether previous knowledge also shapes SL in a non-linguistic domain, using a paradigm that involves extracting regularities over tone sequences. Participants learned novel tone sequences, which consisted of pitch intervals not typically found in Western music. For one group of participants, the tone sequences used artificial, computerized instrument sounds. For the other group, the same tone sequences used familiar instrument sounds (piano or violin). Knowledge of the statistical regularities was assessed using both trained sounds (measuring specific learning) and sounds that differed in pitch range and/or instrument (measuring transfer learning). In a follow-up experiment, two additional testing sessions were administered to gauge retention of learning (one day and approximately one week post-training). Compared to artificial instruments, training on sequences played by familiar instruments resulted in reduced correlations among test items, reflecting more idiosyncratic performance. Across all three testing sessions, learning of novel regularities presented with familiar instruments was worse compared to unfamiliar instruments, suggesting that prior exposure to music produced by familiar instruments interfered with new sequence learning.
Overall, these results demonstrate that real-world experience influences SL in a non-linguistic domain, supporting the view that SL involves the continuous updating of existing representations, rather than the establishment of entirely novel ones.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Ling-Ping Cen ◽  
Jie Ji ◽  
Jian-Wei Lin ◽  
Si-Tong Ju ◽  
Hong-Jie Lin ◽  
...  

Abstract
Retinal fundus diseases can lead to irreversible visual impairment without timely diagnosis and appropriate treatment. Single-disease deep learning algorithms have been developed for the detection of diabetic retinopathy, age-related macular degeneration, and glaucoma. Here, we developed a deep learning platform (DLP) capable of detecting multiple common referable fundus diseases and conditions (39 classes) using 249,620 fundus images annotated with 275,543 labels from heterogeneous sources. Our DLP achieved a frequency-weighted average F1 score of 0.923, sensitivity of 0.978, specificity of 0.996 and area under the receiver operating characteristic curve (AUC) of 0.9984 for multi-label classification on the primary test dataset, reaching the average level of retina specialists. An external multihospital test, a public-data test and a tele-reading application also showed high efficiency for the detection of multiple retinal diseases and conditions. These results indicate that our DLP can be applied for retinal fundus disease triage, especially in remote areas around the world.


Diagnostics ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 1127
Author(s):  
Ji Hyung Nam ◽  
Dong Jun Oh ◽  
Sumin Lee ◽  
Hyun Joo Song ◽  
Yun Jeong Lim

Capsule endoscopy (CE) quality control requires an objective scoring system to evaluate the preparation of the small bowel (SB). We propose a deep learning algorithm to calculate SB cleansing scores and verify the algorithm’s performance. A 5-point scoring system based on clarity of mucosal visualization was used to develop the deep learning algorithm (400,000 frames; 280,000 for training and 120,000 for testing). External validation was performed using additional CE cases (n = 50), and average cleansing scores (1.0 to 5.0) calculated using the algorithm were compared to clinical grades (A to C) assigned by clinicians. Test results obtained using 120,000 frames exhibited 93% accuracy. The separate CE cases exhibited substantial agreement between the deep learning algorithm scores and clinicians’ assessments (Cohen’s kappa: 0.672). In the external validation, the cleansing score decreased with worsening clinical grade (scores of 3.9, 3.2, and 2.5 for grades A, B, and C, respectively, p < 0.001). Receiver operating characteristic curve analysis revealed that a cleansing score cut-off of 2.95 indicated clinically adequate preparation. This algorithm provides an objective and automated cleansing score for evaluating SB preparation for CE. The results of this study will serve as clinical evidence supporting the practical use of deep learning algorithms for evaluating SB preparation quality.
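A minimal sketch of how a per-study cleansing grade could follow from per-frame scores and the reported 2.95 cut-off. The aggregation by simple averaging and the function name are assumptions for illustration, not the authors' pipeline:

```python
def cleansing_grade(frame_scores, cutoff=2.95):
    """Average per-frame cleansing scores (1-5) over a capsule study and
    apply the reported ROC-derived cut-off to judge SB preparation."""
    avg = sum(frame_scores) / len(frame_scores)
    return avg, ("adequate" if avg >= cutoff else "inadequate")
```

For example, a study averaging 3.9 (clinical grade A in the abstract) would be judged adequate, while one averaging 2.5 (grade C) would not.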


Animals ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 1549
Author(s):  
Robert D. Chambers ◽  
Nathanael C. Yoder ◽  
Aletha B. Carson ◽  
Christian Junge ◽  
David E. Allen ◽  
...  

Collar-mounted canine activity monitors can use accelerometer data to estimate dog activity levels, step counts, and distance traveled. With recent advances in machine learning and embedded computing, much more nuanced and accurate behavior classification has become possible, giving these affordable consumer devices the potential to improve the efficiency and effectiveness of pet healthcare. Here, we describe a novel deep learning algorithm that classifies dog behavior at sub-second resolution using commercial pet activity monitors. We built machine learning training databases from more than 5000 videos of more than 2500 dogs and ran the algorithms in production on more than 11 million days of device data. We then surveyed project participants representing 10,550 dogs, who provided 163,110 event responses to validate real-world detection of eating and drinking behavior. The resultant algorithm displayed high sensitivity and specificity for detecting drinking behavior (0.949 and 0.999, respectively) and eating behavior (0.988, 0.983). We also demonstrated detection of licking (0.772, 0.990), petting (0.305, 0.991), rubbing (0.729, 0.996), scratching (0.870, 0.997), and sniffing (0.610, 0.968). We show that the devices’ position on the collar had no measurable impact on performance. In production, users reported a true positive rate of 95.3% for eating (among 1514 users), and of 94.9% for drinking (among 1491 users). The study demonstrates the accurate detection of important health-related canine behaviors using a collar-mounted accelerometer. We trained and validated our algorithms on a large and realistic training dataset, and we assessed and confirmed accuracy in production via user validation.
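The per-behavior sensitivity/specificity pairs reported above follow the standard definitions; a minimal reference implementation for binary labels (not the authors' evaluation code):

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity (recall on positives) and specificity (recall on
    negatives) from paired binary ground-truth and predicted labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)
```

The contrast between, say, petting (sensitivity 0.305, specificity 0.991) and drinking (0.949, 0.999) shows how a behavior can be rarely missed as a false alarm yet frequently missed as a detection.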


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Albert T. Young ◽  
Kristen Fernandez ◽  
Jacob Pfau ◽  
Rasika Reddy ◽  
Nhat Anh Cao ◽  
...  

Abstract
Artificial intelligence models match or exceed dermatologists in melanoma image classification. Less is known about their robustness against real-world variations, and clinicians may incorrectly assume that a model with an acceptable area under the receiver operating characteristic curve or related performance metric is ready for clinical use. Here, we systematically assessed the performance of dermatologist-level convolutional neural networks (CNNs) on real-world non-curated images by applying computational “stress tests”. Our goal was to create a proxy environment in which to comprehensively test the generalizability of off-the-shelf CNNs developed without training or evaluation protocols specific to individual clinics. We found inconsistent predictions on images captured repeatedly in the same setting or subjected to simple transformations (e.g., rotation). Such transformations resulted in false positive or negative predictions for 6.5–22% of skin lesions across test datasets. Our findings indicate that models meeting conventionally reported metrics need further validation with computational stress tests to assess clinic readiness.
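A "stress test" of the kind described, checking prediction stability under simple label-preserving transformations such as rotation, can be sketched as follows. The classifier argument is a placeholder and the instability metric is an illustration, not the authors' protocol:

```python
def rotate90(img):
    """Rotate a 2-D image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def stress_test(classify, images, transforms):
    """Fraction of images whose predicted label changes under any of the
    given label-preserving transforms (prediction instability rate)."""
    unstable = 0
    for img in images:
        base = classify(img)
        if any(classify(t(img)) != base for t in transforms):
            unstable += 1
    return unstable / len(images)
```

A rotation-invariant classifier scores 0 instability under `rotate90`, whereas one keyed to pixel positions flips on some inputs, mirroring the 6.5-22% flip rates the study reports for real CNNs.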


Electronics ◽  
2021 ◽  
Vol 10 (10) ◽  
pp. 1161
Author(s):  
Kuo-Hao Fanchiang ◽  
Yen-Chih Huang ◽  
Cheng-Chien Kuo

The safety of electric power networks depends on the health of the transformer. However, once any of a variety of transformer failures occurs, it will not only reduce the reliability of the power system but can also cause major accidents and huge economic losses. Many diagnosis methods have been proposed to monitor the operation of the transformer, but most cannot detect and diagnose faults online and are prone to noise interference and high maintenance cost, which hinders real-time monitoring of the transformer. This paper presents a full-time online fault monitoring system for cast-resin transformers and proposes an overheating fault diagnosis method based on infrared thermography (IRT) images. First, normal and fault IRT images of the cast-resin transformer are collected by the proposed thermal-camera monitoring system. Next, the Wasserstein Autoencoder Reconstruction (WAR) model and the Differential Image Classification (DIC) model are trained. The differential image is obtained as the pixel-wise absolute difference between real images and regenerated images. Finally, in the test phase, the well-trained WAR and DIC models are connected in series to form a fault diagnosis module. Compared with existing deep learning algorithms, the experimental results demonstrate the advantages of the proposed model, which achieves comprehensive performance: a lightweight design, small storage size, rapid inference time and adequate diagnostic accuracy.
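The differential image that links the WAR and DIC stages is just the pixel-wise absolute difference between a real IRT image and its autoencoder reconstruction; large residuals suggest regions the model could not reproduce, i.e. candidate fault regions. A minimal sketch on plain 2-D lists (the function name is illustrative):

```python
def differential_image(real, reconstructed):
    """Pixel-wise absolute difference between a real IRT image and its
    autoencoder reconstruction, both given as 2-D lists of intensities."""
    return [[abs(a - b) for a, b in zip(row_r, row_g)]
            for row_r, row_g in zip(real, reconstructed)]
```

In the paper's pipeline this residual map, rather than the raw image, is what the DIC classifier consumes.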


Cancers ◽  
2021 ◽  
Vol 13 (12) ◽  
pp. 2866
Author(s):  
Fernando Navarro ◽  
Hendrik Dapper ◽  
Rebecca Asadpour ◽  
Carolin Knebel ◽  
Matthew B. Spraker ◽  
...  

Background: In patients with soft-tissue sarcomas (STS), tumor grading is a decisive factor in determining the best treatment. Tumor grading is obtained by pathological work-up of focal biopsies. Deep learning (DL)-based imaging analysis may offer an alternative way to characterize STS tissue. In this work, we sought to non-invasively differentiate tumor grading into low-grade (G1) and high-grade (G2/G3) STS using DL techniques based on MR imaging. Methods: Contrast-enhanced T1-weighted fat-saturated (T1FSGd) MRI sequences and fat-saturated T2-weighted (T2FS) sequences were collected from two independent retrospective cohorts (training: 148 patients; testing: 158 patients). Tumor grading was determined on pre-therapeutic biopsies following the French Federation of Cancer Centers Sarcoma Group system. DL models were developed using transfer learning based on the DenseNet 161 architecture. Results: The T1FSGd- and T2FS-based DL models achieved area under the receiver operating characteristic curve (AUC) values of 0.75 and 0.76 on the test cohort, respectively. T1FSGd achieved the best F1-score of all models (0.90). The T2FS-based DL model was able to significantly risk-stratify for overall survival. Attention maps revealed relevant features within the tumor volume and in border regions. Conclusions: MRI-based DL models are capable of predicting tumor grading with good reproducibility in external validation.


Neurosurgery ◽  
2020 ◽  
Vol 67 (Supplement_1) ◽  
Author(s):  
Syed M Adil ◽  
Lefko T Charalambous ◽  
Kelly R Murphy ◽  
Shervin Rahimpour ◽  
Stephen C Harward ◽  
...  

Abstract
INTRODUCTION: Opioid misuse persists as a public health crisis affecting approximately one in four Americans.1 Spinal cord stimulation (SCS) is a neuromodulation strategy for treating chronic pain, with one goal being decreased opioid consumption. Accurate prognostication of SCS success is key to optimizing surgical decision making for both physicians and patients. Deep learning, using neural network models such as the multilayer perceptron (MLP), enables accurate prediction of non-linear patterns and has widespread applications in healthcare.
METHODS: The IBM MarketScan® (IBM) database was queried for all patients ≥ 18 years old undergoing SCS from January 2010 to December 2015. Patients were categorized into opioid dose groups as follows: No Use, ≤ 20 morphine milligram equivalents (MME), 20–50 MME, 50–90 MME, and > 90 MME. We defined “opiate weaning” as moving into a lower opioid dose group (or remaining in the No Use group) during the 12 months following permanent SCS implantation. After pre-processing, there were 62 predictors spanning demographics, comorbidities, and pain medication history. We compared an MLP with four hidden layers to a logistic regression (LR) model with L1 regularization. Model performance was assessed using the area under the receiver operating characteristic curve (AUC) with 5-fold nested cross-validation.
RESULTS: Ultimately, 6,124 patients were included, of whom 77% had used opioids for > 90 days within the year before SCS and 72% had used > 5 types of medications during the 90 days prior to SCS. The mean age was 56 ± 13 years. Collectively, 2,037 (33%) patients experienced opiate weaning. The AUC was 0.74 for the MLP and 0.73 for the LR model.
CONCLUSION: To our knowledge, we present the first use of deep learning to predict opioid weaning after SCS. Model performance was slightly better than regularized LR. Future efforts should focus on optimizing the neural network architecture and hyperparameters to further improve model performance.
Models should also be calibrated and externally validated on an independent dataset. Ultimately, such tools may assist both physicians and patients in predicting opioid dose reduction after SCS.
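The dose-group and weaning definitions above can be made concrete. The sketch below encodes the study's ordinal MME groups and the "opiate weaning" outcome; the handling of boundaries at exactly 20/50/90 MME is an assumption, since the abstract does not state which group the boundary values fall into:

```python
def dose_group(mme):
    """Map mean daily morphine milligram equivalents (MME) to the study's
    ordinal dose groups: 0 = No Use, 1 = <=20, 2 = 20-50, 3 = 50-90, 4 = >90."""
    if mme == 0:
        return 0
    if mme <= 20:
        return 1
    if mme <= 50:
        return 2
    if mme <= 90:
        return 3
    return 4

def weaned(pre_mme, post_mme):
    """'Opiate weaning': moving into a lower dose group (or remaining in
    the No Use group) during the 12 months after SCS implantation."""
    pre, post = dose_group(pre_mme), dose_group(post_mme)
    return post < pre or (pre == 0 and post == 0)
```

This binary outcome is what the MLP and L1-regularized LR models were trained to predict from the 62 pre-processed predictors.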

