Training machine learning models on patient level data segregation is crucial in practical clinical applications

Mapping Intimacies ◽

10.1101/2020.04.23.20076406 ◽

2020 ◽

Author(s):

Mustafa Umit Oner ◽

Yi-Chih Cheng ◽

Hwee Kuan Lee ◽

Wing-Kin Sung

Keyword(s):

Neural Network ◽

Machine Learning ◽

Real World ◽

Deep Neural Network ◽

Strongly Correlated ◽

Training Set ◽

Test Set ◽

The Real ◽

Patient Level ◽

Test Sets

This article discusses the effect of segregating histopathology image data into three sets: a training set for training the machine learning model, a validation set for model selection, and a test set for testing model performance. We found that one must be cautious when segregating histological image data (slides) into training, validation and test sets, because subtle mishandling of the data can introduce data leakage and give illusorily good results on the test set. We performed this study on gene mutation prediction using the deep neural network from the paper of Coudray et al. [1]. Using the provided code and the same data, we discovered that the data segregation method of that paper suffered from a data leakage problem [2]. The paper pools all the slides from all patients and then segregates them exclusively into training, validation and test sets, so that no slide is used in more than one set. This seems to be a clean separation of the data. However, the paper did not consider that some slides are strongly correlated. For example, if the tumor of a patient is cut and stained to produce multiple slides, these slides are strongly correlated. If one slide is used for training and another for testing, the deep neural network can essentially memorize the pattern on the training slide and apply this memory to the test slide. Hence, by memorization, the network can predict very well on the slide in the test set. This mechanism of prediction is not useful in a practical clinical setting, since no two tumors are the same in the real world. In a real setting, we require the deep neural network to generalize across patients and tumors. Hereafter, we call this way of data segregation slide-level segregation.
There is a better way to perform data segregation that is compatible with the deployment of deep learning models in practical clinical settings. First, the patients are segregated exclusively into training, validation and test sets. All the slides belonging to the patients in the training set are used solely for training. Similarly, all the slides belonging to the patients in the test set are used for testing only. Segregating the data in this way forces the deep neural network to generalize across patients. We call this way of data segregation patient-level segregation.
In the analysis of the slide-level segregation approach, we obtained results similar to those presented in the paper by Coudray et al. [1]: overall performance on the test set was good. However, it was illusory due to data leakage. The model gave very good test results on slides that come from a patient who also has slides in the training set. On the other hand, the test results were quite bad on slides that come from a patient who does not have any slides in the training set. Hereafter, we call a slide in the test set seen-patient data if the corresponding patient also has slides in the training set, and unseen-patient data if the corresponding patient does not. Furthermore, we analyzed the performance of the model on data segregated by the patient-level segregation approach. Note that, in this approach, all patients in the test set are unseen by the model, which mimics the real-world clinical workflow. We observed a significant drop in performance on the test set of the patient-level segregation approach compared with the test set of the slide-level segregation approach. Moreover, the performance on the patient-level test set was very similar to the performance on the unseen-patient data in the slide-level test set.
Hence, we conclude that the patient-level segregation approach is crucial and appropriate for simulating the real-world scenario, where each patient in the test set can be thought of as a patient walking into the clinic tomorrow.
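
The patient-level segregation described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the 60/20/20 fractions, and the toy slide identifiers are all assumptions made for illustration. The idea is only that whole patients, never individual slides, are assigned to a set, and that a helper can report any seen-patient leakage.

```python
import random


def patient_level_split(slide_to_patient, fractions=(0.6, 0.2, 0.2), seed=0):
    """Assign whole patients (never single slides) to train/val/test.

    `fractions` is a hypothetical choice; the abstract does not state one.
    """
    patients = sorted(set(slide_to_patient.values()))
    random.Random(seed).shuffle(patients)

    n_train = round(fractions[0] * len(patients))
    n_val = round(fractions[1] * len(patients))
    subset_of = {}
    for i, patient in enumerate(patients):
        if i < n_train:
            subset_of[patient] = "train"
        elif i < n_train + n_val:
            subset_of[patient] = "val"
        else:
            subset_of[patient] = "test"

    # Every slide follows its patient, so correlated slides stay together.
    splits = {"train": [], "val": [], "test": []}
    for slide, patient in sorted(slide_to_patient.items()):
        splits[subset_of[patient]].append(slide)
    return splits


def seen_patients(splits, slide_to_patient):
    """Patients with slides in both the training and the test set (leakage)."""
    train = {slide_to_patient[s] for s in splits["train"]}
    test = {slide_to_patient[s] for s in splits["test"]}
    return train & test


# Hypothetical toy data: 10 patients with 2 slides each.
slide_to_patient = {f"p{p}_s{s}": f"p{p}" for p in range(10) for s in range(2)}
splits = patient_level_split(slide_to_patient)
print({name: len(slides) for name, slides in splits.items()})
print("patients leaking into test:", seen_patients(splits, slide_to_patient))
```

Running `seen_patients` on a slide-level split of the same data would typically return a non-empty set, which is the leakage the abstract warns about.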

Neural Network Model for Assessing the Physical and Mechanical Properties of a Metal Material Based on Deep Learning

Journal of Digital Science ◽

10.33847/2686-8296.2.1_2 ◽

2020 ◽

pp. 18-28

Author(s):

Andrei Kliuev ◽

Roman Klestov ◽

Valerii Stolbov

Keyword(s):

Neural Network ◽

Mechanical Properties ◽

Deep Neural Network ◽

Physical And Mechanical Properties ◽

Training Set ◽

Test Set ◽

Algorithmic Stability ◽

Test Sets ◽

Trained Network ◽

Basic Test

The paper investigates the algorithmic stability of training a deep neural network for recognition of material microstructure. It is shown that at an 8% quantitative deviation in the basic test set, the trained network loses stability. This means that with such a quantitative or qualitative deviation in the training or test sets, the results obtained with the trained network can hardly be trusted. Although the results of this study apply to a particular case, i.e. microstructure recognition using ResNet-152, the authors propose a cheaper method for studying stability based on analysis of the test set rather than the training set.

Limited Scalability of Single Deep Neural Network for Surgical Instrument Segmentation in Different Surgical Environments

10.21203/rs.3.rs-888076/v1 ◽

2021 ◽

Author(s):

Daichi Kitaguchi ◽

Toru Fujino ◽

Nobuyoshi Takeshita ◽

Hiro Hasegawa ◽

Kensaku Mori ◽

...

Keyword(s):

Neural Network ◽

Deep Learning ◽

Deep Neural Network ◽

Recording System ◽

Surgical Instrument ◽

Training Set ◽

Device Development ◽

Registration Number ◽

Surgical Device ◽

Test Sets

Clarifying the scalability of deep-learning-based surgical instrument segmentation networks in diverse surgical environments is important in recognizing the challenges of overfitting in surgical device development. This study comprehensively evaluated deep neural network scalability for surgical instrument segmentation, using 5238 images randomly extracted from 128 intraoperative videos. The video dataset contained 112 laparoscopic colorectal resection, 5 laparoscopic distal gastrectomy, 5 laparoscopic cholecystectomy, and 6 laparoscopic partial hepatectomy cases. Deep-learning-based surgical instrument segmentation was performed for test sets with 1) the same conditions as the training set; 2) the same recognition target surgical instrument and surgery type but different laparoscopic recording systems; 3) the same laparoscopic recording system and surgery type but slightly different recognition target laparoscopic surgical forceps; 4) the same laparoscopic recording system and recognition target surgical instrument but different surgery types. The mean average precision and mean intersection over union for test sets 1, 2, 3, and 4 were 0.941 and 0.887, 0.866 and 0.671, 0.772 and 0.676, and 0.588 and 0.395, respectively. Therefore, the recognition accuracy decreased even under slightly different conditions. To enhance the generalization of deep neural networks in surgery, constructing a training set that considers diverse surgical environments under real-world conditions is crucial. Trial Registration Number: 2020–315, date of registration: October 5, 2020
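
For readers unfamiliar with the mean intersection over union reported in this abstract, the following is a minimal sketch of the metric on binary masks. The abstract does not say how the averaging was done (per image, per class, or per instance), so averaging over mask pairs and scoring two empty masks as 1.0 are assumptions of this example, and the function names are hypothetical.

```python
def intersection_over_union(pred, target):
    """IoU between two binary masks given as equally sized 2D lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


def mean_iou(mask_pairs):
    """Average IoU over (prediction, ground truth) pairs."""
    scores = [intersection_over_union(p, t) for p, t in mask_pairs]
    return sum(scores) / len(scores)


# Toy masks: 2 pixels overlap out of 4 pixels covered by either mask.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(intersection_over_union(pred, target))  # 0.5
```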

Identification Method for Series Arc Faults Based on Wavelet Transform and Deep Neural Network

Energies ◽

10.3390/en13010142 ◽

2019 ◽

Vol 13 (1) ◽

pp. 142 ◽

Cited By ~ 2

Author(s):

Qiongfang Yu ◽

Yaqian Hu ◽

Yi Yang

Keyword(s):

Neural Network ◽

Wavelet Transform ◽

Power Supply ◽

Distribution Systems ◽

Deep Neural Network ◽

Experimental Result ◽

Discrete Wavelet ◽

Training Set ◽

Test Set ◽

Power Distribution System

The power supply quality and safety of low-voltage residential power distribution systems are seriously affected by the occurrence of series arc faults. Such faults are difficult to detect and extinguish because of their small current, high stochasticity, and strong concealment. To improve the overall safety of residential distribution systems, this paper proposes a novel method based on the discrete wavelet transform (DWT) and a deep neural network (DNN) to detect series arc faults. An experimental bed is built to obtain current signals under two states, normal and arcing. The collected signals are decomposed at different scales using the DWT. The wavelet coefficient sequences are used to form the training set and the test set. The deep neural network, trained on the training set under 4 different loads, adaptively learns the features of arc faults. Feeding the test set into the model gives an arc fault recognition accuracy of about 97.75%. The experimental results show that this method has good accuracy and generality under different types of load.
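
The multi-scale decomposition step mentioned in this abstract can be sketched as follows. The abstract does not name the wavelet or the number of levels, so the Haar wavelet, the three levels, and the toy signal are assumptions chosen only because they keep the example short; the function names are hypothetical.

```python
import math


def haar_dwt_step(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_wavedec(signal, levels):
    """Decompose a signal into [detail_1, ..., detail_L, approx_L]."""
    coeffs = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        coeffs.append(detail)
    coeffs.append(approx)
    return coeffs


# Hypothetical current samples; real inputs would be measured waveforms.
current = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
coeffs = haar_wavedec(current, levels=3)
for level, detail in enumerate(coeffs[:-1], start=1):
    print(f"detail level {level}: {len(detail)} coefficients")
```

The coefficient sequences in `coeffs` are the kind of per-scale features that could then be arranged into training and test sets.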

Machine learning and deep neural network — Artificial intelligence core for lab and real-world test and validation for ADAS and autonomous vehicles: AI for efficient and quality test and validation

2017 Intelligent Systems Conference (IntelliSys) ◽

10.1109/intellisys.2017.8324372 ◽

2017 ◽

Cited By ~ 6

Author(s):

Harsha Jakkanahalli Vishnukumar ◽

Bjorn Butting ◽

Christian Muller ◽

Eric Sax

Keyword(s):

Neural Network ◽

Artificial Intelligence ◽

Machine Learning ◽

Real World ◽

Autonomous Vehicles ◽

Deep Neural Network ◽

Quality Test

Abstract 27: Machine Learning Models for Outcome Prediction of Out-Of-Hospital Cardiac Arrest of Presumed Cardiac Cause Using the All-Japan Utstein Registry

Circulation ◽

10.1161/circ.140.suppl_2.27 ◽

2019 ◽

Vol 140 (Suppl_2) ◽

Author(s):

Tomohisa Seki ◽

Tomoyoshi Tamura ◽

Kazuhiko Ohe ◽

Masaru Suzuki

Keyword(s):

Neural Network ◽

Machine Learning ◽

Logistic Regression ◽

Cardiac Arrest ◽

Deep Neural Network ◽

Outcome Prediction ◽

Machine Learning Techniques ◽

Test Set ◽

Learning Techniques ◽

Hospital Cardiac Arrest

Background: Outcome prediction for patients with out-of-hospital cardiac arrest (OHCA) using prehospital information has been one of the major challenges in resuscitation medicine. Recently, machine learning techniques have been shown to be highly effective in predicting outcomes using clinical registries. In this study, we aimed to establish a prediction model for outcomes of OHCA of presumed cardiac cause using machine learning techniques. Methods: We analyzed data from the All-Japan Utstein Registry of the Fire and Disaster Management Agency between 2005 and 2016. Of 1,423,338 cases, data of OHCA patients aged ≥18 years with presumed cardiac etiology were retrieved and divided into two groups: training set, n = 584,748 (between 2005 and 2013) and test set, n = 223,314 (between 2014 and 2016). The endpoints were neurologic outcome at 1 month and survival at 1 month. Of 47 variables evaluated during the prehospital course, 19 variables (e.g., sex, age, ECG waveform, and practice of bystander CPR) were used for outcome prediction. The performance of logistic regression, random forest, and deep neural network models was examined in this study. Results: For prediction of neurologic outcomes (cerebral performance category 1 or 2) using the test set, the generated models showed area under the receiver operating characteristic curve (AUROC) values of 0.942 (95% confidence interval [CI] 0.941-0.943), 0.947 (95% CI 0.946-0.948), and 0.948 (95% CI 0.948-0.950) in logistic regression, random forest, and deep neural network, respectively. For survival prediction, the generated models showed AUROC values of 0.901 (95% CI 0.900-0.902), 0.913 (95% CI 0.912-0.914), and 0.912 (95% CI 0.911-0.913) in logistic regression, random forest, and deep neural network, respectively. Conclusions: Machine learning techniques using prehospital variables showed favorable prediction capability for 1-month neurologic outcome and survival in OHCA of presumed cardiac cause.
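
The division by calendar year described in the Methods (2005 to 2013 for training, 2014 to 2016 for testing) is a temporal split. A minimal sketch is given below; the record layout with a `year` field and the function name are assumptions, since the abstract does not describe the registry's data format.

```python
def temporal_split(records, last_train_year=2013):
    """Split records by year: up to `last_train_year` for training, later for testing."""
    train = [r for r in records if r["year"] <= last_train_year]
    test = [r for r in records if r["year"] > last_train_year]
    return train, test


# Hypothetical records, one per registry year, standing in for real cases.
records = [{"case_id": i, "year": year} for i, year in enumerate(range(2005, 2017))]
train, test = temporal_split(records)
print(len(train), "training records,", len(test), "test records")
```

Splitting on time rather than at random means the test set contains only cases recorded after every training case, which is closer to how a deployed model would be used.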

Using Deep Neural Network to Diagnose Thyroid Nodules on Ultrasound in Patients With Hashimoto’s Thyroiditis

Frontiers in Oncology ◽

10.3389/fonc.2021.614172 ◽

2021 ◽

Vol 11 ◽

Author(s):

Yiqing Hou ◽

Chao Chen ◽

Lu Zhang ◽

Wei Zhou ◽

Qinyang Lu ◽

...

Keyword(s):

Neural Network ◽

High Performance ◽

Deep Neural Network ◽

Thyroid Nodules ◽

Hashimoto's Thyroiditis ◽

Training Set ◽

Test Set ◽

The One ◽

Benign Nodules

Objective: The aim of this study is to develop a model using Deep Neural Network (DNN) to diagnose thyroid nodules in patients with Hashimoto’s Thyroiditis. Methods: In this retrospective study, we included 2,932 patients with thyroid nodules who underwent thyroid ultrasonogram in our hospital from January 2017 to August 2019. 80% of them were included as training set and 20% as test set. Nodules suspected for malignancy underwent FNA or surgery for pathological results. Two DNN models were trained to diagnose thyroid nodules, and we chose the one with better performance. The features of nodules as well as of the parenchyma around nodules were learned by the model to achieve better performance under diffused parenchyma. 10-fold cross-validation and an independent test set were used to evaluate the performance of the algorithm. The performance of the model was compared with that of three groups of radiologists with clinical experience of <5 years, 5–10 years, and >10 years, respectively. Results: In total, 9,127 images were collected from 2,932 patients with 7,301 images for the training set and 1,806 for the test set. 56% of the patients enrolled had Hashimoto’s Thyroiditis. The model achieved an AUC of 0.924 for distinguishing malignant and benign nodules in the test set. It showed similar performance under diffused thyroid parenchyma and normal parenchyma with sensitivity of 0.881 versus 0.871 (p = 0.938) and specificity of 0.846 versus 0.822 (p = 0.178). In patients with HT, the model achieved an AUC of 0.924 to differentiate malignant and benign nodules which was significantly higher than that of the three groups of radiologists (AUC = 0.824, 0.857, 0.863 respectively, p < 0.05). Conclusion: The model showed high performance in diagnosing thyroid nodules under both normal and diffused parenchyma. In patients with Hashimoto’s Thyroiditis, the model showed a better performance compared to radiologists with various years of experience.

Feature-Weighted Sampling for Proper Evaluation of Classification Models

Applied Sciences ◽

10.3390/app11052039 ◽

2021 ◽

Vol 11 (5) ◽

pp. 2039

Author(s):

Hyunseok Shin ◽

Sejong Oh

Keyword(s):

Random Sampling ◽

Sampling Method ◽

Classification Model ◽

Training Set ◽

Test Set ◽

Feature Importance ◽

Proper Training ◽

Machine Learning Applications ◽

Test Sets ◽

The Given

In machine learning applications, classification schemes have been widely used for prediction tasks. Typically, to develop a prediction model, the given dataset is divided into training and test sets; the training set is used to build the model and the test set is used to evaluate the model. Furthermore, random sampling is traditionally used to divide datasets. The problem, however, is that the measured performance of the model depends on how the training and test sets are divided. Therefore, in this study, we proposed an improved sampling method for the accurate evaluation of a classification model. We first generated numerous candidate cases of train/test sets using the R-value-based sampling method. We evaluated the similarity of distributions of the candidate cases with the whole dataset, and the case with the smallest distribution difference was selected as the final train/test set. Histograms and feature importance were used to evaluate the similarity of distributions. The proposed method produces more appropriate training and test sets than previous sampling methods, including random and non-random sampling.
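
The selection step in this abstract (generate many candidate train/test splits, keep the one whose distributions are closest to the whole dataset) can be sketched in simplified form. This is not the authors' method: candidates here come from plain random shuffling rather than R-value-based sampling, only one feature is compared, feature importance weighting is omitted, and the L1 distance between normalized histograms is an assumed measure of distribution difference. All names and parameters are hypothetical.

```python
import random


def histogram(values, lo, hi, n_bins):
    """Normalized histogram of `values` over `n_bins` equal bins on [lo, hi]."""
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return [c / len(values) for c in counts]


def select_split(values, test_fraction=0.25, n_candidates=200, n_bins=5, seed=0):
    """Return (difference, train_idx, test_idx) for the best candidate split."""
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    whole = histogram(values, lo, hi, n_bins)
    n_test = max(1, round(test_fraction * len(values)))
    indices = list(range(len(values)))
    best = None
    for _ in range(n_candidates):
        rng.shuffle(indices)
        test_idx, train_idx = sorted(indices[:n_test]), sorted(indices[n_test:])
        diff = 0.0
        for part in (train_idx, test_idx):
            part_hist = histogram([values[i] for i in part], lo, hi, n_bins)
            diff += sum(abs(a - b) for a, b in zip(part_hist, whole))
        if best is None or diff < best[0]:
            best = (diff, train_idx, test_idx)
    return best


# Hypothetical single-feature dataset of 40 samples.
data_rng = random.Random(1)
feature = [data_rng.gauss(0.0, 1.0) for _ in range(40)]
diff, train_idx, test_idx = select_split(feature)
print(f"best distribution difference: {diff:.3f}")
```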

Burst Pressure Prediction of API 5L X-Grade Dented Pipelines Using Deep Neural Network

Journal of Marine Science and Engineering ◽

10.3390/jmse8100766 ◽

2020 ◽

Vol 8 (10) ◽

pp. 766

Author(s):

Dohan Oh ◽

Julia Race ◽

Selda Oterkus ◽

Bonguk Koo

Keyword(s):

Neural Network ◽

Machine Learning ◽

Artificial Neural Network ◽

Network Model ◽

Neural Network Model ◽

Deep Neural Network ◽

Machine Learning Techniques ◽

Burst Pressure ◽

Comparison Results ◽

Artificial Neural

Mechanical damage is recognized as a problem that reduces the performance of oil and gas pipelines and has been the subject of continuous research. Artificial neural networks, which have recently been in the spotlight, are expected to offer another solution to pipeline-related problems. This study applies a deep neural network, a machine learning method based on the artificial neural network algorithm. The applicability of machine learning techniques such as deep neural networks to the prediction of burst pressure is investigated for dented API 5L X-grade pipelines. To this end, supervised learning is employed; the deep neural network model has four layers, including three hidden layers, and uses fully connected layers. The burst pressure computed by the deep neural network model is compared with the results of a parametric study based on finite element analysis and with the burst pressure calculated from experimental results. The comparison shows good agreement. Therefore, it is concluded that deep neural networks can be another solution for predicting the burst pressure of API 5L X-grade dented pipelines.

How to train your robot with deep reinforcement learning: lessons we have learned

The International Journal of Robotics Research ◽

10.1177/0278364920987859 ◽

2021 ◽

pp. 027836492098785

Author(s):

Julian Ibarz ◽

Jie Tan ◽

Chelsea Finn ◽

Mrinal Kalakrishnan ◽

Peter Pastor ◽

...

Keyword(s):

Machine Learning ◽

Reinforcement Learning ◽

Case Studies ◽

Real World ◽

Review Article ◽

The Real ◽

Complex Skills ◽

Real World Learning ◽

Level Sensor ◽

Embodied Agent

Deep reinforcement learning (RL) has emerged as a promising approach for autonomously acquiring complex behaviors from low-level sensor observations. Although a large portion of deep RL research has focused on applications in video games and simulated control, which does not connect with the constraints of learning in real environments, deep RL has also demonstrated promise in enabling physical robots to learn complex skills in the real world. At the same time, real-world robotics provides an appealing domain for evaluating such algorithms, as it connects directly to how humans learn: as an embodied agent in the real world. Learning to perceive and move in the real world presents numerous challenges, some of which are easier to address than others, and some of which are often not considered in RL research that focuses only on simulated domains. In this review article, we present a number of case studies involving robotic deep RL. Building off of these case studies, we discuss commonly perceived challenges in deep RL and how they have been addressed in these works. We also provide an overview of other outstanding challenges, many of which are unique to the real-world robotics setting and are not often the focus of mainstream RL research. Our goal is to provide a resource both for roboticists and machine learning researchers who are interested in furthering the progress of deep RL in the real world.

A deep neural network model for packing density predictions and its application in the study of 1.5 million organic molecules

Chemical Science ◽

10.1039/c9sc02677k ◽

2019 ◽

Vol 10 (36) ◽

pp. 8374-8383 ◽

Cited By ~ 1

Author(s):

Mohammad Atif Faiz Afzal ◽

Aditya Sonpal ◽

Mojtaba Haghighatlari ◽

Andrew J. Schultz ◽

Johannes Hachmann

Keyword(s):

Neural Network ◽

Machine Learning ◽

Refractive Index ◽

High Throughput ◽

Neural Network Model ◽

High Throughput Screening ◽

Deep Neural Network ◽

Organic Molecules ◽

High Refractive Index ◽

Computational Pipeline

Computational pipeline for the accelerated discovery of organic materials with high refractive index via high-throughput screening and machine learning.