Understanding covariate shift in model performance

F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 597 ◽  
Author(s):  
Georgia McGaughey ◽  
W. Patrick Walters ◽  
Brian Goldman

Three (3) different methods (logistic regression, covariate shift and k-NN) were applied to five (5) internal datasets and one (1) external, publicly available dataset where covariate shift existed. In all cases, k-NN’s performance was inferior to either logistic regression or covariate shift. Surprisingly, there was no obvious advantage to using covariate shift to reweight the training data in the examined datasets.
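The abstract does not say how the training data were reweighted, so the following is only a minimal sketch of one common approach to covariate shift: estimate importance weights p_test(x) / p_train(x) with a "domain classifier" that tries to tell training inputs from test inputs. The one-dimensional synthetic data, the function names (`fit_domain_classifier`, `importance_weights`), and the learning-rate and epoch settings are all assumptions made for illustration, not the authors' method.

```python
import math
import random


def fit_domain_classifier(train_x, test_x, lr=0.1, epochs=500):
    """Fit a 1-D logistic regression that predicts 'x came from the test set'.

    Hypothetical helper: plain batch gradient descent on the log loss.
    """
    xs = list(train_x) + list(test_x)
    ys = [0.0] * len(train_x) + [1.0] * len(test_x)
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def importance_weights(train_x, test_x):
    """Estimate p_test(x) / p_train(x) for every training point."""
    w, b = fit_domain_classifier(train_x, test_x)
    # Corrects for unequal numbers of training and test samples.
    prior_ratio = len(train_x) / len(test_x)
    # For a logistic model, p(test | x) / p(train | x) = exp(w * x + b).
    return [prior_ratio * math.exp(w * x + b) for x in train_x]


# Synthetic example (assumption): the test inputs are shifted to the right.
random.seed(0)
train_x = [random.gauss(0.0, 1.0) for _ in range(200)]
test_x = [random.gauss(1.0, 1.0) for _ in range(200)]
weights = importance_weights(train_x, test_x)
```

Training points that look more like the test inputs receive larger weights. In practice these weights would typically be passed as per-sample weights when fitting the downstream model, which is then compared against the unweighted fit.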


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Ashwath Radhachandran ◽  
Anurag Garikipati ◽  
Nicole S. Zelin ◽  
Emily Pellegrini ◽  
Sina Ghandian ◽  
...  

Abstract. Background: Acute heart failure (AHF) is associated with significant morbidity and mortality. Effective patient risk stratification is essential to guiding hospitalization decisions and the clinical management of AHF. Clinical decision support systems can be used to improve predictions of mortality made in emergency care settings for the purpose of AHF risk stratification. In this study, several models for the prediction of seven-day mortality among AHF patients were developed by applying machine learning techniques to retrospective patient data from 236,275 total emergency department (ED) encounters, 1881 of which were considered positive for AHF and were used for model training and testing. The models used varying subsets of age, sex, vital signs, and laboratory values. Model performance was compared to the Emergency Heart Failure Mortality Risk Grade (EHMRG) model, a commonly used system for prediction of seven-day mortality in the ED with similar (or, in some cases, more extensive) inputs. Model performance was assessed in terms of area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity. Results: When trained and tested on a large academic dataset, the best-performing model and EHMRG demonstrated test set AUROCs of 0.84 and 0.78, respectively, for prediction of seven-day mortality. Given only measurements of respiratory rate, temperature, mean arterial pressure, and FiO2, one model produced a test set AUROC of 0.83. Neither a logistic regression comparator nor a simple decision tree outperformed EHMRG. Conclusions: A model using only the measurements of four clinical variables outperforms EHMRG in the prediction of seven-day mortality in AHF. With these inputs, the model could not be replaced by logistic regression or reduced to a simple decision tree without significant performance loss. In ED settings, this minimal-input risk stratification tool may assist clinicians in making critical decisions about patient disposition by providing early and accurate insights into individual patients’ risk profiles.
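The three metrics named in this abstract (AUROC, sensitivity, specificity) can be computed directly from predicted risk scores and binary outcomes. The sketch below is illustrative only: the function names and the toy scores are assumptions, and the AUROC is computed by the pairwise (Mann-Whitney) definition rather than by any particular library routine.

```python
def auroc(scores, labels):
    """Probability that a random positive case scores above a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold):
    """Classify score >= threshold as positive; return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        predicted_positive = s >= threshold
        if y == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (assumption): four encounters with made-up risk scores.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
toy_auc = auroc(toy_scores, toy_labels)  # 3 of 4 positive/negative pairs ordered correctly
toy_sens, toy_spec = sensitivity_specificity(toy_scores, toy_labels, 0.5)
```

AUROC is threshold-free, whereas sensitivity and specificity depend on the chosen cutoff, which is why abstracts like this one report them separately.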


2020 ◽  
Vol 7 (Supplement_1) ◽  
pp. S375-S376
Author(s):  
Ljubomir Buturovic ◽  
Purvesh Khatri ◽  
Benjamin Tang ◽  
Kevin Lai ◽  
Win Sen Kuan ◽  
...  

Abstract. Background: While major progress has been made to establish diagnostic tools for the diagnosis of SARS-CoV-2 infection, determining the severity of COVID-19 remains an unmet medical need. With limited hospital resources, gauging severity would allow for some patients to safely recover in home quarantine while ensuring sicker patients get needed care. We discovered a classifier based on 5 host mRNAs for the severity of influenza and other acute viral infections and validated the classifier in COVID-19 patients from Greece. Methods: We used training data (N=705) from 21 retrospective clinical studies of influenza and other viral illnesses. Five host mRNAs from a preselected panel were applied to train a logistic regression classifier for predicting 30-day mortality in influenza and other viral illnesses. We then applied this classifier, with fixed weights, to an independent cohort of subjects with confirmed COVID-19 from Athens, Greece (N=71) using NanoString nCounter. Finally, we developed a proof-of-concept rapid, isothermal qRT-LAMP assay for the 5-mRNA host signature using the QuantStudio 6 qPCR platform. Results: In 71 patients with COVID-19, the 5-mRNA classifier had an AUROC of 0.88 (95% CI 0.80-0.97) for identifying patients with severe respiratory failure and/or 30-day mortality (Figure 1). Applying a preset cutoff based on training data, the 5-mRNA classifier had 100% sensitivity and 46% specificity for identifying mortality, and 88% sensitivity and 68% specificity for identifying severe respiratory failure. Finally, our proof-of-concept qRT-LAMP assay showed high correlation with the reference NanoString 5-mRNA classifier (r=0.95).
Figure 1. Validation of the 5-mRNA classifier in the COVID-19 cohort. (A) Expression of the 5 genes used in the logistic regression model in patients with (red) and without (blue) mortality. (B) The 5-mRNA classifier accurately distinguishes non-severe and severe patients with COVID-19 as well as those at risk of death.
Conclusion: Our 5-mRNA classifier demonstrated very high accuracy for the prediction of COVID-19 severity and could assist in the rapid, point-of-impact assessment of patients with confirmed COVID-19 to determine level of care, thereby improving patient management and healthcare burden.
Disclosures: Ljubomir Buturovic, PhD, Inflammatix Inc. (Employee, Shareholder) Purvesh Khatri, PhD, Inflammatix Inc. (Shareholder) Oliver Liesenfeld, MD, Inflammatix Inc. (Employee, Shareholder) James Wacker, n/a, Inflammatix Inc. (Employee, Shareholder) Uros Midic, PhD, Inflammatix Inc. (Employee, Shareholder) Roland Luethy, PhD, Inflammatix Inc. (Employee, Shareholder) David C. Rawling, PhD, Inflammatix Inc. (Employee, Shareholder) Timothy Sweeney, MD, Inflammatix, Inc. (Employee)


Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1688
Author(s):  
Luqman Ali ◽  
Fady Alnajjar ◽  
Hamad Al Jassmi ◽  
Munkhjargal Gochoo ◽  
Wasif Khan ◽  
...  

This paper proposes a customized convolutional neural network for crack detection in concrete structures. The proposed method is compared to four existing deep learning methods based on training data size, data heterogeneity, network complexity, and the number of epochs. The performance of the proposed convolutional neural network (CNN) model is evaluated and compared to pretrained networks, i.e., the VGG-16, VGG-19, ResNet-50, and Inception V3 models, on eight datasets of different sizes, created from two public datasets. For each model, the evaluation considered computational time, crack localization results, and classification measures, e.g., accuracy, precision, recall, and F1-score. Experimental results demonstrated that training data size and heterogeneity among data samples significantly affect model performance. All models demonstrated promising performance on a limited number of diverse training data; however, increasing the training data size and reducing diversity reduced generalization performance, and led to overfitting. The proposed customized CNN and VGG-16 models outperformed the other methods in terms of classification, localization, and computational time on a small amount of data, and the results indicate that these two models demonstrate superior crack detection and localization for concrete structures.


2017 ◽  
Vol 26 (01) ◽  
pp. 212-213

Agarwal V, Podchiyska T, Banda JM, Goel V, Leung TI, Minty EP, Sweeney TE, Gyang E, Shah NH. Learning statistical models of phenotypes using noisy labeled training data. J Am Med Inform Assoc 2016;23(6):1166-73 https://academic.oup.com/jamia/article-lookup/doi/10.1093/jamia/ocw028
Harmanci A, Gerstein M. Quantification of private information leakage from phenotype-genotype data: linking attacks. Nat Methods 2016;13(3):251-6 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4834871/
Pfiffner PB, Pinyol I, Natter MD, Mandl KD. C3-PRO: Connecting ResearchKit to the Health System Using i2b2 and FHIR. PloS One 2016;11(3):e0152722 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4816293/
Wilkinson MD, Dumontier M, Aalbersberg IJJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten JW, da Silva Santos LB, Bourne PE, Bouwman J, Brookes AJ, Clark T, Crosas M, Dillo I, Dumon O, Edmunds S, Evelo CT, Finkers R, Gonzalez-Beltran A, Gray AJ, Groth P, Goble C, Grethe JS, Heringa J, ‘t Hoen PA, Hooft R, Kuhn T, Kok R, Kok J, Lusher SJ, Martone ME, Mons A, Packer AL, Persson B, Rocca-Serra P, Roos M, van Schaik R, Sansone SA, Schultes E, Sengstag T, Slater T, Strawn G, Swertz MA, Thompson M, van der Lei J, van Mulligen E, Velterop J, Waagmeester A, Wittenburg P, Wolstencroft K, Zhao J, Mons B. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 2016;3:160018 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4792175/
Springer DB, Tarassenko L, Clifford GD. Logistic regression-HSMM-based heart sound segmentation. IEEE Trans Biomed Eng 2016 Apr;63(4):822-32


2017 ◽  
Vol 3 ◽  
pp. e137 ◽  
Author(s):  
Mona Alshahrani ◽  
Othman Soufan ◽  
Arturo Magana-Mora ◽  
Vladimir B. Bajic

Background: Artificial neural networks (ANNs) are a robust class of machine learning models and are a frequent choice for solving classification problems. However, determining the structure of the ANNs is not trivial as a large number of weights (connection links) may lead to overfitting the training data. Although several ANN pruning algorithms have been proposed for the simplification of ANNs, these algorithms are not able to efficiently cope with intricate ANN structures required for complex classification problems. Methods: We developed DANNP, a web-based tool, that implements parallelized versions of several ANN pruning algorithms. The DANNP tool uses a modified version of the Fast Compressed Neural Network software implemented in C++ to considerably enhance the running time of the ANN pruning algorithms we implemented. In addition to the performance evaluation of the pruned ANNs, we systematically compared the set of features that remained in the pruned ANN with those obtained by different state-of-the-art feature selection (FS) methods. Results: Although the ANN pruning algorithms are not entirely parallelizable, DANNP was able to speed up the ANN pruning up to eight times on a 32-core machine, compared to the serial implementations. To assess the impact of the ANN pruning by DANNP tool, we used 16 datasets from different domains. In eight out of the 16 datasets, DANNP significantly reduced the number of weights by 70%–99%, while maintaining a competitive or better model performance compared to the unpruned ANN. Finally, we used a naïve Bayes classifier derived with the features selected as a byproduct of the ANN pruning and demonstrated that its accuracy is comparable to those obtained by the classifiers trained with the features selected by several state-of-the-art FS methods. The FS ranking methodology proposed in this study allows the users to identify the most discriminant features of the problem at hand. 
To the best of our knowledge, DANNP (publicly available at www.cbrc.kaust.edu.sa/dannp) is the only available and on-line accessible tool that provides multiple parallelized ANN pruning options. Datasets and DANNP code can be obtained at www.cbrc.kaust.edu.sa/dannp/data.php and https://doi.org/10.5281/zenodo.1001086.


Author(s):  
D. Gritzner ◽  
J. Ostermann

Abstract. Modern machine learning, especially deep learning, which is used in a variety of applications, requires a lot of labelled data for model training. Having an insufficient amount of training examples leads to models which do not generalize well to new input instances. This is a particularly significant problem for tasks involving aerial images: often training data is only available for a limited geographical area and a narrow time window, thus leading to models which perform poorly in different regions, at different times of day, or during different seasons. Domain adaptation can mitigate this issue by using labelled source domain training examples and unlabelled target domain images to train a model which performs well on both domains. Modern adversarial domain adaptation approaches use unpaired data. We propose using pairs of semantically similar images, i.e., whose segmentations are accurate predictions of each other, for improved model performance. In this paper we show that, as an upper limit based on ground truth, using semantically paired aerial images during training almost always increases model performance with an average improvement of 4.2% accuracy and 0.036 mean intersection-over-union (mIoU). Using a practical estimate of semantic similarity, we still achieve improvements in more than half of all cases, with average improvements of 2.5% accuracy and 0.017 mIoU in those cases.
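For readers unfamiliar with the mIoU metric reported above, the following is a small sketch of how mean intersection-over-union is commonly computed for a segmentation given as flat lists of per-pixel class labels. The function name `mean_iou`, the choice to skip classes absent from both prediction and ground truth, and the toy labels are assumptions for illustration; the paper's exact evaluation protocol is not given in the abstract.

```python
def mean_iou(pred, truth, num_classes):
    """Average per-class intersection-over-union over flat label lists.

    Classes that appear in neither pred nor truth are skipped (an assumption;
    some evaluation protocols count them differently).
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class present in pred or truth")
    return sum(ious) / len(ious)


# Toy example (assumption): four pixels, two classes.
toy_pred = [0, 0, 1, 1]
toy_truth = [0, 1, 1, 1]
toy_miou = mean_iou(toy_pred, toy_truth, num_classes=2)
# Class 0: intersection 1, union 2. Class 1: intersection 2, union 3.
```

Because each class contributes equally regardless of how many pixels it covers, mIoU penalizes errors on rare classes more heavily than plain pixel accuracy does, which is one reason the two metrics are reported side by side.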


2021 ◽  
Vol 27 (4) ◽  
pp. 391-399
Author(s):  
Weihong Yuan ◽  
Charles B. Stevenson ◽  
Mekibib Altaye ◽  
Blaise V. Jones ◽  
James Leach ◽  
...  

OBJECTIVE The aim of this study was to investigate diffusion tensor imaging (DTI), an objective and noninvasive neuroimaging technique, for its potential as an imaging biomarker to predict the need and timing of CSF diversion surgery in patients after prenatal myelomeningocele (MMC) repair. METHODS This was a retrospective analysis of data based on 35 pediatric patients after prenatal MMC repair (gestational age at birth 32.68 ± 3.42 weeks, range 24–38 weeks; 15 females and 20 males). A logistic regression analysis was used to classify patients to determine the need for CSF diversion surgery. The model performance was compared between using the frontooccipital horn ratio (FOHR) alone and using the FOHR combined with DTI values (the genu of the corpus callosum [gCC] and the posterior limb of the internal capsule [PLIC]). For patients who needed to be treated surgically, timing of the procedure was used as the clinical outcome to test the predictive value of DTI acquired prior to surgery based on a linear regression analysis. RESULTS Significantly lower fractional anisotropy (FA) values in the gCC (p = 0.014) and PLIC (p = 0.037) and higher mean diffusivity (MD) values in the gCC (p = 0.013) were found in patients who required CSF diversion surgery compared with those who did not require surgery (all p values adjusted for age). Based on the logistic regression analysis, the FOHR alone showed an accuracy of performance of 0.69 and area under the receiver operating characteristic curve (AUC) of 0.60. The performance of the model was higher when DTI measures were used in the logistic regression model (accuracy = 0.77, AUC = 0.84 for using DTI values in gCC; accuracy = 0.75, AUC = 0.84 for using DTI values in PLIC). Combining the DTI values of the gCC or PLIC and FOHR did not improve the model performance when compared with using the DTI values alone. 
In patients who needed CSF diversion surgery, significant correlation was found between DTI values in the gCC and the time interval between imaging and surgery (FA: ρ = 0.625, p = 0.022; MD: ρ = −0.6830, p = 0.010; both adjusted for age and FOHR). CONCLUSIONS The authors’ data demonstrated that DTI could potentially serve as an objective biomarker differentiating patients after prenatal MMC repair regarding those who may require surgery for MMC-associated hydrocephalus. The predictive value for the need and timing of CSF diversion surgery is highly clinically relevant for improving and optimizing decision-making for the treatment of hydrocephalus in this patient population.


2021 ◽  
Vol 39 (15_suppl) ◽  
pp. e16265-e16265
Author(s):  
Gulfem Guler ◽  
Anna Bergamaschi ◽  
David Haan ◽  
Michael Kesling ◽  
Yuhong Ning ◽  
...  

Background: Pancreatic cancer (PaCa) is the third leading cause of cancer death in the United States despite its low incidence rate, owing to a 5-year survival rate of 10%. It is often asymptomatic in its early stages, resulting in the majority of diagnoses occurring when cancer has already metastasized to distant organs. Late diagnosis deprives patients of potentially curative treatments such as surgery and impacts survival rates. Diabetes can be an early symptom of PaCa. Indeed, 25% of PaCa patients had a preceding diabetes diagnosis. Among all people with new onset diabetes (NOD), 0.85% will be diagnosed with PaCa within 3 years, which represents a 6- to 8-fold increased risk for PaCa compared to the general population. Surveillance of the NOD population for PaCa presents an opportunity to shift PaCa diagnosis to an earlier stage by finding it sooner. Methods: Whole blood was obtained from a cohort of 117 PaCa patients as well as 800 non-cancer controls with and without NOD. Plasma was processed to isolate cfDNA, and 5hmC and low-pass whole genome libraries were generated and sequenced. The EpiDetect assay combines 5hmC and whole genome sequencing data, which were generated using Bluestar Genomics’s technology platform. Results: To investigate whether PaCa can be detected in plasma, we interrogated plasma-derived cfDNA epigenomic and genomic signal from PaCa patients and non-cancer controls. We first trained stacked ensemble models on PaCa and non-cancer samples utilizing 5hmC, fragmentation and CNV-based biomarkers from cfDNA. These models performed stably with a median of 72.8% sensitivity and 90.1% specificity measured across 25 outer fold iterations using the training data set, which was composed of 50% early stage (Stages I & II) disease. The final binomial ensemble model was trained using all of the training data, yielding an area under the receiver operating characteristic curve (auROC) of 0.9, with 75% sensitivity and 89% specificity. 
This model was then tested on an independent validation data set from 33 PaCa patients (24 with diabetes, 15 of which were NOD) and 202 non-cancer control patients (76 with diabetes, 51 of which were NOD) and yielded a classification performance auROC of 0.9 with 67% sensitivity at 92% specificity. Lastly, in the subset of the patient cohort with NOD only, the model had an auROC of 0.87 with 60% sensitivity at 88% specificity. Conclusions: Our results indicate that 5hmC profiles along with CNV and fragmentation patterns from cfDNA can be used to detect PaCa in plasma-derived cfDNA. Overall, model performance was stable and consistent between the training and independent validation datasets. A larger clinical study is under development to investigate the utility of the model described in this pilot study in identifying occult PaCa within the NOD population, with the aim of shifting diagnosis to early stage and potentially improving patient outcomes.

