Explaining Predictive Model Performance: An Experimental Study of Data Preparation and Model Choice

Big Data ◽  
2021 ◽  
Author(s):  
Hamidreza Ahady Dolatsara ◽  
Ying-Ju Chen ◽  
Robert D. Leonard ◽  
Fadel M. Megahed ◽  
L. Allison Jones-Farmer

2021 ◽  
Author(s):  
Pin Li ◽  
Jeremy M. G. Taylor ◽  
Daniel E. Spratt ◽  
R. Jeffery Karnes ◽  
Matthew J. Schipper

Diabetes ◽  
2021 ◽  
Vol 70 (Supplement 1) ◽  
pp. 68-OR
Author(s):  
DIANA FERRO ◽  
DAVID D. WILLIAMS ◽  
SUSANA R. PATTON ◽  
RYAN MCDONOUGH ◽  
MARK A. CLEMENTS

2020 ◽  
Vol 24 (6) ◽  
pp. 79-90
Author(s):  
Kim Seng Chia ◽  
Fan Wei Hong

Near-infrared spectroscopy is a sensitive technique that can be affected by various factors, including the surface of the samples. Under Lambertian reflection, the uneven, matte surface of a fruit produces diffuse reflectance: light enters the sample tissue and is reflected uniformly in all directions. Many studies have used near-infrared diffuse reflection for non-destructive prediction of soluble solids content (SSC), but few have examined the geometrical effects of uneven sample surfaces. This study therefore investigates the parameters that affect near-infrared diffuse reflection signals in non-destructive SSC prediction using intact pineapples. The relationship among reflectance intensity, measurement position, and SSC value was studied. Next, three independent artificial neural networks were trained separately to investigate the geometrical effects at three different measurement positions. The results show that the concave surfaces of the top and bottom parts of pineapples affect the reflectance of light and consequently degrade predictive model performance. The predictive model for the middle part of the pineapples achieved the best performance, with a root mean square error of prediction (RMSEP) of 1.2104 °Brix and a correlation coefficient of prediction (Rp) of 0.7301.
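The two metrics reported in this abstract can be computed as in the minimal sketch below. It assumes Rp is the Pearson correlation between reference and predicted values on a held-out prediction set, which is the usual convention but is not spelled out in the abstract. The function names and the °Brix values are hypothetical and chosen only for illustration.

```python
import math

def rmsep(y_true, y_pred):
    """Root mean square error of prediction over a held-out set."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def pearson_r(y_true, y_pred):
    """Pearson correlation between reference and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

# Hypothetical SSC values in °Brix (not data from the study).
reference = [12.0, 14.0, 13.0, 15.0]
predicted = [13.0, 13.0, 14.0, 14.0]

print("RMSEP:", rmsep(reference, predicted))
print("Rp:", pearson_r(reference, predicted))
```

A lower RMSEP and an Rp closer to 1 both indicate better agreement, so comparing the two numbers across measurement positions is one way to rank the three models.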


2021 ◽  
Author(s):  
Yaqian Mao ◽  
Lizhen Xu ◽  
Ting Xue ◽  
Jixing Liang ◽  
Wei Lin ◽  
...  

Objective: To establish a rapid, cost-effective, accurate, and acceptable osteoporosis (OP) screening model for the Chinese male population (age ≥ 40 years) based on data mining technology. Materials and methods: A total of 1,834 subjects who did not have OP at baseline and completed a 3-year follow-up were included in this study. All subjects underwent quantitative ultrasound examinations of the calcaneus at baseline and during the 3-year follow-up. We used the least absolute shrinkage and selection operator (LASSO) regression model to select feature variables. The variables selected by LASSO regression were analyzed by multivariable logistic regression (MLR) to construct the predictive model, which was displayed as a nomogram. We used the receiver operating characteristic (ROC) curve, C-index, calibration curve, and clinical decision curve analysis (DCA) to evaluate model performance, and bootstrapping to validate the model internally. Results: The area under the ROC curve (AUC) of the risk nomogram was 0.882 (95% CI, 0.858-0.907), exhibiting good predictive ability and performance. The C-index for the risk nomogram was 0.882 in the prediction model, which presented good refinement. In addition, the nomogram calibration curve indicated that the model's predictions were consistent with observed outcomes. The DCA showed that when the threshold probability was between 1% and 100%, the nomogram had good clinical application value. More importantly, the internally validated C-index of the nomogram remained very high, at 0.870. Conclusions: This novel nomogram can effectively predict the 3-year incidence risk of OP in the male population.
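Decision curve analysis compares strategies by their net benefit at each threshold probability. The sketch below uses the standard net benefit definition, TP/n − (FP/n) × pt/(1 − pt); it is not the authors' code, and the outcome labels and predicted risks are hypothetical. Note that the formula is undefined at a threshold of exactly 1, so thresholds must stay strictly below 100%.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of treating everyone whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that treats every subject."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical outcomes (1 = developed OP) and model-predicted risks.
outcomes = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
risks = [0.90, 0.70, 0.60, 0.20, 0.10, 0.40, 0.30, 0.05, 0.15, 0.25]

for pt in (0.1, 0.3, 0.5):
    print(pt, net_benefit(outcomes, risks, pt), net_benefit_treat_all(outcomes, pt))
```

A model has clinical value at a given threshold when its net benefit exceeds both the treat-all value and zero (the treat-none strategy).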


2020 ◽  
Vol 38 (15_suppl) ◽  
pp. e20002-e20002
Author(s):  
Li Zhou ◽  
Rob Steen ◽  
Lynn Lu

e20002 Background: Identifying optimal therapy options can help maximize treatment outcomes, and finding ways to improve treatment decisions is of great value for better patient care. With the availability of robust real-world patient data and the application of state-of-the-art Artificial Intelligence and Machine Learning (AIML) technology, new opportunities have emerged for a broad spectrum of research needs, from oncology R&D to commercialization. To illustrate these advancements, this study identified patients diagnosed with CLL who may progress to the next line of treatment in the near future (e.g., within the next 3 months). More importantly, we can identify treatment patterns that are more effective in treating different types of CLL patients. Methods: This study includes multiple steps, which have already been analyzed for feasibility: 1. Collect CLL patients. IQVIA's real-world data contains ~60,000 actively treated CLL patients; ~2,000 patients progressed to a new line of treatment within 3 months. 2. Divide patients into positive and negative cohorts based on whether or not they have advanced to line L2+. 3. Determine patient profiles based on treatment regimens, symptoms, lab tests, doctor visits, hospital visits, co-morbidity, etc. 4. Select patient and treatment features to fit an AIML predictive model. 5. Test different algorithms to achieve the best model results and validate model performance. 6. Score and classify CLL patients into high and low probability based on the predictive model. 7. Match patients based on feature importance and compare regimens between the positive and negative cohorts. Results: Model accuracy is above 90%. Top clinical features are calculated for each patient. Optimum treatment patterns for high- and low-probability patients are identified while controlling for key patient features. Conclusions: This study is expected to yield deeper insight into more tailored treatments by patient type. CLL patients who started with oral (targeted) therapy had a better response than those on other treatments.


Author(s):  
B. M. Fernandez-Felix ◽  
E. García-Esquinas ◽  
A. Muriel ◽  
A. Royuela ◽  
J. Zamora

Overfitting is a common problem in the development of predictive models. It leads to an optimistic estimate of apparent model performance. Internal validation using bootstrapping techniques makes it possible to quantify the optimism of a predictive model and to provide a more realistic estimate of its performance measures. Our objective is to build an easy-to-use command, bsvalidation, that performs a bootstrap internal validation of a logistic regression model.
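The general idea of optimism correction can be sketched as follows. This is not the bsvalidation command and does not reproduce its options or output; it is a stdlib-only Python illustration of the commonly used bootstrap procedure, applied to the AUC. The simulated data, the gradient-ascent fitting routine, and all function names are assumptions made for the example.

```python
import math
import random

def predict_one(w, xi):
    """Predicted probability for one row; w[0] is the intercept."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, steps=200, lr=0.5):
    """Fit logistic regression by plain gradient ascent on the log-likelihood."""
    k, n = len(X[0]), len(X)
    w = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = yi - predict_one(w, xi)
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * xi[j]
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w

def auc(y, p):
    """AUC as the share of positive/negative pairs ranked correctly."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def optimism_corrected_auc(X, y, n_boot=20, seed=0):
    """Return (apparent AUC, optimism-corrected AUC)."""
    rng = random.Random(seed)
    n = len(y)
    w = fit_logistic(X, y)
    apparent = auc(y, [predict_one(w, xi) for xi in X])
    optimism = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if len(set(yb)) < 2:
            continue  # a resample needs both classes to compute an AUC
        Xb = [X[i] for i in idx]
        wb = fit_logistic(Xb, yb)
        auc_boot = auc(yb, [predict_one(wb, xi) for xi in Xb])  # on the resample
        auc_orig = auc(y, [predict_one(wb, xi) for xi in X])    # on the original data
        optimism.append(auc_boot - auc_orig)
    return apparent, apparent - sum(optimism) / n_boot

def simulate(n=60, seed=42):
    """Simulated data: one informative predictor and two noise predictors."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        xi = [rng.gauss(0, 1) for _ in range(3)]
        p = 1.0 / (1.0 + math.exp(-1.5 * xi[0]))
        X.append(xi)
        y.append(1 if rng.random() < p else 0)
    return X, y

X, y = simulate()
apparent, corrected = optimism_corrected_auc(X, y)
print("apparent AUC:", apparent, "optimism-corrected AUC:", corrected)
```

Each bootstrap model is scored twice, once on its own resample and once on the original data; the average gap between the two scores estimates how much the apparent AUC flatters the model.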


2019 ◽  
Vol 7 (2) ◽  
pp. 51
Author(s):  
Ari Hardianto ◽  
Muhammad Yusuf

Epitopes are peptides essential for stimulating the immune system; for example, they govern helper T lymphocyte (HTL) activation via antigen presentation and recognition. Current predictive models for epitope selection rely mainly on antigen presentation, although HTLs recognize only 50% of the presented peptides. We therefore developed an HTL epitope predictor that incorporates the antigen recognition step. The predictor is specific for epitopes presented by human leukocyte antigen (HLA)-DRB1*01:01, which is protective against developing multiple sclerosis and is associated with autoimmune diseases. As the data set, we used the binding registers of immunogenic and non-immunogenic HTL peptides related to HLA-DRB1*01:01. The binding registers were obtained from the consensus results of two current HLA-binder predictors. Amino acid descriptors were extracted from the binding registers and passed to a random forest algorithm. Threshold optimization was applied to address class imbalance in the data set. In addition, descriptors were screened using recursive feature elimination to improve model performance. The resulting model shows that the hydrophobicity, steric, and electrostatic properties of epitopes, mainly at the center of the binding registers, are important for TCR recognition and for the HTL epitope predictive model. The model complements current HLA-DRB1*01:01-binder prediction methods for screening immunogenic HTL epitopes.
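Threshold optimization for an imbalanced data set can be illustrated with the minimal sketch below. The abstract does not state which criterion the authors optimized, so balanced accuracy is assumed here purely for illustration; the labels and classifier scores are hypothetical.

```python
def balanced_accuracy(y_true, scores, threshold):
    """Mean of sensitivity and specificity at a given decision threshold."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

def best_threshold(y_true, scores):
    """Pick the observed score that maximizes balanced accuracy.

    Ties are resolved in favor of the lowest candidate threshold.
    """
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: balanced_accuracy(y_true, scores, t))

# Hypothetical imbalanced data: 2 immunogenic (1) vs 8 non-immunogenic (0) peptides.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.45, 0.30, 0.35, 0.20, 0.10, 0.15, 0.05, 0.25, 0.12, 0.08]

t = best_threshold(labels, scores)
print("default 0.5:", balanced_accuracy(labels, scores, 0.5))
print("optimized", t, ":", balanced_accuracy(labels, scores, t))
```

With few positives, predicted probabilities tend to sit below 0.5, so the default cut-off can miss every positive case; lowering the threshold trades a little specificity for a large gain in sensitivity. In practice the threshold should be tuned on data separate from the final test set.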


2021 ◽  
Vol 64 (2) ◽  
pp. 21-25
Author(s):  
Oleg Arnaut ◽  
Ion Grabovschi ◽  
Serghei Sandru ◽  
Gheorghe Rojnoveanu ◽  
...  

Background: Trauma remains a medical and social problem with a high lethality rate. Indirect lung injury (ILI) occurs in trauma due to systemic neutrophil activation and protease release into primarily intact tissues. There are no data in the literature regarding ILI predictive models in trauma. Material and methods: In an experimental study (19 traumatized male rabbits), proteases, antiproteases, and pulmonary morphological changes, assessed with the SAMCRS score (Semiquantitative Reflected Qualitative Changes Assessment Scale), were followed. Two statistical instruments were used: correlational analysis and multivariate linear regression. Results: Initially, a correlational analysis between the SAMCRS score values and the proteases/antiproteases was performed. The null hypothesis was rejected (F = 7.017, p = .002). The correlation coefficient between the predicted and observed values of SAMCRSlungs was .854, and the coefficient of determination was .626. The final model included the following parameters: constant (B = 9.427; 95% CI 7.341, 11.513; p < .001); α2-macroglobulin0 (B = -4.053; 95% CI -6.350, -1.757; p = .002); AEAMP0 (B = .002; 95% CI .000, .004; p = .075); AEAMP24 (B = -.006; 95% CI -.010, -.002; p = .003); AECG2 (B = .081; 95% CI .040, .122; p = .001); AEE0 (B = -.026; 95% CI -.040, -.011; p = .002). Conclusions: In this research, a predictive model for indirect lung injury in experimental trauma was developed, with elements of the protease/antiprotease system as predictors. This, in turn, allows hypotheses to be formulated regarding the pathophysiology, prophylaxis, and treatment of ILI.

