Developing machine-learning regression model with Logical Analysis of Data (LAD)

2020 ◽  
pp. 106947
Author(s):  
Ramy M. Khalifa ◽  
Soumaya Yacout ◽  
Samuel Bassetto
Author(s):  
Himani Chauhan ◽  
Garima Saxena ◽  
Arpit Tripathi ◽  
...  

Author(s):  
Р.И. Кузьмич ◽  
А.А. Ступина ◽  
С.Н. Ежеманская ◽  
А.П. Шугалей

Comparison of two optimization models for constructing patterns in the method of logical analysis of data. Two optimization models for constructing informative patterns are proposed. Empirical evidence is given that the boosting criterion is an expedient choice of objective function in the optimization model for obtaining informative patterns. Keywords: informativeness, pattern, boosting criterion, optimization model.
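As a rough illustration of scoring a pattern by a boosting criterion, the sketch below assumes the form sqrt(p) - sqrt(n), where p and n are the numbers of positive and negative observations the pattern covers. That formula, the tuple encoding of patterns, and the toy data are assumptions made for illustration; the abstract does not state the exact objective function used.

```python
from math import sqrt

def covers(pattern, row):
    """A pattern is a conjunction of (feature_index, required_value) literals."""
    return all(row[i] == v for i, v in pattern)

def boosting_criterion(pattern, positives, negatives):
    """Assumed form of the criterion: sqrt(p) - sqrt(n), with p and n the
    counts of covered positive and negative observations."""
    p = sum(covers(pattern, r) for r in positives)
    n = sum(covers(pattern, r) for r in negatives)
    return sqrt(p) - sqrt(n)

# Made-up binarized observations (illustrative only, not from the paper).
positives = [(1, 1, 0), (1, 1, 1), (1, 0, 1), (1, 1, 0)]
negatives = [(0, 1, 0), (1, 0, 0), (0, 0, 1)]

# Two hypothetical candidate patterns for the positive class.
candidates = {
    "x0=1": [(0, 1)],
    "x0=1 and x1=1": [(0, 1), (1, 1)],
}
scores = {name: boosting_criterion(pat, positives, negatives)
          for name, pat in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Here the longer pattern covers one fewer positive observation but no negative ones, so it scores higher under this criterion.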


2020 ◽  
Vol 237 (12) ◽  
pp. 1430-1437
Author(s):  
Achim Langenbucher ◽  
Nóra Szentmáry ◽  
Jascha Wendelstein ◽  
Peter Hoffmann

Abstract Background and Purpose: In the last decade, artificial intelligence and machine learning algorithms have become increasingly established for the screening and detection of diseases and pathologies, as well as for describing interactions between measures where classical methods are too complex or fail. The purpose of this paper is to model the measured postoperative position of an intraocular lens implant after cataract surgery, based on preoperatively assessed biometric effect sizes, using machine learning techniques. Patients and Methods: In this study, we enrolled 249 eyes of patients who underwent elective cataract surgery at Augenklinik Castrop-Rauxel. Eyes were measured preoperatively with the IOLMaster 700 (Carl Zeiss Meditec), as well as preoperatively and postoperatively with the Casia 2 OCT (Tomey). Based on the preoperative effect sizes axial length, corneal thickness, internal anterior chamber depth, thickness of the crystalline lens, mean corneal radius and corneal diameter, a selection of 17 machine learning algorithms was tested for performance in predicting the internal anterior chamber depth (AQD_post) and the axial position of the equatorial plane of the lens in the pseudophakic eye (LEQ_post). Results: The 17 machine learning algorithms (from 4 families) varied in root mean squared/mean absolute prediction error between 0.187/0.139 mm and 0.255/0.204 mm (AQD_post) and between 0.183/0.135 mm and 0.253/0.206 mm (LEQ_post), using 5-fold cross-validation. The Gaussian Process Regression Model with an exponential kernel showed the best performance in terms of root mean squared error for prediction of AQD_post and LEQ_post. If the entire dataset is used (without splitting into training and validation data), a simple multivariate linear regression model showed a root mean squared prediction error for AQD_post/LEQ_post of 0.188/0.187 mm, vs. 0.166/0.159 mm for the best-performing Gaussian Process Regression Model. Conclusion: In this paper we wanted to show the principles of supervised machine learning applied to prediction of the measured physical postoperative axial position of intraocular lenses. Based on our limited data pool and the algorithms used in our setting, the benefit of machine learning algorithms seems to be limited compared to a standard multivariate regression model.
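The evaluation scheme described above (k-fold cross-validation scored by root mean squared and mean absolute prediction error) can be sketched as follows. This is a minimal stdlib-only illustration, not the authors' pipeline: the single-predictor least-squares line stands in for their models, and the data are synthetic values invented for the example.

```python
from math import sqrt

def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def rmse(errors):
    return sqrt(sum(e * e for e in errors) / len(errors))

def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)

def k_fold_errors(xs, ys, k=5):
    """Collect out-of-fold prediction errors; fold f holds out indices f, f+k, ..."""
    errors = []
    for fold in range(k):
        test_idx = set(range(fold, len(xs), k))
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i not in test_idx]
        slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])
        errors += [ys[i] - (slope * xs[i] + intercept) for i in sorted(test_idx)]
    return errors

# Synthetic data (assumption): a hypothetical predictor in mm and a noisy
# linear response in mm, with a deterministic pseudo-noise term.
xs = [22.0 + 0.1 * i for i in range(30)]
ys = [0.2 * x + 0.5 + 0.02 * ((i * 7) % 11 - 5) for i, x in enumerate(xs)]

errors = k_fold_errors(xs, ys, k=5)
print(f"RMSE = {rmse(errors):.3f} mm, MAE = {mae(errors):.3f} mm")
```

Because every observation is predicted by a model that never saw it, these errors are typically larger than the ones obtained by fitting and scoring on the entire dataset, which is why the abstract reports the two settings separately.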


2020 ◽  
Vol 22 (Supplement_2) ◽  
pp. ii135-ii136
Author(s):  
John Lin ◽  
Michelle Mai ◽  
Saba Paracha

Abstract Glioblastoma multiforme (GBM), the most common form of glioma, is a malignant tumor with a high risk of mortality. By providing accurate survival estimates, prognostic models have been identified as promising tools in clinical decision support. In this study, we produced and validated two machine learning-based models to predict survival time for GBM patients. Publicly available clinical and genomic data from The Cancer Genome Atlas (TCGA) and Broad Institute GDAC Firehose were obtained through cBioPortal. Random forest and multivariate regression models were created to predict survival. Predictive accuracy was assessed and compared through mean absolute error (MAE) and root mean square error (RMSE) calculations. 619 GBM patients were included in the dataset. There were 381 (62.9%) cases of recurrence/progression and 53 (8.7%) cases of disease-free survival. The MAE and RMSE values were 0.553 and 0.887 years respectively for the random forest regression model, and they were 1.756 and 2.451 years respectively for the multivariate regression model. Both models accurately predicted overall survival. Comparison of models through MAE, RMSE, and visual analysis produced higher accuracy values for random forest than multivariate linear regression. Further investigation into feature selection and model optimization may improve predictive power. These findings suggest that using machine learning in GBM prognostic modeling will improve clinical decision support. *Co-first authors.


2021 ◽  
Vol 9 ◽  
Author(s):  
Fu-Sheng Chou ◽  
Laxmi V. Ghimire

Background: Pediatric myocarditis is a rare disease. The etiologies are multiple. Mortality associated with the disease is 5–8%. Prognostic factors were identified with the use of national hospitalization databases. Applying these identified risk factors for mortality prediction has not been reported. Methods: We used the Kids' Inpatient Database for this project. We manually curated fourteen variables as predictors of mortality based on the current knowledge of the disease, and compared performance of mortality prediction between linear regression models and a machine learning (ML) model. For ML, the random forest algorithm was chosen because of the categorical nature of the variables. Based on variable importance scores, a reduced model was also developed for comparison. Results: We identified 4,144 patients from the database for randomization into the primary (for model development) and testing (for external validation) datasets. We found that the conventional logistic regression model had low sensitivity (~50%) despite high specificity (>95%) or overall accuracy. On the other hand, the ML model struck a good balance between sensitivity (89.9%) and specificity (85.8%). The reduced ML model with the top five variables (mechanical ventilation, cardiac arrest, ECMO, acute kidney injury, ventricular fibrillation) was sufficient to approximate the prediction performance of the full model. Conclusions: The ML algorithm performed better than the linear regression model for mortality prediction in pediatric myocarditis in this retrospective dataset. Prospective studies are warranted to further validate the applicability of our model in clinical settings.
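The contrast between low sensitivity and high specificity or accuracy is easiest to see from a confusion matrix on an imbalanced outcome. The sketch below uses made-up labels (6 deaths among 100 hypothetical patients), not data from the study, to show how a model can miss half of the deaths and still look accurate overall.

```python
def confusion_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) for binary labels, 1 = event."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)   # share of true events that were flagged
    specificity = tn / (tn + fp)   # share of non-events correctly cleared
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy

# Illustrative data (assumption): 6 events, 94 non-events.
y_true = [1] * 6 + [0] * 94
# The hypothetical model flags 3 of the 6 events and raises 2 false alarms.
y_pred = [1] * 3 + [0] * 3 + [1] * 2 + [0] * 92

sens, spec, acc = confusion_metrics(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

With a rare outcome, accuracy is dominated by the non-events, so sensitivity has to be reported separately to judge how well deaths are actually identified.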


2021 ◽  
Vol 8 ◽  
Author(s):  
Robert A. Reed ◽  
Andrei S. Morgan ◽  
Jennifer Zeitlin ◽  
Pierre-Henri Jarreau ◽  
Héloïse Torchin ◽  
...  

Introduction: Preterm babies are a vulnerable population that experiences significant short and long-term morbidity. Rehospitalisations constitute an important, potentially modifiable adverse event in this population. Improving the ability of clinicians to identify those patients at the greatest risk of rehospitalisation has the potential to improve outcomes and reduce costs. Machine-learning algorithms can provide potentially advantageous methods of prediction compared to conventional approaches like logistic regression. Objective: To compare two machine-learning methods (least absolute shrinkage and selection operator (LASSO) and random forest) to expert-opinion driven logistic regression modelling for predicting unplanned rehospitalisation within 30 days in a large French cohort of preterm babies. Design, Setting and Participants: This study used data derived exclusively from the population-based prospective cohort study of French preterm babies, EPIPAGE 2. Only those babies discharged home alive and whose parents completed the 1-year survey were eligible for inclusion in our study. All predictive models used a binary outcome, denoting a baby's status for an unplanned rehospitalisation within 30 days of discharge. Predictors included those quantifying clinical, treatment, maternal and socio-demographic factors. The predictive abilities of models constructed using LASSO and random forest algorithms were compared with a traditional logistic regression model. The logistic regression model comprised 10 predictors, selected by expert clinicians, while the LASSO and random forest included 75 predictors. Performance measures were derived using 10-fold cross-validation. Performance was quantified using area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, Tjur's coefficient of determination and calibration measures. Results: The rate of 30-day unplanned rehospitalisation in the eligible population used to construct the models was 9.1% (95% CI 8.2–10.1) (350/3,841). The random forest model demonstrated both an improved AUROC (0.65; 95% CI 0.59–0.7; p = 0.03) and specificity vs. logistic regression (AUROC 0.57; 95% CI 0.51–0.62, p = 0.04). The LASSO performed similarly (AUROC 0.59; 95% CI 0.53–0.65; p = 0.68) to logistic regression. Conclusions: Compared to an expert-specified logistic regression model, random forest offered improved prediction of 30-day unplanned rehospitalisation in preterm babies. However, all models offered relatively low levels of predictive ability, regardless of modelling method.
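For readers unfamiliar with AUROC, it can be read as the probability that a randomly chosen case with the event receives a higher predicted risk than a randomly chosen case without it, with ties counted as one half. The following sketch computes it by direct pairwise comparison; the labels and risk scores are invented for illustration and are unrelated to the EPIPAGE 2 data.

```python
def auroc(labels, scores):
    """AUROC as the share of (event, non-event) pairs ranked correctly.

    Ties between an event and a non-event score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative predicted risks (assumption), 1 = rehospitalised within 30 days.
labels = [1, 1, 0, 0, 0, 1, 0, 0]
scores = [0.30, 0.12, 0.10, 0.25, 0.05, 0.12, 0.12, 0.02]

area = auroc(labels, scores)
print(f"AUROC = {area:.2f}")
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking, which puts the reported values of 0.57 to 0.65 in context.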

