Updated external validation of the SORG machine learning algorithms for prediction of ninety-day and one-year mortality after surgery for spinal metastasis

Author(s):  
Akash A. Shah ◽  
Aditya V. Karhade ◽  
Howard Y. Park ◽  
William L. Sheppard ◽  
Luke J. Macyszyn ◽  
...  
2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Alan Brnabic ◽  
Lisa M. Hess

Abstract Background Machine learning is a broad term encompassing a number of methods that allow the investigator to learn from the data. These methods may permit large real-world databases to be more rapidly translated to applications to inform patient-provider decision making. Methods This systematic literature review was conducted to identify published observational research that employed machine learning to inform decision making at the patient-provider level. The search strategy was implemented and studies meeting eligibility criteria were evaluated by two independent reviewers. Relevant data related to study design, statistical methods and strengths and limitations were identified; study quality was assessed using a modified version of the Luo checklist. Results A total of 34 publications from January 2014 to September 2020 were identified and evaluated for this review. There were diverse methods, statistical packages and approaches used across identified studies. The most common methods included decision tree and random forest approaches. Most studies applied internal validation but only two conducted external validation. Most studies utilized one algorithm, and only eight studies applied multiple machine learning algorithms to the data. Seven items on the Luo checklist failed to be met by more than 50% of published studies. Conclusions A wide variety of approaches, algorithms, statistical software, and validation strategies were employed in the application of machine learning methods to inform patient-provider decision making. There is a need to ensure that multiple machine learning approaches are used, that the model selection strategy is clearly defined, and that both internal and external validation are performed, so that decisions for patient care are made with the highest quality evidence. Future work should routinely employ ensemble methods incorporating multiple machine learning algorithms.


Cancers ◽  
2020 ◽  
Vol 12 (12) ◽  
pp. 3817
Author(s):  
Shi-Jer Lou ◽  
Ming-Feng Hou ◽  
Hong-Tai Chang ◽  
Chong-Chi Chiu ◽  
Hao-Hsien Lee ◽  
...  

No studies have discussed machine learning algorithms to predict recurrence within 10 years after breast cancer surgery. This study aimed to compare the accuracy of forecasting models in predicting recurrence within 10 years after breast cancer surgery and to identify significant predictors of recurrence. Registry data for breast cancer surgery patients were allocated to a training dataset (n = 798) for model development, a testing dataset (n = 171) for internal validation, and a validating dataset (n = 171) for external validation. Global sensitivity analysis was then performed to evaluate the significance of the selected predictors. Demographic characteristics, clinical characteristics, quality of care, and preoperative quality of life were significantly associated with recurrence within 10 years after breast cancer surgery (p < 0.05). Artificial neural networks had the highest prediction performance indices. Additionally, surgeon volume was the best predictor of recurrence within 10 years after breast cancer surgery, followed by hospital volume and tumor stage. Accurate prediction of recurrence within 10 years by machine learning algorithms may improve precision in managing patients after breast cancer surgery and improve understanding of the risk factors for such recurrence.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Yelena Petrosyan ◽  
Kednapa Thavorn ◽  
Glenys Smith ◽  
Malcolm Maclure ◽  
Roanne Preston ◽  
...  

Abstract Background Since primary data collection can be time-consuming and expensive, surgical site infections (SSIs) could ideally be monitored using routinely collected administrative data. We derived and internally validated efficient algorithms to identify SSIs within 30 days after surgery with health administrative data, using Machine Learning algorithms. Methods All patients enrolled in the National Surgical Quality Improvement Program from the Ottawa Hospital were linked to administrative datasets in Ontario, Canada. Machine Learning approaches, including a Random Forests algorithm and the high-performance logistic regression, were used to derive parsimonious models to predict SSI status. Finally, a risk score methodology was used to transform the final models into the risk score system. The SSI risk models were validated in the validation datasets. Results Of 14,351 patients, 795 (5.5%) had an SSI. First, separate predictive models were built for three distinct administrative datasets. The final model, including hospitalization diagnostic, physician diagnostic and procedure codes, demonstrated excellent discrimination (C statistics, 0.91, 95% CI, 0.90–0.92) and calibration (Hosmer-Lemeshow χ2 statistics, 4.531, p = 0.402). Conclusion We demonstrated that health administrative data can be effectively used to identify SSIs. Machine learning algorithms have shown a high degree of accuracy in predicting postoperative SSIs and can integrate and utilize a large amount of administrative data. External validation of this model is required before it can be routinely used to identify SSIs.


2016 ◽  
Author(s):  
Qing Yi Feng ◽  
Ruggero Vasile ◽  
Marc Segond ◽  
Avi Gozolchiani ◽  
Yang Wang ◽  
...  

Abstract. We present the toolbox ClimateLearn to tackle problems in climate prediction using machine learning techniques and climate network analysis. The package allows basic operations of data mining, i.e. reading, merging, and cleaning data, and running machine learning algorithms such as multilayer artificial neural networks and symbolic regression with genetic programming. Because spatiotemporal information on climate variability can be efficiently represented by complex network measures, such data are considered here as input to the machine learning algorithms. As an example, the toolbox is applied to the prediction of the occurrence and the development of El Niño in the equatorial Pacific, first concentrating on the occurrence of El Niño events one year ahead and second on the evolution of sea surface temperature anomalies with a lead time of three months.


Medicina ◽  
2021 ◽  
Vol 57 (2) ◽  
pp. 99
Author(s):  
Yueying Wang ◽  
Shuai Liu ◽  
Zhao Wang ◽  
Yusi Fan ◽  
Jingxuan Huang ◽  
...  

Background and Objective: Primary lung cancer is a lethal and rapidly developing cancer type and is one of the leading causes of cancer deaths. Materials and Methods: Statistical methods such as Cox regression are usually used to detect the prognostic factors of a disease. This study investigated survival prediction using machine learning algorithms. The clinical data of 28,458 patients with primary lung cancers were collected from the Surveillance, Epidemiology, and End Results (SEER) database. Results: This study indicated that the survival rate of women with primary lung cancer was often higher than that of men (p < 0.001). Seven popular machine learning algorithms were utilized to evaluate one-year, three-year, and five-year survival prediction. The two classifiers extreme gradient boosting (XGB) and logistic regression (LR) achieved the best prediction accuracies. The variable importance of the trained XGB models suggested that surgical removal (feature “Surgery”) made the largest contribution to the one-year survival prediction models, while the metastatic status (feature “N” stage) of the regional lymph nodes was the most important contributor to three-year and five-year survival prediction. The female patients’ three-year prognosis model achieved a prediction accuracy of 0.8297 on the independent future samples, while the male model only achieved an accuracy of 0.7329. Conclusions: These data suggested that male patients may have more complicated factors in lung cancer than females, and that it is necessary to develop gender-specific diagnosis and prognosis models.


2020 ◽  
Author(s):  
Yelena Petrosyan ◽  
Kednapa Thavorn ◽  
Glenys Smith ◽  
Malcolm Maclure ◽  
Roanne Preston ◽  
...  

Abstract Background: Since primary data collection can be time-consuming and expensive, surgical site infections (SSIs) could ideally be monitored using routinely collected administrative data. We derived and internally validated efficient algorithms to identify SSIs within 30 days after surgery with health administrative data, using Machine Learning algorithms. All patients enrolled in the National Surgical Quality Improvement Program from the Ottawa Hospital were linked to administrative datasets in Ontario, Canada. Machine Learning approaches, including a Random Forests algorithm and the high-performance logistic regression, were used to derive parsimonious models to predict SSI status. Finally, a risk score methodology was used to transform the final models into the risk score system. The SSI risk models were validated in the validation datasets. Results: Of 14,351 patients, 795 (5.5%) had an SSI. First, separate predictive models were built for three distinct administrative datasets. The final model, including hospitalization diagnostic, physician diagnostic and procedure codes, demonstrated excellent discrimination (C statistics, 0.91, 95% CI, 0.90–0.92) and calibration (Hosmer-Lemeshow χ2 statistics, 4.531, p = 0.402). Conclusion: We demonstrated that health administrative data can be effectively used to identify SSIs. Machine learning algorithms have shown a high degree of accuracy in predicting postoperative SSIs and can integrate and utilize a large amount of administrative data. External validation of this model is required before it can be routinely used to identify SSIs.


Author(s):  
Mustafa Berkant Selek ◽  
Saadet Sena Egeli ◽  
Yalcin Isler

In this study, intensive care unit patient survival is predicted by machine learning algorithms from the examinations performed in the first 24 hours. Data on intensive care patients, collected from approximately two hundred hospitals over a period of one year, were used. The algorithms were run in a Python environment. Machine learning models were compared using cross-validation, and the random forest algorithm was selected. The model made its predictions with an accuracy of 92.53%.
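The cross-validation comparison mentioned above can be illustrated with a minimal sketch. It is not the authors' code: the abstract names neither the number of folds nor the candidate models, so the two toy classifiers (`fit_majority`, `fit_threshold`), the fold count, and the made-up one-feature data below are all assumptions chosen to keep the example dependency-free. A real comparison would substitute models such as a random forest.

```python
from typing import Callable, List, Sequence

# A "fit" function takes training data and returns a predict function.
Predict = Callable[[float], int]
Fit = Callable[[List[float], List[int]], Predict]


def k_fold_indices(n: int, k: int) -> List[List[int]]:
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_accuracy(fit: Fit, xs: Sequence[float], ys: Sequence[int], k: int = 5) -> float:
    """Mean held-out accuracy over k folds."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        predict = fit(train_x, train_y)  # train only on the other folds
        correct = sum(predict(xs[i]) == ys[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def fit_majority(train_x: List[float], train_y: List[int]) -> Predict:
    """Hypothetical baseline: always predict the most common training label."""
    majority = int(sum(train_y) * 2 >= len(train_y))
    return lambda x: majority


def fit_threshold(train_x: List[float], train_y: List[int]) -> Predict:
    """Hypothetical one-feature rule: predict 1 above the midpoint of the class means."""
    pos = [x for x, y in zip(train_x, train_y) if y == 1]
    neg = [x for x, y in zip(train_x, train_y) if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: int(x > cut)


# Made-up data: one feature, binary outcome, classes interleaved across folds.
xs = [1.0, 6.0, 2.0, 7.0, 1.5, 6.5, 2.5, 7.5, 1.2, 6.8]
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

for name, fit in [("majority", fit_majority), ("threshold", fit_threshold)]:
    print(name, cross_val_accuracy(fit, xs, ys, k=5))
```

The point of the design is that every model is scored only on folds it did not see during fitting, so the candidate with the higher mean held-out accuracy is the one carried forward.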


2021 ◽  
Author(s):  
Ahmed A Al-Jaishi ◽  
Monica Taljaard ◽  
Melissa D Al-Jaishi ◽  
Sheikh S Abdullah ◽  
Lehana Thabane ◽  
...  

Abstract Background: Cluster randomized trials (CRTs) are becoming an increasingly important design. However, authors do not always adhere to requirements to explicitly identify the design as cluster randomized in titles and abstracts, making retrieval from bibliographic databases difficult. Machine learning algorithms may improve their identification and retrieval. Therefore, we aimed to develop machine learning algorithms that accurately determine whether a bibliographic citation is a CRT report. Methods: We trained, internally validated, and externally validated two convolutional neural networks and one support vector machine (SVM) algorithm to predict whether a citation is a CRT report or not. We exclusively used the information in an article citation, including the title, abstract, keywords, and subject headings. The algorithms' output was a probability from 0 to 1. We assessed algorithm performance using the area under the receiver operating characteristic curve (AUC). Each algorithm's performance was evaluated individually and together as an ensemble. We randomly selected 5000 from 87,633 citations to train and internally validate our algorithms. Of the 5000 selected citations, 589 (12%) were confirmed CRT reports. We then externally validated our algorithms on an independent set of 1916 randomized trial citations, with 665 (35%) confirmed CRT reports. Results: In internal validation, the ensemble algorithm discriminated best for identifying CRT reports with an AUC of 98.6% (95% confidence interval: 97.8%, 99.4%), sensitivity of 97.7% (94.3%, 100%), and specificity of 85.0% (81.8%, 88.1%). In external validation, the ensemble algorithm had an AUC of 97.8% (97.0%, 98.5%), sensitivity of 97.6% (96.4%, 98.6%), and specificity of 78.2% (75.9%, 80.4%). All three individual algorithms performed well, but less so than the ensemble.
Conclusions: We successfully developed high-performance algorithms that identified whether a citation was a CRT report with high sensitivity and moderately high specificity. We provide open-source software to facilitate the use of our algorithms in practice.
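
As a rough illustration of the evaluation described in this abstract, the sketch below combines per-citation probabilities from three models and computes AUC, sensitivity, and specificity. It rests on several assumptions: the abstract does not say how the ensemble was formed, so an unweighted average is used here; the 0.5 decision threshold is arbitrary; and the labels and probabilities are made up, not taken from the study.

```python
from typing import List, Sequence, Tuple


def ensemble_average(prob_lists: Sequence[Sequence[float]]) -> List[float]:
    """Unweighted mean of each item's probability across models (an assumption)."""
    n_models = len(prob_lists)
    return [sum(col) / n_models for col in zip(*prob_lists)]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance a random positive outscores a random negative.

    Ties between a positive and a negative count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up example: 1 = CRT report, 0 = not; one probability list per model.
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.6, 0.4, 0.5, 0.2, 0.1]
model_b = [0.8, 0.3, 0.7, 0.2, 0.6, 0.1]
model_c = [0.7, 0.9, 0.3, 0.2, 0.1, 0.4]

ensemble = ensemble_average([model_a, model_b, model_c])
for name, scores in [("a", model_a), ("b", model_b), ("c", model_c), ("ensemble", ensemble)]:
    print(name, round(auc(labels, scores), 3), sensitivity_specificity(labels, scores))
```

In this toy case each model misranks a different pair of citations, so averaging cancels the individual errors; that is one plausible reason an ensemble can discriminate better than its members, though the abstract itself gives no mechanism.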

