Comparison of Gait Speed Reserve, Usual Gait Speed, and Maximum Gait Speed of Adults Aged 50+ in Ireland Using Explainable Machine Learning

2021 ◽  
Vol 1 ◽  
Author(s):  
James R.C. Davis ◽  
Silvin P. Knight ◽  
Orna A. Donoghue ◽  
Belinda Hernández ◽  
Rossella Rizzo ◽  
...  

Gait speed is a measure of general fitness. Changing from usual (UGS) to maximum (MGS) gait speed requires coordinated action of many body systems. Gait speed reserve (GSR) is defined as MGS − UGS. From a shortlist of 88 features across five categories including sociodemographic, cognitive, and physiological, we aimed to find and compare the sets of predictors that best describe UGS, MGS, and GSR. For this, we leveraged data from 3,925 adults aged 50+ from Wave 3 of The Irish Longitudinal Study on Ageing (TILDA). Features were selected by a histogram gradient boosting regression-based stepwise feature selection pipeline. Each model’s feature importance and input–output relationships were explored using TreeExplainer from the Shapley Additive Explanations (SHAP) explainable machine learning package. The mean adjusted R² (SD) from fivefold cross-validation on training data and the adjusted R² score on test data were 0.38 (0.04) and 0.41 for UGS, 0.45 (0.04) and 0.46 for MGS, and 0.19 (0.02) and 0.21 for GSR. Each model selected features across all categories. Features common to all models were age, grip strength, chair stands time, mean motor reaction time, and height. Exclusive to UGS and MGS were educational attainment, fear of falling, Montreal Cognitive Assessment (MoCA) errors, and orthostatic intolerance. Exclusive to MGS and GSR were body mass index (BMI) and number of medications. No features were selected exclusively for UGS and GSR. Features unique to UGS were resting-state pulse interval, Center for Epidemiologic Studies Depression Scale (CESD) depression, sit-to-stand difference in diastolic blood pressure, and left visual acuity. Unique to MGS were standard deviation in sustained attention to response task times, resting-state heart rate, smoking status, total heartbeat power during paced breathing, and visual acuity. Unique to GSR were accuracy proportion in a sound-induced flash illusion test, Mini-Mental State Examination (MMSE) errors, and number of cardiovascular conditions. No interactions were present in the GSR model. The four features that overall gave the most impactful interactions in the UGS and MGS models were age, chair stands time, grip strength, and BMI. These findings may help provide new insights into the multisystem predictors of gait speed and gait speed reserve in older adults and support a network physiology approach to their study.
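As an illustration of the kind of pipeline described above (not the authors' code), the sketch below fits a scikit-learn gradient boosting regressor on synthetic stand-ins for a few of the listed features and ranks them by mean absolute SHAP value using shap.TreeExplainer; all feature names, coefficients, and data are hypothetical.

```python
# Hedged sketch, not the authors' pipeline: synthetic stand-ins for a few
# TILDA-like features, a gradient boosting regressor, and SHAP importances.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
feature_names = ["age", "grip_strength", "chair_stands_time",
                 "motor_reaction_time", "height"]
X = np.column_stack([
    rng.uniform(50, 90, n),      # age (years)
    rng.normal(30, 8, n),        # grip strength (kg)
    rng.normal(12, 3, n),        # chair stands time (s)
    rng.normal(250, 40, n),      # mean motor reaction time (ms)
    rng.normal(1.68, 0.09, n),   # height (m)
])
# Hypothetical usual gait speed (m/s) with a plausible sign structure.
y = (1.9 - 0.008 * X[:, 0] + 0.004 * X[:, 1] - 0.01 * X[:, 2]
     - 0.0005 * X[:, 3] + 0.2 * X[:, 4] + rng.normal(0, 0.1, n))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

# Mean absolute SHAP value per feature approximates global importance.
shap_values = shap.TreeExplainer(model).shap_values(X_test)
for name, imp in zip(feature_names, np.abs(shap_values).mean(axis=0)):
    print(f"{name}: {imp:.4f}")
```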

2021 ◽  
Author(s):  
James R.C. Davis ◽  
Silvin P. Knight ◽  
Orna A. Donoghue ◽  
Belinda Hernández ◽  
Rose Anne Kenny ◽  
...  

Gait speed is a measure of general fitness. Changing from usual (UGS) to maximum (MGS) gait speed requires a general effort across many body systems. The difference, MGS − UGS, is defined as gait speed reserve (GSR). In the present study, using 3,925 participants aged 50+ from Wave 3 of The Irish Longitudinal Study on Ageing (TILDA), we used a gradient boosted trees-based stepwise feature selection pipeline for the discovery of clinically relevant predictors of GSR, UGS, and MGS using a shortlist of 88 features across 5 categories (socio-demographics/anthropometrics/medical history; cardiovascular system; physical strength; sensory; and cognitive/psychological). The TreeSHAP explainable machine learning package was used to analyse the input–output relationships of the three models. The mean adjusted R² (SD) from 5-fold cross-validation on training data and the adjusted R² score on test data for the models were: 0.38 (0.04) and 0.41 for UGS; 0.45 (0.04) and 0.46 for MGS; and 0.19 (0.02) and 0.21 for GSR. Features selected for the UGS model were: age, chair stands time, body mass index, grip strength, number of medications, resting-state pulse interval, mean motor reaction time in the choice reaction time test, height, depression score, sit-to-stand difference in diastolic blood pressure, and left visual acuity. The features selected for the MGS model were: age, grip strength, repeated chair stands time, body mass index, education, mean motor reaction time in the choice reaction time test, number of medications, height, the standard deviation of the mean reaction time in the sustained attention to response task, mean heart rate at resting state, fear of falling, MoCA errors, orthostatic intolerance during active stand, smoking status, total heartbeat power during paced breathing, the root mean square of successive differences between heartbeats during paced breathing, and visual acuity. Finally, the features chosen for the GSR model were: mean motor reaction time in the choice reaction time test, grip strength, education, chair stands time, MoCA errors, accuracy proportion in the sound-induced flash illusion (two beeps and one flash with stimulus-onset asynchrony of +150 ms), fear of falling, height, age, sex, orthostatic intolerance, MMSE errors, and number of cardiovascular conditions. MGS and UGS were more explainable than GSR. All three models contained features from all five categories. Some features were common to all three models (age, grip strength, chair stands time, mean motor reaction time in the choice reaction time test, and height), but each model also had features unique to it. Overall, findings on all three models were clinically plausible and support a network physiology approach to the understanding of predictors of performance-based tasks. By employing an explainable machine learning technique, our observations may help clinicians gain new insights into the multisystem predictors of gait speed and gait speed reserve in older adults.
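Since adjusted R² is reported per fold here but is not a built-in scikit-learn scorer, the following hedged sketch shows one way such a score could be computed from 5-fold cross-validation; the data are synthetic and the helper adjusted_r2 is an assumption, not the authors' implementation.

```python
# Sketch only: adjusted R^2 from 5-fold cross-validation on synthetic data,
# since scikit-learn reports plain R^2 rather than the adjusted form.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import KFold

def adjusted_r2(r2, n_samples, n_features):
    # R^2_adj = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

X, y = make_regression(n_samples=800, n_features=11, noise=20.0, random_state=1)

scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=1).split(X):
    model = HistGradientBoostingRegressor(random_state=1)
    model.fit(X[train_idx], y[train_idx])
    r2 = r2_score(y[test_idx], model.predict(X[test_idx]))
    scores.append(adjusted_r2(r2, len(test_idx), X.shape[1]))

print(f"mean adjusted R^2: {np.mean(scores):.2f} (SD {np.std(scores):.2f})")
```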


Author(s):  
Mehdi Bouslama ◽  
Leonardo Pisani ◽  
Diogo Haussen ◽  
Raul Nogueira

Introduction: Prognostication is an integral part of clinical decision-making in stroke care. Machine learning (ML) methods have gained increasing popularity in the medical field due to their flexibility and high performance. Using a large comprehensive stroke center registry, we sought to apply various ML techniques for 90-day stroke outcome prediction after thrombectomy. Methods: We used individual patient data from our prospectively collected thrombectomy database between 09/2010 and 03/2020. Patients with anterior circulation strokes (Internal Carotid Artery; Middle Cerebral Artery M1, M2, or M3 segments; and Anterior Cerebral Artery) and complete records were included. Our primary outcome was 90-day functional independence (defined as modified Rankin Scale score 0–2). Pre- and post-procedure models were developed. Four widely used ML algorithms (support vector machine, random forest, gradient boosting, and artificial neural network) were implemented using a 70/30 training-test data split and 10-fold cross-validation on the training data for model calibration. Discriminative performance was evaluated using the area under the receiver operating characteristic curve (AUC). Results: Among 1248 patients with anterior circulation large vessel occlusion stroke undergoing thrombectomy during the study period, 1020 had complete records and were included in the analysis. In the training data (n = 714), 49.3% of the patients achieved independence at 90 days. Fifteen baseline clinical, laboratory, and neuroimaging features were used to develop the pre-procedural models, with four additional parameters included in the post-procedure models. For the pre-procedural models, the highest AUC was 0.797 (95% CI 0.75–0.85) for the gradient boosting model. Similarly, the same ML technique performed best on post-procedural data and had improved discriminative performance compared to the pre-procedure model, with an AUC of 0.82 (95% CI 0.77–0.87). Conclusions: Our pre- and post-procedural models reliably estimated outcomes in stroke patients undergoing thrombectomy. They represent a step forward in creating simple and efficient prognostication tools to aid treatment decision-making. A web-based platform and related mobile app are underway.
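A minimal sketch of an evaluation protocol of this shape (70/30 split, 10-fold cross-validation on the training part, AUC on the held-out part) is shown below using scikit-learn on synthetic data; the four classifier choices mirror the algorithm families named in the abstract but are otherwise assumptions, not the registry models.

```python
# Synthetic illustration of the evaluation protocol: 70/30 split, 10-fold CV
# on the training part, AUC on the held-out test part, four classifiers.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=1020, n_features=15, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "support_vector_machine": SVC(probability=True, random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    "neural_network": MLPClassifier(max_iter=1000, random_state=0),
}
for name, model in models.items():
    cv_auc = cross_val_score(model, X_tr, y_tr, cv=10, scoring="roc_auc").mean()
    model.fit(X_tr, y_tr)
    test_auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: CV AUC = {cv_auc:.3f}, test AUC = {test_auc:.3f}")
```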


Energies ◽  
2021 ◽  
Vol 14 (23) ◽  
pp. 7834
Author(s):  
Christopher Hecht ◽  
Jan Figgener ◽  
Dirk Uwe Sauer

Electric vehicles may reduce greenhouse gas emissions from individual mobility. Due to the long charging times, accurate planning is necessary, for which the availability of charging infrastructure must be known. In this paper, we show how the occupation status of charging infrastructure can be predicted for the next day using machine learning models, namely a Gradient Boosting Classifier and a Random Forest Classifier. Since both are ensemble models, binary training data (occupied vs. available) can be used to provide a certainty measure for predictions. The prediction may be used to adapt prices in a high-load scenario, predict grid stress, or forecast available power for smart or bidirectional charging. The models were chosen based on an evaluation of 13 different, typically used machine learning models. We show that it is necessary to know past charging station usage in order to predict future usage. Other features such as traffic density or weather have only a limited effect. We show that a Gradient Boosting Classifier achieves 94.8% accuracy and a Matthews correlation coefficient of 0.838, making ensemble models a suitable tool. We further demonstrate how a model trained on binary data can give non-binary predictions in categories ranging from “low likelihood” to “high likelihood”.
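The sketch below illustrates, on synthetic data, how a classifier trained on binary occupancy labels could report accuracy and the Matthews correlation coefficient and then bin its predicted probabilities into graded likelihood categories; the thresholds and category names are assumptions, not the paper's exact scheme.

```python
# Sketch: a gradient boosting classifier trained on binary occupancy labels,
# scored by accuracy and MCC, with probabilities binned into graded categories.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]          # estimated P(occupied)
pred = (proba >= 0.5).astype(int)

print("accuracy:", accuracy_score(y_te, pred))
print("MCC:", matthews_corrcoef(y_te, pred))

# Bin the probabilities into non-binary likelihood categories (thresholds
# are arbitrary here, chosen only for illustration).
labels = ["low likelihood", "medium likelihood", "high likelihood"]
bins = np.digitize(proba, bins=[1 / 3, 2 / 3])
print("first test sample:", labels[bins[0]])
```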


2021 ◽  
Author(s):  
Javad Iskandarov ◽  
George Fanourgakis ◽  
Waleed Alameri ◽  
George Froudakis ◽  
Georgios Karanikolos

Conventional foam modelling techniques require tuning of too many parameters and long computational times in order to provide accurate predictions. Therefore, there is a need for alternative methodologies for the efficient and reliable prediction of foam performance. Foam behaviour is sensitive to various operational conditions and reservoir parameters. This research aims to apply machine learning (ML) algorithms to experimental data in order to correlate the important affecting parameters to foam rheology. In this way, optimum operational conditions for CO2 foam enhanced oil recovery (EOR) can be determined. To achieve this, five different ML algorithms were applied to experimental rheology data from various experimental studies. It was concluded that the Gradient Boosting (GB) algorithm could successfully fit the training data and give the most accurate predictions for unknown cases.
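As a rough illustration of such a model comparison (not the study's algorithms or data), the sketch below scores several scikit-learn regressors by cross-validated R² on synthetic stand-ins for rheology measurements.

```python
# Illustrative comparison of several regressors by cross-validated R^2 on
# synthetic stand-ins for foam rheology data (not the study's data set).
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

X, y = make_regression(n_samples=300, n_features=8, noise=15.0, random_state=42)

models = {
    "gradient_boosting": GradientBoostingRegressor(random_state=42),
    "random_forest": RandomForestRegressor(random_state=42),
    "ridge": Ridge(),
    "svr": SVR(),
    "knn": KNeighborsRegressor(),
}
for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R^2 = {score:.3f}")
```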


2020 ◽  
pp. 865-874
Author(s):  
Enrico Santus ◽  
Tal Schuster ◽  
Amir M. Tahmasebi ◽  
Clara Li ◽  
Adam Yala ◽  
...  

PURPOSE Literature on clinical note mining has highlighted the superiority of machine learning (ML) over hand-crafted rules. Nevertheless, most studies assume the availability of large training sets, which is rarely the case. For this reason, in the clinical setting, rules are still common. We suggest 2 methods to leverage the knowledge encoded in pre-existing rules to inform ML decisions and obtain high performance, even with scarce annotations. METHODS We collected 501 prostate pathology reports from 6 American hospitals. Reports were split into 2,711 core segments, annotated with 20 attributes describing the histology, grade, extension, and location of tumors. The data set was split by institutions to generate a cross-institutional evaluation setting. We assessed 4 systems, namely a rule-based approach, an ML model, and 2 hybrid systems integrating the previous methods: a Rule as Feature model and a Classifier Confidence model. Several ML algorithms were tested, including logistic regression (LR), support vector machine (SVM), and eXtreme gradient boosting (XGB). RESULTS When training on data from a single institution, LR lags behind the rules by 3.5% (F1 score: 92.2% v 95.7%). Hybrid models, instead, obtain competitive results, with Classifier Confidence outperforming the rules by +0.5% (96.2%). When a larger amount of data from multiple institutions is used, LR improves by +1.5% over the rules (97.2%), whereas hybrid systems obtain +2.2% for Rule as Feature (97.7%) and +2.6% for Classifier Confidence (98.3%). Replacing LR with SVM or XGB yielded similar performance gains. CONCLUSION We developed methods to use pre-existing handcrafted rules to inform ML algorithms. These hybrid systems obtain better performance than either rules or ML models alone, even when training data are limited.
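A minimal sketch of the "Rule as Feature" idea, assuming a hypothetical hand-crafted rule and synthetic data, is shown below: the rule's binary output is simply appended as an extra column before training a logistic regression classifier.

```python
# "Rule as Feature" sketch: the binary output of a hypothetical hand-crafted
# rule is appended to the feature matrix before training a classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, random_state=0)

def rule_prediction(features):
    # Stand-in for a hand-crafted rule (e.g. a keyword match in a report).
    return (features[:, 0] + features[:, 1] > 0).astype(int)

# Append the rule output as one extra feature column.
X_hybrid = np.column_stack([X, rule_prediction(X)])

X_tr, X_te, y_tr, y_te = train_test_split(X_hybrid, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("F1:", f1_score(y_te, clf.predict(X_te)))
```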


2020 ◽  
Vol 10 (18) ◽  
pp. 6619
Author(s):  
Po-Jiun Wen ◽  
Chihpin Huang

Noise prediction using machine learning is a specialized line of study that has recently received increased attention. This is particularly true in workplaces with noise pollution, which increases noise exposure for general laborers. This study attempts to analyze the noise equivalent level (Leq) at the National Synchrotron Radiation Research Center (NSRRC) facility and establish a machine learning model for noise prediction. This study utilized the gradient boosting model (GBM) as the learning model, in which past noise measurement records and many other features are integrated as the proposed model makes a prediction. This study analyzed the time duration and frequency of the collected Leq and also investigated the impact of training data selection. The results presented in this paper indicate that the proposed prediction model works well for almost all noise sensors and frequencies. Moreover, the model performed especially well for sensor 8 (125 Hz), which past noise measurements had identified as a serious noise zone. The results also show that the root-mean-square error (RMSE) of the predicted harmful noise was less than 1 dBA and the coefficient of determination (R²) value was greater than 0.7; that is, the proposed method achieved favorable noise prediction performance in the working field. This positive result demonstrates the ability of the proposed approach to predict noise, providing notifications that can help laborers avoid long-term exposure. In addition, the proposed model accurately predicts future noise pollution, which is essential for laborers in high-noise environments, as it can help keep employees healthy by warning them away from harmful noise positions.
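The sketch below shows, on a synthetic Leq signal rather than NSRRC data, how a gradient boosting model could predict the next measurement from lagged past values and be scored by RMSE and R²; the lag count and split are assumptions.

```python
# Sketch on a synthetic Leq signal: lagged past measurements feed a gradient
# boosting model that predicts the next value; scored by RMSE and R^2.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)
t = np.arange(2000)
leq = 60 + 5 * np.sin(2 * np.pi * t / 96) + rng.normal(0, 1, t.size)  # dBA

lags = 8                                   # previous 8 readings as features
X = np.column_stack([leq[i:i - lags] for i in range(lags)])
y = leq[lags:]

split = int(0.8 * len(y))                  # time-ordered split, no shuffling
model = GradientBoostingRegressor(random_state=0).fit(X[:split], y[:split])
pred = model.predict(X[split:])

rmse = np.sqrt(mean_squared_error(y[split:], pred))
print(f"RMSE = {rmse:.2f} dBA, R^2 = {r2_score(y[split:], pred):.2f}")
```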


Energies ◽  
2020 ◽  
Vol 13 (17) ◽  
pp. 4300
Author(s):  
Kosuke Sasakura ◽  
Takeshi Aoki ◽  
Masayoshi Komatsu ◽  
Takeshi Watanabe

Data centers (DCs) have become increasingly important in recent years, and highly efficient and reliable operation and management of DCs is now required. The generated heat density of racks and information and communication technology (ICT) equipment is expected to increase in the future, so it is crucial to maintain an appropriate temperature environment in the server room, where high heat is generated, in order to ensure continuous service. It is especially important to predict changes in rack intake temperature in the server room when the computer room air conditioner (CRAC) is shut down, which can cause a rapid rise in temperature. However, it is quite difficult to predict the rack temperature accurately, which in turn makes it difficult to determine the impact on service in advance. In this research, we propose a model that predicts the rack intake temperature after the CRAC is shut down. Specifically, we use machine learning to construct a gradient boosting decision tree model with data from the CRAC, ICT equipment, and rack intake temperature. Experimental results demonstrate that the proposed method has very high prediction accuracy: the coefficient of determination was 0.90 and the root mean square error (RMSE) was 0.54. Our model makes it possible to evaluate the impact on service and determine whether action to maintain the temperature environment is required. We also clarify the effect of the machine learning explanatory variables and training data on model accuracy.
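As a hedged illustration (hypothetical feature names, synthetic data), the sketch below fits a gradient boosting decision tree regressor for rack intake temperature and uses permutation importance to examine the effect of the explanatory variables.

```python
# Hypothetical feature names and synthetic data: a gradient boosting decision
# tree regressor for rack intake temperature, inspected with permutation
# importance to gauge the effect of each explanatory variable.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 2000
names = ["crac_supply_temp", "ict_power_kw", "minutes_since_shutdown"]
X = np.column_stack([
    rng.normal(18, 1, n),        # CRAC supply air temperature (deg C)
    rng.normal(8, 2, n),         # ICT equipment power (kW)
    rng.uniform(0, 30, n),       # time since CRAC shutdown (min)
])
y = 2 + 0.5 * X[:, 0] + 0.8 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(0, 0.5, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
model = GradientBoostingRegressor(random_state=1).fit(X_tr, y_tr)

result = permutation_importance(model, X_te, y_te, random_state=1)
for name, importance in zip(names, result.importances_mean):
    print(f"{name}: {importance:.3f}")
```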


2020 ◽  
Vol 34 (01) ◽  
pp. 784-791 ◽  
Author(s):  
Qinbin Li ◽  
Zhaomin Wu ◽  
Zeyi Wen ◽  
Bingsheng He

The Gradient Boosting Decision Tree (GBDT) has been a popular machine learning model for various tasks in recent years. In this paper, we study how to improve the model accuracy of GBDT while preserving the strong guarantee of differential privacy. Sensitivity and privacy budget are two key design aspects for the effectiveness of differentially private models. Existing solutions for GBDT with differential privacy suffer from significant accuracy loss due to overly loose sensitivity bounds and ineffective privacy budget allocations (especially across different trees in the GBDT model). Loose sensitivity bounds require more noise to be added to obtain a fixed privacy level. Ineffective privacy budget allocations worsen the accuracy loss, especially when the number of trees is large. Therefore, we propose a new GBDT training algorithm that achieves tighter sensitivity bounds and more effective noise allocations. Specifically, by investigating the property of gradients and the contribution of each tree in GBDTs, we propose to adaptively control the gradients of training data for each iteration and to clip leaf nodes in order to tighten the sensitivity bounds. Furthermore, we design a novel boosting framework to allocate the privacy budget between trees so that the accuracy loss can be further reduced. Our experiments show that our approach can achieve much better model accuracy than other baselines.
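The toy sketch below is not the paper's algorithm; it only illustrates the general idea that clipping per-sample gradients bounds the sensitivity of a leaf value, so that Laplace noise calibrated to this bound can provide differential privacy for that value.

```python
# Toy sketch, not the paper's algorithm: clip per-sample gradients so that one
# sample can shift a leaf's mean gradient by a bounded amount, then add
# Laplace noise calibrated to that bound.
import numpy as np

def private_leaf_value(gradients, clip_bound, epsilon, rng):
    clipped = np.clip(gradients, -clip_bound, clip_bound)
    leaf_value = clipped.mean()
    # Replacing one sample changes the mean by at most 2 * clip_bound / n.
    sensitivity = 2 * clip_bound / len(clipped)
    return leaf_value + rng.laplace(scale=sensitivity / epsilon)

rng = np.random.default_rng(0)
grads = rng.normal(0, 2.0, size=100)   # toy per-sample gradients in one leaf
print(private_leaf_value(grads, clip_bound=1.0, epsilon=0.5, rng=rng))
```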


2022 ◽  
Author(s):  
Maxime Ducamp ◽  
François-Xavier Coudert

The use of machine learning for the prediction of physical and chemical properties of crystals based on their structure alone is currently an area of intense research in computational materials science. In this work, we studied the possibility of using machine learning-trained algorithms in order to calculate the thermal properties of siliceous zeolite frameworks. We used as training data the thermal properties of 120 zeolites, calculated at the DFT level, in the quasi-harmonic approximation. We compared the statistical accuracy of trained models (based on the gradient boosting regression technique) using different types of descriptors, including ad hoc geometrical features, topology, pore space, and general geometric descriptors. While geometric descriptors were found to perform best, we also identified limitations on the accuracy of the predictions, especially for a small group of materials with very highly negative thermal expansion coefficients. We then studied the generalizability of the technique, demonstrating that the predictions were not sensitive to the refinement of framework structures at a high level of theory. Therefore, the models are suitable for the exploration and screening of large-scale databases of hypothetical frameworks, which we illustrate on the PCOD2 database of zeolites containing around 600,000 hypothetical structures.
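As a generic illustration only (the descriptors and targets below are synthetic stand-ins, not the DFT data set of 120 zeolites), a gradient boosting regression from numeric descriptors to a thermal property might be trained and evaluated as follows.

```python
# Generic sketch with synthetic descriptors (not the DFT-derived data set):
# gradient boosting regression from descriptors to a thermal property.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# 120 "frameworks", each described by 12 numeric geometric descriptors.
X, y = make_regression(n_samples=120, n_features=12, noise=5.0, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=7)

model = GradientBoostingRegressor(random_state=7).fit(X_tr, y_tr)
print("test MAE:", round(mean_absolute_error(y_te, model.predict(X_te)), 3))
```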


