Personalized prediction of rehabilitation outcomes in multiple sclerosis: a proof-of-concept using clinical data, digital health metrics, and machine learning

Author(s):  
Christoph M. Kanzler ◽  
Ilse Lamers ◽  
Peter Feys ◽  
Roger Gassert ◽  
Olivier Lambercy

Abstract Background A personalized prediction of upper limb neurorehabilitation outcomes in persons with multiple sclerosis (pwMS) promises to optimize the allocation of therapy and to stratify individuals for resource-demanding clinical trials. Previous research identified predictors on a population level through linear models and clinical data, including conventional assessments describing sensorimotor impairments. The objective of this work was to explore the feasibility of providing an individualized and more accurate prediction of rehabilitation outcomes in pwMS by leveraging non-linear machine learning models, clinical data, and digital health metrics characterizing sensorimotor impairments. Methods Clinical data and digital health metrics were recorded from eleven pwMS undergoing neurorehabilitation. Machine learning models were trained on data recorded pre-intervention. The dependent variables indicated whether a considerable improvement on the activity level was observed across the intervention or not (binary classification), as defined by the Action Research Arm Test (ARAT), Box and Block Test (BBT), or Nine Hole Peg Test (NHPT). Results In a cross-validation, considerable improvements in ARAT or BBT could be accurately predicted (94% balanced accuracy) by only relying on patient master data. Considerable improvements in NHPT could be accurately predicted (89% balanced accuracy), but required knowledge about sensorimotor impairments. Assessing these with digital health metrics instead of conventional scales allowed increasing the balanced accuracy by +17%. Non-linear machine-learning models improved the predictive accuracy for the NHPT by +25% compared to linear models. Conclusions This work demonstrates the feasibility of a personalized prediction of upper limb neurorehabilitation outcomes in pwMS using multi-modal data collected before neurorehabilitation and machine learning. Information from digital health metrics about sensorimotor impairment was necessary to predict changes in dexterous hand control, thereby underlining their potential to provide a more sensitive and fine-grained assessment than conventional scales. Non-linear models outperformed linear ones, suggesting that the commonly assumed linearity of neurorehabilitation is oversimplified.
clinicaltrials.gov registration number: NCT02688231
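The abstract above reports balanced accuracy rather than plain accuracy. As a minimal sketch of what that metric measures (not the authors' implementation), the following computes the mean of the per-class recalls for a binary "improved / not improved" label. The function name `balanced_accuracy` and the eleven labels are hypothetical and are not data from the study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall for class labels (here 1 = improved, 0 = not improved)."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        # Indices of all samples that truly belong to class c.
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical labels for eleven participants (illustration only).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

score = balanced_accuracy(y_true, y_pred)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"balanced accuracy: {score:.3f}, plain accuracy: {plain:.3f}")
```

With imbalanced classes, as in this toy example, plain accuracy is dominated by the majority class, whereas balanced accuracy weights both classes equally.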

Author(s):  
Christoph M. Kanzler ◽  
Ilse Lamers ◽  
Peter Feys ◽  
Roger Gassert ◽  
Olivier Lambercy

Abstract Predicting upper limb neurorehabilitation outcomes in persons with multiple sclerosis (pwMS) is essential to optimize therapy allocation. Previous research identified population-level predictors through linear models and clinical data. This work explores the feasibility of predicting individual neurorehabilitation outcomes using machine learning, clinical data, and digital health metrics. Machine learning models were trained on clinical data and digital health metrics recorded pre-intervention in 11 pwMS. The dependent variables indicated whether pwMS considerably improved across the intervention, as defined by the Action Research Arm Test (ARAT), Box and Block Test (BBT), or Nine Hole Peg Test (NHPT). Improvements in ARAT or BBT could be accurately predicted (88% and 83% accuracy) using only patient master data. Improvements in NHPT could be predicted with moderate accuracy (73%) and required knowledge about sensorimotor impairments. Assessing these with digital health metrics over clinical scales increased accuracy by 10%. Non-linear models improved accuracy for the BBT (+9%), but not for the ARAT (-1%) and NHPT (-2%). This work demonstrates the feasibility of predicting upper limb neurorehabilitation outcomes in pwMS, which justifies the development of more representative prediction models in the future. Digital health metrics improved the prediction of changes in hand control, thereby underlining their advanced sensitivity.


2021 ◽  
Author(s):  
Siddharth Ghule ◽  
Sayan Bagchi ◽  
Kumar Vanka

Electricity generation is a major contributing factor for greenhouse gas emissions. Energy storage systems available today have a combined capacity to store less than 1% of the electricity being consumed worldwide. Redox Flow Batteries (RFBs) are promising candidates for green and efficient energy storage systems. RFBs are being used in renewable energy systems, but their widespread adoption is limited due to high production costs and toxicity associated with the transition-metal-based redox-active species. Therefore, cheaper and greener alternative organic redox-active species are being investigated. Recent reports have shown organic molecules based on phenazine are promising candidates for redox-active species in RFBs. However, the large number of available organic compounds makes the conventional experimental and DFT methods impractical to screen thousands of molecules in a reasonable amount of time. In contrast, machine-learning models have low development time, short prediction time, and high accuracy, and are thus being heavily investigated for virtual screening applications. In this work, we developed machine-learning models to predict the redox potential of phenazine derivatives in DME solvent using a small dataset of 185 molecules. 2D, 3D, and Molecular Fingerprint features were computed using readily available and easy-to-use Python libraries, making our approach easily adaptable to similar work. Twenty linear and non-linear machine-learning models were investigated in this work. These models achieved excellent performance on the unseen data (i.e., R² > 0.98, MSE < 0.008 V² and MAE < 0.07 V). Model performance was assessed in a consistent manner using the training and evaluation pipeline developed in this work. We showed that 2D molecular features are most informative and achieve the best prediction accuracy among four feature sets. We also showed that often less preferred but relatively faster linear models could perform better than non-linear models when the feature set contains different types of features (i.e., 2D, 3D, and Molecular Fingerprints). Further investigations revealed that it is possible to reduce the training and inference time without sacrificing prediction accuracy by using a small subset of features. Moreover, models were able to predict the previously reported promising redox-active compounds with high accuracy. Also, significantly low prediction errors were observed for the functional groups. Although some functional groups had only one compound in the training set, best-performing models could achieve errors (MAPE) less than 10%. The major source of error was a lack of data near zero and in the positive region. Therefore, this work shows that it is possible to develop accurate machine-learning models that could potentially screen millions of compounds in a short amount of time with a small training set and limited number of easy-to-compute features. Thus, results obtained in this report would help in the adoption of green energy by accelerating the field of materials discovery for energy storage applications.
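The abstract above summarizes model quality with the coefficient of determination, the mean squared error (MSE), and the mean absolute error (MAE). The sketch below shows how these three metrics are conventionally computed from true and predicted values. It is an illustration under stated assumptions, not the authors' evaluation pipeline: the function name `regression_metrics` and the four potentials (in volts) are hypothetical.

```python
def regression_metrics(y_true, y_pred):
    """Return (R^2, MSE, MAE) for paired true and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n        # units: V^2 for potentials in V
    mae = sum(abs(e) for e in errors) / n       # units: V
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)         # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return r2, mse, mae


# Hypothetical redox potentials in volts (illustration only).
y_true = [-1.0, -0.5, 0.0, 0.5]
y_pred = [-0.9, -0.6, 0.1, 0.4]

r2, mse, mae = regression_metrics(y_true, y_pred)
print(f"R^2 = {r2:.3f}, MSE = {mse:.3f} V^2, MAE = {mae:.3f} V")
```

Because MSE squares the errors, it is expressed in squared units of the target, which is why the abstract reports MSE in V² and MAE in V.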


JAMIA Open ◽  
2021 ◽  
Vol 4 (3) ◽  
Author(s):  
Suparno Datta ◽  
Jan Philipp Sachs ◽  
Harry FreitasDa Cruz ◽  
Tom Martensen ◽  
Philipp Bode ◽  
...  

Abstract Objectives The development of clinical predictive models hinges upon the availability of comprehensive clinical data. Tapping into such resources requires considerable effort from clinicians, data scientists, and engineers. Specifically, these efforts are focused on data extraction and preprocessing steps required prior to modeling, including complex database queries. A handful of software libraries exist that can reduce this complexity by building upon data standards. However, a gap remains concerning electronic health records (EHRs) stored in star schema clinical data warehouses, an approach often adopted in practice. In this article, we introduce the FlexIBle EHR Retrieval (FIBER) tool: a Python library built on top of a star schema (i2b2) clinical data warehouse that enables flexible generation of modeling-ready cohorts as data frames. Materials and Methods FIBER was developed on top of a large-scale star schema EHR database which contains data from 8 million patients and over 120 million encounters. To illustrate FIBER’s capabilities, we present its application by building a heart surgery patient cohort with subsequent prediction of acute kidney injury (AKI) with various machine learning models. Results Using FIBER, we were able to build the heart surgery cohort (n = 12 061), identify the patients that developed AKI (n = 1005), and automatically extract relevant features (n = 774). Finally, we trained machine learning models that achieved area under the curve values of up to 0.77 for this exemplary use case. Conclusion FIBER is an open-source Python library developed for extracting information from star schema clinical data warehouses and reduces time-to-modeling, helping to streamline the clinical modeling process.


2021 ◽  
Author(s):  
Scott Kulm ◽  
Lior Kofman ◽  
Jason Mezey ◽  
Olivier Elemento

ABSTRACT A patient’s risk for cancer is usually estimated through simple linear models that sum effect sizes of proven risk factors. In theory, more advanced machine learning models can be used for the same task. Using data from the UK Biobank, a large prospective health study, we have developed linear and machine learning models for the prediction of 12 different cancer diagnoses within a 10-year time span. We find that the top machine learning algorithm, XGBoost (XGB), trained on 707 features generated an average area under the receiver operating characteristic curve of 0.736 (with a range of 0.65-0.85). The performance of linear models trained with only 10 features was statistically indistinguishable from that of the machine learning models. The linear models were significantly more accurate than the prominent QCancer models (p = 0.0019), which are trained on 45 million patient records and available to over 4,000 United Kingdom general practices. The increase in accuracy may be caused by the consideration of often omitted feature types, including survey answers, census records, and genetic information. This approach led to the discovery of significant novel risk features, including self-reported happiness with own health (relevant to 12 cancers), measured testosterone (relevant to 8 cancers), and ICD codes for rehabilitation procedures (relevant to 3 cancers). These ten-feature models can be easily implemented within the clinic, allowing for personalized screening schedules that may increase cancer survival within a population.


2021 ◽  
Vol 99 (Supplement_3) ◽  
pp. 285-285
Author(s):  
Vanessa Rotondo ◽  
Dan Tulpan ◽  
Katharine M Wood ◽  
Marlene Paibomesai ◽  
Vern R Osborne

Abstract The objective of this study is to investigate how linear body measurements relate to and can be used to predict calf body weight (BW) using linear and machine learning models. To meet this objective, a total of 103 Angus cross calves were enrolled in the study from week 2 to week 8. Calves were weighed weekly, and the following linear measurements were collected weekly: poll to nose, width across the eyes (WE), width across the right ear, neck length, wither height, heart girth (HG), midpiece height (MH), midpiece circumference, midpiece width (MW), midpiece depth (MD), hook height, hook width, pin height, top of pin bones width (PW), width across the ends of pin bones, nose to tail body length, the length between the withers and pins, forearm to hoof, and cannon bone to hoof. These measurements were taken using a commercial soft tape measure and calipers. To assess relationships between traits and to fit a model to predict BW, data were analyzed in the Weka software (The University of Waikato, New Zealand) using both linear regression (LR) and random forest (RF) machine learning models. The models were trained using a 10-fold cross-validation approach. The automatically derived LR model used 11 traits to fit the data to weekly BW (r² = 0.97), where the traits with the highest coefficients were HG, PW and WE. The RF model further improved the BW predictions (r² = 0.98). Additionally, sex differences were examined. Although the BW model continued to fit well (r² 0.97), some of the top linear traits differed. The results of this study suggest that linear models built on linear measurements can accurately estimate body weight in beef calves, and that machine learning can further improve the model fit.


2018 ◽  
Vol 1 (3-4) ◽  
pp. 421-449 ◽  
Author(s):  
Ippei Obayashi ◽  
Yasuaki Hiraoka ◽  
Masao Kimura

Author(s):  
Suleka Helmini ◽  
Nadheesh Jihan ◽  
Malith Jayasinghe ◽  
Srinath Perera

In the retail domain, estimating the sales before actual sales become known plays a key role in maintaining a successful business. This is due to the fact that most crucial decisions are bound to be based on these forecasts. Statistical sales forecasting models like ARIMA (Auto-Regressive Integrated Moving Average) can be identified as one of the most traditional and commonly used forecasting methodologies. Even though these models are capable of producing satisfactory forecasts for linear time series data, they are not suitable for analyzing non-linear data. Therefore, machine learning models (such as Random Forest Regression, XGBoost) have been employed frequently as they were able to achieve better results using non-linear data. Recent research shows that deep learning models (e.g. recurrent neural networks) can provide higher accuracy in predictions compared to machine learning models due to their ability to persist information and identify temporal relationships. In this paper, we adopt a special variant of Long Short Term Memory (LSTM) network called LSTM model with peephole connections for sales prediction. We first build our model using historical features for sales forecasting. We compare the results of this initial LSTM model with multiple machine learning models, namely, the Extreme Gradient Boosting model (XGB) and Random Forest Regressor model (RFR). We further improve the prediction accuracy of the initial model by incorporating features that describe the future that is known to us in the current moment, an approach that has not been explored in previous state-of-the-art LSTM based forecasting models. The initial LSTM model we develop outperforms the machine learning models achieving 12% - 14% improvement whereas the improved LSTM model achieves 11% - 13% improvement compared to the improved machine learning models. Furthermore, we also show that our improved LSTM model can obtain a 20% - 21% improvement compared to the initial LSTM model, achieving significant improvement.


Entropy ◽  
2020 ◽  
Vol 22 (2) ◽  
pp. 205 ◽  
Author(s):  
Sandy Rodrigues ◽  
Gerhard Mütter ◽  
Helena Geirinhas Ramos ◽  
F. Morgado-Dias

Photovoltaic (PV) system energy production is non-linear because it is influenced by the random nature of weather conditions. The use of machine learning techniques to model the PV system energy production is recommended since there is no known way to deal well with non-linear data. In order to detect PV system faults, the machine learning models should provide accurate outputs. The aim of this work is to accurately predict the DC energy of six PV strings of a utility-scale PV system and to accurately detect PV string faults by benchmarking the results of four machine learning methodologies known to improve the accuracy of machine learning models, namely the data mining methodology, the machine learning technique benchmarking methodology, the hybrid methodology, and the ensemble methodology. A new hybrid methodology is proposed in this work which combines a fuzzy system with a machine learning system containing five different trained machine learning models, namely the regression tree, artificial neural networks, multi-gene genetic programming, Gaussian process, and support vector machines for regression. The results showed that the hybrid methodology provided the most accurate machine learning predictions of the PV string DC energy, and consequently the PV string fault detection is successful.


2020 ◽  
pp. 135481662097695
Author(s):  
Jian-Wu Bi ◽  
Tian-Yu Han ◽  
Hui Li

This study explores how to select the optimal number of lagged inputs (NLIs) in international tourism demand forecasting. With international tourist arrivals at 10 European countries, the performances of eight machine learning models are evaluated using different NLIs. The results show that: (1) as NLIs increases, the error of most machine learning models first decreases rapidly and then tends to be stable (or fluctuates around a certain value) when NLIs reaches a certain cutoff point. The cutoff point is related to 12 and its multiples. This trend is not affected by the size of the test set; (2) for nonlinear and ensemble models, it is better to select one cycle of the data as the NLIs, while for linear models, multiple cycles are a better choice; (3) significantly different prediction results are obtained by different categories of models when the optimal NLIs are used.
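The number of lagged inputs (NLIs) discussed above determines how many past observations each training sample contains. As a minimal sketch of that idea (not the authors' code), the following turns a series into (inputs, target) pairs for a given lag count. The function name `make_lagged_samples`, the assumption of monthly data, and the 24 arrival values are hypothetical.

```python
def make_lagged_samples(series, n_lags):
    """Build (inputs, target) pairs where inputs are the previous n_lags values."""
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must be between 1 and len(series) - 1")
    samples = []
    for t in range(n_lags, len(series)):
        # Inputs: the n_lags observations immediately before time t.
        samples.append((series[t - n_lags:t], series[t]))
    return samples


# Hypothetical monthly arrivals over two years (illustration only).
arrivals = list(range(100, 124))

# Assuming monthly data, one seasonal cycle corresponds to 12 lagged inputs.
samples = make_lagged_samples(arrivals, n_lags=12)
print(len(samples), samples[0])
```

Larger lag counts give each sample more history but leave fewer samples for training, which is one reason the choice of NLIs matters for short series.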


PLoS ONE ◽  
2020 ◽  
Vol 15 (3) ◽  
pp. e0230219 ◽  
Author(s):  
Ruggiero Seccia ◽  
Daniele Gammelli ◽  
Fabio Dominici ◽  
Silvia Romano ◽  
Anna Chiara Landi ◽  
...  
