Are Machine Learning Models Used to Represent Accelerometry Data Robust to Age Differences?

Abstract Regular and sufficient amounts of physical activity (PA) are significant in increasing health benefits and mitigating health risks. Given the growing popularity of wrist-worn devices across all age groups, a rigorous evaluation for recognizing hallmark measures of physical activities and estimating energy expenditure is needed to compare their accuracy across the lifespan. The goal of the study was to build machine learning models to recognizing the hallmark measures of PA and estimating energy expenditure (EE), and to test the hypothesis that model performance varies across age-group: young [20-50 years], middle (50-70 years], and old (70-89 years]. Participants (n = 253, 62% women, aged 20-89 years old) performed a battery of 33 daily activities in a standardized laboratory setting while wearing a portable metabolic unit to measure EE that was used to gauge metabolic intensity. Participants also wore a Tri-axial accelerometer on the right wrist. Results from random forests algorithm were quite accurate at recognizing PA type; the F1-Score range across age groups was: sedentary [0.955 – 0.973], locomotion [0.942 – 0.964], and lifestyle [0.913 – 0.949]. Recognizing PA intensity resulted in lower performance; the F1-Score range across age groups was: sedentary [0.919 – 0.947], light [0.813 – 0.828], and moderate [0.846–0.875]. The root mean square error range was [0.835–1.009] for the estimation of EE. The F1-Score range for recognizing individual PAs was [0.263–0.784]. In conclusion, machine learning models used to represent accelerometry data are robust to age differences and a generalizable approach might be sufficient to utilize in accelerometer-based wearables.

Download Full-text

Data-Driven Approach for Predicting and Explaining the Risk of Long-Term Unemployment

E3S Web of Conferences ◽

10.1051/e3sconf/202021401023 ◽

2020 ◽

Vol 214 ◽

pp. 01023

Author(s):

Linan (Frank) Zhao

Keyword(s):

Machine Learning ◽

Age Groups ◽

Learning Models ◽

Public Authorities ◽

Ensemble Machine Learning ◽

European Public ◽

Data Driven Approach ◽

Using Data ◽

Machine Learning Models

Long-term unemployment has significant societal impact and is of particular concerns for policymakers with regard to economic growth and public finances. This paper constructs advanced ensemble machine learning models to predict citizens’ risks of becoming long-term unemployed using data collected from European public authorities for employment service. The proposed model achieves 81.2% accuracy on identifying citizens with high risks of long-term unemployment. This paper also examines how to dissect black-box machine learning models by offering explanations at both a local and global level using SHAP, a state-of-the-art model-agnostic approach to explain factors that contribute to long-term unemployment. Lastly, this paper addresses an under-explored question when applying machine learning in the public domain, that is, the inherent bias in model predictions. The results show that popular models such as gradient boosted trees may produce unfair predictions against senior age groups and immigrants. Overall, this paper sheds light on the recent increasing shift for governments to adopt machine learning models to profile and prioritize employment resources to reduce the detrimental effects of long-term unemployment and improve public welfare.

Download Full-text

Chapter 15. Human-Centered Concept Explanations for Neural Networks

10.3233/faia210362 ◽

2021 ◽

Author(s):

Chih-Kuan Yeh ◽

Been Kim ◽

Pradeep Ravikumar

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Case Studies ◽

Real World ◽

Deep Neural Networks ◽

Learning Models ◽

Real World Applications ◽

The Right ◽

Concept Activation ◽

Machine Learning Models

Understanding complex machine learning models such as deep neural networks with explanations is crucial in various applications. Many explanations stem from the model perspective, and may not necessarily effectively communicate why the model is making its predictions at the right level of abstraction. For example, providing importance weights to individual pixels in an image can only express which parts of that particular image is important to the model, but humans may prefer an explanation which explains the prediction by concept-based thinking. In this work, we review the emerging area of concept based explanations. We start by introducing concept explanations including the class of Concept Activation Vectors (CAV) which characterize concepts using vectors in appropriate spaces of neural activations, and discuss different properties of useful concepts, and approaches to measure the usefulness of concept vectors. We then discuss approaches to automatically extract concepts, and approaches to address some of their caveats. Finally, we discuss some case studies that showcase the utility of such concept-based explanations in synthetic settings and real world applications.

Download Full-text

Assessing Continuous Operator Workload With a Hybrid Scaffolded Neuroergonomic Modeling Approach

Human Factors The Journal of the Human Factors and Ergonomics Society ◽

10.1177/0018720816672308 ◽

2017 ◽

Vol 59 (1) ◽

pp. 134-146 ◽

Cited By ~ 8

Author(s):

Brett J. Borghetti ◽

Joseph J. Giametta ◽

Christina F. Rusnock

Keyword(s):

Machine Learning ◽

Adaptive Systems ◽

Model Performance ◽

Machine Learning Algorithms ◽

Training Data ◽

State Assessments ◽

Learning Models ◽

Dynamic Task ◽

Operator Workload ◽

Machine Learning Models

Objective: We aimed to predict operator workload from neurological data using statistical learning methods to fit neurological-to-state-assessment models. Background: Adaptive systems require real-time mental workload assessment to perform dynamic task allocations or operator augmentation as workload issues arise. Neuroergonomic measures have great potential for informing adaptive systems, and we combine these measures with models of task demand as well as information about critical events and performance to clarify the inherent ambiguity of interpretation. Method: We use machine learning algorithms on electroencephalogram (EEG) input to infer operator workload based upon Improved Performance Research Integration Tool workload model estimates. Results: Cross-participant models predict workload of other participants, statistically distinguishing between 62% of the workload changes. Machine learning models trained from Monte Carlo resampled workload profiles can be used in place of deterministic workload profiles for cross-participant modeling without incurring a significant decrease in machine learning model performance, suggesting that stochastic models can be used when limited training data are available. Conclusion: We employed a novel temporary scaffold of simulation-generated workload profile truth data during the model-fitting process. A continuous workload profile serves as the target to train our statistical machine learning models. Once trained, the workload profile scaffolding is removed and the trained model is used directly on neurophysiological data in future operator state assessments. Application: These modeling techniques demonstrate how to use neuroergonomic methods to develop operator state assessments, which can be employed in adaptive systems.

Download Full-text

Forecasting admissions in psychiatric hospitals before and during Covid-19

10.1101/2021.07.16.21260200 ◽

2021 ◽

Author(s):

Jan Wolff ◽

Ansgar Klimke ◽

Michael Marschollek ◽

Tim Kacprowski

Keyword(s):

Machine Learning ◽

Time Series ◽

Hospital Admissions ◽

Model Performance ◽

Psychiatric Hospitals ◽

Time Series Models ◽

Learning Models ◽

One Step ◽

Machine Learning Models ◽

Better Than

Introduction The COVID-19 pandemic has strong effects on most health care systems and individual services providers. Forecasting of admissions can help for the efficient organisation of hospital care. We aimed to forecast the number of admissions to psychiatric hospitals before and during the COVID-19 pandemic and we compared the performance of machine learning models and time series models. This would eventually allow to support timely resource allocation for optimal treatment of patients. Methods We used admission data from 9 psychiatric hospitals in Germany between 2017 and 2020. We compared machine learning models with time series models in weekly, monthly and yearly forecasting before and during the COVID-19 pandemic. Our models were trained and validated with data from the first two years and tested in prospectively sliding time-windows in the last two years. Results A total of 90,686 admissions were analysed. The models explained up to 90% of variance in hospital admissions in 2019 and 75% in 2020 with the effects of the COVID-19 pandemic. The best models substantially outperformed a one-step seasonal naive forecast (seasonal mean absolute scaled error (sMASE) 2019: 0.59, 2020: 0.76). The best model in 2019 was a machine learning model (elastic net, mean absolute error (MAE): 7.25). The best model in 2020 was a time series model (exponential smoothing state space model with Box-Cox transformation, ARMA errors and trend and seasonal components, MAE: 10.44), which adjusted more quickly to the shock effects of the COVID-19 pandemic. Models forecasting admissions one week in advance did not perform better than monthly and yearly models in 2019 but they did in 2020. The most important features for the machine learning models were calendrical variables. Conclusion Model performance did not vary much between different modelling approaches before the COVID-19 pandemic and established forecasts were substantially better than one-step seasonal naive forecasts. However, weekly time series models adjusted quicker to the COVID-19 related shock effects. In practice, different forecast horizons could be used simultaneously to allow both early planning and quick adjustments to external effects.

Download Full-text

Consistency of variety of machine learning and statistical models in predicting clinical risks of individual patients: longitudinal cohort study using cardiovascular disease as exemplar

BMJ ◽

10.1136/bmj.m3919 ◽

2020 ◽

pp. m3919

Author(s):

Yan Li ◽

Matthew Sperrin ◽

Darren M Ashcroft ◽

Tjeerd Pieter van Staa

Keyword(s):

Neural Network ◽

Machine Learning ◽

Cardiovascular Disease ◽

Cohort Study ◽

Statistical Models ◽

Model Performance ◽

Population Level ◽

Survival Models ◽

Learning Models ◽

Machine Learning Models

AbstractObjectiveTo assess the consistency of machine learning and statistical techniques in predicting individual level and population level risks of cardiovascular disease and the effects of censoring on risk predictions.DesignLongitudinal cohort study from 1 January 1998 to 31 December 2018.Setting and participants3.6 million patients from the Clinical Practice Research Datalink registered at 391 general practices in England with linked hospital admission and mortality records.Main outcome measuresModel performance including discrimination, calibration, and consistency of individual risk prediction for the same patients among models with comparable model performance. 19 different prediction techniques were applied, including 12 families of machine learning models (grid searched for best models), three Cox proportional hazards models (local fitted, QRISK3, and Framingham), three parametric survival models, and one logistic model.ResultsThe various models had similar population level performance (C statistics of about 0.87 and similar calibration). However, the predictions for individual risks of cardiovascular disease varied widely between and within different types of machine learning and statistical models, especially in patients with higher risks. A patient with a risk of 9.5-10.5% predicted by QRISK3 had a risk of 2.9-9.2% in a random forest and 2.4-7.2% in a neural network. The differences in predicted risks between QRISK3 and a neural network ranged between –23.2% and 0.1% (95% range). Models that ignored censoring (that is, assumed censored patients to be event free) substantially underestimated risk of cardiovascular disease. Of the 223 815 patients with a cardiovascular disease risk above 7.5% with QRISK3, 57.8% would be reclassified below 7.5% when using another model.ConclusionsA variety of models predicted risks for the same patients very differently despite similar model performances. The logistic models and commonly used machine learning models should not be directly applied to the prediction of long term risks without considering censoring. Survival models that consider censoring and that are explainable, such as QRISK3, are preferable. The level of consistency within and between models should be routinely assessed before they are used for clinical decision making.

Download Full-text

Machine Learning based COVID-19 Diagnosis from Blood Tests with Robustness to Domain Shifts

10.1101/2021.04.06.21254997 ◽

2021 ◽

Author(s):

Theresa Roland ◽

Carl Boeck ◽

Thomas Tschoellitsch ◽

Alexander Maletzky ◽

Sepp Hochreiter ◽

...

Keyword(s):

Machine Learning ◽

Large Scale ◽

Hospital Setting ◽

Model Performance ◽

Learning Models ◽

Negative Effects ◽

Testing Procedures ◽

Blood Tests ◽

Over Time ◽

Machine Learning Models

We investigate machine learning models that identify COVID-19 positive patients and estimate the mortality risk based on routinely acquired blood tests in a hospital setting. However, during pandemics or new outbreaks, disease and testing characteristics change, thus we face domain shifts. Domain shifts can be caused, e.g., by changes in the disease prevalence (spreading or tested population), by refined RT-PCR testing procedures (taking samples, laboratory), or by virus mutations. Therefore, machine learning models for diagnosing COVID-19 or other diseases may not be reliable and degrade in performance over time. To countermand this effect, we propose methods that first identify domain shifts and then reverse their negative effects on the model performance. Frequent re-training and re-assessment, as well as stronger weighting of more recent samples, keeps model performance and credibility at a high level over time. Our diagnosis models are constructed and tested on large-scale data sets, steadily adapt to observed domain shifts, and maintain high ROC AUC values along pandemics.

Download Full-text

Employing Machine Learning to Estimate Hallmark Measures of Physical Activities from Wrist-worn Devices Across Age Groups

10.20944/preprints202103.0333.v1 ◽

2021 ◽

Author(s):

Mamoun T. Mardini ◽

Chen Bai ◽

Amal A. Wanigatunga ◽

Santiago Saldana ◽

Ramon Casanova ◽

...

Keyword(s):

Physical Activity ◽

Machine Learning ◽

Energy Expenditure ◽

Age Groups ◽

Physical Activities ◽

Accelerometer Data ◽

Activity Type ◽

Physical Activity Intensity ◽

Activity Intensity ◽

Score Range

Wrist-worn fitness trackers and smartwatches are proliferating with an incessant attention towards health tracking. Given the growing popularity of wrist-worn devices across all age groups, a rigorous evaluation for recognizing hallmark measures of physical activities and estimating energy expenditure is needed to compare their accuracy across the lifespan. The goal of the study was to build machine learning models to recognize physical activity type (sedentary, locomotion, and lifestyle) and intensity (low, light, and moderate), identify individual physical activities, and estimate energy expenditure. The primary aim of this study was to build and compare models for different age groups: young [20-50 years], middle (50-70 years], and old (70-89 years]. Participants (n = 253, 62% women, aged 20-89 years old) performed a battery of 33 daily activities in a standardized laboratory setting while wearing a portable metabolic unit to measure energy expenditure that was used to gauge metabolic intensity. Tri-axial accelerometer collected data at 80-100 Hz from the right wrist that was processed for 49 features. Results from random forests algorithm were quite accurate in recognizing physical activity type, the F1-Score range across age groups was: sedentary [0.955 – 0.973], locomotion [0.942 – 0.964], and lifestyle [0.913 – 0.949]. Recognizing physical activity intensity resulted in lower performance, the F1-Score range across age groups was: sedentary [0.919 – 0.947], light [0.813 – 0.828], and moderate [0.846 – 0.875]. The root mean square error range was [0.835 – 1.009] for the estimation of energy expenditure. The F1-Score range for recognizing individual physical activities was [0.263 – 0.784]. Performances were relatively similar and the accelerometer data features were ranked similarly between age groups. In conclusion, data features derived from wrist worn accelerometers lead to high-moderate accuracy estimating physical activity type, intensity and energy expenditure and are robust to potential age-differences.

Download Full-text

On Missingness Features in Machine Learning Models for Critical Care: Observational Study (Preprint)

10.2196/preprints.25022 ◽

2020 ◽

Author(s):

Janmajay Singh ◽

Masahiro Sato ◽

Tomoko Ohkuma

Keyword(s):

Machine Learning ◽

Length Of Stay ◽

Electronic Health Records ◽

Prediction Models ◽

External Validation ◽

Model Performance ◽

Learning Models ◽

Health Records ◽

Electronic Health ◽

Machine Learning Models

BACKGROUND Missing data in electronic health records is inevitable and considered to be nonrandom. Several studies have found that features indicating missing patterns (missingness) encode useful information about a patient’s health and advocate for their inclusion in clinical prediction models. But their effectiveness has not been comprehensively evaluated. OBJECTIVE The goal of the research is to study the effect of including informative missingness features in machine learning models for various clinically relevant outcomes and explore robustness of these features across patient subgroups and task settings. METHODS A total of 48,336 electronic health records from the 2012 and 2019 PhysioNet Challenges were used, and mortality, length of stay, and sepsis outcomes were chosen. The latter dataset was multicenter, allowing external validation. Gated recurrent units were used to learn sequential patterns in the data and classify or predict labels of interest. Models were evaluated on various criteria and across population subgroups evaluating discriminative ability and calibration. RESULTS Generally improved model performance in retrospective tasks was observed on including missingness features. Extent of improvement depended on the outcome of interest (area under the curve of the receiver operating characteristic [AUROC] improved from 1.2% to 7.7%) and even patient subgroup. However, missingness features did not display utility in a simulated prospective setting, being outperformed (0.9% difference in AUROC) by the model relying only on pathological features. This was despite leading to earlier detection of disease (true positives), since including these features led to a concomitant rise in false positive detections. CONCLUSIONS This study comprehensively evaluated effectiveness of missingness features on machine learning models. A detailed understanding of how these features affect model performance may lead to their informed use in clinical settings especially for administrative tasks like length of stay prediction where they present the greatest benefit. While missingness features, representative of health care processes, vary greatly due to intra- and interhospital factors, they may still be used in prediction models for clinically relevant outcomes. However, their use in prospective models producing frequent predictions needs to be explored further.

Download Full-text

Three-level Sleep Stage Classification Based on Wrist-worn Accelerometry Data Alone

10.1101/2021.08.10.455812 ◽

2021 ◽

Author(s):

Jian Hu ◽

Haochang Shou

Keyword(s):

Machine Learning ◽

Sequential Analysis ◽

Short Term Memory ◽

Sleep Stage ◽

Model Performance ◽

Training Data ◽

Sleep Stages ◽

Learning Models ◽

Local Variability ◽

Machine Learning Models

Objective: The use of wearable sensor devices on daily basis to track real-time movements during wake and sleep has provided opportunities for automatic sleep quantification using such data. Existing algorithms for classifying sleep stages often require large training data and multiple input signals including heart rate and respiratory data. We aimed to examine the capability of classifying sleep stages using sensible features directly from accelerometers only with the aid of advanced recurrent neural networks. Materials and Methods: We analyzed a publicly available dataset with accelerometry data in 5s epoch length and polysomnography assessments. We developed long short-term memory (LSTM) models that take the 3-axis accelerations, angles, and temperatures from concurrent and historic observation windows to predict wake, REM and non-REM sleep. Leave-one-subject-out experiments were conducted to compare and evaluate the model performance with conventional nonsequential machine learning models using metrics such as multiclass training and testing accuracy, weighted precision, F1 score and area-under-the-curve (AUC). Results: Our sequential analysis framework outperforms traditional non-sequential models in all aspects of model evaluation metrics. We achieved an average of 65% and a maximum of 81% validation accuracy for classifying three sleep labels even with a relatively small training sample of clinical visitors. The presence of two additional derived variables, local variability and range, have shown to strongly improve the model performance. Discussion : Results indicate that it is crucial to account for deep temporal dependency and assess local variability of the features. The post-hoc analysis of individual model performances on subjects' demographic characteristics also suggest the need of including pathological samples in the training data in order to develop robust machine learning models that are capable of capturing normal and anomaly sleep patterns in the population.

Download Full-text

An Adaptive Deep Ensemble Learning Method for Dynamic Evolving Diagnostic Task Scenarios

Diagnostics ◽

10.3390/diagnostics11122288 ◽

2021 ◽

Vol 11 (12) ◽

pp. 2288

Author(s):

Kaixiang Su ◽

Jiao Wu ◽

Dongxiao Gu ◽

Shanlin Yang ◽

Shuyuan Deng ◽

...

Keyword(s):

Machine Learning ◽

Ensemble Learning ◽

Model Performance ◽

Optimal Number ◽

Training Data ◽

Learning Method ◽

Learning Models ◽

Proposed Model ◽

Public Datasets ◽

Machine Learning Models

Increasingly, machine learning methods have been applied to aid in diagnosis with good results. However, some complex models can confuse physicians because they are difficult to understand, while data differences across diagnostic tasks and institutions can cause model performance fluctuations. To address this challenge, we combined the Deep Ensemble Model (DEM) and tree-structured Parzen Estimator (TPE) and proposed an adaptive deep ensemble learning method (TPE-DEM) for dynamic evolving diagnostic task scenarios. Different from previous research that focuses on achieving better performance with a fixed structure model, our proposed model uses TPE to efficiently aggregate simple models more easily understood by physicians and require less training data. In addition, our proposed model can choose the optimal number of layers for the model and the type and number of basic learners to achieve the best performance in different diagnostic task scenarios based on the data distribution and characteristics of the current diagnostic task. We tested our model on one dataset constructed with a partner hospital and five UCI public datasets with different characteristics and volumes based on various diagnostic tasks. Our performance evaluation results show that our proposed model outperforms other baseline models on different datasets. Our study provides a novel approach for simple and understandable machine learning models in tasks with variable datasets and feature sets, and the findings have important implications for the application of machine learning models in computer-aided diagnosis.

Download Full-text