Determining jumping performance from a single body-worn accelerometer using machine learning

External peak power in the countermovement jump is frequently used to monitor athlete training. The gold standard method uses force platforms, but they are unsuitable for field-based testing. Alternatives based on jump flight time, or on Newtonian methods applied to inertial sensor data, have not been sufficiently accurate for athlete monitoring. Instead, we developed a machine learning model based on characteristic features (functional principal components) extracted from a single body-worn accelerometer. Data were collected from 69 male and female athletes at recreational, club or national levels, who performed 696 jumps in total. We considered vertical countermovement jumps (with and without arm swing), sensor anatomical locations, machine learning models and whether to use resultant or triaxial signals. Using a novel surrogate model optimisation procedure, we obtained the lowest errors with a support vector machine when using the resultant signal from a lower back sensor in jumps without arm swing. This model had a peak power RMSE of 2.3 W·kg⁻¹ (5.1% of the mean), estimated using nested cross validation and supported by an independent holdout test (2.0 W·kg⁻¹). This error is lower than in previous studies, although it is not yet sufficiently accurate for a field-based method. Our results demonstrate that functional data representations work well in machine learning by reducing model complexity in applications where signals are aligned in time. Our optimisation procedure was also shown to be robust and could be used in wider applications with low-cost, noisy objective functions.
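The core idea of summarising each time-aligned accelerometer curve by its functional principal component (FPC) scores can be sketched in plain Python. This is a minimal discretised version under stated assumptions: all curves are already registered to a common time grid, only the first component is extracted (by power iteration on the sample covariance), and the grid-spacing weights of a full FPCA are ignored. The function name `fpca_first_component` is hypothetical; in the work above, scores like these were passed to a regression model such as a support vector machine, which is not shown.

```python
import math


def fpca_first_component(curves, iters=200):
    """Discretised FPCA sketch returning (mean curve, first component, scores).

    Assumes every curve is sampled on the same, already time-aligned grid.
    """
    n, m = len(curves), len(curves[0])
    # Pointwise mean curve across all jumps
    mean = [sum(c[t] for c in curves) / n for t in range(m)]
    centred = [[c[t] - mean[t] for t in range(m)] for c in curves]
    # Sample covariance between grid points s and t (m x m matrix)
    cov = [[sum(x[s] * x[t] for x in centred) / (n - 1) for t in range(m)]
           for s in range(m)]
    # Power iteration converges to the leading eigenvector of the covariance
    v = [1.0] * m
    for _ in range(iters):
        w = [sum(cov[s][t] * v[t] for t in range(m)) for s in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # FPC score of each curve: projection of its deviation onto the component
    scores = [sum(x[t] * v[t] for t in range(m)) for x in centred]
    return mean, v, scores


# Synthetic check: curves that differ only along one known shape `phi`
m = 21
raw = [math.sin(math.pi * k / (m - 1)) for k in range(m)]
phi = [r / math.sqrt(sum(q * q for q in raw)) for r in raw]
amplitudes = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
curves = [[1.0 + k / (m - 1) + a * phi[k] for k in range(m)] for a in amplitudes]
mean, component, scores = fpca_first_component(curves)
print([round(s, 6) for s in scores])
```

Because the synthetic curves vary only along `phi`, the recovered component equals `phi` and each score is the curve's amplitude minus the mean amplitude (0.5). Real jump data would need several components, chosen for example by explained variance.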

Generalisable FPCA-based Models for Predicting Peak Power in Vertical Jumping using Accelerometer Data

10.23889/suthesis.58286

2021

Author(s): Mark G. E. White

Keyword(s): Peak Power, Process Model, Learning Algorithm, Inertial Sensor, Functional Principal Component Analysis, Sensor Data, Model Parameters, Anatomical Location, Accelerometer Data, Gold Standard Method

Peak power in the countermovement jump is correlated with various measures of sports performance and can be used to monitor athlete training. The gold standard method for determining peak power uses force platforms, but they are unsuitable for the field-based testing favoured by practitioners. Alternatives include predicting peak power from jump flight times, or using Newtonian methods based on body-worn inertial sensor data, but so far neither has yielded sufficiently accurate estimates. This thesis aims to develop a generalisable model for predicting peak power based on Functional Principal Component Analysis (FPCA) applied to body-worn accelerometer data. Data were collected from 69 male and female adults engaged in sports at recreational, club or national levels. They performed up to 16 countermovement jumps each, with and without arm swing, 696 jumps in total. Peak power criterion measures were obtained from force platforms, and characteristic features were extracted from the accelerometer data of four sensors attached to the lower back, upper back and both shanks. The best machine learning algorithm, jump type and sensor anatomical location were determined in this context. The investigation considered signal representation (resultant, triaxial or a suitable transform), preprocessing (smoothing, time window and curve registration), feature selection and data augmentation (signal rotations and SMOTER). A novel procedure optimised the model parameters by applying Particle Swarm Optimisation to a surrogate Gaussian Process model. Model selection and evaluation were based on nested cross validation with a Monte Carlo design. The final optimal model had an RMSE of 2.5 W·kg⁻¹, which compares favourably with earlier research (4.9 ± 1.7 W·kg⁻¹ for flight-time formulae and 10.7 ± 6.3 W·kg⁻¹ for Newtonian sensor-based methods). Whilst this is not yet sufficiently accurate for applied practice, this thesis has developed and comprehensively evaluated new techniques that will be valuable to future biomechanical applications.
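The Monte Carlo nested cross-validation design mentioned above can be illustrated with a small, dependency-free sketch. The outer loop repeatedly holds out a random test subset to estimate error; the inner loop, run only on the outer training data, repeatedly re-splits that data to choose a hyperparameter. As assumptions for illustration, a k-nearest-neighbour regressor stands in for the thesis's models, `k` stands in for the tuned parameters (no surrogate Gaussian Process or Particle Swarm search is shown), and all function names, split fractions and repetition counts are hypothetical.

```python
import math
import random


def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x (Euclidean)."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    return sum(y for _, y in nearest) / k


def rmse(train, test, k):
    """Root-mean-square error of k-NN predictions on the test pairs."""
    sq = [(knn_predict(train, x, k) - y) ** 2 for x, y in test]
    return math.sqrt(sum(sq) / len(sq))


def random_split(data, test_frac, rng):
    """Shuffle indices and return (train, test) lists: one Monte Carlo split."""
    idx = list(range(len(data)))
    rng.shuffle(idx)
    n_test = max(1, round(test_frac * len(data)))
    return [data[i] for i in idx[n_test:]], [data[i] for i in idx[:n_test]]


def monte_carlo_nested_cv(data, ks, outer_reps=10, inner_reps=5,
                          test_frac=0.2, seed=0):
    """Return (mean outer RMSE, list of k chosen in each outer repetition)."""
    rng = random.Random(seed)
    outer_errors, chosen = [], []
    for _ in range(outer_reps):
        train, test = random_split(data, test_frac, rng)

        def inner_error(k, train=train):
            # Inner loop: tune k using only the outer training data
            errs = []
            for _ in range(inner_reps):
                fit, val = random_split(train, test_frac, rng)
                errs.append(rmse(fit, val, k))
            return sum(errs) / len(errs)

        best_k = min(ks, key=inner_error)
        chosen.append(best_k)
        # Outer loop: refit on all outer training data, score on held-out test
        outer_errors.append(rmse(train, test, best_k))
    return sum(outer_errors) / len(outer_errors), chosen


# Hypothetical usage on synthetic data: y = 2x with a single feature
data = [((float(x),), 2.0 * x) for x in range(50)]
mean_rmse, chosen_ks = monte_carlo_nested_cv(data, ks=[1, 3, 5])
print(round(mean_rmse, 3), chosen_ks)
```

Keeping tuning inside the inner loop means the outer RMSE is never computed on data used to pick `k`. With repeated jumps per athlete, a real evaluation would likely also split by participant rather than by jump; the sketch ignores this.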

Binary Spectrum Feature for Improved Classifier Performance

10.36227/techrxiv.12993122

2020

Author(s): Nalika Ulapane, Karthick Thiyagarajan, Sarath Kodagoda

Keyword(s): Machine Learning, Classification Performance, Feature Reduction, Sensor Data, Machine Learning Techniques, Support Vector, SVM Classifier, Monitoring Task, Classifier Performance, Spectrum Feature

Classification has become a vital task in modern machine learning and artificial intelligence applications, including smart sensing. Numerous machine learning techniques are available to perform classification. Similarly, numerous practices, such as feature selection (i.e., selecting a subset of descriptor variables that optimally describe the output), are available to improve classifier performance. In this paper, we consider a supervised classification task that must be performed using continuous-valued features. We assume that an optimal subset of features has already been selected, so no further feature reduction or feature addition is carried out. We then attempt to improve classification performance by passing the given feature set through a transformation that produces a new feature set, which we have named the “Binary Spectrum”. Through a case study on Pulsed Eddy Current sensor data captured during an infrastructure monitoring task, we demonstrate that the classification accuracy of a Support Vector Machine (SVM) classifier increases when the Binary Spectrum feature is used, indicating the transformation’s potential for broader usage.

Automatic Identification of Upper Extremity Rehabilitation Exercise Type and Dose Using Body-Worn Sensors and Machine Learning: A Pilot Study

Digital Biomarkers

10.1159/000516619

2021

pp. 158-166

Author(s): Noah Balestra, Gaurav Sharma, Linda M. Riek, Ania Busza

Keyword(s): Machine Learning, Upper Extremity, Sensor Data, Inpatient Setting, Accelerometer Data, Data Set, Machine Learning Classification, Exercise Type, Exercise Dose, Rehabilitation Exercises

Background: Prior studies suggest that participation in rehabilitation exercises improves motor function poststroke; however, studies on optimal exercise dose and timing have been limited by the technical challenge of quantifying exercise activities over multiple days. Objectives: The objectives of this study were to assess the feasibility of using body-worn sensors to track rehabilitation exercises in the inpatient setting and to investigate which recording parameters and data analysis strategies are sufficient for accurately identifying and counting exercise repetitions. Methods: MC10 BioStampRC® sensors were used to measure accelerometer and gyroscope data from the upper extremities of healthy controls (n = 13) and individuals with upper extremity weakness due to recent stroke (n = 13) while the subjects performed 3 preselected arm exercises. Sensor data were then labeled by exercise type, and this labeled data set was used to train a machine learning classification algorithm for identifying exercise type. The machine learning algorithm and a peak-finding algorithm were then used to count exercise repetitions in unlabeled data sets. Results: We achieved a repetition counting accuracy of 95.6% overall, and 95.0% in patients with upper extremity weakness due to stroke, when using both accelerometer and gyroscope data. Accuracy decreased when fewer sensors or accelerometer data alone were used. Conclusions: Our exploratory study suggests that body-worn sensor systems are technically feasible, well tolerated in subjects with recent stroke, and may ultimately be useful for developing a system to measure total exercise “dose” in poststroke patients during clinical rehabilitation or clinical trials.
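The study combined a classifier with a peak-finding algorithm to count repetitions, but its exact peak-finding rule is not given here. The following is only a generic sketch under assumptions: repetitions appear as local maxima in a one-dimensional sensor-derived signal, a peak must reach a `threshold`, and peaks closer than `min_gap` samples are merged by keeping the taller one. Function and parameter names are hypothetical.

```python
import math


def count_repetitions(signal, threshold, min_gap):
    """Return indices of peaks at or above `threshold`, at least `min_gap` samples apart."""
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]
        if not is_local_max or signal[i] < threshold:
            continue
        if peaks and i - peaks[-1] < min_gap:
            # Two candidate peaks too close together: keep the taller one
            if signal[i] > signal[peaks[-1]]:
                peaks[-1] = i
        else:
            peaks.append(i)
    return peaks


# Hypothetical example: three exercise cycles in 300 samples
demo = [math.sin(2 * math.pi * 3 * n / 300) for n in range(300)]
peaks = count_repetitions(demo, threshold=0.5, min_gap=50)
print(len(peaks), peaks)
```

In practice the threshold and minimum gap would depend on the exercise type, which is one reason a classifier that first identifies the exercise is useful.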

Applying Machine Learning for Improving Performance Classification on Driving Behavior

IJITEE (International Journal of Information Technology and Electrical Engineering)

10.22146/ijitee.56919

2021

Vol 4 (1)

pp. 8

Author(s): Ahmad Iwan Fadli, Selo Sulistyo, Sigit Wibowo

Keyword(s): Machine Learning, Traffic Accident, Large Scale, Detection System, Difficult Problem, Sensor Data, Driving Safety, Support Vector, Classification Methods, Machine Learning Classification

Traffic accidents are a very difficult problem to manage at a national scale. Indonesia is one of the most populous developing countries, and vehicles are its main means of transport for daily activities. It also has the largest number of car users in Southeast Asia, so driving safety needs attention. Using machine learning classification methods to determine whether a driver is driving safely can help reduce the risk of accidents. We created a detection system that classifies whether a driver is driving safely or unsafely using trip sensor data, including gyroscope, acceleration and GPS data. The classification methods used in this study are Random Forest (RF), Support Vector Machine (SVM) and Multilayer Perceptron (MLP), with data preprocessing improved through feature extraction and oversampling. This study shows that, with the proposed preprocessing stages, RF performs best, achieving 98% accuracy, 98% precision and 97% sensitivity, compared with SVM and MLP.
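The reported accuracy, precision and sensitivity all follow from confusion-matrix counts. A minimal sketch, assuming "unsafe" is treated as the positive class (the abstract does not say which class was positive) and using hypothetical labels:

```python
def binary_metrics(y_true, y_pred, positive="unsafe"):
    """Accuracy, precision and sensitivity (recall) for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Of the trips predicted positive, the fraction that truly were positive
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Of the truly positive trips, the fraction the model caught
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, sensitivity


y_true = ["unsafe", "unsafe", "safe", "safe", "unsafe"]
y_pred = ["unsafe", "safe", "safe", "unsafe", "unsafe"]
print(binary_metrics(y_true, y_pred))
```

Here there are 2 true positives, 1 false positive, 1 false negative and 1 true negative, giving accuracy 3/5 and precision and sensitivity of 2/3 each.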

Detecting Face Touching Using Smartwatches to Mitigate the Spread of COVID-19: Pilot Study (Preprint)

10.2196/preprints.28799

2021

Author(s): Chen Bai, Yu-Peng Chen, Adam Wolach, Lisa Anthony, Mamoun Mardini

Keyword(s): Machine Learning, Logistic Regression, Random Forest, Respiratory Diseases, Window Size, Support Vector, Accelerometer Data, Respiratory Illnesses, Motion Data, Machine Learning Methods

BACKGROUND Frequent spontaneous facial self-touches, especially during outbreaks, could theoretically be a mechanism for contracting and transmitting diseases. Despite the recent advent of vaccines, behavioral approaches remain an integral part of reducing the spread of COVID-19 and other respiratory illnesses. Real-time biofeedback on face touching can potentially mitigate the spread of respiratory diseases. The gap addressed in this study is the lack of an on-demand platform that uses motion data from smartwatches to detect face touching accurately. OBJECTIVE The aim of this study was to exploit the functionality and widespread adoption of smartwatches to develop a smartwatch application that identifies motion signatures mapped accurately to face touching. METHODS Participants (n=10, 50% women, aged 20-83) performed 10 physical activities classified into face touching (FT) and non-face touching (NFT) categories in a standardized laboratory setting. We developed a smartwatch application on the Samsung Galaxy Watch to collect raw accelerometer data from participants. Data features were then extracted from consecutive non-overlapping windows of 2-16 seconds. We examined the performance of state-of-the-art machine learning methods (logistic regression, support vector machine, decision trees and random forest) on face touching movement recognition (FT vs NFT) and individual activity recognition (IAR). RESULTS Machine learning models were accurate in recognizing face touching categories; logistic regression achieved the best performance across all metrics (Accuracy: 0.93 +/- 0.08, Recall: 0.89 +/- 0.16, Precision: 0.93 +/- 0.08, F1-score: 0.90 +/- 0.11, AUC: 0.95 +/- 0.07) at a window size of 5 seconds. IAR models performed worse; the random forest classifier achieved the best performance across all metrics (Accuracy: 0.70 +/- 0.14, Recall: 0.70 +/- 0.14, Precision: 0.70 +/- 0.16, F1-score: 0.67 +/- 0.15) at a window size of 9 seconds. CONCLUSIONS Wearable devices powered by machine learning are effective in detecting facial touches. This is highly significant during respiratory infection outbreaks, as such devices have great potential to help people refrain from touching their faces and thus to mitigate the transmission of COVID-19 and future respiratory diseases.
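Feature extraction over consecutive non-overlapping windows can be sketched as follows. The sampling rate, the use of acceleration magnitude and the two features (mean and standard deviation) are assumptions for illustration; the abstract does not list the features actually used, and the function name is hypothetical.

```python
import math


def window_features(samples, fs, window_s):
    """Split (x, y, z) samples into consecutive non-overlapping windows and
    return (mean, std) of the acceleration magnitude for each full window.
    Trailing samples that do not fill a whole window are dropped."""
    size = int(round(fs * window_s))
    feats = []
    for start in range(0, len(samples) - size + 1, size):
        mag = [math.sqrt(x * x + y * y + z * z)
               for x, y, z in samples[start:start + size]]
        mean = sum(mag) / size
        std = math.sqrt(sum((m - mean) ** 2 for m in mag) / size)
        feats.append((mean, std))
    return feats


# Hypothetical 12 s recording at 50 Hz; 5 s windows give 2 full windows
stream = [(3.0, 4.0, 0.0)] * (12 * 50)
print(window_features(stream, fs=50, window_s=5))
```

Longer windows yield fewer, smoother feature vectors per recording, which is the trade-off explored when window sizes from 2 to 16 seconds are compared.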

A Framework for Effective Application of Machine Learning to Microbiome-Based Classification Problems

mBio

10.1128/mbio.00434-20

2020

Vol 11 (3)

Cited By ~ 9

Author(s): Begüm D. Topçuoğlu, Nicholas A. Lesniak, Mack T. Ruffin, Jenna Wiens, Patrick D. Schloss

Keyword(s): Machine Learning, Logistic Regression, Random Forest, Sequence Data, Characteristic Curve, Predictive Performance, Model Complexity, Support Vector, Classification Problems, Microbial Biomarkers

ABSTRACT Machine learning (ML) modeling of the human microbiome has the potential to identify microbial biomarkers and aid in the diagnosis of many diseases such as inflammatory bowel disease, diabetes, and colorectal cancer. Progress has been made toward developing ML models that predict health outcomes using bacterial abundances, but inconsistent adoption of training and evaluation methods calls the validity of these models into question. Furthermore, many researchers appear to favor increased model complexity over interpretability. To overcome these challenges, we trained seven models that used fecal 16S rRNA sequence data to predict the presence of colonic screen-relevant neoplasias (SRNs) (n = 490 patients, 261 controls and 229 cases). We developed a reusable open-source pipeline to train, validate, and interpret ML models. To show the effect of model selection, we assessed the predictive performance, interpretability, and training time of L2-regularized logistic regression, L1- and L2-regularized support vector machines (SVM) with linear and radial basis function kernels, a decision tree, random forest, and gradient boosted trees (XGBoost). The random forest model performed best at detecting SRNs, with an area under the receiver operating characteristic curve (AUROC) of 0.695 (interquartile range [IQR], 0.651 to 0.739), but was slow to train (83.2 h) and not inherently interpretable. Despite its simplicity, L2-regularized logistic regression followed random forest in predictive performance with an AUROC of 0.680 (IQR, 0.625 to 0.735), trained faster (12 min), and was inherently interpretable. Our analysis highlights the importance of choosing an ML approach based on the goal of the study, as the choice will inform expectations of performance and interpretability. IMPORTANCE Diagnosing diseases using machine learning (ML) is rapidly being adopted in microbiome studies. However, the estimated performance associated with these models is likely overoptimistic. Moreover, there is a trend toward using black box models without discussing the difficulty of interpreting such models when trying to identify microbial biomarkers of disease. This work represents a step toward more reproducible ML practices in microbiome research. We implement a rigorous pipeline and emphasize the importance of selecting ML models that reflect the goal of the study. These concepts are not specific to the study of human health but can also be applied to environmental microbiology studies.
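AUROC, the metric used to rank the models above, can be computed directly as the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half (equivalently, the Mann-Whitney U statistic divided by the number of case-control pairs). This O(n_cases × n_controls) sketch is for illustration only; the study's pipeline is not reproduced, and the scores below are hypothetical.

```python
def auroc(scores, labels):
    """Probability that a random case (label 1) outscores a random control (label 0).

    Ties contribute one half, so this equals the Mann-Whitney U statistic
    divided by n_cases * n_controls.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical model scores for two cases (1) and two controls (0)
print(auroc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 3 of 4 pairs ranked correctly
```

An AUROC of 0.5 corresponds to random ranking, which puts the reported values of roughly 0.68 to 0.70 in context.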

Machine Learning for Long Cycle Maintenance Prediction of Wind Turbine

Sensors

10.3390/s19071671

2019

Vol 19 (7)

pp. 1671

Cited By ~ 5

Author(s): Chia-Hung Yeh, Min-Hui Lin, Chien-Hung Lin, Cheng-En Yu, Mei-Juan Chen

Keyword(s): Machine Learning, Wind Turbine, Wind Turbines, Sensor Data, Hybrid Network, Support Vector, Significant Information, Long Cycle, Power Company, Maintenance Time

With Internet of Things (IoT) sensors, the challenge is to extract potentially valuable information from the collected data to support decision making. This paper proposes a machine learning method to predict the long cycle maintenance time of wind turbines for efficient management by the power company. Predicting long cycle maintenance time allows the power company to operate wind turbines as cost-effectively as possible and so maximize profit. Sensor data, including operation data, maintenance time data and event codes, are collected from 31 wind turbines in two wind farms. Data aggregation is performed to filter out errors and extract significant information from the data. A hybrid network combining a convolutional neural network (CNN) and a support vector machine (SVM) is then built to train the predictive model. The experimental results show that the proposed method predicts maintenance time with high accuracy, which helps improve the efficiency of wind turbine maintenance.

System Identification: A Machine Learning Perspective

Annual Review of Control, Robotics, and Autonomous Systems

10.1146/annurev-control-053018-023744

2019

Vol 2 (1)

pp. 281-304

Cited By ~ 4

Author(s): A. Chiuso, G. Pillonetto

Keyword(s): Machine Learning, System Identification, Reproducing Kernel, Model Complexity, Support Vector, Model Order Selection, Model Order, Selection Step, Kernel Hilbert Spaces, Gaussian Regression

Estimation of functions from sparse and noisy data is a central theme in machine learning. In the last few years, many algorithms have been developed that exploit Tikhonov regularization theory and reproducing kernel Hilbert spaces. These are the so-called kernel-based methods, which include powerful approaches such as regularization networks, support vector machines and Gaussian regression. Recently, these techniques have also gained popularity in the system identification community. In both linear and nonlinear settings, kernels that incorporate information on dynamic systems, such as the smoothness and stability of the input–output map, can challenge consolidated approaches based on parametric model structures. In the classical parametric setting, the complexity of the model (the model order) needs to be chosen, typically from a finite family of alternatives, by trading off bias against variance. This (discrete) model order selection step may be critical, especially when the true model does not belong to the model class. In regularization-based approaches, model complexity is controlled by tuning (continuous) regularization parameters, making the model selection step more robust. In this article, we review these new kernel-based system identification approaches and discuss extensions based on nuclear and [Formula: see text] norms.
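To make the contrast between discrete model-order selection and continuous regularisation concrete, consider a standard example from this literature: estimating a finite impulse response g ∈ ℝⁿ from N output samples y = Φg + e, where Φ is the N × n regression matrix built from past inputs. The regularised (Tikhonov, or equivalently Gaussian regression) estimate and one common stable kernel choice, the TC kernel, are shown below; the specific kernel is an illustrative assumption rather than something stated in the abstract.

```latex
\begin{aligned}
\hat{g} &= \arg\min_{g \in \mathbb{R}^{n}} \; \lVert y - \Phi g \rVert_2^2 + \gamma\, g^{\top} K^{-1} g \\
        &= \left( \Phi^{\top} \Phi + \gamma K^{-1} \right)^{-1} \Phi^{\top} y
         = K \Phi^{\top} \left( \Phi K \Phi^{\top} + \gamma I_N \right)^{-1} y, \\
K_{ij} &= c\, \lambda^{\max(i,j)}, \qquad c > 0,\; 0 < \lambda < 1 .
\end{aligned}
```

The hyperparameters γ, c and λ are continuous and are often tuned by, for example, marginal likelihood maximisation, which replaces the discrete choice of model order n with a smooth trade-off between data fit and the decay and smoothness encoded in K.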

Classification of Children’s Sitting Postures Using Machine Learning Algorithms

Applied Sciences

10.3390/app8081280

2018

Vol 8 (8)

pp. 1280

Cited By ~ 14

Author(s): Yong Kim, Youngdoo Son, Wonjoon Kim, Byungki Jin, Myung Yun

Keyword(s): Neural Network, Machine Learning, Monitoring System, Multinomial Logistic Regression, Learning Algorithms, Feedback System, Machine Learning Algorithms, Sensor Data, Future Research, Support Vector

Sitting on a chair in an awkward posture or for long periods is a risk factor for musculoskeletal disorders. Once formed, a postural habit cannot be changed easily. It is important to form proper postural habits from childhood, as lumbar disease caused by improper posture during childhood is highly likely to recur. Thus, there is a need for a monitoring system that classifies children’s sitting postures. The purpose of this paper is to develop a system for classifying children’s sitting postures using machine learning algorithms. A convolutional neural network (CNN) was used in addition to the conventional algorithms: Naïve Bayes classifier (NB), decision tree (DT), neural network (NN), multinomial logistic regression (MLR) and support vector machine (SVM). To collect data for classifying sitting postures, a sensing cushion was developed by mounting an 8 × 8 pressure sensor mat inside a children’s chair seat cushion. Ten children participated, and sensor data were collected while they held each of the five prescribed static postures. CNN achieved the highest accuracy of all the algorithms compared. Future research on enhancing the classification algorithm and providing an effective feedback system is expected to lead to a comprehensive posture monitoring system.

Comparison of common machine learning algorithms trained with multi-zone models for identifying the location and strength of indoor pollutant sources

Indoor and Built Environment

10.1177/1420326x20931576

2020

pp. 1420326X2093157

Author(s): Yu Huang, Zhi Gao, Hongguang Zhang

Keyword(s): Machine Learning, Meteorological Parameters, Human Life, Learning Algorithms, Machine Learning Algorithms, Identification Accuracy, Sensor Data, Support Vector, Accurate Identification, Pollutant Sources

Accurately identifying the characteristics of pollutant sources can effectively prevent the loss of human life and property damage caused by the sudden release of harmful chemicals in emergencies. Machine learning algorithms such as artificial neural networks (ANN), support vector machines (SVM), k-nearest neighbours (KNN) and naive Bayesian (NB) classification can be used to identify the location of pollutant sources from limited sensor data. In this study, the identification accuracy of these four algorithms was investigated and compared, considering different sensor layouts, eigenvector inputs, meteorological parameters and numbers of samples. The results show that collecting pollutant concentrations over an extended period could improve identification accuracy. After distributed meteorological parameters were introduced, additional sensors were required to reach the same identification accuracy. Increasing the number of training samples fivefold improved the identification accuracy of KNN by 22% and that of SVM by 1.7%, whereas the accuracy of ANN and NB classification remained essentially unchanged. To identify the release mass of the pollutant source, multiple linear, ANN and SVM regression models were adopted. The results show that ANN performs best, whereas SVM performs worst.
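As an illustration of how one of the compared classifiers, k-nearest neighbours (KNN), maps sensor readings to a source location, and how identification accuracy is scored, here is a minimal sketch. The two-sensor concentration vectors, the location labels "A" and "B" and the choice k = 3 are hypothetical; the study's multi-zone model data and eigenvector inputs are not reproduced.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Majority vote among the k training vectors closest to `query`."""
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def identification_accuracy(train, test, k=3):
    """Fraction of test samples whose source location is identified correctly."""
    hits = sum(1 for x, label in test if knn_classify(train, x, k) == label)
    return hits / len(test)


# Hypothetical concentrations at two sensors for sources at locations A and B
train = [((10.0, 1.0), "A"), ((9.0, 2.0), "A"), ((11.0, 1.5), "A"),
         ((1.0, 10.0), "B"), ((2.0, 9.0), "B"), ((1.5, 11.0), "B")]
test = [((9.5, 1.2), "A"), ((1.2, 9.5), "B")]
print(identification_accuracy(train, test))
```

Because KNN predicts directly from stored samples, denser training data places more neighbours near each query, which is consistent with KNN benefiting most from the fivefold increase in samples reported above.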
