Combining Internal- and External-Training-Loads to Predict Non-Contact Injuries in Soccer

The large number of features recorded from GPS and inertial sensors (external load) and well-being questionnaires (internal load) can be used together in a multi-dimensional, non-linear machine learning model to better predict non-contact injuries. In this study we hypothesize that such models can better inform about injury risk by considering the evolution of both internal and external loads over two horizons (one week and one month). Predictive models were trained on GPS data, subjective questionnaire data and injury data collected from 40 elite male soccer players over one season. Various machine learning classification algorithms that performed best on external and internal load features were compared using standard performance metrics such as accuracy, precision, recall and the area under the receiver operating characteristic curve. In particular, tree-based algorithms were favored because they are non-linear yet interpretable, which helps to understand how internal and external load features affect injury risk. For 1-week injury prediction, internal load features gave more accurate predictions than external load features, while for 1-month injury prediction the best classifier performance was reached by combining internal and external load features.
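As an illustration of what load features over two horizons could look like, the following minimal sketch sums a player's daily load over a trailing 7-day and 28-day window. The function name `rolling_load`, the sample values and the choice of 28 days for "one month" are assumptions made for this example, not details taken from the study.

```python
from datetime import date, timedelta


def rolling_load(daily_load, day, window_days):
    """Sum the load recorded in the `window_days` days ending on `day` (inclusive)."""
    start = day - timedelta(days=window_days - 1)
    return sum(value for d, value in daily_load.items() if start <= d <= day)


# Hypothetical daily session loads for one player (arbitrary units).
first_day = date(2021, 1, 1)
loads = {first_day + timedelta(days=i): 100 + 10 * (i % 7) for i in range(28)}
today = first_day + timedelta(days=27)

# One feature per horizon; "one month" is approximated as 28 days here.
features = {
    "load_1w": rolling_load(loads, today, 7),
    "load_1m": rolling_load(loads, today, 28),
}
print(features)
```

The same helper could be applied to each internal or external load variable to build one column per variable and horizon before training a classifier.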


A machine learning approach to predict ethnicity using personal name and census location in Canada

PLoS ONE ◽

10.1371/journal.pone.0241239 ◽

2020 ◽

Vol 15 (11) ◽

pp. e0241239

Author(s):

Kai On Wong ◽

Osmar R. Zaïane ◽

Faith G. Davis ◽

Yutaka Yasui

Keyword(s):

Machine Learning ◽

First Nations ◽

Predictive Value ◽

Large Scale ◽

Performance Metrics ◽

Characteristic Curve ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Approach ◽

Machine Learning Approach

Background: Canada is an ethnically diverse country, yet the lack of ethnicity information in many large databases impedes effective population research and interventions. Automated ethnicity classification using machine learning has shown potential to address this data gap, but its performance in Canada is largely unknown. This study developed a large-scale machine learning framework to predict ethnicity using a novel set of name and census location features. Methods: Using the 1901 census, multiclass and binary classification machine learning pipelines were developed. The 13 ethnic categories examined were Aboriginal (First Nations, Métis, Inuit, and all-combined), Chinese, English, French, Irish, Italian, Japanese, Russian, Scottish, and others. Machine learning algorithms included regularized logistic regression, C-support vector, and naïve Bayes classifiers. Name features consisted of the entire name string, substrings, double-metaphones, and various name-entity patterns, while location features consisted of the entire location string and substrings of province, district, and subdistrict. Predictive performance metrics included sensitivity, specificity, positive predictive value, negative predictive value, F1, area under the receiver operating characteristic curve, and accuracy. Results: The census had 4,812,958 unique individuals. For multiclass classification, the highest performance achieved was 76% F1 and 91% accuracy. For binary classifications for Chinese, French, Italian, Japanese, Russian, and others, the F1 ranged from 68% to 95% (median 87%). The lower performance for English, Irish, and Scottish (F1 ranged from 63% to 67%) was likely due to their shared cultural and linguistic heritage. Adding census location features to the name-based models strongly improved the prediction in Aboriginal classification (F1 increased from 50% to 84%). Conclusions: The automated machine learning approach using only name and census location features can predict the ethnicity of Canadians with varying performance by specific ethnic categories.
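The binary-classification metrics named above all derive from the four cells of a confusion matrix. The sketch below shows the standard definitions; the function name `binary_metrics` and the counts are hypothetical and are not the study's data. It assumes every denominator is non-zero.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard binary metrics from confusion-matrix counts.

    Assumes each denominator is non-zero (no empty class or empty prediction).
    """
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts for a rare class: 100 positives among 1,000 records.
m = binary_metrics(tp=80, fp=20, fn=20, tn=880)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With imbalanced classes such as these, accuracy can sit well above F1, which is one reason both are usually reported together.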


Machine Learning Outperforms Logistic Regression Analysis to Predict Next-Season NHL Player Injury: An Analysis of 2322 Players From 2007 to 2017

Orthopaedic Journal of Sports Medicine ◽

10.1177/2325967120953404 ◽

2020 ◽

Vol 8 (9) ◽

pp. 232596712095340

Author(s):

Bryan C. Luu ◽

Audrey L. Wright ◽

Heather S. Haeberle ◽

Jaret M. Karnuta ◽

Mark S. Schickendantz ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Injury Risk ◽

Operating Characteristic ◽

Performance Metrics ◽

Historical Data ◽

Characteristic Curve ◽

National Hockey League ◽

K Nearest Neighbors ◽

Machine Learning Model

Background: The opportunity to quantitatively predict next-season injury risk in the National Hockey League (NHL) has become a reality with the advent of advanced computational processors and machine learning (ML) architecture. Unlike static regression analyses that provide a momentary prediction, ML algorithms are dynamic in that they are readily capable of imbibing historical data to build a framework that improves with additive data. Purpose: To (1) characterize the epidemiology of publicly reported NHL injuries from 2007 to 2017, (2) determine the validity of a machine learning model in predicting next-season injury risk for both goalies and position players, and (3) compare the performance of modern ML algorithms versus logistic regression (LR) analyses. Study Design: Descriptive epidemiology study. Methods: Professional NHL player data were compiled for the years 2007 to 2017 from 2 publicly reported databases in the absence of an official NHL-approved database. Attributes acquired from each NHL player from each professional year included age, 85 performance metrics, and injury history. A total of 5 ML algorithms were created for both position player and goalie data: random forest, K Nearest Neighbors, Naïve Bayes, XGBoost, and Top 3 Ensemble. LR was also performed for both position player and goalie data. Area under the receiver operating characteristic curve (AUC) primarily determined validation. Results: Player data were generated from 2109 position players and 213 goalies. For models predicting next-season injury risk for position players, XGBoost performed the best with an AUC of 0.948, compared with an AUC of 0.937 for LR (P < .0001). For models predicting next-season injury risk for goalies, XGBoost had the highest AUC with 0.956, compared with an AUC of 0.947 for LR (P < .0001). Conclusion: Advanced ML models such as XGBoost outperformed LR and demonstrated good to excellent capability of predicting whether a publicly reportable injury is likely to occur the next season.
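Since AUC is the headline metric here, a small sketch of how it can be computed may help. AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the labels and scores below are made up for illustration and have no connection to the NHL data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical injury labels (1 = injured next season) and model scores.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

This pairwise form is quadratic in the number of samples, which is fine for a demonstration; library implementations typically use a sort-based computation instead.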


Machine Learning Outperforms Regression Analysis to Predict Next-Season Major League Baseball Player Injuries: Epidemiology and Validation of 13,982 Player-Years From Performance and Injury Profile Trends, 2000-2017

Orthopaedic Journal of Sports Medicine ◽

10.1177/2325967120963046 ◽

2020 ◽

Vol 8 (11) ◽

pp. 232596712096304

Author(s):

Jaret M. Karnuta ◽

Bryan C. Luu ◽

Heather S. Haeberle ◽

Paul M. Saluan ◽

Salvatore J. Frangiamore ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Injury Risk ◽

Performance Metrics ◽

Major League Baseball ◽

Characteristic Curve ◽

Ensemble Classification ◽

Anatomic Site ◽

Predictive Algorithm ◽

Major League

Background: Machine learning (ML) allows for the development of a predictive algorithm capable of imbibing historical data on a Major League Baseball (MLB) player to accurately project the player's future availability. Purpose: To determine the validity of an ML model in predicting the next-season injury risk and anatomic injury location for both position players and pitchers in the MLB. Study Design: Descriptive epidemiology study. Methods: Using 4 online baseball databases, we compiled MLB player data, including age, performance metrics, and injury history. A total of 84 ML algorithms were developed. The output of each algorithm reported whether the player would sustain an injury the following season as well as the injury’s anatomic site. The area under the receiver operating characteristic curve (AUC) primarily determined validation. Results: Player data were generated from 1931 position players and 1245 pitchers, with a mean follow-up of 4.40 years (13,982 player-years) between the years of 2000 and 2017. Injured players spent a total of 108,656 days on the disabled list, with a mean of 34.21 total days per player. The mean AUC for predicting next-season injuries was 0.76 among position players and 0.65 among pitchers using the top 3 ensemble classification. Back injuries had the highest AUC among both position players and pitchers, at 0.73. Advanced ML models outperformed logistic regression in 13 of 14 cases. Conclusion: Advanced ML models generally outperformed logistic regression and demonstrated fair capability in predicting publicly reportable next-season injuries, including the anatomic region for position players, although not for pitchers.


An Investigation of Machine Learning Algorithms on COVID-19 Dataset

10.21203/rs.3.rs-70985/v1 ◽

2020 ◽

Author(s):

Prasannavenkatesan Theerthagiri ◽

I.Jeena Jacob ◽

A.Usha Ruby ◽

Y.Vamsidhar

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Characteristic Curve ◽

Confusion Matrix ◽

True Positive Rate ◽

Receiver Operator Characteristic Curve ◽

Machine Learning Algorithms ◽

Kappa Score ◽

Machine Learning Classification ◽

Positive Rate

Abstract: This paper studies different machine learning classification algorithms for predicting recovered and deceased COVID-19 cases. The k-fold cross-validation resampling technique is used to validate the prediction model. The prediction scores of each algorithm are evaluated with performance metrics such as prediction accuracy, precision, recall, mean square error, confusion matrix, and kappa score. For the given dataset, the k-nearest neighbour (KNN) classification algorithm achieves a prediction accuracy of 80.4%, which is 1.5 to 3.3% higher than the other algorithms. The KNN algorithm correctly predicts 92% of the deceased cases (true positive rate), with 0.077% misclassification. Further, the KNN algorithm produces the lowest error rate (0.19) of the algorithms compared. Its receiver operator characteristic curve also yields an output value of 82%.
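Of the metrics listed, the kappa score is probably the least familiar. Cohen's kappa compares the observed agreement between predictions and true labels with the agreement expected by chance. The sketch below computes it for a binary confusion matrix; the function name `cohen_kappa` and the counts are hypothetical and are not the paper's results.

```python
def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa for a 2x2 confusion matrix.

    kappa = (observed agreement - chance agreement) / (1 - chance agreement)
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement: product of the marginal rates, summed over both classes.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical balanced example: 80% accuracy, 50% agreement expected by chance.
kappa = cohen_kappa(tp=40, fp=10, fn=10, tn=40)
print(round(kappa, 3))
```

A kappa of 0 means agreement no better than chance and 1 means perfect agreement, so it can be more informative than raw accuracy when one class dominates.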


Associations between Well-Being State and Match External and Internal Load in Amateur Referees

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18063322 ◽

2021 ◽

Vol 18 (6) ◽

pp. 3322

Author(s):

Eñaut Ozaeta ◽

Javier Yanci ◽

Carlo Castagna ◽

Estibaliz Romaratezabala ◽

Daniel Castillo

Keyword(s):

External Load ◽

Contact Time ◽

Well Being ◽

Ground Contact ◽

Internal Load ◽

Match Play ◽

Power Meter ◽

Ground Contact Time ◽

Training Impulse ◽

External Loads

The main aim of this paper was to examine the association of prematch well-being status with match internal and external load in field (FR) and assistant (AR) soccer referees. Twenty-three FR and 46 AR participated in this study. The well-being state was assessed using the Hooper Scale, and the match external and internal loads were monitored with the Stryd Power Meter and heart monitors. While no significant differences were found in Hooper indices between match officials, FR registered higher external loads (p < 0.01; ES: 0.75 to 5.78), spent more time in zone 4 and zone 5, and recorded a greater training impulse (TRIMP) value (p < 0.01; ES: 1.35 to 1.62) than AR. Generally, no associations were found between the well-being variables and external loads for FR and AR. Additionally, no associations were found between the Hooper indices and internal loads for FR and AR. However, several relationships of different magnitudes were found between internal and external match loads. For FR, power and speed correlated with time spent in zone 2 (p < 0.05; r = −0.43), ground contact time correlated with time in zone 2 and zone 3 (p < 0.05; r = 0.50 to 0.60), and power, speed, cadence and ground contact time correlated with time spent in zone 5 and TRIMP (p < 0.05 to 0.01; r = 0.42 to 0.64). Additionally, for AR, a relationship between speed and time in zone 1 was found (p < 0.05; r = −0.30; CL = 0.22). These results suggest that initial well-being state is not related to match officials’ performances during match play. In addition, the Stryd Power Meter can be a useful device to calculate the external load on soccer match officials.
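The abstract does not say which TRIMP variant was used. One common zone-based definition, usually attributed to Edwards, multiplies the minutes spent in each of five heart-rate zones by the zone number and sums the results. The sketch below assumes that definition; the function name `zone_trimp` and the minute values are invented for illustration.

```python
def zone_trimp(minutes_in_zone):
    """Zone-weighted training impulse: sum of (zone number x minutes in that zone).

    Assumes exactly five zones, listed from zone 1 to zone 5.
    """
    if len(minutes_in_zone) != 5:
        raise ValueError("expected minutes for exactly five heart-rate zones")
    return sum(zone * minutes for zone, minutes in enumerate(minutes_in_zone, start=1))


# Hypothetical 90-minute match: minutes spent in zones 1 to 5.
match_minutes = [5, 10, 20, 30, 25]
trimp = zone_trimp(match_minutes)
print(trimp)
```

Under this weighting, extra time in zones 4 and 5 raises the total quickly, which fits the pattern of more high-zone time going together with a greater TRIMP value.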


AI-Enabled Support System for Melanoma Detection and Classification

International Journal of Reliable and Quality E-Healthcare ◽

10.4018/ijrqeh.2021100104 ◽

2021 ◽

Vol 10 (4) ◽

pp. 58-75

Author(s):

Vivek Sen Saxena ◽

Prashant Johri ◽

Avneesh Kumar

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Skin Lesion ◽

Performance Metrics ◽

Similarity Index ◽

Skin Lesions ◽

Machine Learning Algorithms ◽

Lesion Area ◽

Melanoma Detection ◽

Grabcut Algorithm

Melanoma is the deadliest type of skin cancer. Artificial intelligence provides the power to classify skin lesions as melanoma or non-melanoma. The proposed system for melanoma detection and classification involves four steps: pre-processing (resizing all the images and removing noise and hair from dermoscopic images); image segmentation (identifying the lesion area); feature extraction (extracting features from the segmented lesion); and classification (categorizing the lesion as malignant (melanoma) or benign (non-melanoma)). A modified GrabCut algorithm is employed to segment the skin lesion. Segmented lesions are classified using machine learning algorithms such as SVM, k-NN, ANN, and logistic regression and evaluated on performance metrics like accuracy, sensitivity, and specificity. Compared with existing systems, the proposed system achieved a higher similarity index and accuracy.


Attack and Anomaly Detection in IoT Networks Using Supervised Machine Learning Approaches

Revue d'Intelligence Artificielle ◽

10.18280/ria.350102 ◽

2021 ◽

Vol 35 (1) ◽

pp. 11-21

Author(s):

Himani Tyagi ◽

Rajendra Kumar

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Detection System ◽

Feature Reduction ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Testing Time ◽

Learning Approaches ◽

Reduction Techniques ◽

Share Data

IoT is characterized by communication between things (devices) that constantly share data, analyze, and make decisions while connected to the internet. This interconnected architecture attracts cyber criminals who seek to expose the IoT system to failure. It is therefore imperative to develop a system that can accurately and automatically detect anomalies and attacks occurring in IoT networks. In this paper, an Intrusion Detection System (IDS) based on a novel feature set extracted from the BoT-IoT dataset is developed that can swiftly, accurately and automatically differentiate benign and malicious traffic. Instead of using available feature reduction techniques like PCA, which can change the core meaning of variables, a unique feature set consisting of only seven lightweight features is developed that is also IoT specific and attack traffic independent. The results of the study demonstrate the effectiveness of the seven constructed features in detecting four types of attacks, namely DDoS, DoS, Reconnaissance, and Information Theft. Furthermore, this study demonstrates the applicability and efficiency of supervised machine learning algorithms (KNN, LR, SVM, MLP, DT, RF) in IoT security. The performance of the proposed system is validated using performance metrics like accuracy, precision, recall, F-Score and ROC. Although the Decision Tree (99.9%) and Random Forest (99.9%) classifiers have the same accuracy, other metrics such as training and testing time show Random Forest to be comparatively better.


Supermarket Sales Prediction Using Regression

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2021/951022021 ◽

2021 ◽

Vol 10 (2) ◽

pp. 1153-1157

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Low Cost ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Customer Data ◽

Sales Data ◽

Online Marketplace ◽

Sales Prediction ◽

The Future

Sales forecasting is important for companies engaged in retailing, logistics, manufacturing, marketing and wholesaling. It allows companies to allocate resources efficiently, to estimate sales revenue and to plan better strategies for the company’s future. In this paper, product sales at a particular store are predicted in a way that produces better performance than other machine learning algorithms. The dataset used for this project is the 2013 Big Mart sales data. Nowadays, shopping malls and supermarkets keep track of the sales data of every individual item to predict future customer demand. It contains a large amount of customer data and item attributes. Frequent patterns are then detected by mining the data from the data warehouse, and the data can be used to predict future sales with the help of several machine learning techniques (algorithms) for companies like Big Mart. In this project, we propose a model using the XGBoost algorithm for predicting the sales of companies like Big Mart and found that it performs better than other existing models. The model is compared with other models in terms of their performance metrics. Big Mart is an online marketplace where people can buy, sell or advertise their merchandise at low cost. The goal of the paper is to make Big Mart a shopping paradise for buyers and a marketing solution for sellers. The ultimate aim is the complete satisfaction of the customers. The project “SUPERMARKET SALES PREDICTION” builds a predictive model that estimates the sales of each product at a particular store. Big Mart can use this model to understand the properties of the products that play a major role in increasing sales. This can also be done on the basis of hypotheses formed before looking at the data.


Predicting Metabolic Syndrome With Machine Learning Models Using a Decision Tree Algorithm: Retrospective Cohort Study (Preprint)

10.2196/preprints.17110 ◽

2019 ◽

Author(s):

Cheng-Sheng Yu ◽

Yu-Jiun Lin ◽

Chang-Hsien Lin ◽

Sen-Te Wang ◽

Shiyng-Yu Lin ◽

...

Keyword(s):

Machine Learning ◽

Metabolic Syndrome ◽

Logistic Regression ◽

Decision Tree ◽

Characteristic Curve ◽

Significant Risk ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Health Examination ◽

Multivariate Logistic Regression

BACKGROUND: Metabolic syndrome is a cluster of disorders that significantly influence the development and deterioration of numerous diseases. FibroScan is an ultrasound device that was recently shown to predict metabolic syndrome with moderate accuracy. However, previous research regarding prediction of metabolic syndrome in subjects examined with FibroScan has been mainly based on conventional statistical models. Alternatively, machine learning, whereby a computer algorithm learns from prior experience, has better predictive performance than conventional statistical modeling. OBJECTIVE: We aimed to evaluate the accuracy of different decision tree machine learning algorithms to predict the state of metabolic syndrome in self-paid health examination subjects who were examined with FibroScan. METHODS: Multivariate logistic regression was conducted for every known risk factor of metabolic syndrome. Principal components analysis was used to visualize the distribution of metabolic syndrome patients. We further applied various statistical machine learning techniques to visualize and investigate the pattern and relationship between metabolic syndrome and several risk variables. RESULTS: Obesity, serum glutamic-oxaloacetic transaminase, serum glutamic pyruvic transaminase, controlled attenuation parameter score, and glycated hemoglobin emerged as significant risk factors in multivariate logistic regression. The area under the receiver operating characteristic curve values for classification and regression trees and for the random forest were 0.831 and 0.904, respectively. CONCLUSIONS: Machine learning technology facilitates the identification of metabolic syndrome in self-paid health examination subjects with high accuracy.


A new classification system for autism based on machine learning of artificial intelligence

Technology and Health Care ◽

10.3233/thc-213032 ◽

2021 ◽

pp. 1-18

Author(s):

Seyed Reza Shahamiri ◽

Fadi Thabtah ◽

Neda Abdelhamid

Keyword(s):

Machine Learning ◽

Scoring Function ◽

Autistic Traits ◽

Well Being ◽

Learning Technologies ◽

Machine Learning Algorithms ◽

The Social ◽

Hidden Patterns ◽

The Individual ◽

Fold Cross Validation

BACKGROUND: Autistic Spectrum Disorder (ASD) is a neurodevelopmental condition that is normally linked with substantial healthcare costs. Typical ASD screening techniques are time-consuming, so the early detection of ASD could reduce such costs and help limit the development of the condition. OBJECTIVE: We propose an automated approach to detect autistic traits that replaces the scoring function used in current ASD screening with a more intelligent and less subjective approach. METHODS: The proposed approach employs deep neural networks (DNNs) to detect hidden patterns from previously labelled cases and controls, then applies the knowledge derived to classify the individual being screened. Specificity, sensitivity, and accuracy of the proposed approach are evaluated using ten-fold cross-validation. A comparative analysis has also been conducted to compare the DNNs’ performance with other prominent machine learning algorithms. RESULTS: Results indicate that deep learning technologies can be embedded within existing ASD screening to assist the stakeholders in the early identification of ASD traits. CONCLUSION: The proposed system will facilitate access to needed support for the social, physical, and educational well-being of the patient and family by making ASD screening more intelligent and accurate.
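Ten-fold cross-validation, as used for the evaluation above, splits the data into ten disjoint folds and lets each fold serve once as the test set while the other nine are used for training. The sketch below only generates the index splits; the function name `k_fold_indices`, the seed and the sample count are assumptions for illustration, and no model from the paper is reproduced.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]  # k disjoint, near-equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical dataset of 25 screened individuals, ten folds.
splits = list(k_fold_indices(n_samples=25, k=10))
for train, test in splits[:2]:
    print(len(train), len(test))
```

Metrics such as sensitivity, specificity and accuracy would then be computed on each test fold and averaged over the ten folds.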
