Data-driven detection of counterpressing in professional football

Gradient Boosting ◽

Professional Football ◽

Match Analysis ◽

Video Footage ◽

Extreme Gradient Boosting

Abstract Detecting counterpressing is an important task for any professional match analyst in football (soccer), but it is currently done exclusively by hand, by observing video footage. The purpose of this paper is not only to automatically identify this strategy, but also to derive metrics that support coaches in the analysis of transition situations. Additionally, we want to infer objective influence factors for its success and assess the validity of peer-created rules of thumb established by practitioners. Based on a combination of positional and event data, we detect counterpressing situations as a supervised machine learning task. Together with professional match-analysis experts, we discussed and consolidated a consistent definition, extracted 134 features and manually labeled more than 20,000 defensive transition situations from 97 professional football matches. The extreme gradient boosting model, with an area under the curve of 87.4% on the labeled test data, enabled us to judge how quickly teams can win the ball back with counterpressing strategies, how many shots they create or allow immediately afterwards, and to determine what the most important success drivers are. We applied this automatic detection to all matches from six full seasons of the German Bundesliga and quantified the defensive and offensive consequences of applying counterpressing for each team. Automating the task saves analysts a tremendous amount of time, standardizes the otherwise subjective task, and allows trends to be identified within larger data sets. We present an effective way of integrating the detection and the lessons learned from this investigation into common match-analysis processes.
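The area under the curve quoted above can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The following is a minimal sketch of that pairwise definition, not the authors' evaluation code; the function name `roc_auc` and the toy labels and scores are assumptions made for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney view).

    labels: iterable of 0/1 ground-truth values.
    scores: iterable of model scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count positive/negative pairs that are ranked correctly; ties count half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the paper.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

The quadratic pair loop is fine for small samples; a rank-based formulation would be preferable for data sets the size of the one described.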

Machine Learning Models for COVID-19 Detection in Brazil Based on Symptoms (Preprint)

10.2196/preprints.27293 ◽

2021 ◽

Author(s):

Íris Viana dos Santos Santana ◽

Andressa C. M. da Silveira ◽

Álvaro Sobrinho ◽

Lenardo Chaves e Silva ◽

Leandro Dias da Silva ◽

...

Keyword(s):

Machine Learning ◽

Early Stage ◽

Area Under The Curve ◽

Gradient Boosting ◽

Support Vector ◽

Accuracy Score ◽

K Nearest Neighbors ◽

Runny Nose ◽

Extreme Gradient Boosting

BACKGROUND: Controlling the COVID-19 outbreak in Brazil is considered a challenge of continental proportions due to the high population and urban density, weak implementation and maintenance of social distancing strategies, and limited testing capabilities. OBJECTIVE: To contribute to addressing this challenge, we present the implementation and evaluation of supervised Machine Learning (ML) models to assist COVID-19 detection in Brazil based on early-stage symptoms. METHODS: First, we conducted data preprocessing and applied the Chi-squared test to a Brazilian dataset, mainly composed of early-stage symptoms, to perform statistical analyses. Afterward, we implemented ML models using the Random Forest (RF), Support Vector Machine (SVM), Multilayer Perceptron (MLP), K-Nearest Neighbors (KNN), Decision Tree (DT), Gradient Boosting Machine (GBM), and Extreme Gradient Boosting (XGBoost) algorithms. We evaluated the ML models using precision, accuracy score, recall, the area under the curve, and the Friedman and Nemenyi tests. Based on the comparison, we grouped the top five ML models and measured feature importance. RESULTS: The MLP model presented the highest mean accuracy score, with more than 97.85%, when compared to GBM (> 97.39%), RF (> 97.36%), DT (> 97.07%), XGBoost (> 97.06%), KNN (> 95.14%), and SVM (> 94.27%). Based on the statistical comparison, we grouped MLP, GBM, DT, RF, and XGBoost as the top five ML models, because their evaluation results are statistically indistinguishable. The features that the ML models weighted most during prediction include gender, profession, fever, sore throat, dyspnea, olfactory disorder, cough, runny nose, taste disorder, and headache. CONCLUSIONS: Supervised ML models effectively assist decision making in medical diagnosis and public administration (e.g., testing strategies), based on early-stage symptoms that do not require advanced and expensive exams.
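The Friedman test mentioned in the methods compares several models by ranking them within each evaluation block (for example, each cross-validation fold) and checking whether the rank sums differ more than chance would suggest. Below is a minimal sketch of the test statistic only, without tie correction or p-value; the function names and the accuracy values are assumptions for illustration and are not the study's data.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def friedman_statistic(scores):
    """Friedman chi-square statistic; scores[b][m] is model m's score on block b."""
    n = len(scores)        # number of blocks (e.g. folds)
    k = len(scores[0])     # number of models being compared
    rank_sums = [0.0] * k
    for row in scores:
        for m, r in enumerate(average_ranks(row)):
            rank_sums[m] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Hypothetical accuracies of three models on three folds.
folds = [
    [0.90, 0.95, 0.97],
    [0.91, 0.94, 0.98],
    [0.89, 0.93, 0.96],
]
print(round(friedman_statistic(folds), 6))  # 6.0
```

A large statistic relative to the chi-square distribution with k - 1 degrees of freedom suggests that at least one model differs; a post-hoc test such as Nemenyi then identifies which pairs do.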

An Autoencoder and Machine Learning Model to Predict Suicidal Ideation with Brain Structural Imaging

Journal of Clinical Medicine ◽

10.3390/jcm9030658 ◽

2020 ◽

Vol 9 (3) ◽

pp. 658 ◽

Cited By ~ 1

Author(s):

Jun-Cheng Weng ◽

Tung-Yeh Lin ◽

Yuan-Hsiung Tsai ◽

Man Teng Cheok ◽

Yi-Peng Eve Chang ◽

...

Keyword(s):

Machine Learning ◽

Suicidal Ideation ◽

Learning Algorithm ◽

Area Under The Curve ◽

Learning Model ◽

Gradient Boosting ◽

Machine Learning Model ◽

Depressive Patients

It is estimated that at least one million people die by suicide every year, showing the importance of suicide prevention and detection. In this study, an autoencoder and machine learning model was employed to predict people with suicidal ideation based on their structural brain imaging. The subjects in our generalized q-sampling imaging (GQI) dataset consisted of three groups: 41 depressive patients with suicidal ideation (SI), 54 depressive patients without suicidal thoughts (NS), and 58 healthy controls (HC). In the GQI dataset, indices of generalized fractional anisotropy (GFA), isotropic values of the orientation distribution function (ISO), and normalized quantitative anisotropy (NQA) were separately trained in different machine learning models. A convolutional neural network (CNN)-based autoencoder model, the supervised machine learning algorithm extreme gradient boosting (XGB), and logistic regression (LR) were used to discriminate SI subjects from NS and HC subjects. After five-fold cross validation, separate data were tested to obtain the accuracy, sensitivity, specificity, and area under the curve of each result. Our results showed that the best pattern of structure across multiple brain locations can classify suicidal ideates from NS and HC with a prediction accuracy of 85%, a specificity of 100% and a sensitivity of 75%. The algorithms developed here might provide an objective tool to help identify suicidal ideation risk among depressed patients alongside clinical assessment.

Clinical and Laboratory Predictors of In-hospital Mortality in Patients With Coronavirus Disease-2019: A Cohort Study in Wuhan, China

Clinical Infectious Diseases ◽

10.1093/cid/ciaa538 ◽

2020 ◽

Vol 71 (16) ◽

pp. 2079-2088 ◽

Cited By ~ 52

Author(s):

Kun Wang ◽

Peiyuan Zuo ◽

Yuwei Liu ◽

Meng Zhang ◽

Xiaofang Zhao ◽

...

Keyword(s):

Hospital Mortality ◽

Prediction Models ◽

Area Under The Curve ◽

Mortality Prediction ◽

Gradient Boosting ◽

Laboratory Model ◽

Training Cohort ◽

Clinical Model ◽

Mortality Prediction Models

Abstract Background This study aimed to develop mortality-prediction models for patients with coronavirus disease-2019 (COVID-19). Methods The training cohort included consecutive COVID-19 patients at the First People’s Hospital of Jiangxia District in Wuhan, China, from 7 January 2020 to 11 February 2020. We selected baseline data through the stepwise Akaike information criterion and an ensemble XGBoost (extreme gradient boosting) model to build mortality-prediction models. We then validated these models on randomly collected COVID-19 patients in Union Hospital, Wuhan, from 1 January 2020 to 20 February 2020. Results A total of 296 COVID-19 patients were enrolled in the training cohort; 19 died during hospitalization and 277 were discharged from the hospital. The clinical model developed using age, history of hypertension, and coronary heart disease showed area under the curve (AUC), 0.88 (95% confidence interval [CI], .80–.95); threshold, −2.6551; sensitivity, 92.31%; specificity, 77.44%; and negative predictive value (NPV), 99.34%. The laboratory model developed using age, high-sensitivity C-reactive protein, peripheral capillary oxygen saturation, neutrophil and lymphocyte count, d-dimer, aspartate aminotransferase, and glomerular filtration rate had a significantly stronger discriminatory power than the clinical model (P = .0157), with AUC, 0.98 (95% CI, .92–.99); threshold, −2.998; sensitivity, 100.00%; specificity, 92.82%; and NPV, 100.00%. In the subsequent validation cohort (N = 44), the AUC (95% CI) was 0.83 (.68–.93) and 0.88 (.75–.96) for the clinical model and laboratory model, respectively. Conclusions We developed 2 predictive models for the in-hospital mortality of patients with COVID-19 in Wuhan that were validated in patients from another center.
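Sensitivity, specificity, and negative predictive value, as reported above, all follow from the confusion counts obtained once a score threshold is fixed. The sketch below shows those definitions under stated assumptions: the function names, the toy labels and scores, and the threshold are illustrative and do not reproduce the study's models or data.

```python
def confusion_counts(labels, scores, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity and negative predictive value from counts."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true positives that are detected
        "specificity": tn / (tn + fp),  # share of true negatives that are cleared
        "npv": tn / (tn + fn),          # share of negative predictions that are right
    }


# Hypothetical labels (1 = died) and risk scores.
labels = [1, 1, 0, 0, 0, 0]
scores = [0.9, 0.2, 0.6, 0.3, 0.1, 0.05]
counts = confusion_counts(labels, scores, threshold=0.5)
print(counts)
print(diagnostic_metrics(*counts))
```

Lowering the threshold trades specificity for sensitivity, which is why a threshold is reported alongside each pair of values.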

Performance Analysis of Boosting Classifiers in Recognizing Activities of Daily Living

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph17031082 ◽

2020 ◽

Vol 17 (3) ◽

pp. 1082 ◽

Cited By ~ 7

Author(s):

Saifur Rahman ◽

Muhammad Irfan ◽

Mohsin Raza ◽

Khawaja Moyeezullah Ghori ◽

Shumayla Yaqoob ◽

...

Keyword(s):

Physical Activity ◽

Performance Analysis ◽

Activities Of Daily Living ◽

Daily Living ◽

Gradient Boosting ◽

Method Performance ◽

Light Gradient ◽

Boosting Algorithms

Physical activity is essential for physical and mental health, and its absence is highly associated with severe health conditions and disorders. Therefore, tracking activities of daily living can help promote quality of life. Wearable sensors in this regard can provide a reliable and economical means of tracking such activities, and such sensors are readily available in smartphones and watches. This study is the first of its kind to develop a wearable sensor-based physical activity classification system using a special class of supervised machine learning approaches called boosting algorithms. The study presents the performance analysis of several boosting algorithms (extreme gradient boosting, XGB; light gradient boosting machine, LGBM; gradient boosting, GB; cat boosting, CB; and AdaBoost) in a fair and unbiased way using a uniform dataset, feature set, feature selection method, performance metric and cross-validation technique. The study utilizes a smartphone-based dataset of thirty individuals. The results showed that the proposed method could accurately classify the activities of daily living with very high performance (above 90%). These findings suggest the strength of the proposed system in classifying activities of daily living using only smartphone sensor data, and the system can assist in reducing physical inactivity patterns to promote a healthier lifestyle and wellbeing.

Gully Erosion Susceptibility Mapping in Highly Complex Terrain Using Machine Learning Models

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10100680 ◽

2021 ◽

Vol 10 (10) ◽

pp. 680

Author(s):

Annan Yang ◽

Chunmei Wang ◽

Guowei Pang ◽

Yongqing Long ◽

Lei Wang ◽

...

Keyword(s):

Machine Learning ◽

Complex Terrain ◽

Large Scale ◽

Area Under The Curve ◽

Gully Erosion ◽

Susceptibility Mapping ◽

Weight Of Evidence ◽

Gradient Boosting ◽

Machine Learning Classification ◽

Extreme Gradient Boosting

Gully erosion is the most severe type of water erosion and is a major land degradation process. The efficiency and interpretability of gully erosion susceptibility mapping (GESM) remain a challenge, especially in complex terrain areas. In this study, a WoE-MLC model, which combines machine learning classification algorithms with the statistical weight of evidence (WoE) model, was used to address this problem in the Loess Plateau. The three machine learning (ML) algorithms utilized in this research were random forest (RF), gradient boosted decision trees (GBDT), and extreme gradient boosting (XGBoost). The results showed that: (1) GESM was well predicted by both machine learning regression models and WoE-MLC models, with area under the curve (AUC) values greater than 0.92 in both cases, and the latter was more computationally efficient and interpretable; (2) the XGBoost algorithm was more efficient in GESM than the other two algorithms, with the strongest generalization ability and best performance in avoiding overfitting (averaged AUC = 0.947), followed by the RF algorithm (averaged AUC = 0.944) and the GBDT algorithm (averaged AUC = 0.938); and (3) slope gradient, land use, and altitude were the main factors for GESM. This study may provide a possible method for gully erosion susceptibility mapping at large scale.
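As a rough illustration of the weight of evidence idea, one common formulation scores each class of a conditioning factor by the log ratio of how often that class occurs among gullied cells versus non-gullied cells. The sketch below assumes that formulation; the abstract does not give the exact variant used, and the function name, argument names, and cell counts are hypothetical.

```python
import math


def weight_of_evidence(class_gully, total_gully, class_stable, total_stable):
    """Positive weight of evidence for one factor class.

    class_gully:  gullied cells that fall in this class
    total_gully:  all gullied cells
    class_stable: non-gullied cells that fall in this class
    total_stable: all non-gullied cells
    """
    p_given_gully = class_gully / total_gully
    p_given_stable = class_stable / total_stable
    return math.log(p_given_gully / p_given_stable)


# Hypothetical counts for a "steep slope" class.
w = weight_of_evidence(class_gully=30, total_gully=100,
                       class_stable=10, total_stable=100)
print(w > 0)  # True: the class is over-represented among gullied cells
```

A positive weight marks a class that favours gully occurrence, a negative weight one that disfavours it, and zero means the class carries no evidence either way. Classes with zero counts would need smoothing before taking the logarithm.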

Big Data Analytics for Short and Medium-Term Electricity Load Forecasting Using an AI Techniques Ensembler

Energies ◽

10.3390/en13195193 ◽

2020 ◽

Vol 13 (19) ◽

pp. 5193

Author(s):

Nasir Ayub ◽

Muhammad Irfan ◽

Muhammad Awais ◽

Usman Ali ◽

Tariq Ali ◽

...

Keyword(s):

Feature Selection ◽

Load Forecasting ◽

Energy Generation ◽

Gradient Boosting ◽

Support Vector ◽

Hybrid Techniques ◽

Electricity Load ◽

Electricity Load Forecasting

Electrical load forecasting provides knowledge about future consumption and generation of electricity. There is a high level of fluctuation behavior between energy generation and consumption. Sometimes, the energy demand of the consumer becomes higher than the energy already generated, and vice versa. Electricity load forecasting provides a monitoring framework for future energy generation, consumption, and making a balance between them. In this paper, we propose a framework, in which deep learning and supervised machine learning techniques are implemented for electricity-load forecasting. A three-step model is proposed, which includes: feature selection, extraction, and classification. The hybrid of Random Forest (RF) and Extreme Gradient Boosting (XGB) is used to calculate features’ importance. The average feature importance of hybrid techniques selects the most relevant and high importance features in the feature selection method. The Recursive Feature Elimination (RFE) method is used to eliminate the irrelevant features in the feature extraction method. The load forecasting is performed with Support Vector Machines (SVM) and a hybrid of Gated Recurrent Units (GRU) and Convolutional Neural Networks (CNN). The meta-heuristic algorithms, i.e., Grey Wolf Optimization (GWO) and Earth Worm Optimization (EWO) are applied to tune the hyper-parameters of SVM and CNN-GRU, respectively. The accuracy of our enhanced techniques CNN-GRU-EWO and SVM-GWO is 96.33% and 90.67%, respectively. Our proposed techniques CNN-GRU-EWO and SVM-GWO perform 7% and 3% better than the State-Of-The-Art (SOTA). In the end, a comparison with SOTA techniques is performed to show the improvement of the proposed techniques. This comparison showed that the proposed technique performs well and results in the lowest performance error rates and highest accuracy rates as compared to other techniques.
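The feature-selection step described above averages the importance scores from two models and keeps the features that score highly. A minimal sketch of that averaging follows, assuming importance scores are already available as dictionaries and that selection uses a fixed cut-off; the cut-off rule, the function names, the feature names, and the values are all assumptions, since the abstract does not specify them.

```python
def average_importance(rf_importance, xgb_importance):
    """Average two importance dictionaries over the features they share."""
    shared = rf_importance.keys() & xgb_importance.keys()
    return {f: (rf_importance[f] + xgb_importance[f]) / 2 for f in shared}


def select_features(avg_importance, threshold):
    """Keep features whose averaged importance reaches the threshold,
    ordered from most to least important."""
    kept = [f for f, v in avg_importance.items() if v >= threshold]
    return sorted(kept, key=lambda f: -avg_importance[f])


# Hypothetical importance scores from a random forest and an XGBoost model.
rf = {"temperature": 0.5, "hour": 0.3, "holiday": 0.2}
xgb = {"temperature": 0.4, "hour": 0.5, "holiday": 0.1}

avg = average_importance(rf, xgb)
print(select_features(avg, threshold=0.3))  # ['temperature', 'hour']
```

Averaging dampens the idiosyncrasies of either model's importance measure, at the cost of assuming the two scales are comparable.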

An ARDS Severity Recognition Model based on XGBoost

Journal of Physics Conference Series ◽

10.1088/1742-6596/2138/1/012009 ◽

2021 ◽

Vol 2138 (1) ◽

pp. 012009

Author(s):

Huimin Zhang ◽

Renshuang Ding ◽

Qi Zhang ◽

Mingxing Fang ◽

Guanghua Zhang ◽

...

Keyword(s):

Real Time ◽

Pearson Correlation ◽

Area Under The Curve ◽

Gradient Boosting ◽

Blood Oxygen Saturation ◽

Recognition Model ◽

Model Based ◽

Mimic Iii

Abstract Because disease scoring systems and invasive parameters are subjective and cannot track the development of acute respiratory distress syndrome (ARDS) in real time, this paper proposes an ARDS severity recognition model based on extreme gradient boosting (XGBoost) that uses noninvasive parameters. First, the physiological parameters of patients were extracted from the MIMIC-III database for statistical analysis, and the outliers and unbalanced samples were processed with the interquartile range and the synthetic minority oversampling technique. Then, the Pearson correlation coefficient and random forest were used as a hybrid feature selection method to score the noninvasive parameters comprehensively, and the essential parameters for identifying disease severity were obtained. Finally, XGBoost was combined with grid search cross-validation to determine the best hyper-parameters of the model and to classify disease severity accurately. The experimental results show that the model’s area under the curve (AUC) is as high as 0.98, and the accuracy is 0.90; the total score of blood oxygen saturation (SpO2) is 0.625, which could be used as an essential parameter to evaluate the severity of ARDS. Compared with traditional methods, this model has clear advantages in real-time capability and accuracy and could provide more accurate diagnosis and treatment suggestions for medical staff.
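Interquartile-range filtering, mentioned above for outlier handling, typically drops values that lie more than 1.5 times the IQR outside the first or third quartile. The sketch below assumes that conventional 1.5 multiplier and the default quartile method of Python's `statistics.quantiles`; the abstract states neither, and the function name and sample readings are hypothetical.

```python
import statistics


def iqr_filter(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]


# Hypothetical readings with one implausible value.
readings = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(iqr_filter(readings))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Different quartile conventions give slightly different fences on small samples, so the method should be fixed and reported when results need to be reproduced.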

Identification of five important genes to predict GBM subtypes

Neuro-Oncology Advances ◽

10.1093/noajnl/vdab144 ◽

2021 ◽

Author(s):

Yang Tang ◽

Maleeha A Qazi ◽

Kevin R Brown ◽

Nicholas Mikolajewicz ◽

Jason Moffat ◽

...

Keyword(s):

Machine Learning ◽

Gene Signature ◽

Gradient Boosting ◽

Tissue Cell ◽

Learning Approach ◽

Gene Set ◽

Primary Brain Tumour ◽

Machine Learning Approach

Abstract Background Glioblastoma (GBM), the most common and aggressive primary brain tumour in adults, has been classified into three subtypes: classical, mesenchymal and proneural. While the original classification relied on an 840-gene set, further clarification on true GBM subtypes uses a 150-gene signature to accurately classify GBM into the three subtypes. We asked whether a machine learning approach could be used to identify a smaller gene set that accurately predicts GBM subtype. Methods Using a supervised machine learning approach, extreme gradient boosting (XGBoost), we developed a classifier to predict the three subtypes of glioblastoma (GBM): classical, mesenchymal and proneural. We tested the classifier on in-house GBM tissue, cell lines and xenograft samples to predict their subtype. Results We identified the five most important genes for characterizing the three subtypes based on genes that often exhibited high Importance Scores in our XGBoost analyses. On average, this approach achieved 80.12% accuracy in predicting these three subtypes of GBM. Furthermore, we applied our five-gene classifier to successfully predict the subtype of GBM samples at our centre. Conclusion Our five-gene classifier is the smallest classifier to date that can predict GBM subtypes with high accuracy, which could facilitate the future development of a five-gene subtype diagnostic biomarker for routine assays in GBM samples.

Advances in Logistics, Operations, and Management Science - Handbook of Research on Management Techniques and Sustainability Strategies for Handling Disruptive Situations in Corporate Settings ◽

Prediction of the Disappearance of Companies From the Market in Bogotá, Colombia Using Machine Learning

10.4018/978-1-7998-8185-8.ch011 ◽

2021 ◽

pp. 227-246

Author(s):

William Stive Fajardo-Moreno ◽

Rubén Dario Acosta Velásquez ◽

Ivan Dario Castaño Pérez ◽

Leonardo Espinosa-Leal

Keyword(s):

Machine Learning ◽

State Of The Art ◽

Area Under The Curve ◽

Local Economy ◽

Gradient Boosting ◽

Grid Search ◽

Learning Machine ◽

Available Information ◽

Fold Cross Validation

In this chapter, the results concerning the modeling of companies' disappearance from Bogota's market using machine learning methods are presented. The authors use the available information from Bogota's Chamber of Commerce, where the companies are registered yearly. The dataset comprises the years 2017 to 2020 with almost 3 million registries. In this work, a deep analysis of the different features of the data is presented and explained. Next, four state-of-the-art machine learning models are trained for comparison: logistic regression (LR), extreme learning machine (ELM), random forest (RF), and extreme gradient boosting (XGBoost), all with five-fold cross-validation and 50 steps in the randomized grid search. All methods showed excellent performance, with an average of 0.895 in the area under the curve (AUC), and the last of these algorithms, XGBoost, was the best overall (0.97). These results are in agreement with the state-of-the-art values in the field and will be of paramount importance to assess companies' stability for Bogota's local economy.
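The evaluation protocol above combines two ideas: splitting the data into five folds so every record is used for testing exactly once, and drawing a fixed number of random hyper-parameter settings from a grid. The sketch below illustrates both with the standard library only; it is not the chapter's code, and the function names, the parameter grid, and the seed are assumptions.

```python
import random


def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def sample_configurations(grid, n_iter, seed=0):
    """Draw n_iter random settings, picking one value per parameter each time."""
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in grid.items()}
            for _ in range(n_iter)]


# Hypothetical hyper-parameter grid.
grid = {"max_depth": [3, 5, 7], "learning_rate": [0.01, 0.1, 0.3]}
configs = sample_configurations(grid, n_iter=50)

for train, test in k_fold_indices(12, k=5):
    print(len(train), len(test))
```

In practice the records would be shuffled (and usually stratified by class) before splitting, and each sampled configuration would be scored by its mean AUC across the five folds.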

Determination of Antiepileptic Drugs Withdrawal Through EEG Hjorth Parameter Analysis

International Journal of Neural Systems ◽

10.1142/s0129065720500367 ◽

2020 ◽

Vol 30 (11) ◽

pp. 2050036

Author(s):

Chen-Sen Ouyang ◽

Rei-Cheng Yang ◽

Rong-Ching Wu ◽

Ching-Tai Chiang ◽

Lung-Chang Lin

Keyword(s):

Area Under The Curve ◽

Critical Issue ◽

Quantitative Eeg ◽

Parameter Analysis ◽

Gradient Boosting ◽

Seizure Recurrence ◽

Epileptiform Discharges ◽

Recurrence Group ◽

Nonrecurrence Group

The decision to continue or to stop antiepileptic drug (AED) treatment in patients with prolonged seizure remission is a critical issue. Previous studies have used certain risk factors or electroencephalogram (EEG) findings to predict seizure recurrence after the withdrawal of AEDs. However, validated biomarkers to guide the withdrawal of AEDs are lacking. In this study, we used quantitative EEG analysis to establish a method for predicting seizure recurrence after the withdrawal of AEDs. A total of 34 patients with epilepsy were divided into two groups, 17 patients in the recurrence group and the other 17 patients in the nonrecurrence group. All patients were seizure free for at least two years. Before AED withdrawal, an EEG was performed for each patient that showed no epileptiform discharges. These EEG recordings were classified using Hjorth parameter-based EEG features. We found that the Hjorth complexity values were higher in patients in the recurrence group than in the nonrecurrence group. The extreme gradient boosting classification method achieved the highest performance in terms of accuracy, area under the curve, sensitivity, and specificity (84.76%, 88.77%, 89.67%, and 80.47%, respectively). Our proposed method is a promising tool to help physicians determine AED withdrawal for seizure-free patients.
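The Hjorth parameters referred to above are three time-domain descriptors of a signal: activity (its variance), mobility (a proxy for mean frequency), and complexity (how much the signal's shape deviates from a pure sine wave). The following sketch uses the standard definitions with first differences as a stand-in for the derivative; it is not the authors' feature-extraction code, and the function names and test signal are assumptions.

```python
import math


def _variance(xs):
    """Population variance."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def _diff(xs):
    """First difference, used here as a discrete derivative."""
    return [b - a for a, b in zip(xs, xs[1:])]


def hjorth_parameters(signal):
    """Return (activity, mobility, complexity) of a sampled signal."""
    d1 = _diff(signal)
    d2 = _diff(d1)
    activity = _variance(signal)
    mobility = math.sqrt(_variance(d1) / activity)
    # Complexity is the mobility of the derivative divided by the mobility
    # of the signal itself.
    complexity = math.sqrt(_variance(d2) / _variance(d1)) / mobility
    return activity, mobility, complexity


# Hypothetical test signal: ten cycles of a sine wave over 1000 samples.
sine = [math.sin(2 * math.pi * 10 * n / 1000) for n in range(1000)]
activity, mobility, complexity = hjorth_parameters(sine)
print(activity, mobility, complexity)
```

For a pure sine wave the complexity is close to 1, and it grows as the waveform becomes more irregular, which is the sense in which higher complexity values distinguish one group of recordings from another.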