Multifactorial analysis of factors influencing elite Australian football match outcomes: a machine learning approach

2019 ◽  
Vol 18 (3) ◽  
pp. 100-124
Author(s):  
J. Fahey-Gilmour ◽  
B. Dawson ◽  
P. Peeling ◽  
J. Heasman ◽  
B. Rogalski

Abstract In Australian football (AF), few studies have assessed combinations of pre-game factors and their relation to game outcomes (win/loss) in multivariable analyses. Further, previous research has mostly been confined to association-based linear approaches and post-game prediction, with limited assessment of predictive machine learning (ML) models in a pre-game setting. Therefore, our aim was to use ML techniques to predict game outcomes and produce a hierarchy of important (win/loss) variables. A total of 152 variables (79 absolute and 73 differentials) were used from the 2013–2018 Australian Football League (AFL) seasons. Various ML models were trained (cross-validation) on the 2013–2017 seasons, with the 2018 season used as an independent test set. Model performance varied (66.5–73.3% test set accuracy), although the best model (glmnet – 73.3%) rivalled bookmaker predictions in the same period (70.9%). The glmnet model revealed measures of team quality (a player-based rating and a team-based rating) in their relative form as the most important variables for prediction. Models that contained in-built feature selection or could model non-linear relationships generally performed better. These findings show that AFL game outcomes can be predicted using ML methods and provide a hierarchy of predictors that maximize the chance of winning.
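The evaluation protocol above (train on earlier seasons, hold out the final season, compare accuracy against a bookmaker baseline) can be sketched as follows. This is a minimal illustration, not the authors' code; the record schema and predictions are invented.

```python
# Sketch of a season-based train/test split with accuracy evaluation,
# as used in the abstract (2013-2017 train, 2018 test). Data is hypothetical.
def season_split(rows, test_season):
    """Partition match records by season rather than at random."""
    train = [r for r in rows if r["season"] != test_season]
    test = [r for r in rows if r["season"] == test_season]
    return train, test

def accuracy(preds, actuals):
    """Fraction of win/loss predictions that match the actual outcomes."""
    return sum(p == a for p, a in zip(preds, actuals)) / len(actuals)

rows = [
    {"season": 2017, "outcome": 1},
    {"season": 2017, "outcome": 0},
    {"season": 2018, "outcome": 1},
    {"season": 2018, "outcome": 0},
]
train, test = season_split(rows, test_season=2018)
model_preds = [1, 1]  # stand-in for glmnet's predictions on the test season
print(accuracy(model_preds, [r["outcome"] for r in test]))  # → 0.5
```

A season-wise split matters here: a random split would leak within-season information (team form, list changes) into training, inflating accuracy relative to a true pre-game setting.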

2021 ◽  
Vol 20 (1) ◽  
pp. 55-78
Author(s):  
J. Fahey-Gilmour ◽  
J. Heasman ◽  
B. Rogalski ◽  
B. Dawson ◽  
P. Peeling

Abstract In elite Australian football (AF), many studies have investigated individual player performance using a variety of outcomes (e.g. team selection, game running, game rating), but none have attempted to predict a player's performance using combinations of pre-game factors. Therefore, our aim was to investigate the ability of commonly reported individual player and team characteristics to predict individual Australian Football League (AFL) player performance, as measured through the official AFL player rating (AFLPR) (Champion Data). A total of 158 variables were derived for players (n = 64) from one AFL team using data collected during the 2014–2019 AFL seasons. Various machine learning models were trained (cross-validation) on the 2014–2018 seasons, with the 2019 season used as an independent test set. Model performance, assessed using root mean square error (RMSE), varied (4.69–5.03 test set RMSE) but was generally poor when compared to a singular-variable prediction (AFLPR pre-game rating: 4.72 test set RMSE). Variation in model performance (range RMSE: 0.14, excluding the worst model) was low, indicating that different approaches produced similar results; however, glmnet models were marginally superior (4.69 RMSE test set). This research highlights the limited utility of currently collected pre-game variables for predicting week-to-week game performance more accurately than simple singular-variable baseline models.
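The comparison driving this abstract's conclusion is an RMSE contrast between a multivariable model and a single-variable baseline (the pre-game rating itself). A minimal sketch, with invented ratings:

```python
import math

def rmse(preds, actuals):
    """Root mean square error between predicted and actual ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(preds, actuals)) / len(actuals))

# Hypothetical AFLPR values: the baseline simply carries the pre-game
# rating forward, mirroring the singular-variable baseline in the abstract.
pre_game_rating = [12.0, 8.5, 15.0]   # baseline prediction
model_prediction = [11.0, 9.5, 13.5]  # stand-in for a trained model
actual_rating = [10.0, 9.0, 14.0]

print(rmse(pre_game_rating, actual_rating))   # baseline error
print(rmse(model_prediction, actual_rating))  # model error
```

If the second number is not clearly below the first, the extra 157 variables are adding little, which is exactly the pattern the study reports (4.69 vs 4.72 test RMSE).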


2019 ◽  
Author(s):  
Oskar Flygare ◽  
Jesper Enander ◽  
Erik Andersson ◽  
Brjánn Ljótsson ◽  
Volen Z Ivanov ◽  
...  

**Background:** Previous attempts to identify predictors of treatment outcomes in body dysmorphic disorder (BDD) have yielded inconsistent findings. One way to increase precision and clinical utility could be to use machine learning methods, which can incorporate multiple non-linear associations in prediction models. **Methods:** This study used a random forests machine learning approach to test if it is possible to reliably predict remission from BDD in a sample of 88 individuals that had received internet-delivered cognitive behavioral therapy for BDD. The random forest models were compared to traditional logistic regression analyses. **Results:** Random forests correctly identified 78% of participants as remitters or non-remitters at post-treatment. The accuracy of prediction was lower in subsequent follow-ups (68%, 66% and 61% correctly classified at 3-, 12- and 24-month follow-ups, respectively). Depressive symptoms, treatment credibility, working alliance, and initial severity of BDD were among the most important predictors at the beginning of treatment. By contrast, the logistic regression models did not identify consistent and strong predictors of remission from BDD. **Conclusions:** The results provide initial support for the clinical utility of machine learning approaches in the prediction of outcomes of patients with BDD. **Trial registration:** ClinicalTrials.gov ID: NCT02010619.
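The random forest approach used here classifies each participant as remitter or non-remitter by majority vote over many decision trees. A toy illustration of that voting mechanism over single-split "stumps" (the feature names, thresholds, and data are invented, not taken from the study):

```python
from collections import Counter

# Toy random-forest-style majority vote over decision stumps.
# x = (depressive_symptoms, treatment_credibility, bdd_severity) -- hypothetical.
def stump(feature_idx, threshold):
    """A one-split tree: vote 1 (remitter) if the feature exceeds the threshold."""
    return lambda x: 1 if x[feature_idx] >= threshold else 0

def forest_predict(stumps, x):
    """Majority vote across all stumps."""
    votes = Counter(s(x) for s in stumps)
    return votes.most_common(1)[0][0]

stumps = [stump(0, 10), stump(1, 5), stump(2, 20)]
print(forest_predict(stumps, (12, 7, 15)))  # two of three stumps vote 1 → 1
```

A real random forest grows deep trees on bootstrap samples with random feature subsets, which is what lets it capture the non-linear associations that the logistic regression comparison missed.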


Author(s):  
Miss. Aakansha P. Tiwari

Abstract: Effective contact tracing of SARS-CoV-2 enables quick and efficient diagnosis of COVID-19 and might mitigate the burden on healthcare systems. Prediction models that combine several features to estimate the risk of infection have been developed. These aim to help medical examiners worldwide in the treatment of patients, especially within the context of limited healthcare resources. The authors established a machine learning approach trained on records from 51,831 tested individuals (of whom 4,769 were confirmed to have COVID-19). The test set contained data from the subsequent week (47,401 tested individuals, of whom 3,624 were confirmed to have COVID-19). Their model predicted COVID-19 test results with high accuracy using only eight binary features: sex, age ≥60 years, known contact with infected patients, and the appearance of five initial clinical symptoms. Based on the nationwide data publicly reported by the Israeli Ministry of Health, they developed a model that detects COVID-19 cases from simple features obtained by asking basic questions of the patient. Their framework may be used, among other considerations, to prioritize testing for COVID-19 when testing resources are limited. Keywords: Machine Learning, SARS-CoV-2, COVID-19, Coronavirus.
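A model built on eight binary answers can be sketched as a simple weighted score over yes/no features. The feature names below follow the abstract, but the weights are purely illustrative assumptions, not the model's learned parameters:

```python
# Toy risk score over eight binary features, echoing the abstract's feature
# list. Weights are invented for illustration; the actual model was learned.
FEATURES = [
    "male", "age_60_plus", "known_contact",
    "cough", "fever", "sore_throat", "shortness_of_breath", "headache",
]

def risk_score(record):
    """Sum of weights for the features the patient answers 'yes' to."""
    weights = {"known_contact": 3, "age_60_plus": 2}  # others default to 1
    return sum(weights.get(f, 1) for f in FEATURES if record.get(f))

print(risk_score({"known_contact": True, "fever": True}))  # → 4
```

In triage use, patients above a score threshold would be prioritized for testing; tuning that threshold trades sensitivity against the number of tests consumed.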


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Lei Li ◽  
Desheng Wu

**Purpose:** The infraction of securities regulations (ISRs) by listed firms in their day-to-day operations and management has become a common problem. This paper proposes several machine learning approaches to forecast the risk of infractions by listed corporates, addressing the ineffectiveness and imprecision of current supervision. **Design/methodology/approach:** The overall research framework designed for forecasting infractions (ISRs) includes data collection and cleaning, feature engineering, data splitting, prediction approach application, and model performance evaluation. We select Logistic Regression, Naïve Bayes, Random Forest, Support Vector Machines, Artificial Neural Networks and Long Short-Term Memory networks (LSTMs) as ISR prediction models. **Findings:** The results show that models given prior infractions as input achieve significantly better ISR prediction than those without them, especially for large sample sets. The results also indicate that, when judging whether a company has infractions, attention should be paid to novel artificial intelligence methods, previous infractions of the company, and large data sets. **Originality/value:** The findings can be used to address, to a certain degree, the problem of identifying listed corporates' ISRs. Overall, the results elucidate the value of prior infractions of securities regulations (ISRs). This shows the importance of including more data sources when constructing distress models, rather than only building increasingly complex models on the same data. This is also beneficial to the regulatory authorities.
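The key finding is that a prior-infraction feature improves prediction, which in feature engineering terms means deriving, for each firm-year record, whether the firm had any infraction in an earlier year. A minimal sketch of that derivation, with a hypothetical record schema:

```python
# Feature-engineering sketch: add a "prior_infraction" flag per firm-year.
# The schema (firm, year, infraction) is assumed for illustration.
def add_prior_infraction(records):
    """Return records augmented with prior_infraction, in per-firm chronological order."""
    seen = set()  # firms with at least one infraction so far
    out = []
    for r in sorted(records, key=lambda r: (r["firm"], r["year"])):
        out.append(dict(r, prior_infraction=int(r["firm"] in seen)))
        if r["infraction"]:
            seen.add(r["firm"])
    return out

history = [
    {"firm": "A", "year": 2018, "infraction": 1},
    {"firm": "A", "year": 2019, "infraction": 0},
    {"firm": "B", "year": 2019, "infraction": 0},
]
print([r["prior_infraction"] for r in add_prior_infraction(history)])  # → [0, 1, 0]
```

Note the flag for a given year only looks backward, so the feature is available pre-decision and introduces no target leakage.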


2020 ◽  
Author(s):  
Amir Mosavi

Several epidemiological models are being used around the world to project the number of infected individuals and the mortality rates of the COVID-19 outbreak. Advancing accurate prediction models is of utmost importance for taking proper actions. Due to a high level of uncertainty, or even a lack of essential data, the standard epidemiological models have been challenged regarding their delivery of higher accuracy for long-term prediction. As an alternative to susceptible-infected-resistant (SIR)-based models, this study proposes a hybrid machine learning approach to predict the COVID-19 outbreak, and we exemplify its potential using data from Hungary. The hybrid machine learning methods of adaptive network-based fuzzy inference system (ANFIS) and multi-layered perceptron-imperialist competitive algorithm (MLP-ICA) are used to predict the time series of infected individuals and the mortality rate. The models predict that by late May, the outbreak and the total mortality will drop substantially. Validation is performed over nine days with promising results, which confirms the model's accuracy. The model is expected to maintain its accuracy as long as no significant interruption occurs. Based on the results reported here, and due to the complex nature of the COVID-19 outbreak and the variation in its behavior from nation to nation, this study suggests machine learning as an effective tool to model the outbreak. This paper provides an initial benchmarking to demonstrate the potential of machine learning for future research.
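For context on what the hybrid ML approach is an alternative to, the SIR family of models mentioned above reduces to a small set of update equations. A discrete-time sketch with illustrative parameters (not the Hungarian data or the study's configuration):

```python
# One-day update of a discrete-time SIR model; beta and gamma are
# illustrative transmission and recovery rates, not fitted values.
def sir_step(s, i, r, beta, gamma, n):
    """Advance susceptible/infected/recovered counts by one day."""
    new_infections = beta * s * i / n
    new_recoveries = gamma * i
    return (s - new_infections,
            i + new_infections - new_recoveries,
            r + new_recoveries)

s, i, r, n = 9990.0, 10.0, 0.0, 10000.0
for _ in range(60):
    s, i, r = sir_step(s, i, r, beta=0.3, gamma=0.1, n=n)
print(round(s + i + r))  # population is conserved → 10000
```

The limitation the study highlights follows directly from this form: long-term projections hinge on beta and gamma, which are hard to estimate reliably early in an outbreak, whereas the ANFIS/MLP-ICA models fit the observed time series directly.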


2020 ◽  
Vol 18 (1) ◽  
Author(s):  
Kerry E. Poppenberg ◽  
Vincent M. Tutino ◽  
Lu Li ◽  
Muhammad Waqas ◽  
Armond June ◽  
...  

Abstract Background Intracranial aneurysms (IAs) are dangerous because of their potential to rupture. We previously found significant RNA expression differences in circulating neutrophils between patients with and without unruptured IAs and trained machine learning models to predict presence of IA using 40 neutrophil transcriptomes. Here, we aim to develop a predictive model for unruptured IA using neutrophil transcriptomes from a larger population and more robust machine learning methods. Methods Neutrophil RNA extracted from the blood of 134 patients (55 with IA, 79 IA-free controls) was subjected to next-generation RNA sequencing. In a randomly-selected training cohort (n = 94), the Least Absolute Shrinkage and Selection Operator (LASSO) selected transcripts, from which we constructed prediction models via 4 well-established supervised machine-learning algorithms (K-Nearest Neighbors, Random Forest, and Support Vector Machines with Gaussian and cubic kernels). We tested the models in the remaining samples (n = 40) and assessed model performance by receiver-operating-characteristic (ROC) curves. Real-time quantitative polymerase chain reaction (RT-qPCR) of 9 IA-associated genes was used to verify gene expression in a subset of 49 neutrophil RNA samples. We also examined the potential influence of demographics and comorbidities on model prediction. Results Feature selection using LASSO in the training cohort identified 37 IA-associated transcripts. Models trained using these transcripts had a maximum accuracy of 90% in the testing cohort. The testing performance across all methods had an average area under ROC curve (AUC) = 0.97, an improvement over our previous models. The Random Forest model performed best across both training and testing cohorts. RT-qPCR confirmed expression differences in 7 of 9 genes tested. Gene ontology and IPA network analyses performed on the 37 model genes reflected dysregulated inflammation, cell signaling, and apoptosis processes. 
In our data, demographics and comorbidities did not affect model performance. Conclusions We improved upon our previous IA prediction models based on circulating neutrophil transcriptomes by increasing the sample size and by implementing LASSO and more robust machine learning methods. Future studies are needed to validate these models in larger cohorts and to further investigate the effect of covariates.
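The headline metric in this abstract is area under the ROC curve (AUC = 0.97). AUC has a simple rank interpretation that can be computed without any library: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. A self-contained sketch with invented scores:

```python
# Rank-based AUC: P(random positive outranks random negative), with ties
# counted as half. Scores and labels below are illustrative only.
def auc(scores, labels):
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Perfect separation: every IA case scores above every control.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # → 1.0
```

An AUC of 0.97 means a patient with an unruptured IA would outrank an IA-free control in about 97% of random pairings, regardless of where the classification threshold is set.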


2019 ◽  
Author(s):  
Abdul Karim ◽  
Vahid Riahi ◽  
Avinash Mishra ◽  
Abdollah Dehzangi ◽  
M. A. Hakim Newton ◽  
...  

Abstract Representing molecules in the form of only one type of features, and using those features to predict their activities, is one of the most important approaches in machine-learning-based chemical activity prediction. For molecular activities like quantitative toxicity prediction, performance depends on the type of features extracted and the machine learning approach used. In such cases, using one type of features and one machine learning model restricts prediction performance to the specific representation and model used. In this paper, we study quantitative toxicity prediction and propose a machine learning model for it. Our model uses an ensemble of heterogeneous predictors instead of the typical homogeneous predictors. The predictors we use vary either in the type of features used or in the deep learning architecture employed. Each of these predictors presumably has its own strengths and weaknesses in terms of toxicity prediction. Our motivation is to build a combined model that utilizes different types of features and architectures to obtain better collective performance than each individual predictor. We use six predictors in our model and test the model on four standard quantitative toxicity benchmark datasets. Experimental results show that our model outperforms state-of-the-art toxicity prediction models in 8 out of 12 accuracy measures. Our experiments show that ensembling heterogeneous predictors improves performance over single predictors and over homogeneous ensembles of single predictors. The results show that each data representation or deep-learning-based predictor has its own strengths and weaknesses; thus, a model ensembling multiple heterogeneous predictors can go beyond the individual performance of each data representation or predictor type.
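The core idea (combining heterogeneous predictors into one output) can be sketched as a simple average in the regression setting that quantitative toxicity prediction uses. The stand-in predictors below are trivial lambdas, representing models trained on different molecular representations; the paper's actual combination scheme may differ:

```python
# Minimal heterogeneous-ensemble sketch: each predictor is any callable
# mapping an input to a toxicity value; the ensemble averages their outputs.
def ensemble_mean(predictors, x):
    """Average the predictions of heterogeneous models on input x."""
    preds = [predict(x) for predict in predictors]
    return sum(preds) / len(preds)

# Stand-ins for models trained on different features/architectures.
predictors = [lambda x: 2.0 * x, lambda x: x + 1.0, lambda x: 3.0]
print(ensemble_mean(predictors, 2.0))
```

Averaging helps when the members' errors are imperfectly correlated, which is precisely what varying the feature type or architecture is meant to achieve; members whose errors all point the same way gain nothing from ensembling.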


2021 ◽  
Vol 8 ◽  
Author(s):  
Daniele Roberto Giacobbe ◽  
Alessio Signori ◽  
Filippo Del Puente ◽  
Sara Mora ◽  
Luca Carmisciano ◽  
...  

Sepsis is a major cause of death worldwide. Over the past years, prediction of clinically relevant events through machine learning models has gained particular attention. In the present perspective, we provide a brief, clinician-oriented vision on the following relevant aspects concerning the use of machine learning predictive models for the early detection of sepsis in the daily practice: (i) the controversy of sepsis definition and its influence on the development of prediction models; (ii) the choice and availability of input features; (iii) the measure of the model performance, the output, and their usefulness in the clinical practice. The increasing involvement of artificial intelligence and machine learning in health care cannot be disregarded, despite important pitfalls that should be always carefully taken into consideration. In the long run, a rigorous multidisciplinary approach to enrich our understanding in the application of machine learning techniques for the early recognition of sepsis may show potential to augment medical decision-making when facing this heterogeneous and complex syndrome.


2021 ◽  
Vol 23 (Supplement_6) ◽  
pp. vi137-vi137
Author(s):  
Niklas Tillmanns ◽  
Avery Lum ◽  
W R Brim ◽  
Harry Subramanian ◽  
Ming Lin ◽  
...  

Abstract PURPOSE Generalizability, reproducibility, and objectivity are critical elements that need to be considered when translating machine learning models into clinical practice. While a large body of literature has been published on machine learning methods for segmentation of brain tumors, a systematic evaluation of paper quality and reproducibility has not been done. We investigated the use of the “Transparent Reporting of studies on prediction models for Individual Prognosis Or Diagnosis” (TRIPOD) items among papers published in this relatively new and growing field. METHODS In accordance with PRISMA, a literature review was performed on four databases, Ovid Embase, Ovid MEDLINE, Cochrane trials (CENTRAL) and Web of Science Core Collection, first in October 2020 and a second time in February 2021. Keywords and controlled vocabulary included artificial intelligence, machine learning, deep learning, radiomics, magnetic resonance imaging, glioma, and glioblastoma. The publications were assessed according to the TRIPOD items. RESULTS 37 publications from our database search were screened against TRIPOD, yielding an average score of 12.08, with a maximum score of 16 and a minimum score of 7. The best-scoring item was interpretation (item 19), where all papers scored a point. The lowest-scoring items were the title, the abstract, risk groups, and model performance (items 1, 2, 11 and 16), where no paper scored a point. Less than 1% of the papers discussed the problem of missing data (item 9) and the funding of research (item 22). CONCLUSION The TRIPOD analysis showed that a majority of the papers do not score highly on critical elements that allow reproducibility, translation, and objectivity of research. An average score of 12.08 (40%) indicates that the publications usually achieve a relatively low score.
The categories that were consistently poorly described include the ML network description, measuring model performance, title details and inclusion of information into the abstract.
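The scoring exercise described above reduces to a tally: each paper earns a point per TRIPOD item it satisfies, and items are ranked by how many papers satisfy them. A small sketch with invented item sets (three toy papers, not the 37 reviewed):

```python
from collections import Counter

# Each paper is represented as the set of TRIPOD item numbers it satisfied.
# The data below is illustrative, not the review's actual scoring.
def tripod_summary(papers):
    """Return (average score, per-item satisfaction counts) across papers."""
    avg_score = sum(len(items) for items in papers) / len(papers)
    item_counts = Counter(i for items in papers for i in items)
    return avg_score, item_counts

papers = [{19, 3, 4}, {19, 3}, {19}]
avg, counts = tripod_summary(papers)
print(avg, counts[19])  # item 19 (interpretation) met by every paper → 2.0 3
```

Items with a count equal to the number of papers correspond to universally reported elements (like item 19 in the review), while items with a count of zero flag systematic reporting gaps.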

