Building an NCAA men’s basketball predictive model and quantifying its success

Author(s):  
Michael J. Lopez ◽  
Gregory J. Matthews

Computing and machine learning advancements have led to the creation of many cutting-edge predictive algorithms, some of which have been shown to provide more accurate forecasts than traditional statistical tools. In this manuscript, we provide evidence that combining modest statistical methods with informative data can meet or exceed the accuracy of more complex models when predicting the NCAA men’s basketball tournament. First, we describe a prediction model that uses logistic regression models to merge the point spreads set by Las Vegas sportsbooks with possession-based team efficiency metrics. The set of probabilities generated from this model predicted the 2014 tournament more accurately than any of the approximately 400 competing submissions, as judged by the log loss function. Next, we attempt to quantify the degree to which luck played a role in the model's success by simulating tournament outcomes under different sets of true underlying game probabilities. We estimate that under the most optimistic of game-probability scenarios, our entry had roughly a 12% chance of outscoring all competing submissions and just under a 50% chance of finishing with one of the ten best scores.
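The log loss used to rank entries is the mean over games of -[y ln(p) + (1 - y) ln(1 - p)], where p is the submitted probability that a given team wins and y is 1 if it did. The sketch below computes it and then runs a simplified version of the luck analysis: draw game outcomes from an assumed set of "true" probabilities and count how often each submission has the lowest log loss. It is an illustration, not the authors' code; the submissions, probabilities, clipping constant eps and simulation count are hypothetical, and games are drawn independently rather than through a bracket.

```python
import math
import random


def log_loss(probs, outcomes, eps=1e-15):
    """Mean binary log loss. Probabilities are clipped to [eps, 1 - eps] so a
    confident wrong prediction gets a large but finite penalty."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(probs)


def win_share(submissions, true_probs, n_sims=10_000, seed=0):
    """Share of simulated tournaments in which each submission has the lowest
    log loss (ties go to the first-listed submission). Games are drawn
    independently from true_probs, a simplification: in a real bracket,
    later matchups depend on earlier results."""
    rng = random.Random(seed)
    wins = {name: 0 for name in submissions}
    for _ in range(n_sims):
        outcomes = [1 if rng.random() < p else 0 for p in true_probs]
        scores = {name: log_loss(probs, outcomes) for name, probs in submissions.items()}
        wins[min(scores, key=scores.get)] += 1
    return {name: count / n_sims for name, count in wins.items()}


# Hypothetical three-game example; y = 1 means the first-listed team won.
print(round(log_loss([0.5, 0.5, 0.5], [1, 0, 1]), 4))  # 0.6931, i.e. ln 2
subs = {"model": [0.8, 0.3, 0.6], "coin_flip": [0.5, 0.5, 0.5]}
print(win_share(subs, true_probs=[0.8, 0.3, 0.6]))
```

With these toy inputs, the submission whose probabilities equal the true ones has the lowest log loss in only about 70% of simulated tournaments, which is the kind of luck effect the abstract sets out to quantify.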

2021 ◽  
pp. 20210525
Author(s):  
Daisuke Kawahara ◽  
Yuji Murakami ◽  
Shigeyuki Tani ◽  
Yasushi Nagata

Objective: To propose a model that predicts the degree of differentiation of locally advanced esophageal cancer from the radiotherapy planning CT image by radiomics analysis with machine learning. Methods: Data from 104 patients with esophageal cancer who underwent chemoradiotherapy followed by surgery at Hiroshima University Hospital from 2003 to 2016 were analyzed. The treatment outcomes of these tumors were known prior to the study. The data were split into three sets: 57 tumors for training, 16 for validation and 31 for model testing. The degree of differentiation of squamous cell carcinoma was classified into two groups: Group I comprised poorly differentiated (POR) tumors, and Group II comprised well and moderately differentiated tumors. Radiomics features were extracted from the tumor region and from the region around the tumor, giving a total of 3480 radiomics features per patient from the radiotherapy planning CT scan. Least absolute shrinkage and selection operator (LASSO) logistic regression was applied to this set of candidate predictors, and the selected radiomics features were used as input to neural network classifiers. Precision, accuracy and sensitivity, computed from confusion matrices, and the area under the receiver operating characteristic curve (AUC) were evaluated. Results: LASSO analysis of the training data identified 13 CT radiomics features for the classification. The prediction model was most accurate when only CT radiomics features were used. The accuracy, specificity and sensitivity of the predictive model were 85.4%, 88.6% and 80.0%, respectively, and the AUC was 0.92. Conclusion: The proposed predictive model classified the degree of differentiation of esophageal cancer with high accuracy.
Given its good predictive ability, the method may help reduce the need for pathological examination by biopsy and may help predict local control. Advances in knowledge: For esophageal cancer, the degree of differentiation is an important index of tumor aggressiveness. The current study proposed a prediction model for the degree of differentiation based on radiomics analysis.
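The abstract does not state how the LASSO step was implemented. As an illustration only, the following pure-Python sketch fits an L1-penalized (LASSO) logistic regression by proximal gradient descent: each iteration takes a gradient step on the mean log loss and then soft-thresholds the coefficients, which returns exactly zero for any coefficient whose gradient is too weak to overcome the penalty. That is how LASSO discards candidate features. The penalty lam, learning rate lr, iteration count and toy data are assumptions; in practice features are usually standardized first, and a library solver (for example scikit-learn's LogisticRegression with penalty="l1") would normally be used.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of t*|v|: shrink v toward 0, returning exactly 0 when |v| <= t.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_logistic(X, y, lam, lr=0.1, n_iter=2000):
    """L1-penalized logistic regression by proximal gradient descent (ISTA).
    Minimizes mean log loss + lam * sum(|w_j|); the intercept b is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi  # residual
            grad_b += r / n
            for j in range(d):
                grad_w[j] += r * xi[j] / n
        b -= lr * grad_b
        w = [soft_threshold(w[j] - lr * grad_w[j], lr * lam) for j in range(d)]
    return w, b


def selected_features(w):
    # Indices of non-zero coefficients: the features LASSO keeps.
    return [j for j, wj in enumerate(w) if wj != 0.0]


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is balanced noise.
X = [[-2, 1], [-2, -1], [-1, 1], [-1, -1], [1, 1], [1, -1], [2, 1], [2, -1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
print(selected_features(lasso_logistic(X, y, lam=0.01)[0]))  # [0]
print(selected_features(lasso_logistic(X, y, lam=2.0)[0]))   # [] (penalty too strong)
```

Raising lam trades fit for sparsity; with thousands of candidate radiomics features, the penalty strength is typically chosen by cross-validation.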


2020 ◽  
Vol 38 (4_suppl) ◽  
pp. 456-456
Author(s):  
Yuji Murakami ◽  
Yasushi Nagata ◽  
Daisuke Kawahara

456 Background: The pathologic complete response (PCR) rate after neoadjuvant chemoradiotherapy (NCRT) for resectable locally advanced esophageal squamous cell carcinoma (ESCC) is about 40%. If a PCR could be predicted from pre-treatment image data, it might be possible to select patients who can be cured by organ-preserving CRT. The purpose of this study was to construct a predictive model for PCR after NCRT in patients with locally advanced ESCC using radiomics and machine learning. Methods: We used data from 98 ESCC patients who underwent NCRT and surgery from 2003 to 2016. First, we fused the radiotherapy treatment planning CT images with the PET images acquired before treatment; then, using the target delineations on the planning CT images, we created eight kinds of target regions on the PET images. Second, we preprocessed the PET image data within these target regions with radiomics techniques and generated a total of 6968 features per patient. From these, we selected the optimal features for machine learning using least absolute shrinkage and selection operator (LASSO) logistic regression. Third, we used artificial neural networks as the machine-learning method to create a predictive model, with the selected radiomics features as input values and the label ‘PCR’ or ‘not PCR’ as the output value. We used data from 58 randomly selected patients for training to construct the predictive model, and data from 15 patients to validate the models and select the optimal one. Finally, we evaluated the predictive model on test data from 25 patients. Results: The LASSO analysis selected 32 radiomics features for machine-learning classification. The predictive model correctly predicted the pathological findings after NCRT in 24 of the 25 test cases. Its accuracy, specificity and sensitivity for predicting PCR after NCRT were 96.0%, 93.8% and 100%, respectively.
Conclusions: A prediction model based on PET images using radiomics and machine learning could predict pathological findings after NCRT for resectable locally advanced ESCC.
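The reported test-set figures follow from a 2 x 2 confusion matrix. Assuming PCR is the positive class, 24 of 25 correct predictions with 100% sensitivity and 93.8% specificity imply 9 PCR cases (all detected) and 16 non-PCR cases (15 classified correctly). These counts are inferred from the percentages and are not stated in the abstract; the helper binary_metrics is a hypothetical name used for illustration.

```python
def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from a 2 x 2 confusion matrix."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # share of actual positives detected
        "specificity": tn / (tn + fp),   # share of actual negatives rejected
    }


# Counts inferred from the reported percentages, with PCR as the positive class.
m = binary_metrics(tp=9, fn=0, tn=15, fp=1)
print({k: round(100 * v, 1) for k, v in m.items()})
# {'accuracy': 96.0, 'sensitivity': 100.0, 'specificity': 93.8}
```

The inference works because, with only one error among 25 cases, 93.8% specificity can only arise as 15/16, leaving 9 positive cases that must all be correct.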


2021 ◽  
Vol 39 (15_suppl) ◽  
pp. e18025-e18025
Author(s):  
Indranil Mallick ◽  
Saheli Saha ◽  
Sanjoy Chatterjee ◽  
Paromita Roy

e18025 Background: The current approach to neck treatment in clinical T1-2 oral cancers is to offer elective nodal dissection to all patients, even though most patients are pathologically node negative. This practice reflects the poor predictive ability of clinico-radiological assessment and the poorer survival of patients in whom neck dissection is omitted on that basis. A robust prediction model for pathological nodal status may allow individualized decisions on neck dissection. Our aim was to develop a multiparameter prediction model to identify pathologically node-negative status using machine learning. Methods: We identified 497 patients with cT1-2 oral cancer in a single-institution database from 2011 to 2018 who underwent primary resection and neck dissection. We compared the sensitivity, positive predictive value and accuracy for predicting a pathologically negative neck from clinico-radiological staging alone vs. a model built from multiple parameters, including clinical features (clinico-radiological nodal status, age, sex, subsite of the primary lesion) and pathological features of the resected primary tumor (maximum dimension, depth of invasion, lymphovascular invasion, perineural invasion, grade and margins of resection). The multiparameter model was built on a training dataset of the first 400 patients using an ensemble of logistic regression, random forests and support vector machines. A cohort of the remaining 97 patients was used for independent validation. Results: In this cohort, 232 (47%) patients were clinico-radiologically node negative, while 307 (62%) were pathologically node negative. The sensitivity, positive predictive value and accuracy of the clinico-radiologically assigned nodal status were 56%, 74% and 61%, while those of the multiparameter machine learning model were 87%, 89% and 89%, respectively. The area under the curve (AUC) of the clinico-radiological prediction was 0.62, whereas that of the multiparameter predictive model was 0.91.
In the validation dataset, the model correctly predicted 58 of 62 pathologically node-negative patients. The accuracy of the model on the external validation dataset was 82%. Conclusions: The multiparameter predictive model performed considerably better than clinico-radiological neck staging in predicting a pathologically node-negative neck, and it was also validated on an independent dataset. The model could be considered for prospective clinical evaluation of individualized neck dissection.
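The abstract does not say how the logistic regression, random forest and support vector machine outputs were combined. One common choice is soft voting, sketched below in plain Python under that assumption: each model's probability that a patient is pathologically node negative is averaged, and the mean is thresholded at 0.5. The probabilities are invented for illustration, and the metric helper treats node negative as the positive class, matching the study's stated aim.

```python
from statistics import mean


def soft_vote(prob_lists, threshold=0.5):
    """Average per-patient probabilities across base models, then threshold.
    prob_lists holds one list of probabilities per model, in the same patient order."""
    averaged = [mean(ps) for ps in zip(*prob_lists)]
    return averaged, [1 if p >= threshold else 0 for p in averaged]


def sens_ppv_acc(pred, truth):
    """Sensitivity, positive predictive value and accuracy, with 1 marking a
    pathologically node-negative neck. Assumes at least one predicted and one
    actual positive, so neither denominator is zero."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    correct = sum(1 for p, t in zip(pred, truth) if p == t)
    return tp / (tp + fn), tp / (tp + fp), correct / len(truth)


# Invented probabilities of node negativity for four patients from three models.
lr = [0.9, 0.3, 0.7, 0.2]
rf = [0.8, 0.4, 0.5, 0.3]
svm = [0.7, 0.2, 0.6, 0.1]
_, pred = soft_vote([lr, rf, svm])
print(pred)                              # [1, 0, 1, 0]
print(sens_ppv_acc(pred, [1, 0, 1, 1]))  # (0.6666666666666666, 1.0, 0.75)
```

Hard (majority) voting on each model's class labels is the main alternative; soft voting keeps information about how confident each model is.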


PLoS ONE ◽  
2020 ◽  
Vol 15 (11) ◽  
pp. e0242028
Author(s):  
Hiroaki Haga ◽  
Hidenori Sato ◽  
Ayumi Koseki ◽  
Takafumi Saito ◽  
Kazuo Okumoto ◽  
...  

In recent years, the development of diagnostics using artificial intelligence (AI) has been remarkable. AI algorithms can go beyond human reasoning and build diagnostic models from complex combinations of many features. Using next-generation sequencing technology, we identified hepatitis C virus (HCV) variants resistant to direct-acting antivirals (DAA) by whole-genome sequencing of full-length HCV genomes, and used these variants as input to various machine-learning algorithms to evaluate a preliminary predictive model. HCV genomic RNA was extracted from the serum of 173 patients (109 who subsequently achieved a sustained virological response [SVR] and 64 who did not) before DAA treatment. The HCV genomes from the 109 SVR and 64 non-SVR patients were randomly divided into a training data set (57 SVR and 29 non-SVR) and a validation data set (52 SVR and 35 non-SVR). Nine machine-learning algorithms were applied to the training data set to identify the optimal combination of functional variants associated with SVR status after DAA therapy, and the resulting prediction model was then tested on the validation data set. The most accurate learning method was the support vector machine (SVM) algorithm (validation accuracy, 0.95; kappa statistic, 0.90; F-value, 0.94), followed by the multilayer perceptron. The decision tree and naive Bayes algorithms fitted our data set poorly, with accuracy below 0.8. In conclusion, with an accuracy of 95.4% in the generalization performance evaluation, SVM was identified as the best algorithm. Analytical methods that combine genomic analysis with machine-learning predictive models may also help select the optimal treatment for other viral infections and for cancer.
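The kappa statistic and F-value reported for the SVM can both be computed from a 2 x 2 confusion matrix. The counts below are hypothetical: they are one of several matrices consistent with the reported 95.4% accuracy on the 87 validation genomes (83 correct), and treating non-SVR as the positive class is also an assumption. The abstract does not give the actual matrix, so the fact that these counts give values close to those reported does not show they are the authors' counts.

```python
def kappa_and_f1(tp, fn, fp, tn):
    """Cohen's kappa and F1 score (F-value) for a 2 x 2 confusion matrix."""
    n = tp + fn + fp + tn
    observed = (tp + tn) / n                      # plain accuracy
    # Chance agreement: products of predicted and actual class totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return kappa, f1


# Hypothetical matrix for the 87 validation genomes (non-SVR as positive class):
# 33 of 35 non-SVR and 50 of 52 SVR genomes classified correctly, 83/87 = 95.4%.
kappa, f1 = kappa_and_f1(tp=33, fn=2, fp=2, tn=50)
print(f"kappa={kappa:.2f}, F1={f1:.2f}")  # kappa=0.90, F1=0.94
```

Kappa is lower than accuracy because it discounts the agreement expected by chance, which is substantial when one class (here SVR) makes up most of the data.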


Sensors ◽  
2021 ◽  
Vol 21 (13) ◽  
pp. 4606
Author(s):  
Sunguk Hong ◽  
Cheoljeong Park ◽  
Seongjin Cho

Predicting the rail temperature of a railway system is important for establishing a rail management plan against derailment caused by track buckling. The rail temperature, which is directly responsible for track buckling, is closely related to air temperature, which continues to rise because of global warming. Moreover, railways increasingly use continuous welded rails (CWRs) to reduce train vibration and noise. Unfortunately, CWRs are prone to buckling. This study develops a novel, reliable and highly accurate model that predicts rail temperature using a machine learning method. To predict rail temperature over the entire network with high predictive performance, the model uses weather-effect and solar-effect features derived from an analysis of the thermal environment around the rail. In particular, the presented model predicts high rail temperatures more accurately than other models. As a convenient structural health-monitoring application, a train-speed-limit alarm-map (TSLAM) is also proposed, which visually maps the predicted rail-temperature deviations over the entire network for railway safety officers. Combined with TSLAM, our rail-temperature prediction model is expected to improve track safety and train punctuality.
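The abstract describes TSLAM only as a map of predicted rail-temperature deviations for safety officers. The sketch below shows one way such a map could be tabulated before plotting: each track segment's predicted rail temperature is compared with a reference temperature and assigned an alarm level. The Segment type, the reference temperatures and the ALARM_BANDS thresholds are all hypothetical; real speed-restriction thresholds are set by railway operators and are not given in the abstract.

```python
from dataclasses import dataclass

# Hypothetical alarm bands, most severe first: (minimum deviation in deg C, level).
ALARM_BANDS = [(15.0, "level 3"), (10.0, "level 2"), (5.0, "level 1")]


@dataclass
class Segment:
    segment_id: str
    predicted_rail_temp_c: float  # output of a rail-temperature prediction model
    reference_temp_c: float       # hypothetical per-segment reference temperature


def alarm_level(seg: Segment) -> str:
    """Alarm level from how far the predicted rail temperature exceeds the reference."""
    deviation = seg.predicted_rail_temp_c - seg.reference_temp_c
    for min_dev, level in ALARM_BANDS:
        if deviation >= min_dev:
            return level
    return "none"


def alarm_map(segments):
    # One alarm level per segment: the table a network-wide map would colour.
    return {s.segment_id: alarm_level(s) for s in segments}


segments = [Segment("A", 58.0, 40.0), Segment("B", 52.0, 40.0), Segment("C", 41.0, 40.0)]
print(alarm_map(segments))  # {'A': 'level 3', 'B': 'level 2', 'C': 'none'}
```

Checking the bands from most to least severe guarantees that each segment receives only its strictest applicable level.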


2021 ◽  
Vol 11 (4) ◽  
pp. 1742
Author(s):  
Ignacio Rodríguez-Rodríguez ◽  
José-Víctor Rodríguez ◽  
Wai Lok Woo ◽  
Bo Wei ◽  
Domingo-Javier Pardo-Quiles

Type 1 diabetes mellitus (DM1) is a metabolic disease caused by a decline in pancreatic insulin production, resulting in chronic hyperglycemia. DM1 subjects usually have to measure their blood glucose several times a day with capillary glucometers to monitor blood glucose dynamics. In recent years, technological advances have enabled revolutionary biosensors and continuous glucose monitoring (CGM) techniques, which allow a subject’s blood glucose level to be monitored in real time. However, few attempts have been made to apply machine learning techniques to predicting glycaemia levels, and handling a database with so many variables is problematic. Moreover, to the best of the authors’ knowledge, proper feature selection (FS), the stage before predictive algorithms are applied, has not been discussed and compared in depth in past research on forecasting glycaemia. Therefore, to assess how a proper FS stage could improve the accuracy of glycaemia forecasts, this work combines six FS techniques with four predictive algorithms and applies them to a full dataset of biomedical features related to glycaemia. These features were collected through a wide-ranging passive monitoring process involving 25 patients with DM1 in practical real-life scenarios. The results show that Random Forest (RF), used as both the predictive algorithm and the FS strategy, offers the best average performance (Root Mean Square Error, RMSE = 18.54 mg/dL) across the 12 predictive horizons considered (up to 60 min in steps of 5 min), while Support Vector Machines (SVM) are the most accurate forecasting algorithm when averaged over the six FS techniques (RMSE = 20.58 mg/dL).
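The evaluation protocol, RMSE at each of 12 horizons from 5 to 60 min and then averaged, can be sketched as follows. As a stand-in for the paper's RF and SVM models, the forecaster here is a naive persistence baseline (the future value is predicted to equal the current reading), and readings are assumed to be spaced 5 min apart; both choices and the synthetic trace are illustrative assumptions, not the study's setup.

```python
import math

SAMPLE_MIN = 5                     # assumed CGM sampling interval (minutes)
HORIZONS = list(range(5, 61, 5))   # 5, 10, ..., 60 min: 12 horizons


def rmse(pred, actual):
    # Root mean square error, in the units of the input (here mg/dL).
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))


def persistence_rmse_by_horizon(glucose):
    """RMSE of a naive 'no change' forecast at each horizon: the forecast
    for time t + h is simply the reading at time t."""
    out = {}
    for h in HORIZONS:
        steps = h // SAMPLE_MIN
        pred = glucose[:-steps]    # reading at time t
        actual = glucose[steps:]   # reading at time t + h
        out[h] = rmse(pred, actual)
    return out


def average_over_horizons(per_horizon):
    # Single summary number: the mean RMSE across all horizons.
    return sum(per_horizon.values()) / len(per_horizon)


# Synthetic CGM trace (hypothetical): glucose rising 2 mg/dL every 5 minutes.
trace = [100 + 2 * i for i in range(30)]
per_h = persistence_rmse_by_horizon(trace)
print(per_h[5], per_h[60], average_over_horizons(per_h))  # 2.0 24.0 13.0
```

Error grows with the horizon because the forecast has to bridge a longer gap; averaging over horizons, as in the abstract, therefore mixes easy short-range and hard long-range predictions into one figure.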

