Harnessing the Predictive Power of Lower-division Statistics of Cricketers to Predict Their Rates of Success at the International Level

Author(s):  
Vishal C V

Abstract: Statistics has always been an integral part of the sporting world. Selectors pick players based on numerous factors such as averages, strike rates, runs scored or goals scored. Teams have dedicated 'talent hunters', who spend weeks, if not months, trying to uncover talent from different parts of the world. With the rise of the niche field of sports analytics, teams can now perform player evaluations on the large volumes of data that are available. This paper examines the factors that truly indicate the capacity of cricket players to perform at the topmost level: international cricket. Though this research was carried out on cricket data, it is hoped that similar methods can be used to hunt for true talent in other sports.

Keywords: Cricket Analytics, Random Forest, Principal Component Analysis, Dimensionality Reduction.

2019 ◽  
Vol 59 (6) ◽  
pp. 1190 ◽  
Author(s):  
A. Bahri ◽  
S. Nawar ◽  
H. Selmi ◽  
M. Amraoui ◽  
H. Rouissi ◽  
...  

Rapid optical measurement techniques have the advantage over traditional methods of being faster and non-destructive. In this work, visible and near-infrared spectroscopy (vis-NIRS) was used to investigate differences between measured values of key milk properties (e.g. fat, protein and lactose) in 30 samples of ewes' milk according to three feed systems: faba beans, field peas and a control diet. A mobile fibre-optic vis-NIR spectrophotometer (350–2500 nm) was used to collect reflectance spectra from the milk samples. Principal component analysis was used to explore differences between milk samples according to the feed supplied, and partial least-squares regression and random forest regression were adopted to develop calibration models for the prediction of milk properties. Results of the principal component analysis showed clear separation between the three groups of milk samples according to the diet of the ewes throughout the lactation period. Milk fat, protein and lactose were predicted with good accuracy by means of partial least-squares regression (R2 = 0.70–0.83 and ratio of prediction deviation, i.e. the ratio of the standard deviation to the root mean square error of prediction, of 1.85–2.44). However, the best prediction results were obtained with random forest regression models (R2 = 0.86–0.90; ratio of prediction deviation = 2.73–3.26). The adoption of vis-NIRS coupled with multivariate modelling tools can be recommended for exploring differences between milk samples under different feed systems and for predicting key milk properties, particularly with the random forest regression modelling technique.
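As a rough illustration of the modelling workflow described in this abstract (not the authors' code), the sketch below runs an exploratory PCA and compares partial least-squares and random forest calibrations on a generic spectra matrix; the `load_milk_spectra` loader, the PLS component count and the RPD helper are assumptions.

```python
# Sketch of a vis-NIRS calibration workflow with scikit-learn;
# data loading details are hypothetical.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

def rpd(y_true, y_pred):
    """Ratio of prediction deviation: SD of reference values / RMSEP."""
    rmsep = np.sqrt(mean_squared_error(y_true, y_pred))
    return np.std(y_true) / rmsep

# X: (n_samples, n_wavelengths) reflectance spectra, y: e.g. fat content
# (hypothetical loader standing in for the 30 ewe-milk samples).
X, y = load_milk_spectra()

# Exploratory PCA; the scores could be plotted coloured by feed system.
scores = PCA(n_components=2).fit_transform(X)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, model in [("PLSR", PLSRegression(n_components=10)),
                    ("RF", RandomForestRegressor(n_estimators=500, random_state=0))]:
    model.fit(X_tr, y_tr)
    pred = np.ravel(model.predict(X_te))
    print(name, "R2=%.2f" % r2_score(y_te, pred), "RPD=%.2f" % rpd(y_te, pred))
```

In practice, cross-validation and spectral preprocessing would matter, but the split above is enough to reproduce the style of R2/RPD comparison reported in the abstract.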


2021 ◽  
Author(s):  
Anwar Yahya Ebrahim ◽  
Hoshang Kolivand

Writer authentication from handwritten signatures is widely practiced throughout the world; a thorough check of the signature is important before reaching a conclusion about the signer. Arabic signatures have unique characteristics, including strokes and overlapping, which make high authentication accuracy more difficult to achieve. This work addresses that difficulty by selecting the best characteristics for Arabic signature authentication, characterized by the number of attributes representing each signature. The objective is to determine whether a given signature is genuine or a forgery. The proposed method uses the Discrete Cosine Transform (DCT) to extract features and Sparse Principal Component Analysis (SPCA) to select significant attributes for Arabic handwritten signature recognition, which aids the authentication step. Finally, a decision tree classifier is applied for signature authentication. The proposed DCT-with-SPCA method achieves good results on an Arabic signature dataset when verified against various techniques.
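A minimal sketch of such a DCT, Sparse PCA and decision tree pipeline is given below, assuming SciPy and scikit-learn; the `load_signature_dataset` loader, the 16×16 coefficient block and the 20 sparse components are illustrative choices, not the paper's settings.

```python
# Rough sketch of a DCT -> Sparse PCA -> decision tree signature pipeline;
# the signature images and labels are assumed to be provided elsewhere.
import numpy as np
from scipy.fft import dctn
from sklearn.decomposition import SparsePCA
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def dct_features(img, k=16):
    """Keep the top-left k x k block of 2-D DCT coefficients as features."""
    coeffs = dctn(img, norm="ortho")
    return coeffs[:k, :k].ravel()

# images: list of 2-D grayscale signature arrays; labels: 1 = genuine, 0 = forgery
# (hypothetical data; the paper's dataset is not reproduced here).
images, labels = load_signature_dataset()

X = np.array([dct_features(img) for img in images])

# Sparse PCA keeps a small number of sparse components, acting as attribute selection.
X_sparse = SparsePCA(n_components=20, random_state=0).fit_transform(X)

X_tr, X_te, y_tr, y_te = train_test_split(X_sparse, labels, test_size=0.3, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```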


2022 ◽  
pp. 146808742110707
Author(s):  
Aran Mohammad ◽  
Reza Rezaei ◽  
Christopher Hayduk ◽  
Thaddaeus Delebinski ◽  
Saeid Shahpouri ◽  
...  

The development of internal combustion engines is shaped by exhaust gas emissions legislation and the striving to increase performance. This demands engine-out emission models that can be used for engine optimization and real driving emission control. The prediction capability of physical and data-driven engine-out emission models is influenced by the system inputs, which are specified by the user; accuracy can improve as the number of inputs increases. At the same time, the occurrence of irrelevant inputs, which have a low functional relation to the emissions and can lead to overfitting, becomes more probable. Data-driven methods can be used to detect such irrelevant and redundant inputs. In this work, thermodynamic states are modeled based on 772 stationary test bench measurements from a commercial vehicle diesel engine. Afterward, 37 measured and modeled variables are fed into a data-driven dimensionality reduction. For this purpose, supervised learning approaches, such as lasso regression and the linear support vector machine, and unsupervised learning methods, such as principal component analysis and factor analysis, are applied to select and extract the relevant features. The selected and extracted features are used for regression by the support vector machine and a feedforward neural network to model the NOx, CO, HC, and soot emissions. This enables an evaluation of the modeling accuracy as a result of the dimensionality reduction. Using the methods in this work, the 37 variables are reduced to 25, 22, 11, and 16 inputs for NOx, CO, HC, and soot emission modeling, respectively, while maintaining accuracy. The features selected using the lasso algorithm provide more accurate learning of the regression models than the features extracted through principal component analysis and factor analysis. This results in test errors (RMSE_Te) of 19.22 ppm, 6.46 ppm, 1.29 ppm, and 0.06 FSN for the NOx, CO, HC, and soot emission models, respectively.
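One plausible realization of the lasso selection followed by support vector regression step is sketched below; the `load_engine_data` loader, the train/test split and the SVR hyperparameters are assumptions rather than the settings used in the paper.

```python
# Sketch of lasso-based input selection followed by support-vector regression,
# loosely following the workflow described above; data loading is hypothetical.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# X: (n_points, 37) measured and modelled engine variables, y: e.g. NOx in ppm
# (hypothetical arrays standing in for the 772 test bench points).
X, y = load_engine_data()

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Standardize, then let cross-validated lasso drive irrelevant inputs to zero weight.
scaler = StandardScaler().fit(X_tr)
lasso = LassoCV(cv=5).fit(scaler.transform(X_tr), y_tr)
selected = np.flatnonzero(lasso.coef_ != 0.0)
print("selected inputs:", selected.size, "of", X.shape[1])

# Train an SVR on the selected inputs only and report the test RMSE.
svr = make_pipeline(StandardScaler(), SVR(C=10.0, epsilon=0.1))
svr.fit(X_tr[:, selected], y_tr)
rmse_te = np.sqrt(mean_squared_error(y_te, svr.predict(X_te[:, selected])))
print("test RMSE:", rmse_te)
```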


2019 ◽  
Vol 8 (5) ◽  
pp. 136
Author(s):  
John Rennie Short ◽  
Justin Vélez-Hagan ◽  
Leah Dubots

There is now a wide variety of global indicators that measure different economic, political and social attributes of the world's countries. This paper seeks to answer two questions. First, what is the degree of overlap between these different measures? Are they, in fact, measuring the same underlying dimension? To answer this question, we apply a principal component analysis (PCA) to 15 indices across 145 countries. The results demonstrate that there is one underlying dimension that combines economic development and social progress with state stability. Second, how do countries score on this dimension? The results of the PCA allow us to produce categorical divisions of the world. The threefold division identifies a world composed of what we describe and map as rich, poor and middle countries. A five-group classification provides a more nuanced categorization described as: the very rich, free and stable; affluent and free; upper middle; lower middle; poor and not free.
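A minimal sketch of this kind of analysis is shown below: PCA on a standardized country-by-indicator matrix, followed by cutting the first-component scores into three and five groups; the `load_indicator_table` loader and the quantile-based grouping are assumptions, not the authors' exact procedure.

```python
# Sketch of a country-indicator PCA and grouping; the indicator table is hypothetical.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# df: rows = 145 countries, columns = 15 global indices
# (hypothetical table; the paper's exact indicators are not reproduced here).
df = load_indicator_table()

X = StandardScaler().fit_transform(df.values)
pca = PCA(n_components=3).fit(X)
print("explained variance ratio:", pca.explained_variance_ratio_)

# If one dominant component emerges, score countries on it and cut into groups.
# Note: the sign of a principal component is arbitrary, so the label order
# may need to be flipped after inspecting which end corresponds to "rich".
pc1 = pd.Series(pca.transform(X)[:, 0], index=df.index, name="pc1")
three_way = pd.qcut(pc1, 3, labels=["poor", "middle", "rich"])
five_way = pd.qcut(pc1, 5, labels=["poor", "lower middle", "upper middle",
                                   "affluent", "very rich"])
print(three_way.value_counts())
```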


Author(s):  
Ade Jamal ◽  
Annisa Handayani ◽  
Ali Akbar Septiandri ◽  
Endang Ripmiatin ◽  
Yunus Effendi

Breast cancer is one of the leading causes of death among women. Predicting breast cancer at an early stage provides a greater possibility of cure. This requires a prediction tool that can classify a breast tumor as either a harmful malignant tumor or a harmless benign tumor. In this paper, two machine learning algorithms, namely the Support Vector Machine and Extreme Gradient Boosting, are compared for classification. Prior to classification, the number of data attributes is reduced by extracting features from the raw data using Principal Component Analysis. A clustering method, K-Means, is also used for dimensionality reduction alongside Principal Component Analysis. This paper presents a comparison of four models, based on the two dimensionality reduction methods combined with the two classifiers, applied to the Wisconsin Breast Cancer Dataset. The comparison is measured using accuracy, sensitivity and specificity metrics computed from the confusion matrices. The experimental results indicate that K-Means, although not usually used for dimensionality reduction, can perform well compared to the popular Principal Component Analysis.
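A compact sketch of the four-model comparison is given below, using scikit-learn's copy of the Wisconsin Breast Cancer data; note that `GradientBoostingClassifier` stands in for Extreme Gradient Boosting to keep the example dependency-free, and that the 5-component/5-cluster settings are illustrative rather than the paper's.

```python
# Sketch of a {PCA, K-Means} x {SVM, boosting} comparison with
# confusion-matrix-derived accuracy, sensitivity and specificity.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.svm import SVC
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

reducers = {
    "PCA": PCA(n_components=5).fit(X_tr),
    # Distances to 5 cluster centres serve as a 5-dimensional representation.
    "KMeans": KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_tr),
}
classifiers = {"SVM": SVC(), "Boosting": GradientBoostingClassifier(random_state=0)}

for rname, red in reducers.items():
    Z_tr, Z_te = red.transform(X_tr), red.transform(X_te)
    for cname, clf in classifiers.items():
        y_pred = clf.fit(Z_tr, y_tr).predict(Z_te)
        tn, fp, fn, tp = confusion_matrix(y_te, y_pred).ravel()
        acc = (tp + tn) / (tp + tn + fp + fn)
        sens = tp / (tp + fn)   # recall on the positive class
        spec = tn / (tn + fp)   # specificity
        print(f"{rname}+{cname}: acc={acc:.3f} sens={sens:.3f} spec={spec:.3f}")
```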


2019 ◽  
Vol 11 (10) ◽  
pp. 1219 ◽  
Author(s):  
Lan Zhang ◽  
Hongjun Su ◽  
Jingwei Shen

Dimensionality reduction (DR) is an important preprocessing step in hyperspectral image applications. In this paper, a superpixelwise kernel principal component analysis (SuperKPCA) method for DR that performs kernel principal component analysis (KPCA) on each homogeneous region is proposed to fully utilize the KPCA’s ability to acquire nonlinear features. Moreover, for the proposed method, the differences in the DR results obtained based on different fundamental images (the first principal components obtained by principal component analysis (PCA), KPCA, and minimum noise fraction (MNF)) are compared. Extensive experiments show that when 5, 10, 20, and 30 samples from each class are selected, for the Indian Pines, Pavia University, and Salinas datasets: (1) when the most suitable fundamental image is selected, the classification accuracy obtained by SuperKPCA can be increased by 0.06%–0.74%, 3.88%–4.37%, and 0.39%–4.85%, respectively, when compared with SuperPCA, which performs PCA on each homogeneous region; (2) the DR results obtained based on different first principal components are different and complementary. By fusing the multiscale classification results obtained based on different first principal components, the classification accuracy can be increased by 0.54%–2.68%, 0.12%–1.10%, and 0.01%–0.08%, respectively, when compared with the method based only on the most suitable fundamental image.
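A simplified sketch of superpixelwise kernel PCA is given below: the first principal component image is segmented into superpixels with SLIC (recent scikit-image API), and kernel PCA is then run within each region. This is not the authors' implementation, and `load_hyperspectral_cube`, the segment count and the kernel settings are assumptions.

```python
# Simplified sketch of superpixelwise kernel PCA on a hyperspectral cube;
# the cube is assumed to be loaded elsewhere.
import numpy as np
from sklearn.decomposition import PCA, KernelPCA
from skimage.segmentation import slic

# cube: (rows, cols, bands) hyperspectral image, e.g. Indian Pines (hypothetical loader).
cube = load_hyperspectral_cube()
rows, cols, bands = cube.shape
pixels = cube.reshape(-1, bands)

# Fundamental image: first principal component of the whole scene, rescaled to [0, 1].
pc1 = PCA(n_components=1).fit_transform(pixels).reshape(rows, cols)
pc1 = (pc1 - pc1.min()) / (pc1.max() - pc1.min())

# Segment the fundamental image into homogeneous regions (superpixels).
segments = slic(pc1, n_segments=100, compactness=0.1, channel_axis=None)

# Run kernel PCA independently inside each superpixel; very small regions
# get fewer components and would need a fallback in a real implementation.
n_features = 10
reduced = np.zeros((rows * cols, n_features))
for label in np.unique(segments):
    mask = (segments.ravel() == label)
    region = pixels[mask]
    k = min(n_features, region.shape[0])
    reduced[mask, :k] = KernelPCA(n_components=k, kernel="rbf").fit_transform(region)

reduced_cube = reduced.reshape(rows, cols, n_features)
```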

