Harnessing the Predictive Power of Lower-division Statistics of Cricketers to Predict Their Rates of Success at the International Level

Author(s):  
Vishal C V

Abstract: Statistics has always been an integral part of the sporting world. Selectors pick players based on numerous factors such as averages, strike rates, runs scored or goals scored. Teams have dedicated 'talent hunters', who spend weeks, if not months, trying to uncover talent from different parts of the world. With the rise of the niche field of sports analytics, teams can now perform player evaluations on the large volumes of data that are available. This paper examines the factors that truly indicate the capacity of cricket players to perform at the topmost level: international cricket. Though this research was carried out on cricket data, it is hoped that similar methods can be used to hunt for true talent in other sports.

Keywords: Cricket Analytics, Random Forest, Principal Component Analysis, Dimensionality Reduction.

2019 ◽  
Vol 59 (6) ◽  
pp. 1190 ◽  
Author(s):  
A. Bahri ◽  
S. Nawar ◽  
H. Selmi ◽  
M. Amraoui ◽  
H. Rouissi ◽  
...  

Rapid optical measurement techniques have the advantage over traditional methods of being faster and non-destructive. In this work, visible and near-infrared spectroscopy (vis-NIRS) was used to investigate differences between measured values of key milk properties (e.g. fat, protein and lactose) in 30 samples of ewes' milk according to three feed systems: faba beans, field peas and a control diet. A mobile fibre-optic vis-NIR spectrophotometer (350–2500 nm) was used to collect reflectance spectra from the milk samples. Principal component analysis was used to explore differences between milk samples according to the feed supplied, and partial least-squares regression and random forest regression were adopted to develop calibration models for the prediction of milk properties. Results of the principal component analysis showed clear separation between the three groups of milk samples according to the diet of the ewes throughout the lactation period. Milk fat, protein and lactose were predicted with good accuracy by means of partial least-squares regression (R2 = 0.70–0.83 and ratio of prediction deviation, i.e. the ratio of the standard deviation to the root mean square error of prediction, of 1.85–2.44). However, the best prediction results were obtained with random forest regression models (R2 = 0.86–0.90; ratio of prediction deviation = 2.73–3.26). The adoption of vis-NIRS coupled with multivariate modelling tools can be recommended for exploring differences between milk samples under different feed systems and for predicting key milk properties, particularly with the random forest regression modelling technique.
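As a rough illustration of the modelling workflow described in this abstract (not the authors' code), the sketch below runs an exploratory PCA and compares partial least-squares and random forest calibrations on a generic spectra matrix; the `load_milk_spectra` loader, the PLS component count and the RPD helper are assumptions.

```python
# Sketch of a vis-NIRS calibration workflow with scikit-learn;
# data loading details are hypothetical.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

def rpd(y_true, y_pred):
    """Ratio of prediction deviation: SD of reference values / RMSEP."""
    rmsep = np.sqrt(mean_squared_error(y_true, y_pred))
    return np.std(y_true) / rmsep

# X: (n_samples, n_wavelengths) reflectance spectra, y: e.g. fat content
# (hypothetical loader standing in for the 30 ewe-milk samples).
X, y = load_milk_spectra()

# Exploratory PCA; the scores could be plotted coloured by feed system.
scores = PCA(n_components=2).fit_transform(X)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, model in [("PLSR", PLSRegression(n_components=10)),
                    ("RF", RandomForestRegressor(n_estimators=500, random_state=0))]:
    model.fit(X_tr, y_tr)
    pred = np.ravel(model.predict(X_te))
    print(name, "R2=%.2f" % r2_score(y_te, pred), "RPD=%.2f" % rpd(y_te, pred))
```

In practice, cross-validation and spectral preprocessing would matter, but the split above is enough to reproduce the style of R2/RPD comparison reported in the abstract.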


2021 ◽  
Author(s):  
Anwar Yahya Ebrahim ◽  
Hoshang Kolivand

Writer authentication from handwritten signatures is widely practiced throughout the world; a thorough check of the signature is important before reaching a conclusion about the signer. Arabic signatures have unique characteristics, including strokes and overlapping, which make high authentication accuracy more difficult to achieve. This work addresses that difficulty by selecting the best characteristics for Arabic signature authentication, characterized by the number of attributes representing each signature. The objective is to determine whether a given signature is genuine or a forgery. The proposed method uses the Discrete Cosine Transform (DCT) to extract features and Sparse Principal Component Analysis (SPCA) to select significant attributes for Arabic handwritten signature recognition, which aids the authentication step. Finally, a decision tree classifier is applied for signature authentication. The proposed DCT-with-SPCA method achieves good results on an Arabic signature dataset when verified against various techniques.
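A minimal sketch of such a DCT, Sparse PCA and decision tree pipeline is given below, assuming SciPy and scikit-learn; the `load_signature_dataset` loader, the 16×16 coefficient block and the 20 sparse components are illustrative choices, not the paper's settings.

```python
# Rough sketch of a DCT -> Sparse PCA -> decision tree signature pipeline;
# the signature images and labels are assumed to be provided elsewhere.
import numpy as np
from scipy.fft import dctn
from sklearn.decomposition import SparsePCA
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def dct_features(img, k=16):
    """Keep the top-left k x k block of 2-D DCT coefficients as features."""
    coeffs = dctn(img, norm="ortho")
    return coeffs[:k, :k].ravel()

# images: list of 2-D grayscale signature arrays; labels: 1 = genuine, 0 = forgery
# (hypothetical data; the paper's dataset is not reproduced here).
images, labels = load_signature_dataset()

X = np.array([dct_features(img) for img in images])

# Sparse PCA keeps a small number of sparse components, acting as attribute selection.
X_sparse = SparsePCA(n_components=20, random_state=0).fit_transform(X)

X_tr, X_te, y_tr, y_te = train_test_split(X_sparse, labels, test_size=0.3, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```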


2022 ◽  
pp. 146808742110707
Author(s):  
Aran Mohammad ◽  
Reza Rezaei ◽  
Christopher Hayduk ◽  
Thaddaeus Delebinski ◽  
Saeid Shahpouri ◽  
...  

The development of internal combustion engines is shaped by exhaust gas emissions legislation and the striving to increase performance. This demands engine-out emission models that can be used for engine optimization and real driving emission control. The prediction capability of physical and data-driven engine-out emission models is influenced by the system inputs, which are specified by the user; accuracy can improve as the number of inputs increases. At the same time, the occurrence of irrelevant inputs, which have a low functional relation to the emissions and can lead to overfitting, becomes more probable. Data-driven methods can be used to detect such irrelevant and redundant inputs. In this work, thermodynamic states are modeled based on 772 stationary test bench measurements from a commercial vehicle diesel engine. Afterward, 37 measured and modeled variables are fed into a data-driven dimensionality reduction. For this purpose, supervised learning approaches, such as lasso regression and the linear support vector machine, and unsupervised learning methods, such as principal component analysis and factor analysis, are applied to select and extract the relevant features. The selected and extracted features are used for regression by the support vector machine and a feedforward neural network to model the NOx, CO, HC, and soot emissions. This enables an evaluation of the modeling accuracy as a result of the dimensionality reduction. Using the methods in this work, the 37 variables are reduced to 25, 22, 11, and 16 inputs for NOx, CO, HC, and soot emission modeling, respectively, while maintaining accuracy. The features selected using the lasso algorithm provide more accurate learning of the regression models than the features extracted through principal component analysis and factor analysis. This results in test errors (RMSE_Te) of 19.22 ppm, 6.46 ppm, 1.29 ppm, and 0.06 FSN for the NOx, CO, HC, and soot emission models, respectively.
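One plausible realization of the lasso selection followed by support vector regression step is sketched below; the `load_engine_data` loader, the train/test split and the SVR hyperparameters are assumptions rather than the settings used in the paper.

```python
# Sketch of lasso-based input selection followed by support-vector regression,
# loosely following the workflow described above; data loading is hypothetical.
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# X: (n_points, 37) measured and modelled engine variables, y: e.g. NOx in ppm
# (hypothetical arrays standing in for the 772 test bench points).
X, y = load_engine_data()

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Standardize, then let cross-validated lasso drive irrelevant inputs to zero weight.
scaler = StandardScaler().fit(X_tr)
lasso = LassoCV(cv=5).fit(scaler.transform(X_tr), y_tr)
selected = np.flatnonzero(lasso.coef_ != 0.0)
print("selected inputs:", selected.size, "of", X.shape[1])

# Train an SVR on the selected inputs only and report the test RMSE.
svr = make_pipeline(StandardScaler(), SVR(C=10.0, epsilon=0.1))
svr.fit(X_tr[:, selected], y_tr)
rmse_te = np.sqrt(mean_squared_error(y_te, svr.predict(X_te[:, selected])))
print("test RMSE:", rmse_te)
```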


2019 ◽  
Vol 8 (5) ◽  
pp. 136
Author(s):  
John Rennie Short ◽  
Justin Vélez-Hagan ◽  
Leah Dubots

There is now a wide variety of global indicators that measure different economic, political and social attributes of the world's countries. This paper seeks to answer two questions. First, what is the degree of overlap between these different measures? Are they, in fact, measuring the same underlying dimension? To answer this question, we apply a principal component analysis (PCA) to 15 indices across 145 countries. The results demonstrate that there is one underlying dimension that combines economic development and social progress with state stability. Second, how do countries score on this dimension? The results of the PCA allow us to produce categorical divisions of the world. The threefold division identifies a world composed of what we describe and map as rich, poor and middle countries. A five-group classification provides a more nuanced categorization described as: the very rich, free and stable; affluent and free; upper middle; lower middle; poor and not free.
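A minimal sketch of this kind of analysis is shown below: PCA on a standardized country-by-indicator matrix, followed by cutting the first-component scores into three and five groups; the `load_indicator_table` loader and the quantile-based grouping are assumptions, not the authors' exact procedure.

```python
# Sketch of a country-indicator PCA and grouping; the indicator table is hypothetical.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# df: rows = 145 countries, columns = 15 global indices
# (hypothetical table; the paper's exact indicators are not reproduced here).
df = load_indicator_table()

X = StandardScaler().fit_transform(df.values)
pca = PCA(n_components=3).fit(X)
print("explained variance ratio:", pca.explained_variance_ratio_)

# If one dominant component emerges, score countries on it and cut into groups.
# Note: the sign of a principal component is arbitrary, so the label order
# may need to be flipped after inspecting which end corresponds to "rich".
pc1 = pd.Series(pca.transform(X)[:, 0], index=df.index, name="pc1")
three_way = pd.qcut(pc1, 3, labels=["poor", "middle", "rich"])
five_way = pd.qcut(pc1, 5, labels=["poor", "lower middle", "upper middle",
                                   "affluent", "very rich"])
print(three_way.value_counts())
```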


Author(s):  
Ade Jamal ◽  
Annisa Handayani ◽  
Ali Akbar Septiandri ◽  
Endang Ripmiatin ◽  
Yunus Effendi

Breast cancer is one of the leading causes of death among women. Predicting breast cancer at an early stage provides a greater possibility of cure. This requires a prediction tool that can classify a breast tumor as either a harmful malignant tumor or a harmless benign tumor. In this paper, two machine learning algorithms, namely the Support Vector Machine and Extreme Gradient Boosting, are compared for classification. Prior to classification, the number of data attributes is reduced by extracting features from the raw data using Principal Component Analysis. A clustering method, K-Means, is also used for dimensionality reduction alongside Principal Component Analysis. This paper presents a comparison of four models, based on the two dimensionality reduction methods combined with the two classifiers, applied to the Wisconsin Breast Cancer Dataset. The comparison is measured using accuracy, sensitivity and specificity metrics computed from the confusion matrices. The experimental results indicate that K-Means, although not usually used for dimensionality reduction, can perform well compared to the popular Principal Component Analysis.
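A compact sketch of the four-model comparison is given below, using scikit-learn's copy of the Wisconsin Breast Cancer data; note that `GradientBoostingClassifier` stands in for Extreme Gradient Boosting to keep the example dependency-free, and that the 5-component/5-cluster settings are illustrative rather than the paper's.

```python
# Sketch of a {PCA, K-Means} x {SVM, boosting} comparison with
# confusion-matrix-derived accuracy, sensitivity and specificity.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.svm import SVC
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

reducers = {
    "PCA": PCA(n_components=5).fit(X_tr),
    # Distances to 5 cluster centres serve as a 5-dimensional representation.
    "KMeans": KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_tr),
}
classifiers = {"SVM": SVC(), "Boosting": GradientBoostingClassifier(random_state=0)}

for rname, red in reducers.items():
    Z_tr, Z_te = red.transform(X_tr), red.transform(X_te)
    for cname, clf in classifiers.items():
        y_pred = clf.fit(Z_tr, y_tr).predict(Z_te)
        tn, fp, fn, tp = confusion_matrix(y_te, y_pred).ravel()
        acc = (tp + tn) / (tp + tn + fp + fn)
        sens = tp / (tp + fn)   # recall on the positive class
        spec = tn / (tn + fp)   # specificity
        print(f"{rname}+{cname}: acc={acc:.3f} sens={sens:.3f} spec={spec:.3f}")
```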


2019 ◽  
Vol 11 (10) ◽  
pp. 1219 ◽  
Author(s):  
Lan Zhang ◽  
Hongjun Su ◽  
Jingwei Shen

Dimensionality reduction (DR) is an important preprocessing step in hyperspectral image applications. In this paper, a superpixelwise kernel principal component analysis (SuperKPCA) method for DR that performs kernel principal component analysis (KPCA) on each homogeneous region is proposed to fully utilize the KPCA’s ability to acquire nonlinear features. Moreover, for the proposed method, the differences in the DR results obtained based on different fundamental images (the first principal components obtained by principal component analysis (PCA), KPCA, and minimum noise fraction (MNF)) are compared. Extensive experiments show that when 5, 10, 20, and 30 samples from each class are selected, for the Indian Pines, Pavia University, and Salinas datasets: (1) when the most suitable fundamental image is selected, the classification accuracy obtained by SuperKPCA can be increased by 0.06%–0.74%, 3.88%–4.37%, and 0.39%–4.85%, respectively, when compared with SuperPCA, which performs PCA on each homogeneous region; (2) the DR results obtained based on different first principal components are different and complementary. By fusing the multiscale classification results obtained based on different first principal components, the classification accuracy can be increased by 0.54%–2.68%, 0.12%–1.10%, and 0.01%–0.08%, respectively, when compared with the method based only on the most suitable fundamental image.
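A simplified sketch of superpixelwise kernel PCA is given below: the first principal component image is segmented into superpixels with SLIC (recent scikit-image API), and kernel PCA is then run within each region. This is not the authors' implementation, and `load_hyperspectral_cube`, the segment count and the kernel settings are assumptions.

```python
# Simplified sketch of superpixelwise kernel PCA on a hyperspectral cube;
# the cube is assumed to be loaded elsewhere.
import numpy as np
from sklearn.decomposition import PCA, KernelPCA
from skimage.segmentation import slic

# cube: (rows, cols, bands) hyperspectral image, e.g. Indian Pines (hypothetical loader).
cube = load_hyperspectral_cube()
rows, cols, bands = cube.shape
pixels = cube.reshape(-1, bands)

# Fundamental image: first principal component of the whole scene, rescaled to [0, 1].
pc1 = PCA(n_components=1).fit_transform(pixels).reshape(rows, cols)
pc1 = (pc1 - pc1.min()) / (pc1.max() - pc1.min())

# Segment the fundamental image into homogeneous regions (superpixels).
segments = slic(pc1, n_segments=100, compactness=0.1, channel_axis=None)

# Run kernel PCA independently inside each superpixel; very small regions
# get fewer components and would need a fallback in a real implementation.
n_features = 10
reduced = np.zeros((rows * cols, n_features))
for label in np.unique(segments):
    mask = (segments.ravel() == label)
    region = pixels[mask]
    k = min(n_features, region.shape[0])
    reduced[mask, :k] = KernelPCA(n_components=k, kernel="rbf").fit_transform(region)

reduced_cube = reduced.reshape(rows, cols, n_features)
```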

