Development of a Comprehensive Multi-Factor Method for Comparing Batting Performances in One-Day International Cricket

In cricket, one-day-international (ODI) batsmen have traditionally been compared on the dimensions of batting average (BA) and strike rate (SR). The conventional method of computing BA assumes that runs scored by a batsman follow an exponential or geometric distribution. This results in unreasonably equating batting inconsistency with batting mean. Our study shows that a Weibull distribution model gives a very sensible assessment of a batsman’s inconsistency, independent of his BA. It also provides a superior fit to batting scores of ODI batsmen. We also introduce a measure for ‘quality-runs’ scored by a batsman which takes into account the difficulty level of opposition. Additionally, a longevity index and an opposition diversity index are defined to make comparisons more holistic. A substantial amount of data engineering effort is made in segregating available data into home, away and neutral matches. The measures proposed in this paper are more comprehensive and granular than those found in the literature. Various combinations of these six criteria are used to rank a select group of great ODI batsmen by assigning objective weights derived from principal component analysis. Finally, a multivariate statistical outlier detection procedure produces different lists of outstanding players corresponding to different combinations of criteria. Our proposed methodology may be gainfully used by team management to select the best batsmen for a given situation.
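
The point about inconsistency being tied to the mean can be illustrated with a small sketch. It is not the authors' fitting procedure; it only uses the textbook moments of the Weibull distribution to show that the coefficient of variation (standard deviation divided by mean) equals 1 in the exponential case (shape 1), whatever the mean, and otherwise depends on the shape parameter alone. The function names and the example shape values are assumptions chosen for illustration.

```python
import math


def weibull_mean(shape: float, scale: float) -> float:
    """Mean of a Weibull distribution: scale * Gamma(1 + 1/shape)."""
    return scale * math.gamma(1.0 + 1.0 / shape)


def weibull_cv(shape: float) -> float:
    """Coefficient of variation (std / mean); it does not depend on the scale."""
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    return math.sqrt(g2 / (g1 * g1) - 1.0)


# shape == 1.0 is the exponential special case, where std always equals the mean.
# The other shape values are hypothetical, not estimates from any batsman's data.
for shape in (0.7, 1.0, 1.5):
    print(f"shape={shape}: CV={weibull_cv(shape):.3f}")
```

Under these assumptions, two batsmen with the same average but different shape parameters would show different relative spreads, which an exponential model cannot express.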

Statistical Analysis to Evaluate Heavy Metal Pollution in the Air Obtained by Moss Technique in Hanoi and its Surrounding Region

Communications in Physics ◽

10.15625/0868-3166/29/3si/14336 ◽

2019 ◽

Vol 29 (3SI) ◽

pp. 411

Author(s):

N. H. Quyet ◽

Le Hong Khiem ◽

V. D. Quan ◽

T. T. T. My ◽

M. V. Frontasieva ◽

...

Keyword(s):

Heavy Metal ◽

Statistical Analysis ◽

Metal Pollution ◽

Heavy Metal Pollution ◽

Principal Component ◽

Multivariate Statistical ◽

Five Factors ◽

Heavy Metal Elements ◽

Surrounding Areas ◽

Potential Pollution

The aim of this paper was to apply statistical analysis, including principal component analysis, to evaluate heavy metal air pollution measured by the moss technique in Ha Noi and its surrounding areas, and to identify potential pollution sources. The concentrations of 33 heavy metal elements in 27 samples of Barbula indica moss, collected in the investigated region in December 2016, were examined using multivariate statistical analysis. Five factors explaining 80% of the total variance were identified, and their potential sources are discussed.
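
As a rough illustration of what "factors explaining 80% of the total variance" means, the sketch below counts how many principal components of a correlation matrix are needed to reach a cumulative variance threshold. It runs on synthetic data, not the moss measurements; the function name, the six-variable layout and the noise level are assumptions made for the example.

```python
import numpy as np


def components_for_variance(data: np.ndarray, threshold: float):
    """Return (k, cumulative), where k is the smallest number of principal
    components whose cumulative explained-variance ratio reaches threshold."""
    # Standardise each column so the analysis uses the correlation matrix.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = (z.T @ z) / (len(z) - 1)
    # eigvalsh returns ascending eigenvalues; reverse to get the largest first.
    eigvals = np.clip(np.linalg.eigvalsh(corr)[::-1], 0.0, None)
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(eigvals)), cumulative


# Synthetic example: 27 samples of 6 variables driven by 2 hidden factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(27, 2))
loadings = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
data = latent @ loadings + 0.1 * rng.normal(size=(27, 6))

k, cumulative = components_for_variance(data, 0.80)
print("components needed:", k)
print("cumulative ratios:", np.round(cumulative, 3))
```

Interpreting the retained components as pollution sources is a separate, domain-specific step that this sketch does not attempt.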

It is better an approximate answer to the right question than the exact answer to the wrong question : the case of the psychometric analysis of the ASQ:SE

10.31234/osf.io/a5tdf ◽

2020 ◽

Author(s):

Luis Anunciacao ◽

Janet Squires ◽

J. Landeira-Fernandez

Keyword(s):

Internal Structure ◽

Statistical Methods ◽

Principal Component ◽

Psychological Theory ◽

Published Data ◽

Multivariate Statistical ◽

Exact Answer ◽

Wide Range ◽

Ages And Stages Questionnaire ◽

The Right

One of the main activities in psychometrics is to analyze the internal structure of a test. Multivariate statistical methods, including Exploratory Factor Analysis (EFA) and Principal Component Analysis (PCA), are frequently used to do this, but the growth of Network Analysis (NA) makes it a promising candidate as well. The results obtained by these methods are valuable, as they not only provide evidence on whether the test measures its intended construct but also bear on the substantive theory that motivated the test's development. However, these different statistical methods come up with different answers, providing the basis for different analytical and theoretical strategies when one needs to choose a solution. In this study, we took advantage of a large volume of published data (n = 22,331) obtained by the Ages and Stages Questionnaire Social-Emotional (ASQ:SE), and formed a subset of 500 children to present and discuss alternative psychometric solutions to its internal structure, and also to its underlying theory. The analyses were based on a polychoric matrix, the number of factors to retain followed several well-known rules of thumb, and a wide range of exploratory methods was fitted to the data, including EFA, PCA, and NA. The statistical outcomes were divergent, varying from 1 to 6 domains, allowing a flexible interpretation of the results. We argue that the use of statistical methods in the absence of a well-grounded psychological theory has limited applications, despite its appeal. All data and code are available at https://osf.io/z6gwv/.

Characterization of the Volatile Profile of Cultivated and Wild-Type Italian Celery (Apium graveolens L.) Varieties by HS-SPME/GC-MS

Applied Sciences ◽

10.3390/app11135855 ◽

2021 ◽

Vol 11 (13) ◽

pp. 5855

Author(s):

Samantha Reale ◽

Valter Di Cecco ◽

Francesca Di Donato ◽

Luciano Di Martino ◽

Aurelio Manzi ◽

...

Keyword(s):

Solid Phase ◽

Principal Component ◽

Apium Graveolens ◽

Gas Chromatography Mass Spectrometry ◽

Bioactive Metabolites ◽

Chemical Components ◽

Volatile Profile ◽

Wild Type ◽

Multivariate Statistical ◽

Clear Differentiation

Celery (Apium graveolens L.) is a vegetable belonging to the Apiaceae family that is widely used for its distinct flavor and contains a variety of bioactive metabolites with healthy properties. Some celery ecotypes cultivated in specific territories of Italy have recently attracted the attention of consumers and scientists because of their peculiar sensorial and nutritional properties. In this work, the volatile profiles of white celery “Sedano Bianco di Sperlonga” Protected Geographical Indication (PGI) ecotype, black celery “Sedano Nero di Torricella Peligna” and wild-type celery were investigated using head-space solid-phase microextraction combined with gas-chromatography/mass spectrometry (HS-SPME/GC-MS) and compared to that of the common ribbed celery. Exploratory multivariate statistical analyses were conducted using principal component analysis (PCA) on HS-SPME/GC-MS patterns, separately collected from celery leaves and petioles, to assess similarity/dissimilarity in the flavor composition of the investigated varieties. PCA revealed a clear differentiation of wild-type celery from the cultivated varieties. Among the cultivated varieties, black celery “Sedano Nero di Torricella Peligna” exhibited a significantly different composition in volatile profile in both leaves and petioles compared to the white celery and the prevalent commercial variety. The chemical components of aroma, potentially useful for the classification of celery according to the variety/origin, were identified.

Chemical Fingerprinting of Cryptic Species and Genetic Lineages of Aneura pinguis (L.) Dumort. (Marchantiophyta, Metzgeriidae)

Molecules ◽

10.3390/molecules26041180 ◽

2021 ◽

Vol 26 (4) ◽

pp. 1180

Author(s):

Rafał Wawrzyniak ◽

Wiesław Wasiak ◽

Beata Jasiewicz ◽

Alina Bączkiewicz ◽

Katarzyna Buczkowska

Keyword(s):

Cluster Analysis ◽

Cryptic Species ◽

Plant Material ◽

Chemical Constituents ◽

Principal Component ◽

Molecular Data ◽

Rapid Identification ◽

Multivariate Statistical ◽

Genetic Lineages ◽

Marker Compounds

Aneura pinguis (L.) Dumort. is a representative of the simple thalloid liverworts, one of the three main types of liverwort gametophytes. According to classical taxonomy, A. pinguis represents one morphologically variable species; however, genetic data reveal that this species is a complex consisting of 10 cryptic species (named by letters from A to J), of which four are further subdivided into two or three evolutionary lineages. The objective of this work was to develop an efficient method for the characterisation of plant material using marker compounds. The volatile chemical constituents of cryptic species within the liverwort A. pinguis were analysed by GC-MS. The compounds were isolated from plant material using the HS-SPME technique. Of the 66 compounds examined, 40 were identified. Of these 40 compounds, nine were selected for use as marker compounds of individual cryptic species of A. pinguis. A guide was then developed that clarified how these markers could be used for the rapid identification of the genetic lineages of A. pinguis. Multivariate statistical analyses (principal component and cluster analysis) revealed that the chemical compounds in A. pinguis made it possible to distinguish individual cryptic species (including genetic lineages), with the exception of cryptic species G and H. The classification of samples based on the volatile compounds by cluster analysis reflected phylogenetic relationships between cryptic species and genetic lineages of A. pinguis revealed based on molecular data.

Multivariate Analysis as a Tool for Quantification of Conformational Transitions in DNA Thin Films

Applied Sciences ◽

10.3390/app11135895 ◽

2021 ◽

Vol 11 (13) ◽

pp. 5895

Author(s):

Kristina Serec ◽

Sanja Dolanski Babić

Keyword(s):

Thin Films ◽

Learning Algorithm ◽

Principal Component Regression ◽

Principal Component ◽

Conformational Transitions ◽

Cancer Diagnostics ◽

Dna Conformation ◽

Support Vector ◽

Multivariate Statistical ◽

The Impact

The double-stranded B-form and A-form have long been considered the two most important native forms of DNA, each with its own distinct biological roles and hence the focus of many areas of study, from cellular functions to cancer diagnostics and drug treatment. Due to the heterogeneity and sensitivity of the secondary structure of DNA, there is a need for tools capable of a rapid and reliable quantification of DNA conformation in diverse environments. In this work, the second paper in the series that addresses conformational transitions in DNA thin films utilizing FTIR spectroscopy, we exploit popular chemometric methods: the principal component analysis (PCA), support vector machine (SVM) learning algorithm, and principal component regression (PCR), in order to quantify and categorize DNA conformation in thin films of different hydrated states. By complementing FTIR technique with multivariate statistical methods, we demonstrate the ability of our sample preparation and automated spectral analysis protocol to rapidly and efficiently determine conformation in DNA thin films based on the vibrational signatures in the 1800–935 cm−1 range. Furthermore, we assess the impact of small hydration-related changes in FTIR spectra on automated DNA conformation detection and how to avoid discrepancies by careful sampling.

STATISTICAL MULTIVARIATE ANALYSIS APPLIED TO ENVIRONMENTAL CHARACTERIZATION OF SOIL IN SEMIARID REGION

Revista Caatinga ◽

10.1590/1983-21252019v32n120rc ◽

2019 ◽

Vol 32 (1) ◽

pp. 200-210

Author(s):

Antônio Italcy de Oliveira Júnior ◽

Luiz Alberto Ribeiro Mendonça ◽

Sávio de Brito Fontenele ◽

Adriana Oliveira Araújo ◽

Maria Gorethe de Sousa Lima Brito

Keyword(s):

Organic Matter Content ◽

Principal Component ◽

Matter Content ◽

Multivariate Statistical ◽

Statistical Tool ◽

Environmental Characterization ◽

Hierarchical Grouping ◽

Two Factors ◽

Total Data

Soil is a dynamic and complex system that requires a considerable number of samples for analysis and research purposes. Multivariate statistical methods make the analysis of such samples more tractable by reducing and simplifying the structure of the data. The objective of this study was to use multivariate statistical analysis, including factorial analysis (FA) and hierarchical groupings, for the environmental characterization of soils in semiarid regions, considering anthropic (land use and occupation) and topographic aspects (altitude, moisture, granulometry, PR, and organic-matter content). As a case study, the São José Hydrographic Microbasin, which is located in the Cariri region of Ceará, was considered. An FA was performed using the principal component method, with normalized varimax rotation. In hierarchical grouping analysis, the “farthest neighbor” method was used as the hierarchical criterion for grouping, with the measure of dissimilarity given by the “square Euclidean distance.” The FA indicated that two factors explain 75.76% of the total data variance. In the analysis of hierarchical groupings, the samples were agglomerated in three groups with similar characteristics: one with samples collected in an area of preserved forest and two with samples collected in areas with more anthropized soils. This indicates that the statistical tool was sensitive enough to distinguish the most conserved soils from soils with different levels of anthropization.
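
The grouping rule described above, "farthest neighbor" (complete linkage) with squared Euclidean distance, can be sketched in a few lines of plain Python. This is a minimal illustration on made-up two-variable points, not the study's soil data or software; the function names and sample values are hypothetical.

```python
from itertools import combinations


def sq_euclidean(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def complete_linkage(points, n_clusters):
    """Agglomerative clustering with the farthest-neighbour criterion:
    the distance between two clusters is their largest pairwise distance,
    and the two clusters with the smallest such distance are merged."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = max(sq_euclidean(points[p], points[q])
                    for p in clusters[i] for q in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]                          # safe because i < j
    return [sorted(c) for c in clusters]


# Hypothetical standardised values for six samples (two variables each).
samples = [(0.1, 0.2), (0.0, 0.3), (2.0, 2.1), (2.2, 1.9), (4.0, 0.1), (4.1, 0.0)]
groups = complete_linkage(samples, n_clusters=3)
print(groups)
```

Stopping at three clusters mirrors the three groups reported in the abstract; in practice the cut level would normally be read from a dendrogram rather than fixed in advance.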

Continuum Power CCA: A Unified Approach for Isolating Coupled Modes

Journal of Climate ◽

10.1175/jcli-d-14-00451.1 ◽

2015 ◽

Vol 28 (3) ◽

pp. 1016-1030 ◽

Cited By ~ 2

Author(s):

Erik Swenson

Keyword(s):

Signal To Noise Ratio ◽

Full Range ◽

Synthetic Data ◽

Principal Component Regression ◽

Principal Component ◽

Accurate Estimate ◽

Unified Approach ◽

Coupled Modes ◽

Multivariate Statistical ◽

Sample Covariance

Various multivariate statistical methods exist for analyzing covariance and isolating linear relationships between datasets. The most popular linear methods are based on singular value decomposition (SVD) and include canonical correlation analysis (CCA), maximum covariance analysis (MCA), and redundancy analysis (RDA). In this study, continuum power CCA (CPCCA) is introduced as one extension of continuum power regression for isolating pairs of coupled patterns whose temporal variation maximizes the squared covariance between partially whitened variables. Similar to the whitening transformation, the partial whitening transformation acts to decorrelate individual variables but only to a partial degree with the added benefit of preconditioning sample covariance matrices prior to inversion, providing a more accurate estimate of the population covariance. CPCCA is a unified approach in the sense that the full range of solutions bridges CCA, MCA, RDA, and principal component regression (PCR). Recommended CPCCA solutions include a regularization for CCA, a variance bias correction for MCA, and a regularization for RDA. Applied to synthetic data samples, such solutions yield relatively higher skill in isolating known coupled modes embedded in noise. Provided with some crude prior expectation of the signal-to-noise ratio, the use of asymmetric CPCCA solutions may be justifiable and beneficial. An objective parameter choice is offered for regularization with CPCCA based on the covariance estimate of O. Ledoit and M. Wolf, and the results are quite robust. CPCCA is encouraged for a range of applications.
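
To make the idea of partial whitening concrete, here is a hedged sketch that is not taken from the paper. It multiplies centred data by a fractional power of the sample covariance, so that one end of the range whitens the variables fully and the other leaves them unchanged. The parameter name `alpha` and the exponent convention `-(1 - alpha) / 2` are assumptions made for this illustration and may not match the paper's notation.

```python
import numpy as np


def partial_whiten(x: np.ndarray, alpha: float) -> np.ndarray:
    """Centre x and multiply by C**(-(1 - alpha) / 2), where C is the sample
    covariance. alpha = 0 whitens fully; alpha = 1 only centres the data.
    Assumes C is full rank (more samples than variables)."""
    xc = x - x.mean(axis=0)
    cov = xc.T @ xc / (len(xc) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    power = -(1.0 - alpha) / 2.0
    # Symmetric matrix power via the eigendecomposition: V diag(l**p) V^T.
    w = (eigvecs * eigvals ** power) @ eigvecs.T
    return xc @ w


# Synthetic correlated data: 200 samples of 3 variables.
rng = np.random.default_rng(1)
mixing = np.array([[1.0, 0.5, 0.0],
                   [0.0, 1.0, 0.5],
                   [0.0, 0.0, 1.0]])
x = rng.normal(size=(200, 3)) @ mixing

for alpha in (0.0, 0.5, 1.0):
    z = partial_whiten(x, alpha)
    print(f"alpha={alpha}:")
    print(np.round(np.cov(z, rowvar=False), 3))
```

With this convention the covariance of the transformed data is the original covariance raised to the power `alpha`, so intermediate values shrink correlations without removing them.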

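Application of Kansei Engineering in Comparing Mobile Marketplace Application Designs in Indonesia (Penerapan Kansei Engineering dalam Perbandingan Desain Aplikasi Mobile Marketplace di Indonesia)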

Jurnal Teknik Informatika dan Sistem Informasi ◽

10.28932/jutisi.v6i2.2705 ◽

2020 ◽

Vol 6 (2) ◽

Author(s):

Nucky Vilano ◽

Setia Budi

Keyword(s):

Principal Component Analysis ◽

Factor Analysis ◽

Mobile Application ◽

Principal Component ◽

Kansei Engineering ◽

Multivariate Statistical ◽

Christian University ◽

Emotional Factors ◽

Coefficient Correlation ◽

Major Factors

A company's application design is very important because it conveys the company's image and helps attract more users to the application. This research applies the Kansei Engineering method to analyze users' emotions or feelings towards the design of a mobile application interface. Six Kansei words and three specimens are used in this research, with the Kansei words selected from words related to user experience. The participants were 54 students from Maranatha Christian University. Participants’ responses are studied using multivariate statistical analysis (e.g., correlation coefficient analysis, principal component analysis, and factor analysis). This study explores the emotional factors that arise in designing an application. The analysis shows that a few major factors greatly influence the design of a mobile application interface.

Evaluation of the anthropogenic impact in Suat Ugurlu Dam lake using multivariate statistical techniques

Global NEST Journal ◽

10.30955/gnj.002387 ◽

2018 ◽

Vol 20 (1) ◽

pp. 161-168 ◽

Cited By ~ 1

Keyword(s):

Anthropogenic Impact ◽

Sediment Quality ◽

Principal Component ◽

Statistical Techniques ◽

Multivariate Statistical Techniques ◽

Cluster Number ◽

Multivariate Statistical ◽

Oxidation Reduction ◽

Oxidation Reduction Potential ◽

And Cluster Analysis

Sediments play an important role in the quality of aquatic ecosystems in the Dam Lake, where they can be either a sink or a source of contaminants, depending on the management. The purpose of this study is to assess the sediment quality in order to find the causes of the malodor and the eutrophication affecting the lake. Solutions for improving the dam are proposed. Multivariate statistical techniques, such as principal component analysis (PCA) and cluster analysis (CA), were applied to the data on sediment quality in relation to anthropogenic impact in Suat Ugurlu Dam Lake. The data were generated during 2014-2015, with monitoring at four sites for 11 parameters. Total variances of 84.1%, 74.3%, 87.4% and 91.5% suggest 4, 3, 3 and 4 principal components (PCs) at the four locations LC1, LC2, LC3 and LC4, respectively. CA was also applied to both the variables and the observations. Some variables and observations showed a high similarity based on the results of variables in the CA. Also, the similarity ratio of temperature-mercury (Hg) and oxidation reduction potential (ORP) was high and, in general, the cluster number of variables was 5, according to the selected similarity level.

Quality Assessment of Goldenrod, Milkweed and Multifloral Honeys Based on Botanical Origin, Antioxidant Capacity and Mineral Content

International Journal of Molecular Sciences ◽

10.3390/ijms23020769 ◽

2022 ◽

Vol 23 (2) ◽

pp. 769

Author(s):

Marianna Kocsis ◽

Alexandra Bodó ◽

Tamás Kőszegi ◽

Rita Csepregi ◽

Rita Filep ◽

...

Keyword(s):

Antioxidant Capacity ◽

Mineral Content ◽

Principal Component ◽

Pollen Spectrum ◽

Botanical Origin ◽

Multivariate Statistical ◽

Zn Content ◽

Five Elements ◽

Multifloral Honey ◽

Melissopalynological Analysis

The goal of the study was to evaluate the pollen spectrum, antioxidant capacity and mineral content of four Hungarian honey types, using multivariate statistical analysis. The light-colored honeys were represented by milkweed honey and a multifloral (MF) honey with dominant pollen frequency of linden (MF-Tilia); the darker ones were goldenrod honey and a multifloral honey with Lamiaceae pollen majority (MF-Lamiaceae). The pollen spectrum of the samples was established with melissopalynological analysis. The absorbance of the honeys correlated positively with the antioxidant capacity determined with three of the methods used (TRC, TEAC, DPPH), but not with ORAC. The latter method also correlated negatively with the other antioxidant methods and with most of the mineral values. MF-Tilia had a high ORAC value and high K and Na content. The MF-Lamiaceae had the highest K, Mg, P, S, Cu and Zn content, the last five elements showing strict correlation with the TRC method. The darker goldenrod honey had higher SET values and total mineral content than the milkweed honey. The above character-sets facilitate identification of each honey type and serve as indicators of variety. The antioxidant levels and mineral content of honeys allowed their clear separation by principal component analysis (PCA).
