Some Results from Classical Statistics

Author(s):  
Jon Wakefield
Author(s):  
Zaigham Tahir ◽  
Hina Khan ◽  
Muhammad Aslam ◽  
Javid Shabbir ◽  
Yasar Mahmood ◽  
...  

Abstract: Under classical statistics, all research on estimating the population mean when auxiliary information is available is based on determinate, crisp data, and such estimates are often biased. The goal is to find the best estimate of the unknown population mean with minimum mean square error (MSE). Neutrosophic statistics, a generalization of classical statistics, handles vague, indeterminate, and uncertain information. Thus, for the first time under neutrosophic statistics, we have developed neutrosophic ratio-type estimators of the finite-population mean that utilize auxiliary information, addressing the problems of estimating the population mean from neutrosophic data. The neutrosophic observation is of the form $${Z}_{N}={Z}_{L}+{Z}_{U}{I}_{N}$$, where $${I}_{N}\in \left[{I}_{L}, {I}_{U}\right]$$ and $${Z}_{N}\in \left[{Z}_{L}, {Z}_{U}\right]$$. The proposed estimators are well suited to ambiguous, vague, neutrosophic-type data. Their results are not single-valued but take the form of an interval in which the population parameter is more likely to lie, which increases the efficiency of the estimators: the estimated interval contains the unknown population mean while providing a minimum MSE. The efficiency of the proposed neutrosophic ratio-type estimators is examined using neutrosophic temperature data and by simulation, and a comparison illustrates their usefulness over the classical estimators.
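As a rough illustration of the idea (not the paper's estimator), the sketch below applies a classical ratio estimator separately to the lower and upper endpoints of interval-valued observations, so the output is itself an interval. The function names, the synthetic data, and the assumed known auxiliary means are all illustrative assumptions.

```python
import numpy as np

def ratio_estimate(y_sample, x_sample, x_pop_mean):
    """Classical ratio estimator of the population mean of y,
    using an auxiliary variable x with known population mean."""
    return np.mean(y_sample) / np.mean(x_sample) * x_pop_mean

def neutrosophic_ratio_estimate(y_lower, y_upper, x_lower, x_upper,
                                x_mean_lower, x_mean_upper):
    """Interval-valued estimate obtained by applying the classical ratio
    estimator to the lower and upper ends of the interval data."""
    lo = ratio_estimate(y_lower, x_lower, x_mean_lower)
    hi = ratio_estimate(y_upper, x_upper, x_mean_upper)
    return min(lo, hi), max(lo, hi)

# Illustrative interval-valued (neutrosophic) sample of size 30
rng = np.random.default_rng(0)
x_l = rng.uniform(20, 25, size=30)
x_u = x_l + rng.uniform(0, 2, size=30)
y_l = 1.5 * x_l + rng.normal(0, 1, size=30)
y_u = 1.5 * x_u + rng.normal(0, 1, size=30)
print(neutrosophic_ratio_estimate(y_l, y_u, x_l, x_u, 22.0, 24.0))
```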


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Abdulkadir Canatar ◽  
Blake Bordelon ◽  
Cengiz Pehlevan

Abstract: A theoretical understanding of generalization remains an open problem for many machine learning models, including deep networks where overparameterization leads to better performance, contradicting the conventional wisdom from classical statistics. Here, we investigate generalization error for kernel regression, which, besides being a popular machine learning method, also describes certain infinitely overparameterized neural networks. We use techniques from statistical mechanics to derive an analytical expression for generalization error applicable to any kernel and data distribution. We present applications of our theory to real and synthetic datasets, and for many kernels including those that arise from training deep networks in the infinite-width limit. We elucidate an inductive bias of kernel regression to explain data with simple functions, characterize whether a kernel is compatible with a learning task, and show that more data may impair generalization when noisy or not expressible by the kernel, leading to non-monotonic learning curves with possibly many peaks.
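The paper's result is an analytical expression derived with statistical-mechanics techniques; as a rough empirical counterpart, the sketch below estimates a learning curve for kernel ridge regression (via scikit-learn) by averaging test error over random training sets of increasing size. The target function, kernel parameters, and noise level are illustrative assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

def target(x):
    return np.sin(3 * x).ravel()

# Fixed test set used to approximate the generalization error
x_test = rng.uniform(-1, 1, size=(500, 1))
y_test = target(x_test) + rng.normal(0, 0.1, size=500)

# Empirical learning curve: average test MSE over 20 random training sets
# for each training-set size n.
for n in [10, 40, 160, 640]:
    errs = []
    for _ in range(20):
        x_tr = rng.uniform(-1, 1, size=(n, 1))
        y_tr = target(x_tr) + rng.normal(0, 0.1, size=n)
        model = KernelRidge(kernel="rbf", gamma=5.0, alpha=1e-3).fit(x_tr, y_tr)
        errs.append(mean_squared_error(y_test, model.predict(x_test)))
    print(f"n={n:4d}  mean generalization error = {np.mean(errs):.4f}")
```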


1988 ◽  
Vol 68 (2) ◽  
pp. 209-221 ◽  
Author(s):  
C. Chang ◽  
T. G. SOMMERFELDT ◽  
T. ENTZ

Knowledge of the variability of soluble salt content in saline soils can assist in designing experiments and in developing practices to manage and reclaim salt-affected soils. Geostatistical theory uses the spatial dependence of soil properties to obtain information about locations in the field that are not actually measured, whereas classical statistical methods do not consider spatial correlation or the relative location of samples. A study was carried out using both classical statistics and geostatistical methods to delineate salinity and sand content, and their variability, in a small area of irrigated saline soil. Soil samples were taken for electrical conductivity (EC) and particle size distribution determinations at 64 locations in a 20 × 25-m area, on an 8 × 8 grid pattern, at depth intervals of 0–15, 15–30, 30–60, 60–90 and 90–120 cm. The high coefficient of variation (CV) values for both EC and sand content indicated that the soil was highly variable with respect to these properties. The semivariograms of sand content for the first two depth intervals, and of EC for all depth intervals, showed strong spatial relationships. Contour maps generated by block kriging on the basis of these spatial relationships provide estimated variances that are smaller than the general variances calculated by the classical statistical method. The EC values interpolated by ordinary and universal kriging were compared and found to be almost identical. The kriged maps can provide information useful for designing experiments and for determining soil sampling strategy. Key words: Salinity, texture, variability, geostatistics, semivariogram, kriging
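For readers unfamiliar with the geostatistical side, the sketch below computes an isotropic empirical semivariogram from gridded measurements, the quantity underlying the kriging used in the study. The grid dimensions echo the 8 × 8 layout described above, but the EC values and lag bins are synthetic, illustrative assumptions.

```python
import numpy as np

def empirical_semivariogram(coords, values, lag_edges):
    """Isotropic empirical semivariogram: for each lag bin, half the mean
    squared difference over all pairs of points whose separation distance
    falls in that bin."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sq = (values[:, None] - values[None, :]) ** 2
    iu = np.triu_indices(len(values), k=1)      # count each pair once
    d, sq = d[iu], sq[iu]
    gamma = []
    for lo, hi in zip(lag_edges[:-1], lag_edges[1:]):
        mask = (d >= lo) & (d < hi)
        gamma.append(0.5 * sq[mask].mean() if mask.any() else np.nan)
    return gamma

# Illustrative 8 x 8 grid over a 20 x 25 m area with synthetic EC values
xs, ys = np.meshgrid(np.linspace(0, 20, 8), np.linspace(0, 25, 8))
coords = np.column_stack([xs.ravel(), ys.ravel()])
rng = np.random.default_rng(1)
ec = 4 + 0.1 * coords[:, 0] + rng.normal(0, 0.5, size=64)   # weak trend + noise
print(empirical_semivariogram(coords, ec, lag_edges=[0, 5, 10, 15, 20, 25]))
```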


2020 ◽  
Vol 34 (1) ◽  
pp. 52-67 ◽  
Author(s):  
Igor Himelfarb ◽  
Margaret A. Seron ◽  
John K. Hyland ◽  
Andrew R. Gow ◽  
Nai-En Tang ◽  
...  

Objective: This article introduces changes made to the diagnostic imaging (DIM) domain of Part IV of the National Board of Chiropractic Examiners examination and evaluates the effects of these changes in terms of item functioning and examinee performance. Methods: To evaluate item functioning, classical test theory and item response theory (IRT) methods were employed. Classical statistics were used to assess item difficulty and each item's relation to the total test score, while item difficulty and discrimination parameters were calculated using IRT. We also studied the decision accuracy of the redesigned DIM domain. Results: The diagnostic item analysis revealed similar item functioning across test forms and across administrations. The IRT models showed a reasonable fit to the data, and the averages of the IRT parameters were similar across test forms and administrations. The classification of test takers into ability (theta) categories was consistent across groups (both the norming group and all examinees), across all test forms, and across administrations. Conclusion: This research represents a first step in evaluating the transition to digital high-stakes DIM assessments. We hope that this study will spur further research into evaluations of the ability to interpret radiographic images, and that the results prove useful for chiropractic faculty, chiropractic students, and users of Part IV scores.
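As background on the two approaches mentioned, the sketch below computes classical item statistics (proportion correct and point-biserial discrimination) and evaluates a two-parameter logistic (2PL) IRT item characteristic curve. The response matrix and item parameters are simulated, illustrative assumptions rather than NBCE data.

```python
import numpy as np

def classical_item_stats(responses):
    """responses: binary matrix (examinees x items).
    Returns classical difficulty (proportion correct) and point-biserial
    discrimination (correlation of each item with the total score)."""
    difficulty = responses.mean(axis=0)
    total = responses.sum(axis=1)
    discrimination = np.array([np.corrcoef(responses[:, j], total)[0, 1]
                               for j in range(responses.shape[1])])
    return difficulty, discrimination

def irt_2pl_probability(theta, a, b):
    """Two-parameter logistic IRT model: probability that an examinee of
    ability theta answers an item with discrimination a and difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Illustrative responses: 200 examinees, 5 items
rng = np.random.default_rng(2)
theta = rng.normal(size=(200, 1))
a = np.array([1.0, 1.5, 0.8, 1.2, 2.0])
b = np.array([-1.0, 0.0, 0.5, 1.0, 1.5])
responses = (rng.uniform(size=(200, 5)) < irt_2pl_probability(theta, a, b)).astype(int)
print(classical_item_stats(responses))
```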


2021 ◽  
Vol 15 ◽  
Author(s):  
Pedro Pozo-Jimenez ◽  
Javier Lucas-Romero ◽  
Jose A. Lopez-Garcia

As multielectrode array technology grows in popularity, accessible analytical tools become necessary. Simultaneous recordings from multiple neurons may produce huge amounts of information, and traditional tools based on classical statistics are either insufficient to analyze multiple spike trains or computationally sophisticated and expensive. In this communication, we test the idea that AI algorithms may be useful for gathering information about the effective connectivity of neurons in local nuclei at a relatively low computing cost. To this end, we explored the capacity of the C5.0 algorithm to retrieve information from a large series of spike trains obtained from a simulated neuronal circuit with a known structure. Combinatorial, iterative, and recursive procedures built around C5.0 were constructed to examine whether they could improve on a direct application of the algorithm. Furthermore, we tested the applicability of these procedures to a reduced dataset with unknown connectivity, obtained in house from a mouse in vitro preparation of the spinal cord. Results show that this algorithm can retrieve neurons monosynaptically connected to the target in simulated datasets within a single run, and that the iterative and recursive procedures can identify monosynaptic and disynaptic neurons under favorable conditions. Application of these procedures to the biological dataset gives clues for identifying neurons monosynaptically connected to the target. We conclude that the work presented provides substantial proof of concept for the potential use of AI algorithms in the study of effective connectivity.
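C5.0 itself is usually run through its R implementation; as a loose stand-in, the sketch below uses a scikit-learn decision tree to predict a target neuron's binned spiking from the other neurons' activity in the preceding bin, and reads putative (effective) connections off the feature importances. The simulated circuit, bin size, and single-delay assumption are illustrative and much simpler than the processes described above.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)

# Illustrative binned spike trains: 5 presynaptic candidates x 5000 time bins
pre = (rng.uniform(size=(5000, 5)) < 0.05).astype(int)

# Simulated target neuron: driven mainly by candidate 0 with a one-bin delay
drive = 0.6 * np.roll(pre[:, 0], 1) + 0.02
target = (rng.uniform(size=5000) < drive).astype(int)

# Predict the target's activity in bin t from candidate activity in bin t-1,
# then rank candidates by how much the tree relies on each of them.
X, y = pre[:-1], target[1:]
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, y)
for i, imp in enumerate(tree.feature_importances_):
    print(f"candidate neuron {i}: importance {imp:.3f}")
```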


In a paper called "The Chemical constant of Hydrogen Vapour and the failure of Nernst's Heat Theorem," R. H. Fowler has investigated the vapour pressure of hydrogen crystals at low temperature; taking account of the existence of two sorts of hydrogen molecules, namely, ortho-hydrogen with even rotational quantum numbers and para-hydrogen with odd rotational quantum numbers, which retain their individuality over long periods at very low temperatures. By the use of the classical statistics, he was able to show that at very low temperatures hydrogen, as obtained by cooling hydrogen gas from ordinary temperatures, ought to have very nearly the experimentally observed chemical constant. Since the theory of the specific heat of hydrogen yielded correct values at low temperatures, it followed that at ordinary temperatures also his theory would yield a correct value for the chemical constant. Finally from the form of the partition function for hydrogen gas, Fowler attempted to obtain inferences concerning the validity of Nernst's heat theorem. By the use of the classical statistics fairly accurate results were obtained. But we shall find that when we make use of the Einstein-Bose statistics-the correct statistics for an assembly of hydrogen moleclues-a result will be obtained for the vapour pressure of hydrogen crystals at low temperatures which will furnish a value for the chemical constant of hydrogen in even closer agreement with experiment than Fowler's result.


2021 ◽  
Vol 73 (03) ◽  
pp. 25-30
Author(s):  
Srikanta Mishra ◽  
Jared Schuetter ◽  
Akhil Datta-Gupta ◽  
Grant Bromhal

Algorithms are taking over the world, or so we are led to believe, given their growing pervasiveness in multiple fields of human endeavor such as consumer marketing, finance, design and manufacturing, health care, politics, and sports. The focus of this article is to examine where things stand with regard to the application of these techniques for managing subsurface energy resources in domains such as conventional and unconventional oil and gas, geologic carbon sequestration, and geothermal energy. It is useful to start with some definitions to establish a common vocabulary. Data analytics (DA): sophisticated data collection and analysis to understand and model hidden patterns and relationships in complex, multivariate data sets. Machine learning (ML): building a model between predictors and response, where an algorithm (often a black box) is used to infer the underlying input/output relationship from the data. Artificial intelligence (AI): applying a predictive model with new data to make decisions without human intervention (and with the possibility of feedback for model updating). Thus, DA can be thought of as a broad framework that helps determine what happened (descriptive analytics), why it happened (diagnostic analytics), what will happen (predictive analytics), or how we can make something happen (prescriptive analytics) (Sankaran et al. 2019). Although DA is built upon a foundation of classical statistics and optimization, it has increasingly come to rely upon ML, especially for predictive and prescriptive analytics (Donoho 2017). While the terms DA, ML, and AI are often used interchangeably, it is important to recognize that ML is basically a subset of DA and a core enabling element of the broader decision-making construct that is AI. In recent years, there has been a proliferation of studies using ML for predictive analytics in the context of subsurface energy resources. Consider how the number of papers on ML in the OnePetro database has been increasing exponentially since 1990 (Fig. 1). These trends are also reflected in the number of technical sessions devoted to ML/AI topics in conferences organized by SPE, AAPG, and SEG, among others, as well as in books targeted to practitioners in these professions (Holdaway 2014; Mishra and Datta-Gupta 2017; Mohaghegh 2017; Misra et al. 2019). Given these high levels of activity, our goal is to provide some observations and recommendations on the practice of data-driven model building using ML techniques. The observations are motivated by our belief that some geoscientists and petroleum engineers may be jumping the gun by applying these techniques in an ad hoc manner without any foundational understanding, whereas others may be holding off on using these methods because they do not have any formal ML training and could benefit from some concrete advice on the subject. The recommendations are conditioned by our experience in applying both conventional statistical modeling and data analytics approaches to practical problems.

