Calibration Approach Product Type Estimators of Population Mean in Stratified Sampling with Single Constraint: A Comparison of Three Distance Measures

Author(s):  
Enang, Ekaette Inyang ◽  
Ojua, Doris Nkan ◽  
T. T. Ojewale

This study applied the calibration method to the product type estimator to propose calibration product type estimators under three distance measures, namely the chi-square distance measure, the minimum entropy distance measure and the modified chi-square distance measure, for a single constraint. Estimators of the variances of the proposed estimators were also obtained. An empirical study to ascertain the performance of these estimators was carried out using real-life and simulated data sets. The result with the real-life data showed that the proposed calibration product type estimator  produced better estimates of the population mean  compared to   and . Results from the simulation study showed that the proposed calibration product type estimators had a high gain in efficiency compared to the product type estimator. The simulation results also showed that the proposed estimators were more consistent and reliable under the Gamma and Exponential distributions, with the Exponential distribution taking the lead. The conventional product type estimator, however, was found to be better when the underlying distribution is normal.
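To make the idea of calibrating stratum weights under a single constraint concrete, here is a minimal sketch of the generic chi-square distance case. It is not the authors' product type estimator: it assumes the textbook setup in which calibrated weights w_h minimise sum((w_h - W_h)^2 / (W_h q_h)) subject to sum(w_h * xbar_h) = X_mean, which has the closed-form solution coded below. The function name, the tuning constants q_h and the example numbers are all illustrative assumptions.

```python
def chi_square_calibrated_weights(W, xbar, X_mean, q=None):
    """Calibrate stratum weights W_h so that sum(w_h * xbar_h) equals X_mean.

    Assumed setup (not taken from the abstract): chi-square distance
    sum((w_h - W_h)**2 / (W_h * q_h)) with one calibration constraint.
    """
    if q is None:
        q = [1.0] * len(W)  # hypothetical default: equal tuning constants
    # Stratified estimate of the auxiliary mean before calibration.
    x_st = sum(Wh * xh for Wh, xh in zip(W, xbar))
    # Lagrange multiplier from the single constraint.
    denom = sum(Wh * qh * xh * xh for Wh, qh, xh in zip(W, q, xbar))
    lam = (X_mean - x_st) / denom
    # First-order condition gives w_h = W_h + lam * W_h * q_h * xbar_h.
    return [Wh + lam * Wh * qh * xh for Wh, qh, xh in zip(W, q, xbar)]


# Illustrative numbers only: three strata, known auxiliary population mean 18.
W = [0.5, 0.3, 0.2]
xbar = [10.0, 20.0, 30.0]
w = chi_square_calibrated_weights(W, xbar, X_mean=18.0)
calibrated_x = sum(wh * xh for wh, xh in zip(w, xbar))
```

After calibration the weighted auxiliary mean reproduces the known population value, which is the defining property of the constraint. The minimum entropy and modified chi-square distances lead to different weight formulas that are not shown here.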

Author(s):  
D. N. Ojua ◽  
J. A. Abuchu ◽  
E. O. Ojua ◽  
E. I. Enang

The calibration approach adjusts the original design weights by incorporating an auxiliary variable, so that the estimator takes the form of a regression estimator. This method was employed to propose calibration product type estimators under three distance measures, namely the chi-square distance measure, the minimum entropy distance measure and the modified chi-square distance measure, using double constraints. Estimators of the variances of the proposed estimators were also obtained. An empirical study to ascertain the performance of these estimators was carried out using a secondary data set and data simulated under Gamma, Normal and Exponential distributions with sample sizes of 10%, 15%, 20% and 25%. The result with the real-life data showed that the calibration product type estimator from the chi-square distance measure estimated the population mean with less bias than those obtained from the other distance measures. The result from the real-life data also revealed that the estimator obtained from the chi-square distance measure under two constraints was more efficient than the other three estimators. The results from the simulation studies showed that the proposed calibration product type estimators outperform the conventional product type estimator in terms of efficiency, consistency and reliability under the Gamma and Exponential distributions, with the Exponential distribution taking the lead. The conventional product type estimator, however, was found to be better under the normal distribution. It was also observed that the performance of the proposed estimators did not change significantly as the sample size increased, which supports their use with small sample sizes.


2021 ◽  
pp. 58-60
Author(s):  
Naziru Fadisanku Haruna ◽  
Ran Vijay Kumar Singh ◽  
Samsudeen Dahiru

In This paper a modied ratio-type estimator for nite population mean under stratied random sampling using single auxiliary variable has been proposed. The expression for mean square error and bias of the proposed estimator are derived up to the rst order of approximation. The expression for minimum mean square error of proposed estimator is also obtained. The mean square error the proposed estimator is compared with other existing estimators theoretically and condition are obtained under which proposed estimator performed better. A real life population data set has been considered to compare the efciency of the proposed estimator numerically.


2021 ◽  
Vol 19 (1) ◽  
pp. 2-20
Author(s):  
Piyush Kant Rai ◽  
Alka Singh ◽  
Muhammad Qasim

This article introduces calibration estimators under different distance measures based on two auxiliary variables in stratified sampling. The theory of the calibration estimator is presented. The calibrated weights based on different distance functions are also derived. A simulation study has been carried out to judge the performance of the proposed estimators based on the minimum relative root mean squared error criterion. A real-life data set is also used to confirm the supremacy of the proposed method.


SPE Journal ◽  
2021 ◽  
pp. 1-25
Author(s):  
Chang Gao ◽  
Juliana Y. Leung

Summary The steam-assisted gravity drainage (SAGD) recovery process is strongly impacted by the spatial distributions of heterogeneous shale barriers. Though detailed compositional flow simulators are available for SAGD recovery performance evaluation, the simulation process is usually quite computationally demanding, making their use over a large number of reservoir models for assessing the impacts of heterogeneity (uncertainties) impractical. In recent years, data-driven proxies have been widely proposed to reduce the computational effort; nevertheless, the proxy must be trained using a large data set consisting of many flow simulation cases that ideally span the model parameter spaces. The question remains: is there a more efficient way to screen a large number of heterogeneous SAGD models? Such techniques could help to construct a training data set with less redundancy; they can also be used to quickly identify a subset of heterogeneous models for detailed flow simulation. In this work, we formulated two particular distance measures, flow-based and static-based, to quantify the similarity among a set of 3D heterogeneous SAGD models. First, to formulate the flow-based distance measure, a physics-based particle-tracking model is used: Darcy’s law and energy balance are integrated to mimic the steam chamber expansion process; steam particles that are located at the edge of the chamber release their energy to the surrounding cold bitumen, while detailed fluid displacements are not explicitly simulated. The steam chamber evolution is modeled, and a flow-based distance between two given reservoir models is defined as the difference in their chamber sizes over time. Second, to formulate the static-based distance, the Hausdorff distance (Hausdorff 1914) is used: it is often used in image processing to compare two images according to their corresponding spatial arrangement and shapes of various objects.
A suite of 3D models is constructed using representative petrophysical properties and operating constraints extracted from several pads in Suncor Energy’s Firebag project. The computed distance measures are used to partition the models into different groups. To establish a baseline for comparison, flow simulations are performed on these models to predict the actual chamber evolution and production profiles. The grouping results according to the proposed flow- and static-based distance measures match reasonably well to those obtained from detailed flow simulations. Significant improvement in computational efficiency is achieved with the proposed techniques. They can be used to efficiently screen a large number of reservoir models and facilitate the clustering of these models into groups with distinct shale heterogeneity characteristics. It presents a significant potential to be integrated with other data-driven approaches for reducing the computational load typically associated with detailed flow simulations involving multiple heterogeneous reservoir realizations.
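As a rough illustration of the static-based measure, the sketch below computes the symmetric Hausdorff distance between two finite point sets in pure Python. How the authors map shale barriers to point sets is not stated in the summary, so treating each model as a list of 3D cell-centre coordinates is an assumption made here, and the function names and sample coordinates are hypothetical.

```python
import math


def directed_hausdorff(A, B):
    """Largest distance from a point in A to its nearest neighbour in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)


def hausdorff(A, B):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))


# Hypothetical shale-cell coordinates for two small reservoir models.
model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_b = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
distance_ab = hausdorff(model_a, model_b)
```

This brute-force version costs O(|A| * |B|) distance evaluations, which is fine for a sketch but would likely need a spatial index for large grids. A pairwise matrix of such distances is the kind of input a clustering step could use to group models.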


2019 ◽  
Vol 59 (4) ◽  
pp. 722-741 ◽  
Author(s):  
Paul Phillips ◽  
Nuno Antonio ◽  
Ana de Almeida ◽  
Luís Nunes

This study examines the relationship between distance measures and a Portuguese data set consisting of 34,622 online hotel reviews extracted from Booking.com and TripAdvisor written in Portuguese, Spanish, and English. Based on the country of origin of each review author, a geographic and a psychic distance measure is calculated for Portugal. Data and text mining analysis provides additional insights into online hotel ratings. The authors confirm that online travelers’ evaluations are multifaceted constructs displaying varying patterns of rating behavior among the traveler base. By investigating the contemporary relevance of geographic and psychic distance, a key finding of this study is that travelers with less distance both in terms of psychic and geographic distance give a lower rating score than travelers with greater distance. The inclusion of psychic and geographic distance is advocated as a salient aspect for future researchers and for those practitioners who wish to enhance hotel product and service features.


Author(s):  
Abdul Haseeb Ganie ◽  
Surender Singh

Abstract Picture fuzzy set (PFS) is a direct generalization of the fuzzy sets (FSs) and intuitionistic fuzzy sets (IFSs). The concept of PFS is suitable for modeling situations that involve answers of the type yes, no, abstain, and refuse. In this study, we introduce a novel picture fuzzy (PF) distance measure based on direct operations on the functions of membership, non-membership, neutrality, refusal, and the upper bound of the membership function of two PFSs. We contrast the proposed PF distance measure with the existing PF distance measures and discuss its advantages in pattern classification problems. Applying fuzzy and non-standard fuzzy models to real data is very challenging, as real data is always found in crisp form. Here, we also derive some conversion formulae to apply the proposed method to real data sets. Moreover, we introduce a new multi-attribute decision-making (MADM) method using the proposed PF distance measure. In addition, we justify the necessity of the newly proposed MADM method using appropriate counterintuitive examples. Finally, we contrast the performance of the proposed MADM method with the classical MADM methods in the PF environment.


2019 ◽  
Vol 2019 ◽  
pp. 1-21 ◽  
Author(s):  
Cong Liu ◽  
Qianqian Chen ◽  
Yingxia Chen ◽  
Jie Liu

Most existing clustering algorithms are based on the Euclidean distance measure. However, using only the Euclidean distance measure may not be sufficient to partition a dataset with different structures. Thus, it is necessary to combine multiple distance measures in clustering. However, the weights for different distance measures are hard to set. Accordingly, it appears natural to keep multiple distance measures separate and to optimize them simultaneously by applying a multiobjective optimization technique. Recently a new clustering algorithm called ‘multiobjective evolutionary clustering based on combining multiple distance measures’ (MOECDM) was proposed to integrate Euclidean and Path distance measures for partitioning datasets with different structures. However, it is time-consuming due to its large-sized genes. This paper proposes a fast multiobjective fuzzy clustering algorithm for partitioning datasets with different structures. In this algorithm, a real encoding scheme is adopted to represent the individual. Two fuzzy clustering objective functions are designed based on Euclidean and Path distance measures, respectively, to evaluate the goodness of each individual. An improved evolutionary operator is also introduced to increase the convergence speed and the diversity of the population. In the final generation, a set of nondominated solutions can be obtained. The best solution and the best distance measure are selected by using a semisupervised method. Afterwards, an updated algorithm is also designed to detect the optimal cluster number automatically. The proposed algorithms are applied to many datasets with different structures, and the results for eight artificial and six real-life datasets are shown in experiments. Experimental results show that the proposed algorithms can not only successfully partition datasets with different structures, but also reduce the computational cost.


2022 ◽  
Vol 2022 ◽  
pp. 1-13
Author(s):  
Asad Ali ◽  
Muhammad Moeen Butt ◽  
Muhammad Zubair

Estimation of population mean of study variable Y suffers loss of precision in the presence of high variation in the data set. The use of auxiliary information incorporated in construction of an estimator under ranked set sampling scheme results in efficient estimation of population mean. In this paper, we propose an efficient generalized chain regression-cum-chain ratio type estimator to estimate finite population mean of study variable under stratified extreme-cum-median ranked set sampling utilizing information on two auxiliary variables. Mean square error (MSE) of the proposed generalized estimator is derived up to first order of approximation. The applications of the proposed estimator under symmetrical and asymmetrical probability distributions are discussed using simulation study and real-life data set for comparisons of efficiency. It is concluded that the proposed generalized estimator performs efficiently as compared to some existing estimators. It is also observed that the efficiency of the proposed estimator is directly proportional to the correlations between the study variable and its auxiliary variables.


Author(s):  
Srinjoy Das ◽  
Hrushikesh N. Mhaskar ◽  
Alexander Cloninger

This paper introduces kdiff, a novel kernel-based measure for estimating distances between instances of time series, random fields and other forms of structured data. This measure is based on the idea of matching distributions that only overlap over a portion of their region of support. Our proposed measure is inspired by MPdist which has been previously proposed for such datasets and is constructed using Euclidean metrics, whereas kdiff is constructed using non-linear kernel distances. Also, kdiff accounts for both self and cross similarities across the instances and is defined using a lower quantile of the distance distribution. Comparing the cross similarity to self similarity allows for measures of similarity that are more robust to noise and partial occlusions of the relevant signals. Our proposed measure kdiff is a more general form of the well known kernel-based Maximum Mean Discrepancy distance estimated over the embeddings. Some theoretical results are provided for separability conditions using kdiff as a distance measure for clustering and classification problems where the embedding distributions can be modeled as two component mixtures. Applications are demonstrated for clustering of synthetic and real-life time series and image data, and the performance of kdiff is compared to competing distance measures for clustering.
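The abstract describes kdiff as a more general form of the kernel-based Maximum Mean Discrepancy (MMD). The sketch below shows only that baseline quantity: the biased (V-statistic) estimate of squared MMD with a Gaussian kernel. It does not implement kdiff itself; the quantile step and the self/cross similarity comparison are omitted. The kernel choice, the bandwidth and the sample embeddings are assumptions made for illustration.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(X, Y, bandwidth=1.0):
    """Biased estimate of squared MMD between samples X and Y."""
    k_xx = sum(gaussian_kernel(a, b, bandwidth) for a in X for b in X) / len(X) ** 2
    k_yy = sum(gaussian_kernel(a, b, bandwidth) for a in Y for b in Y) / len(Y) ** 2
    k_xy = sum(gaussian_kernel(a, b, bandwidth) for a in X for b in Y) / (len(X) * len(Y))
    return k_xx + k_yy - 2.0 * k_xy


# Hypothetical one-dimensional embeddings of two time-series instances.
emb_a = [(0.0,), (0.1,), (0.2,)]
emb_b = [(3.0,), (3.2,), (2.9,)]
mmd_ab = mmd_squared(emb_a, emb_b)
```

The estimate is zero when both samples are identical and grows as the two embedding distributions separate, which is the behaviour a distance used for clustering needs.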


Stats ◽  
2021 ◽  
Vol 4 (2) ◽  
pp. 419-453
Author(s):  
Alex Ely Kossovsky

Benford’s Law predicts that the first significant digit on the leftmost side of numbers in real-life data is distributed between all possible 1 to 9 digits approximately as in LOG(1 + 1/digit), so that low digits occur much more frequently than high digits in the first place. Typically researchers, data analysts, and statisticians, rush to apply the chi-square test in order to verify compliance or deviation from this statistical law. In almost all cases of real-life data this approach is mistaken and without mathematical-statistics basis, yet it had become a dogma or rather an impulsive ritual in the field of Benford’s Law to apply the chi-square test for whatever data set the researcher is considering, regardless of its true applicability. The mistaken use of the chi-square test has led to much confusion and many errors, and has done a lot in general to undermine trust and confidence in the whole discipline of Benford’s Law. This article is an attempt to correct course and bring rationality and order to a field which had demonstrated harmony and consistency in all of its results, manifestations, and explanations. The first research question of this article demonstrates that real-life data sets typically do not arise from random and independent selections of data points from some larger universe of parental data as the chi-square approach supposes, and this conclusion is arrived at by examining how several real-life data sets are formed and obtained. The second research question demonstrates that the chi-square approach is actually all about the reasonableness of the random selection process and the Benford status of that parental universe of data and not solely about the Benford status of the data set under consideration, since the focus of the chi-square test is exclusively on whether the entire process of data selection was probable or too rare. 
In addition, a comparison of the chi-square statistic with the Sum of Squared Deviations (SSD) measure of distance from Benford is explored in this article, pitting one measure against the other, and concluding with a strong preference for the SSD measure.
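The contrast between the two measures can be sketched in a few lines. The code below computes the Benford first-digit probabilities log10(1 + 1/d), the chi-square statistic, and an SSD value. It assumes SSD is the sum of squared differences between observed and Benford proportions expressed in percent; the helper names and the powers-of-two sample are illustrative and not taken from the article.

```python
import math

# Benford first-digit probabilities: P(d) = log10(1 + 1/d), d = 1..9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Leading significant digit of a non-zero number."""
    return int(f"{abs(x):.15e}"[0])  # scientific notation starts with that digit


def digit_proportions(data):
    """Observed first-digit proportions and the number of non-zero values."""
    counts = {d: 0 for d in range(1, 10)}
    n = 0
    for x in data:
        if x == 0:
            continue  # zero has no leading significant digit
        counts[first_digit(x)] += 1
        n += 1
    return {d: counts[d] / n for d in counts}, n


def chi_square_statistic(data):
    props, n = digit_proportions(data)
    return n * sum((props[d] - BENFORD[d]) ** 2 / BENFORD[d] for d in BENFORD)


def ssd(data):
    """Sum of squared deviations, in percent units (assumed convention)."""
    props, _ = digit_proportions(data)
    return sum((100 * props[d] - 100 * BENFORD[d]) ** 2 for d in BENFORD)


sample = [2 ** k for k in range(1, 101)]  # illustrative data set
doubled = sample * 2                      # same digit proportions, twice the size
```

Duplicating the data leaves the digit proportions unchanged, so SSD stays the same while the chi-square statistic doubles. That dependence on sample size is one practical difference between the two measures.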

