How does an incomplete sky coverage affect the Hubble Constant variance?

Author(s):  
Carlos A. P. Bengaly ◽  
Uendert Andrade ◽  
Jailson S. Alcaniz

Abstract: We address the ≃4.4σ tension between local and CMB measurements of the Hubble constant using simulated Type Ia supernova (SN) data sets. We probe its directional dependence by means of a hemispherical comparison across the entire celestial sphere, used as an estimator of the H0 cosmic variance. We perform Monte Carlo simulations assuming both isotropic and non-uniform distributions of data points, the latter coinciding with the real data. This allows us to incorporate observational features, such as sample incompleteness, into our estimate. We find that the tension can be alleviated to 3.4σ for isotropic realizations and to 2.7σ for non-uniform ones. We also find that the H0 variance is greatly reduced if the data sets are enlarged to 4 and 10 times their current size. Future surveys will be able to tell whether the Hubble constant tension arises from unaccounted-for cosmic variance or is an actual indication of physics beyond the standard cosmological model.
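A minimal sketch, not the authors' pipeline, of the hemispherical-comparison idea: draw mock SN sky positions and per-object H0 values, scan random hemisphere pairs, and record the largest difference of the mean H0 between opposite hemispheres. The sample size, scatter, and number of scan directions are assumptions chosen only for illustration; a non-uniform realization would replace the isotropic sky sampler with the real SN footprint.

import numpy as np

rng = np.random.default_rng(0)

def mock_sample(n, h0_true=70.0, sigma=1.5):
    # Isotropic sky positions plus noisy per-object H0 estimates (toy numbers).
    ra = rng.uniform(0.0, 2.0 * np.pi, n)
    dec = np.arcsin(rng.uniform(-1.0, 1.0, n))   # uniform on the sphere
    xyz = np.c_[np.cos(dec) * np.cos(ra),
                np.cos(dec) * np.sin(ra),
                np.sin(dec)]
    return xyz, rng.normal(h0_true, sigma, n)

def hemispherical_spread(xyz, h0, n_dirs=500):
    # Largest |<H0>_up - <H0>_down| over randomly oriented hemispherical cuts.
    axes, _ = mock_sample(n_dirs)
    return max(abs(h0[xyz @ a > 0].mean() - h0[xyz @ a <= 0].mean()) for a in axes)

xyz, h0 = mock_sample(1048)   # sample size roughly Pantheon-like (assumed)
print("hemispherical H0 spread [km/s/Mpc]:", hemispherical_spread(xyz, h0))

Repeating this over many Monte Carlo realizations gives the distribution of the spread, i.e. an estimate of the H0 cosmic variance.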

2020 ◽  
pp. 1-20
Author(s):  
G.S. Sharov ◽  
E.S. Sinyakov

We analyze how predictions of cosmological models depend on the choice of observational data and on restrictions to flatness, and how this choice can alleviate the H0 tension. These effects are demonstrated in the wCDM model in comparison with the standard ΛCDM model. We describe the Pantheon sample of Type Ia supernovae, 31 Hubble parameter H(z) data points from cosmic chronometers, the extended sample of 57 H(z) data points, and observational manifestations of the cosmic microwave background (CMB). For the wCDM and ΛCDM models, in the flat case and with spatial curvature, we calculate χ² functions for all observed data in different combinations and estimate optimal values of the model parameters and their expected intervals. For both models the results depend essentially on the choice of data sets. In particular, for the wCDM model with H(z) data, supernovae, and CMB, the 1σ estimates may vary from H0 = 67.52 km/(s·Mpc) (for all N = 57 Hubble parameter data points) up to H0 = 70.87 km/(s·Mpc) for the flat case (k = 0) and N = 31. These results might hint at how to alleviate the H0 tension problem: different estimates of the Hubble constant may be connected with filters and the choice of observational data.
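To make the role of data-set choice concrete, the sketch below fits a flat wCDM expansion history to a few illustrative cosmic-chronometer H(z) points (not the compilations used in the paper) by minimizing a χ² function; swapping in the 31- or 57-point H(z) samples, supernovae, or CMB constraints shifts the best-fit H0 in the way the abstract describes. All data values and optimizer settings here are assumptions.

import numpy as np
from scipy.optimize import minimize

# Illustrative (z, H, sigma_H) cosmic-chronometer points, in km/(s·Mpc).
z  = np.array([0.07, 0.40, 0.90, 1.30, 1.75])
Hz = np.array([69.0, 82.0, 117.0, 168.0, 202.0])
sH = np.array([19.6, 9.6, 23.0, 17.0, 40.0])

def H_model(z, H0, Om, w):
    # Flat wCDM expansion rate; w = -1 recovers the LambdaCDM limit.
    return H0 * np.sqrt(Om * (1 + z) ** 3 + (1 - Om) * (1 + z) ** (3 * (1 + w)))

def chi2(p):
    H0, Om, w = p
    return np.sum(((Hz - H_model(z, H0, Om, w)) / sH) ** 2)

best = minimize(chi2, x0=[70.0, 0.3, -1.0], method="Nelder-Mead")
print("best-fit (H0, Om, w):", best.x, " chi2:", best.fun)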


2018 ◽  
Vol 11 (2) ◽  
pp. 53-67
Author(s):  
Ajay Kumar ◽  
Shishir Kumar

Several initial center selection algorithms have been proposed in the literature for numerical data, but the values of categorical data are unordered, so these methods are not applicable to categorical data sets. This article investigates the initial center selection process for categorical data and then presents a new support-based initial center selection algorithm. The proposed algorithm measures the weight of the unique data points of an attribute with the help of support and then integrates these weights along the rows to obtain the support of every row. Further, the data object with the largest support is chosen as the initial center, followed by finding the other centers that are at the greatest distance from the initially selected center. The quality of the proposed algorithm is compared with the random initial center selection method, Cao's method, Wu's method, and the method introduced by Khan and Ahmad. Experimental analysis on real data sets shows the effectiveness of the proposed algorithm.
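A minimal sketch of the support-based seeding idea described above, assuming a k-modes-style setting with Hamming distance between categorical rows; the function and variable names are ours, not the paper's.

from collections import Counter

def support_based_centers(rows, k):
    n_attrs = len(rows[0])
    # Support of an attribute value = its frequency in that column; a row's support
    # is the sum of the supports of its values across all attributes.
    freq = [Counter(r[j] for r in rows) for j in range(n_attrs)]
    row_support = [sum(freq[j][r[j]] for j in range(n_attrs)) for r in rows]

    # First center: the object with the largest support.
    centers = [rows[max(range(len(rows)), key=row_support.__getitem__)]]
    hamming = lambda a, b: sum(x != y for x, y in zip(a, b))
    while len(centers) < k:
        # Next center: the object farthest (in Hamming distance) from the chosen centers.
        centers.append(max(rows, key=lambda r: min(hamming(r, c) for c in centers)))
    return centers

data = [("a", "x"), ("a", "y"), ("b", "y"), ("b", "y"), ("c", "z")]
print(support_based_centers(data, k=2))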


2018 ◽  
Vol 8 (2) ◽  
pp. 377-406
Author(s):  
Almog Lahav ◽  
Ronen Talmon ◽  
Yuval Kluger

Abstract: A fundamental question in data analysis, machine learning, and signal processing is how to compare data points. The choice of the distance metric is especially challenging for high-dimensional data sets, where the problem of meaningfulness is more prominent (e.g. the Euclidean distance between images). In this paper, we propose to exploit a property of high-dimensional data that is usually ignored: the structure stemming from the relationships between the coordinates. Specifically, we show that organizing similar coordinates into clusters can be exploited for the construction of the Mahalanobis distance between samples. When the observable samples are generated by a nonlinear transformation of hidden variables, the Mahalanobis distance allows the recovery of the Euclidean distances in the hidden space. We illustrate the advantage of our approach on a synthetic example where the discovery of clusters of correlated coordinates improves the estimation of the principal directions of the samples. Our method was applied to real gene expression data for lung adenocarcinomas (lung cancer). Using the proposed metric, we found a partition of subjects into risk groups with good separation between their Kaplan–Meier survival plots.
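The sketch below illustrates only the general idea of building a Mahalanobis metric from clusters of correlated coordinates; it is not the authors' construction. The toy data, the hierarchical clustering of the coordinate correlation matrix, and the block-shrinkage of the covariance are all illustrative assumptions.

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import mahalanobis, squareform

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 12))        # 200 samples, 12 coordinates (toy data)
X[:, 6:] += X[:, :6]                  # create two groups of correlated coordinates

# Cluster coordinates by correlation, then shrink the sample covariance toward the
# block-diagonal structure implied by those coordinate clusters.
corr = np.corrcoef(X, rowvar=False)
cond = squareform(1.0 - np.abs(corr), checks=False)        # condensed distances
labels = fcluster(linkage(cond, method="average"), t=2, criterion="maxclust")
cov = np.cov(X, rowvar=False)
block = cov * (labels[:, None] == labels[None, :])         # zero cross-cluster terms
VI = np.linalg.pinv(0.5 * cov + 0.5 * block)               # inverse covariance of the metric

print("Mahalanobis d(x0, x1):", mahalanobis(X[0], X[1], VI))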


2006 ◽  
Vol 18 (6) ◽  
pp. 765-771 ◽  
Author(s):  
Haruhisa Okuda ◽  
Yasuo Kitaaki ◽  
Manabu Hashimoto ◽  
Shun’ichi Kaneko ◽  
...  

This paper presents a novel, fast, and highly accurate 3-D registration algorithm. The ICP (Iterative Closest Point) algorithm converges all the 3-D data points of two data sets to the best-matching points with minimum evaluation values. The algorithm is in widespread use because it works well for many applications, but it incurs a heavy computational cost and is very sensitive to error, because it uses all the data points of the two data sets and least-mean-square optimization. We previously proposed the M-ICP algorithm, which uses M-estimation to add robustness against outlying gross noise to the original ICP algorithm. In this paper, we propose a novel algorithm called HM-ICP (Hierarchical M-ICP), an extension of M-ICP that selects regions for matching and searches the selected regions hierarchically. The method selects regions by evaluating the variance of distance values in the target region and by homogeneous topological mapping. Fundamental experiments using real 3-D measurement data sets demonstrate the effectiveness of the proposed method, achieving a reduction in computational cost of more than ten thousand times. We also confirmed an error of less than 0.1% of the measurement distance.
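A compact sketch of a robust point-to-point ICP loop with Huber-style weights, standing in for the M-estimation used in M-ICP; the hierarchical region selection that defines HM-ICP is omitted, and all names and parameter values are illustrative.

import numpy as np
from scipy.spatial import cKDTree

def robust_icp(src, dst, n_iter=30, c=1.0):
    # Iterate: find nearest neighbours, downweight large residuals (Huber weights),
    # solve the weighted rigid alignment with an SVD, and compose the transforms.
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(dst)
    for _ in range(n_iter):
        moved = src @ R.T + t
        dist, idx = tree.query(moved)
        w = np.where(dist < c, 1.0, c / np.maximum(dist, 1e-12))   # Huber weights
        p, q = moved, dst[idx]
        mp, mq = np.average(p, 0, w), np.average(q, 0, w)
        H = (w[:, None] * (p - mp)).T @ (q - mq)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R_step = Vt.T @ D @ U.T
        R, t = R_step @ R, R_step @ (t - mp) + mq
    return R, t

rng = np.random.default_rng(2)
dst = rng.normal(size=(500, 3))
theta = 0.1
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
src = dst @ Rz.T + 0.05          # rigidly transformed copy plays the source cloud
R, t = robust_icp(src, dst)
print("recovered rotation (should roughly undo the applied one):\n", np.round(R, 3))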


2012 ◽  
Vol 8 (4) ◽  
pp. 82-107 ◽  
Author(s):  
Renxia Wan ◽  
Yuelin Gao ◽  
Caixia Li

Several algorithms for clustering large data sets have been presented to date. Most of them are crisp approaches, which are not well suited to the fuzzy case. In this paper, the authors explore a single-pass approach to fuzzy-possibilistic clustering over large data sets. The basic idea of the proposed approach (weighted fuzzy-possibilistic c-means, WFPCM) is to use a modified possibilistic c-means (PCM) algorithm to cluster the weighted data points and centroids, with one data segment as a unit. Experimental results on both synthetic and real data sets show that WFPCM saves significant memory compared with the fuzzy c-means (FCM) and possibilistic c-means (PCM) algorithms. Furthermore, the proposed algorithm shows excellent immunity to noise, avoids splitting or merging the exact clusters into inaccurate ones, and ensures the integrity and purity of the natural classes.
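As a rough illustration of how per-point weights enter the clustering of summarized data segments, the sketch below runs weighted fuzzy c-means updates. It is a simplified FCM-style stand-in, not the WFPCM algorithm itself; the weights here are unit weights, whereas a real segment summary would carry the number of raw points each summarized point represents.

import numpy as np

def weighted_fcm(X, w, k, m=2.0, n_iter=50, seed=0):
    # Weighted fuzzy c-means: each point X[i] carries a weight w[i] that enters
    # the centroid update, so summarized segments count with their full mass.
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2) + 1e-12
        U = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1)), axis=2)
        num = (w[:, None] * U ** m).T @ X
        den = (w[:, None] * U ** m).sum(axis=0)[:, None]
        C = num / den
    return C, U

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(c, 0.1, size=(100, 2)) for c in (0.0, 1.0)])
w = np.ones(len(X))            # unit weights; segment summaries would use point counts
centers, U = weighted_fcm(X, w, k=2)
print("centres:", np.round(centers, 2))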


2019 ◽  
Vol 8 (2) ◽  
pp. 159
Author(s):  
Morteza Marzjarani

Heteroscedasticity plays an important role in data analysis. In this article, this issue, along with a few approaches for handling heteroscedasticity, is presented. First, an iteratively reweighted least squares (IRLS) procedure and an iterative feasible generalized least squares (IFGLS) procedure are deployed, and proper weights for reducing heteroscedasticity are determined. Next, a new approach for handling heteroscedasticity is introduced. In this approach, after fitting a multiple linear regression (MLR) model or a general linear model (GLM) to a sufficiently large data set, the data are divided into two parts through inspection of the residuals, based on the results of testing for heteroscedasticity or via simulations. The first part contains the records whose absolute residuals can be assumed small enough that heteroscedasticity is ignorable. Under this assumption, the error variances are small and close to those of their neighboring points, and such error variances can be assumed known (but not necessarily equal). The second, remaining portion of the data is categorized as heteroscedastic. On real data sets, it is concluded that this approach reduces the number of unusual (e.g., influential) data points suggested for further inspection and, more importantly, lowers the root mean square error (RMSE), resulting in a more robust set of parameter estimates.
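A minimal sketch, not the author's procedure, of the splitting idea: fit an ordinary regression, divide the records by residual magnitude, and reweight the heteroscedastic part. The simulated data, the 80% residual quantile used as the cut, and the weights (taken from the known simulated variance function rather than estimated) are all illustrative assumptions; it relies on statsmodels' OLS and WLS.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 500
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 0.2 + 0.1 * x, n)    # error variance grows with x

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()

# Split records by residual magnitude: the small-residual part is treated as
# (approximately) homoscedastic, the rest as heteroscedastic and reweighted.
cut = np.quantile(np.abs(ols.resid), 0.8)               # threshold is illustrative
small = np.abs(ols.resid) <= cut

# Weights here use the simulated variance function; in practice they would be estimated.
wls = sm.WLS(y[~small], X[~small], weights=1.0 / (0.2 + 0.1 * x[~small]) ** 2).fit()
print("OLS params:", ols.params, " WLS params (heteroscedastic part):", wls.params)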


1997 ◽  
Vol 9 (8) ◽  
pp. 1805-1842 ◽  
Author(s):  
Marcelo Blatt ◽  
Shai Wiseman ◽  
Eytan Domany

We present a new approach to clustering, based on the physical properties of an inhomogeneous ferromagnet. No assumption is made regarding the underlying distribution of the data. We assign a Potts spin to each data point and introduce an interaction between neighboring points, whose strength is a decreasing function of the distance between the neighbors. This magnetic system exhibits three phases. At very low temperatures, it is completely ordered; all spins are aligned. At very high temperatures, the system does not exhibit any ordering, and in an intermediate regime, clusters of relatively strongly coupled spins become ordered, whereas different clusters remain uncorrelated. This intermediate phase is identified by a jump in the order parameters. The spin-spin correlation function is used to partition the spins and the corresponding data points into clusters. We demonstrate on three synthetic and three real data sets how the method works. Detailed comparison to the performance of other techniques clearly indicates the relative success of our method.
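The sketch below is a compact and heavily simplified Swendsen–Wang rendering of the Potts-spin clustering idea: couplings decay with distance on a k-nearest-neighbour graph, bonds between equal spins are frozen with probability 1 − exp(−J/T), and points whose spins co-align in more than half of the sweeps are placed in the same cluster. The temperature, neighbourhood size, and number of Potts states are assumptions and must be tuned so the system sits in the intermediate (superparamagnetic) regime; this is not the authors' full procedure.

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial import cKDTree

def superparamagnetic_clusters(X, q=10, k=8, T=0.15, sweeps=200, seed=5):
    rng = np.random.default_rng(seed)
    n = len(X)
    dist, idx = cKDTree(X).query(X, k + 1)     # first neighbour is the point itself
    a = dist[:, 1:].mean()                     # local length scale for the couplings
    edges = [(i, j, np.exp(-(d / a) ** 2 / 2.0))
             for i, (row, drow) in enumerate(zip(idx, dist))
             for j, d in zip(row[1:], drow[1:])]
    spins = rng.integers(q, size=n)
    coalign = np.zeros((n, n))
    for _ in range(sweeps):
        rows, cols = [], []
        for i, j, J in edges:
            # Freeze a bond between equal spins with probability 1 - exp(-J/T).
            if spins[i] == spins[j] and rng.random() < 1.0 - np.exp(-J / T):
                rows.append(i)
                cols.append(j)
        bonds = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))
        _, comp = connected_components(bonds, directed=False)
        spins = rng.integers(q, size=comp.max() + 1)[comp]   # flip each SW cluster jointly
        coalign += spins[:, None] == spins[None, :]
    # Partition by thresholding the spin-spin co-alignment frequency at 1/2.
    _, labels = connected_components(csr_matrix(coalign / sweeps > 0.5), directed=False)
    return labels

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(c, 0.05, size=(60, 2)) for c in (0.0, 1.0)])
print("cluster sizes:", np.bincount(superparamagnetic_clusters(X)))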


2021 ◽  
Author(s):  
Jakob Raymaekers ◽  
Peter J. Rousseeuw

Abstract: Many real data sets contain numerical features (variables) whose distribution is far from normal (Gaussian). Instead, their distribution is often skewed. In order to handle such data it is customary to preprocess the variables to make them more normal. The Box–Cox and Yeo–Johnson transformations are well-known tools for this. However, the standard maximum likelihood estimator of their transformation parameter is highly sensitive to outliers, and will often try to move outliers inward at the expense of the normality of the central part of the data. We propose a modification of these transformations as well as an estimator of the transformation parameter that is robust to outliers, so the transformed data can be approximately normal in the center and a few outliers may deviate from it. It compares favorably to existing techniques in an extensive simulation study and on real data.
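A sketch of one possible robust criterion for choosing the Yeo–Johnson parameter: grid-search λ so that trimmed quantiles of the robustly standardized transformed data match standard normal quantiles, which keeps a few outliers from dominating the fit. This is an illustrative stand-in, not the estimator proposed in the paper.

import numpy as np
from scipy import stats

def robust_yeojohnson_lambda(x, lambdas=np.linspace(-2, 2, 81), trim=0.1):
    # Standardize with median/MAD (robust), compare trimmed quantiles to N(0,1).
    probs = np.linspace(trim, 1 - trim, 41)
    zq = stats.norm.ppf(probs)
    best, best_loss = None, np.inf
    for lam in lambdas:
        y = stats.yeojohnson(x, lmbda=lam)
        med = np.median(y)
        mad = stats.median_abs_deviation(y, scale="normal")
        loss = np.mean((np.quantile((y - med) / mad, probs) - zq) ** 2)
        if loss < best_loss:
            best, best_loss = lam, loss
    return best

rng = np.random.default_rng(7)
x = np.concatenate([rng.lognormal(size=500), [50.0, 80.0]])   # skewed data plus outliers
print("robust lambda:", robust_yeojohnson_lambda(x))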


Entropy ◽  
2020 ◽  
Vol 23 (1) ◽  
pp. 62
Author(s):  
Zhengwei Liu ◽  
Fukang Zhu

Thinning operators play an important role in the analysis of integer-valued autoregressive models, and the most widely used is binomial thinning. Inspired by the theory of extended Pascal triangles, a new thinning operator named extended binomial is introduced, which generalizes binomial thinning. Compared to the binomial thinning operator, the extended binomial thinning operator has two parameters and is more flexible in modeling. Based on the proposed operator, a new integer-valued autoregressive model is introduced, which can accurately and flexibly capture the dispersion features of count time series. Two-step conditional least squares (CLS) estimation is investigated for the innovation-free case, and conditional maximum likelihood estimation is also discussed. The asymptotic properties of the two-step CLS estimator are also obtained. Finally, three overdispersed or underdispersed real data sets are considered to illustrate the superior performance of the proposed model.
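For context, the sketch below simulates an INAR(1) model with ordinary binomial thinning, the special case that the extended binomial operator generalizes, and recovers the parameters by conditional least squares (CLS), i.e. a regression of X_t on X_{t-1}. The extended-binomial operator itself and the paper's two-step procedure are not reproduced here.

import numpy as np

rng = np.random.default_rng(8)

def simulate_inar1(n, alpha=0.5, lam=2.0):
    # INAR(1) with binomial thinning: X_t = alpha o X_{t-1} + eps_t, eps_t ~ Poisson(lam).
    x = np.zeros(n, dtype=int)
    for t in range(1, n):
        x[t] = rng.binomial(x[t - 1], alpha) + rng.poisson(lam)
    return x

def cls_estimates(x):
    # Conditional least squares: since E[X_t | X_{t-1}] = alpha * X_{t-1} + lam,
    # CLS reduces to an ordinary regression of X_t on X_{t-1}.
    y, z = x[1:], x[:-1]
    alpha_hat = np.cov(y, z, bias=True)[0, 1] / np.var(z)
    lam_hat = y.mean() - alpha_hat * z.mean()
    return alpha_hat, lam_hat

x = simulate_inar1(2000)
print("CLS estimates (alpha, lambda):", cls_estimates(x))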

