Optimization of the Maximum Likelihood Estimator for Determining the Intrinsic Dimensionality of High–Dimensional Data

2015 ◽  
Vol 25 (4) ◽  
pp. 895-913 ◽  
Author(s):  
Rasa Karbauskaitė ◽  
Gintautas Dzemyda

Abstract: One of the problems in analysing a set of images of a moving object is to evaluate the degrees of freedom of the motion and the angle of rotation. Here the intrinsic dimensionality of the multidimensional data characterizing the set of images can be used. Usually, an image is represented by a high-dimensional point whose dimensionality depends on the number of pixels in the image. Knowledge of the intrinsic dimensionality of a data set is very useful in exploratory data analysis, because it indicates how far the dimensionality of the data can be reduced without losing much information. In this paper, the maximum likelihood estimator (MLE) of the intrinsic dimensionality is explored experimentally. In contrast to previous works, the radius of the hypersphere that covers the neighbours of the analysed points is fixed in the MLE, instead of the number of nearest neighbours. A way of choosing the radius in this method is proposed. We explore which metric, Euclidean or geodesic, must be used in the MLE algorithm in order to get the true estimate of the intrinsic dimensionality. The MLE method is examined using a number of artificial and real (image) data sets.
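The fixed-radius idea described in this abstract can be illustrated with a minimal sketch. It assumes the standard fixed-radius form of the estimator, in which the local estimate at a point x is the inverse of the mean of log(R / T_j), where T_j are the Euclidean distances from x to its neighbours inside a hypersphere of radius R. The function name, the radius value, and the synthetic data are hypothetical; the radius selection rule proposed in the paper and the geodesic variant are not reproduced here.

```python
import math
import random


def mle_intrinsic_dim_fixed_radius(points, radius):
    """Average the local fixed-radius MLE over all points that have
    at least two neighbours strictly inside the hypersphere."""
    local_estimates = []
    for i, x in enumerate(points):
        # Euclidean distances from x to every other point.
        dists = [math.dist(x, y) for j, y in enumerate(points) if j != i]
        # Keep only neighbours inside the ball (skip exact duplicates of x).
        inside = [d for d in dists if 0.0 < d < radius]
        if len(inside) < 2:
            continue  # too few neighbours for a usable local estimate
        mean_log_ratio = sum(math.log(radius / d) for d in inside) / len(inside)
        local_estimates.append(1.0 / mean_log_ratio)
    if not local_estimates:
        raise ValueError("radius too small: no point has enough neighbours")
    return sum(local_estimates) / len(local_estimates)


# Hypothetical test data: a flat 2-D sheet embedded in 5-D space.
random.seed(0)
sheet = [(random.random(), random.random(), 0.0, 0.0, 0.0) for _ in range(300)]
print(mle_intrinsic_dim_fixed_radius(sheet, radius=0.2))
```

On data like this the estimate should land near the sheet's true dimensionality of 2 rather than the ambient dimensionality of 5, although points close to the edge of the sheet bias the average somewhat.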

2011 ◽  
Vol 16 (4) ◽  
pp. 387-402 ◽  
Author(s):  
Rasa Karbauskaitė ◽  
Gintautas Dzemyda ◽  
Edmundas Mazėtis

When analyzing multidimensional data, we often have to reduce their dimensionality while preserving as much information about the analyzed data set as possible. To this end, it is reasonable to find out the intrinsic dimensionality of the data. In this paper, two techniques for estimating the intrinsic dimensionality are analyzed and compared, namely the maximum likelihood estimator (MLE) and the ISOMAP method. We also propose a way to obtain good estimates of the intrinsic dimensionality with the MLE method.


2021 ◽  
Author(s):  
Jakob Raymaekers ◽  
Peter J. Rousseeuw

Abstract: Many real data sets contain numerical features (variables) whose distribution is far from normal (Gaussian). Instead, their distribution is often skewed. In order to handle such data it is customary to preprocess the variables to make them more normal. The Box–Cox and Yeo–Johnson transformations are well-known tools for this. However, the standard maximum likelihood estimator of their transformation parameter is highly sensitive to outliers, and will often try to move outliers inward at the expense of the normality of the central part of the data. We propose a modification of these transformations as well as an estimator of the transformation parameter that is robust to outliers, so the transformed data can be approximately normal in the center and a few outliers may deviate from it. It compares favorably to existing techniques in an extensive simulation study and on real data.
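For orientation, the two classical transformations named in this abstract can be written out as a short sketch. This shows only the textbook Box–Cox and Yeo–Johnson definitions for a single value and a given parameter `lam`; it is not the authors' modified transformation or their robust estimator, and the function names are illustrative.

```python
import math


def box_cox(x, lam):
    """Classical Box-Cox transform; defined only for x > 0."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive input")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def yeo_johnson(x, lam):
    """Classical Yeo-Johnson transform; defined for any real x."""
    if x >= 0:
        if lam == 0:
            return math.log1p(x)
        return ((x + 1.0) ** lam - 1.0) / lam
    # Negative branch mirrors the positive one with parameter 2 - lam.
    if lam == 2:
        return -math.log1p(-x)
    return -(((1.0 - x) ** (2.0 - lam)) - 1.0) / (2.0 - lam)


# With lam = 1 the Yeo-Johnson transform leaves values unchanged,
# while smaller lam compresses large positive values.
for value in (-3.0, 0.0, 3.0, 100.0):
    print(value, yeo_johnson(value, 1.0), yeo_johnson(value, 0.0))
```

Because a single extreme value changes the likelihood of `lam` a great deal, a parameter fitted by ordinary maximum likelihood can end up chosen mainly to pull that value inward, which is the sensitivity the abstract describes.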


2021 ◽  
Author(s):  
Jan Graffelman

Abstract: The geometric series or niche preemption model is an elementary ecological model in biodiversity studies. The preemption parameter of this model is usually estimated by regression or iteratively by using May’s equation. This article proposes a maximum likelihood estimator for the niche preemption model, assuming a known number of species and multinomial sampling. A simulation study shows that the maximum likelihood estimator outperforms the classical estimators in this context in terms of bias and precision. We obtain the distribution of the maximum likelihood estimator and use it to obtain confidence intervals for the preemption parameter and to develop a preemption t test that can address the hypothesis of equal geometric decay in two samples. We illustrate the use of the new estimator with some empirical data sets taken from the literature and provide software for its use.
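A rough numerical sketch of likelihood-based fitting under this model follows. It assumes the usual normalised geometric series, with the relative abundance of the species of rank i proportional to k(1 - k)^(i - 1) over a known number of species S, and multinomial counts. The function names and the golden-section search are illustrative choices, not the estimator, software, or confidence-interval procedure from the article.

```python
import math


def log_likelihood(k, ranked_counts):
    """Multinomial log-likelihood of preemption parameter k, up to an
    additive constant, for counts ordered from most to least abundant."""
    s = len(ranked_counts)
    log_q = math.log1p(-k)                       # log(1 - k)
    log_norm = math.log(-math.expm1(s * log_q))  # log(1 - (1 - k)**s)
    return sum(
        n * (math.log(k) + rank * log_q - log_norm)
        for rank, n in enumerate(ranked_counts)  # rank 0 = most abundant
    )


def fit_preemption_parameter(counts, lo=1e-6, hi=1.0 - 1e-6, tol=1e-10):
    """Maximise the log-likelihood over k in (lo, hi) by golden-section search."""
    ranked = sorted(counts, reverse=True)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if log_likelihood(c, ranked) > log_likelihood(d, ranked):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2.0


# Hypothetical abundance counts for four species.
print(fit_preemption_parameter([800, 400, 200, 100]))
```

The search relies on the log-likelihood having a single peak in k, which holds here because the truncated geometric distribution is an exponential family in log(1 - k).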


Author(s):  
A.J. WATKINS ◽  
R.M. GRIFFIN

This paper is concerned with the effect of various data recording errors on the estimation of parameters in models commonly used in the analysis of reliability data. We start by outlining sources of such errors, and propose some modeling strategies that allow these errors into the framework of our analysis. It is then shown that the estimation of model parameters needs to take into account a mis-specified model, with the consequence that any theoretical advantages nominally enjoyed by estimators are reduced. In particular, the results from a series of simulation experiments show that the maximum likelihood estimator is no longer asymptotically unbiased. We next outline an approach that generalizes the usual asymptotic theory to obtain expressions for both the mean and variance of the maximum likelihood estimator in the present framework; these expressions involve both the underlying distribution and parameters controlling the extent to which recording errors are present in a data set. We then link these expressions to results obtained in a series of simulation experiments, and show that this approach accommodates a general formulation of the effect of data recording errors. We conclude with a discussion of the practical consequences of this work.


2004 ◽  
Vol 51 (12) ◽  
pp. 2123-2128 ◽  
Author(s):  
J.C. de Munck ◽  
F. Bijma ◽  
P. Gaura ◽  
C.A. Sieluzycki ◽  
M.I. Branco ◽  
...  

Author(s):  
CHUN-GUANG LI ◽  
JUN GUO ◽  
BO XIAO

In this paper, a novel method for estimating the intrinsic dimensionality of a high-dimensional data set is proposed. Based on neighborhood information, our method calculates, for each data point, the non-negative locally linear reconstruction coefficients from its neighbors, and the number of dominant positive reconstruction coefficients is regarded as a faithful guide to the intrinsic dimensionality of the data set. The proposed method requires no parametric assumption about the data distribution and is easy to implement in the general framework of manifold learning. Experimental results on several synthesized and real data sets show the benefits of the proposed method.


2020 ◽  
Vol 18 (2) ◽  
pp. 2-14
Author(s):  
Ehab A. Mahmood ◽  
Habshah Midi ◽  
Abdul Ghapor Hussin

The Maximum Likelihood Estimator (MLE) was used to estimate the unknown parameters of the simple circular regression model. However, it is very sensitive to outliers in the data set. A robust method for estimating the model parameters is proposed.

