MAHALANOBIS DISTANCE AND ITS APPLICATION FOR DETECTING MULTIVARIATE OUTLIERS

Author(s):  
Hamid Ghorbani

While methods for detecting outliers are frequently applied by statisticians when analyzing univariate data, identifying outliers in multivariate data poses challenges that univariate data do not. In this paper, after a brief review of some tools for univariate outlier detection, the Mahalanobis distance, a well-known multivariate statistical distance, and its ability to detect multivariate outliers are discussed. As an application, the univariate and multivariate outliers of a real data set are detected using the R software environment for statistical computing.
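As a rough illustration of the idea in this abstract (not the paper's own R code), the sketch below computes squared Mahalanobis distances for two-dimensional data in plain Python. The function names are hypothetical, and the chi-square cutoff at probability 0.975 is an assumed convention rather than something stated in the abstract. With two variables the chi-square quantile has the closed form -2 ln(1 - p), so no statistics library is needed.

```python
import math

def mean_and_cov_2d(points):
    """Sample mean and covariance (n - 1 denominator) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def squared_mahalanobis_2d(point, mean, cov):
    """Squared Mahalanobis distance using the explicit 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

def chi2_quantile_2df(prob):
    """Quantile of the chi-square distribution with 2 degrees of freedom."""
    return -2.0 * math.log(1.0 - prob)

def flag_outliers(points, mean, cov, prob=0.975):
    """Indices of points whose squared distance exceeds the assumed cutoff."""
    cutoff = chi2_quantile_2df(prob)
    return [i for i, p in enumerate(points)
            if squared_mahalanobis_2d(p, mean, cov) > cutoff]

# Two points at the same Euclidean distance from the centre; only the one
# that contradicts the positive correlation is flagged.
flagged = flag_outliers([(1, 1), (1, -1)], (0, 0), (1.0, 0.8, 1.0))
```

The two example points show why the multivariate case differs from the univariate one: neither coordinate is extreme on its own, yet (1, -1) lies far from the centre once the correlation of 0.8 is taken into account. In practice the mean and covariance would be estimated from the data, for example with `mean_and_cov_2d`.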

Mathematics ◽  
2019 ◽  
Vol 7 (8) ◽  
pp. 665 ◽  
Author(s):  
Hao Ming ◽  
JinRong Wang ◽  
Michal Fečkan

In this paper, we apply Caputo-type fractional order calculus to simulate China’s gross domestic product (GDP) growth based on R software, which is a free software environment for statistical computing and graphics. Moreover, we compare the results for the fractional model with the integer order model. In addition, we show the importance of variables according to the BIC criterion. The study shows that Caputo fractional order calculus can produce a better model and perform more accurately in predicting the GDP values for 2012–2016.
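For readers unfamiliar with the operator named in this abstract, the standard textbook definition of the Caputo fractional derivative is given below. The lower limit 0 and the symbols are the usual conventions and are assumed here; the abstract does not state the specific model the authors fit.

```latex
% Caputo derivative of order \alpha, with n - 1 < \alpha < n and n a positive integer
{}^{C}\!D_{t}^{\alpha} f(t)
  = \frac{1}{\Gamma(n-\alpha)}
    \int_{0}^{t} \frac{f^{(n)}(\tau)}{(t-\tau)^{\alpha-n+1}} \, d\tau
```

For 0 < α < 1 this reduces to n = 1, so the integrand involves only the first derivative of f, weighted by a power-law kernel over the whole history of the process.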


2019 ◽  
Vol 12 (3) ◽  
pp. 205979911988428 ◽  
Author(s):  
Josh Lospinoso ◽  
Tom AB Snijders

We propose a Mahalanobis distance–based Monte Carlo goodness of fit testing procedure for the family of stochastic actor-oriented models for social network evolution. A modified model distance estimator is proposed to help the researcher identify model extensions that will remediate poor fit. A limited simulation study is provided to establish baseline legitimacy for the Mahalanobis distance–based Monte Carlo test and modified model distance estimator. A forward model selection workflow is proposed, and this procedure is demonstrated on a real data set.


2019 ◽  
Vol 21 (2) ◽  
pp. 114-121
Author(s):  
A A Korneenkov ◽  
S G Kuzmin ◽  
V B Dergachev ◽  
D N Borisov

A methodology is presented for developing nomograms for assessing and stratifying the risk of a clinical outcome, based on a created virtual data set, using the R software environment. The virtual data set included numerical and factor input variables (variable types correspond to the R software documentation) and an outcome. For quantitative variables, descriptive statistics were calculated at all levels of the outcome variable, and mosaic plots were constructed for factor variables. A logistic regression model was used to describe the association of the input variables with the outcome. A bootstrap method was applied to validate the model and evaluate its performance. The calculated validity indicators showed an acceptable discriminatory ability of the predictive model. The statistical calibration demonstrated the proximity of the model’s calibration curve to the ideal calibration curve. Based on the logistic regression coefficients, a nomogram was constructed, from which the risk of a specific outcome was calculated for each subject (patient). It is shown that the presented technique makes it possible to stratify patients effectively by the risk of an adverse outcome and to adjust the diagnosis and treatment tactics accordingly. The use of a nomogram greatly simplifies risk assessment, and the nomogram can be used in paper form as a supplement to the patient examination protocol. The article contains R code with explanations.
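The link between logistic regression coefficients and a nomogram can be sketched as follows. This is a Python illustration under stated assumptions, not the article's R code: the coefficients, variable names, ranges, and risk cut points are all hypothetical, and the point scale (the variable with the largest effect spans 0 to 100 points) is one common construction that the abstract does not confirm.

```python
import math

def predicted_risk(intercept, coefficients, values):
    """Logistic-model probability of the outcome for one subject."""
    linear = intercept + sum(coefficients[name] * values[name]
                             for name in coefficients)
    return 1.0 / (1.0 + math.exp(-linear))

def nomogram_points(coefficients, ranges, values):
    """Points per variable, scaled so the widest effect spans 0..100."""
    spans = {name: abs(coefficients[name]) * (ranges[name][1] - ranges[name][0])
             for name in coefficients}
    largest = max(spans.values())
    points = {}
    for name, beta in coefficients.items():
        low, high = ranges[name]
        # Reference value is the end of the range with the lowest contribution.
        reference = low if beta >= 0 else high
        points[name] = 100.0 * beta * (values[name] - reference) / largest
    return points

def risk_stratum(risk, low=0.2, high=0.5):
    """Assign a stratum using hypothetical cut points."""
    if risk < low:
        return "low"
    if risk < high:
        return "intermediate"
    return "high"

# Hypothetical model and patient, for illustration only.
coefficients = {"age": 0.05, "smoker": 1.0}
ranges = {"age": (40, 80), "smoker": (0, 1)}
patient = {"age": 60, "smoker": 1}

points = nomogram_points(coefficients, ranges, patient)
risk = predicted_risk(-4.0, coefficients, patient)
```

Summing the per-variable points gives the total score that a paper nomogram maps to a risk value; here the mapping is done directly through the logistic function.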


2017 ◽  
Vol 27 (4) ◽  
pp. 291 ◽  
Author(s):  
Pham Ngoc Son ◽  
Cao Dong Vu ◽  
Mai Quynh Anh

This report introduces a new computer program, initially developed at the Nuclear Research Institute at Dalat, for multivariate data analysis. In this preliminary version, the program can analyze data sets of up to 50 variables and a thousand observations, and it can perform multivariate data analysis techniques such as principal component analysis, cluster analysis and data standardization. The results obtained with MSAP closely reproduce those of other statistical analysis software.


2016 ◽  
Vol 47 (2) ◽  
pp. 207-239 ◽  
Author(s):  
Aurea Grané ◽  
Rosario Romera

Survey data are usually of mixed type (quantitative, multistate categorical, and/or binary variables). Multidimensional scaling (MDS) is one of the most widely used methodologies to visualize the profile structure of the data. MDS methods have been introduced in the literature since the 1960s, initially in publications in the psychometrics area. Nevertheless, the sensitivity and robustness of MDS configurations have scarcely been addressed in the specialized literature. In this work, we are interested in the construction of robust profiles for mixed-type data using a proper MDS configuration. To this end, we propose to compare different MDS configurations (coming from different metrics) through a combination of sensitivity and robust analysis. In particular, as an alternative to the classical Gower’s metric, we propose a robust joint metric combining different distance matrices, avoiding redundant information, via related metric scaling. The search for robustness and identification of outliers is done through a distance-based procedure related to geometric variability notions. In this sense, we propose a statistic for detecting multivariate outliers in the context of mixed-type data and evaluate its performance through a simulation study. Finally, we apply these techniques to a real data set provided by the largest humanitarian organization involved in social programs in Spain, where we are able to find in a robust way the most relevant factors defining the profiles of people who were at risk of social exclusion at the beginning of the 2008 economic crisis.


Author(s):  
Elizabeth A. Cudney ◽  
Kenneth M. Ragsdell ◽  
Kioumars Paryani

The Mahalanobis-Taguchi System (MTS) is a diagnosis and forecasting method for multivariate data. Mahalanobis distance (MD) is a measure based on correlations between the variables and different patterns that can be identified and analyzed with respect to a base or reference group. The MTS is of interest because of its reported accuracy in forecasting from small, correlated data sets. This is the type of data that is encountered with consumer vehicle ratings. MTS enables a reduction in dimensionality and the ability to develop a scale based on MD values. MTS identifies a set of useful variables from the complete data set with equivalent correlation and considerably less time and data. This paper presents the application of the Mahalanobis-Taguchi System to identifying a reduced set of useful variables in multidimensional systems.
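The "scale based on MD values" relative to a reference group can be sketched as follows for two variables. This is a minimal Python illustration, not the authors' implementation: the function names and data are hypothetical, and the conventions (standardizing with the reference group's mean and standard deviation, population denominators, and dividing the quadratic form by the number of variables) are assumptions commonly associated with MTS rather than details given in the abstract.

```python
import math

def fit_reference(group):
    """Means, standard deviations and correlation of a 2-variable reference group."""
    n = len(group)
    means = [sum(row[j] for row in group) / n for j in range(2)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in group) / n)
           for j in range(2)]
    r = sum(((row[0] - means[0]) / sds[0]) * ((row[1] - means[1]) / sds[1])
            for row in group) / n
    return means, sds, r

def scaled_md(observation, means, sds, r):
    """Mahalanobis distance on standardized values, divided by k = 2 variables."""
    z1 = (observation[0] - means[0]) / sds[0]
    z2 = (observation[1] - means[1]) / sds[1]
    # Quadratic form with the explicit inverse of the 2x2 correlation matrix.
    quadratic = (z1 * z1 - 2.0 * r * z1 * z2 + z2 * z2) / (1.0 - r * r)
    return quadratic / 2.0

# Hypothetical reference ("normal") group and one abnormal observation.
reference = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
means, sds, r = fit_reference(reference)

reference_mds = [scaled_md(row, means, sds, r) for row in reference]
average_reference_md = sum(reference_mds) / len(reference_mds)
abnormal_md = scaled_md((5, 1), means, sds, r)
```

With these conventions the scaled MD averages exactly 1 over the reference group, which gives the unit of the scale; observations that break the reference group's correlation pattern, such as (5, 1) here, score far above 1.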


2013 ◽  
Vol 2013 ◽  
pp. 1-20 ◽  
Author(s):  
F. Hosseinzadeh Lotfi ◽  
G. R. Jahanshahloo ◽  
M. Khodabakhshi ◽  
M. Rostamy-Malkhlifeh ◽  
Z. Moghaddas ◽  
...  

In the course of improving various abilities of data envelopment analysis (DEA) models, many investigations have been carried out for ranking decision-making units (DMUs). This is an important issue in both theory and practice. A variety of papers apply different ranking methods to a real data set. Here the ranking methods are divided into seven groups. As each of the existing methods can be viewed from different aspects, these groups may overlap somewhat. The first group conducts the evaluation by a cross-efficiency matrix in which the units are self- and peer-evaluated. In the second, units are ranked based on the optimal weights obtained from the multiplier model of the DEA technique. The third group deals with super-efficiency methods, which are based on the idea of excluding the unit under evaluation and analyzing the changes in the frontier. The fourth group involves methods based on benchmarking, which adopt the idea of being a useful target for the inefficient units. Another group uses multivariate statistical techniques, usually applied after conducting the DEA classification. The fifth research area ranks inefficient units through proportional measures of inefficiency. The sixth approach involves multiple-criteria decision methodologies with the DEA technique. In the last group, some different methods of ranking units are mentioned.


2020 ◽  
Vol 16 (2) ◽  
pp. 51-66
Author(s):  
A. Hassan ◽  
S. A. Dar ◽  
P. B. Ahmad ◽  
B. A. Para

In this paper, we introduce a new generalization of the Aradhana distribution called the Weighted Aradhana Distribution (WID). The statistical properties of this distribution are derived and the model parameters are estimated by maximum likelihood estimation. A simulation study of the ML estimates of the parameters is carried out in R software. Finally, an application to a real data set is presented to examine the significance of the newly introduced model.


Author(s):  
Elizabeth A. Cudney ◽  
Kioumars Paryani ◽  
Kenneth M. Ragsdell

The Mahalanobis-Taguchi System (MTS) is a diagnosis and forecasting method for multivariate data. Mahalanobis Distance (MD) is a measure based on correlations between the variables and different patterns that can be identified and analyzed with respect to a base or reference group. The MTS is of interest because of its reported accuracy in forecasting from small, correlated data sets. This is the type of data that is encountered with consumer vehicle ratings. MTS enables a reduction in dimensionality and the ability to develop a scale based on MD values. MTS identifies a set of useful variables from the complete data set with equivalent correlation and considerably less time and data. This paper presents the application of the MTS and its applicability in identifying a reduced set of useful variables in multidimensional systems.


Author(s):  
Michael Schatz ◽  
Joachim Jäger ◽  
Marin van Heel

Lumbricus terrestris erythrocruorin is a giant oxygen-transporting macromolecule in the blood of the common earthworm (worm "hemoglobin"). In our current study, we use specimens (kindly provided by Drs W.E. Royer and W.A. Hendrickson) embedded in vitreous ice (1) to avoid artefacts encountered with the negative stain preparation technique used in previous studies (2-4). Although the molecular structure is well preserved in vitreous ice, the low contrast and high noise level in the micrographs represent a serious problem in image interpretation. Moreover, the molecules can exhibit many different orientations relative to the object plane of the microscope in this type of preparation. Existing techniques of analysis that require alignment of the molecular views relative to one or more reference images thus often yield unsatisfactory results. We use a new method in which rotation-, translation- and mirror-invariant functions (5) are first derived from the large set of input images; these functions are subsequently classified automatically using multivariate statistical techniques (6). The different molecular views in the data set can thereby be found without bias (5). Within each class, all images are aligned relative to the member of the class that contributes least to the class's internal variance (6). This reference image is thus the most typical member of the class. Finally, the aligned images from each class are averaged, resulting in molecular views with enhanced statistical resolution.

