minimum covariance determinant
Recently Published Documents


TOTAL DOCUMENTS: 59 (FIVE YEARS: 19)
H-INDEX: 10 (FIVE YEARS: 1)

2021 ◽  
Vol 2123 (1) ◽  
pp. 012021
Author(s):  
La Gubu ◽  
Dedi Rosadi ◽  
Abdurakhman

Abstract This paper shows how to construct a robust portfolio using time series clustering with several dissimilarity measures. Based on these dissimilarity measures, stocks are first sorted into clusters using the Partitioning Around Medoids (PAM) time series clustering approach. After clustering, a portfolio is constructed by selecting from each cluster the stock with the greatest Sharpe ratio. The optimum portfolio is then constructed using the robust Fast Minimum Covariance Determinant (FMCD) and robust S MV portfolio models. When a large number of stocks is available for portfolio formation, this approach can quickly generate the optimum portfolio. The approach is also resistant to outliers in the data. The Sharpe ratio was used to evaluate the performance of the resulting portfolios. The daily closing prices of stocks listed on the Indonesia Stock Exchange and included in the LQ-45 index from August 2017 to July 2018 were used as a case study. The empirical study revealed that portfolios constructed using PAM time series clustering with autocorrelation dissimilarity and a robust FMCD MV portfolio model outperformed portfolios created using other approaches.
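
The selection and weighting steps described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the clusters are taken as given (no PAM step is shown), the covariance matrix is assumed to come from some robust estimator such as FMCD, and global minimum-variance weights stand in for the paper's MV model, whose exact formulation is not given here. All function names are hypothetical.

```python
from statistics import mean, stdev

def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess)

def pick_representatives(returns_by_stock, clusters):
    """From each cluster (a list of stock names), keep the stock with the highest Sharpe ratio."""
    return [max(members, key=lambda s: sharpe_ratio(returns_by_stock[s]))
            for members in clusters]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def min_variance_weights(cov):
    """Global minimum-variance weights: proportional to inverse(cov) times a vector of ones.

    Assumption: `cov` is a covariance matrix of the selected stocks,
    e.g. one produced by a robust estimator; short positions are allowed.
    """
    z = solve(cov, [1.0] * len(cov))
    total = sum(z)
    return [v / total for v in z]
```

For two uncorrelated stocks with variances 0.04 and 0.01, `min_variance_weights([[0.04, 0.0], [0.0, 0.01]])` puts 20% in the first stock and 80% in the second, so the less volatile stock receives the larger weight.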


2021 ◽  
Vol 3 (2) ◽  
pp. 36-64
Author(s):  
Sharifah Sakinah Syed Abd Mutalib ◽  
Siti Zanariah Satari ◽  
Wan Nur Syahidah Wan Yusoff

In multivariate data, outliers are difficult to detect, especially as the dimension of the data increases. Mahalanobis distance (MD) has been one of the classical methods for detecting outliers in multivariate data. However, the classical mean and covariance matrix used in MD suffer from masking and swamping effects if the data contain outliers. Because of this problem, many studies use a robust estimator of the mean and covariance matrix instead of the classical estimator. In this study, the performance of five robust estimators, namely Fast Minimum Covariance Determinant (FMCD), Minimum Vector Variance (MVV), Covariance Matrix Equality (CME), Index Set Equality (ISE), and Test on Covariance (TOC), is investigated and compared. FMCD has been widely used and is known as one of the best robust estimators. However, FMCD still falls short under certain conditions. MVV, CME, ISE and TOC are modifications of FMCD that improve the last step of the FMCD algorithm. Hence, the objective of this study is to observe the performance of these five estimators in detecting outliers in multivariate data, particularly TOC, as it is the latest robust estimator. Simulation studies are conducted for two outlier scenarios under various conditions. Three performance measures, pout, pmask and pswamp, are used to assess the robust estimators. It is found that TOC gives better performance in pswamp for most conditions and gives better results for pout and pmask for certain conditions.
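
The idea of replacing the classical mean and covariance in the Mahalanobis distance with a minimum covariance determinant estimate can be sketched as below. This is a simplified illustration, not the FMCD algorithm as evaluated in the study: it is restricted to bivariate data so that only the standard library is needed, it uses plain random starts followed by concentration steps, and it omits the consistency correction and reweighting that full implementations apply. It does not implement MVV, CME, ISE or TOC. All function names and default parameters are hypothetical.

```python
import random

def mean_cov(points):
    """Sample mean and 2x2 covariance, returned as (mx, my) and (sxx, sxy, syy)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def det2(cov):
    """Determinant of a 2x2 covariance stored as (sxx, sxy, syy)."""
    sxx, sxy, syy = cov
    return sxx * syy - sxy * sxy

def sq_mahalanobis(p, center, cov):
    """Squared Mahalanobis distance, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    dx, dy = p[0] - center[0], p[1] - center[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det2(cov)

def c_step(points, center, cov, h):
    """Concentration step: keep the h points closest to the current estimate."""
    return sorted(points, key=lambda p: sq_mahalanobis(p, center, cov))[:h]

def mcd_2d(points, h, n_starts=30, max_iter=50, seed=0):
    """Approximate the h-subset whose covariance has the smallest determinant."""
    rng = random.Random(seed)
    best_det, best_center, best_cov = float("inf"), None, None
    for _start in range(n_starts):
        center, cov = mean_cov(rng.sample(points, 3))  # p + 1 = 3 starting points
        if det2(cov) <= 1e-12:
            continue  # skip degenerate (near-collinear) starts
        center, cov = mean_cov(c_step(points, center, cov, h))
        for _it in range(max_iter):
            if det2(cov) <= 1e-12:
                break
            new_center, new_cov = mean_cov(c_step(points, center, cov, h))
            if det2(new_cov) >= det2(cov):
                break  # determinant stopped decreasing
            center, cov = new_center, new_cov
        if 1e-12 < det2(cov) < best_det:
            best_det, best_center, best_cov = det2(cov), center, cov
    return best_center, best_cov
```

Robust distances are then `sq_mahalanobis(p, center, cov)` with the returned estimates. A common convention, assumed here rather than taken from the study, is to flag points whose squared distance exceeds a chi-squared quantile with 2 degrees of freedom (about 7.38 at the 0.975 level).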


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Usman Shahzad ◽  
Nadia H. Al-Noor ◽  
Noureen Afshan ◽  
David Anekeya Alilah ◽  
Muhammad Hanif ◽  
...  

Robust regression tools are commonly used to develop regression-type ratio estimators with traditional measures of location whenever data are contaminated with outliers. Recently, researchers extended this idea and developed regression-type ratio estimators through robust minimum covariance determinant (MCD) estimation. In this study, quantile regression with MCD-based measures of location is utilized and a class of quantile regression-type mean estimators is proposed. The mean squared errors (MSEs) of the proposed estimators are also obtained. The proposed estimators are compared with the reviewed class of estimators through a simulation study. Two real-life applications are also included. To assess the presence of outliers in these real-life applications, the Dixon chi-squared test is used. It is found that the quantile regression estimators perform better than some existing estimators.


Author(s):  
Jing Jin ◽  
Hua Fang ◽  
Ian Daly ◽  
Ruocheng Xiao ◽  
Yangyang Miao ◽  
...  

The common spatial patterns (CSP) algorithm is one of the most frequently used and effective spatial filtering methods for extracting relevant features for use in motor imagery brain–computer interfaces (MI-BCIs). However, the inherent defect of the traditional CSP algorithm is that it is highly sensitive to potential outliers, which adversely affects its performance in practical applications. In this work, we propose a novel feature optimization and outlier detection method for the CSP algorithm. Specifically, we use the minimum covariance determinant (MCD) to detect and remove outliers in the dataset, then we use the Fisher score to evaluate and select features. In addition, in order to prevent the emergence of new outliers, we propose an iterative minimum covariance determinant (IMCD) algorithm. We evaluate our proposed algorithm in terms of iteration times, classification accuracy and feature distribution using two BCI competition datasets. The experimental results show that the average classification performance of our proposed method is 12% and 22.9% higher than that of the traditional CSP method in two datasets ([Formula: see text]), and our proposed method obtains better performance in comparison with other competing methods. The results show that our method improves the performance of MI-BCI systems.


PeerJ ◽  
2021 ◽  
Vol 9 ◽  
pp. e11436
Author(s):  
Thomas R. Etherington

The Mahalanobis distance is a statistical technique that has been used in statistics and data science for data classification and outlier detection, and in ecology to quantify species-environment relationships in habitat and ecological niche models. Mahalanobis distances are based on the location and scatter of a multivariate normal distribution, and can measure how distant any point in space is from the centre of this kind of distribution. Three different methods for calculating the multivariate location and scatter are commonly used: the sample mean and variance-covariance, the minimum covariance determinant, and the minimum volume ellipsoid. The minimum covariance determinant and minimum volume ellipsoid were developed to be robust to outliers by minimising the multivariate location and scatter for a subset of the full sample, with the proportion of the full sample forming the subset being controlled by a user-defined parameter. This outlier robustness means the minimum covariance determinant and the minimum volume ellipsoid are highly relevant for ecological niche analyses, which are usually based on natural history observations that are likely to contain errors. However, natural history observations will also contain extreme bias, to which the minimum covariance determinant and the minimum volume ellipsoid will also be sensitive. To provide guidance for selecting and parameterising a multivariate location and scatter method, a series of virtual ecological niche modelling experiments were conducted to demonstrate the performance of each multivariate location and scatter method under different levels of sample size, errors, and bias. The results show that there is no optimal modelling approach, and that choices need to be made based on the individual data and question. The sample mean and variance-covariance method will perform best on very small sample sizes if the data are free of error and bias. At larger sample sizes the minimum covariance determinant and minimum volume ellipsoid methods perform as well or better, but only if they are appropriately parameterised. Modellers who are more concerned about the prevalence of errors should retain a smaller proportion of the full data set, while modellers more concerned about the prevalence of bias should retain a larger proportion of the full data set. I conclude that Mahalanobis distances are a useful niche modelling technique, but only for questions relating to the fundamental niche of a species where the assumption of multivariate normality is reasonable. Users of the minimum covariance determinant and minimum volume ellipsoid methods must also clearly report their parameterisations so that the results can be interpreted correctly.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Mohammad Tabatabai ◽  
Stephanie Bailey ◽  
Zoran Bursac ◽  
Habib Tabatabai ◽  
Derek Wilus ◽  
...  

Abstract Background: The most common measure of association between two continuous variables is the Pearson correlation (Maronna et al., Robust Statistics, 2019). When outliers are present, Pearson does not accurately measure association and robust measures are needed. This article introduces three new robust measures of correlation: Taba (T), TabWil (TW), and TabWil rank (TWR). The correlation estimators T and TW measure a linear association between two continuous or ordinal variables, whereas TWR measures a monotonic association. The robustness of these proposed measures in comparison with Pearson (P), Spearman (S), Quadrant (Q), Median (M), and Minimum Covariance Determinant (MCD) is examined through simulation. Taba distance is used to analyze genes, and statistical tests were used to identify those genes most significantly associated with Williams Syndrome (WS). Results: Based on the root mean square error (RMSE) and bias, the three proposed correlation measures are highly competitive when compared to classical measures such as P and S as well as robust measures such as Q, M, and MCD. Our findings indicate TBL2 was the most significant gene among patients diagnosed with WS and had the most significant reduction in gene expression level when compared with control (P value = 6.37E-05). Conclusions: Overall, when the distribution is bivariate Log-Normal or bivariate Weibull, TWR performs best in terms of bias and T performs best with respect to RMSE. Under the Normal distribution, MCD performs well with respect to bias and RMSE, but the TW, TWR, T, S, and P correlations were in close proximity. The identification of TBL2 may serve as a diagnostic tool for WS patients. A Taba R package has been developed and is available for use to perform all necessary computations for the proposed methods.


2021 ◽  
Vol 37 (1) ◽  
pp. 97-119
Author(s):  
Jiayun Jin ◽  
Geert Loosveldt

Abstract When monitoring industrial processes, a Statistical Process Control tool such as a multivariate Hotelling T² chart is frequently used to evaluate multiple quality characteristics. However, research into the use of T² charts for survey fieldwork, essentially a production process in which data sets are produced by means of interviews, has been scant to date. In this study, using data from the eighth round of the European Social Survey in Belgium, we present a procedure for simultaneously monitoring six response quality indicators and identifying outliers: interviews with anomalous results. The procedure integrates Kernel Density Estimation (KDE) with a T² chart, so that neither historical "in-control" data nor the assumption of a parametric distribution of the indicators is required. In total, 75 outliers (4.25%) are iteratively removed, resulting in an in-control data set containing 1,691 interviews. The outliers are mainly characterized by longer sequences of identical answers, a greater number of extreme answers, and, against expectation, a lower item nonresponse rate. The procedure is validated by means of ten-fold cross-validation and by comparison with the minimum covariance determinant algorithm as the criterion. By providing a method of obtaining in-control data, the present findings go some way toward monitoring response quality, identifying problems, and providing rapid feedback during survey fieldwork.


Entropy ◽  
2020 ◽  
Vol 23 (1) ◽  
pp. 33
Author(s):  
Edmore Ranganai ◽  
Innocent Mudhombo

The importance of variable selection and regularization procedures in multiple regression analysis cannot be overemphasized. These procedures are adversely affected by predictor space data aberrations as well as outliers in the response space. To counter the latter, robust statistical procedures such as quantile regression which generalizes the well-known least absolute deviation procedure to all quantile levels have been proposed in the literature. Quantile regression is robust to response variable outliers but very susceptible to outliers in the predictor space (high leverage points) which may alter the eigen-structure of the predictor matrix. High leverage points that alter the eigen-structure of the predictor matrix by creating or hiding collinearity are referred to as collinearity influential points. In this paper, we suggest generalizing the penalized weighted least absolute deviation to all quantile levels, i.e., to penalized weighted quantile regression using the RIDGE, LASSO, and elastic net penalties as a remedy against collinearity influential points and high leverage points in general. To maintain robustness, we make use of very robust weights based on the computationally intensive high breakdown minimum covariance determinant. Simulations and applications to well-known data sets from the literature show an improvement in variable selection and regularization due to the robust weighting formulation.
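
The objective described in this abstract, a weighted quantile (check) loss plus a penalty, can be written out as a small sketch. The parametrization below is an assumption: a single elastic net form in which `alpha=1` gives a LASSO penalty and `alpha=0` a RIDGE penalty, with the observation weights supplied by the caller (for example, derived from MCD-based robust distances). The paper's exact penalty scaling and weight construction are not reproduced here, and the function names are hypothetical.

```python
def check_loss(residual, tau):
    """Quantile (pinball) loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))

def penalized_weighted_quantile_objective(X, y, beta, weights, tau, lam, alpha):
    """Weighted check loss plus an elastic net penalty on beta.

    X: list of predictor rows, y: responses, weights: one weight per row
    (assumed to down-weight high leverage points), tau: quantile level,
    lam: penalty strength, alpha: mix between L1 (1.0) and squared L2 (0.0).
    """
    fit = 0.0
    for row, target, w in zip(X, y, weights):
        prediction = sum(x * b for x, b in zip(row, beta))
        fit += w * check_loss(target - prediction, tau)
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return fit + lam * (alpha * l1 + (1.0 - alpha) * l2)
```

At `tau=0.5` the check loss is half the absolute residual, so the unpenalized objective reduces to a weighted least absolute deviation criterion up to a constant factor. Minimizing the objective over `beta` would need a separate solver, such as linear programming, which is not shown.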

