Handling Cellwise Outliers by Sparse Regression and Robust Covariance

Author(s):  
Jakob Raymaekers ◽  
Peter Rousseeuw

We propose a data-analytic method for detecting cellwise outliers. Given a robust covariance matrix, outlying cells (entries) in a row are found by the cellFlagger technique which combines lasso regression with a stepwise application of constructed cutoff values. The penalty term of the lasso has a physical interpretation as the total distance that suspicious cells need to move in order to bring their row into the fold. For estimating a cellwise robust covariance matrix we construct a detection-imputation method which alternates between flagging outlying cells and updating the covariance matrix as in the EM algorithm. The proposed methods are illustrated by simulations and on real data about volatile organic compounds in children.
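The idea of a lasso penalty measuring how far suspicious cells must move can be illustrated with a small sketch. This is not the authors' cellFlagger (which also applies stepwise cutoff values); it only assumes an objective of the form (x − μ − δ)ᵀ Σ⁻¹ (x − μ − δ) + λ‖δ‖₁, where δ holds the per-cell shifts. The function names, the proximal-gradient solver, and the value of λ are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_cell_shifts(x, mu, sigma, lam, n_iter=500):
    """Minimize (z - d)' Sigma^{-1} (z - d) + lam * ||d||_1 over d, with z = x - mu.

    Hypothetical helper: plain proximal gradient (ISTA), not the published algorithm.
    """
    precision = np.linalg.inv(sigma)
    z = x - mu
    # Step size 1/L, where L = 2 * largest eigenvalue of the precision matrix.
    step = 1.0 / (2.0 * np.linalg.eigvalsh(precision).max())
    delta = np.zeros_like(z)
    for _ in range(n_iter):
        grad = -2.0 * precision @ (z - delta)
        delta = soft_threshold(delta - step * grad, step * lam)
    return delta

# One row with a single outlying cell (third entry), identity covariance assumed.
x = np.array([0.1, -0.2, 8.0, 0.3])
delta = sparse_cell_shifts(x, mu=np.zeros(4), sigma=np.eye(4), lam=2.0)
flagged = np.flatnonzero(delta)  # indices of cells that had to move
```

Cells with a nonzero shift are the candidates for flagging; the ℓ1 norm of `delta` is the total distance those cells moved.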

Symmetry ◽  
2021 ◽  
Vol 13 (11) ◽  
pp. 2164
Author(s):  
Héctor J. Gómez ◽  
Diego I. Gallardo ◽  
Karol I. Santoro

In this paper, we present an extension of the truncated positive normal (TPN) distribution to model positive data with high kurtosis. The new model is defined as the quotient of two random variables: a TPN-distributed variable (numerator) and a power of a standard uniform variable (denominator). The resulting model has greater kurtosis than the TPN distribution. We studied some properties of the distribution, such as moments, asymmetry, and kurtosis. Parameters are estimated by the method of moments and by maximum likelihood, the latter via the expectation-maximization algorithm. We performed simulation studies to assess parameter recovery and illustrate the model with a real data application related to body weight. The computational implementation of this work is included in the tpn package for R.
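The quotient construction can be sketched by simulation. The parameterization below is an assumption, not taken from the abstract: the TPN variable is drawn as σ times a N(λ, 1) variable conditioned to be positive, and the denominator is U^(1/q) with U standard uniform. The function names and the parameter q are hypothetical; the tpn R package may use different conventions.

```python
import numpy as np

def sample_tpn(n, sigma, lam, rng):
    """Draw n values of sigma * N(lam, 1) conditioned on being positive.

    Uses simple rejection sampling, which is slow when lam is very negative.
    """
    out = np.empty(0)
    while out.size < n:
        draws = rng.normal(loc=lam, scale=1.0, size=2 * n)
        out = np.concatenate([out, draws[draws > 0]])
    return sigma * out[:n]

def sample_quotient(n, sigma, lam, q, rng):
    """Return (x, z) with x the TPN draws and z = x / U**(1/q)."""
    x = sample_tpn(n, sigma, lam, rng)
    u = 1.0 - rng.uniform(size=n)  # values in (0, 1], avoids division by zero
    return x, x / u ** (1.0 / q)

rng = np.random.default_rng(0)
x, z = sample_quotient(1000, sigma=2.0, lam=1.0, q=3.0, rng=rng)
```

Because the denominator never exceeds 1, every quotient is at least as large as its numerator, which is what stretches the right tail relative to the TPN draws.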


Geophysics ◽  
2014 ◽  
Vol 79 (1) ◽  
pp. V1-V11 ◽  
Author(s):  
Amr Ibrahim ◽  
Mauricio D. Sacchi

We adopted the robust Radon transform to eliminate erratic incoherent noise that arises in common receiver gathers when simultaneous source data are acquired. The proposed robust Radon transform was posed as an inverse problem using an [Formula: see text] misfit that is not sensitive to erratic noise. The latter permitted us to design Radon algorithms that are capable of eliminating incoherent noise in common receiver gathers. We also compared nonrobust and robust Radon transforms that are implemented via a quadratic ([Formula: see text]) or a sparse ([Formula: see text]) penalty term in the cost function. The results demonstrated the importance of incorporating a robust misfit functional in the Radon transform to cope with simultaneous source interferences. Synthetic and real data examples proved that the robust Radon transform produces more accurate data estimates than least-squares and sparse Radon transforms.
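How a robust misfit damps erratic noise in an inverse problem can be sketched with iteratively reweighted least squares (IRLS). The formulas in the abstract are elided, so this assumes an ℓ1-type data misfit and uses a generic small matrix `A` in place of a Radon operator; the function name and all parameters are illustrative, not the authors' algorithm.

```python
import numpy as np

def irls_l1_misfit(A, d, n_iter=50, eps=1e-6):
    """Approximately minimize ||d - A m||_1 by iteratively reweighted least squares."""
    m = np.linalg.lstsq(A, d, rcond=None)[0]  # start from the least-squares solution
    for _ in range(n_iter):
        r = d - A @ m
        # Large residuals get small weights, so erratic samples lose influence.
        w = 1.0 / np.maximum(np.abs(r), eps)
        Aw = A * w[:, None]
        m = np.linalg.solve(A.T @ Aw, Aw.T @ d)
    return m

# Toy problem: a straight line observed with small noise plus sparse large bursts.
rng = np.random.default_rng(0)
A = np.column_stack([np.ones(100), np.linspace(0.0, 1.0, 100)])
m_true = np.array([2.0, -1.0])
d = A @ m_true + 0.01 * rng.normal(size=100)
d[::10] += 50.0  # erratic noise on every tenth sample

m_ls = np.linalg.lstsq(A, d, rcond=None)[0]   # quadratic misfit, biased by the bursts
m_robust = irls_l1_misfit(A, d)               # robust misfit, close to m_true
```

In this toy setting the least-squares estimate is pulled toward the bursts, while the reweighted estimate stays near the true model.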


2016 ◽  
Vol 4 (2) ◽  
pp. 92
Author(s):  
I Gusti Ngurah Fredi Firawan ◽  
Ida Bagus Suryawan

Nungnung Waterfall is located in Pelaga Village, Petang District, Badung Regency. Nungnung Waterfall has several potentials that can be developed into a natural tourist attraction. The researchers wanted to know which potentials of Nungnung Waterfall can be developed into a tourist attraction. The types and sources of data used are qualitative, quantitative, primary, and secondary data. Data were collected through observation, interviews, and library research, using a purposive sampling method; the data were analyzed with a qualitative descriptive method that presents the facts found in the field. Nungnung Waterfall has both natural and artificial potential that could be developed into a tourist attraction. Its natural potential comprises landscapes, mountains, waterfalls, and forests. Its artificial potential consists of supporting facilities, including a gazebo where tourists can rest and take pictures in the Nungnung Waterfall area.


2013 ◽  
Vol 2013 ◽  
pp. 1-10 ◽  
Author(s):  
J.-F. Degurse ◽  
L. Savy ◽  
S. Marcos ◽  
J.-Ph. Molinié

Classical space-time adaptive processing (STAP) detectors are strongly limited when facing highly heterogeneous environments, because representative target-free data are no longer available in this case. Single dataset algorithms, such as the MLED algorithm, have proved their efficiency in overcoming this problem by working only on primary data. These methods are based on the APES algorithm, which removes the useful signal from the covariance matrix. However, a small part of the clutter signal is also removed from the covariance matrix in this operation, and clutter rejection performance degrades as a result. We propose two algorithms that use deterministic-aided STAP to overcome this issue of the single dataset APES method. The results on realistic simulated data and real data show that these methods outperform traditional single dataset methods in detection and in clutter rejection.


2015 ◽  
Author(s):  
Huan Li ◽  
Xuli Zhu ◽  
Ke Mao ◽  
Rongling Wu ◽  
Qin Yan

Despite their pivotal role in agriculture and biological research, polyploids, a group of organisms with more than two sets of chromosomes, are very difficult to study. A growing number of studies have used high-density genetic linkage maps to investigate the genome structure and function of polyploids and to identify genes underlying polyploid traits. However, although models for linkage analysis have been well established for diploids, with some essential modifications for tetraploids, no models have been available thus far for polyploids at higher ploidy levels. The linkage analysis of polyploids typically requires knowledge about their meiotic mechanisms, which depend on the origin of polyploidy. Here we describe a computational modeling framework for linkage analysis in allohexaploids that integrates their preferential chromosomal-pairing meiotic feature into a mixture model setting. The framework, implemented with the EM algorithm, allows the simultaneous estimation of preferential pairing factors and the recombination fraction. We investigated the statistical properties of the framework through extensive computer simulation and validated its usefulness by analyzing real data from a full-sib family of allohexaploid persimmon. Our attempt at linkage analysis of allohexaploids by incorporating their meiotic mechanism lays a foundation for allohexaploid genetic mapping and also provides a new horizon to explore allohexaploid parental kinship.


2020 ◽  
Author(s):  
Yan Zheng ◽  
Yuanke Zhong ◽  
Jialu Hu ◽  
Xuequn Shang

Abstract Background: Single-cell RNA sequencing (scRNA-seq) enables many in-depth transcriptomic analyses at single-cell resolution. It is already widely used for exploring the dynamic development process of life, studying gene regulation mechanisms, and discovering new cell types. However, the low RNA capture rate, which causes highly sparse expression with dropouts, makes downstream analyses difficult. Method: Most current methods use a bimodal model to fit gene expression with an overwhelming number of zeros. In this paper, we propose scRNA-seq complementation (SCC) to solve the dropout problem in scRNA-seq data. First, we find the nearest neighbor cells of every cell. Then we use a mixture model to impute the dropouts in the scRNA-seq data. The model can identify the probability of dropouts and estimate reasonable gene expression values. Results: Experimental results show that SCC gives competitive results compared to two existing methods while showing superiority in reducing the intra-class distance of cells and improving the clustering accuracy on both simulated and real data. Conclusions: SCC is an effective tool to resolve the dropout noise in scRNA-seq data. The code is freely accessible at https://github.com/nwpuzhengyan/SCC.


COMPSTAT ◽  
1996 ◽  
pp. 259-263 ◽  
Author(s):  
Z. Geng ◽  
F. Tao ◽  
K. Wan ◽  
Ch. Asano ◽  
M. Ichimura ◽  
...  

2000 ◽  
Vol 16 (2) ◽  
pp. 176-199 ◽  
Author(s):  
Jukka Nyblom ◽  
Andrew Harvey

This paper is concerned with tests in multivariate time series models made up of random walk (with drift) and stationary components. When the stationary component is white noise, a Lagrange multiplier test of the hypothesis that the covariance matrix of the disturbances driving the multivariate random walk is null is shown to be locally best invariant, something that does not automatically follow in the multivariate case. The asymptotic distribution of the test statistic is derived for the general model. The test is then extended to deal with a serially correlated stationary component. The main contribution of the paper is to propose a test of the validity of a specified value for the rank of the covariance matrix of the disturbances driving the multivariate random walk. This rank is equal to the number of common trends, or levels, in the series. The test is very simple insofar as it does not require any models to be estimated, even if serial correlation is present. Its use with real data is illustrated in the context of a stochastic volatility model, and the relationship with tests in the cointegration literature is discussed.

