Outlier detection via localized p-value estimation

Author(s):  
Manqi Zhao ◽  
Venkatesh Saligrama
2021 ◽  
Vol 3 (1) ◽  
pp. 8
Author(s):  
Ilham Thaib ◽  
Gesit Thabrani ◽  
Silvia Netsyah

The public sea freight sector is one of the sectors affected by COVID-19. PT. Samudera Indonesia Tbk is one of the sea transportation companies in Indonesia. In the previous study, the ARIMA model was evaluated with a statistical test of model suitability (p-value less than 0.05), and the ARIMA order was chosen by inspecting the ACF (Autocorrelation Function) and PACF (Partial Autocorrelation Function) of the stationary data. Outliers can be detected by plotting the residuals of the specified model. Forecasts of the PT. Samudera Indonesia Tbk stock price for the next 5 days using the ARIMA(3,1,2) model fall within the 95% confidence interval, with forecast values close to the actual values. The detected outliers are related to economic phenomena.
Keywords: Forecasting, Covid-19, stock, ARIMA, outlier
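The residual-based outlier check mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes a random walk with drift, i.e. ARIMA(0,1,0), as a stand-in for the ARIMA(3,1,2) model (fitting that model would need a library such as statsmodels), and it flags residuals with a robust z-score instead of a visual plot. The function name `residual_outliers`, the threshold of 3, and the price series are all hypothetical.

```python
from statistics import mean, median


def residual_outliers(prices, threshold=3.0):
    """Flag time points whose residual from a random walk with drift is extreme.

    Assumption: ARIMA(0,1,0) with drift replaces the ARIMA(3,1,2) model of the
    abstract, purely to keep the sketch dependency-free.
    """
    # First differences: the "I(1)" part of the model.
    diffs = [b - a for a, b in zip(prices, prices[1:])]
    drift = mean(diffs)
    # Residuals of the stand-in model: differenced series minus the drift.
    residuals = [d - drift for d in diffs]
    # Robust scale: median absolute deviation, rescaled so that it matches
    # the standard deviation for normally distributed residuals.
    centre = median(residuals)
    mad = median([abs(r - centre) for r in residuals])
    scale = 1.4826 * mad
    if scale == 0:
        return []  # no spread in the residuals, nothing to flag
    # Residual i belongs to the step from prices[i] to prices[i + 1].
    return [i + 1 for i, r in enumerate(residuals)
            if abs(r - centre) / scale > threshold]


# Hypothetical daily closing prices with one abrupt jump.
prices = [100.0, 101.0, 100.5, 101.5, 102.0, 101.0,
          102.5, 115.0, 115.5, 116.5, 116.0, 117.0]
print(residual_outliers(prices))
```

The median and MAD are used instead of the mean and standard deviation so that the jump itself does not inflate the scale it is judged against.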


2020 ◽  
Author(s):  
Balint Magyar ◽  
Ambrus Kenyeres ◽  
Sandor Toth ◽  
Istvan Hajdu

The GNSS velocity field filtering topic can be identified as a multi-dimensional unsupervised spatial outlier detection problem. In the discussed case, we jointly interpreted the horizontal and vertical velocity fields and their uncertainties as a six-dimensional space. To detect and classify the spatial outliers, we applied an orthogonal linear transformation technique, Principal Component Analysis (PCA), to dynamically project the data onto a lower-dimensional subspace while retaining most (~99%) of the explained variance of the input data.

The resulting component space can therefore be seen as an attribute function that describes the investigated deformation patterns. We then constructed two subspace mapping functions, namely the k-nearest neighbor (k-NN) and the median-based neighbor function with the Haversine metric, and a samplewise comparison function that compares each sample with the properties of its k-NN environment. Consequently, the resulting comparison function scores highlight the significantly different observations as outliers. Assuming that the data come from a Multivariate Gaussian Distribution (MVD), we evaluated the corresponding Mahalanobis distance using an estimate of the robust covariance matrix of the investigated area. Then, as the main result of the Robust Mahalanobis-distance (RMD) based approach, we implemented the binary classification via p-value and critical Mahalanobis-distance thresholding.

Compared to the formerly investigated and applied One-Class Support Vector Machine (OCSVM) approach, the RMD-based solution gives ~17% more accurate results for European-scale velocity field filtering (such as EPN D1933), and it corrects the ambiguities and undesired features (such as overfitting) of the former OCSVM approach.

The results will also be presented as an interactive web page of the velocity fields of the latest version of EPN D2050, filtered with the introduced RMD approach.
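The p-value and critical-distance thresholding described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: it assumes two retained components (the abstract does not give the number), in which case the squared Mahalanobis distance of Gaussian data follows a chi-square distribution with 2 degrees of freedom and its tail probability is exactly exp(-d²/2). The robust covariance is approximated by a crude trimmed re-estimation loop standing in for a proper estimator such as the Minimum Covariance Determinant, without the usual consistency correction. All function names and the sample data are hypothetical.

```python
import math
from statistics import mean, median


def mean_and_cov(points):
    """Sample mean and 2x2 sample covariance of a list of (x, y) pairs."""
    n = len(points)
    mx = mean([p[0] for p in points])
    my = mean([p[1] for p in points])
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def mahalanobis_sq(point, centre, cov):
    """Squared Mahalanobis distance in 2-D via the closed-form 2x2 inverse."""
    (a, b), (c, d) = cov
    det = a * d - b * c  # assumed non-zero (non-degenerate data)
    dx, dy = point[0] - centre[0], point[1] - centre[1]
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def robust_p_values(points, keep_fraction=0.75, steps=3):
    """Chi-square(2) p-values of robust squared Mahalanobis distances."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Robust starting guess: coordinate-wise median and MAD-based scales.
    centre = (median(xs), median(ys))
    sx = 1.4826 * median([abs(x - centre[0]) for x in xs])
    sy = 1.4826 * median([abs(y - centre[1]) for y in ys])
    cov = ((sx * sx, 0.0), (0.0, sy * sy))
    h = math.ceil(keep_fraction * len(points))
    for _ in range(steps):
        # Re-estimate from the h points closest to the current centre.
        ranked = sorted(points, key=lambda p: mahalanobis_sq(p, centre, cov))
        centre, cov = mean_and_cov(ranked[:h])
    # Survival function of chi-square with 2 degrees of freedom.
    return [math.exp(-0.5 * mahalanobis_sq(p, centre, cov)) for p in points]


# Hypothetical component scores: ten similar stations and one deviating one.
points = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (1.1, 1.2), (1.0, 0.8),
          (0.8, 1.0), (1.3, 1.1), (0.95, 1.05), (1.05, 0.95), (1.15, 1.0),
          (5.0, -3.0)]
alpha = 0.01
p_values = robust_p_values(points)
flagged = [i for i, p in enumerate(p_values) if p < alpha]
critical_d2 = -2.0 * math.log(alpha)  # equivalent critical squared distance
print(flagged, round(critical_d2, 2))
```

Thresholding the p-value at `alpha` and thresholding the squared distance at `critical_d2` give the same classification, since one is a monotone transform of the other.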


2021 ◽  
Author(s):  
Martin A. Hoffmann ◽  
Louis-Félix Nothias ◽  
Marcus Ludwig ◽  
Markus Fleischauer ◽  
Emily C. Gentry ◽  
...  

Untargeted metabolomics experiments rely on spectral libraries for structure annotation, but these libraries are vastly incomplete; in silico methods search in structure databases but cannot distinguish between correct and incorrect annotations. As biological interpretation relies on accurate structure annotations, the ability to assign confidence to such annotations is a key outstanding problem. We introduce the COSMIC workflow that combines structure database generation, in silico annotation, and a confidence score consisting of kernel density p-value estimation and a Support Vector Machine with enforced directionality of features. In evaluation, COSMIC annotates a substantial number of hits at small false discovery rates, and outperforms spectral library search for this purpose. To demonstrate that COSMIC can annotate structures never reported before, we annotated twelve novel bile acid conjugates; nine structures were confirmed by manual evaluation and two structures using synthetic standards. Second, we annotated and manually evaluated 315 molecular structures in human samples currently absent from the Human Metabolome Database. Third, we applied COSMIC to 17,400 experimental runs and annotated 1,715 structures with high confidence that were absent from spectral libraries.
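The idea of a kernel density p-value can be illustrated generically as follows. The abstract does not describe how COSMIC builds its density, which scores it uses, or how the bandwidth is chosen, so everything here is an assumption: a Gaussian kernel, the normal reference rule for the bandwidth, and hypothetical null scores. The sketch estimates the upper-tail probability of an observed score under a density fitted to scores of presumed incorrect annotations.

```python
import math
from statistics import stdev


def normal_reference_bandwidth(samples):
    """Normal reference rule of thumb: h = 1.06 * sd * n^(-1/5)."""
    return 1.06 * stdev(samples) * len(samples) ** (-0.2)


def kde_p_value(score, null_scores, bandwidth):
    """Upper-tail p-value of `score` under a Gaussian KDE of `null_scores`.

    A Gaussian KDE is an equal-weight mixture of normals centred at the
    samples, so its tail probability is the mean of the normal tail
    probabilities of the individual components.
    """
    def normal_sf(z):
        return 0.5 * math.erfc(z / math.sqrt(2.0))

    tails = [normal_sf((score - s) / bandwidth) for s in null_scores]
    return sum(tails) / len(tails)


# Hypothetical scores of annotations assumed to be incorrect.
null_scores = [0.8, 1.1, 1.3, 0.9, 1.0, 1.2, 0.7, 1.4, 1.05, 0.95]
h = normal_reference_bandwidth(null_scores)
print(kde_p_value(3.0, null_scores, h))  # high score, small p-value
print(kde_p_value(1.0, null_scores, h))  # typical score, large p-value
```

A small p-value means the score would be unusual for an incorrect annotation, which is the kind of evidence a downstream confidence score could build on.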


Biostatistics ◽  
2008 ◽  
Vol 9 (4) ◽  
pp. 601-612 ◽  
Author(s):  
R. Kustra ◽  
X. Shi ◽  
D. J. Murdoch ◽  
C. M. T. Greenwood ◽  
J. Rangrej
