Comparison of Multivariate Outlier Detection Methods for Nearly Elliptical Distributions

2020 ◽  
Vol 49 (2) ◽  
pp. 1-17 ◽  
Author(s):  
Kazumi Wada ◽  
Mariko Kawano ◽  
Hiroe Tsubaki

In this paper, the performance of outlier detection methods is evaluated with asymmetrically distributed datasets. We choose four estimators, viz. modified Stahel-Donoho (MSD) estimators, blocked adaptive computationally efficient outlier nominators, the minimum covariance determinant estimator obtained by a fast algorithm, and the nearest-neighbour variance estimator, which are known for their good performance with elliptically distributed data, for practical applications in national survey data processing. For evaluation, we adopt a multivariate skew-t data model in which only the direction of the main axis is skewed, contaminated with outliers that follow another probability distribution. We conduct Monte Carlo simulations under this data distribution to compare outlier detection performance. We also explore the applicability of the selected methods to several accounting items in small and medium enterprise survey data. The MSD estimators were found to be the most suitable.

2020 ◽  
Vol 7 (3) ◽  
pp. 12-29
Author(s):  
M. Fevzi Esen

Insider trading is one of the most common deceptive trading practices in securities markets. Data mining appears to be an effective approach to fraud detection problems, offering high accuracy. In this study, the authors aim to detect outlying insider transactions based on the variables affecting insider trading profitability. A total of 1,241,603 sales and purchases by insiders, ranging from 2010 to 2017, are analyzed using classical and robust outlier detection methods. The authors compute robust distance scores based on minimum volume ellipsoid, Stahel-Donoho, and fast minimum covariance determinant estimators. To investigate the outlying observations that are likely to be fraudulent, they employ event study analysis to measure the abnormal returns of outlying transactions. The results are compared to the abnormal returns of non-outlying transactions. They find that outlying transactions gain higher abnormal returns than transactions that are not flagged as outliers. Business intelligence and analytics may be a useful strategy for companies in detecting and preventing financial fraud.
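The robust distance scores mentioned in this abstract can be illustrated with a small sketch. It is not the authors' procedure: it computes the raw minimum covariance determinant (MCD) estimate by exhaustive search over all subsets of size h, which is feasible only for tiny two-dimensional samples, whereas the abstract refers to a fast approximate algorithm. The function names, the toy data, and the 0.975 chi-square cutoff are assumptions made for illustration, and no consistency correction is applied to the raw estimate.

```python
import math
from itertools import combinations


def mean_cov_2d(points):
    """Sample mean and covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def raw_mcd_2d(points, h):
    """Exact raw MCD: the h-subset whose covariance determinant is smallest.

    Brute force over all subsets; an illustrative stand-in for fast MCD.
    """
    best = None
    for subset in combinations(points, h):
        center, (sxx, sxy, syy) = mean_cov_2d(subset)
        det = sxx * syy - sxy * sxy
        # Skip degenerate (collinear) subsets, which cannot be inverted.
        if det > 0 and (best is None or det < best[0]):
            best = (det, center, (sxx, sxy, syy))
    if best is None:
        raise ValueError("every h-subset is degenerate")
    return best[1], best[2]


def squared_distance(point, center, cov):
    """Squared Mahalanobis distance using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


def flag_outliers(points, h, level=0.975):
    """Indices whose squared robust distance exceeds a chi-square cutoff.

    For 2 degrees of freedom the chi-square quantile is -2 * ln(1 - level).
    The 0.975 level is a common convention, assumed here.
    """
    center, cov = raw_mcd_2d(points, h)
    cutoff = -2.0 * math.log(1.0 - level)
    return [i for i, p in enumerate(points)
            if squared_distance(p, center, cov) > cutoff]


# Hypothetical toy data: seven points near the line y = 2x, one far off it.
sample = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8),
          (5.0, 10.1), (6.0, 12.2), (7.0, 13.8), (20.0, 1.0)]
flagged = flag_outliers(sample, h=6)
```

Because the location and scatter are estimated from the most concentrated subset only, a gross outlier cannot inflate the covariance and mask itself, which is the motivation for preferring robust over classical distances. Since the raw estimate is uncorrected, inliers outside the chosen subset may also exceed the cutoff.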


Electronics ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 517
Author(s):  
Seong-heum Kim ◽  
Youngbae Hwang

Owing to recent advancements in deep learning methods and relevant databases, it is becoming increasingly easy to recognize 3D objects using only RGB images from single viewpoints. This study investigates the major breakthroughs and current progress in deep learning-based monocular 3D object detection. For relatively low-cost data acquisition systems without depth sensors or cameras at multiple viewpoints, we first consider existing databases with 2D RGB photos and their relevant attributes. Based on this simple sensor modality for practical applications, deep learning-based monocular 3D object detection methods that overcome significant research challenges are categorized and summarized. We present the key concepts and detailed descriptions of representative single-stage and multiple-stage detection solutions. In addition, we discuss the effectiveness of the detection models on their baseline benchmarks. Finally, we explore several directions for future research on monocular 3D object detection.


2021 ◽  
Vol 15 (4) ◽  
pp. 1-20
Author(s):  
Georg Steinbuss ◽  
Klemens Böhm

Benchmarking unsupervised outlier detection is difficult. Outliers are rare, and existing benchmark data contains outliers with various and unknown characteristics. Fully synthetic data usually consists of outliers and regular instances with clear characteristics and thus, in principle, allows for a more meaningful evaluation of detection methods. Nonetheless, there have been only a few attempts to include synthetic data in benchmarks for outlier detection. This might be due to the imprecise notion of outliers or to the difficulty of achieving good coverage of different domains with synthetic data. In this work, we propose a generic process for the generation of datasets for such benchmarking. The core idea is to reconstruct regular instances from existing real-world benchmark data while generating outliers so that they exhibit insightful characteristics. We propose and describe a generic process for the benchmarking of unsupervised outlier detection, as sketched so far. We then describe three instantiations of this generic process that generate outliers with specific characteristics, such as local outliers. To validate our process, we perform a benchmark with state-of-the-art detection methods and carry out experiments to study the quality of data reconstructed in this way. Besides showcasing the workflow, this confirms the usefulness of our proposed process. In particular, our process yields regular instances close to the ones from real data. Summing up, we propose and validate a new and practical process for the benchmarking of unsupervised outlier detection.


Author(s):  
Hassan M.E. Azzazy ◽  
Mai M.H. Mansour ◽  
Tamer M. Samir ◽  
Ricardo Franco

In order to meet the challenges of effective healthcare, the clinical laboratory is constantly striving to improve testing sensitivity while reducing the required time and cost. Gold nanoparticles (AuNPs) are proposed as one of the most promising tools to meet such goals. They have unique optophysical properties which enable sensitive detection of biomarkers, and are easily amenable to modification for use in different assay formats including immunoassays and molecular assays. Additionally, their preparation is relatively simple and their detection methods are quite versatile. AuNPs are showing substantial promise for effective practical applications and commercial utilization is already underway. This article covers the principles of preparation of AuNPs and their use for development of different diagnostic platforms.


Author(s):  
Jing Jin ◽  
Hua Fang ◽  
Ian Daly ◽  
Ruocheng Xiao ◽  
Yangyang Miao ◽  
...  

The common spatial patterns (CSP) algorithm is one of the most frequently used and effective spatial filtering methods for extracting relevant features for use in motor imagery brain–computer interfaces (MI-BCIs). However, the inherent defect of the traditional CSP algorithm is that it is highly sensitive to potential outliers, which adversely affects its performance in practical applications. In this work, we propose a novel feature optimization and outlier detection method for the CSP algorithm. Specifically, we use the minimum covariance determinant (MCD) to detect and remove outliers in the dataset, then we use the Fisher score to evaluate and select features. In addition, in order to prevent the emergence of new outliers, we propose an iterative minimum covariance determinant (IMCD) algorithm. We evaluate our proposed algorithm in terms of iteration times, classification accuracy and feature distribution using two BCI competition datasets. The experimental results show that the average classification performance of our proposed method is 12% and 22.9% higher than that of the traditional CSP method in two datasets ([Formula: see text]), and our proposed method obtains better performance in comparison with other competing methods. The results show that our method improves the performance of MI-BCI systems.
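The feature selection step described in this abstract, scoring features with the Fisher score after outliers have been removed, can be sketched as follows. The formula used is one common definition of the Fisher score (between-class scatter of the class means divided by the size-weighted within-class variance); the abstract does not state which variant the authors use, and the function names and toy data are hypothetical. The iterative MCD step is not reproduced here.

```python
from statistics import fmean, pvariance


def fisher_score(values, labels):
    """Fisher score of one feature.

    score = sum_k n_k * (mean_k - mean)^2 / sum_k n_k * var_k
    where k runs over classes. Larger means better class separation.
    """
    overall_mean = fmean(values)
    between = 0.0
    within = 0.0
    for cls in set(labels):
        group = [v for v, y in zip(values, labels) if y == cls]
        between += len(group) * (fmean(group) - overall_mean) ** 2
        within += len(group) * pvariance(group)
    if within == 0:
        raise ValueError("zero within-class variance; score is undefined")
    return between / within


def rank_features(rows, labels):
    """Return (feature indices sorted by descending score, list of scores)."""
    n_features = len(rows[0])
    scores = [fisher_score([row[j] for row in rows], labels)
              for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
rows = [(1.0, 5.0), (1.2, 3.0), (0.8, 4.0),
        (3.0, 4.5), (3.2, 3.5), (2.8, 4.0)]
labels = [0, 0, 0, 1, 1, 1]
order, scores = rank_features(rows, labels)
```

In this toy example the class means of feature 1 coincide, so its score is zero and feature 0 is ranked first. In a pipeline like the one the abstract outlines, only the top-ranked features would be passed to the classifier.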


Ekonomika ◽  
2016 ◽  
Vol 95 (2) ◽  
pp. 118-138
Author(s):  
Camilla Jensen ◽  
Aušryte Rasteniene

Using Enterprise Survey data covering the period 2001–2011, the paper investigates the export behavior of Lithuanian firms and how it changed before, during and after the financial crisis. The primary objective is to investigate whether there are changes in export behavior such as frequency, intensity, value and structure; hence the focus lies on the results obtained with the standard enterprise survey data, which is annual and was collected before and after the crisis. The findings show that, from a quantitative perspective, the financial crisis had only a marginal impact on the long-run exporting behavior of Lithuanian firms. There are no significant changes in the number of exporters or the exported percentage, and only a small but negative effect on exported value, when using simple ANOVA (F-test) analysis or more advanced regression analysis for repeated cross sections and panel data. The impact of the crisis falls more on the qualitative aspects of exporters from Lithuania. In general, exporters, though affected by the crisis, outperform local-market-oriented firms during and over the crisis on factors such as productivity, sales growth and quality. Complementary evidence from the more ad hoc and short-term focused financial crisis surveys corroborates the findings from the standard enterprise surveys. In every aspect investigated, exporters performed at least as well as, and often much better than, firms catering solely to the local market. The financial crisis survey data reveals that exporters had higher capacity utilization and lower levels of indebtedness, and generally recovered faster from the crisis than other firms. Regarding methodology, we conclude that using repeated cross sections from the standard enterprise surveys is the best way to investigate our research questions. This owes to the large drop in the number of observations in the panel dataset published by the World Bank, which makes those results overly vulnerable to outliers in the sample and to unobservable attrition factors. The financial crisis survey data is mainly useful for understanding short-run adjustments and financial aspects of the crisis, while structural aspects and exporting behavior are better covered by the standard surveys. The main methodological problem of using less-than-population data (which makes results sensitive to survey sampling routines) to investigate exporting behavior in general concerns the enormous skewness that exists within the population of exporting firms. This owes to the phenomenon that in most countries a handful of (multinational and locally owned) firms account for more than 50% of total exports. This is also increasingly true for a country such as Lithuania as the transition towards a market and open economy has progressed.


Data Mining ◽  
2013 ◽  
pp. 142-158
Author(s):  
Baoying Wang ◽  
Aijuan Dong

Clustering and outlier detection are important data mining areas. Online clustering and outlier detection generally work with continuous data streams generated at a rapid rate and have many practical applications, such as network intrusion detection and online fraud detection. This chapter first reviews the background of online clustering and outlier detection. Then, an incremental clustering and outlier detection method for market-basket data is proposed and presented in detail. The proposed method consists of two phases: weighted affinity measure clustering (WC clustering) and outlier detection. Specifically, given a data set, the WC clustering phase analyzes the data set and groups data items into clusters. Then, the outlier detection phase examines each newly arrived transaction against the item clusters formed in the WC clustering phase and determines whether the new transaction is an outlier. Periodically, the newly collected transactions are analyzed using WC clustering to produce an updated set of clusters, against which transactions arriving afterwards are examined. The process is carried out continuously and incrementally. Finally, future research trends in online data mining are explored at the end of the chapter.


Author(s):  
Fabrizio Angiulli

Data mining techniques can be grouped into four main categories: clustering, classification, dependency detection, and outlier detection. Clustering is the process of partitioning a set of objects into homogeneous groups, or clusters. Classification is the task of assigning objects to one of several predefined categories. Dependency detection searches for pairs of attribute sets which exhibit some degree of correlation in the data set at hand. The outlier detection task can be defined as follows: “Given a set of data points or objects, find the objects that are considerably dissimilar, exceptional or inconsistent with respect to the remaining data”. These exceptional objects are also referred to as outliers. Most of the early methods for outlier identification have been developed in the field of statistics (Hawkins, 1980; Barnett & Lewis, 1994). Hawkins’ definition of outlier clarifies the approach: “An outlier is an observation that deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism”. Indeed, statistical techniques assume that the given data set has a distribution model. Outliers are those points that satisfy a discordancy test, that is, that are significantly far from what would be their expected position given the hypothesized distribution. Many clustering, classification and dependency detection methods produce outliers as a by-product of their main task. For example, in classification, mislabeled objects are considered outliers and thus they are removed from the training set to improve the accuracy of the resulting classifier, while in clustering, objects that do not strongly belong to any cluster are considered outliers. Nevertheless, searching for outliers through techniques specifically designed for tasks other than outlier detection may not be advantageous. As an example, clusters can be distorted by outliers and, thus, the quality of the outliers returned is affected by their presence. Moreover, besides returning a solution of higher quality, outlier detection algorithms can be vastly more efficient than non ad-hoc algorithms. While in many contexts outliers are considered as noise that must be eliminated, as pointed out elsewhere, “one person’s noise could be another person’s signal”, and thus outliers themselves can be of great interest. Outlier mining is used in telecom or credit card fraud detection to detect the atypical usage of telecom services or credit cards, in intrusion detection for detecting unauthorized accesses, in medical analysis to test abnormal reactions to new medical therapies, in marketing and customer segmentation to identify customers spending much more or much less than the average customer, in surveillance systems, in data cleaning, and in many other fields.
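The statistical view described in this abstract, where a point is an outlier if it lies significantly far from its expected position under a hypothesized distribution, can be illustrated with a minimal univariate sketch. It assumes a normal model and uses the median and the scaled median absolute deviation (MAD) in place of the mean and standard deviation, so that the outliers themselves do not distort the estimates. The function name, the threshold of 3.5, and the toy data are assumptions for illustration, not something the abstract prescribes.

```python
from statistics import median


def robust_z_outliers(values, threshold=3.5):
    """Indices of values far from the centre under an assumed normal model.

    Uses |x - median| / (1.4826 * MAD) as a robust z-score. The constant
    1.4826 makes the MAD comparable to the standard deviation for normal
    data. The threshold 3.5 is a conventional choice, assumed here.
    """
    center = median(values)
    mad = median([abs(v - center) for v in values])
    scale = 1.4826 * mad
    if scale == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    return [i for i, v in enumerate(values)
            if abs(v - center) / scale > threshold]


# Hypothetical toy data: seven ordinary readings and one extreme value.
readings = [10, 11, 9, 10, 12, 10, 11, 50]
suspects = robust_z_outliers(readings)
```

With these readings the median is 10.5 and the MAD is 0.5, so the value 50 has a robust z-score above 50 and is the only index returned, while the other readings score at most about 2.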

