Detecting Outliers in High Dimensional Data Sets using Z-Score Methodology

Outlier detection is an active research area in machine learning. With recently emerging tools and varied applications, interest in outlier detection is growing significantly. A significant number of outlier detection approaches have been proposed and applied effectively in a wide range of fields, including medical health, credit card fraud, and intrusion detection. These approaches can also support conservative data analysis. Outlier detection aims to find patterns in data that do not conform to expected behavior. In this paper, we present a statistical approach, the Z-score method, for outlier detection in high-dimensional data. The Z-score method, as applied here, identifies distant data points from their positions on charts. The proposed method is computationally fast and robust for outlier detection. A comparative analysis with existing methods is carried out on high-dimensional datasets. Experimental results show improved performance, efficiency, and effectiveness for the proposed method.
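
The abstract does not give the details of its Z-score procedure, so the following is only a minimal sketch of the standard Z-score rule, under stated assumptions: each column is standardized independently, a row is flagged when any of its columns has an absolute Z-score above a cutoff, and the cutoff of 3.0 is a common convention rather than a value taken from the paper. The function names `zscore_outliers` and `flag_rows` are hypothetical.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return indices of values whose absolute Z-score exceeds threshold.

    The threshold of 3.0 is an assumed convention, not a value from the paper.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        # A constant column has no spread, so nothing can be flagged.
        return []
    return [i for i, v in enumerate(values) if abs((v - mean) / std) > threshold]


def flag_rows(rows, threshold=3.0):
    """Flag a row if any of its columns is a Z-score outlier (an assumption)."""
    columns = list(zip(*rows))  # transpose rows into columns
    flagged = set()
    for column in columns:
        flagged.update(zscore_outliers(column, threshold))
    return sorted(flagged)


if __name__ == "__main__":
    # Twenty ordinary rows plus two rows that are extreme in one column each.
    data = [(10.0, 1.0)] * 20 + [(100.0, 1.0), (10.0, 50.0)]
    print(flag_rows(data))
```

Treating columns independently keeps the cost linear in the number of values, but it ignores correlations between dimensions; that trade-off is a property of this sketch, not a claim about the paper's method.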

2012, Vol 6-7, pp. 621-624
Author(s): Hong Bin Fang

Outlier detection is an important field of data mining and is widely used in credit card fraud detection, network intrusion detection, and similar applications. The paper gives a similarity metric function for high-dimensional data and the concept of class density, based on a combination of hierarchical clustering and similarity. After redefining density-based outliers in high dimensions, it presents an outlier detection algorithm built on the similarity measure. Experimental results indicate that the algorithm has some value for outlier detection in high-dimensional data sets.


Outlier detection in large datasets is an active research area in computer science, spanning data mining, database systems, and distributed systems. Outlier detection faces many challenges due to the absence of data samples from the outlier class. Many algorithms have been proposed to overcome the challenges in this field and to improve the efficiency of regression approaches for large datasets. Currently, no particularly efficient regression technique is designed for outlier detection. In this research, we propose an ElasticNet regression model for detecting outliers in high-dimensional data. To validate the efficiency and competence of the proposed algorithm, it is implemented in the open-source software Weka Explorer. The following metrics are calculated on the annthyroid dataset: mean absolute error 0.0022, RMSE 0.0387, relative absolute error (RAE) 0.4562, and root relative squared error (RSE) 7.8722. The ElasticNet model consumes less computational time, converges quickly, and provides high accuracy, with 98.25% of instances correctly classified.
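
For reference, the ElasticNet estimator is commonly written as the penalized least-squares problem below. This is the standard textbook parametrization, not a formula quoted from the paper; the symbols $X$ (the $n \times p$ design matrix), $y$ (response), $\lambda \ge 0$ (overall penalty strength), and $\alpha \in [0, 1]$ (mixing weight) are assumptions introduced here, and the abstract does not state which settings were used in Weka.

```latex
\hat{\beta}
  = \arg\min_{\beta \in \mathbb{R}^{p}}
    \frac{1}{2n} \lVert y - X\beta \rVert_2^2
    + \lambda \left( \alpha \lVert \beta \rVert_1
    + \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 \right)
```

Setting $\alpha = 1$ leaves only the $\ell_1$ term, which gives the Lasso penalty; setting $\alpha = 0$ gives ridge regression.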


Detecting outliers has become a significant research area in data mining in the last few years. The focus of this research has been to identify patterns or objects in the huge data sets of a database that deviate from the normal pattern, that is, those that are dissimilar from and inconsistent with most of the data. As the number of personal computers and internet users has risen phenomenally, real-life applications have generated huge data sets that pose new challenges and open new directions for research in outlier detection. Many traditional outlier detection techniques have been unable to yield good results in such environments, so developing a method to detect outliers has become a critical task. This research studies a method based on Lasso regression for identifying anomalies in high-dimensional data. The framework has been implemented in the JMP software. The following values are calculated on the Spambase dataset: RSquare 0.001162, RMSE 0.031806, and Mean Response 0.007889. The experimental results show that the proposed method detects outliers in high-dimensional data with potentially higher accuracy.

