An efficient local outlier detection optimized by rough clustering

Journal of Intelligent & Fuzzy Systems, 2021, pp. 1-12. DOI: 10.3233/jifs-211433

Author(s): Chunyan She, Shaohua Zeng

Keyword(s): Outlier Detection, False Positive Rate, Time Cost, Local Distribution, Running Speed, Local Outlier Factor, Real World Applications, Positive Rate, Nearest Neighborhood, Local Outlier

Outlier detection is an active research topic in data mining with many real-world applications. LOF (Local Outlier Factor) can capture the degree of abnormality of objects in datasets with different density levels, and many extensions of it have been proposed in recent years. However, LOF must search for the nearest neighbors of every object over the whole dataset, which greatly increases the time cost. Moreover, most of these extensions consider only the distance between an object and its neighborhood and ignore the local distribution of the object within that neighborhood, which results in a high false-positive rate. To improve running speed, a rough clustering method based on triple fusion is proposed; it divides a dataset into several subsets, and outlier detection is performed only within each subset. Then, taking into account the local distribution of an object within its neighborhood, a new local outlier factor is constructed to estimate the degree of abnormality of each object. The experimental results indicate that the proposed algorithm achieves better performance and lower running time than the compared algorithms.
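
The abstract does not specify the triple-fusion clustering or the new local outlier factor, so the sketch below only illustrates two ideas it builds on: the classic LOF score (reachability distance, local reachability density, ratio to the neighbors' density) and the partition-then-detect strategy, where LOF is computed only inside each subset. The `labels` argument is a hypothetical stand-in for the output of the rough clustering step, and all function names are illustrative.

```python
import math
from collections import defaultdict


def lof_scores(points, k):
    """Classic LOF: scores well above 1 mean an object is less dense than its neighbors.

    Uses exactly k nearest neighbors per point (ties broken by index) and a
    brute-force O(n^2) neighbor search, which is enough for an illustration.
    """
    n = len(points)
    if n <= k:
        raise ValueError("each (sub)set needs more than k points")
    neighbors, k_dist = [], []
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        neighbors.append([j for _, j in dists[:k]])
        k_dist.append(dists[k - 1][0])  # distance to the k-th nearest neighbor
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], math.dist(points[i], points[j])) for j in neighbors[i]]
        lrd.append(1.0 / (sum(reach) / k + 1e-10))  # small constant guards against duplicates
    # LOF: mean density of the neighbors divided by the object's own density.
    return [sum(lrd[j] for j in neighbors[i]) / k / lrd[i] for i in range(n)]


def lof_per_subset(points, labels, k):
    """Run LOF separately inside each subset; labels[i] names the subset of points[i]."""
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)
    scores = [0.0] * len(points)
    for idxs in groups.values():
        subset_scores = lof_scores([points[i] for i in idxs], k)
        for i, s in zip(idxs, subset_scores):
            scores[i] = s
    return scores


if __name__ == "__main__":
    grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
    pts = grid + [(6.0, 6.0)] + [(x + 100.0, y + 100.0) for x, y in grid]
    labels = [0] * 10 + [1] * 9  # hypothetical output of a clustering step
    for p, s in zip(pts, lof_per_subset(pts, labels, k=3)):
        print(p, round(s, 2))
```

Because each subset is searched separately, the brute-force neighbor search costs roughly the sum of the squared subset sizes instead of the square of the full dataset size, which is the kind of saving the partitioning step targets.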

Outlier Detection Method for Flash Flood Disaster Monitoring Data based on Information Entropy

Journal of Physics: Conference Series, 2021, Vol 2138 (1), pp. 012013. DOI: 10.1088/1742-6596/2138/1/012013

Author(s): Yongzhi Chen, Ziao Xu, Chaoqun Niu

Keyword(s): Outlier Detection, Information Entropy, Detection Method, Flash Flood, False Positive Rate, Flood Disaster, Detection Methods, Positive Rate, Disaster Monitoring, Local Outlier

In research on flash flood disaster monitoring and early warning, the Internet of Things is widely used for real-time information collection. The large volume of data collected by sensors contains anomalies such as noise, duplicate records and errors, which can cause false alarms, lower prediction accuracy and other problems. Because outliers in a sensor data stream cause obvious fluctuations in information entropy, this paper proposes a local outlier detection method based on information entropy and optimized by a sliding window and LOF (Local Outlier Factor). The method can improve data quality and thereby the accuracy of disaster prediction. Applied to stream processing of water sensor data, the method detects outliers accurately in the experiments. Compared with existing detection methods that rely only on data distance, it improves the positive detection rate and reduces the false positive rate.
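
As a rough illustration of why outliers in a sensor stream show up as entropy fluctuations, the sketch below discretizes readings into fixed-width bins and measures how much the Shannon entropy of a sliding window changes when each new reading is appended. The bin width, window size and threshold are hypothetical parameters, and the LOF refinement step mentioned in the abstract is not reproduced here.

```python
import math
from collections import Counter


def shannon_entropy(values, bin_width):
    """Shannon entropy (bits) of values discretized into bins of width bin_width."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_jumps(stream, window, bin_width, threshold):
    """Indices i where appending stream[i] to the preceding `window` readings
    changes the entropy by more than `threshold` bits."""
    flagged = []
    for i in range(window, len(stream)):
        before = shannon_entropy(stream[i - window:i], bin_width)
        after = shannon_entropy(stream[i - window:i + 1], bin_width)
        if abs(after - before) > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    levels = [1.0] * 20 + [5.0] + [1.0] * 20  # toy sensor readings with one spike
    print(entropy_jumps(levels, window=10, bin_width=0.5, threshold=0.3))  # [20]
```

Comparing the window with and without the newest reading (rather than two shifted windows) keeps a spike from being flagged a second time when it later drops out of the window.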

Outlier Detection for Transformer's Oil Chromatographic Data Based on Metric Learning and the Weighted Local Outlier Factor

2019 6th International Conference on Systems and Informatics (ICSAI), 2019. DOI: 10.1109/icsai48974.2019.9010155

Author(s): Jiafeng Qin, Yi Yang, Chao Gu, Zijing Hong, Hongyi Du

Keyword(s): Outlier Detection, Metric Learning, Chromatographic Data, Local Outlier Factor, Local Outlier

Density-based Outlier Detection by Local Outlier Factor on Largescale Traffic Data

Electronic Imaging, 2016, Vol 2016 (14), pp. 1-4. DOI: 10.2352/issn.2470-1173.2016.14.ipmva-385

Author(s): Mathew X Ma, Henry Y.T Ngan, Wei Liu

Keyword(s): Outlier Detection, Traffic Data, Local Outlier Factor, Local Outlier

Improving the outlier detection method in concrete mix design by combining the isolation forest and local outlier factor

Construction and Building Materials, 2020, pp. 121396. DOI: 10.1016/j.conbuildmat.2020.121396

Author(s): Raed Alsini, Abdullah Almakrab, Ahmed Ibrahim, Xiaogang Ma

Keyword(s): Outlier Detection, Detection Method, Mix Design, Local Outlier Factor, Concrete Mix Design, Concrete Mix, Local Outlier, Isolation Forest

Identification of Influential Variants in Significant Aggregate Rare Variant Tests

Human Heredity, 2021, pp. 1-13. DOI: 10.1159/000513290

Author(s): Rachel Z. Blumhagen, David A. Schwartz, Carl D. Langefeld, Tasha E. Fingerlin

Keyword(s): Outlier Detection, Rare Variant, False Positive, Rare Variants, False Positive Rate, Experimental Studies, Detection Methods, True Positive, Adaptive Combination, Positive Rate

Introduction: Studies that examine the role of rare variants in both simple and complex disease are increasingly common. Although testing rare variants in aggregate sets is more powerful than testing individual variants, it is of interest to identify the variants that are plausible drivers of the association. We present a novel method for prioritizing rare variants after a significant aggregate test by quantifying each variant's influence on the aggregate test of association.

Methods: In addition to providing a measure used to rank variants, we use outlier detection methods to build the computationally efficient Rare Variant Influential Filtering Tool (RIFT), which identifies a subset of variants that influence the disease association. We evaluated several outlier detection methods that differ in the underlying variance measure: interquartile range (Tukey fences), median absolute deviation, and SD. We performed 1,000 simulations for 50 regions of size 3 kb and compared the true and false positive rates. We compared RIFT using the inner Tukey fence with 2 existing methods: adaptive combination of p values (ADA) and a Bayesian hierarchical model (BeviMed). Finally, we applied the method to data from our targeted resequencing study in idiopathic pulmonary fibrosis (IPF).

Results: All outlier detection methods showed higher sensitivity for uncommon variants (0.001 < minor allele frequency [MAF] < 0.03) than for very rare variants (MAF < 0.001). For uncommon variants, RIFT had a lower median false positive rate than ADA. ADA and RIFT had significantly higher true positive rates than BeviMed. When applied to 2 regions previously found to be associated with IPF and containing 100 rare variants, we identified 6 polymorphisms with the greatest evidence for influencing the association with IPF.

Discussion: In summary, RIFT has a high true positive rate while maintaining a low false positive rate for identifying polymorphisms that influence rare variant association tests. This work provides an approach to obtain greater resolution of rare variant signals within significant aggregate sets; this information can provide an objective measure for prioritizing variants for follow-up experimental studies and insight into the biological pathways involved.
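
The three variance measures compared for RIFT correspond to standard cut-off rules. The sketch below applies each one to a dictionary of per-variant influence scores, assuming (for illustration only) that large influence values are the ones to flag. RIFT's actual influence statistic is not given in the abstract, the scores are made up, and the multipliers (1.5 × IQR for the inner Tukey fence, 3 scaled MADs, 3 SDs) are conventional choices rather than values taken from the paper.

```python
import statistics


def tukey_upper(values, k=1.5):
    """Upper Tukey fence Q3 + k*IQR (k=1.5 is the usual inner fence, k=3 the outer)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def mad_upper(values, c=3.0):
    """Median + c * MAD, with MAD scaled by 1.4826 to match an SD under normality."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return med + c * 1.4826 * mad


def sd_upper(values, c=3.0):
    """Mean + c * sample standard deviation."""
    return statistics.mean(values) + c * statistics.stdev(values)


def flag_influential(influence, rule):
    """Names whose influence score exceeds the cut-off computed by `rule`."""
    cutoff = rule(list(influence.values()))
    return sorted(name for name, value in influence.items() if value > cutoff)


# Toy influence scores for eight variants in one aggregate set (made-up numbers).
TOY_INFLUENCE = {"v1": 0.10, "v2": 0.12, "v3": 0.09, "v4": 0.11,
                 "v5": 0.10, "v6": 0.13, "v7": 0.08, "v8": 0.95}

if __name__ == "__main__":
    for rule in (tukey_upper, mad_upper, sd_upper):
        print(rule.__name__, flag_influential(TOY_INFLUENCE, rule))
```

On this toy input the SD rule flags nothing: the single large value inflates both the mean and the standard deviation enough to hide itself, a masking effect to which the quartile- and median-based rules are much less sensitive.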

A Hybrid Vertex Outlier Detection Method Based on Distributed Representation and Local Outlier Factor

2015 IEEE 12th Intl Conf on Ubiquitous Intelligence and Computing and 2015 IEEE 12th Intl Conf on Autonomic and Trusted Computing and 2015 IEEE 15th Intl Conf on Scalable Computing and Communications and Its Associated Workshops (UIC-ATC-ScalCom), 2015. DOI: 10.1109/uic-atc-scalcom-cbdcom-iop.2015.104

Author(s): Zili Li, Li Zeng

Keyword(s): Outlier Detection, Detection Method, Distributed Representation, Local Outlier Factor, Local Outlier

A Review of Local Outlier Factor Algorithms for Outlier Detection in Big Data Streams

Big Data and Cognitive Computing, 2020, Vol 5 (1), pp. 1. DOI: 10.3390/bdcc5010001

Author(s): Omar Alghushairy, Raed Alsini, Terence Soule, Xiaogang Ma

Keyword(s): Big Data, Outlier Detection, Data Streams, Detection Methods, Normal Range, Local Outlier Factor, Detection Algorithms, Network Intrusion, Entire Dataset, Local Outlier

Outlier detection is a statistical procedure that aims to find suspicious events or items that deviate from the normal pattern of a dataset. It has drawn considerable interest in the fields of data mining and machine learning. Outlier detection is important in many applications, including fraud detection in credit card transactions and network intrusion detection. There are two general types of outlier detection: global and local. Global outliers fall outside the normal range for an entire dataset, whereas local outliers may fall within the normal range for the entire dataset but outside the normal range for the surrounding data points. This paper addresses local outlier detection. The best-known technique for local outlier detection is the Local Outlier Factor (LOF), a density-based technique. There are many LOF algorithms for static data environments; however, these algorithms cannot be applied directly to data streams, which are an important type of big data. In general, local outlier detection algorithms for data streams are still deficient, and better algorithms are needed that can cope with the high velocity of data streams when detecting local outliers. This paper presents a literature review of local outlier detection algorithms in static and stream environments, with an emphasis on LOF algorithms. It collects and categorizes existing local outlier detection algorithms and analyzes their characteristics. Furthermore, the paper discusses the advantages and limitations of those algorithms and proposes several promising directions for developing improved local outlier detection methods for data streams.
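
To make concrete why static LOF does not transfer directly to streams, the sketch below shows a naive baseline rather than any algorithm from the review: keep only the most recent `window` points and recompute the LOF of each arriving point against them from scratch. Every arrival triggers a brute-force neighbor search over the window, which is the kind of overhead that incremental, stream-oriented LOF variants try to avoid by updating only the affected neighborhoods. Function names and parameters are illustrative.

```python
import math
from collections import deque


def lof_of_newest(points, k):
    """LOF of points[-1] relative to all of `points` (exactly k neighbors, brute force)."""
    n = len(points)
    cache = {}

    def knn(i):
        # (neighbor indices, k-distance) for point i, computed lazily and cached
        if i not in cache:
            d = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
            cache[i] = ([j for _, j in d[:k]], d[k - 1][0])
        return cache[i]

    def lrd(i):
        # local reachability density of point i
        nbrs, _ = knn(i)
        reach = [max(knn(j)[1], math.dist(points[i], points[j])) for j in nbrs]
        return 1.0 / (sum(reach) / k + 1e-10)

    newest = n - 1
    nbrs, _ = knn(newest)
    return sum(lrd(j) for j in nbrs) / k / lrd(newest)


def windowed_lof(stream, window, k):
    """Score each arrival against the previous `window` points; returns {index: LOF}."""
    if window < k:
        raise ValueError("window must hold at least k points")
    recent = deque(maxlen=window)  # the oldest point is evicted automatically
    scores = {}
    for idx, point in enumerate(stream):
        if len(recent) == window:
            scores[idx] = lof_of_newest(list(recent) + [point], k)
        recent.append(point)  # naive: every arrival enters the window, flagged or not
    return scores


if __name__ == "__main__":
    grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
    stream = grid + [(0.5, 0.5), (6.0, 6.0)]  # a normal arrival, then an outlying one
    for idx, score in windowed_lof(stream, window=9, k=3).items():
        print(idx, stream[idx], round(score, 2))
```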

Dynamic graph embedding for outlier detection on multiple meteorological time series

PLoS ONE, 2021, Vol 16 (2), pp. e0247119. DOI: 10.1371/journal.pone.0247119

Author(s): Gen Li, Jason J. Jung

Keyword(s): Time Series, Outlier Detection, Meteorological Data, Graph Embedding, Detection Methods, Dynamic Graph, Local Outlier Factor, Box Plot, Local Outlier, Isolation Forest

Existing dynamic graph embedding-based outlier detection methods mainly focus on the evolution of graphs and ignore the similarities among them. To overcome this limitation and effectively detect abnormal climatic events in meteorological time series, we propose a dynamic graph embedding model based on graph proximity, called DynGPE. Each climatic event is represented as a graph in which each vertex represents meteorological data and each edge represents a spurious relationship between two meteorological time series that are not causally related. Graph proximity is defined as the distance between two graphs. DynGPE clusters similar climatic events in the embedding space, so abnormal climatic events lie far from most other events and can be detected with outlier detection methods. We conducted experiments applying three outlier detection methods (isolation forest, local outlier factor, and box plot) to real meteorological data. DynGPE outperformed the baseline by 44.3% on average in terms of F-measure. Isolation forest provided the best performance and stability, outperforming the local outlier factor and box plot methods by 15.4% and 78.9% on average, respectively.
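
DynGPE's graph-proximity embedding is not described in enough detail to reproduce, but the detector that performed best on top of it, isolation forest, is simple to sketch. Below is a minimal from-scratch version that operates on plain numeric vectors standing in for the event embeddings: random axis-parallel splits isolate points, and points isolated after fewer splits receive scores closer to 1. The tree count, subsample size and seed are assumed defaults, not values from the paper.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful search in a binary search tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value until isolation or max depth."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:  # all values equal on the chosen feature: stop here (simplification)
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus c(leaf size) for the subtree not built."""
    if node[0] == "leaf":
        return depth + c_factor(node[1])
    _, dim, split, left, right = node
    return path_length(x, left if x[dim] < split else right, depth + 1)


def isolation_scores(points, n_trees=100, sample_size=256, seed=0):
    """Anomaly score 2^(-E[h(x)] / c(m)); values near 1 indicate easily isolated points."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    m = min(sample_size, len(points))
    max_depth = max(1, math.ceil(math.log2(m)))
    trees = [build_tree(rng.sample(points, m), 0, max_depth, rng) for _ in range(n_trees)]
    norm = c_factor(m)
    return [2.0 ** (-(sum(path_length(x, t) for t in trees) / n_trees) / norm)
            for x in points]


if __name__ == "__main__":
    # Toy 2-D "embeddings": a 5 x 5 grid of ordinary events plus one distant event.
    events = [(float(x), float(y)) for x in range(5) for y in range(5)] + [(20.0, 20.0)]
    scores = isolation_scores(events)
    print(max(range(len(events)), key=scores.__getitem__))  # index of the top-scoring event
```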

Spatial Outlier Detection of CO2 Monitoring Data Based on Spatial Local Outlier Factor

Journal of Engineering Science and Technology Review, 2015, Vol 8 (5), pp. 110-116. DOI: 10.25103/jestr.085.15

Author(s): Liu Xin, Zhang Shaoliang, Zheng Pulin, ...

Keyword(s): Outlier Detection, Monitoring Data, Spatial Outlier, Local Outlier Factor, CO2 Monitoring, Local Outlier

Structural similarity based common library detection method for Android

Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University, 2021, Vol 39 (2), pp. 448-453. DOI: 10.1051/jnwpu/20213920448

Author(s): Zhiying Mu, Zhihu Li, Xiaoyu Li

Keyword(s): Large Scale, Detection Method, False Positive Rate, Structural Similarity, Detection Methods, Fine Grained, Real World Applications, Positive Rate, Android Applications, Detection Speed

Correctly classifying and filtering common libraries in Android applications can effectively improve the accuracy of repackaged-application detection. However, existing common library detection methods can barely meet the requirements of large-scale app markets because their classification rules make detection slow. To address this problem, a structural-similarity-based common library detection method for Android is presented. Sub-packages weakly associated with the main package are extracted from the decompiled APK (Android application package) as common library candidates by using a PDG (program dependency graph). Using package structures and API calls as features, the candidates are then classified through coarse- and fine-grained filtering. Experiments on a dataset of real-world applications show that the method detects common libraries faster while maintaining accuracy and a low false positive rate, demonstrating that it is both efficient and precise.
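
The abstract does not define its similarity measure, so the sketch below assumes Jaccard similarity over two hypothetical feature sets per sub-package: its package structure (relative sub-package paths) for the cheap coarse filter, and its API calls for the fine-grained step. PDG-based candidate extraction is out of scope, and the library fingerprints (`lib_net`, `lib_json`) and thresholds are toy values invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageFeatures:
    """Hypothetical features of one sub-package (a library candidate or a known library)."""
    name: str
    structure: frozenset  # relative sub-package paths, e.g. "internal/http"
    api_calls: frozenset  # framework APIs invoked from the package


def jaccard(a, b):
    """Size of the intersection over size of the union; 1.0 for two empty sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify_candidate(candidate, known_libraries, coarse=0.5, fine=0.6):
    """Return the best-matching known library name, or None if nothing passes both filters."""
    # Coarse filter: a cheap structural comparison prunes most known libraries.
    shortlist = [lib for lib in known_libraries
                 if jaccard(candidate.structure, lib.structure) >= coarse]
    # Fine-grained filter: compare API-call sets only for the survivors.
    best_name, best_score = None, fine
    for lib in shortlist:
        score = jaccard(candidate.api_calls, lib.api_calls)
        if score >= best_score:
            best_name, best_score = lib.name, score
    return best_name


# Toy fingerprints of two known libraries (invented feature sets).
KNOWN = [
    PackageFeatures("lib_net", frozenset({"internal", "internal/http", "ws"}),
                    frozenset({"Socket.connect", "SSLContext.getInstance"})),
    PackageFeatures("lib_json", frozenset({"internal", "reflect", "stream"}),
                    frozenset({"Field.get", "Class.getDeclaredFields"})),
]

if __name__ == "__main__":
    candidate = PackageFeatures("a/b", frozenset({"internal", "internal/http", "ws", "extra"}),
                                frozenset({"Socket.connect", "SSLContext.getInstance", "Log.d"}))
    print(classify_candidate(candidate, KNOWN))
```

Running the cheap structural comparison first means the more detailed API-call comparison is only paid for the few libraries that survive, which mirrors the speed motivation given in the abstract.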
