Robust outlier detection in geo-spatial data based on LOLIMOT and KNN search

Author(s):  
Mohammadreza Tabatabaei ◽  
Roohollah Kimiaefar ◽  
Alireza Hajian ◽  
Alireza Akbari
2004 ◽  
Vol 13 (04) ◽  
pp. 801-811 ◽  
Author(s):  
CHANG-TIEN LU ◽  
DECHANG CHEN ◽  
YUFENG KOU

A spatial outlier is a spatially referenced object whose non-spatial attribute values are significantly different from the values of its neighborhood. Identification of spatial outliers can lead to the discovery of unexpected, interesting, and useful spatial patterns for further analysis. Previous work in spatial outlier detection focuses on detecting spatial outliers with a single attribute. In this paper, we propose two approaches to discover spatial outliers with multiple attributes. We formulate the multi-attribute spatial outlier detection problem in a general way, provide two effective detection algorithms, and analyze their computational complexity. In addition, using real-world census data, we demonstrate that our approaches can effectively identify local abnormality in large spatial data sets.
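
As a rough illustration of the definition above, the following Python sketch scores each object by how far its attribute values deviate from the average of its k nearest spatial neighbors. It is not a reproduction of the two algorithms proposed in the paper, which the abstract does not spell out. The function name, the default k, the brute-force neighbor search, and the "largest standardized difference" score are all assumptions made for this example.

```python
import math
from statistics import mean, pstdev


def neighborhood_difference_scores(points, values, k=3):
    """Hypothetical helper: one score per object.

    points: list of (x, y) locations.
    values: list of equal-length tuples of non-spatial attributes.
    The score is the largest absolute standardized difference between an
    object's attributes and the mean of its k nearest spatial neighbors.
    """
    n = len(points)
    n_attrs = len(values[0])
    diffs = []
    for i in range(n):
        # Brute-force k nearest spatial neighbors of object i (excluding i).
        neighbors = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )[:k]
        # Attribute-wise difference from the neighborhood average.
        diffs.append([
            values[i][a] - mean(values[j][a] for j in neighbors)
            for a in range(n_attrs)
        ])
    scores = [0.0] * n
    for a in range(n_attrs):
        column = [d[a] for d in diffs]
        mu, sigma = mean(column), pstdev(column)
        if sigma == 0:
            continue  # this attribute never deviates from its neighborhood
        for i in range(n):
            scores[i] = max(scores[i], abs(column[i] - mu) / sigma)
    return scores


# Toy data (invented for the example): a 3 x 3 grid where only the
# center object has an unusual first attribute.
points = [(x, y) for x in range(3) for y in range(3)]
values = [(10.0, 5.0)] * 9
values[4] = (50.0, 5.0)
scores = neighborhood_difference_scores(points, values)
```

On this toy grid the center object receives the highest score. Its neighbors also score above zero because the outlier inflates their neighborhood averages, so a real detector needs to guard against that side effect.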


2018 ◽  
Vol 2 ◽  
pp. e26104
Author(s):  
Julien Troudet ◽  
Fred Legendre ◽  
Régine Vignes-Lebbe

Primary biodiversity data, or occurrence data, are being produced at an increasing rate and are used in numerous studies (Hampton et al. 2013, La Salle et al. 2016). This data avalanche is a remarkable opportunity, but it comes with hurdles. First, available software solutions are rare for very large datasets and those solutions often require significant computer skills (Gaiji et al. 2013), while most biologists are not formally trained in bioinformatics (List et al. 2017). Second, large datasets are heterogeneous because they come from different producers and they can contain erroneous data (Gaiji et al. 2013). Hence, they need to be curated. In this context, we developed a biodiversity occurrence curator designed to quickly handle large amounts of data through a simple interface: the Darwin Core Spatial Processor (DwCSP). DwCSP does not require the installation or use of third-party software and has a simple graphical user interface that requires no computer knowledge. DwCSP allows for the data enrichment of biodiversity occurrences and also ensures data quality through outlier detection. For example, the software can enrich a tabulated occurrence file (Darwin Core for instance) with spatial data from polygon files (e.g., Esri shapefile) or a raster file (GeoTIFF). The speed of the enriching procedures is ensured through multithreading and optimized spatial access methods (R-Tree indexes). DwCSP can also detect and tag outliers based on their geographic coordinates or environmental variables. The first type of outlier detection uses a computed distance between the occurrence and its nearest neighbors, whereas the second type uses a Mahalanobis distance (Mahalanobis 1936). One hundred thousand occurrences can be processed by DwCSP in less than 20 minutes and another test on forty million occurrences was completed in a few days on a recent personal computer.
DwCSP has an English interface including documentation and will be available as a stand-alone Java Archive (JAR) executable that works on all computers having a Java environment (version 1.8 and onward).
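
The two kinds of outlier score mentioned in this abstract can be sketched in a few lines of Python. This is only an illustration of the general techniques, not DwCSP's implementation (DwCSP is a Java tool and the abstract does not give its formulas). The function names, the use of the mean distance to the k nearest neighbors, the restriction to two environmental variables, and the sample data are all assumptions made for this example.

```python
import math
from statistics import mean


def knn_distance_scores(coords, k=3):
    """Hypothetical helper: mean distance from each point to its
    k nearest neighbors (brute force, planar distance)."""
    scores = []
    for i, p in enumerate(coords):
        dists = sorted(math.dist(p, q) for j, q in enumerate(coords) if j != i)
        scores.append(mean(dists[:k]))
    return scores


def mahalanobis_2d(rows):
    """Hypothetical helper: Mahalanobis distance of each (u, v) row
    from the sample mean, for exactly two variables."""
    n = len(rows)
    mu = mean(r[0] for r in rows)
    mv = mean(r[1] for r in rows)
    # Sample covariance matrix [[suu, suv], [suv, svv]].
    suu = sum((r[0] - mu) ** 2 for r in rows) / (n - 1)
    svv = sum((r[1] - mv) ** 2 for r in rows) / (n - 1)
    suv = sum((r[0] - mu) * (r[1] - mv) for r in rows) / (n - 1)
    det = suu * svv - suv * suv
    if det == 0:
        raise ValueError("covariance matrix is singular")
    out = []
    for u, v in rows:
        du, dv = u - mu, v - mv
        # d^2 = x^T S^-1 x, using the closed-form inverse of a 2 x 2 matrix.
        d2 = (svv * du * du - 2 * suv * du * dv + suu * dv * dv) / det
        out.append(math.sqrt(max(d2, 0.0)))
    return out


# Invented sample data: five clustered occurrences and one far away.
coords = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
knn_scores = knn_distance_scores(coords)

# Invented (temperature, precipitation) values for the same occurrences.
env = [(20, 800), (21, 820), (19, 790), (20, 810), (22, 830), (5, 2000)]
env_scores = mahalanobis_2d(env)
```

In both cases a larger score marks a more isolated occurrence. The cutoff above which an occurrence is tagged as an outlier is a separate choice that this sketch leaves open.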


Author(s):  
Roopa G. M. ◽  
Arun Kumar G. H. ◽  
Naveen Kumar K. R. ◽  
Nirmala C. R.

Large volumes of agricultural data are collected with sensors to support crop management decisions, and spatial data on soil parameters such as N, P, K, pH, and EC help improve crop growth for a given soil type. Spatial data play a vital role in DSS, but inconsistent values lead to improper inferences. EDA shows that a few observations are outliers that distort crop management assessments. In the spatial data context, outliers are observations whose non-spatial attributes are distinct from those of other observations. Treating an entire field as a uniform area is therefore too simplistic and leads farmers to use expensive fertilizers. The Iterative-R algorithm is applied for outlier detection to reduce masking and swamping effects. Outlier-free data yield interpretable field patterns and satisfy statistical assumptions. For heterogeneous farms, the aim is to identify sub-fields and the corresponding percentages of fertilizers. MZD, achieved with an interpolation technique, predicts unobserved values from their known neighboring points. MZD gives farmers better knowledge of soil fertility, field variability, and fertilizer application rates.


2012 ◽  
Vol 2 (3) ◽  
pp. 98-101 ◽  
Author(s):  
E.Sateesh ◽  
M.L.Prasanthi
