A GRAPH-BASED APPROACH TO DETECT ABNORMAL SPATIAL POINTS AND REGIONS

2011 ◽  
Vol 20 (04) ◽  
pp. 721-751 ◽  
Author(s):  
CHANG-TIEN LU ◽  
RAIMUNDO F. DOS SANTOS ◽  
XUTONG LIU ◽  
YUFENG KOU

Spatial outliers are the spatial objects whose nonspatial attribute values are quite different from those of their spatial neighbors. Identification of spatial outliers is an important task for data mining researchers and geographers. A number of algorithms have been developed to detect spatial anomalies in meteorological images, transportation systems, and contagious disease data. In this paper, we propose a set of graph-based algorithms to identify spatial outliers. Our method first constructs a graph based on k-nearest neighbor relationship in spatial domain, assigns the differences of nonspatial attribute as edge weights, and continuously cuts high-weight edges to identify isolated points or regions that are much dissimilar to their neighboring objects. The proposed algorithms have three major advantages compared with other existing spatial outlier detection methods: accurate in detecting both point and region outliers, capable of avoiding false outliers, and capable of computing the local outlierness of an object within subgraphs. We present time complexity of the algorithms, and show experiments conducted on US housing and Census data to demonstrate the effectiveness of the proposed approaches.
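The edge-cutting idea in this abstract can be sketched as follows. This is a simplified illustration, not the authors' algorithm: it applies a single fixed threshold instead of cutting high-weight edges repeatedly, and the names `knn_graph`, `isolated_components`, `cut`, and `max_size` are hypothetical.

```python
from math import dist

def knn_graph(points, k):
    """Return undirected k-nearest-neighbor edges as (i, j) pairs with i < j."""
    edges = set()
    for i, p in enumerate(points):
        # Rank all other points by spatial distance and keep the k closest.
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist(p, points[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

def isolated_components(points, values, k, cut, max_size):
    """Drop edges whose attribute difference exceeds `cut` (assumed threshold)
    and return the connected components with at most `max_size` members."""
    kept = [(i, j) for i, j in knn_graph(points, k)
            if abs(values[i] - values[j]) <= cut]
    parent = list(range(len(points)))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in kept:
        parent[find(i)] = find(j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(g for g in groups.values() if len(g) <= max_size)

# Made-up data: a 3x3 grid whose center point has a very different value.
points = [(x, y) for x in range(3) for y in range(3)]
values = [10.0] * 9
values[4] = 100.0
print(isolated_components(points, values, k=3, cut=5.0, max_size=1))  # → [[4]]
```

Raising `max_size` would let the same routine report small isolated regions as well as single points.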

2021 ◽  
Vol 25 (6) ◽  
pp. 1453-1471
Author(s):  
Chunhua Tang ◽  
Han Wang ◽  
Zhiwen Wang ◽  
Xiangkun Zeng ◽  
Huaran Yan ◽  
...  

Most density-based clustering algorithms have the problems of difficult parameter setting, high time complexity, poor noise recognition, and weak clustering for datasets with uneven density. To solve these problems, this paper proposes FOP-OPTICS algorithm (Finding of the Ordering Peaks Based on OPTICS), which is a substantial improvement of OPTICS (Ordering Points To Identify the Clustering Structure). The proposed algorithm finds the demarcation point (DP) from the Augmented Cluster-Ordering generated by OPTICS and uses the reachability-distance of DP as the radius of neighborhood eps of its corresponding cluster. It overcomes the weakness of most algorithms in clustering datasets with uneven densities. By computing the distance of the k-nearest neighbor of each point, it reduces the time complexity of OPTICS; by calculating density-mutation points within the clusters, it can efficiently recognize noise. The experimental results show that FOP-OPTICS has the lowest time complexity, and outperforms other algorithms in parameter setting and noise recognition.
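The abstract mentions computing the distance to the k-nearest neighbor of each point. A minimal brute-force sketch of that quantity is shown below; how FOP-OPTICS uses it is not described in the abstract, and the function name `k_distance` is an assumption.

```python
from math import dist

def k_distance(points, k):
    """Distance from each point to its k-th nearest neighbor (brute force)."""
    result = []
    for i, p in enumerate(points):
        # Sorted distances from p to every other point.
        d = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(d[k - 1])
    return result

# Made-up data: three close points and one far away.
sample = [(0, 0), (1, 0), (2, 0), (10, 0)]
print(k_distance(sample, 2))  # → [2.0, 1.0, 2.0, 9.0]
```

A large k-distance, as for the last point, signals a sparse neighborhood.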


Symmetry ◽  
2019 ◽  
Vol 11 (6) ◽  
pp. 815 ◽  
Author(s):  
Minghui Ma ◽  
Shidong Liang ◽  
Yifei Qin

Traffic data are the basis of traffic control, planning, management, and other implementations. Incomplete traffic data hamper transport research and related activities and can cause adverse effects such as errors in traffic status identification and poor control performance. For intelligent transportation systems, data recovery strategies have become increasingly important because traffic system applications rely on the quality of traffic data. In this study, a bidirectional k-nearest neighbor searching strategy was constructed for effectively detecting and recovering abnormal data, considering the symmetric time network and the correlation of the traffic data in the time dimension. Moreover, the state vector of the proposed bidirectional searching strategy was designed based on bidirectional retrieval to enhance accuracy. The proposed bidirectional searching strategy is significantly more accurate than previous methods.


Author(s):  
Heru Ismanto ◽  
Retantyo Wardoyo

Developing a sustainable activity requires a good plan so that programs are effective and have clear objectives. A model that supports the analysis is therefore needed to determine priority areas for future development. This research applies the concept of Klassen typology to analyze PDRB (gross regional domestic product) data in Papua Province. The Klassen typology analysis yields four quadrants of area classification in Papua Province. Twenty-nine regencies were analyzed based on PDRB data to investigate which areas can serve as priority areas for future development. The methods used in this study are C4.5 and K-Nearest Neighbor. Time complexity serves as the test standard for how efficiently an algorithm executes when it is implemented in a programming language. Asymptotic analysis using Big-O notation is one of the techniques commonly used to assess the time complexity of an algorithm. The test results show that the running time of KNN is more stable than that of C4.5, although the Big-O analysis gives the same complexity for both.


Author(s):  
Aditya Ashvin Doshi ◽  
Prabu Sevugan ◽  
P. Swarnalatha

A number of methodologies are available in the fields of data mining, machine learning, and pattern recognition for solving classification problems. In the past few years, retrieval and extraction of information from large amounts of data has grown rapidly. Classification is a stepwise process of predicting responses from existing data. Existing prediction algorithms include the support vector machine and k-nearest neighbor, but each has drawbacks depending on the type of data. To reduce misclassification, a new support vector machine methodology is introduced. Instead of placing the hyperplane exactly in the middle, its position is changed according to the number of data points of each class near the hyperplane. To reduce the computation time of the classification algorithm, a multi-core architecture is used to compute more than one independent module simultaneously. Together, these changes reduce misclassification and speed up computation of the class for a data point.


2019 ◽  
Vol 52 (7-8) ◽  
pp. 985-994
Author(s):  
Mustafa Teke ◽  
Fecir Duran

Intelligent transportation systems are advanced applications that inform vehicle drivers about road conditions. The main purpose of the intelligent transportation systems is to reduce either tangible or intangible loss for the drivers by ensuring the safety of passengers and vehicles. In this study, a system is designed and implemented using wireless sensor networks to inform vehicle drivers about the condition of the road surface. Icing has been chosen as the primary focus of the study since it is considered to be a big threat to road and driver’s safety. The temperature at 10 cm depth of the road, air temperature, relative humidity, air pressure and conductivity values are used as the input data for the prediction of icing on the road surface. The data were previously collected on Raspberry Pi which is a single-board computer and the data were read and processed instantly via k-nearest neighbor algorithm. Using these collected data, the road surface condition is classified as icy, dry, wet or salty-wet. The analyzed results for the road surface condition are presented to the drivers via a mobile application in real time. The drivers are alerted visually and audibly as they approach the coordinates on the road where risky conditions are present.
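The classification step described in this abstract can be illustrated with a plain k-nearest neighbor majority vote. This is a sketch only: the feature order (road temperature at depth, air temperature, relative humidity, air pressure, conductivity) follows the abstract, but every number below is made up, and the function name `knn_classify` is hypothetical. In practice the features would likely need scaling so that no single unit dominates the distance.

```python
from collections import Counter
from math import dist

def knn_classify(train, query, k):
    """Label `query` by majority vote among its k nearest training samples."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up readings: (road temp, air temp, humidity %, pressure hPa, conductivity)
train = [
    ((-3.0, -5.0, 90.0, 1010.0, 0.1), "icy"),
    ((-2.0, -4.0, 85.0, 1012.0, 0.2), "icy"),
    ((15.0, 18.0, 40.0, 1015.0, 0.0), "dry"),
    ((14.0, 17.0, 45.0, 1013.0, 0.0), "dry"),
    ((8.0, 9.0, 95.0, 1005.0, 0.5), "wet"),
    ((7.0, 8.0, 92.0, 1006.0, 0.6), "wet"),
]
print(knn_classify(train, (-2.5, -4.5, 88.0, 1011.0, 0.15), k=3))  # → icy
```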


Author(s):  
Md. Ashikuzzaman ◽  
Wasim Akram ◽  
Md. Mydul Islam Anik ◽  
Mahamudul Hasan ◽  
Md. Sawkat Ali ◽  
...  

Traffic accidents are a global threat that causes health and economic losses all around the world. As transportation systems expand, congestion can lead to a spike in road accidents. Every year thousands of people die in traffic accidents. Modern cities have adopted various technologies to minimize traffic accidents, and the concept of the smart city has been introduced to ensure people's safety. In a smart city, factors such as road, light, and weather conditions are important to consider when predicting traffic mishaps. Several machine learning models have been implemented in the existing literature to determine and predict traffic collisions, but their accuracy is insufficient and many challenges remain in determining accidents. In this paper, an approach combining particle swarm optimization with an artificial neural network (PSO-ANN) is proposed to determine traffic collisions using the dataset of the transport department of the United Kingdom. PSO-ANN outperforms the existing machine learning models. The PSO-ANN model can be adopted in transportation systems to counter traffic accident issues. Random Forest, Naïve Bayes, Nearest Centroid, and K-Nearest Neighbor classification were used for comparison with the proposed PSO-ANN model.


2021 ◽  
Vol 15 (3) ◽  
pp. 1-22
Author(s):  
Shi Ying ◽  
Bingming Wang ◽  
Lu Wang ◽  
Qingshan Li ◽  
Yishi Zhao ◽  
...  

Logs that record abnormal system states (anomaly logs) can be regarded as outliers, and the k-Nearest Neighbor (kNN) algorithm has relatively high accuracy among outlier detection methods. Therefore, we use the kNN algorithm to detect anomalies in log data. However, using the kNN algorithm for anomaly detection raises several problems, three of which are: excessive vector dimension makes the kNN algorithm inefficient, unlabeled log data cannot support the kNN algorithm, and imbalance in the number of log data distorts the classification decision of the kNN algorithm. To solve these three problems, we propose an efficient log anomaly detection method based on an improved kNN algorithm with an automatically labeled sample set. The method first proposes a log parsing method based on N-gram and frequent pattern mining (FPM), which reduces the dimension of the log vector converted with Term Frequency-Inverse Document Frequency (TF-IDF) technology. We then use clustering and self-training to obtain a labeled log data sample set from historical logs automatically. Finally, we improve the kNN algorithm using average weighting technology, which improves the accuracy of the kNN algorithm on unbalanced samples. The method in this article is validated on six log datasets of different types.
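The TF-IDF conversion mentioned in this abstract can be sketched as below. This assumes one common variant (term frequency as count divided by document length, inverse document frequency as the natural log of N over document frequency); the abstract does not say which variant the authors use, and the log messages are made up.

```python
from collections import Counter
from math import log

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({t: (c / len(doc)) * log(n / df[t])
                        for t, c in counts.items()})
    return vectors

# Made-up, already tokenized log messages.
logs = [
    ["connection", "opened"],
    ["connection", "closed"],
    ["disk", "failure", "detected"],
]
vecs = tfidf(logs)
# "opened" occurs in one message, "connection" in two, so "opened" weighs more.
print(vecs[0]["opened"] > vecs[0]["connection"])  # → True
```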


Author(s):  
M. Jeyanthi ◽  
C. Velayutham

In science and technology development, BCI plays a vital role in research. Classification is a data mining technique used to predict group membership for data instances. Analyses of BCI data are challenging because feature extraction and classification of these data are more difficult than those applied to raw data. In this paper, we extract statistical Haralick features from the raw EEG data. The features are then normalized, binning is used to improve the accuracy of the predictive models by reducing noise and eliminating some irrelevant attributes, and classification is performed on a BCI dataset using different techniques such as Naïve Bayes, the k-nearest neighbor classifier, and the SVM classifier. Finally, we propose the SVM classification algorithm for the BCI dataset.


2020 ◽  
Vol 17 (1) ◽  
pp. 319-328
Author(s):  
Ade Muchlis Maulana Anwar ◽  
Prihastuti Harsani ◽  
Aries Maesya

Population data is individual or aggregate data that is structured as a result of Population Registration and Civil Registration activities. A Birth Certificate is a Civil Registration Deed that results from recording the birth of a baby whose birth is reported for registration on the Family Card and who is given a Population Identification Number (NIK) as a basis for obtaining other community services. Of the 570,637 integrated birth certificate reports in the 2018 Population Administration Information System (SIAK), 503,946 were reported late and only 66,691 were reported publicly. Clustering is a method that places similar data in one group and dissimilar data in other groups. K-Nearest Neighbor is a method for classifying objects based on the learning data closest in distance to the test data. k-means is a method that divides a number of objects into groups based on existing categories by looking at the midpoint. In data mining preprocessing, the data are cleaned by filling in blank values with the most dominant value, and attributes are selected using the information gain method. Using the k-nearest neighbor method to predict delays in reporting and the k-means method to classify priority service areas, on 10,000 birth certificate records from 2019, performance is good enough to produce predictions with an accuracy of 74.00%, and k-means with K = 2 produces a Davies-Bouldin index of 1,179.
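The Davies-Bouldin index reported in this abstract measures how compact and well separated clusters are (lower is better). A minimal sketch of the standard definition with Euclidean distance follows; the function names and the sample clusters are made up for illustration and are not the study's data.

```python
from math import dist

def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(c) / n for c in zip(*cluster))

def davies_bouldin(clusters):
    """Average, over clusters, of the worst (scatter_i + scatter_j) / separation."""
    centers = [centroid(c) for c in clusters]
    # Scatter: mean distance from each member to its cluster centroid.
    scatter = [sum(dist(p, m) for p in c) / len(c)
               for c, m in zip(clusters, centers)]
    total = 0.0
    for i in range(len(clusters)):
        total += max((scatter[i] + scatter[j]) / dist(centers[i], centers[j])
                     for j in range(len(clusters)) if j != i)
    return total / len(clusters)

# Made-up example: two tight clusters ten units apart.
clusters = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
print(round(davies_bouldin(clusters), 3))  # → 0.2
```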

