Flood Detection and Susceptibility Mapping Using Sentinel-1 Remote Sensing Data and a Machine Learning Approach: Hybrid Intelligence of Bagging Ensemble Based on K-Nearest Neighbor Classifier

2020, Vol 12 (2), pp. 266
Author(s): Himan Shahabi, Ataollah Shirzadi, Kayvan Ghaderi, Ebrahim Omidvar, Nadhir Al-Ansari, et al.

Mapping flood-prone areas is a key activity in flood disaster management. In this paper, we propose a new flood susceptibility mapping technique. We employ new ensemble models based on bagging as a meta-classifier and K-Nearest Neighbor (KNN) coarse, cosine, cubic, and weighted base classifiers to spatially forecast flooding in the Haraz watershed in northern Iran. We identified flood-prone areas using data from the Sentinel-1 sensor. We then selected 10 conditioning factors to spatially predict floods and assessed their predictive power using the Relief Attribute Evaluation (RFAE) method. Model validation was performed using two statistical error indices and the area under the curve (AUC). Our results show that the Bagging–Cubic–KNN ensemble model outperformed the other ensemble models: it reduced the overfitting and variance problems on the training dataset and enhanced the prediction accuracy of the standalone Cubic–KNN model (AUC = 0.660). We therefore recommend that the Bagging–Cubic–KNN model be more widely applied for the sustainable management of flood-prone areas.
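
Below is a minimal sketch of the general technique named above: a bagging meta-classifier wrapped around a KNN base learner. It is not the authors' code; the synthetic conditioning factors, the neighbor count, and the use of a Minkowski metric of order 3 as a stand-in for the "cubic" KNN variant are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for 10 flood-conditioning factors with a binary flood/non-flood label.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Cubic" KNN is approximated here by a Minkowski metric of order 3.
base_knn = KNeighborsClassifier(n_neighbors=10, metric="minkowski", p=3)

# Bagging as the meta-classifier; the parameter is `estimator` in
# scikit-learn >= 1.2 (older versions call it `base_estimator`).
model = BaggingClassifier(estimator=base_knn, n_estimators=50, random_state=0)
model.fit(X_train, y_train)

auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"AUC of the bagged cubic-KNN on toy data: {auc:.3f}")
```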

2020, Vol 8 (6)
Author(s): Pushpam Sinha, Ankita Sinha

Entropy-based k-Nearest Neighbor pattern classification (EbkNN) is a variation of the conventional k-Nearest Neighbor rule that optimizes the value of k separately for each test datum based on entropy calculations. The entropy formula used in EbkNN is the one popularly defined in information theory for a set of n different types of information (classes) attached to a total of m objects (data points), each object being described by f features. In EbkNN, the value of k chosen to discriminate a given test datum is the one for which the entropy of the neighbor labels is the smallest non-zero value. The other rules of conventional kNN are retained in EbkNN. We conclude that EbkNN works best for binary classification; it is computationally prohibitive to use EbkNN to discriminate test data into more than two classes. The biggest advantage of EbkNN over conventional kNN is that a single run of the EbkNN algorithm yields the optimum classification of the test data, whereas the conventional kNN algorithm has to be run separately for each value of k in a selected range, with the optimum k then chosen from among them. We also tested our EbkNN method on the WDBC (Wisconsin Diagnostic Breast Cancer) dataset. The dataset contains 569 instances; we made a random choice of 290 instances as the training dataset and used the remaining 279 as the test dataset. EbkNN produced a remarkable result: accuracy close to 100%, better than the results obtained by most other researchers who have worked on the WDBC dataset.
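
A small sketch of the entropy rule described in the abstract may help: for each candidate k, compute the Shannon entropy of the class labels among the k nearest neighbors of the test point and keep the k with the smallest non-zero entropy. Everything below (the toy data, the candidate range of k, and the final majority vote) is an assumption, not the authors' implementation.

```python
from collections import Counter
import numpy as np

def label_entropy(labels):
    """Shannon entropy of a set of class labels."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def ebknn_predict(X_train, y_train, x, k_values=range(1, 16)):
    dists = np.linalg.norm(X_train - x, axis=1)
    order = np.argsort(dists)
    best_k, best_h = None, np.inf
    for k in k_values:
        h = label_entropy(y_train[order[:k]])
        if 0.0 < h < best_h:          # smallest non-zero entropy, per the abstract
            best_k, best_h = k, h
    k = best_k if best_k is not None else max(k_values)
    neighbor_labels = y_train[order[:k]]
    return Counter(neighbor_labels).most_common(1)[0][0]

# Toy usage on a two-class problem.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 2)) + np.repeat([[0, 0], [3, 3]], 20, axis=0)
y_train = np.repeat([0, 1], 20)
print(ebknn_predict(X_train, y_train, np.array([2.5, 2.5])))
```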


Sensors, 2019, Vol 19 (11), pp. 2508
Author(s): Guolong Zhang, Ping Wang, Haibing Chen, Lan Zhang

This paper presents a localization model employing a convolutional neural network (CNN) and Gaussian process regression (GPR) based on Wi-Fi received signal strength indication (RSSI) fingerprinting data. In the proposed scheme, the CNN model is trained on a training dataset, and the trained model adapts to complex scenes with multipath effects or many access points (APs). More specifically, a pre-processing algorithm makes the RSSI vector, formed from the many RSSI values reported by different APs, readable by the CNN algorithm. The trained CNN model improves positioning performance by taking a series of RSSI vectors into account and extracting local features. In this design, performance is further improved by applying the GPR algorithm to adjust the coordinates of target points and offset the over-fitting of the CNN. The hybrid model was then evaluated on a public database collected in a library of Jaume I University in Spain. The results show that the hybrid model outperformed a k-nearest neighbor (KNN) model by 61.8%; the CNN model alone improves performance by 45.8%, and the GPR algorithm further enhances the localization accuracy. In addition, the paper experiments with three kernel functions, all of which are demonstrated to have positive effects on GPR.
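
The two-stage idea (a first-stage position estimate refined by GPR) can be sketched as follows. The paper's first stage is a CNN over RSSI vectors; to keep the example small, a kNN regressor stands in for it here, and the fingerprints, kernel, and split sizes are synthetic assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
n_aps = 20                                    # number of access points
coords = rng.uniform(0, 50, size=(300, 2))    # ground-truth (x, y) positions in metres
# Toy RSSI fingerprints: one value per AP, a noisy function of position.
rssi = -40 - 2 * np.abs(coords @ rng.normal(size=(2, n_aps))) \
       + rng.normal(scale=2.0, size=(300, n_aps))

train, calib, test = slice(0, 150), slice(150, 200), slice(200, 300)

# First stage: kNN regression as a placeholder for the paper's CNN.
first_stage = KNeighborsRegressor(n_neighbors=5).fit(rssi[train], coords[train])

# Second stage: GPR learns a correction from coarse estimates to true coordinates
# on a held-out calibration split, then refines the test estimates.
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=10.0), alpha=1.0)
gpr.fit(first_stage.predict(rssi[calib]), coords[calib])
refined = gpr.predict(first_stage.predict(rssi[test]))

err = np.linalg.norm(refined - coords[test], axis=1).mean()
print(f"mean localization error on toy data: {err:.2f} m")
```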


Author(s): Moses L. Gadebe, Okuthe P. Kogeda, Sunday O. Ojo

Recognizing human activity in real time with a limited dataset is possible on a resource-constrained device. However, most classification algorithms, such as Support Vector Machines, C4.5, and K-Nearest Neighbor, require a large dataset to accurately predict human activities. In this paper, we present a novel real-time human activity recognition model based on the Gaussian Naïve Bayes (GNB) algorithm, using a personalized JavaScript Object Notation (JSON) dataset extracted from the publicly available Physical Activity Monitoring for Aging People dataset and the University of Southern California Human Activity dataset. With the proposed method, the personalized JSON training dataset is extracted and compressed into a 12×8 multi-dimensional array of time-domain features computed from the signal magnitude vector and tilt angles of tri-axial accelerometer sensor data. The algorithm is implemented on the Android platform using the Cordova cross-platform framework with HTML5 and JavaScript. Leave-one-activity-out cross validation is implemented as a testTrainer() function, whose results are presented using a confusion matrix; the function leaves each category K out as the testing subset and uses the remaining K-1 categories as the training dataset to validate the proposed GNB algorithm. The proposed model is inexpensive in terms of memory and computational power owing to its compressed, small training dataset. Each K category was repeated five times, and the algorithm consistently produced the same result for each test. The simulation using the tilt angle features shows overall precision, recall, F-measure, and accuracy rates of 90%, 99.6%, 94.18%, and 89.51%, respectively, compared with rates of 36.9%, 75%, 42%, and 36.9% when the signal magnitude vector features were used. The simulation results confirm that, when using the tilt angle dataset, the GNB algorithm is superior to the Support Vector Machines, C4.5, and K-Nearest Neighbor algorithms.
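
A minimal Python sketch of the feature idea follows (the study's implementation is in JavaScript on Android/Cordova): derive the signal magnitude vector and a tilt angle from tri-axial accelerometer windows and classify them with Gaussian Naïve Bayes. The synthetic windows, window length, and two-activity setup are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

def features(window):
    """window: (n_samples, 3) array of ax, ay, az accelerometer readings."""
    ax, ay, az = window[:, 0], window[:, 1], window[:, 2]
    smv = np.sqrt(ax**2 + ay**2 + az**2)                        # signal magnitude vector
    tilt = np.degrees(np.arctan2(np.sqrt(ax**2 + ay**2), az))   # tilt angle
    return [smv.mean(), smv.std(), tilt.mean(), tilt.std()]

rng = np.random.default_rng(0)
X, y = [], []
for label, bias in [(0, 0.0), (1, 3.0)]:                        # two toy "activities"
    for _ in range(50):
        window = rng.normal(loc=bias, scale=1.0, size=(64, 3))  # 64-sample window
        X.append(features(window))
        y.append(label)

X_tr, X_te, y_tr, y_te = train_test_split(np.array(X), np.array(y),
                                          stratify=y, random_state=0)
clf = GaussianNB().fit(X_tr, y_tr)
print("toy accuracy:", clf.score(X_te, y_te))
```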


Author(s): Vladimir Nikulin

Signature-based intrusion detection systems look for known, suspicious patterns in the input data. In this paper we explore compression of labeled empirical data using threshold-based clustering with regularization. The main goal of clustering is to compress the training dataset into a limited number of signatures and, as a result, to minimize the number of comparisons necessary to determine the status of an input event. Essentially, the clustering process merges clusters that are close enough, reducing the original dataset to a limited number of labeled centroids. Combined with the k-nearest-neighbor (kNN) method, this set of centroids may be used as a multi-class classifier. Clearly, different attributes have different importance depending on the particular training database and the given cost matrix; this importance may be regulated in the definition of the distance using linear weight coefficients. The paper introduces a special procedure to estimate these weight coefficients. Experiments on the KDD-99 intrusion detection dataset confirm the effectiveness of the proposed methods.
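
A hedged sketch of the compression-plus-kNN idea is shown below: a greedy threshold clustering that merges each point into the nearest same-label centroid within a threshold, followed by a nearest-centroid decision with per-attribute weights. The threshold value, the running-mean update, and the toy data are assumptions, not the paper's exact procedure.

```python
import numpy as np

def compress(X, y, threshold=1.0):
    """Greedy threshold clustering: a point joins the nearest same-label
    centroid within `threshold`, otherwise it starts a new centroid."""
    centroids, labels, counts = [], [], []
    for x, lab in zip(X, y):
        best, best_d = None, threshold
        for i, (c, cl) in enumerate(zip(centroids, labels)):
            d = np.linalg.norm(x - c)
            if cl == lab and d <= best_d:
                best, best_d = i, d
        if best is None:
            centroids.append(x.astype(float)); labels.append(lab); counts.append(1)
        else:                                   # running-mean update of the centroid
            counts[best] += 1
            centroids[best] += (x - centroids[best]) / counts[best]
    return np.array(centroids), np.array(labels)

def predict(x, centroids, labels, w):
    d = np.sqrt(((w * (centroids - x)) ** 2).sum(axis=1))   # weighted distance
    return labels[np.argmin(d)]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 4)), rng.normal(4, 1, (100, 4))])
y = np.array([0] * 100 + [1] * 100)
C, L = compress(X, y, threshold=2.0)
print(len(C), "centroids; prediction:", predict(np.full(4, 3.5), C, L, w=np.ones(4)))
```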


Work, 2021, Vol 68 (3), pp. 701-709
Author(s): Annemarie F. Laudanski, Stacey M. Acker

BACKGROUND: High knee flexion postures are often adopted in occupational settings and may lead to increased risk of knee osteoarthritis. Pattern recognition algorithms using wireless electromyographic (EMG) signals may be capable of detecting and quantifying occupational exposures throughout a working day. OBJECTIVE: To develop a k-Nearest Neighbor (kNN) algorithm for the classification of eight high knee flexion activities frequently observed in childcare. METHODS: EMG signals from eight lower limb muscles were recorded for 30 participants, decomposed into time- and frequency-domain features, and used to develop a kNN classification algorithm. Features were reduced, using neighborhood component analysis, to the combination of ten time-domain features from the eight muscles that most effectively identified the postures of interest. RESULTS: The final classifier accurately identified 80.1% of high knee flexion postures based on novel data from participants included in the training dataset, yet achieved only 18.4% accuracy when predicting postures from novel subject data. CONCLUSIONS: EMG-based classification of high flexion postures may be possible within occupational settings when the model is first trained on sample data from a given individual. The developed algorithm may provide quantitative measures leading to a greater understanding of occupation-specific postural requirements.
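
As an illustration of this kind of pipeline (not the study's data or code), the sketch below decomposes synthetic EMG windows into common time-domain features, reduces them with scikit-learn's NeighborhoodComponentsAnalysis as a stand-in for the feature reduction step, and classifies eight toy postures with kNN.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def td_features(window):
    """Simple time-domain EMG features for one channel window."""
    mav = np.mean(np.abs(window))                       # mean absolute value
    rms = np.sqrt(np.mean(window ** 2))                 # root mean square
    zc = np.sum(np.diff(np.sign(window)) != 0)          # zero crossings
    wl = np.sum(np.abs(np.diff(window)))                # waveform length
    return [mav, rms, zc, wl]

rng = np.random.default_rng(0)
X, y = [], []
for posture in range(8):                                # eight toy "postures"
    for _ in range(40):
        channels = rng.normal(0, 1 + 0.2 * posture, size=(8, 256))  # 8 muscles
        X.append(np.concatenate([td_features(ch) for ch in channels]))
        y.append(posture)
X, y = np.array(X), np.array(y)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = make_pipeline(StandardScaler(),
                    NeighborhoodComponentsAnalysis(n_components=10, random_state=0),
                    KNeighborsClassifier(n_neighbors=5))
print("toy accuracy:", clf.fit(X_tr, y_tr).score(X_te, y_te))
```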


2021, Vol 32 (2), pp. 20-25
Author(s): Efraim Kurniawan Dairo Kette

In pattern recognition, the k-Nearest Neighbor (kNN) algorithm is the simplest non-parametric algorithm. Because of this simplicity, kNN classification performance is usually influenced by the model cases and by the quality of the training data itself. This article therefore proposes a sparse correlation weight model combined with a Training Data Set Cleaning (TDC) method based on Classification Ability Ranking (CAR), called the CAR classification method based on Coefficient-Weighted kNN (CAR-CWKNN), to improve kNN classifier performance. Correlation weighting in Sparse Representation (SR) has been shown to increase classification accuracy. SR can reveal the 'neighborhood' structure of the data, which makes it well suited to nearest-neighbor classification. In the cleaning stage, a Classification Ability (CA) function is applied to rank the training samples and retain the best ones. The Leave-One-Out (LV1) concept in the CA works by removing from the original training data those samples that are likely to produce wrong classification results, thereby reducing the influence of training-sample quality on kNN classification performance. Experiments with four public UCI classification data sets show that the CAR-CWKNN method provides better performance in terms of accuracy.
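
The sketch below illustrates two of the ideas in the abstract under stated assumptions: training-set cleaning that drops samples misclassified by a leave-one-out neighbor vote, followed by a distance-weighted kNN. The sparse correlation weighting of CAR-CWKNN itself is not reproduced; the Iris data and neighbor counts are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Leave-one-out cleaning: drop training samples whose neighbors (excluding
# the sample itself) vote for a different class.
knn_all = KNeighborsClassifier(n_neighbors=6).fit(X_tr, y_tr)
dist, idx = knn_all.kneighbors(X_tr, n_neighbors=6)
keep = []
for i, neighbors in enumerate(idx):
    votes = y_tr[neighbors[neighbors != i][:5]]          # exclude the point itself
    keep.append(np.bincount(votes).argmax() == y_tr[i])
X_clean, y_clean = X_tr[keep], y_tr[keep]

clf = KNeighborsClassifier(n_neighbors=5, weights="distance").fit(X_clean, y_clean)
print("accuracy before cleaning:",
      KNeighborsClassifier(5, weights="distance").fit(X_tr, y_tr).score(X_te, y_te))
print("accuracy after cleaning: ", clf.score(X_te, y_te))
```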


Author(s): Vaibhav A. Didore, Dhananjay B. Nalawade, Renuka B. Vaidya

Remote sensing is a prominent technology for studying the ecology of the Earth. Classification is a commonly used technique for quantitative analysis of remote sensing image data; it is based on the concept of segmenting the spectral domain into regions that can be associated with a cover class of interest for a particular application. As an advanced remote sensing tool, hyperspectral remote sensing has been studied in many applications, such as geology, topography, biology, soil science, hydrology, plants and ecosystems, and atmospheric science. In this paper, the supervised Decision Tree, Minimum Distance, Maximum Likelihood, Parallelepiped, and K-Nearest Neighbor classifiers, as well as the unsupervised K-Means and ISODATA algorithms, are reviewed. This review is helpful to researchers studying this emerging field, i.e., hyperspectral remote sensing (HRS).
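
For readers who want to experiment, the sketch below runs rough scikit-learn analogues of several of the reviewed classifiers on toy data (NearestCentroid for minimum distance, Gaussian QDA for maximum likelihood, a decision tree, kNN, and unsupervised K-Means); it is illustrative only and does not use hyperspectral imagery.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier, NearestCentroid
from sklearn.tree import DecisionTreeClassifier

# Toy "spectral" data: 6 bands, 4 cover classes.
X, y = make_blobs(n_samples=300, centers=4, n_features=6, random_state=0)

for clf in [NearestCentroid(), QuadraticDiscriminantAnalysis(),
            DecisionTreeClassifier(random_state=0), KNeighborsClassifier(5)]:
    print(type(clf).__name__, clf.fit(X, y).score(X, y))

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)  # unsupervised
print("K-means cluster sizes:", np.bincount(labels))
```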


Forests, 2022, Vol 13 (1), pp. 104
Author(s): Fardin Moradi, Ali Asghar Darvishsefat, Manizheh Rajab Pourrahmati, Azade Deljouei, Stelian Alexandru Borz

Because field measurements for estimating aboveground biomass (AGB) are challenging, owing to remote locations and the difficulty of walking in these areas, more accurate and cost-effective methods based on remote sensing are required. In this study, Sentinel-2 data were used for estimating the AGB in pure stands of Carpinus betulus (L., common hornbeam) located in the Hyrcanian forests, northern Iran. For this purpose, the diameter at breast height (DBH) of all trees thicker than 7.5 cm was measured in 55 square plots (45 × 45 m). In situ AGB was estimated using a local volume table and the specific density of wood. To estimate the AGB from remotely sensed data, parametric and nonparametric methods, including Multiple Regression (MR), Artificial Neural Network (ANN), k-Nearest Neighbor (kNN), and Random Forest (RF), were applied to a single Sentinel-2 image, taking as reference the estimations produced by the in situ measurements and their corresponding spectral values from the original (B2, B3, B4, B5, B6, B7, B8, B8a, B11, and B12) and derived synthetic (IPVI, IRECI, GEMI, GNDVI, NDVI, DVI, PSSRA, and RVI) bands. Band 6, located in the red-edge region (740 nm), showed the highest correlation with AGB (r = −0.723). A comparison of the machine learning methods indicated that the ANN algorithm returned the best AGB-estimating performance (%RMSE = 19.9). This study demonstrates that simple vegetation indices extracted from Sentinel-2 multispectral imagery can provide good results in the AGB estimation of C. betulus trees of the Hyrcanian forests. The approach used in this study may be extended to similar areas located in temperate forests.
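
A hedged sketch of the nonparametric comparison follows: predict AGB from band reflectances and a derived NDVI using kNN, Random Forest, and a small neural network, scored by %RMSE. The reflectances, the toy AGB relationship, and the model hyperparameters are synthetic assumptions, not the study's Sentinel-2 data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
red, nir = rng.uniform(0.02, 0.2, 200), rng.uniform(0.2, 0.5, 200)
ndvi = (nir - red) / (nir + red)
X = np.column_stack([red, nir, ndvi])
agb = 300 * ndvi + rng.normal(scale=15, size=200)        # toy AGB in t/ha

X_tr, X_te, y_tr, y_te = train_test_split(X, agb, random_state=0)
models = {"kNN": KNeighborsRegressor(5),
          "RF": RandomForestRegressor(200, random_state=0),
          "ANN": MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)}
for name, m in models.items():
    pred = m.fit(X_tr, y_tr).predict(X_te)
    rmse_pct = 100 * np.sqrt(mean_squared_error(y_te, pred)) / y_te.mean()
    print(f"{name}: %RMSE = {rmse_pct:.1f}")
```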


2015, Vol 713-715, pp. 2077-2080
Author(s): Wei Ya Guo, Xiao Fei Wang, Xue Zhi Xia

To detect sea targets efficiently, an approach based on a co-training model using optical remote sensing data is proposed. First, features are extracted using size, texture, shape, moment invariants, and ratio codes. Second, based on rough set theory, the common discernibility degree is used to select valid recognition features automatically. Finally, a co-training model for classification is introduced: two diverse reducts are generated, the model employs them to train two base classifiers on labeled data, and the two base classifiers teach each other on unlabeled data to boost their performance iteratively. Experimental results show that the proposed approach achieves better performance than K-Nearest Neighbor (KNN), Support Vector Machines (SVM), and traditional Hierarchical Discriminant Regression (HDR).
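
A minimal co-training sketch under stated assumptions is given below: two Gaussian Naïve Bayes classifiers trained on disjoint feature views (standing in for the rough-set reducts) pseudo-label high-confidence unlabeled samples for each other over a few rounds. The data, confidence threshold, and per-round cap are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
views = [slice(0, 5), slice(5, 10)]                  # two feature views ("reducts")
labeled = list(range(60))                            # indices with known labels
pseudo = {i: y[i] for i in labeled}                  # index -> (pseudo-)label
unlabeled = set(range(60, 600))

clfs = [GaussianNB(), GaussianNB()]
for _ in range(5):                                   # co-training rounds
    idx = np.array(sorted(pseudo))
    lab = np.array([pseudo[i] for i in idx])
    for v, clf in zip(views, clfs):                  # each classifier sees one view
        clf.fit(X[idx, v], lab)
    pool = np.array(sorted(unlabeled))
    if len(pool) == 0:
        break
    # Each classifier pseudo-labels up to 20 confident samples for the other.
    for v, clf in zip(views, clfs):
        proba = clf.predict_proba(X[pool, v])
        confident = pool[proba.max(axis=1) > 0.95][:20]
        for i in confident:
            pseudo[i] = clf.predict(X[[i], v])[0]
            unlabeled.discard(i)

final_idx = np.array(sorted(pseudo))
acc = np.mean([pseudo[i] == y[i] for i in final_idx])
print(f"pseudo-label pool size {len(final_idx)}, agreement with truth {acc:.2f}")
```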

