scholarly journals A Hybrid Data Balancing Method for Classification of Imbalanced Training Data within Google Earth Engine: Case Studies from Mountainous Regions

2020 ◽  
Vol 12 (20) ◽  
pp. 3301 ◽  
Author(s):  
Amin Naboureh ◽  
Ainong Li ◽  
Jinhu Bian ◽  
Guangbin Lei ◽  
Meisam Amani

Distribution of Land Cover (LC) classes is mostly imbalanced with some majority LC classes dominating against minority classes in mountainous areas. Although standard Machine Learning (ML) classifiers can achieve high accuracies for majority classes, they largely fail to provide reasonable accuracies for minority classes. This is mainly due to the class imbalance problem. In this study, a hybrid data balancing method, called the Partial Random Over-Sampling and Random Under-Sampling (PROSRUS), was proposed to resolve the class imbalance issue. Unlike most data balancing techniques which seek to fully balance datasets, PROSRUS uses a partial balancing approach with hundreds of fractions for majority and minority classes to balance datasets. For this, time-series of Landsat-8 and SRTM topographic data along with various spectral indices and topographic data were used over three mountainous sites within the Google Earth Engine (GEE) cloud platform. It was observed that PROSRUS had better performance than several other balancing methods and increased the accuracy of minority classes without a reduction in overall classification accuracy. Furthermore, adopting complementary information, particularly topographic data, considerably increased the accuracy of minority classes in mountainous areas. Finally, the obtained results from PROSRUS indicated that every imbalanced dataset requires a specific fraction(s) for addressing the class imbalance problem, because different datasets contain various characteristics.

2020 ◽  
Vol 12 (21) ◽  
pp. 3484 ◽  
Author(s):  
Amin Naboureh ◽  
Hamid Ebrahimy ◽  
Mohsen Azadbakht ◽  
Jinhu Bian ◽  
Meisam Amani

Timely and accurate Land Cover (LC) information is required for various applications, such as climate change analysis and sustainable development. Although machine learning algorithms are most likely successful in LC mapping tasks, the class imbalance problem is known as a common challenge in this regard. This problem occurs during the training phase and reduces classification accuracy for infrequent and rare LC classes. To address this issue, this study proposes a new method by integrating random under-sampling of majority classes and an ensemble of Support Vector Machines, namely Random Under-sampling Ensemble of Support Vector Machines (RUESVMs). The performance of RUESVMs for LC classification was evaluated in Google Earth Engine (GEE) over two different case studies using Sentinel-2 time-series data and five well-known spectral indices, including the Normalized Difference Vegetation Index (NDVI), Green Normalized Difference Vegetation Index (GNDVI), Soil-Adjusted Vegetation Index (SAVI), Normalized Difference Built-up Index (NDBI), and Normalized Difference Water Index (NDWI). The performance of RUESVMs was also compared with the traditional SVM and combination of SVM with three benchmark data balancing techniques namely the Random Over-Sampling (ROS), Random Under-Sampling (RUS), and Synthetic Minority Over-sampling Technique (SMOTE). It was observed that the proposed method considerably improved the accuracy of LC classification, especially for the minority classes. After adopting RUESVMs, the overall accuracy of the generated LC map increased by approximately 4.95 percentage points, and this amount for the geometric mean of producer’s accuracies was almost 3.75 percentage points, in comparison to the most accurate data balancing method (i.e., SVM-SMOTE). Regarding the geometric mean of users’ accuracies, RUESVMs also outperformed the SVM-SMOTE method with an average increase of 6.45 percentage points.


2021 ◽  
Vol 13 (12) ◽  
pp. 2299
Author(s):  
Andrea Tassi ◽  
Daniela Gigante ◽  
Giuseppe Modica ◽  
Luciano Di Martino ◽  
Marco Vizzari

With the general objective of producing a 2018–2020 Land Use/Land Cover (LULC) map of the Maiella National Park (central Italy), useful for a future long-term LULC change analysis, this research aimed to develop a Landsat 8 (L8) data composition and classification process using Google Earth Engine (GEE). In this process, we compared two pixel-based (PB) and two object-based (OB) approaches, assessing the advantages of integrating the textural information in the PB approach. Moreover, we tested the possibility of using the L8 panchromatic band to improve the segmentation step and the object’s textural analysis of the OB approach and produce a 15-m resolution LULC map. After selecting the best time window of the year to compose the base data cube, we applied a cloud-filtering and a topography-correction process on the 32 available L8 surface reflectance images. On this basis, we calculated five spectral indices, some of them on an interannual basis, to account for vegetation seasonality. We added an elevation, an aspect, a slope layer, and the 2018 CORINE Land Cover classification layer to improve the available information. We applied the Gray-Level Co-Occurrence Matrix (GLCM) algorithm to calculate the image’s textural information and, in the OB approaches, the Simple Non-Iterative Clustering (SNIC) algorithm for the image segmentation step. We performed an initial RF optimization process finding the optimal number of decision trees through out-of-bag error analysis. We randomly distributed 1200 ground truth points and used 70% to train the RF classifier and 30% for the validation phase. This subdivision was randomly and recursively redefined to evaluate the performance of the tested approaches more robustly. The OB approaches performed better than the PB ones when using the 15 m L8 panchromatic band, while the addition of textural information did not improve the PB approach. Using the panchromatic band within an OB approach, we produced a detailed, 15-m resolution LULC map of the study area.


Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1906
Author(s):  
Jia-Zheng Jian ◽  
Tzong-Rong Ger ◽  
Han-Hua Lai ◽  
Chi-Ming Ku ◽  
Chiung-An Chen ◽  
...  

Diverse computer-aided diagnosis systems based on convolutional neural networks were applied to automate the detection of myocardial infarction (MI) found in electrocardiogram (ECG) for early diagnosis and prevention. However, issues, particularly overfitting and underfitting, were not being taken into account. In other words, it is unclear whether the network structure is too simple or complex. Toward this end, the proposed models were developed by starting with the simplest structure: a multi-lead features-concatenate narrow network (N-Net) in which only two convolutional layers were included in each lead branch. Additionally, multi-scale features-concatenate networks (MSN-Net) were also implemented where larger features were being extracted through pooling the signals. The best structure was obtained via tuning both the number of filters in the convolutional layers and the number of inputting signal scales. As a result, the N-Net reached a 95.76% accuracy in the MI detection task, whereas the MSN-Net reached an accuracy of 61.82% in the MI locating task. Both networks give a higher average accuracy and a significant difference of p < 0.001 evaluated by the U test compared with the state-of-the-art. The models are also smaller in size thus are suitable to fit in wearable devices for offline monitoring. In conclusion, testing throughout the simple and complex network structure is indispensable. However, the way of dealing with the class imbalance problem and the quality of the extracted features are yet to be discussed.


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2803
Author(s):  
Rabeea Jaffari ◽  
Manzoor Ahmed Hashmani ◽  
Constantino Carlos Reyes-Aldasoro

The segmentation of power lines (PLs) from aerial images is a crucial task for the safe navigation of unmanned aerial vehicles (UAVs) operating at low altitudes. Despite the advances in deep learning-based approaches for PL segmentation, these models are still vulnerable to the class imbalance present in the data. The PLs occupy only a minimal portion (1–5%) of the aerial images as compared to the background region (95–99%). Generally, this class imbalance problem is addressed via the use of PL-specific detectors in conjunction with the popular class balanced cross entropy (BBCE) loss function. However, these PL-specific detectors do not work outside their application areas and a BBCE loss requires hyperparameter tuning for class-wise weights, which is not trivial. Moreover, the BBCE loss results in low dice scores and precision values and thus, fails to achieve an optimal trade-off between dice scores, model accuracy, and precision–recall values. In this work, we propose a generalized focal loss function based on the Matthews correlation coefficient (MCC) or the Phi coefficient to address the class imbalance problem in PL segmentation while utilizing a generic deep segmentation architecture. We evaluate our loss function by improving the vanilla U-Net model with an additional convolutional auxiliary classifier head (ACU-Net) for better learning and faster model convergence. The evaluation of two PL datasets, namely the Mendeley Power Line Dataset and the Power Line Dataset of Urban Scenes (PLDU), where PLs occupy around 1% and 2% of the aerial images area, respectively, reveal that our proposed loss function outperforms the popular BBCE loss by 16% in PL dice scores on both the datasets, 19% in precision and false detection rate (FDR) values for the Mendeley PL dataset and 15% in precision and FDR values for the PLDU with a minor degradation in the accuracy and recall values. Moreover, our proposed ACU-Net outperforms the baseline vanilla U-Net for the characteristic evaluation parameters in the range of 1–10% for both the PL datasets. Thus, our proposed loss function with ACU-Net achieves an optimal trade-off for the characteristic evaluation parameters without any bells and whistles. Our code is available at Github.


Author(s):  
Sayan Surya Shaw ◽  
Shameem Ahmed ◽  
Samir Malakar ◽  
Laura Garcia-Hernandez ◽  
Ajith Abraham ◽  
...  

AbstractMany real-life datasets are imbalanced in nature, which implies that the number of samples present in one class (minority class) is exceptionally less compared to the number of samples found in the other class (majority class). Hence, if we directly fit these datasets to a standard classifier for training, then it often overlooks the minority class samples while estimating class separating hyperplane(s) and as a result of that it missclassifies the minority class samples. To solve this problem, over the years, many researchers have followed different approaches. However the selection of the true representative samples from the majority class is still considered as an open research problem. A better solution for this problem would be helpful in many applications like fraud detection, disease prediction and text classification. Also, the recent studies show that it needs not only analyzing disproportion between classes, but also other difficulties rooted in the nature of different data and thereby it needs more flexible, self-adaptable, computationally efficient and real-time method for selection of majority class samples without loosing much of important data from it. Keeping this fact in mind, we have proposed a hybrid model constituting Particle Swarm Optimization (PSO), a popular swarm intelligence-based meta-heuristic algorithm, and Ring Theory (RT)-based Evolutionary Algorithm (RTEA), a recently proposed physics-based meta-heuristic algorithm. We have named the algorithm as RT-based PSO or in short RTPSO. RTPSO can select the most representative samples from the majority class as it takes advantage of the efficient exploration and the exploitation phases of its parent algorithms for strengthening the search process. We have used AdaBoost classifier to observe the final classification results of our model. The effectiveness of our proposed method has been evaluated on 15 standard real-life datasets having low to extreme imbalance ratio. The performance of the RTPSO has been compared with PSO, RTEA and other standard undersampling methods. The obtained results demonstrate the superiority of RTPSO over state-of-the-art class imbalance problem-solvers considered here for comparison. The source code of this work is available in https://github.com/Sayansurya/RTPSO_Class_imbalance.


Sign in / Sign up

Export Citation Format

Share Document