Comparison of Classical Methods and Mask R-CNN for Automatic Tree Detection and Mapping Using UAV Imagery

Detecting and mapping individual trees accurately and automatically from remote sensing images is of great significance for precision forest management. Many algorithms, including classical methods and deep learning techniques, have been developed and applied to tree crown detection from remote sensing images. However, few studies have evaluated the accuracy of different individual tree detection (ITD) algorithms together with their data and processing requirements. This study explored the accuracy of ITD using the local maxima (LM) algorithm, marker-controlled watershed segmentation (MCWS), and Mask Region-based Convolutional Neural Networks (Mask R-CNN) in a young plantation forest with different test images. Manually delineated tree crowns from UAV imagery were used to assess the accuracy of the three methods, followed by an evaluation of the data processing and application requirements of each method. Overall, Mask R-CNN made the best use of the information in multi-band input images for detecting individual trees. The Mask R-CNN model with a multi-band combination produced higher accuracy than the model with a single-band image, and the RGB band combination achieved the highest accuracy for ITD (F1 score = 94.68%). Moreover, the Mask R-CNN models with multi-band images provided higher accuracies for ITD than the LM and MCWS algorithms. The LM and MCWS algorithms also achieved promising accuracies when the canopy height model (CHM) was used as the test image (F1 score = 87.86% for LM and 85.92% for MCWS). The LM and MCWS algorithms are easy to use and have lower computational requirements, but they cannot identify tree species and are limited by algorithm parameters, which need to be adjusted for each classification. Deep learning, with its end-to-end learning approach, is very efficient and capable of deriving information from multi-layer images, but it needs an additional training set for model training, robust computing resources, and a large number of accurate training samples. This study provides valuable information for forestry practitioners selecting an approach for detecting individual trees.
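
As a rough illustration of the classical approach named in the abstract, the sketch below finds treetop candidates as local maxima of a small canopy height model and computes an F1 score from detection counts. The window size, the minimum height threshold, the toy CHM grid, and the function names are assumptions made for this example; they are not the parameters or code used in the study.

```python
def local_maxima(chm, window=3, min_height=2.0):
    """Return (row, col) cells strictly higher than every neighbour in the window.

    chm is a list of rows of heights; window and min_height are illustrative
    assumptions, not values taken from the paper.
    """
    half = window // 2
    rows, cols = len(chm), len(chm[0])
    tops = []
    for r in range(rows):
        for c in range(cols):
            h = chm[r][c]
            if h < min_height:
                continue  # skip ground and low vegetation
            neighbours = [
                chm[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
                if (rr, cc) != (r, c)
            ]
            if all(h > n for n in neighbours):
                tops.append((r, c))
    return tops


def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy 5x5 CHM (heights in metres) with two crowns; purely hypothetical data.
chm = [
    [0.0, 0.5, 0.0, 0.0, 0.0],
    [0.5, 4.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.5, 2.5, 3.0],
    [0.0, 0.0, 2.0, 3.5, 2.5],
    [0.0, 0.0, 0.0, 2.0, 1.0],
]
tops = local_maxima(chm)
print(tops)
print(round(f1_score(tp=8, fp=1, fn=2), 4))
```

The window size is the kind of parameter the abstract says must be tuned per scene: too small a window splits one crown into several treetops, and too large a window merges neighbouring trees.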

Deep Learning Based Oil Palm Tree Detection and Counting for High-Resolution Remote Sensing Images

Remote Sensing ◽ 10.3390/rs9010022 ◽ 2016 ◽ Vol 9 (1) ◽ pp. 22 ◽ Cited By ~ 128

Author(s): Weijia Li ◽ Haohuan Fu ◽ Le Yu ◽ Arthur Cracknell

Keyword(s): Remote Sensing ◽ Deep Learning ◽ High Resolution ◽ Oil Palm ◽ Palm Tree ◽ Remote Sensing Images ◽ Tree Detection ◽ Detection And Counting

Convolutional Neural Network for the Semantic Segmentation of Remote Sensing Images

Mobile Networks and Applications ◽ 10.1007/s11036-020-01703-3 ◽ 2021 ◽ Vol 26 (1) ◽ pp. 200-215

Author(s): Muhammad Alam ◽ Jian-Feng Wang ◽ Cong Guangpei ◽ LV Yunrong ◽ Yuanfang Chen

Keyword(s): Neural Network ◽ Remote Sensing ◽ Neural Networks ◽ Image Processing ◽ Deep Learning ◽ Semantic Segmentation ◽ Natural Scene ◽ Remote Sensing Images ◽ Advantages And Disadvantages ◽ Target Segmentation

In recent years, the success of deep learning in natural scene image processing has boosted its application to the analysis of remote sensing images. In this paper, we apply Convolutional Neural Networks (CNNs) to the semantic segmentation of remote sensing images. We improve two encoder-decoder CNN structures, SegNet with index pooling and U-Net, to make them suitable for multi-target semantic segmentation of remote sensing images. The results show that each of the two models has its own advantages and disadvantages when segmenting different objects. In addition, we propose an integrated algorithm that combines the two models. Experimental results show that the integrated algorithm can exploit the advantages of both models for multi-target segmentation and achieves better segmentation than either model alone.
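
The abstract does not say how the two models are integrated. One common way to combine two segmentation networks, assumed here purely for illustration and not claimed to be the authors' algorithm, is to average their per-pixel class probabilities and take the most probable class. The function name and the toy probability maps are hypothetical.

```python
def average_ensemble(probs_a, probs_b):
    """Average two per-pixel class-probability maps and return a label map.

    probs_a[r][c] and probs_b[r][c] are lists of class probabilities for the
    same pixel, as two models (e.g. a SegNet and a U-Net) might output.
    Averaging is an assumed fusion rule, not the method of the paper.
    """
    labels = []
    for row_a, row_b in zip(probs_a, probs_b):
        out_row = []
        for pa, pb in zip(row_a, row_b):
            mean = [(x + y) / 2 for x, y in zip(pa, pb)]
            # index of the class with the highest averaged probability
            out_row.append(max(range(len(mean)), key=mean.__getitem__))
        labels.append(out_row)
    return labels


# One row of two pixels, two classes; the models disagree on the second pixel.
probs_model_a = [[[0.7, 0.3], [0.55, 0.45]]]
probs_model_b = [[[0.6, 0.4], [0.20, 0.80]]]
fused = average_ensemble(probs_model_a, probs_model_b)
print(fused)
```

On the second pixel the first model weakly prefers class 0 while the second strongly prefers class 1, so the averaged prediction follows the more confident model.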

Self-Attention in Reconstruction Bias U-Net for Semantic Segmentation of Building Rooftops in Optical Remote Sensing Images

Remote Sensing ◽ 10.3390/rs13132524 ◽ 2021 ◽ Vol 13 (13) ◽ pp. 2524

Author(s): Ziyi Chen ◽ Dilong Li ◽ Wentao Fan ◽ Haiyan Guan ◽ Cheng Wang ◽ ...

Keyword(s): Remote Sensing ◽ Deep Learning ◽ Semantic Segmentation ◽ Extraction Methods ◽ The Self ◽ Optical Remote Sensing ◽ Building Extraction ◽ Learning Models ◽ Remote Sensing Images ◽ Segmentation Methods

Deep learning models have brought major breakthroughs in building extraction from high-resolution optical remote sensing images. In recent research, the self-attention module has attracted great interest in many fields, including building extraction. However, most current deep learning models equipped with a self-attention module still overlook the effectiveness of reconstruction bias. Tipping the balance between encoding and decoding ability, i.e., making the decoding network much more complex than the encoding network, reinforces semantic segmentation ability. To address the lack of research on combining self-attention and reconstruction-bias modules for building extraction, this paper presents a U-Net architecture that combines the two. In the encoding part, a self-attention module is added to learn the attention weights of the inputs, so that the network pays more attention to positions where there may be salient regions. In the decoding part, multiple large convolutional up-sampling operations are used to increase the reconstruction ability. We test our model on two openly available datasets, the WHU and Massachusetts Building datasets, and achieve IoU scores of 89.39% and 73.49%, respectively. Compared with several recent well-known semantic segmentation methods and representative building extraction methods, our method's results are satisfactory.
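
For readers unfamiliar with the IoU metric reported above, here is a minimal sketch of intersection over union for two binary building masks. The function name and the toy masks are assumptions for illustration; benchmark evaluations typically aggregate pixel counts over a whole test set, and this is not the authors' evaluation code.

```python
def iou(pred, truth):
    """Intersection over union of two binary masks given as lists of rows."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                inter += 1  # pixel is building in both masks
            if p or t:
                union += 1  # pixel is building in at least one mask
    # Two empty masks agree perfectly; returning 1.0 here is a convention.
    return inter / union if union else 1.0


# Hypothetical 2x3 masks: 1 = building, 0 = background.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
print(iou(pred, truth))
```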

Scribble-Based Weakly Supervised Deep Learning for Road Surface Extraction From Remote Sensing Images

IEEE Transactions on Geoscience and Remote Sensing ◽ 10.1109/tgrs.2021.3061213 ◽ 2021 ◽ pp. 1-12

Author(s): Yao Wei ◽ Shunping Ji

Keyword(s): Remote Sensing ◽ Deep Learning ◽ Road Surface ◽ Remote Sensing Images ◽ Surface Extraction ◽ Weakly Supervised

Analysis of Encoder-Decoder Based Deep Learning Architectures for Semantic Segmentation in Remote Sensing Images

Advances in Intelligent Systems and Computing - Intelligent Systems Design and Applications ◽ 10.1007/978-3-030-16660-1_33 ◽ 2019 ◽ pp. 332-341

Author(s): R. Sivagami ◽ J. Srihari ◽ K. S. Ravichandran

Keyword(s): Remote Sensing ◽ Deep Learning ◽ Semantic Segmentation ◽ Remote Sensing Images ◽ Learning Architectures

Performance Evaluation of Single-Label and Multi-Label Remote Sensing Image Retrieval Using a Dense Labeling Dataset

Remote Sensing ◽ 10.3390/rs10060964 ◽ 2018 ◽ Vol 10 (6) ◽ pp. 964 ◽ Cited By ~ 34

Author(s): Zhenfeng Shao ◽ Ke Yang ◽ Weixun Zhou

Keyword(s): Remote Sensing ◽ Performance Evaluation ◽ Deep Learning ◽ Image Retrieval ◽ Semantic Segmentation ◽ Semantic Content ◽ Remote Sensing Image ◽ Remote Sensing Images ◽ Benchmark Datasets ◽ Feature Based

Benchmark datasets are essential for developing and evaluating remote sensing image retrieval (RSIR) approaches. However, most existing datasets are single-labeled, with each image annotated by a single label representing its most significant semantic content. This is sufficient for simple problems, such as distinguishing between a building and a beach, but multiple labels and sometimes even dense (pixel) labels are required for more complex problems, such as RSIR and semantic segmentation. We therefore extended the existing multi-labeled dataset collected for multi-label RSIR and presented a dense labeling remote sensing dataset termed "DLRSD". DLRSD contained a total of 17 classes, and the pixels of each image were labeled using these 17 pre-defined labels. We used DLRSD to evaluate the performance of RSIR methods ranging from traditional handcrafted feature-based methods to deep learning-based ones. More specifically, we evaluated the performance of RSIR methods from both single-label and multi-label perspectives. The results demonstrated the advantages of multiple labels over single labels for interpreting complex remote sensing images. DLRSD provided the literature with a benchmark for RSIR and other pixel-based problems such as semantic segmentation.
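
To make the relationship between dense, multi-label, and single-label annotation concrete, the sketch below derives the two coarser forms from a dense label map. Treating the most frequent pixel class as the single label is an assumption of this example, as are the function name and the class names; the abstract does not describe how DLRSD images were assigned single labels.

```python
from collections import Counter


def labels_from_dense(label_map):
    """Derive (single_label, multi_labels) from a dense per-pixel label map.

    single_label: the most frequent class (an assumed stand-in for the
                  "most significant semantic content").
    multi_labels: every class present in the image, sorted by name.
    """
    counts = Counter(label for row in label_map for label in row)
    single_label = counts.most_common(1)[0][0]
    multi_labels = sorted(counts)
    return single_label, multi_labels


# Hypothetical 2x3 dense annotation.
dense = [
    ["building", "building", "tree"],
    ["building", "pavement", "tree"],
]
single, multi = labels_from_dense(dense)
print(single)
print(multi)
```

The single label keeps only the dominant class, while the multi-label set retains the minority classes that a retrieval query might depend on.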

Deep Learning Based Fossil-Fuel Power Plant Monitoring in High Resolution Remote Sensing Images: A Comparative Study

Remote Sensing ◽ 10.3390/rs11091117 ◽ 2019 ◽ Vol 11 (9) ◽ pp. 1117 ◽ Cited By ~ 4

Author(s): Haopeng Zhang ◽ Qin Deng

Keyword(s): Remote Sensing ◽ Deep Learning ◽ Power Plants ◽ North China ◽ Fossil Fuel ◽ Environment Monitoring ◽ Remote Sensing Images ◽ Plant Monitoring ◽ Working Status ◽ Fossil Fuel Power Plants

The frequent hazy weather with air pollution in North China has attracted wide attention in the past few years. One of the most important pollution sources is anthropogenic emission from fossil-fuel power plants. To relieve the pollution and assist urban environment monitoring, it is necessary to continuously monitor the working status of power plants. Satellite or airborne remote sensing provides high-quality data for such tasks. In this paper, we design a power plant monitoring framework based on deep learning to automatically detect power plants and determine their working status in high-resolution remote sensing images (RSIs). To this end, we collected a dataset named BUAA-FFPP60 containing RSIs of over 60 fossil-fuel power plants in the Beijing-Tianjin-Hebei region in North China, which covers about 123 km² of urban area. We compared eight state-of-the-art deep learning models and comprehensively analyzed their performance in terms of accuracy, speed, and hardware cost. Experimental results show that our deep learning based framework can effectively detect fossil-fuel power plants and determine their working status with a mean average precision of up to 0.8273, showing good potential for urban environment monitoring.
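
The mean average precision quoted above can be sketched as follows. This example uses the non-interpolated form of average precision (the mean of the precision values at each true-positive rank) and averages it over classes. Detection benchmarks differ in interpolation and matching rules, so this is an assumed variant rather than the paper's evaluation protocol; the function names, class names, and hit lists are hypothetical.

```python
def average_precision(hits, n_truth):
    """Non-interpolated AP for one class.

    hits: booleans for detections sorted by descending confidence
          (True = matched a ground-truth object).
    n_truth: number of ground-truth objects of this class.
    """
    if n_truth == 0:
        return 0.0
    true_positives = 0
    precision_sum = 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            true_positives += 1
            precision_sum += true_positives / rank  # precision at this rank
    return precision_sum / n_truth


def mean_average_precision(per_class):
    """Mean of per-class AP; per_class maps class name -> (hits, n_truth)."""
    aps = [average_precision(hits, n) for hits, n in per_class.values()]
    return sum(aps) / len(aps)


# Hypothetical ranked detections for two working-status classes.
detections = {
    "working": ([True, False, True], 2),
    "not_working": ([True, True], 2),
}
m_ap = mean_average_precision(detections)
print(round(m_ap, 4))
```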
