Improved Mask R-CNN for Rural Building Roof Type Recognition from UAV High-Resolution Images: A Case Study in Hunan Province, China

2022 · Vol 14 (2) · pp. 265
Author(s): Yanjun Wang, Shaochun Li, Fei Teng, Yunhao Lin, Mengjie Wang, ...

Accurate roof information of buildings can be obtained from UAV high-resolution images. The large-scale, accurate recognition of roof types (such as gabled, flat, hipped, complex and mono-pitched roofs) of rural buildings is crucial for rural planning and construction. At present, most UAV high-resolution optical images only have red, green and blue (RGB) band information, which aggravates the problems of inter-class similarity and intra-class variability of image features. Furthermore, the different roof types of rural buildings are complex, spatially scattered, and easily covered by vegetation, which in turn leads to the low accuracy of roof type identification by existing methods. In response to these problems, this paper proposes a method for identifying the roof types of complex rural buildings based on visible high-resolution remote sensing images from UAVs. First, the fusion of deep learning networks with different visual features is investigated to analyze how combinations of the visible-band difference vegetation index (VDVI), Sobel edge detection features and UAV visible images affect model recognition of rural building roof types. Second, an improved Mask R-CNN model is proposed that learns more complex features of the different types of building roof images by using a ResNet152 feature extraction network with transfer learning.
After obtaining roof type recognition results in two test areas, we evaluated their accuracy using the confusion matrix and drew the following conclusions: (1) the model with RGB images incorporating Sobel edge detection features has the highest accuracy, recognizing more roof types, and more accurately, across rural buildings of different morphologies; its recognition accuracy (Kappa coefficient, KC) is on average 0.115 higher than that obtained with RGB images alone; (2) compared with the original Mask R-CNN, U-Net, DeeplabV3 and PSPNet deep learning models, the improved Mask R-CNN model has the highest accuracy in recognizing the roof types of rural buildings, with F1-score, KC and overall accuracy (OA) averaging 0.777, 0.821 and 0.905, respectively. The method can obtain clear and accurate profiles and types of rural building roofs, and can be extended to green roof suitability evaluation, rooftop solar potential assessment, and other building roof surveys, management and planning.
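The feature combinations described above can be sketched with plain NumPy. This is a minimal illustration, not the authors' pipeline: VDVI is computed with its standard formula, (2G − R − B) / (2G + R + B), and the Sobel gradient magnitude is implemented by hand so the snippet stays self-contained; the function names are mine.

```python
import numpy as np

def vdvi(rgb):
    """Visible-band difference vegetation index: (2G - R - B) / (2G + R + B)."""
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    denom = 2 * g + r + b
    # Guard against all-zero pixels, where the index is undefined.
    return np.divide(2 * g - r - b, denom, out=np.zeros_like(denom), where=denom != 0)

def sobel_edges(rgb):
    """Sobel gradient magnitude of the grayscale image (manual 3x3 correlation)."""
    gray = rgb.astype(float).mean(axis=-1)
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    pad = np.pad(gray, 1, mode="edge")
    h, w = gray.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            win = pad[i:i + h, j:j + w]
            gx += kx[i, j] * win
            gy += ky[i, j] * win
    return np.hypot(gx, gy)

def stack_features(rgb):
    """Append VDVI and Sobel-edge channels to the RGB bands -> (H, W, 5)."""
    return np.dstack([rgb.astype(float), vdvi(rgb), sobel_edges(rgb)])
```

A stacked array like this could then be fed to a segmentation network in place of the plain RGB input.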

2021 · Vol 13 (11) · pp. 2187
Author(s): Liegang Xia, Xiongbo Zhang, Junxia Zhang, Haiping Yang, Tingting Chen

The automated detection of buildings in remote sensing images enables understanding of the distribution of buildings, which is indispensable for many geographic and social applications, such as urban planning, change monitoring and population estimation. The performance of deep learning on images often depends on a large number of manually labeled samples, the production of which is time-consuming and expensive. This study therefore focuses on reducing the number of labeled samples needed, proposing a semi-supervised deep learning approach based on an edge detection network (SDLED), which is the first to introduce semi-supervised learning into an edge detection neural network for extracting building roof boundaries from high-resolution remote sensing images. The approach uses a small number of labeled samples and abundant unlabeled images for joint training. An expert-level semantic edge segmentation model is trained on the labeled samples and guides the unlabeled images to generate pseudo-labels automatically. These (inevitably noisy) pseudo-label sets and the manually labeled samples are then used together to update the semantic edge model. In particular, we modified the semantic segmentation network D-LinkNet to obtain high-quality pseudo-labels: the main network architecture of D-LinkNet is retained, while multi-scale fusion is added in its second half to improve its performance on edge detection. SDLED was tested on high-spatial-resolution remote sensing images taken from Google Earth. Results show that SDLED performs better than the fully supervised method. Moreover, when the trained models were used to predict buildings in the neighboring counties, our approach was superior to the supervised one, with a line IoU improvement of at least 6.47% and an F1 score improvement of at least 7.49%.
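The pseudo-labeling step at the core of such semi-supervised schemes is usually confidence-thresholded. The sketch below is a generic illustration of that idea, not the SDLED implementation: given a per-pixel edge probability map from the trained expert model, only confident pixels receive a pseudo-label, and uncertain ones are masked out of the loss. The threshold value and the ignore-index convention are my assumptions.

```python
import numpy as np

def make_pseudo_labels(prob_map, threshold=0.9):
    """Turn per-pixel edge probabilities into pseudo-labels.

    Pixels that are confidently edge (>= threshold) or confidently
    background (<= 1 - threshold) get a label; the rest are set to -1
    so they can be ignored by the joint training loss.
    """
    labels = np.full(prob_map.shape, -1, dtype=np.int8)  # -1 = ignore
    labels[prob_map >= threshold] = 1                    # edge
    labels[prob_map <= 1 - threshold] = 0                # background
    return labels
```

In a full loop, these pseudo-labels would be mixed with the manually labeled samples to retrain the edge model, as the abstract describes.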


2020 · Vol 10 (12) · pp. 4282
Author(s): Ghada Zamzmi, Sivaramakrishnan Rajaraman, Sameer Antani

Medical images are acquired at different resolutions based on clinical goals or available technology. In general, however, high-resolution images with fine structural details are preferred for visual task analysis. Recognizing this significance, several deep learning networks have been proposed to enhance medical images for reliable automated interpretation. These deep networks are often computationally complex and require a massive number of parameters, which restrict them to highly capable computing platforms with large memory banks. In this paper, we propose an efficient deep learning approach, called Hydra, which simultaneously reduces computational complexity and improves performance. The Hydra consists of a trunk and several computing heads. The trunk is a super-resolution model that learns the mapping from low-resolution to high-resolution images. It has a simple architecture that is trained using multiple scales at once to minimize a proposed learning-loss function. We also propose to append multiple task-specific heads to the trained Hydra trunk for simultaneous learning of multiple visual tasks in medical images. The Hydra is evaluated on publicly available chest X-ray image collections to perform image enhancement, lung segmentation, and abnormality classification. Our experimental results support our claims and demonstrate that the proposed approach can improve the performance of super-resolution and visual task analysis in medical images at a remarkably reduced computational cost.
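The trunk-plus-heads design described above can be expressed structurally in a few lines. This is a toy sketch of the pattern (shared computation done once, task heads reusing it), with invented names and a stand-in "super-resolution" trunk that merely repeats pixels; it is not the Hydra architecture itself.

```python
import numpy as np

class MultiHeadModel:
    """Minimal sketch of a shared trunk with task-specific heads."""
    def __init__(self, trunk):
        self.trunk = trunk   # e.g. a super-resolution model: image -> enhanced image
        self.heads = {}      # task name -> head operating on trunk output

    def add_head(self, name, head):
        self.heads[name] = head

    def forward(self, image):
        features = self.trunk(image)  # computed once, shared by all heads
        return {name: head(features) for name, head in self.heads.items()}

# Toy usage: the "trunk" upsamples 2x by pixel repetition; two heads share it.
trunk = lambda img: np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
model = MultiHeadModel(trunk)
model.add_head("segmentation", lambda f: (f > f.mean()).astype(int))
model.add_head("classification", lambda f: int(f.mean() > 0.5))
```

The point of the pattern is the cost saving: the expensive trunk runs once per image regardless of how many tasks are attached.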


Author(s): Wei Wang, Rongyuan Liu, Huiyun Yang, Ping Zhou, Xiangwen Zhang, ...

2022
Author(s): Yaxing Li, Xiaofeng Jia, Xinming Wu, Zhicheng Geng

Reverse time migration (RTM) is a technique used to obtain high-resolution images of underground reflectors; however, it is computationally intensive when dealing with large amounts of seismic data. Multi-source RTM can significantly reduce the computational cost by processing multiple shots simultaneously. However, multi-source methods frequently produce crosstalk artifacts in the migrated images, causing serious interference with the imaging signals. Plane-wave migration, a mainstream multi-source method, can yield migrated images with plane waves at different angles by phase-encoding the source and receiver wavefields; however, it frequently requires a trade-off between computational efficiency and imaging quality. We propose a deep learning method for removing crosstalk artifacts and enhancing the quality of plane-wave migration images. We designed a convolutional neural network that accepts seven plane-wave images at different angles as input and outputs a clear, enhanced image. We built 505 velocity models of size 1024×256 and applied plane-wave migration to each of them to produce raw images at 0°, ±20°, ±40°, and ±60° as network input. Labels are high-resolution images computed from the corresponding reflectivity models by convolving with a Ricker wavelet. Random sub-images of size 512×128 were used for training the network. Numerical examples demonstrate the effectiveness of the trained network for crosstalk removal and image enhancement. The proposed method is superior to both conventional RTM and plane-wave RTM (PWRTM) in imaging resolution. Moreover, it requires only seven migrations, significantly improving computational efficiency: in the numerical examples, the processing time required by our method was approximately 1.6% and 10% of that required by RTM and PWRTM, respectively.
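The training-data step above (cutting random 512×128 sub-images out of the seven co-registered 1024×256 plane-wave images) can be sketched as follows. This is a generic illustration under my own function names; the key detail is that the same random crop offset must be applied to every angle channel (and the label) so they stay aligned.

```python
import numpy as np

def random_patches(images, patch_h=512, patch_w=128, n=4, rng=None):
    """Sample aligned random sub-images from a set of full migration images.

    `images` is a list of (H, W) arrays (e.g. the seven plane-wave images of
    one model); each patch is cut at the same random offset in every channel
    so inputs and labels stay co-registered.
    """
    rng = rng or np.random.default_rng(0)
    stack = np.stack(images)            # (C, H, W)
    _, h, w = stack.shape
    patches = []
    for _ in range(n):
        top = rng.integers(0, h - patch_h + 1)
        left = rng.integers(0, w - patch_w + 1)
        patches.append(stack[:, top:top + patch_h, left:left + patch_w])
    return patches
```

Each returned patch has shape (channels, 512, 128) and can be batched directly for network training.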


2020 · Vol 13 (1) · pp. 54
Author(s): Leonardo Josoé Biffi, Edson Mitishita, Veraldo Liesenberg, Anderson Aparecido dos Santos, Diogo Nunes Gonçalves, ...

In recent years, many agriculture-related problems have been addressed by integrating artificial intelligence techniques with remote sensing systems. Specifically, for fruit detection, several recent works have applied Deep Learning (DL) methods to images acquired at different acquisition levels. However, the increasing use of anti-hail plastic net cover in commercial orchards highlights the importance of terrestrial remote sensing systems. Apples are among the most challenging fruits to detect in images, mainly because of the frequent occurrence of target occlusion. Additionally, the introduction of high-density apple tree orchards makes the identification of single fruits a real challenge. To help farmers detect apple fruits efficiently, this paper presents an approach based on the Adaptive Training Sample Selection (ATSS) deep learning method applied to close-range, low-cost terrestrial RGB images. Correct identification supports apple production forecasting and gives local producers a better idea of forthcoming management practices. The main advantage of the ATSS method is that only the center point of each object is labeled, which is much more practicable and realistic than bounding-box annotation in heavily dense fruit orchards. Additionally, we evaluated other object detection methods such as RetinaNet, Libra Regions with Convolutional Neural Network (R-CNN), Cascade R-CNN, Faster R-CNN, Feature Selective Anchor-Free (FSAF), and High-Resolution Network (HRNet). The study area is a highly dense apple orchard of Fuji Suprema apple fruits (Malus domestica Borkh) located on a smallholder farm in the state of Santa Catarina (southern Brazil). A total of 398 terrestrial images were taken nearly perpendicularly in front of the trees with a professional camera, assuring both good vertical coverage of the apple trees in terms of height and overlap between picture frames.
Afterwards, the high-resolution RGB images were divided into several patches to help detect small and/or occluded apples. A total of 3119, 840, and 2010 patches were used for training, validation, and testing, respectively. Moreover, the proposed method's generalization capability was assessed by applying simulated image corruptions to the test set images at different severity levels, including noise, blur, weather, and digital processing. Experiments were also conducted by varying the bounding box size (80, 100, 120, 140, 160, and 180 pixels) in the original image for the proposed approach. Our results showed that the ATSS-based method slightly outperformed all other deep learning methods, by margins between 0.3% and 2.4%. We also verified that the best result was obtained with a bounding box size of 160 × 160 pixels. The proposed method was robust to most of the corruptions, except for snow, frost, and fog weather conditions. Finally, a benchmark of the reported dataset is also generated and made publicly available.
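Turning a labeled center point into a fixed-size box, as in the 80–180 px experiments above, is a small geometric step. The helper below is my own sketch of that conversion (names hypothetical): a square of the chosen side length is centered on the point and clipped to the image bounds.

```python
def center_to_box(cx, cy, size, img_w, img_h):
    """Expand a labeled center point (cx, cy) into a square box of side
    `size`, clipped to the image bounds. Returns (x1, y1, x2, y2)."""
    half = size // 2
    x1 = max(0, cx - half)
    y1 = max(0, cy - half)
    x2 = min(img_w, cx + half)
    y2 = min(img_h, cy + half)
    return x1, y1, x2, y2
```

Boxes produced this way near the image border are smaller than the nominal size, which is why clipping matters when patching dense orchard images.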


2021
Author(s): Etienne David, Gaëtan Daubige, François Joudelat, Philippe Burger, Alexis Comar, ...

Plant density is key information on crop growth. Usually measured manually, this task can benefit from advances in image analysis techniques. Automated detection of individual plants in images is a key step in estimating this density. To develop and evaluate dedicated processing techniques, high-resolution RGB images were acquired from UAVs over several years of experiments on maize, sugar beet and sunflower crops at early stages. A total of 16,247 plants were labelled interactively. We compared the performance of a handcrafted method (HC) to that of deep learning (DL). The HC method consists of segmenting the image into green and background pixels, identifying rows, and then identifying objects corresponding to plants using knowledge of the sowing pattern as prior information. The DL method is based on the Faster R-CNN model trained on two-thirds of the images, selected to represent a good balance between plant development stages and sessions; one model is trained for each crop. Results show that DL generally outperforms HC, particularly for maize and sunflower crops. Image quality appears critical for the HC method, where image blur and complex backgrounds make the segmentation step difficult. The performance of the DL method is also limited by image quality, as well as by the presence of weeds. A hybrid method (HY) was proposed to eliminate weeds between the rows using the rules of the HC method; HY slightly improves DL performance in cases of high weed infestation. A significant level of variability in plant detection performance was observed between experiments, explained by the variability of image acquisition conditions, including illumination, plant development stage, background complexity and weed infestation. We tested an active learning approach in which a few images corresponding to the conditions of the testing dataset complemented the training dataset for DL.
Results show a drastic increase in performance for all crops, with relative RMSE below 5% for the estimation of plant density.
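The relative RMSE metric quoted above is normally the RMSE of the estimates divided by the mean of the reference values, expressed as a percentage. A minimal sketch of that computation (the exact normalization used by the authors is an assumption here):

```python
import math

def relative_rmse(estimated, reference):
    """Relative RMSE (%) of density estimates against reference counts:
    RMSE divided by the mean of the reference values."""
    n = len(reference)
    rmse = math.sqrt(sum((e - r) ** 2 for e, r in zip(estimated, reference)) / n)
    return 100.0 * rmse / (sum(reference) / n)
```

For example, estimates of 9 and 11 plants/m² against references of 10 and 10 give a relative RMSE of 10%.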

