SEMANTIC SEGMENTATION AND DIFFERENCE EXTRACTION VIA TIME SERIES AERIAL VIDEO CAMERA AND ITS APPLICATION

Author(s):  
S. N. K. Amit ◽  
S. Saito ◽  
S. Sasaki ◽  
Y. Kiyoki ◽  
Y. Aoki

Google Earth, with its high-resolution imagery, typically takes months to process new images before updating them online. This is a time-consuming and slow process, especially for post-disaster applications. The objective of this research is to develop a fast and effective method of updating maps by detecting local differences that occur across a time series, so that only regions containing differences are updated. In our system, aerial images from the Massachusetts roads and buildings open datasets and the Saitama district datasets are used as input images. Semantic segmentation, a pixel-wise classification of images, is then applied to the input images using a deep neural network. A deep neural network is employed because it is not only efficient at learning highly discriminative image features such as roads and buildings, but also partially robust to incomplete and poorly registered target maps. Aerial images carrying this semantic information are then stored as ground-truth images in a database in the 5D World Map system. This system visualises multimedia data in five dimensions: three spatial dimensions, one temporal dimension, and one degenerated dimension combining semantics and colour. Next, a ground-truth image chosen from the 5D World Map database and a new aerial image with the same spatial coverage but a different acquisition time are compared via a difference-extraction method. The map is updated only where local changes have occurred. Map updating thus becomes cheaper, faster and more effective, especially for post-disaster applications, by leaving unchanged regions untouched and updating only changed regions.
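The core difference-extraction and partial-update idea described above can be sketched in a few lines (a minimal illustration with invented toy label maps, not the paper's implementation): given two co-registered semantic label maps of the same area at different times, only pixels whose class changed are written back to the stored map.

```python
import numpy as np

def extract_differences(old_labels, new_labels):
    """Boolean mask of pixels whose semantic class changed between epochs."""
    return old_labels != new_labels

def update_map(stored, incoming, change_mask):
    """Overwrite only the changed pixels, leaving unchanged regions untouched."""
    updated = stored.copy()
    updated[change_mask] = incoming[change_mask]
    return updated

# toy 4x4 label maps: 0 = background, 1 = road, 2 = building
old = np.array([[0, 0, 1, 1],
                [0, 2, 1, 1],
                [0, 2, 1, 1],
                [0, 0, 0, 0]])
new = old.copy()
new[1:3, 1] = 0                      # a building demolished between acquisitions
mask = extract_differences(old, new)
updated = update_map(old, new, mask)
print(mask.sum())                    # only 2 of 16 pixels need updating
```

Because the update touches only the masked pixels, the cost of refreshing the map scales with the changed area rather than the full image.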

2019 ◽  
Vol 8 (12) ◽  
pp. 582 ◽  
Author(s):  
Gang Zhang ◽  
Tao Lei ◽  
Yi Cui ◽  
Ping Jiang

Semantic segmentation of high-resolution aerial images plays a significant role in many remote sensing applications. Although Deep Convolutional Neural Networks (DCNNs) have shown great performance in this task, they still face two challenges: intra-class heterogeneity and inter-class homogeneity. To overcome these two problems, a novel dual-path DCNN, which contains a spatial path and an edge path, is proposed for high-resolution aerial image segmentation. The spatial path, which combines multi-level and global context features to encode local and global information, addresses the intra-class heterogeneity challenge. For the inter-class homogeneity problem, a Holistically-nested Edge Detection (HED)-like edge path is employed to detect semantic boundaries that guide feature learning. Furthermore, we improve the computational efficiency of the network by employing a MobileNetV2 backbone, enhanced with two modifications: (1) replacing the standard convolution in the last four Bottleneck Residual Blocks (BRBs) with atrous convolution; and (2) removing the convolution stride of 2 in the first layer of BRBs 4 and 6. Experimental results on the ISPRS Vaihingen and Potsdam 2D labeling datasets show that the proposed DCNN achieves real-time inference speed on a single GPU card with better performance than state-of-the-art baselines.
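The effect of the two backbone modifications, swapping a stride-2 convolution for a stride-1 atrous (dilated) convolution, can be illustrated with a minimal 1-D sketch (a hand-rolled convolution for illustration only, not the paper's MobileNetV2 code): dilation widens the receptive span of the kernel while keeping the output at full resolution, whereas striding halves the output.

```python
import numpy as np

def conv1d(x, w, stride=1, dilation=1):
    """Valid-mode 1-D convolution supporting stride and dilation (no padding)."""
    span = (len(w) - 1) * dilation + 1          # effective receptive span
    n = (len(x) - span) // stride + 1           # number of output positions
    return np.array([
        sum(x[i * stride + j * dilation] * w[j] for j in range(len(w)))
        for i in range(n)
    ])

x = np.arange(10, dtype=float)
w = np.ones(3)
strided = conv1d(x, w, stride=2)    # stride 2: output length is halved
atrous = conv1d(x, w, dilation=2)   # dilation 2: wider span, full resolution
print(len(strided), len(atrous))    # atrous output is denser than strided
```

For dense prediction tasks like segmentation, keeping the feature map at full resolution (the atrous variant) avoids losing the spatial detail that striding discards.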


Author(s):  
C. H. Yang ◽  
B. K. Kenduiywo ◽  
U. Soergel

Persistent Scatterer Interferometry (PSI) is a technique to detect a network of extracted persistent scatterer (PS) points which feature temporal phase stability and strong radar signal throughout a time series of SAR images. The small surface deformations at such PS points are estimated. PSI works particularly well in monitoring human settlements because the regular substructures of man-made objects give rise to a large number of PS points. If such structures and/or substructures substantially alter or even vanish due to a big change such as construction, their PS points are discarded without further exploration during the standard PSI procedure. Such rejected points are called big change (BC) points. Incoherent change detection (ICD), on the other hand, relies on local comparison of multi-temporal images (e.g. image difference, image ratio) to highlight scene modifications at a larger size rather than detail level. However, image noise inevitably degrades ICD accuracy. We propose a change detection approach based on PSI to synergize the benefits of PSI and ICD. PS points are extracted by the PSI procedure. A local change index is introduced to quantify the probability of a big change for each point. We propose an automatic thresholding method that applies the change index to extract BC points, along with an indication of the period in which they emerge. In the end, PS and BC points are integrated into a change detection image. Our method is tested at a site north of Berlin's main station, where steady, demolished, and erected building substructures are successfully detected. The results are consistent with ground truth derived from a time series of aerial images provided by Google Earth. In addition, we apply our technique to traffic infrastructure, business district, and sports playground monitoring.
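An automatic threshold that splits a change index into stable PS points and BC points can be sketched with Otsu's method (an assumption for illustration; the abstract does not specify the paper's thresholding rule, and the synthetic change-index values below are invented):

```python
import numpy as np

def otsu_threshold(values, bins=64):
    """Otsu's method: choose the cut maximizing between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    p = hist.astype(float) / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    best_t, best_var = edges[1], -1.0
    for k in range(1, bins):
        w0, w1 = p[:k].sum(), p[k:].sum()       # class weights below/above cut
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (p[:k] * centers[:k]).sum() / w0  # class means
        mu1 = (p[k:] * centers[k:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2        # between-class variance
        if var > best_var:
            best_var, best_t = var, edges[k]
    return best_t

# synthetic change index: many stable points, few big-change points
rng = np.random.default_rng(0)
index = np.concatenate([rng.normal(0.1, 0.05, 900),   # stable PS points
                        rng.normal(0.8, 0.05, 100)])  # big-change (BC) points
t = otsu_threshold(index)
bc_count = int((index > t).sum())
print(t, bc_count)
```

With a clearly bimodal index the threshold lands in the gap between the two modes, so points above it are flagged as BC candidates.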



2019 ◽  
Vol 11 (10) ◽  
pp. 1157 ◽  
Author(s):  
Jorge Fuentes-Pacheco ◽  
Juan Torres-Olivares ◽  
Edgar Roman-Rangel ◽  
Salvador Cervantes ◽  
Porfirio Juarez-Lopez ◽  
...  

Crop segmentation is an important task in Precision Agriculture, where the use of aerial robots with an on-board camera has contributed to the development of new solution alternatives. We address the problem of fig plant segmentation in top-view RGB (Red-Green-Blue) images of a crop grown in an open field under difficult circumstances: complex lighting conditions and the non-ideal crop maintenance practices of local farmers. We present a Convolutional Neural Network (CNN) with an encoder-decoder architecture that classifies each pixel as crop or non-crop using only raw colour images as input. Our approach achieves a mean accuracy of 93.85% despite the complexity of the background and the highly variable visual appearance of the leaves. We make our CNN code available to the research community, along with the aerial image dataset and a hand-made, pixel-precise ground-truth segmentation, to facilitate comparison among different algorithms.
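Mean accuracy for a binary crop/non-crop segmentation is commonly computed as the average of per-class pixel accuracies, which keeps a dominant background class from masking poor crop recall. A minimal sketch (the toy label maps are invented, and the paper may average differently):

```python
import numpy as np

def mean_class_accuracy(pred, gt, classes=(0, 1)):
    """Average of per-class pixel accuracies (0 = non-crop, 1 = crop)."""
    accs = []
    for c in classes:
        mask = gt == c                      # pixels whose true class is c
        accs.append((pred[mask] == c).mean())
    return float(np.mean(accs))

gt = np.array([[1, 1, 0, 0],
               [1, 1, 0, 0]])
pred = np.array([[1, 0, 0, 0],              # one crop pixel missed
                 [1, 1, 0, 0]])
print(mean_class_accuracy(pred, gt))        # (3/4 + 4/4) / 2
```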


Sensors ◽  
2021 ◽  
Vol 21 (11) ◽  
pp. 3813
Author(s):  
Athanasios Anagnostis ◽  
Aristotelis C. Tagarakis ◽  
Dimitrios Kateris ◽  
Vasileios Moysiadis ◽  
Claus Grøn Sørensen ◽  
...  

This study aimed to propose an approach for orchard tree segmentation in aerial images based on a deep learning convolutional neural network variant, namely the U-net network. The purpose was the automated detection and localization of the canopies of orchard trees under various conditions (i.e., different seasons, different tree ages, different levels of weed coverage). The dataset was composed of images from three different walnut orchards, and its variability yielded images falling under seven different use cases. The best-trained model achieved 91%, 90%, and 87% accuracy for training, validation, and testing, respectively. The trained model was also tested on never-before-seen orthomosaic images of orchards using two methods (oversampling and undersampling) in order to handle the transparent out-of-field boundary pixels of such images. Even though the training dataset did not contain orthomosaic images, the model achieved performance levels reaching up to 99%, demonstrating the robustness of the proposed approach.
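One way to handle transparent out-of-field pixels when tiling an orthomosaic for inference is to either keep boundary tiles (and deal with the transparent pixels downstream) or drop them entirely. This is an illustrative assumption; the abstract does not detail the paper's exact oversampling/undersampling procedure, and the toy orthomosaic below is invented:

```python
import numpy as np

def tile_orthomosaic(img, alpha, tile=64, drop_partial=False):
    """Split an orthomosaic into non-overlapping tiles. Tiles containing
    transparent out-of-field pixels (alpha == 0) are dropped when
    drop_partial is True; otherwise all tiles are kept."""
    h, w = alpha.shape
    tiles = []
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            if drop_partial and not alpha[y:y + tile, x:x + tile].all():
                continue
            tiles.append(img[y:y + tile, x:x + tile])
    return tiles

# toy 128x128 orthomosaic whose top-left quadrant lies outside the field
img = np.zeros((128, 128, 3))
alpha = np.ones((128, 128))
alpha[:64, :64] = 0
kept_all = tile_orthomosaic(img, alpha)                     # 4 tiles
kept_valid = tile_orthomosaic(img, alpha, drop_partial=True)  # 3 tiles
print(len(kept_all), len(kept_valid))
```

Dropping partial tiles trades coverage near the field boundary for cleaner inputs; keeping them preserves coverage but forces the model to cope with transparent pixels it never saw in training.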


Author(s):  
H. Sun ◽  
Y. Ding ◽  
Y. Huang ◽  
G. Wang

Aerial images record large areas of the Earth's surface with ever-improving spatial and radiometric resolution. They have become a powerful tool for earth observation, land-cover surveys, geographical censuses, etc., and help delineate the boundaries of different kinds of ground objects both manually and automatically. In light of the geo-spatial correspondence between pixel locations in an aerial image and the spatial coordinates of ground objects, there is an increasing need for super-pixel segmentation and high-accuracy positioning of objects in aerial images. Besides the commercial software packages eCognition and ENVI, many algorithms have been developed in the literature to segment objects in aerial images. But how to evaluate the segmentation results remains a challenge, especially in the context of geo-spatial correspondence. The Geo-Hausdorff Distance (GHD) is proposed to measure the geo-spatial distance between object segmentation results, whether produced from manual ground truth or by automatic algorithms. Based on an early-breaking and random-sampling design, the GHD calculates the geographical Hausdorff distance with nearly linear complexity. Segmentation results of several state-of-the-art algorithms, including those of the commercial packages, are evaluated on a diverse set of aerial images. These images have different signal-to-noise ratios around object boundaries and are hard to trace correctly even for human operators. The GHD value is analysed to comprehensively measure the suitability of different object segmentation methods for aerial images of different spatial resolutions. By critically assessing the strengths and limitations of existing algorithms, the paper provides valuable insight and guidelines for further research into automating object detection and classification in aerial images for the nationwide geographic census. It is also promising for the optimal design of operational specifications for remote sensing interpretation under the constraints of limited resources.
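A directed Hausdorff distance with the random-scan-order and early-breaking design mentioned above can be sketched as follows (a plain pixel-space version with invented toy point sets; the geographic variant would additionally convert pixel offsets to ground distances, which is not shown here):

```python
import numpy as np

def directed_hausdorff(A, B, rng=None):
    """Directed Hausdorff distance max_{a in A} min_{b in B} |a - b|.
    Points are scanned in random order; the inner minimum breaks early
    as soon as a distance below the running maximum is found, since such
    an 'a' can no longer raise the overall maximum."""
    if rng is not None:
        A = A[rng.permutation(len(A))]
        B = B[rng.permutation(len(B))]
    cmax = 0.0
    for a in A:
        cmin = np.inf
        for b in B:
            d = np.hypot(*(a - b))
            if d < cmax:          # early break: this a cannot change cmax
                cmin = d
                break
            cmin = min(cmin, d)
        cmax = max(cmax, cmin)
    return cmax

# two toy boundary point sets (pixel coordinates)
A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]])
B = np.array([[0.0, 0.0], [1.0, 0.0]])
print(directed_hausdorff(A, B))   # the point (0, 3) is 3.0 away from B
```

Random shuffling makes the early break trigger sooner on average, which is what pushes the expected cost toward nearly linear in practice.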


Author(s):  
W. Yuan ◽  
Z. Fan ◽  
X. Yuan ◽  
J. Gong ◽  
R. Shibasaki

Abstract. Dense image matching is essential to photogrammetry applications, including Digital Surface Model (DSM) generation, three-dimensional (3D) reconstruction, and object detection and recognition. Developing an efficient and robust method for dense image matching has been a technical challenge due to high variations in illumination and ground features in aerial images of large areas. Nowadays, thanks to the development of deep learning technology, deep neural network-based algorithms outperform traditional methods on a variety of tasks such as object detection, semantic segmentation and stereo matching. The proposed network comprises cost-volume computation, cost-volume aggregation, and disparity prediction. It starts with a pre-trained VGG-16 network as a backbone and uses a nine-layer U-net architecture for feature-map extraction and a correlation layer for cost-volume calculation; a guided-filter-based cost aggregation is then adopted for cost-volume filtering, and finally the soft argmax function is used for disparity prediction. Experiments conducted on a UAV dataset demonstrate that the proposed method achieves a reprojection-error RMSE (root mean square error) better than 1 pixel in image coordinates and an in-ground positioning accuracy within 2.5 ground sample distances. Comparison experiments on the KITTI 2015 dataset show that the proposed unsupervised method is even comparable with supervised methods.
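The final disparity-prediction step, a soft argmax over the cost volume, can be sketched for a single pixel (a minimal NumPy version with an invented cost vector; the network applies this over full cost volumes): negated costs are turned into a probability distribution via softmax, and the disparity is their expectation, which is differentiable unlike a hard argmax.

```python
import numpy as np

def soft_argmax(cost_volume, axis=-1):
    """Differentiable disparity estimate: softmax over negated matching
    costs, then the expectation of the disparity index."""
    scores = -cost_volume
    e = np.exp(scores - scores.max(axis=axis, keepdims=True))  # stable softmax
    prob = e / e.sum(axis=axis, keepdims=True)
    disparities = np.arange(cost_volume.shape[axis])
    return (prob * disparities).sum(axis=axis)

# one pixel with 5 disparity hypotheses; the lowest cost is at disparity 3
cost = np.array([9.0, 8.0, 5.0, 0.5, 6.0])
d = float(soft_argmax(cost))
print(d)   # close to 3, slightly pulled toward neighbouring hypotheses
```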


Author(s):  
D. Gritzner ◽  
J. Ostermann

Abstract. Modern machine learning, especially deep learning, which is used in a variety of applications, requires a lot of labelled data for model training. Having an insufficient amount of training examples leads to models which do not generalize well to new input instances. This is a particularly significant problem for tasks involving aerial images: often training data is only available for a limited geographical area and a narrow time window, leading to models which perform poorly in different regions, at different times of day, or during different seasons. Domain adaptation can mitigate this issue by using labelled source-domain training examples and unlabelled target-domain images to train a model which performs well on both domains. Modern adversarial domain adaptation approaches use unpaired data. We propose using pairs of semantically similar images, i.e., images whose segmentations are accurate predictions of each other, for improved model performance. In this paper we show that, as an upper limit based on ground truth, using semantically paired aerial images during training almost always increases model performance, with an average improvement of 4.2% accuracy and 0.036 mean intersection-over-union (mIoU). Using a practical estimate of semantic similarity, we still achieve improvements in more than half of all cases, with average improvements of 2.5% accuracy and 0.017 mIoU in those cases.
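The mIoU metric reported above, which also quantifies how well one segmentation predicts another (the paper's notion of semantic similarity between paired images), can be sketched as follows (toy label maps invented for illustration):

```python
import numpy as np

def miou(pred, gt, num_classes):
    """Mean intersection-over-union over classes present in either map."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union:                        # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

# two toy segmentations of the same scene (0 = background, 1 = building)
a = np.array([[0, 0, 1],
              [0, 1, 1]])
b = np.array([[0, 1, 1],
              [0, 1, 1]])
print(miou(a, b, 2))   # (2/3 + 3/4) / 2
```

Computing mIoU between the segmentations of two different images is symmetric, so it can score candidate image pairs without reference to which one is "prediction" and which is "ground truth".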


Author(s):  
Georgios Orfanidis ◽  
Konstantinos Ioannidis ◽  
Konstantinos Avgerinakis ◽  
Stefanos Vrochidis ◽  
Ioannis Kompatsiaris
