Joint Learning of Contour and Structure for Boundary-Preserved Building Extraction

Most of the existing approaches to the extraction of buildings from high-resolution orthoimages consider the problem as semantic segmentation, which extracts a pixel-wise mask for buildings and trains end-to-end with manually labeled building maps. However, as buildings are highly structured, such a strategy suffers several problems, such as blurred boundaries and the adhesion to close objects. To alleviate the above problems, we proposed a new strategy that also considers the contours of the buildings. Both the contours and structures of the buildings are jointly learned in the same network. The contours are learnable because the boundary of the mask labels of buildings implicitly represents the contours of buildings. We utilized the building contour information embedded in the labels to optimize the representation of building boundaries, then combined the contour information with multi-scale semantic features to enhance the robustness to image spatial resolution. The experimental results showed that the proposed method achieved 91.64%, 81.34%, and 74.51% intersection over union (IoU) on the WHU, Aerial, and Massachusetts building datasets, and outperformed the state-of-the-art (SOTA) methods. It significantly improved the accuracy of building boundaries, especially for the edges of adjacent buildings. The code is made publicly available.

Download Full-text

Semantic Segmentation Network for Surface Defect Detection of Automobile Wheel Hub Fusing High-Resolution Feature and Multi-Scale Feature

Applied Sciences ◽

10.3390/app112210508 ◽

2021 ◽

Vol 11 (22) ◽

pp. 10508

Author(s):

Chaowei Tang ◽

Xinxin Feng ◽

Haotian Wen ◽

Xu Zhou ◽

Yanqing Shao ◽

...

Keyword(s):

High Resolution ◽

Defect Detection ◽

Automobile Industry ◽

Surface Defect ◽

Semantic Segmentation ◽

The Body ◽

Multi Scale ◽

Surface Defect Detection ◽

Edge Features ◽

Automobile Wheel

Surface defect detection of an automobile wheel hub is important to the automobile industry because these defects directly affect the safety and appearance of automobiles. At present, surface defect detection networks based on convolutional neural network use many pooling layers when extracting features, reducing the spatial resolution of features and preventing the accurate detection of the boundary of defects. On the basis of DeepLab v3+, we propose a semantic segmentation network for the surface defect detection of an automobile wheel hub. To solve the gridding effect of atrous convolution, the high-resolution network (HRNet) is used as the backbone network to extract high-resolution features, and the multi-scale features extracted by the Atrous Spatial Pyramid Pooling (ASPP) of DeepLab v3+ are superimposed. On the basis of the optical flow, we decouple the body and edge features of the defects to accurately detect the boundary of defects. Furthermore, in the upsampling process, a decoder can accurately obtain detection results by fusing the body, edge, and multi-scale features. We use supervised training to optimize these features. Experimental results on four defect datasets (i.e., wheels, magnetic tiles, fabrics, and welds) show that the proposed network has better F1 score, average precision, and intersection over union than SegNet, Unet, and DeepLab v3+, proving that the proposed network is effective for different defect detection scenarios.

Download Full-text

MSG-SR-Net: A Weakly Supervised Network Integrating Multi-Scale Generation and Super-Pixel Refinement for Building Extraction from High-Resolution Remotely Sensed Imageries

IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing ◽

10.1109/jstars.2021.3137450 ◽

2021 ◽

pp. 1-1

Author(s):

Xin Yan ◽

Li Shen ◽

Jicheng Wang ◽

Xu Deng ◽

Zhilin Li

Keyword(s):

High Resolution ◽

Remotely Sensed ◽

Building Extraction ◽

Multi Scale ◽

Weakly Supervised

Download Full-text

Attention-Based Multi-Modal Fusion Network for Semantic Scene Completion

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6803 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11402-11409

Author(s):

Siqi Li ◽

Changqing Zou ◽

Yipeng Li ◽

Xibin Zhao ◽

Yue Gao

Keyword(s):

State Of The Art ◽

Semantic Segmentation ◽

Spatial Dimension ◽

Semantic Features ◽

Convolutional Network ◽

The Real ◽

Single View ◽

Depth Cues ◽

Semantic Scene ◽

3D Scene

This paper presents an end-to-end 3D convolutional network named attention-based multi-modal fusion network (AMFNet) for the semantic scene completion (SSC) task of inferring the occupancy and semantic labels of a volumetric 3D scene from single-view RGB-D images. Compared with previous methods which use only the semantic features extracted from RGB-D images, the proposed AMFNet learns to perform effective 3D scene completion and semantic segmentation simultaneously via leveraging the experience of inferring 2D semantic segmentation from RGB-D images as well as the reliable depth cues in spatial dimension. It is achieved by employing a multi-modal fusion architecture boosted from 2D semantic segmentation and a 3D semantic completion network empowered by residual attention blocks. We validate our method on both the synthetic SUNCG-RGBD dataset and the real NYUv2 dataset and the results show that our method respectively achieves the gains of 2.5% and 2.6% on the synthetic SUNCG-RGBD dataset and the real NYUv2 dataset against the state-of-the-art method.

Download Full-text

Learning Dual Multi-Scale Manifold Ranking for Semantic Segmentation of High-Resolution Images

Remote Sensing ◽

10.3390/rs9050500 ◽

2017 ◽

Vol 9 (5) ◽

pp. 500 ◽

Cited By ~ 16

Author(s):

Mi Zhang ◽

Xiangyun Hu ◽

Like Zhao ◽

Ye Lv ◽

Min Luo ◽

...

Keyword(s):

High Resolution ◽

Semantic Segmentation ◽

Manifold Ranking ◽

Multi Scale ◽

High Resolution Images

Download Full-text

A Fast 4K Video Frame Interpolation Using a Multi-Scale Optical Flow Reconstruction Network

Symmetry ◽

10.3390/sym11101251 ◽

2019 ◽

Vol 11 (10) ◽

pp. 1251 ◽

Cited By ~ 2

Author(s):

Ahn ◽

Jeong ◽

Kim ◽

Kwon ◽

Yoo

Keyword(s):

High Resolution ◽

Optical Flow ◽

State Of The Art ◽

Interpolation Method ◽

Video Frame ◽

Frame Interpolation ◽

Multi Scale ◽

Reconstruction Scheme ◽

Flow Reconstruction

Recently, video frame interpolation research developed with a convolutional neural network has shown remarkable results. However, these methods demand huge amounts of memory and run time for high-resolution videos, and are unable to process a 4K frame in a single pass. In this paper, we propose a fast 4K video frame interpolation method, based upon a multi-scale optical flow reconstruction scheme. The proposed method predicts low resolution bi-directional optical flow, and reconstructs it into high resolution. We also proposed consistency and multi-scale smoothness loss to enhance the quality of the predicted optical flow. Furthermore, we use adversarial loss to make the interpolated frame more seamless and natural. We demonstrated that the proposed method outperforms the existing state-of-the-art methods in quantitative evaluation, while it runs up to 4.39× faster than those methods for 4K videos.

Download Full-text

A Multi-Scale Filtering Building Index for Building Extraction in Very High-Resolution Satellite Imagery

Remote Sensing ◽

10.3390/rs11050482 ◽

2019 ◽

Vol 11 (5) ◽

pp. 482 ◽

Cited By ~ 6

Author(s):

Qi Bi ◽

Kun Qin ◽

Han Zhang ◽

Ye Zhang ◽

Zhili Li ◽

...

Keyword(s):

Remote Sensing ◽

High Resolution ◽

Common Knowledge ◽

Remote Sensing Image ◽

Morphological Operations ◽

Building Extraction ◽

Multi Scale ◽

Training Samples ◽

Image Building ◽

Very High

Building extraction plays a significant role in many high-resolution remote sensing image applications. Many current building extraction methods need training samples while it is common knowledge that different samples often lead to different generalization ability. Morphological building index (MBI), representing morphological features of building regions in an index form, can effectively extract building regions especially in Chinese urban regions without any training samples and has drawn much attention. However, some problems like the heavy computation cost of multi-scale and multi-direction morphological operations still exist. In this paper, a multi-scale filtering building index (MFBI) is proposed in the hope of overcoming these drawbacks and dealing with the increasing noise in very high-resolution remote sensing image. The profile of multi-scale average filtering is averaged and normalized to generate this index. Moreover, to fully utilize the relatively little spectral information in very high-resolution remote sensing image, two scenarios to generate the multi-channel multi-scale filtering index (MMFBI) are proposed. While no high-resolution remote sensing image building extraction dataset is open to the public now and the current very high-resolution remote sensing image building extraction datasets usually contain samples from the Northern American or European regions, we offer a very high-resolution remote sensing image building extraction datasets in which the samples contain multiple building styles from multiple Chinese regions. The proposed MFBI and MMFBI outperform MBI and the currently used object based segmentation method on the dataset, with a high recall and F-score. Meanwhile, the computation time of MFBI and MBI is compared on three large-scale very high-resolution satellite image and the sensitivity analysis demonstrates the robustness of the proposed method.

Download Full-text

Resolution Dependence of European Precipitation in a State-of-the-Art Atmospheric General Circulation Model

Journal of Climate ◽

10.1175/jcli-d-14-00279.1 ◽

2015 ◽

Vol 28 (13) ◽

pp. 5134-5149 ◽

Cited By ~ 19

Author(s):

Ronald van Haren ◽

Reindert J. Haarsma ◽

Geert Jan Van Oldenborgh ◽

Wilco Hazeleger

Keyword(s):

High Resolution ◽

Spatial Resolution ◽

Vertical Velocity ◽

Moisture Transport ◽

State Of The Art ◽

Horizontal Resolution ◽

Accurate Representation ◽

Medium Resolution ◽

Resolution Model ◽

High Resolution Model

Abstract In this study, the authors investigate the effect of GCM spatial resolution on modeled precipitation over Europe. The objectives of the analysis are to determine whether climate models have sufficient spatial resolution to have an accurate representation of the storm tracks that affect precipitation. They investigate if there is a significant statistical difference in modeled precipitation between a medium-resolution (~112-km horizontal resolution) and a high-resolution (~25-km horizontal resolution) version of a state-of-the-art AGCM (EC-EARTH), if either model resolution gives a better representation of precipitation in the current climate, and what processes are responsible for the differences in modeled precipitation. The authors find that the high-resolution model gives a more accurate representation of northern and central European winter precipitation. The medium-resolution model has a larger positive bias in precipitation in most of the northern half of Europe. Storm tracks are better simulated in the high-resolution model, providing for a more accurate horizontal moisture transport and moisture convergence. Using a decomposition of the precipitation difference between the medium- and high-resolution model in a part related and a part unrelated to a difference in the distribution of vertical atmospheric velocity, the authors find that the smaller precipitation bias in central and northern Europe is largely unrelated to a difference in vertical velocity distribution. The smaller precipitation amount in these areas is in agreement with less moisture transport over this area in the high-resolution model. In areas with orography the change in vertical velocity distribution is found to be more important.

Download Full-text

AUTOMATIC OBJECT EXTRACTION FROM HIGH RESOLUTION AERIAL IMAGERY WITH SIMPLE LINEAR ITERATIVE CLUSTERING AND CONVOLUTIONAL NEURAL NETWORKS

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-2-w16-61-2019 ◽

2019 ◽

Vol XLII-2/W16 ◽

pp. 61-66 ◽

Cited By ~ 1

Author(s):

A. C. Carrilho ◽

M. Galo

Keyword(s):

High Resolution ◽

Object Detection ◽

Semantic Segmentation ◽

Aerial Images ◽

Machine Learning Techniques ◽

Object Extraction ◽

Multi Scale ◽

Learning Techniques ◽

Simple Linear Iterative Clustering ◽

Fast Region

<p><strong>Abstract.</strong> Recent advances in machine learning techniques for image classification have led to the development of robust approaches to both object detection and extraction. Traditional CNN architectures, such as LeNet, AlexNet and CaffeNet, usually use as input images of fixed sizes taken from objects and attempt to assign labels to those images. Another possible approach is the Fast Region-based CNN (or Fast R-CNN), which works by using two models: (i) a Region Proposal Network (RPN) which generates a set of potential Regions of Interest (RoI) in the image; and (ii) a traditional CNN which assigns labels to the proposed RoI. As an alternative, this study proposes an approach to automatic object extraction from aerial images similar to the Fast R-CNN architecture, the main difference being the use of the Simple Linear Iterative Clustering (SLIC) algorithm instead of an RPN to generate the RoI. The dataset used is composed of high-resolution aerial images and the following classes were considered: house, sport court, hangar, building, swimming pool, tree, and street/road. The proposed method can generate RoI with different sizes by running a multi-scale SLIC approach. The overall accuracy obtained for object detection was 89% and the major advantage is that the proposed method is capable of semantic segmentation by assigning a label to each selected RoI. Some of the problems encountered are related to object proximity, in which different instances appeared merged in the results.</p>

Download Full-text

MADNet 2.0: Pixel-Scale Topography Retrieval from Single-View Orbital Imagery of Mars Using Deep Learning

Remote Sensing ◽

10.3390/rs13214220 ◽

2021 ◽

Vol 13 (21) ◽

pp. 4220

Author(s):

Yu Tao ◽

Jan-Peter Muller ◽

Siting Xiong ◽

Susan J. Conway

Keyword(s):

Deep Learning ◽

High Resolution ◽

Spatial Resolution ◽

Large Scale ◽

Fine Scale ◽

Martian Surface ◽

Multi Scale ◽

Single Scale ◽

Imaging Science ◽

Pixel Scale

The High-Resolution Imaging Science Experiment (HiRISE) onboard the Mars Reconnaissance Orbiter provides remotely sensed imagery at the highest spatial resolution at 25–50 cm/pixel of the surface of Mars. However, due to the spatial resolution being so high, the total area covered by HiRISE targeted stereo acquisitions is very limited. This results in a lack of the availability of high-resolution digital terrain models (DTMs) which are better than 1 m/pixel. Such high-resolution DTMs have always been considered desirable for the international community of planetary scientists to carry out fine-scale geological analysis of the Martian surface. Recently, new deep learning-based techniques that are able to retrieve DTMs from single optical orbital imagery have been developed and applied to single HiRISE observational data. In this paper, we improve upon a previously developed single-image DTM estimation system called MADNet (1.0). We propose optimisations which we collectively call MADNet 2.0, which is based on a supervised image-to-height estimation network, multi-scale DTM reconstruction, and 3D co-alignment processes. In particular, we employ optimised single-scale inference and multi-scale reconstruction (in MADNet 2.0), instead of multi-scale inference and single-scale reconstruction (in MADNet 1.0), to produce more accurate large-scale topographic retrieval with boosted fine-scale resolution. We demonstrate the improvements of the MADNet 2.0 DTMs produced using HiRISE images, in comparison to the MADNet 1.0 DTMs and the published Planetary Data System (PDS) DTMs over the ExoMars Rosalind Franklin rover’s landing site at Oxia Planum. Qualitative and quantitative assessments suggest the proposed MADNet 2.0 system is capable of producing pixel-scale DTM retrieval at the same spatial resolution (25 cm/pixel) of the input HiRISE images.

Download Full-text

Low Resolution Information Also Matters: Learning Multi-Resolution Representations for Person Re-Identification

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/179 ◽

2021 ◽

Author(s):

Guoqing Zhang ◽

Yuhao Chen ◽

Weisi Lin ◽

Arun Chandran ◽

Xuan Jing

Keyword(s):

Feature Extraction ◽

High Resolution ◽

Feature Fusion ◽

State Of The Art ◽

Super Resolution ◽

Input Image ◽

Low Resolution ◽

Joint Learning ◽

Novel Method ◽

Valid Information

As a prevailing task in video surveillance and forensics field, person re-identification (re-ID) aims to match person images captured from non-overlapped cameras. In unconstrained scenarios, person images often suffer from the resolution mismatch problem, i.e., Cross-Resolution Person Re-ID. To overcome this problem, most existing methods restore low resolution (LR) images to high resolution (HR) by super-resolution (SR). However, they only focus on the HR feature extraction and ignore the valid information from original LR images. In this work, we explore the influence of resolutions on feature extraction and develop a novel method for cross-resolution person re-ID called Multi-Resolution Representations Joint Learning (MRJL). Our method consists of a Resolution Reconstruction Network (RRN) and a Dual Feature Fusion Network (DFFN). The RRN uses an input image to construct a HR version and a LR version with an encoder and two decoders, while the DFFN adopts a dual-branch structure to generate person representations from multi-resolution images. Comprehensive experiments on five benchmarks verify the superiority of the proposed MRJL over the relevent state-of-the-art methods.

Download Full-text