Multi-Supervised Encoder-Decoder for Image Forgery Localization

Image manipulation localization is a challenging task because it depends on tampering artifacts rather than image content, which suggests that richer features need to be learned. Unlike many existing solutions, we employ a semantic segmentation network, named Multi-Supervised Encoder–Decoder (MSED), to detect and localize forged images of arbitrary size and with multiple types of manipulation, without extra pre-training. In the basic encoder–decoder framework, the encoder captures multi-scale contextual information through atrous convolution at multiple rates, while the decoder recovers sharper object boundaries by upsampling to gradually restore spatial information. An additional multi-supervised module guides training by applying a pixel-wise Binary Cross-Entropy (BCE) loss after the encoder and after each upsampling step. Experiments on four standard image manipulation datasets demonstrate that our MSED network achieves state-of-the-art performance compared to alternative baselines.
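The multi-supervised idea, a BCE loss applied at several decoder stages, can be illustrated with a minimal sketch. The abstract does not say how the ground-truth mask is matched to each stage's resolution or how the per-stage losses are weighted; the nearest-neighbour downsampling and the unweighted sum below are assumptions, and the function names are hypothetical.

```python
import math

def bce(pred, target, eps=1e-7):
    """Mean pixel-wise binary cross-entropy over flat lists."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(pred)

def downsample_mask(mask, factor):
    """Nearest-neighbour downsampling of a 2D 0/1 mask (assumed strategy)."""
    return [row[::factor] for row in mask[::factor]]

def multi_supervised_loss(side_outputs, mask):
    """Unweighted sum of BCE losses over predictions at several resolutions.

    side_outputs: 2D probability maps, e.g. one after the encoder and one
    after each upsampling step. Each map's height must divide the mask
    height evenly (assumption of this sketch).
    """
    loss = 0.0
    for out in side_outputs:
        factor = len(mask) // len(out)
        target = downsample_mask(mask, factor)
        flat_pred = [p for row in out for p in row]
        flat_target = [t for row in target for t in row]
        loss += bce(flat_pred, flat_target)
    return loss
```

Each supervised stage contributes its own loss term, so gradients reach the coarse layers directly instead of only through the final output.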

Quantify pixel-level detection of dam surface crack using deep learning

Measurement Science and Technology ◽

10.1088/1361-6501/ac4b8d ◽

2022 ◽

Author(s):

Bo Chen ◽

Hua Zhang ◽

Yonglong Li ◽

Shuang Wang ◽

Huaifang Zhou ◽

...

Keyword(s):

Deep Learning ◽

Surface Crack ◽

Crack Detection ◽

State Of The Art ◽

Contextual Information ◽

Semantic Segmentation ◽

Quantitative Information ◽

Cross Entropy ◽

Detection Methods ◽

Water Conservancy

An increasing number of computer vision-based detection methods are applied to detect cracks in water conservancy infrastructure. However, most studies directly reuse existing feature extraction networks, which were proposed for open-source datasets, to extract crack information. Because the distribution and pixel features of cracks differ from those data, the extracted crack information is incomplete. In this paper, a deep learning-based network for dam surface crack detection is proposed, which mainly addresses the semantic segmentation of cracks on the dam surface. In particular, we design a shallow encoding network to extract features of crack images based on a statistical analysis of cracks. Further, to enhance the relevance of contextual information, we introduce an attention module into the decoding network. During training, we use the sum of Cross-Entropy and Dice Loss as the loss function to overcome data imbalance. Quantitative crack information is extracted using the imaging principle after morphological algorithms extract the morphological features of the predicted result. We built a manually annotated dataset containing 1577 images to verify the effectiveness of the proposed method. This method achieves state-of-the-art performance on our dataset. Specifically, the precision, recall, IoU, F1_measure, and accuracy reach 90.81%, 81.54%, 75.23%, 85.93%, and 99.76%, respectively, and the quantization error of cracks is less than 4%.
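For reference, the reported metrics follow the standard definitions in terms of pixel-level confusion counts. The sketch below uses those textbook formulas; whether the authors aggregate counts over the whole dataset or average per image is not stated, so the function names and the aggregation are assumptions.

```python
def segmentation_metrics(tp, fp, fn, tn):
    """Standard binary segmentation metrics from confusion counts.

    Assumes every denominator is non-zero (at least one predicted and one
    true crack pixel).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "iou": tp / (tp + fp + fn),
        "f1": 2 * precision * recall / (precision + recall),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

As a consistency check, `f1_from_precision_recall(0.9081, 0.8154)` rounds to 0.8593, which agrees with the reported F1 of 85.93%.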

EFFICIENT SEMANTIC SEGMENTATION OF MAN-MADE SCENES USING FULLY-CONNECTED CONDITIONAL RANDOM FIELD

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xli-b3-633-2016 ◽

2016 ◽

Vol XLI-B3 ◽

pp. 633-640

Author(s):

Weihao Li ◽

Michael Ying Yang

Keyword(s):

Random Field ◽

State Of The Art ◽

Conditional Random Field ◽

Contextual Information ◽

Mean Field ◽

Semantic Segmentation ◽

Gaussian Kernels ◽

Previous State ◽

Fully Connected ◽

Better Than

In this paper we explore semantic segmentation of man-made scenes using a fully connected conditional random field (CRF). Images of man-made scenes display strong contextual dependencies in their spatial structures. Fully connected CRFs can model long-range connections within an image of a man-made scene and make use of the contextual information in scene structures. The pairwise edge potentials of fully connected CRF models are defined by a linear combination of Gaussian kernels. With a filter-based mean field algorithm, inference is very efficient. Our experimental results demonstrate that the fully connected CRF performs better than previous state-of-the-art approaches on both the eTRIMS and LabelMeFacade datasets.
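A pairwise potential built from a linear combination of Gaussian kernels can be sketched as follows. The abstract does not name the kernels; the appearance-plus-smoothness pair used here is the form commonly seen in dense CRF work and is an assumption, as are the Potts label compatibility, the weights, and the bandwidth values.

```python
import math

def pairwise_kernel(pos_i, pos_j, rgb_i, rgb_j,
                    w_appearance=1.0, w_smooth=1.0,
                    theta_alpha=60.0, theta_beta=10.0, theta_gamma=3.0):
    """Weighted sum of two Gaussian kernels over pixel position and colour.

    All weights and bandwidths are illustrative placeholders.
    """
    d_pos = sum((a - b) ** 2 for a, b in zip(pos_i, pos_j))
    d_rgb = sum((a - b) ** 2 for a, b in zip(rgb_i, rgb_j))
    # Appearance kernel: nearby pixels with similar colour interact strongly.
    appearance = math.exp(-d_pos / (2 * theta_alpha ** 2)
                          - d_rgb / (2 * theta_beta ** 2))
    # Smoothness kernel: depends on position only.
    smoothness = math.exp(-d_pos / (2 * theta_gamma ** 2))
    return w_appearance * appearance + w_smooth * smoothness

def potts_pairwise_potential(label_i, label_j, kernel_value):
    """Penalise differing labels in proportion to the kernel value."""
    return kernel_value if label_i != label_j else 0.0
```

Because every pixel pair is connected, evaluating this kernel naively is quadratic in the number of pixels, which is why efficient filter-based mean field inference matters.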

A Hierarchical Feature Extraction Network for Fast Scene Segmentation

Sensors ◽

10.3390/s21227730 ◽

2021 ◽

Vol 21 (22) ◽

pp. 7730

Author(s):

◽

Keyword(s):

Feature Extraction ◽

Spatial Information ◽

Contextual Information ◽

Semantic Segmentation ◽

Superior Performance ◽

Scene Segmentation ◽

Research Topics ◽

Time Performance ◽

Segmentation Accuracy ◽

Active Research

Semantic segmentation is one of the most active research topics in computer vision, with the goal of assigning dense semantic labels to all pixels in a given image. In this paper, we introduce HFEN (Hierarchical Feature Extraction Network), a lightweight network that balances inference speed and segmentation accuracy. Our architecture is based on an encoder-decoder framework. The input images are down-sampled through an efficient encoder to extract multi-layer features. The extracted features are then fused by a decoder, where global contextual information and spatial information are aggregated to produce the final segmentation with real-time performance. Extensive experiments have been conducted on two standard benchmarks, Cityscapes and Camvid, where our network achieved superior performance on an NVIDIA 2080Ti.

Cascaded Residual Attention Enhanced Road Extraction from Remote Sensing Images

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi11010009 ◽

2021 ◽

Vol 11 (1) ◽

pp. 9

Author(s):

Shengfu Li ◽

Cheng Liao ◽

Yulin Ding ◽

Han Hu ◽

Yang Jia ◽

...

Keyword(s):

Remote Sensing ◽

Spatial Information ◽

Semantic Segmentation ◽

Road Extraction ◽

Remote Sensing Images ◽

Long Distance ◽

Features Fusion ◽

Multi Scale ◽

Boundary Recognition ◽

Benchmark Datasets

Efficient and accurate road extraction from remote sensing imagery is important for applications related to navigation and Geographic Information System updating. Existing data-driven methods based on semantic segmentation recognize roads from images pixel by pixel; they generally use only local spatial information, which causes discontinuous extraction and jagged boundary recognition. To address these problems, we propose a cascaded attention-enhanced architecture to extract boundary-refined roads from remote sensing images. Our proposed architecture uses spatial attention residual blocks on multi-scale features to capture long-distance relations and introduces channel attention layers to optimize multi-scale feature fusion. Furthermore, a lightweight encoder-decoder network is connected to adaptively optimize the boundaries of the extracted roads. Our experiments showed that the proposed method outperformed existing methods and achieved state-of-the-art results on the Massachusetts dataset. In addition, our method achieved competitive results on more recent benchmark datasets, e.g., the DeepGlobe and the Huawei Cloud road extraction challenge.

River Segmentation of Remote Sensing Images Based on Composite Attention Network

Complexity ◽

10.1155/2022/7750281 ◽

2022 ◽

Vol 2022 ◽

pp. 1-13

Author(s):

Zhiyong Fan ◽

Jianmin Hou ◽

Qiang Zang ◽

Yunjie Chen ◽

Fei Yan

Keyword(s):

Remote Sensing ◽

Semantic Segmentation ◽

Cross Entropy ◽

Important Research ◽

Dice Coefficient ◽

Remote Sensing Images ◽

Training Process ◽

Attention Network ◽

Evaluation Indexes ◽

Agricultural Planning

River segmentation of remote sensing images has research significance and application value for environmental monitoring, disaster warning, and agricultural planning. In this study, we propose a river segmentation model for remote sensing images based on a composite attention network, which addresses the abundance of river detail in images and the interference of non-river information such as bridges, shadows, and roads. To improve segmentation efficiency, a composite attention mechanism is first introduced in the central region of the network to capture the global feature dependence of river information. Next, we dynamically combine the binary cross-entropy loss, which is designed for pixel-wise segmentation, and the Dice coefficient loss, which measures the similarity of two segmentation objects, into a single weighted loss to optimize training of the proposed network. The experimental results show that the proposed method scores higher on the evaluation indexes than other semantic segmentation networks, and that the river segmentation quality of the CoANet model is significantly improved. The method segments rivers in remote sensing images more accurately and coherently, which can meet the needs of subsequent research.
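A weighted combination of BCE and Dice losses can be sketched as below. The abstract says the two terms are combined dynamically but does not give the weighting rule, so the fixed mixing weight `alpha` and the Dice smoothing constant are assumptions made for illustration only.

```python
import math

def bce_loss(pred, target, eps=1e-7):
    """Mean pixel-wise binary cross-entropy over flat lists."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(pred)

def dice_loss(pred, target, smooth=1.0):
    """1 - Dice coefficient; `smooth` avoids division by zero on empty masks."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * intersection + smooth) / (sum(pred) + sum(target) + smooth)

def combined_loss(pred, target, alpha=0.5):
    """Convex combination of the two losses; a fixed alpha is an assumption."""
    return alpha * bce_loss(pred, target) + (1.0 - alpha) * dice_loss(pred, target)
```

The BCE term scores each pixel independently, while the Dice term scores region overlap, which keeps the loss informative when river pixels are a small fraction of the image.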

EHANet: An Effective Hierarchical Aggregation Network for Face Parsing

Applied Sciences ◽

10.3390/app10093135 ◽

2020 ◽

Vol 10 (9) ◽

pp. 3135 ◽

Cited By ~ 3

Author(s):

Ling Luo ◽

Dingyu Xue ◽

Xinglong Feng

Keyword(s):

Neural Networks ◽

Real World ◽

State Of The Art ◽

Contextual Information ◽

Semantic Gap ◽

Deep Convolutional Neural Networks ◽

Multi Scale ◽

Hierarchical Aggregation ◽

Real World Applications ◽

Weighted Boundary

In recent years, face parsing has developed rapidly thanks to deep convolutional neural networks (DCNNs). However, it still has the following problems: (1) existing state-of-the-art frameworks usually fail to meet real-time requirements while pursuing performance; (2) similar appearances cause incorrect pixel label assignments, especially at boundaries; (3) to promote multi-scale prediction, deep features and shallow features are fused without considering the semantic gap between them. To overcome these drawbacks, we propose an effective and efficient hierarchical aggregation network called EHANet for fast and accurate face parsing. More specifically, we first propose a stage contextual attention mechanism (SCAM), which uses higher-level contextual information to re-encode the channels according to their importance. Secondly, a semantic gap compensation block (SGCB) is presented to ensure the effective aggregation of hierarchical information. Thirdly, a weighted boundary-aware loss compensates for the ambiguity of boundary semantics. Without any bells and whistles, combined with a lightweight backbone, we achieve outstanding results on both CelebAMask-HQ (78.19% mIoU) and Helen datasets (90.7% F1-score). Furthermore, our model can achieve 55 FPS on a single GTX 1080Ti card with 640 × 640 input and further reach over 300 FPS with a resolution of 256 × 256, which is suitable for real-world applications.

Pyramid scene parsing network in 3D: Improving semantic segmentation of point clouds with multi-scale contextual information

ISPRS Journal of Photogrammetry and Remote Sensing ◽

10.1016/j.isprsjprs.2019.06.010 ◽

2019 ◽

Vol 154 ◽

pp. 246-258 ◽

Cited By ~ 4

Author(s):

Hao Fang ◽

Florent Lafarge

Keyword(s):

Contextual Information ◽

Semantic Segmentation ◽

Point Clouds ◽

Multi Scale ◽

Scene Parsing

MS-AFF: A Novel Semantic Segmentation Approach for Buried Object Based on Multi-scale Attentional Feature Fusion

10.21203/rs.3.rs-193757/v1 ◽

2021 ◽

Author(s):

Chao Lu ◽

Fansheng Chen ◽

Xiaofeng Su ◽

Dan Zeng

Keyword(s):

Deep Learning ◽

Spatial Information ◽

Feature Fusion ◽

Infrared Image ◽

Semantic Segmentation ◽

Target Object ◽

Infrared Images ◽

Feature Maps ◽

Multi Scale ◽

Visible Images

Infrared technology is widely used in precision guidance and mine detection since it can capture the heat radiated outward from the target object. We use infrared (IR) thermography to obtain infrared images of buried objects. Compared to visible images, infrared images have poor resolution, low contrast, and a fuzzy visual effect, which makes it difficult to segment the target object, especially against complex backgrounds. Under these conditions, traditional segmentation methods perform poorly on infrared images since they are easily disturbed by noise and non-target objects. With the advance of deep convolutional neural networks (CNNs), deep learning-based methods have made significant improvements in the semantic segmentation task. However, few of them study infrared image semantic segmentation, which is a more challenging scenario than visible images. Moreover, the lack of an infrared image dataset is also a problem for current deep learning-based methods. We propose a multi-scale attentional feature fusion (MS-AFF) module for infrared image semantic segmentation to solve this problem. Specifically, we integrate a series of feature maps from different levels through an atrous spatial pyramid structure. In this way, the model obtains a rich representation of the infrared images. In addition, a global spatial information attention module lets the model focus on the target region and reduces disturbance from the background of infrared images. We also propose an infrared segmentation dataset based on an infrared thermal imaging system. Extensive experiments conducted on the infrared image segmentation dataset show the superiority of our method.
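The atrous (dilated) convolution underlying an atrous spatial pyramid can be shown in one dimension. This is a minimal sketch of the general operation, not the MS-AFF module: the function names, the shared kernel, and the choice of rates are hypothetical.

```python
def atrous_conv1d(signal, kernel, rate):
    """Valid 1D convolution whose kernel taps are spaced `rate` samples apart.

    rate=1 is an ordinary convolution; larger rates widen the receptive
    field without adding kernel weights.
    """
    span = (len(kernel) - 1) * rate  # distance between first and last tap
    return [
        sum(k * signal[start + i * rate] for i, k in enumerate(kernel))
        for start in range(len(signal) - span)
    ]

def atrous_pyramid(signal, kernel, rates):
    """Apply the same kernel at several rates, one output per rate."""
    return {rate: atrous_conv1d(signal, kernel, rate) for rate in rates}
```

For the signal `[1, 2, 3, 4, 5, 6, 7]` and kernel `[1, 1, 1]`, rate 1 gives `[6, 9, 12, 15, 18]` and rate 2 gives `[9, 12, 15]`: the same three weights cover a window of five samples instead of three.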

Adaptive Feature Pyramid Network to Predict Crisp Boundaries via NMS Layer and ODS F-Measure Loss Function

Information ◽

10.3390/info13010032 ◽

2022 ◽

Vol 13 (1) ◽

pp. 32

Author(s):

Gang Sun ◽

Hancheng Yu ◽

Xiangtao Jiang ◽

Mingkui Feng

Keyword(s):

Edge Detection ◽

Loss Function ◽

State Of The Art ◽

Cross Entropy ◽

Post Processing ◽

Multi Scale ◽

Feature Pyramid ◽

Multi Level ◽

Different Levels ◽

F Measure

Edge detection is one of the fundamental computer vision tasks. Recent methods for edge detection based on a convolutional neural network (CNN) typically employ the weighted cross-entropy loss. Their predicted edges are thick and need post-processing before the optimal dataset scale (ODS) F-measure can be calculated for evaluation. To achieve end-to-end training, we propose a non-maximum suppression (NMS) layer to obtain sharp boundaries without the need for post-processing. The ODS F-measure can then be calculated directly from these sharp boundaries, so an ODS F-measure loss function is proposed to train the network. In addition, we propose an adaptive multi-level feature pyramid network (AFPN) to better fuse different levels of features. Furthermore, to enrich the multi-scale features learned by AFPN, we introduce a pyramid context module (PCM) that uses dilated convolution to extract multi-scale features. Experimental results indicate that the proposed AFPN achieves state-of-the-art performance on the BSDS500 dataset (ODS F-score of 0.837) and the NYUDv2 dataset (ODS F-score of 0.780).
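The ODS F-measure picks one threshold for the whole dataset and reports the F-measure at that threshold. The sketch below is a simplified illustration: it matches predicted and ground-truth edge pixels exactly, with no spatial tolerance, which standard benchmark code does not do, and all names are hypothetical.

```python
def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall; 0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def ods_f_measure(predictions, ground_truths, thresholds):
    """Return (best F, best threshold) using one threshold for all images.

    predictions: per-image flat lists of edge probabilities.
    ground_truths: per-image flat lists of 0/1 labels.
    Counts are accumulated over the whole dataset before computing F.
    """
    best_f, best_t = 0.0, None
    for t in thresholds:
        tp = fp = fn = 0
        for pred, gt in zip(predictions, ground_truths):
            for p, g in zip(pred, gt):
                is_edge = p >= t
                if is_edge and g:
                    tp += 1
                elif is_edge:
                    fp += 1
                elif g:
                    fn += 1
        f = f_measure(tp, fp, fn)
        if f > best_f:
            best_f, best_t = f, t
    return best_f, best_t
```

Thick predicted edges inflate the false-positive count at every threshold, which is why thinning, by post-processing or by an NMS layer, has to happen before this score is meaningful.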

Semantic Segmentation of High-Resolution Airborne Images With Dual-Stream DeepLabV3+

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi11010023 ◽

2021 ◽

Vol 11 (1) ◽

pp. 23

Author(s):

Ozgun Akcay ◽

Ahmet Cumhur Kinaci ◽

Emin Ozgur Avsar ◽

Umut Aydar

Keyword(s):

Data Augmentation ◽

Contextual Information ◽

Imbalanced Data ◽

Semantic Segmentation ◽

Multi Scale ◽

Segmentation Algorithms ◽

Geographic Datasets ◽

Stream Architecture ◽

The Given

In geospatial applications such as urban planning and land use management, automatic detection and classification of earth objects are essential tasks. Among the major semantic segmentation algorithms, DeepLabV3+ stands out as a state-of-the-art CNN. Although the DeepLabV3+ model is capable of extracting multi-scale contextual information, there is still a need for multi-stream architectures and alternative training approaches that can leverage multi-modal geographic datasets. In this study, a new end-to-end dual-stream architecture for geospatial imagery was developed based on the DeepLabV3+ architecture. Spectral datasets other than RGB increased semantic segmentation accuracy when they were used as additional channels alongside height information. Furthermore, both the data augmentation and the Tversky loss function, which is sensitive to imbalanced data, yielded better overall accuracy. The new dual-stream architecture produced overall semantic segmentation accuracies of 88.87% and 87.39% on the Potsdam and Vaihingen datasets, respectively. Finally, enhancing established semantic segmentation networks shows great potential for higher model performance, and the contribution of geospatial data as a second stream alongside RGB was explicitly shown.
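The Tversky loss mentioned above generalises the Dice loss by weighting false positives and false negatives separately. The sketch below uses the standard Tversky index; the values of `alpha` and `beta` and the smoothing constant are illustrative assumptions, since the abstract does not report the settings used.

```python
def tversky_loss(pred, target, alpha=0.3, beta=0.7, smooth=1e-6):
    """1 - Tversky index for flat lists of probabilities and 0/1 labels.

    alpha weights false positives and beta weights false negatives;
    alpha = beta = 0.5 recovers the Dice loss.
    """
    tp = sum(p * t for p, t in zip(pred, target))
    fp = sum(p * (1 - t) for p, t in zip(pred, target))
    fn = sum((1 - p) * t for p, t in zip(pred, target))
    index = (tp + smooth) / (tp + alpha * fp + beta * fn + smooth)
    return 1.0 - index
```

Setting `beta` above `alpha` penalises missed pixels more than spurious ones, which is one way to counter under-segmentation of a rare class.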
