A Novel Multi-Focus Image Fusion Network with U-Shape Structure

Multi-focus image fusion has become a very practical image processing task. It uses multiple images focused on various depth planes to create an all-in-focus image. Although extensive studies have been produced, the performance of existing methods is still limited by the inaccurate detection of the focus regions for fusion. Therefore, in this paper, we proposed a novel U-shape network which can generate an accurate decision map for the multi-focus image fusion. The Siamese encoder of our U-shape network can preserve the low-level cues with rich spatial details and high-level semantic information from the source images separately. Moreover, we introduce the ResBlocks to expand the receptive field, which can enhance the ability of our network to distinguish between focus and defocus regions. Moreover, in the bridge stage between the encoder and decoder, the spatial pyramid pooling is adopted as a global perception fusion module to capture sufficient context information for the learning of the decision map. Finally, we use a hybrid loss that combines the binary cross-entropy loss and the structural similarity loss for supervision. Extensive experiments have demonstrated that the proposed method can achieve the state-of-the-art performance.

Download Full-text

SeDAR: Reading Floorplans Like a Human—Using Deep Learning to Enable Human-Inspired Localisation

International Journal of Computer Vision ◽

10.1007/s11263-019-01239-4 ◽

2019 ◽

Vol 128 (5) ◽

pp. 1286-1310 ◽

Cited By ~ 3

Author(s):

Oscar Mendez ◽

Simon Hadfield ◽

Nicolas Pugeault ◽

Richard Bowden

Keyword(s):

Deep Learning ◽

Semantic Information ◽

State Of The Art ◽

Depth Information ◽

Semantic Maps ◽

Novel Method ◽

Rgb Images ◽

High Level ◽

Robotic Tasks ◽

And Robotics

Abstract The use of human-level semantic information to aid robotic tasks has recently become an important area for both Computer Vision and Robotics. This has been enabled by advances in Deep Learning that allow consistent and robust semantic understanding. Leveraging this semantic vision of the world has allowed human-level understanding to naturally emerge from many different approaches. Particularly, the use of semantic information to aid in localisation and reconstruction has been at the forefront of both fields. Like robots, humans also require the ability to localise within a structure. To aid this, humans have designed high-level semantic maps of our structures called floorplans. We are extremely good at localising in them, even with limited access to the depth information used by robots. This is because we focus on the distribution of semantic elements, rather than geometric ones. Evidence of this is that humans are normally able to localise in a floorplan that has not been scaled properly. In order to grant this ability to robots, it is necessary to use localisation approaches that leverage the same semantic information humans use. In this paper, we present a novel method for semantically enabled global localisation. Our approach relies on the semantic labels present in the floorplan. Deep Learning is leveraged to extract semantic labels from RGB images, which are compared to the floorplan for localisation. While our approach is able to use range measurements if available, we demonstrate that they are unnecessary as we can achieve results comparable to state-of-the-art without them.

Download Full-text

Pedestrian Attributes Recognition in Surveillance Scenarios Using Multi-Task Lightweight Convolutional Neural Network

Applied Sciences ◽

10.3390/app9194182 ◽

2019 ◽

Vol 9 (19) ◽

pp. 4182 ◽

Cited By ~ 2

Author(s):

Pu Yan ◽

Li Zhuo ◽

Jiafeng Li ◽

Hui Zhang ◽

Jing Zhang

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

State Of The Art ◽

Cross Entropy ◽

Semantic Features ◽

Recognition Method ◽

Relationship Model ◽

High Level ◽

Fully Connected ◽

Video Structuring

Pedestrian attributes (such as gender, age, hairstyle, and clothing) can effectively represent the appearance of pedestrians. These are high-level semantic features that are robust to illumination, deformation, etc. Therefore, they can be widely used in person re-identification, video structuring analysis and other applications. In this paper, a pedestrian attributes recognition method for surveillance scenarios using a multi-task lightweight convolutional neural network is proposed. Firstly, the labels of the attributes for each pedestrian image are integrated into a label vector. Then, a multi-task lightweight Convolutional Neural Network (CNN) is designed, which consists of five convolutional layers, three pooling layers and two fully connected layers to extract the deep features of pedestrian images. Considering that the data distribution of the datasets is unbalanced, the loss function is improved based on the sigmoid cross-entropy, and the scale factor is added to balance the amount of various attributes data. Through training the network, the mapping relationship model between the deep features of pedestrian images and the integration label vector of their attributes is established, which can be used to predict each attribute of the pedestrian. The experiments were conducted on two public pedestrian attributes datasets in surveillance scenarios, namely PETA and RAP. The results show that, compared with the state-of-the-art pedestrian attributes recognition methods, the proposed method can achieve a superior accuracy by 91.88% on PETA and 87.44% on RAP respectively.

Download Full-text

HCMSL: Hybrid Cross-modal Similarity Learning for Cross-modal Retrieval

ACM Transactions on Multimedia Computing Communications and Applications ◽

10.1145/3412847 ◽

2021 ◽

Vol 17 (1s) ◽

pp. 1-22

Author(s):

Chengyuan Zhang ◽

Jiayu Song ◽

Xiaofeng Zhu ◽

Lei Zhu ◽

Shichao Zhang

Keyword(s):

Semantic Information ◽

State Of The Art ◽

Similarity Learning ◽

Retrieval Method ◽

Feature Representations ◽

High Level ◽

Fully Connected ◽

The Relationship ◽

Hybrid Cross ◽

Fully Connected Networks

The purpose of cross-modal retrieval is to find the relationship between different modal samples and to retrieve other modal samples with similar semantics by using a certain modal sample. As the data of different modalities presents heterogeneous low-level feature and semantic-related high-level features, the main problem of cross-modal retrieval is how to measure the similarity between different modalities. In this article, we present a novel cross-modal retrieval method, named Hybrid Cross-Modal Similarity Learning model (HCMSL for short). It aims to capture sufficient semantic information from both labeled and unlabeled cross-modal pairs and intra-modal pairs with same classification label. Specifically, a coupled deep fully connected networks are used to map cross-modal feature representations into a common subspace. Weight-sharing strategy is utilized between two branches of networks to diminish cross-modal heterogeneity. Furthermore, two Siamese CNN models are employed to learn intra-modal similarity from samples of same modality. Comprehensive experiments on real datasets clearly demonstrate that our proposed technique achieves substantial improvements over the state-of-the-art cross-modal retrieval techniques.

Download Full-text

Construction of All-in-Focus Images Assisted by Depth Sensing

Sensors ◽

10.3390/s19061409 ◽

2019 ◽

Vol 19 (6) ◽

pp. 1409 ◽

Cited By ~ 1

Author(s):

Hang Liu ◽

Hengyu Li ◽

Jun Luo ◽

Shaorong Xie ◽

Yu Sun

Keyword(s):

Image Fusion ◽

Imaging System ◽

State Of The Art ◽

Depth Map ◽

Depth Of Field ◽

Depth Sensor ◽

Depth Sensing ◽

Focus Image ◽

Limited Depth ◽

Image Fusion Method

Multi-focus image fusion is a technique for obtaining an all-in-focus image in which all objects are in focus to extend the limited depth of field (DoF) of an imaging system. Different from traditional RGB-based methods, this paper presents a new multi-focus image fusion method assisted by depth sensing. In this work, a depth sensor is used together with a colour camera to capture images of a scene. A graph-based segmentation algorithm is used to segment the depth map from the depth sensor, and the segmented regions are used to guide a focus algorithm to locate in-focus image blocks from among multi-focus source images to construct the reference all-in-focus image. Five test scenes and six evaluation metrics were used to compare the proposed method and representative state-of-the-art algorithms. Experimental results quantitatively demonstrate that this method outperforms existing methods in both speed and quality (in terms of comprehensive fusion metrics). The generated images can potentially be used as reference all-in-focus images.

Download Full-text

LiDAR Odometry and Mapping Based on Semantic Information for Outdoor Environment

Remote Sensing ◽

10.3390/rs13152864 ◽

2021 ◽

Vol 13 (15) ◽

pp. 2864

Author(s):

Shitong Du ◽

Yifan Li ◽

Xuyou Li ◽

Menghao Wu

Keyword(s):

Feature Extraction ◽

Semantic Information ◽

State Of The Art ◽

Outdoor Environment ◽

Geometric Features ◽

Dynamic Object ◽

Object Removal ◽

Localization And Mapping ◽

High Level ◽

Crucial Part

Simultaneous Localization and Mapping (SLAM) in an unknown environment is a crucial part for intelligent mobile robots to achieve high-level navigation and interaction tasks. As one of the typical LiDAR-based SLAM algorithms, the Lidar Odometry and Mapping in Real-time (LOAM) algorithm has shown impressive results. However, LOAM only uses low-level geometric features without considering semantic information. Moreover, the lack of a dynamic object removal strategy limits the algorithm to obtain higher accuracy. To this end, this paper extends the LOAM pipeline by integrating semantic information into the original framework. Specifically, we first propose a two-step dynamic objects filtering strategy. Point-wise semantic labels are then used to improve feature extraction and searching for corresponding points. We evaluate the performance of the proposed method in many challenging scenarios, including highway, country and urban from the KITTI dataset. The results demonstrate that the proposed SLAM system outperforms the state-of-the-art SLAM methods in terms of accuracy and robustness.

Download Full-text

Multi-focus image fusion: A Survey of the state of the art

Information Fusion ◽

10.1016/j.inffus.2020.06.013 ◽

2020 ◽

Vol 64 ◽

pp. 71-91 ◽

Cited By ~ 1

Author(s):

Yu Liu ◽

Lei Wang ◽

Juan Cheng ◽

Chang Li ◽

Xun Chen

Keyword(s):

Image Fusion ◽

State Of The Art ◽

The State ◽

Focus Image

Download Full-text

Multi-Focus Image Fusion: Algorithms, Evaluation, and a Library

Journal of Imaging ◽

10.3390/jimaging6070060 ◽

2020 ◽

Vol 6 (7) ◽

pp. 60

Author(s):

Rabia Zafar ◽

Muhammad Shahid Farid ◽

Muhammad Hassan Khan

Keyword(s):

Remote Sensing ◽

Comparative Study ◽

Image Fusion ◽

State Of The Art ◽

Machine Perception ◽

Statistical Measures ◽

Combination Methods ◽

Focus Image ◽

Assessment Metrics ◽

Significant Data

Image fusion is a process that integrates similar types of images collected from heterogeneous sources into one image in which the information is more definite and certain. Hence, the resultant image is anticipated as more explanatory and enlightening both for human and machine perception. Different image combination methods have been presented to consolidate significant data from a collection of images into one image. As a result of its applications and advantages in variety of fields such as remote sensing, surveillance, and medical imaging, it is significant to comprehend image fusion algorithms and have a comparative study on them. This paper presents a review of the present state-of-the-art and well-known image fusion techniques. The performance of each algorithm is assessed qualitatively and quantitatively on two benchmark multi-focus image datasets. We also produce a multi-focus image fusion dataset by collecting the widely used test images in different studies. The quantitative evaluation of fusion results is performed using a set of image fusion quality assessment metrics. The performance is also evaluated using different statistical measures. Another contribution of this paper is the proposal of a multi-focus image fusion library, to the best of our knowledge, no such library exists so far. The library provides implementation of numerous state-of-the-art image fusion algorithms and is made available publicly at project website.

Download Full-text

Depression Detection on Reddit With an Emotion-Based Attention Network: Algorithm Development and Validation (Preprint)

10.2196/preprints.28754 ◽

2021 ◽

Author(s):

Lu Ren ◽

Hongfei Lin ◽

Bo Xu ◽

Shaowu Zhang ◽

Liang Yang ◽

...

Keyword(s):

Semantic Information ◽

State Of The Art ◽

Emotion Understanding ◽

Experimental Results ◽

Network Module ◽

Emotional Information ◽

Attention Network ◽

Depression Detection ◽

Self Harm ◽

High Level

BACKGROUND As a common mental disease, depression seriously affects people’s physical and mental health. According to the statistics of the World Health Organization, depression is one of the main reasons for suicide and self-harm events in the world. Therefore, strengthening depression detection can effectively reduce the occurrence of suicide or self-harm events so as to save more people and families. With the development of computer technology, some researchers are trying to apply natural language processing techniques to detect people who are depressed automatically. Many existing feature engineering methods for depression detection are based on emotional characteristics, but these methods do not consider high-level emotional semantic information. The current deep learning methods for depression detection cannot accurately extract effective emotional semantic information. OBJECTIVE In this paper, we propose an emotion-based attention network, including a semantic understanding network and an emotion understanding network, which can capture the high-level emotional semantic information effectively to improve the depression detection task. METHODS The semantic understanding network module is used to capture the contextual semantic information. The emotion understanding network module is used to capture the emotional semantic information. There are two units in the emotion understanding network module, including a positive emotion understanding unit and a negative emotion understanding unit, which are used to capture the positive emotional information and the negative emotional information, respectively. We further proposed a dynamic fusion strategy in the emotion understanding network module to fuse the positive emotional information and the negative emotional information. RESULTS We evaluated our method on the Reddit data set. The experimental results showed that the proposed emotion-based attention network model achieved an accuracy, precision, recall, and F-measure of 91.30%, 91.91%, 96.15%, and 93.98%, respectively, which are comparable results compared with state-of-the-art methods. CONCLUSIONS The experimental results showed that our model is competitive with the state-of-the-art models. The semantic understanding network module, the emotion understanding network module, and the dynamic fusion strategy are effective modules for depression detection. In addition, the experimental results verified that the emotional semantic information was effective in depression detection.

Download Full-text

Rethinking the Image Fusion: A Fast Unified Image Fusion Network based on Proportional Maintenance of Gradient and Intensity

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6975 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12797-12804 ◽

Cited By ~ 7

Author(s):

Hao Zhang ◽

Han Xu ◽

Yang Xiao ◽

Xiaojie Guo ◽

Jiayi Ma

Keyword(s):

Image Fusion ◽

Medical Image ◽

State Of The Art ◽

The State ◽

Visible Image ◽

Transfer Block ◽

Intensity Information ◽

Focus Image ◽

The One ◽

Maintenance Problem

In this paper, we propose a fast unified image fusion network based on proportional maintenance of gradient and intensity (PMGI), which can end-to-end realize a variety of image fusion tasks, including infrared and visible image fusion, multi-exposure image fusion, medical image fusion, multi-focus image fusion and pan-sharpening. We unify the image fusion problem into the texture and intensity proportional maintenance problem of the source images. On the one hand, the network is divided into gradient path and intensity path for information extraction. We perform feature reuse in the same path to avoid loss of information due to convolution. At the same time, we introduce the pathwise transfer block to exchange information between different paths, which can not only pre-fuse the gradient information and intensity information, but also enhance the information to be processed later. On the other hand, we define a uniform form of loss function based on these two kinds of information, which can adapt to different fusion tasks. Experiments on publicly available datasets demonstrate the superiority of our PMGI over the state-of-the-art in terms of both visual effect and quantitative metric in a variety of fusion tasks. In addition, our method is faster compared with the state-of-the-art.

Download Full-text

Lightweight Attention Pyramid Network for Object Detection and Instance Segmentation

Applied Sciences ◽

10.3390/app10030883 ◽

2020 ◽

Vol 10 (3) ◽

pp. 883 ◽

Cited By ~ 5

Author(s):

Jiwei Zhang ◽

Yanyu Yan ◽

Zelei Cheng ◽

Wendong Wang

Keyword(s):

Object Detection ◽

Semantic Information ◽

Target Location ◽

Feature Fusion ◽

State Of The Art ◽

Detection Accuracy ◽

Low Level ◽

Feature Attention ◽

High Level ◽

Bottom To Top

Feature pyramids of convolutional neural networks (ConvNets)—from bottom to top—are used by most recent researchers for the improvement of object detection accuracy, but they seldom aim to address the correlation of each feature channel and the fusion of low-level features and high-level features. In this paper, an Attention Pyramid Network (APN) is proposed, which mainly contains the adaptive transformation module and feature attention block. The adaptive transformation module utilizes the multiscale feature fusion, and makes full use of the accurate target location information of low-level features and the semantic information of high-level features. Then, the feature attention block strengthens the features of important channels and weakens the features of unimportant channels through learning. By implementing the APN in a basic Mask R-CNN system, our method achieves state-of-the-art results on the MS COCO dataset and 2018 WAD database without bells and whistles. In addition, the structure of the APN makes the network parameters lighter, and runs at 4 ms on average, which is ignorable when compared to the inference time of the backbone of ConvNet.

Download Full-text