Multi-Level Joint Feature Learning for Person Re-Identification

Algorithms ◽  
2020 ◽  
Vol 13 (5) ◽  
pp. 111
Author(s):  
Shaojun Wu ◽  
Ling Gao

In person re-identification, extracting image features is an important step when retrieving pedestrian images. Most current methods extract only global features or only local features of pedestrian images, so inconspicuous details are easily ignored during feature learning, which is neither efficient nor robust for scenarios with large appearance differences. In this paper, we propose a Multi-level Feature Fusion model that combines both global and local image features through deep learning networks to generate more discriminative pedestrian descriptors. Specifically, we extract local features from different depths of the network with the Part-based Multi-level Net to fuse low-to-high-level local features of pedestrian images, while Global-Local Branches extract the local and global features at the highest level. Experiments demonstrate that our deep learning model based on multi-level feature fusion works well in person re-identification, outperforming the state of the art by considerable margins on three widely used datasets. For instance, we achieve 96% Rank-1 accuracy on the Market-1501 dataset and 76.1% mAP on the DukeMTMC-reID dataset, surpassing existing works by a large margin (more than 6%).
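To make the global-plus-local idea concrete, here is a minimal PyTorch sketch of a descriptor built from one globally pooled branch and several horizontally pooled part branches over a shared backbone. The ResNet-50 backbone, six stripes, and 256-d embeddings are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

class GlobalLocalReID(nn.Module):
    """Sketch: fuse a global branch with part-based local branches."""
    def __init__(self, num_parts=6, embed_dim=256):
        super().__init__()
        backbone = models.resnet50(weights=None)
        # Keep everything up to the last conv stage -> (B, 2048, H, W).
        self.backbone = nn.Sequential(*list(backbone.children())[:-2])
        self.global_pool = nn.AdaptiveAvgPool2d(1)
        self.part_pool = nn.AdaptiveAvgPool2d((num_parts, 1))  # horizontal stripes
        self.global_embed = nn.Linear(2048, embed_dim)
        self.part_embeds = nn.ModuleList(
            nn.Linear(2048, embed_dim) for _ in range(num_parts))

    def forward(self, x):
        fmap = self.backbone(x)                    # (B, 2048, H, W)
        g = self.global_embed(self.global_pool(fmap).flatten(1))
        parts = self.part_pool(fmap).squeeze(-1)   # (B, 2048, num_parts)
        locals_ = [emb(parts[:, :, i]) for i, emb in enumerate(self.part_embeds)]
        return torch.cat([g] + locals_, dim=1)     # fused pedestrian descriptor

feats = GlobalLocalReID()(torch.randn(2, 3, 256, 128))
print(feats.shape)  # torch.Size([2, 1792])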

2018 ◽  
Vol 10 (11) ◽  
pp. 1768 ◽  
Author(s):  
Hui Yang ◽  
Penghai Wu ◽  
Xuedong Yao ◽  
Yanlan Wu ◽  
Biao Wang ◽  
...  

Building extraction from very high resolution (VHR) imagery plays an important role in urban planning, disaster management, navigation, updating geographic databases, and several other geospatial applications. Compared with traditional building extraction approaches, deep learning networks have recently shown outstanding performance in this task by using both high-level and low-level feature maps. However, it is difficult for present deep learning networks to utilize features from different levels rationally. To tackle this problem, a novel network based on DenseNets and the attention mechanism, called the dense-attention network (DAN), is proposed. The DAN contains an encoder part and a decoder part, composed of lightweight DenseNets and a spatial attention fusion module, respectively. The proposed encoder-decoder architecture strengthens feature propagation and effectively uses higher-level feature information to suppress low-level features and noise. Experimental results on the public International Society for Photogrammetry and Remote Sensing (ISPRS) datasets with only red-green-blue (RGB) images demonstrate that the proposed DAN achieves higher scores (96.16% overall accuracy (OA), 92.56% F1 score, and 90.56% mean intersection over union (MIoU)), less training and response time, and a higher quality value than other deep learning methods.
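One plausible reading of the spatial attention fusion step can be sketched as follows: the upsampled high-level map produces a per-pixel attention mask that gates the low-level map before the two are fused. The channel sizes and 1x1-conv attention head below are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialAttentionFusion(nn.Module):
    """Sketch: high-level features gate low-level features before fusion."""
    def __init__(self, high_ch, low_ch):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Conv2d(high_ch, 1, kernel_size=1),
            nn.Sigmoid())                     # per-pixel weights in [0, 1]
        self.fuse = nn.Conv2d(high_ch + low_ch, low_ch, kernel_size=3, padding=1)

    def forward(self, high, low):
        # Upsample high-level features to the low-level spatial size.
        high = F.interpolate(high, size=low.shape[-2:], mode="bilinear",
                             align_corners=False)
        low = low * self.attn(high)           # suppress noisy low-level responses
        return self.fuse(torch.cat([high, low], dim=1))

high = torch.randn(1, 256, 16, 16)   # deep encoder features
low = torch.randn(1, 64, 64, 64)     # shallow encoder features
print(SpatialAttentionFusion(256, 64)(high, low).shape)  # torch.Size([1, 64, 64, 64])
```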


2021 ◽  
Author(s):  
Huan Zhang ◽  
Zhao Zhang ◽  
Haijun Zhang ◽  
Yi Yang ◽  
Shuicheng Yan ◽  
...  

Deep learning based image inpainting methods have greatly improved performance due to the powerful representation ability of deep learning. However, current deep inpainting methods still tend to produce unreasonable structures and blurry textures, implying that image inpainting remains a challenging topic due to the ill-posed nature of the task. To address these issues, we propose a novel deep multi-resolution learning-based progressive image inpainting method, termed MR-InpaintNet, which takes damaged images of different resolutions as input and then fuses the multi-resolution features to repair the damaged images. The idea is motivated by the fact that images of different resolutions provide different levels of feature information: the low-resolution image provides strong semantic information, the high-resolution image offers detailed texture information, and the middle-resolution image reduces the gap between the two, further refining the inpainting result. To fuse and improve the multi-resolution features, a novel multi-resolution feature learning (MRFL) process is designed, which consists of a multi-resolution feature fusion (MRFF) module, an adaptive feature enhancement (AFE) module, and a memory enhanced mechanism (MEM) module for information preservation. The refined multi-resolution features then contain both rich semantic information and detailed texture information from multiple resolutions, and are passed to the decoder to obtain the recovered image. Extensive experiments on the Paris Street View, Places2, and CelebA-HQ datasets demonstrate that our proposed MR-InpaintNet effectively recovers textures and structures and performs favorably against state-of-the-art methods.
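A minimal sketch of the multi-resolution fusion idea, assuming a simple three-level image pyramid fused by upsampling and concatenation; the tiny per-level encoders below are illustrative stand-ins for MR-InpaintNet's actual MRFF module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiResolutionFusion(nn.Module):
    """Sketch: encode low/middle/high resolution views and fuse them."""
    def __init__(self, ch=32):
        super().__init__()
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
            for _ in range(3))
        self.fuse = nn.Conv2d(3 * ch, ch, kernel_size=1)

    def forward(self, img):
        # Three-level pyramid of the damaged image.
        pyramid = [img, F.avg_pool2d(img, 2), F.avg_pool2d(img, 4)]
        feats = []
        for enc, level in zip(self.encoders, pyramid):
            f = enc(level)
            # Bring every level back to full resolution before fusing.
            feats.append(F.interpolate(f, size=img.shape[-2:],
                                       mode="bilinear", align_corners=False))
        return self.fuse(torch.cat(feats, dim=1))

out = MultiResolutionFusion()(torch.randn(1, 3, 128, 128))
print(out.shape)  # torch.Size([1, 32, 128, 128])
```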


Author(s):  
Arjun Benagatte Channegowda ◽  
H N Prakash

Providing security in biometrics is a major challenge at present, and a great deal of research is under way in this area. Security can be tightened by using more complex systems, for instance by using more than one biometric trait for recognition. In this paper, multimodal biometric models are developed to improve the recognition rate of a person. A combination of physiological and behavioral biometric characteristics is used in this work: fingerprint and signature characteristics form the multimodal recognition system. Histogram of oriented gradients (HOG) features are extracted from the biometric traits, and feature fusion is applied at two levels. Fingerprint and signature features are fused using concatenation, sum, max, min, and product rules at multilevel stages, and the fused features are used to train a deep learning neural network model. In the proposed work, multi-level feature fusion for multimodal biometrics with a deep learning classifier is used, and results are analyzed by varying the number of hidden neurons and hidden layers. Experiments were carried out on the SDUMLA-HMT fingerprint datasets (Machine Learning and Data Mining Lab, Shandong University) and the MCYT signature datasets (Biometric Recognition Group), and encouraging results were obtained.
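The fusion rules named above can be sketched over HOG vectors with scikit-image and NumPy. The HOG parameters and the equal-length requirement for the element-wise rules are assumptions; the fused vector would subsequently be used to train the deep neural network classifier.

```python
import numpy as np
from skimage.feature import hog

def fuse_features(f1, f2, rule="concat"):
    """Sketch of the fusion rules: concat, sum, max, min, product.
    Assumes both HOG vectors have equal length for element-wise rules."""
    if rule == "concat":
        return np.concatenate([f1, f2])
    ops = {"sum": np.add, "max": np.maximum, "min": np.minimum,
           "product": np.multiply}
    return ops[rule](f1, f2)

# Placeholder inputs: grayscale fingerprint and signature images,
# resized to a common shape beforehand.
fingerprint = np.random.rand(128, 128)
signature = np.random.rand(128, 128)

hog_fp = hog(fingerprint, orientations=9, pixels_per_cell=(8, 8),
             cells_per_block=(2, 2))
hog_sig = hog(signature, orientations=9, pixels_per_cell=(8, 8),
              cells_per_block=(2, 2))

fused = fuse_features(hog_fp, hog_sig, rule="sum")
print(fused.shape)  # the fused vector then trains the deep classifier
```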


IoT ◽  
2020 ◽  
Vol 1 (2) ◽  
pp. 494-505
Author(s):  
Radu-Casian Mihailescu ◽  
Georgios Kyriakou ◽  
Angelos Papangelis

In this paper we address the problem of automatic sensor composition for servicing human-interpretable high-level tasks. To this end, we introduce multi-level distributed intelligent virtual sensors (multi-level DIVS) as an overlay framework for a given mesh of physical and/or virtual sensors already deployed in the environment. The goal for multi-level DIVS is twofold: (i) to provide a convenient way for the user to specify high-level sensing tasks; (ii) to construct the computational graph that provides the correct output given a specific sensing task. For (i) we resort to a conversational user interface, an intuitive and user-friendly way for the user to express the sensing problem as natural language queries, while for (ii) we propose a deep learning approach that establishes the correspondence between the natural language queries and their virtual sensor representation. Finally, we evaluate and demonstrate the feasibility of our approach in the context of a smart city setup.
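Goal (ii), mapping a natural-language query to a virtual sensor representation, could be sketched in its simplest form as a small text classifier over a fixed set of compositions. The vocabulary, composition labels, and single-layer PyTorch model below are purely illustrative stand-ins for the paper's approach.

```python
import torch
import torch.nn as nn

# Hypothetical composition labels and toy vocabulary (assumptions).
COMPOSITIONS = ["avg_temperature(zone)", "crowd_level(street)",
                "noise_map(district)"]
VOCAB = {w: i for i, w in enumerate(
    "what is the average temperature in zone how crowded noise level".split())}

class QueryClassifier(nn.Module):
    """Sketch: mean word-embedding bag followed by a linear classifier."""
    def __init__(self, vocab_size, num_classes, dim=32):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)  # mean over word embeddings
        self.out = nn.Linear(dim, num_classes)

    def forward(self, token_ids, offsets):
        return self.out(self.embed(token_ids, offsets))

def encode(query):
    ids = [VOCAB[w] for w in query.lower().split() if w in VOCAB]
    return torch.tensor(ids), torch.tensor([0])

model = QueryClassifier(len(VOCAB), len(COMPOSITIONS))
logits = model(*encode("what is the average temperature in zone 3"))
print(COMPOSITIONS[logits.argmax().item()])  # untrained: arbitrary pick
```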


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Lin Liu

The HTP test in psychometrics is a widely studied and applied psychological assessment technique. It is a projective test that relies on the free expression of the drawing itself and the subject's creativity, and the form of group psychological counselling based on it is therefore widely used in mental health education. Compared with traditional neural networks, deep learning networks have deeper and more numerous layers and can learn more complex processing functions. At this stage, image recognition technology can assist human vision: people can quickly retrieve the information in a picture, for example by photographing an object that is difficult to describe and searching for content related to it. Convolutional neural networks, which are widely used in computer vision image classification tasks, can learn features from data automatically without manual feature extraction. Compared with the traditional test procedure, such an assessment can reflect the painting characteristics of different groups and, after quantitative scoring, shows good reliability and validity, giving it high application value in psychological evaluation, especially in the diagnosis of mental illness. This paper focuses on the subjectivity of HTP evaluation: the traditional HTP assessment process relies on the experience of researchers to extract and classify painting features, whereas convolutional neural networks, a mature deep learning technology, can automate this step.
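As a rough illustration of replacing hand-crafted feature extraction, a small CNN could score scanned HTP drawings directly. The layer sizes and the three assessment categories below are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class HTPDrawingNet(nn.Module):
    """Sketch: a tiny CNN classifying grayscale HTP drawing scans."""
    def __init__(self, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1))
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):               # x: (batch, 1, H, W) drawing scans
        return self.classifier(self.features(x).flatten(1))

logits = HTPDrawingNet()(torch.randn(4, 1, 224, 224))
print(logits.shape)  # torch.Size([4, 3])
```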


2021 ◽  
Vol 2021 ◽  
pp. 1-18
Author(s):  
Mingyu Gao ◽  
Fei Wang ◽  
Peng Song ◽  
Junyan Liu ◽  
DaWei Qi

Identifying wood defects quickly from optical images with deep learning methodology effectively improves wood utilization. Traditional neural network techniques have not been employed for optical-image wood defect detection because of long training times, low recognition accuracy, and the lack of automatic extraction of defect image features. In this paper, a wood knot defect detection model combining deep learning, called BLNN, is reported. Two subnetworks composed of convolutional neural networks are trained with PyTorch. Using the feature extraction capabilities of the two subnetworks combined through a bilinear join operation, fine-grained features of the image are obtained. The experimental results show that the accuracy reaches 99.20% and the training time is markedly reduced, with a defect detection speed of about 0.0795 s per image. This indicates that BLNN improves the accuracy of defect recognition and has potential application in the detection of wood knot defects.
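The bilinear join at the heart of such a model can be sketched as the outer product of two subnetworks' feature maps, pooled over spatial positions. The tiny convolutional streams and the signed-square-root normalization below are common bilinear-CNN choices, not necessarily BLNN's exact ones.

```python
import torch
import torch.nn as nn

class BilinearJoin(nn.Module):
    """Sketch: bilinear join of two CNN streams for fine-grained features."""
    def __init__(self, ch=16, num_classes=2):
        super().__init__()
        def stream():
            return nn.Sequential(
                nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.a, self.b = stream(), stream()
        self.fc = nn.Linear(ch * ch, num_classes)

    def forward(self, x):
        fa = self.a(x).flatten(2)            # (B, C, H*W)
        fb = self.b(x).flatten(2)            # (B, C, H*W)
        # Outer product summed over positions -> pairwise channel features.
        bilinear = torch.bmm(fa, fb.transpose(1, 2)) / fa.shape[-1]  # (B, C, C)
        feat = bilinear.flatten(1)
        feat = torch.sign(feat) * torch.sqrt(feat.abs() + 1e-8)  # signed sqrt
        feat = nn.functional.normalize(feat)                     # L2 norm
        return self.fc(feat)

logits = BilinearJoin()(torch.randn(2, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 2])
```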


2020 ◽  
Vol 12 (2) ◽  
pp. 280 ◽  
Author(s):  
Liqin Liu ◽  
Zhenwei Shi ◽  
Bin Pan ◽  
Ning Zhang ◽  
Huanlin Luo ◽  
...  

In recent years, deep learning technology has been widely used in the field of hyperspectral image classification and has achieved good performance. However, deep learning networks need a large number of training samples, which conflicts with the limited labeled samples of hyperspectral images. Traditional deep networks usually treat each pixel as a subject, ignoring the integrity of the hyperspectral data, and methods based on feature extraction are likely to lose the edge information that plays a crucial role in pixel-level classification. To overcome the limited annotated samples, we propose a new three-channel image construction method (the virtual RGB image), by which networks trained on natural images can be used to extract spatial features; through the trained network, the hyperspectral data are processed as a whole. Meanwhile, we propose a multiscale feature fusion method to combine both detailed and semantic characteristics, thus improving the accuracy of classification. Experiments show that the proposed method achieves ideal results, better than state-of-the-art methods. In addition, the virtual RGB image can be extended to other hyperspectral processing methods that need three-channel images.
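The virtual RGB construction can be sketched by collapsing the spectral bands into three channels; the equal three-way band split below is an assumption about how the channels might be formed, not the paper's exact rule.

```python
import numpy as np

def virtual_rgb(cube):
    """Sketch: collapse a hyperspectral cube (H, W, B) into a 3-channel
    'virtual RGB' image by averaging three contiguous band groups, so a
    network pretrained on natural images can extract spatial features."""
    h, w, bands = cube.shape
    thirds = np.array_split(np.arange(bands), 3)
    channels = [cube[:, :, idx].mean(axis=2) for idx in thirds]
    img = np.stack(channels, axis=2)
    # Rescale to 0-255 so the result resembles a natural image.
    img = (img - img.min()) / (img.max() - img.min() + 1e-8) * 255.0
    return img.astype(np.uint8)

cube = np.random.rand(145, 145, 200)   # e.g., an Indian Pines-sized cube
print(virtual_rgb(cube).shape)         # (145, 145, 3)
```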


Author(s):  
Hadeer Elziaat ◽  
Nashwa El-Bendary ◽  
Ramadan Moawad

Freezing of gait (FoG) is a common symptom of Parkinson's disease (PD) that causes intermittent absence of forward progression of the patient's feet while walking. Accordingly, momentary FoG episodes are always accompanied by falls. This chapter presents a novel multi-feature fusion model for early detection of FoG episodes in patients with PD. Two feature engineering schemes are investigated, namely time-domain hand-crafted feature engineering and convolutional neural network (CNN)-based spectrogram feature learning. Data from tri-axial accelerometer sensors of patients with PD are utilized to characterize the performance of the proposed model through several experiments with various machine learning (ML) algorithms. The experimental results show that the multi-feature fusion approach outperforms typical single feature sets. Conclusively, the significance of this chapter lies in highlighting the impact of fusing multiple feature sets, demonstrated through the performance of a model for early detection of FoG episodes.
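The two feature views can be sketched for a single accelerometer window: hand-crafted time-domain statistics alongside a per-axis spectrogram for the CNN branch. The specific statistics and spectrogram settings below are illustrative, not the chapter's configuration.

```python
import numpy as np
from scipy import signal

def window_features(accel, fs=64):
    """Sketch of the two feature views for one accelerometer window
    (shape: samples x 3 axes)."""
    # View 1: hand-crafted time-domain features per axis.
    time_feats = np.concatenate([
        accel.mean(axis=0), accel.std(axis=0),
        np.abs(np.diff(accel, axis=0)).mean(axis=0)])
    # View 2: a spectrogram per axis for CNN-based feature learning.
    _, _, spec = signal.spectrogram(accel, fs=fs, axis=0, nperseg=64)
    # Fusion would happen after a CNN embeds the spectrogram view.
    return time_feats, spec

accel = np.random.randn(256, 3)        # 4 s window at 64 Hz (assumed rate)
t_feats, spec = window_features(accel)
print(t_feats.shape, spec.shape)       # (9,) plus a (freq, axis, time) spectrogram
```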


Author(s):  
Xing Xu ◽  
Yifan Wang ◽  
Yixuan He ◽  
Yang Yang ◽  
Alan Hanjalic ◽  
...  

Image-sentence matching is a challenging task in the field of language and vision, which aims at measuring the similarities between images and sentence descriptions. Most existing methods independently map the global features of images and sentences into a common space to calculate the image-sentence similarity. However, the image-sentence similarity obtained by these methods may be coarse because (1) an intermediate common space is introduced to implicitly match the heterogeneous features of images and sentences at a global level, and (2) only the inter-modality relations of images and sentences are captured while the intra-modality relations are ignored. To overcome these limitations, we propose a novel Cross-Modal Hybrid Feature Fusion (CMHF) framework for directly learning the image-sentence similarity by fusing multimodal features with both inter- and intra-modality relations incorporated. It robustly captures the high-level interactions between visual regions in images and words in sentences, where flexible attention mechanisms generate effective attention flows within and across the modalities of images and sentences. A structured objective with a ranking loss constraint is formed in CMHF to learn the image-sentence similarity from the fused fine-grained features of different modalities, bypassing the use of an intermediate common space. Extensive experiments and comprehensive analysis performed on two widely used datasets, Microsoft COCO and Flickr30K, show the effectiveness of the hybrid feature fusion framework in CMHF, with our proposed CMHF method achieving state-of-the-art matching performance.
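The ranking loss constraint can be sketched with a standard bidirectional hinge loss over a batch similarity matrix; the margin value and sum-over-negatives form below are common choices in image-sentence matching, not necessarily CMHF's exact objective.

```python
import torch

def ranking_loss(sim, margin=0.2):
    """Sketch: bidirectional hinge ranking loss over an image-sentence
    similarity matrix sim (B x B), where sim[i, i] pairs match."""
    pos = sim.diag().view(-1, 1)                      # matching-pair scores
    cost_s = (margin + sim - pos).clamp(min=0)        # image -> sentence direction
    cost_im = (margin + sim - pos.t()).clamp(min=0)   # sentence -> image direction
    mask = torch.eye(sim.size(0), dtype=torch.bool)
    cost_s = cost_s.masked_fill(mask, 0)              # ignore the positives
    cost_im = cost_im.masked_fill(mask, 0)
    return cost_s.sum() + cost_im.sum()

sim = torch.randn(8, 8)   # fused fine-grained similarities from the model
print(ranking_loss(sim).item())
```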


Author(s):  
N. E. Gotman ◽  
G. P. Shumilova

THE PURPOSE. To consider the problem of detecting changes in power grid topology that occur as a result of power line outages or switchings; to develop an algorithm for detecting changes in the status of transmission lines in real time using voltage and current phasors captured by phasor measurement units (PMUs) placed on buses; and to carry out experimental research on the IEEE 14-bus test system. METHODS. This paper applies a method from the field of artificial intelligence, machine learning, and in particular deep learning, to solve the problem. Deep learning is a computational learning technique in which high-level abstractions are hierarchically modelled from raw data. Convolutional neural networks (CNNs) are one means of effectively extracting the hidden features inherent in data. RESULTS. The article describes the relevance of the topic and proposes a method for detecting line status using a CNN classifier. Combinations of different CNN architectures and different numbers of time slices from the moment of a line status change are used to detect the power grid topology. The effectiveness of the joint use of PMUs and CNNs in solving this problem is demonstrated. CONCLUSION. A solution for detecting line status changes during transient states using a CNN classifier is proposed. High accuracy of line status detection was obtained despite the influence of noise on the measurement data. A change in the network topology is detected almost instantly, at the very beginning of the transient state, which allows the operator to identify the line state several times during the first seconds in order to confirm that the decisions made are correct.
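A line-status classifier over PMU windows can be sketched as a 1-D CNN on channels of voltage and current phasor measurements across time slices. The channel count, window length, and layer sizes below are assumptions for an IEEE 14-bus-sized setting.

```python
import torch
import torch.nn as nn

class LineStatusCNN(nn.Module):
    """Sketch: classify which line changed status from a PMU window."""
    def __init__(self, channels=28, num_lines=20):
        super().__init__()
        # channels: e.g., voltage/current magnitude and angle per PMU bus.
        self.net = nn.Sequential(
            nn.Conv1d(channels, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, num_lines))     # one logit per monitored line

    def forward(self, x):                 # x: (batch, channels, time slices)
        return self.net(x)

logits = LineStatusCNN()(torch.randn(4, 28, 16))
print(logits.shape)  # torch.Size([4, 20])
```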

