GreyReID: A Novel Two-stream Deep Framework with RGB-grey Information for Person Re-identification

Author(s):  
Lei Qi ◽  
Lei Wang ◽  
Jing Huo ◽  
Yinghuan Shi ◽  
Yang Gao

In this article, we observe that most false positive images (i.e., different identities from the query image) in the top ranking list usually have color information similar to that of the query image in person re-identification (Re-ID). Meanwhile, some hard queries achieve better Re-ID performance when matched using greyscale images generated from RGB images than when using the RGB images themselves. RGB and greyscale images therefore appear to be complementary for person Re-ID, and in this article we aim to utilize both to improve Re-ID performance. To this end, we propose a novel two-stream deep neural network with RGB-grey information, which effectively fuses RGB and greyscale feature representations to enhance the generalization ability of Re-ID. First, we convert the RGB images in each training batch to greyscale, and use the two sets of images to train the RGB and greyscale branches, respectively. Second, to build up connections between the RGB and greyscale branches, we merge them into a new joint branch. Finally, we concatenate the features of all three branches as the final feature representation for Re-ID. Moreover, during training we adopt a joint learning scheme that simultaneously trains each branch with its own independent loss function, which enhances the generalization ability of each branch; a global loss function is then used to further fine-tune the final concatenated feature. Extensive experiments on multiple benchmark datasets show that the proposed method outperforms state-of-the-art person Re-ID methods and that using greyscale images can indeed improve Re-ID performance within the proposed deep framework.
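The two key steps of this pipeline, converting each batch's RGB images to greyscale and concatenating the three branch features, can be sketched as follows. This is an illustrative stand-in only: the paper does not specify its conversion weights, so standard ITU-R BT.601 luminance coefficients are assumed here, and the feature vectors are simple Python lists rather than network outputs.

```python
def rgb_to_grey(pixel):
    """Reduce one RGB pixel to a single luminance value.

    BT.601 weights (0.299, 0.587, 0.114) are an assumption; the paper
    only states that greyscale images are generated from RGB images.
    """
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def concat_branch_features(f_rgb, f_grey, f_joint):
    """Final Re-ID descriptor: concatenation of the RGB, greyscale,
    and joint branch feature vectors (represented here as lists)."""
    return f_rgb + f_grey + f_joint


grey = rgb_to_grey((255, 0, 0))                       # pure red -> 76.245
feat = concat_branch_features([0.1], [0.2], [0.3])    # [0.1, 0.2, 0.3]
```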

2020 ◽  
Vol 13 (1) ◽  
pp. 115
Author(s):  
Jiaojiao Li ◽  
Chaoxiong Wu ◽  
Rui Song ◽  
Yunsong Li ◽  
Weiying Xie

Deep convolutional neural networks (CNNs) have been successfully applied to spectral reconstruction (SR) and have achieved superior performance. Nevertheless, existing CNN-based SR approaches integrate hierarchical features from different layers indiscriminately, without investigating the relationships between intermediate feature maps, which limits the learning power of CNNs. To tackle this problem, we propose a deep residual augmented attentional u-shape network (RA2UN) with several double improved residual blocks (DIRB) instead of paired plain convolutional units. Specifically, a trainable spatial augmented attention (SAA) module is developed to bridge the encoder and decoder and emphasize features in informative regions. Furthermore, we present a novel channel augmented attention (CAA) module embedded in the DIRB to adaptively rescale features and enhance residual learning, using first-order and second-order statistics for stronger feature representations. Finally, a boundary-aware constraint is employed to focus on salient edge information and recover more accurate high-frequency details. Experimental results on four benchmark datasets demonstrate that the proposed RA2UN network outperforms state-of-the-art SR methods in both quantitative measurements and perceptual comparison.
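The first- and second-order channel statistics that the CAA module is described as using, and the channel-wise rescaling they drive, can be sketched minimally as below. The gating network that turns statistics into weights is part of RA2UN and is not reproduced; the weights in the rescale step are placeholders.

```python
import math


def channel_stats(feature_map):
    """Per-channel first-order (mean) and second-order (std) statistics.

    `feature_map` is a list of channels, each a flat list of activations.
    A minimal sketch of the statistics the CAA module is described as
    using; RA2UN's actual attention network is not reproduced here.
    """
    stats = []
    for ch in feature_map:
        mean = sum(ch) / len(ch)
        var = sum((x - mean) ** 2 for x in ch) / len(ch)
        stats.append((mean, math.sqrt(var)))
    return stats


def rescale(feature_map, weights):
    """Channel-wise rescaling: scale every activation by its channel weight."""
    return [[x * w for x in ch] for ch, w in zip(feature_map, weights)]
```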


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-12 ◽  
Author(s):  
Yun Yang ◽  
Xiaofang Liu ◽  
Qiongwei Ye ◽  
Dapeng Tao

As an important application in video surveillance, person reidentification enables automatic tracking of a pedestrian across disjoint camera views. It essentially focuses on extracting or learning feature representations, followed by a matching model that uses a distance metric. Person reidentification is a difficult task because, first, no universal feature representation can perfectly identify the number of pedestrians in the gallery obtained by a multicamera system. Although different features can be fused into a composite representation, such fusion still does not fully exploit the difference, complementarity, and relative importance of the individual features. Second, a matching model always has a limited number of training samples for learning a distance metric to match probe images against a gallery, which results in an unstable learning process and poor matching results. In this paper, we address these issues with ensemble theory: we explore the importance of different feature representations and reconcile several matching models over those representations into an optimal one via our proposed weighting scheme. We have carried out simulations on two well-recognized person reidentification benchmark datasets, VIPeR and ETHZ. The experimental results demonstrate that our approach achieves state-of-the-art performance.
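The ensemble idea above, combining several matching models via a weighting scheme and ranking the gallery by the combined distance, can be sketched as follows. The weights here are illustrative placeholders, not the learned weighting scheme of the paper, and the distance functions stand in for matching models over arbitrary feature representations.

```python
def ensemble_distance(probe, gallery_item, metrics, weights):
    """Weighted combination of several matching models.

    `metrics` are distance functions over feature representations;
    the `weights` are placeholders for the paper's learned weighting.
    """
    return sum(w * m(probe, gallery_item) for m, w in zip(metrics, weights))


def rank_gallery(probe, gallery, metrics, weights):
    """Return gallery indices sorted by ensemble distance (best match first)."""
    dists = [ensemble_distance(probe, g, metrics, weights) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: dists[i])


l1 = lambda a, b: abs(a - b)  # toy stand-in for a real matching model
ranking = rank_gallery(0, [5, 1, 3], [l1], [1.0])  # -> [1, 2, 0]
```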


2019 ◽  
Vol 35 (23) ◽  
pp. 4930-4937 ◽  
Author(s):  
Leyi Wei ◽  
Ran Su ◽  
Shasha Luan ◽  
Zhijun Liao ◽  
Balachandran Manavalan ◽  
...  

Abstract Motivation Accurate genome-wide identification of N4-methylcytosine (4mC) modifications can provide insights into their biological functions and mechanisms. Machine learning methods have recently become effective approaches for the computational identification of 4mC sites in genomes. Unfortunately, existing methods cannot achieve satisfactory performance, owing to the lack of effective DNA feature representations capable of capturing the characteristics of 4mC modifications. Results In this work, we developed a new predictor named 4mcPred-IFL to identify 4mC sites. To represent and capture discriminative features, we proposed an iterative feature representation algorithm that learns informative features from several sequential models in a supervised iterative mode. Our analysis shows that the feature representations learnt by our algorithm capture the discriminative distribution characteristics of 4mC versus non-4mC sites, enlarging the decision margin between positives and negatives in feature space. Additionally, by evaluating and comparing our predictor with state-of-the-art predictors on benchmark datasets, we demonstrate that it identifies 4mC sites more accurately. Availability and implementation A user-friendly webserver implementing the proposed 4mcPred-IFL is freely accessible at http://server.malab.cn/4mcPred-IFL. Supplementary information Supplementary data are available at Bioinformatics online.
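The supervised iterative mode described above, where each round a model is fit on the current features and its outputs are appended as new features, can be sketched schematically. `fit_predict` is a placeholder for whatever supervised learner is used at each round; the actual sequential models of 4mcPred-IFL are not reproduced here.

```python
def iterative_features(X, y, fit_predict, rounds=3):
    """Schematic of an iterative feature-representation loop.

    Each round, a supervised model (`fit_predict`, a placeholder) is fit
    on the current features and its per-sample score is appended as a
    new feature column, progressively enriching the representation.
    """
    feats = [list(row) for row in X]
    for _ in range(rounds):
        scores = fit_predict(feats, y)  # one score per sample
        feats = [row + [s] for row, s in zip(feats, scores)]
    return feats


# Toy learner that returns a constant score, just to show the mechanics.
constant = lambda F, y: [0.5] * len(F)
enriched = iterative_features([[1], [2]], [0, 1], constant, rounds=2)
# -> [[1, 0.5, 0.5], [2, 0.5, 0.5]]
```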


Author(s):  
Cunli Mao ◽  
Zhibo Man ◽  
Zhengtao Yu ◽  
Shengxiang Gao ◽  
Zhenhan Wang ◽  
...  

The smallest semantic unit of the Burmese language is the syllable. In this study, we propose the first neural joint learning model for Burmese syllable segmentation, word segmentation, and part-of-speech (POS) tagging with BERT. The proposed model alleviates the error propagation problem of syllable segmentation. More specifically, it extends the neural joint model for Vietnamese word segmentation, POS tagging, and dependency parsing [28] with pre-trained Burmese character, syllable, and word embeddings and BiLSTM-CRF-based neural layers. To evaluate the proposed model, we carry out experiments on Burmese benchmark datasets and fine-tune the multilingual BERT model. The results show that the proposed joint model achieves excellent performance.


2019 ◽  
Vol 20 (5) ◽  
pp. 565-578 ◽  
Author(s):  
Lidong Wang ◽  
Ruijun Zhang

Ubiquitination is an important post-translational modification (PTM) process for the regulation of protein functions and is associated with cancer, cardiovascular and other diseases. Recent initiatives have focused on detecting potential ubiquitination sites with the aid of physicochemical test approaches in conjunction with computational methods. Identifying ubiquitination sites through laboratory tests is especially susceptible to the temporality and reversibility of ubiquitination processes, and is also costly and time-consuming. Computational methods, by contrast, have been demonstrated to be effective in extracting potential rules or inferences from collections of biological sequences. To date, computational strategies have been among the critical research approaches applied to the identification of ubiquitination sites, and numerous state-of-the-art computational methods based on machine learning and statistical analysis have been developed for this task. In the present study, the construction of benchmark datasets is summarized, together with the feature representation methods, feature selection approaches and classifiers involved in several previous publications. To explore development trends in the identification of ubiquitination sites, an independent test dataset was constructed, and the prediction results obtained from five prediction tools are reported here, together with some related discussion.


AI ◽  
2021 ◽  
Vol 2 (2) ◽  
pp. 195-208
Author(s):  
Gabriel Dahia ◽  
Maurício Pamplona Segundo

We propose a method that can perform one-class classification given only a small number of examples from the target class and none from the others. We formulate the learning of meaningful features for one-class classification as a meta-learning problem in which the meta-training stage repeatedly simulates one-class classification, using the classification loss of the chosen algorithm to learn a feature representation. To learn these representations, we require only multiclass data from similar tasks. We show how the Support Vector Data Description method can be used with our method, and also propose a simpler variant based on Prototypical Networks that obtains comparable performance, indicating that learning feature representations directly from data may be more important than the choice of one-class algorithm. We validate our approach by adapting few-shot classification datasets to the few-shot one-class classification scenario, obtaining results comparable to the state of the art in traditional one-class classification and better than those of one-class classification baselines employed in the few-shot setting.
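The Prototypical-Networks variant mentioned above reduces, at test time, to scoring a query by its distance to the prototype (mean embedding) of the single target class. A minimal sketch of that scoring step follows; the meta-learned embedding network and the decision threshold are omitted, and plain lists stand in for embeddings.

```python
def prototype(support):
    """Mean of the support embeddings: the single-class prototype."""
    dim = len(support[0])
    return [sum(v[d] for v in support) / len(support) for d in range(dim)]


def one_class_score(query, support):
    """Squared Euclidean distance from a query embedding to the prototype.

    Lower score = more likely in-class. Thresholding and the meta-learned
    embedding function are omitted from this sketch.
    """
    p = prototype(support)
    return sum((q - c) ** 2 for q, c in zip(query, p))


support = [[0.0, 0.0], [2.0, 2.0]]
score_in = one_class_score([1.0, 1.0], support)   # 0.0: exactly at prototype
score_out = one_class_score([3.0, 1.0], support)  # 4.0: farther from prototype
```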


Sensors ◽  
2020 ◽  
Vol 21 (1) ◽  
pp. 54
Author(s):  
Peng Liu ◽  
Zonghua Zhang ◽  
Zhaozong Meng ◽  
Nan Gao

Depth estimation is a crucial component in many 3D vision applications. Monocular depth estimation is gaining increasing interest due to its flexibility and extremely low system requirements, but its inherently ill-posed and ambiguous nature still causes unsatisfactory estimation results. This paper proposes a new deep convolutional neural network for monocular depth estimation. The network applies joint attention feature distillation and a wavelet-based loss function to recover the depth information of a scene. Two improvements were achieved compared with previous methods. First, we combined feature distillation and joint attention mechanisms to boost feature modulation discrimination: the network extracts hierarchical features using a progressive feature distillation and refinement strategy and aggregates them with a joint attention operation. Second, we adopted a wavelet-based loss function for network training, which improves the effectiveness of the loss by capturing more structural detail. Experimental results on challenging indoor and outdoor benchmark datasets verified the proposed method's superiority over current state-of-the-art methods.
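The idea of a wavelet-based loss, comparing prediction and target in a wavelet domain so that high-frequency structural errors are penalized explicitly, can be sketched with a single-level 1-D Haar transform. This is a stand-in only: the paper's actual wavelet family, decomposition depth, and loss weighting are not specified here.

```python
def haar_1d(signal):
    """One level of the 1-D Haar transform: pairwise averages and details.

    A minimal stand-in for the wavelet decomposition underlying the loss;
    the paper's actual wavelet family is an assumption left open here.
    """
    avg = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    det = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return avg, det


def wavelet_l1_loss(pred, target):
    """Mean absolute error between the Haar coefficients of two signals,
    so structural (high-frequency) discrepancies contribute directly."""
    pa, pd = haar_1d(pred)
    ta, td = haar_1d(target)
    coeffs = list(zip(pa + pd, ta + td))
    return sum(abs(a - b) for a, b in coeffs) / len(coeffs)
```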


2020 ◽  
Vol 176 (2) ◽  
pp. 183-203
Author(s):  
Santosh Chapaneri ◽  
Deepak Jayaswal

Modeling the music mood has wide applications in music categorization, retrieval, and recommendation systems; however, it is challenging to computationally model the affective content of music due to its subjective nature. In this work, a structured regression framework is proposed to model the valence and arousal mood dimensions of music using a single regression model at linear computational cost. To tackle the subjectivity phenomenon, a confidence-interval-based estimated consensus is computed by modeling the behavior of various annotators (e.g. biased, adversarial) and is shown to perform better than using the average annotation values. For a compact feature representation of music clips, variational Bayesian inference is used to learn a Gaussian mixture model representation of the acoustic features, and chord-related features are used to improve valence estimation by probing the chord progressions between chroma frames. The dimensionality of the features is further reduced using an adaptive version of kernel PCA. Using an efficient implementation of twin Gaussian processes for structured regression, the proposed work achieves a significant improvement in R² for the arousal and valence dimensions relative to state-of-the-art techniques on two benchmark datasets for music mood estimation.
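The contrast drawn above, an estimated consensus that models annotator behavior versus a plain average, can be illustrated with a toy weighting scheme in which annotators farther from the running mean are down-weighted. This is purely illustrative: the paper's confidence-interval-based annotator model is not reproduced here, and this particular weighting is an assumption.

```python
def estimated_consensus(annotations):
    """Illustrative robust consensus over annotator ratings.

    Annotators who deviate more from the plain mean get smaller weights,
    so outliers (e.g. adversarial annotators) pull the estimate less.
    This weighting is a hypothetical stand-in, not the paper's model.
    """
    mean = sum(annotations) / len(annotations)
    weights = [1.0 / (1.0 + abs(a - mean)) for a in annotations]
    total = sum(weights)
    return sum(w * a for w, a in zip(weights, annotations)) / total


# With one low outlier among agreeing annotators, the consensus sits
# closer to the agreeing majority than the plain average does.
ratings = [0.0, 2.0, 2.0]
plain_average = sum(ratings) / len(ratings)   # 1.333...
consensus = estimated_consensus(ratings)      # > plain average
```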


Electronics ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1234
Author(s):  
Lei Zha ◽  
Yu Yang ◽  
Zicheng Lai ◽  
Ziwei Zhang ◽  
Juan Wen

In recent years, neural networks for single image super-resolution (SISR) have applied ever deeper network structures to extract additional image detail, which makes model training difficult. To deal with deep model training problems, researchers use dense skip connections to promote the model’s feature representation ability by reusing deep features of different receptive fields. Benefiting from the dense connection block, SRDensenet has achieved excellent performance in SISR. Although a densely connected structure provides rich information, it also introduces redundant and useless information. To tackle this problem, in this paper we propose a Lightweight Dense Connected Approach with Attention for Single Image Super-Resolution (LDCASR), which employs an attention mechanism to extract useful information along the channel dimension. In particular, we propose the recursive dense group (RDG), consisting of Dense Attention Blocks (DABs), which obtains more significant representations by extracting deep features with the aid of both dense connections and the attention module, so that the whole network concentrates on learning more advanced feature information. Additionally, we introduce group convolution in the DABs, which reduces the number of parameters to 0.6 M. Extensive experiments on benchmark datasets demonstrate the superiority of our proposed method over five chosen SISR methods.
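The parameter saving from group convolution mentioned above follows directly from the convolution parameter count: with `groups` > 1 each filter sees only `c_in / groups` input channels. The sketch below computes that count; the channel and kernel sizes are illustrative, not the actual LDCASR configuration.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight-parameter count of a 2-D convolution (bias ignored).

    With `groups` > 1, each output filter connects to only c_in/groups
    input channels, dividing the parameter count by `groups`. The sizes
    used below are illustrative, not taken from the paper.
    """
    assert c_in % groups == 0, "input channels must divide evenly into groups"
    return (c_in // groups) * k * k * c_out


standard = conv_params(64, 64, 3)            # 36864 parameters
grouped = conv_params(64, 64, 3, groups=4)   # 9216 parameters (4x fewer)
```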


2021 ◽  
Vol 2050 (1) ◽  
pp. 012006
Author(s):  
Xili Dai ◽  
Chunmei Ma ◽  
Jingwei Sun ◽  
Tao Zhang ◽  
Haigang Gong ◽  
...  

Abstract Training deep neural networks from only a few examples is an interesting topic that has motivated few-shot learning. In this paper, we study the fine-grained image classification problem in a challenging few-shot learning setting and propose the Self-Amplificated Network (SAN), a meta-learning-based method to tackle this problem. The SAN model consists of three parts: the Encoder, Amplification and Similarity Modules. The Encoder Module encodes a fine-grained image input into a feature vector. The Amplification Module amplifies subtle differences between fine-grained images using a self-attention mechanism composed of multi-head attention. The Similarity Module measures how similar the query image and the support set are in order to determine the classification result. In-depth experiments on three benchmark datasets show that our network achieves superior performance over the competing baselines.
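The self-attention mechanism at the heart of the Amplification Module builds on scaled dot-product attention. A single-head sketch over toy vectors follows; the head splitting, learned projections, and the rest of the SAN architecture are omitted, and plain lists stand in for tensors.

```python
import math


def scaled_dot_attention(q, k, v):
    """Single-head scaled dot-product attention over lists of vectors.

    For each query vector, scores against all keys are scaled by sqrt(d),
    softmax-normalized, and used to weight the value vectors. This is the
    building block of multi-head attention; projections are omitted.
    """
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        m = max(scores)                       # subtract max for stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * vj[t] for w, vj in zip(weights, v))
                    for t in range(len(v[0]))])
    return out


# Identical keys -> uniform weights -> output is the mean of the values.
result = scaled_dot_attention([[1.0]], [[0.0], [0.0]], [[1.0], [3.0]])
```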

