An Efficient Multi-Scale Focusing Attention Network for Person Re-Identification

2021 ◽  
Vol 11 (5) ◽  
pp. 2010
Author(s):  
Wei Huang ◽  
Yongying Li ◽  
Kunlin Zhang ◽  
Xiaoyu Hou ◽  
Jihui Xu ◽  
...  

The multi-scale lightweight network and the attention mechanism have recently attracted attention in person re-identification (ReID), as they can improve a model's ability to process information at low computational cost. However, state-of-the-art methods mostly concentrate on spatial attention and big-block channel attention models with high computational complexity, while rarely investigating inside-block attention with a lightweight network, and therefore cannot meet the requirements of high efficiency and low latency in practical ReID systems. In this paper, we first design a novel lightweight person ReID model, called the Multi-Scale Focusing Attention Network (MSFANet), to capture robust and elaborate multi-scale ReID features with fewer floating-point operations and higher performance. MSFANet is built from a multi-branch depthwise separable convolution module, combined with an inside-block attention module, to extract and fuse multi-scale features independently. In addition, we design a multi-stage backbone with the '1-2-3' form, which significantly reduces computational cost. Furthermore, MSFANet is exceptionally lightweight and can be embedded flexibly in a ReID framework. Secondly, an efficient loss function combining softmax loss and TriHard loss, based on the proposed optimal data augmentation method, is designed for faster convergence and better model generalization. Finally, experimental results on two large ReID datasets (Market1501 and DukeMTMC) and two small ReID datasets (VIPeR, GRID) show that the proposed MSFANet achieves the best mAP performance and the lowest computational complexity compared with state-of-the-art methods, improving mAP by 2.3% and reducing complexity by 18.2%, respectively.
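The combined objective pairs softmax cross-entropy with a TriHard (hard-mining triplet) term. The sketch below is illustrative, not the paper's code; it assumes the standard PK-sampling setup in which every identity in the batch has at least two images, and the margin value is a placeholder.

```python
import math

def trihard_loss(dist, labels, margin=0.3):
    """dist[i][j]: pairwise distance matrix. For each anchor, pick the hardest
    positive (farthest same-ID sample) and hardest negative (closest other-ID
    sample), then apply a hinge with the given margin."""
    n = len(labels)
    total = 0.0
    for i in range(n):
        hardest_pos = max(dist[i][j] for j in range(n)
                          if labels[j] == labels[i] and j != i)
        hardest_neg = min(dist[i][j] for j in range(n) if labels[j] != labels[i])
        total += max(0.0, margin + hardest_pos - hardest_neg)
    return total / n

def softmax_ce(logits, target):
    """Numerically stable softmax cross-entropy for one sample."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]
```

In practice the two terms are summed (often with a weighting factor) so that the classification signal and the metric-learning signal shape the embedding jointly.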

Electronics ◽  
2020 ◽  
Vol 9 (12) ◽  
pp. 2038
Author(s):  
Xi Shao ◽  
Xuan Zhang ◽  
Guijin Tang ◽  
Bingkun Bao

We propose a new end-to-end scene recognition framework, called the Recurrent Memorized Attention Network (RMAN), which performs object-based scene classification by recurrently locating and memorizing objects in the image. Based on the proposed framework, we introduce a multi-task mechanism that sequentially attends to the different essential objects in a scene image and recurrently fuses the features of the attended objects into a memory, improving scene recognition accuracy. Experimental results show that the RMAN model achieves better classification performance on our constructed dataset and on two public scene datasets, surpassing state-of-the-art image scene recognition approaches.
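The recurrent memory-fusion step can be pictured as attention weights over candidate object features producing a context vector that is merged into a running memory. This is a hedged sketch of the idea only; the function names, the gated update, and the fixed mixing weight are assumptions, not the RMAN model's actual equations.

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def fuse_step(memory, object_feats, scores, alpha=0.5):
    """One recurrent step: attend over object features, then blend the
    attended context into the memory with a simple convex combination."""
    w = softmax(scores)
    context = [sum(w[k] * f[d] for k, f in enumerate(object_feats))
               for d in range(len(memory))]
    return [alpha * m + (1 - alpha) * c for m, c in zip(memory, context)]
```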


2020 ◽  
Vol 10 (3) ◽  
pp. 24
Author(s):  
Stefania Preatto ◽  
Andrea Giannini ◽  
Luca Valente ◽  
Guido Masera ◽  
Maurizio Martina

High Efficiency Video Coding (HEVC) is the latest video standard developed by the Joint Collaborative Team on Video Coding. HEVC offers better compression than preceding standards, but it suffers from high computational complexity. In particular, one of the most time-consuming blocks in HEVC is the fractional-sample interpolation filter, which is used in both the encoding and the decoding processes. Integrating different state-of-the-art techniques, this paper presents an architecture for interpolation filters that can trade quality for energy and power efficiency by exploiting approximate interpolation filters and by halving the amount of required memory with respect to state-of-the-art implementations.
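The cost/quality trade-off can be illustrated with HEVC's 8-tap half-sample luma filter versus a much shorter approximation. The bilinear substitute below is purely illustrative of the "approximate filter" idea, not the specific approximation used in this paper.

```python
# HEVC half-sample luma filter coefficients; they sum to 64 (6-bit normalization).
HEVC_HALF = [-1, 4, -11, 40, 40, -11, 4, -1]

def interp_half(samples, i, taps=HEVC_HALF):
    """Half-pel value between samples[i] and samples[i+1] using the 8-tap filter."""
    acc = sum(t * samples[i - 3 + k] for k, t in enumerate(taps))
    return (acc + 32) >> 6  # round and normalize by 64

def interp_bilinear(samples, i):
    """2-tap approximation: 8x fewer multiply-accumulates, lower quality."""
    return (samples[i] + samples[i + 1] + 1) >> 1
```

On a flat region both filters agree exactly; the quality gap only appears on detailed content, which is what makes run-time switching between exact and approximate filters attractive for power savings.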


2018 ◽  
Vol 14 (7) ◽  
pp. 155014771879075 ◽  
Author(s):  
Chi Yoon Jeong ◽  
Hyun S Yang ◽  
KyeongDeok Moon

In this article, we propose a fast method for detecting the horizon line in maritime scenarios by combining a multi-scale approach with region-of-interest detection. Recently, several methods that adopt a multi-scale approach have been proposed, because edge detection at a single scale is insufficient to detect all edges of various sizes. However, these methods suffer from high processing times, requiring tens of seconds to complete horizon detection. Moreover, the resolution of images captured from cameras mounted on vessels is increasing, which further reduces processing speed. Using a region of interest is an efficient way of reducing the amount of information to be processed, so we explore how to use the region of interest efficiently for horizon detection. The proposed method first detects the region of interest using a property of maritime scenes; multi-scale edge detection is then performed to extract edges at each scale, and the results are combined into a single edge map. Finally, the Hough transform and a least-squares method are applied sequentially to estimate the horizon line accurately. We compared the performance of the proposed method with state-of-the-art methods on two publicly available databases, namely the Singapore Marine Dataset and the buoy dataset. Experimental results show that the proposed region-of-interest detection reduces the processing time of horizon detection, and that the accuracy with which the proposed method identifies the horizon is superior to that of state-of-the-art methods.
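The final refinement stage, fitting a line to candidate horizon edge points by least squares, can be sketched in closed form. This is a generic least-squares fit under the assumption of a non-vertical horizon (safe for a roughly level camera); it is not the authors' exact implementation.

```python
def fit_line(points):
    """Least-squares fit of y = a*x + b to (x, y) edge points along the
    candidate horizon; returns the slope a and intercept b."""
    n = len(points)
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sxx = sum(p[0] * p[0] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b
```

The Hough transform supplies a coarse line hypothesis that selects which edge pixels to feed into this fit; the fit then refines the estimate to sub-pixel accuracy.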


Author(s):  
Shaobo Min ◽  
Xuejin Chen ◽  
Zheng-Jun Zha ◽  
Feng Wu ◽  
Yongdong Zhang

Learning-based methods suffer from a deficiency of clean annotations, especially in biomedical segmentation. Although many semi-supervised methods have been proposed to provide extra training data, automatically generated labels are usually too noisy to retrain models effectively. In this paper, we propose a Two-Stream Mutual Attention Network (TSMAN) that weakens the influence of back-propagated gradients caused by incorrect labels, thereby rendering the network robust to unclean data. The proposed TSMAN consists of two sub-networks that are connected by three types of attention models at different layers. The target of each attention model is to indicate potentially incorrect gradients in a certain layer for both sub-networks by analyzing the features they infer from the same input. To achieve this, the attention models are designed based on a propagation analysis of noisy gradients at different layers, which allows them to effectively discover incorrect labels and weaken their influence during the parameter update process. By exchanging multi-level features within the two-stream architecture, the effect of noisy labels in each sub-network is reduced by suppressing the noisy gradients. Furthermore, a hierarchical distillation scheme is developed to provide reliable pseudo labels for unlabeled data, which further boosts the performance of TSMAN. Experiments on both the HVSMR 2016 and BRATS 2015 benchmarks demonstrate that our semi-supervised learning framework surpasses state-of-the-art fully-supervised results.
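The core mechanism, down-weighting gradients where the two streams disagree, can be sketched as follows. The agreement score used here (cosine similarity between the two streams' features) is an illustrative stand-in; TSMAN's actual attention models are learned, not this fixed heuristic.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def weaken_gradients(grads, feat_a, feat_b):
    """Scale a layer's gradients by the agreement between the two streams'
    features for the same input: low agreement suggests a noisy label."""
    w = max(0.0, cosine(feat_a, feat_b))
    return [w * g for g in grads]
```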


2020 ◽  
Author(s):  
Tahir Mahmood ◽  
Muhammad Owais ◽  
Kyoung Jun Noh ◽  
Hyo Sik Yoon ◽  
Adnan Haider ◽  
...  

BACKGROUND Accurate nuclei segmentation in histopathology images plays a key role in digital pathology. It is considered a prerequisite for the determination of cell phenotype, nuclear morphometrics, cell classification, and the grading and prognosis of cancer. However, it is a very challenging task because of the different types of nuclei, large intra-class variations, and diverse cell morphologies. Consequently, the manual inspection of such images under high-resolution microscopes is tedious and time-consuming. Alternatively, artificial intelligence (AI)-based automated techniques, which are fast, robust, and require less human effort, can be used. Recently, several AI-based nuclei segmentation techniques have been proposed. They have shown significant performance improvements on this task, but there is room for further improvement. Thus, we propose an AI-based nuclei segmentation technique in which we adopt a new nuclei segmentation network empowered by residual skip connections to address this issue. OBJECTIVE The aim of this study was to develop an AI-based nuclei segmentation method for histopathology images of multiple organs. METHODS Our proposed residual-skip-connections-based nuclei segmentation network (R-NSN) comprises two main stages, stain normalization and nuclei segmentation, as shown in Figure 2. In the first stage, a histopathology image is stain-normalized to balance color and intensity variation. It is then used as input to the R-NSN in the second stage, which outputs a segmented image. RESULTS Experiments were performed on two publicly available datasets: 1) The Cancer Genomic Atlas (TCGA), and 2) Triple-negative Breast Cancer (TNBC).
The results show that our proposed technique achieves an aggregated Jaccard index (AJI) of 0.6794, Dice coefficient of 0.8084, and F1-measure of 0.8547 on the TCGA dataset, and an AJI of 0.7332, Dice coefficient of 0.8441, precision of 0.8352, recall of 0.8306, and F1-measure of 0.8329 on the TNBC dataset. These values are higher than those of the state-of-the-art methods. CONCLUSIONS The proposed R-NSN has the potential to maintain crucial features by using the residual connectivity from the encoder to the decoder and uses only a few layers, which reduces the computational cost of the model. The selection of a good stain normalization technique, the effective use of residual connections to avoid information loss, and the use of only a few layers to reduce the computational cost yielded outstanding results. Thus, our nuclei segmentation method is robust and is superior to the state-of-the-art methods. We expect that this study will contribute to the development of computational pathology software for research and clinical use and enhance the impact of computational pathology.
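Two small sketches make the abstract's key ingredients concrete: the residual skip connection that carries encoder detail to the decoder, and the Dice coefficient reported in the results. Both are generic textbook forms, with `transform` as a hypothetical stand-in for the network's learned layers, not the R-NSN architecture itself.

```python
def residual_block(x, transform):
    """Residual skip: the block's output is its input plus a learned
    transformation of it, so fine boundary detail is not lost."""
    return [xi + ti for xi, ti in zip(x, transform(x))]

def dice(pred, gt):
    """Dice coefficient between two flattened binary masks (lists of 0/1)."""
    inter = sum(p & g for p, g in zip(pred, gt))
    return 2 * inter / (sum(pred) + sum(gt))
```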


Author(s):  
Chengfeng Xu ◽  
Pengpeng Zhao ◽  
Yanchi Liu ◽  
Victor S. Sheng ◽  
Jiajie Xu ◽  
...  

Session-based recommendation, which aims to predict the user's immediate next action based on anonymous sessions, is a key task in many online services (e.g., e-commerce, media streaming). Recently, the Self-Attention Network (SAN) has achieved significant success in various sequence modeling tasks without using either recurrent or convolutional networks. However, SAN cannot capture the local dependencies that exist between adjacent items, which limits its capacity for learning contextualized representations of items in sequences. In this paper, we propose a graph contextualized self-attention model (GC-SAN), which utilizes both a graph neural network and the self-attention mechanism, for session-based recommendation. In GC-SAN, we dynamically construct a graph structure for session sequences and capture rich local dependencies via a graph neural network (GNN). Each session then learns long-range dependencies by applying the self-attention mechanism. Finally, each session is represented as a linear combination of the global preference and the current interest of that session. Extensive experiments on two real-world datasets show that GC-SAN consistently outperforms state-of-the-art methods.
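The final step described above, representing a session as a linear combination of global preference and current interest, is simple enough to sketch directly. The fixed weight `w` here stands in for what would be a learned parameter; the names are illustrative, not GC-SAN's.

```python
def session_repr(global_pref, current_interest, w=0.6):
    """Blend the self-attention output (global preference) with the last
    item's feature (current interest) into one session embedding."""
    return [w * g + (1 - w) * c for g, c in zip(global_pref, current_interest)]
```

Scoring then reduces to a dot product between this session embedding and each candidate item's embedding, so the blend weight directly controls how much recency matters.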


Author(s):  
Zachary Teed ◽  
Jia Deng

We introduce Recurrent All-Pairs Field Transforms (RAFT), a new deep network architecture for optical flow. RAFT extracts per-pixel features, builds multi-scale 4D correlation volumes for all pairs of pixels, and iteratively updates a flow field through a recurrent unit that performs lookups on the correlation volumes. RAFT achieves state-of-the-art performance on the KITTI and Sintel datasets. In addition, RAFT has strong cross-dataset generalization as well as high efficiency in inference time, training speed, and parameter count.
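The all-pairs correlation volume at the heart of RAFT is the dot product of every pixel's feature in the first frame with every pixel's feature in the second. A minimal sketch over linearized pixel indices (the real volume is 4D and built at multiple scales):

```python
def correlation_volume(f1, f2):
    """f1, f2: lists of per-pixel feature vectors for two frames.
    Returns corr[i][j] = <f1[i], f2[j]>, i.e. the flattened 4D volume."""
    return [[sum(a * b for a, b in zip(u, v)) for v in f2] for u in f1]
```

The recurrent update unit never recomputes this volume; it only performs cheap local lookups into it around the current flow estimate, which is what keeps iterative refinement fast.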


Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1881
Author(s):  
Yuhui Chang ◽  
Jiangtao Xu ◽  
Zhiyuan Gao

To improve the accuracy of stereo matching, the multi-scale dense attention network (MDA-Net) is proposed. The network introduces two novel modules in the feature extraction stage to better exploit context information: a dual-path upsampling (DU) block and an attention-guided context-aware pyramid feature extraction (ACPFE) block. The DU block is introduced to fuse feature maps of different scales; it uses sub-pixel convolution to compensate for the information loss caused by traditional interpolation-based upsampling. The ACPFE block is proposed to extract multi-scale context information: pyramid atrous convolution is adopted to extract multi-scale features, and channel attention is used to fuse them. The proposed network has been evaluated on several benchmark datasets. The three-pixel error evaluated over all ground-truth pixels is 2.10% on the KITTI 2015 dataset. The experimental results show that MDA-Net achieves state-of-the-art accuracy on the KITTI 2012 and 2015 datasets.
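Sub-pixel convolution, as used in the DU block, upsamples by rearranging channels rather than interpolating: r*r channels of an HxW map become one (rH)x(rW) map. A minimal sketch of that rearrangement (the convolution that produces the r*r channels is omitted):

```python
def pixel_shuffle(channels, h, w, r):
    """channels: list of r*r feature maps, each stored row-major as a list
    of h*w values. Returns one (h*r) x (w*r) map: channel c fills the
    sub-pixel offset (c // r, c % r) of each r x r output cell."""
    out = [[0] * (w * r) for _ in range(h * r)]
    for c, cmap in enumerate(channels):
        dy, dx = divmod(c, r)
        for y in range(h):
            for x in range(w):
                out[y * r + dy][x * r + dx] = cmap[y * w + x]
    return out
```

Because every output value comes from a learned channel rather than a weighted average of neighbors, no information is blurred away, which is the loss the DU block is designed to avoid.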


Author(s):  
Markos Georgopoulos ◽  
James Oldfield ◽  
Mihalis A. Nicolaou ◽  
Yannis Panagakis ◽  
Maja Pantic

Deep learning has catalysed progress in tasks such as face recognition and analysis, leading to a quick integration of technological solutions in multiple layers of our society. While such systems have proven to be accurate by standard evaluation metrics and benchmarks, a surge of work has recently exposed the demographic bias that such algorithms exhibit, highlighting that accuracy does not entail fairness. Clearly, deploying biased systems under real-world settings can have grave consequences for affected populations. Indeed, learning methods are prone to inheriting, or even amplifying, the bias present in a training set, manifested by uneven representation across demographic groups. In facial datasets, this particularly relates to attributes such as skin tone, gender, and age. In this work, we address the problem of mitigating bias in facial datasets by data augmentation. We propose a multi-attribute framework that can successfully transfer complex, multi-scale facial patterns even if these belong to underrepresented groups in the training set. This is achieved by relaxing the rigid dependence on a single attribute label, and further introducing a tensor-based mixing structure that captures multiplicative interactions between attributes in a multilinear fashion. We evaluate our method with an extensive set of qualitative and quantitative experiments on several datasets, with rigorous comparisons to state-of-the-art methods. We find that the proposed framework can successfully mitigate dataset bias, as evinced by extensive evaluations on established diversity metrics, while significantly improving fairness metrics such as equality of opportunity.
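The "multiplicative interactions in a multilinear fashion" can be contrasted with additive mixing in a toy form: the simplest multilinear combination of two attribute embeddings is their outer product. This rank-one sketch only illustrates the multiplicative-versus-additive distinction; the paper's tensor structure is considerably richer.

```python
def bilinear_mix(a, b):
    """Flattened outer product of two attribute embeddings: every output
    coordinate depends on a *product* of one entry from each attribute,
    unlike concatenation or addition."""
    return [ai * bj for ai in a for bj in b]
```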


Sensors ◽  
2021 ◽  
Vol 21 (20) ◽  
pp. 6808
Author(s):  
Jianqiang Xiao ◽  
Dianbo Ma ◽  
Satoshi Yamane

Despite recent stereo matching algorithms achieving significant results on public benchmarks, the problem of their heavy computational demand remains unsolved. Most works focus on designing an architecture to reduce the computational complexity, whereas we aim at optimizing the 3D convolution kernels of the Pyramid Stereo Matching Network (PSMNet). In this paper, we design a series of comparative experiments exploring the performance of well-known convolution kernels on PSMNet. Our model reduces the computational cost from 256.66G MAdd (multiply-add operations) to 69.03G MAdd (from 198.47G MAdd to 10.84G MAdd when considering only the 3D convolutional neural networks) without losing accuracy. On the Scene Flow and KITTI 2015 datasets, our model achieves results comparable to the state of the art at a low computational cost.
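The MAdd accounting that motivates swapping kernels can be reproduced with a one-line cost model. The kernel shapes compared below (a full 3x3x3 kernel versus a 1x3x3 plus 3x1x1 separable pair) are a generic illustration of why factorized 3D kernels are cheaper, not necessarily the specific kernels this paper selects.

```python
def madd_conv3d(cin, cout, d, h, w, kd, kh, kw):
    """Multiply-add count of a dense 3D convolution: one MAdd per kernel tap
    per input channel per output element (stride 1, 'same' padding)."""
    return cin * cout * d * h * w * kd * kh * kw

# Full 3x3x3 kernel vs. a spatially separable (1x3x3 then 3x1x1) pair.
full = madd_conv3d(32, 32, 8, 16, 16, 3, 3, 3)        # 27 taps
separable = (madd_conv3d(32, 32, 8, 16, 16, 1, 3, 3)  # 9 taps
             + madd_conv3d(32, 32, 8, 16, 16, 3, 1, 1))  # + 3 taps
```

For this toy layer the separable pair needs 12/27 of the full kernel's MAdds; stacking many such layers is how the paper-scale savings (256.66G down to 69.03G) accumulate.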

