MwoA auxiliary diagnosis via RSN-based 3D deep multiple instance learning with spatial attention mechanism

Author(s):  
Xiang Li ◽  
Benzheng Wei ◽  
Tianyang Li ◽  
Na Zhang
2021 ◽  
Author(s):  
Zongbao Liang ◽  
Xing Liu ◽  
Bo Chen ◽  
YunFei Yuan ◽  
Yang Song ◽  
...  

2020 ◽  
Vol 10 (12) ◽  
pp. 4312 ◽  
Author(s):  
Jie Xu ◽  
Haoliang Wei ◽  
Linke Li ◽  
Qiuru Fu ◽  
Jinhong Guo

Video description plays an important role in the field of intelligent imaging technology. Attention mechanisms are extensively applied in deep-learning-based video description models. Most existing models use a temporal-spatial attention mechanism to enhance accuracy: temporal attention captures the global features of a video, whereas spatial attention captures local features. Nevertheless, because each channel of the convolutional neural network (CNN) feature maps carries certain spatial semantic information, it is insufficient to merely divide the CNN features into regions and then apply a spatial attention mechanism. In this paper, we propose a temporal-spatial and channel attention mechanism that enables the model to exploit various video features and ensures consistency between the visual features and the sentence descriptions, thereby enhancing model performance. Meanwhile, to demonstrate the effectiveness of the attention mechanism, this paper proposes a video visualization model based on the video description. Experimental results show that our model achieves good performance on the Microsoft Video Description (MSVD) dataset and a measurable improvement on the Microsoft Research Video to Text (MSR-VTT) dataset.
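As a rough sketch of the idea rather than the authors' implementation, the snippet below fuses per-frame CNN features with temporal, spatial, and channel attention in NumPy. The scoring vectors `w_t`, `w_s`, `w_c` and the way the three weight sets are combined are assumptions for illustration only:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def tsc_attention(feats, w_t, w_s, w_c):
    """Fuse video features with temporal, spatial, and channel attention.

    feats: (T, C, H, W) per-frame CNN feature maps
    w_t, w_s, w_c: (C,) scoring vectors (stand-ins for learned parameters)
    Returns a (C,) context vector for the description decoder.
    """
    T, C, H, W = feats.shape
    frame_desc = feats.mean(axis=(2, 3))                     # (T, C) global frame features
    alpha_t = softmax(frame_desc @ w_t)                      # (T,) temporal weights
    regions = feats.reshape(T, C, H * W).transpose(0, 2, 1)  # (T, HW, C) local regions
    alpha_s = softmax(regions @ w_s, axis=1)                 # (T, HW) spatial weights
    beta_c = softmax(feats.mean(axis=(0, 2, 3)) * w_c)       # (C,) channel weights
    spatial_ctx = (alpha_s[..., None] * regions).sum(axis=1) # (T, C) spatially attended
    ctx = (alpha_t[:, None] * spatial_ctx).sum(axis=0)       # (C,) temporally attended
    return ctx * beta_c                                      # channel-modulated context
```

The channel weighting here acts on the already-pooled context vector; in a full model it would typically modulate the feature maps before pooling.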


2020 ◽  
Vol 34 (04) ◽  
pp. 5742-5749
Author(s):  
Xiaoshuang Shi ◽  
Fuyong Xing ◽  
Yuanpu Xie ◽  
Zizhao Zhang ◽  
Lei Cui ◽  
...  

Although attention mechanisms have been widely used in deep learning for many tasks, they are rarely utilized to solve multiple instance learning (MIL) problems, where only a general category label is given for the multiple instances contained in one bag. Additionally, previous deep MIL methods first utilize the attention mechanism to learn instance weights and then employ a fully connected layer to predict the bag label, so that the bag prediction is largely determined by the effectiveness of the learned instance weights. To alleviate this issue, in this paper we propose a novel loss-based attention mechanism, which simultaneously learns instance weights, instance predictions, and bag predictions for deep multiple instance learning. Specifically, it calculates instance weights based on the loss function, e.g. softmax+cross-entropy, and shares the parameters with the fully connected layer that produces both instance and bag predictions. Additionally, a regularization term consisting of learned weights and cross-entropy functions is utilized to boost instance recall, and a consistency cost is used to smooth the training process of the neural networks and improve generalization performance. Extensive experiments on multiple types of benchmark databases demonstrate that the proposed attention mechanism is a general, effective, and efficient framework, which achieves superior bag and image classification performance over other state-of-the-art MIL methods while obtaining higher instance precision and recall than previous attention mechanisms. Source code is available at https://github.com/xsshi2015/Loss-Attention.
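One plausible reading of the core idea can be sketched as follows: a single fully connected layer produces instance predictions, and the instance weights are derived from each instance's loss with respect to the bag label, so low-loss instances dominate the bag prediction. The exact weight formula is an assumption; only the parameter sharing between weights and predictions follows the abstract:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def loss_attention_bag(X, W, b, bag_label):
    """Loss-based attention for MIL (simplified illustrative forward pass).

    X: (N, D) instances in one bag; W: (D, K), b: (K,) shared FC layer that
    yields BOTH instance predictions and the scores behind instance weights.
    """
    probs = softmax(X @ W + b, axis=1)            # (N, K) instance predictions
    ce = -np.log(probs[:, bag_label] + 1e-12)     # per-instance cross-entropy loss
    weights = softmax(-ce)                        # low loss -> high attention weight
    bag_probs = weights @ probs                   # (K,) weighted bag prediction
    return weights, probs, bag_probs
```

Because `weights` and `probs` come from the same parameters, training the bag prediction also shapes the instance-level outputs, which is the coupling the paper exploits.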


Author(s):  
Qiuxia Lai ◽  
Yu Li ◽  
Ailing Zeng ◽  
Minhao Liu ◽  
Hanqiu Sun ◽  
...  

The selective visual attention mechanism in the human visual system (HVS) restricts the amount of information to reach visual awareness for perceiving natural scenes, allowing near real-time information processing with limited computational capacity. This kind of selectivity acts as an ‘Information Bottleneck (IB)’, which seeks a trade-off between information compression and predictive accuracy. However, such information constraints are rarely explored in the attention mechanism for deep neural networks (DNNs). In this paper, we propose an IB-inspired spatial attention module for DNN structures built for visual recognition. The module takes as input an intermediate representation of the input image, and outputs a variational 2D attention map that minimizes the mutual information (MI) between the attention-modulated representation and the input, while maximizing the MI between the attention-modulated representation and the task label. To further restrict the information bypassed by the attention map, we quantize the continuous attention scores to a set of learnable anchor values during training. Extensive experiments show that the proposed IB-inspired spatial attention mechanism can yield attention maps that neatly highlight the regions of interest while suppressing backgrounds, and bootstrap standard DNN structures for visual recognition tasks (e.g., image classification, fine-grained recognition, cross-domain classification). The attention maps are interpretable for the decision making of the DNNs as verified in the experiments. Our code is available at this https URL.
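The quantization step described above (snapping continuous attention scores to a small set of learnable anchors) can be sketched in a few lines. This shows only the forward pass; during training the paper's approach would need a gradient estimator (e.g. straight-through), and the anchor values here are placeholders, not the learned ones:

```python
import numpy as np

def quantize_attention(att, anchors):
    """Snap each continuous attention score to its nearest anchor value.

    att: (H, W) attention map in [0, 1]; anchors: (K,) sorted anchor values.
    Restricting scores to K levels caps the information the map can pass on.
    """
    idx = np.abs(att[..., None] - anchors).argmin(axis=-1)  # nearest-anchor index
    return anchors[idx]                                     # quantized map
```

With K anchors, each spatial location can transmit at most log2(K) bits through the attention map, which is how the quantization tightens the information bottleneck.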


2021 ◽  
Vol 8 ◽  
Author(s):  
Weiqing Song ◽  
Pu Huang ◽  
Jing Wang ◽  
Yajuan Shen ◽  
Jian Zhang ◽  
...  

Clinically, red blood cell abnormalities are closely related to tumor diseases, red blood cell diseases, internal medicine, and other conditions. Red blood cell classification is the key to detecting red blood cell abnormalities. Traditionally, red blood cell classification is done manually by doctors, which requires a lot of manpower and produces subjective results. This paper proposes an Attention-based Residual Feature Pyramid Network (ARFPN) to classify 14 types of red blood cells to assist the diagnosis of related diseases. The model performs classification directly on the entire red blood cell image. Meanwhile, a spatial attention mechanism and a channel attention mechanism are combined with residual units to improve the expression of category-related features and achieve accurate feature extraction. In addition, the RoI Align method is used to reduce the loss of spatial symmetry and improve classification accuracy. Five hundred and eighty-eight red blood cell images are used to train and verify the effectiveness of the proposed method. The Channel Attention Residual Feature Pyramid Network (C-ARFPN) model achieves an mAP of 86%, and the Channel and Spatial Attention Residual Feature Pyramid Network (CS-ARFPN) model achieves an mAP of 86.9%. The experimental results indicate that our method can classify more red blood cell types and better adapt to the needs of doctors, thus reducing doctors' workload and improving diagnostic efficiency.
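The combination of channel attention, spatial attention, and a residual unit can be illustrated with a minimal NumPy sketch. The MLP shapes for the channel gate and the avg+max pooling for the spatial gate are simplified stand-ins (real implementations typically use a small convolution for the spatial branch), not the ARFPN design itself:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """x: (C, H, W). Global average pool -> tiny MLP -> per-channel gate."""
    z = x.mean(axis=(1, 2))                       # (C,) channel descriptor
    gate = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))  # (C,) gate in (0, 1)
    return x * gate[:, None, None]

def spatial_attention(x):
    """x: (C, H, W). Channel-pooled statistics gate each spatial location."""
    gate = sigmoid(x.mean(axis=0) + x.max(axis=0))  # (H, W)
    return x * gate[None]

def attention_residual_unit(x, w1, w2):
    """Channel then spatial attention on the residual branch, plus identity."""
    return x + spatial_attention(channel_attention(x, w1, w2))
```

The identity shortcut lets the block fall back to the unattended features, so adding attention cannot erase information the rest of the pyramid depends on.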


Author(s):  
Bei Bei Fan ◽  
He Yang

Current traffic sign detection technology is disturbed by factors such as illumination changes, weather, and camera angle, which makes reliable detection difficult. Traffic sign datasets usually contain a large number of small objects, and the scale variance of these objects is a major challenge for traffic sign detection. In response to these problems, a multi-scale traffic sign detection algorithm based on an attention mechanism is proposed. The attention mechanism is composed of a channel attention mechanism and a spatial attention mechanism. The channel attention mechanism filters out redundant background information, making the network's features more discriminative and improving its ability to recognize traffic signs. The spatial attention mechanism directs the model toward object regions in the image while suppressing non-object and background areas. The model in this paper is validated on the Tsinghua-Tencent 100K dataset and achieves higher accuracy than state-of-the-art traffic sign detection approaches.
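A toy demonstration of the background-suppression effect described above: a spatial gate driven by channel-pooled activations keeps strongly activated (object-like) regions while damping weak (background) regions. The gating function here is a generic sigmoid sketch, not the paper's actual module:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention_gate(feats):
    """feats: (C, H, W). Channel-mean activations drive a per-location gate:
    high-activation (object) locations pass through nearly unchanged, while
    low-activation (background) locations are attenuated."""
    gate = sigmoid(feats.mean(axis=0))  # (H, W) in (0, 1)
    return feats * gate[None], gate
```

On a feature map where only a small "sign" patch is active, the gate at the patch is close to 1 while the featureless background sits at the sigmoid midpoint, illustrating the intended suppression.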


Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 7949
Author(s):  
Mengfan Xue ◽  
Minghao Chen ◽  
Dongliang Peng ◽  
Yunfei Guo ◽  
Huajie Chen

Attention mechanisms have demonstrated great potential in improving the performance of deep convolutional neural networks (CNNs). However, many existing methods are dedicated to developing channel or spatial attention modules with large numbers of parameters, and such complex attention modules inevitably affect CNN performance. In our experiments embedding the Convolutional Block Attention Module (CBAM) in the lightweight YOLOv5s model, CBAM slowed inference, increased model complexity, and reduced average precision, although its Squeeze-and-Excitation (SE) component had a positive impact on the model. To replace the spatial attention module in CBAM and offer a suitable scheme of channel and spatial attention modules, this paper proposes a Spatio-temporal Sharpening Attention Mechanism (SSAM), which sequentially infers intermediate maps along a channel attention module and a Sharpening Spatial Attention (SSA) module. By introducing a sharpening filter into the spatial attention module, we obtain an SSA module with low complexity. To find the best way to combine our SSA module with the SE module or the Efficient Channel Attention (ECA) module in models such as YOLOv5s and YOLOv3-tiny, we perform various replacement experiments; the best scheme embeds channel attention modules in the backbone and neck of the model and integrates SSAM into the YOLO head. We verify the positive effect of our SSAM on two general object detection datasets, VOC2012 and MS COCO2017: one for obtaining a suitable scheme and the other for proving the versatility of our method in complex scenes. Experimental results on the two datasets show clear gains in average precision and detection performance, which demonstrates the usefulness of our SSAM in lightweight YOLO models. Furthermore, visualization results show the advantage of enhanced positioning ability with our SSAM.
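The "sharpening filter in the spatial attention module" idea can be sketched as: pool the feature map across channels, convolve the pooled map with a classic 3x3 sharpening kernel to accentuate edges and small structures, then use a sigmoid of the result as the spatial gate. The specific kernel and the channel-mean pooling are illustrative assumptions, not the published SSA design:

```python
import numpy as np

# Classic 3x3 sharpening kernel (center-heavy Laplacian variant)
SHARPEN = np.array([[0.0, -1.0, 0.0],
                    [-1.0, 5.0, -1.0],
                    [0.0, -1.0, 0.0]])

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_same(x, k):
    """Naive 3x3 'same' convolution with zero padding (kernel is symmetric,
    so correlation and convolution coincide here)."""
    H, W = x.shape
    pad = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = (pad[i:i + 3, j:j + 3] * k).sum()
    return out

def sharpening_spatial_attention(feats):
    """feats: (C, H, W). Channel-pool, sharpen, and gate each location."""
    pooled = feats.mean(axis=0)                 # (H, W) spatial summary
    att = sigmoid(conv2d_same(pooled, SHARPEN)) # sharpened gate in (0, 1)
    return feats * att[None], att
```

Because the sharpening filter is a fixed small kernel, the module adds essentially no parameters, which matches the low-complexity goal stated for SSA.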

