MwoA auxiliary diagnosis via RSN-based 3D deep multiple instance learning with spatial attention mechanism

Author(s):  
Xiang Li ◽  
Benzheng Wei ◽  
Tianyang Li ◽  
Na Zhang
2021 ◽  
Author(s):  
Zongbao Liang ◽  
Xing Liu ◽  
Bo Chen ◽  
YunFei Yuan ◽  
Yang Song ◽  
...  

2020 ◽  
Vol 10 (12) ◽  
pp. 4312 ◽  
Author(s):  
Jie Xu ◽  
Haoliang Wei ◽  
Linke Li ◽  
Qiuru Fu ◽  
Jinhong Guo

Video description plays an important role in the field of intelligent imaging technology. Attention mechanisms are extensively applied in deep-learning-based video description models. Most existing models use a temporal-spatial attention mechanism to enhance accuracy: temporal attention captures the global features of a video, whereas spatial attention captures local features. Nevertheless, because each channel of the convolutional neural network (CNN) feature maps carries certain spatial semantic information, it is insufficient to merely divide the CNN features into regions and then apply a spatial attention mechanism. In this paper, we propose a temporal-spatial and channel attention mechanism that enables the model to exploit various video features and ensures consistency between the visual features and the sentence descriptions, thereby enhancing model performance. Meanwhile, to demonstrate the effectiveness of the attention mechanism, this paper proposes a video visualization model based on the video description. Experimental results show that our model achieves good performance on the Microsoft Video Description (MSVD) dataset and a measurable improvement on the Microsoft Research Video to Text (MSR-VTT) dataset.
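As a rough sketch of the idea rather than the authors' implementation, the snippet below fuses per-frame CNN features with temporal, spatial, and channel attention in NumPy. The scoring vectors `w_t`, `w_s`, `w_c` and the way the three weight sets are combined are assumptions for illustration only:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def tsc_attention(feats, w_t, w_s, w_c):
    """Fuse video features with temporal, spatial, and channel attention.

    feats: (T, C, H, W) per-frame CNN feature maps
    w_t, w_s, w_c: (C,) scoring vectors (stand-ins for learned parameters)
    Returns a (C,) context vector for the description decoder.
    """
    T, C, H, W = feats.shape
    frame_desc = feats.mean(axis=(2, 3))                     # (T, C) global frame features
    alpha_t = softmax(frame_desc @ w_t)                      # (T,) temporal weights
    regions = feats.reshape(T, C, H * W).transpose(0, 2, 1)  # (T, HW, C) local regions
    alpha_s = softmax(regions @ w_s, axis=1)                 # (T, HW) spatial weights
    beta_c = softmax(feats.mean(axis=(0, 2, 3)) * w_c)       # (C,) channel weights
    spatial_ctx = (alpha_s[..., None] * regions).sum(axis=1) # (T, C) spatially attended
    ctx = (alpha_t[:, None] * spatial_ctx).sum(axis=0)       # (C,) temporally attended
    return ctx * beta_c                                      # channel-modulated context
```

The channel weighting here acts on the already-pooled context vector; in a full model it would typically modulate the feature maps before pooling.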


2020 ◽  
Vol 34 (04) ◽  
pp. 5742-5749
Author(s):  
Xiaoshuang Shi ◽  
Fuyong Xing ◽  
Yuanpu Xie ◽  
Zizhao Zhang ◽  
Lei Cui ◽  
...  

Although attention mechanisms have been widely used in deep learning for many tasks, they are rarely utilized to solve multiple instance learning (MIL) problems, where only a general category label is given for the multiple instances contained in one bag. Additionally, previous deep MIL methods first utilize the attention mechanism to learn instance weights and then employ a fully connected layer to predict the bag label, so that the bag prediction is largely determined by the effectiveness of the learned instance weights. To alleviate this issue, in this paper we propose a novel loss-based attention mechanism, which simultaneously learns instance weights, instance predictions, and bag predictions for deep multiple instance learning. Specifically, it calculates instance weights based on the loss function, e.g. softmax+cross-entropy, and shares the parameters with the fully connected layer that produces both instance and bag predictions. Additionally, a regularization term consisting of learned weights and cross-entropy functions is utilized to boost instance recall, and a consistency cost is used to smooth the training process of the neural networks and improve generalization performance. Extensive experiments on multiple types of benchmark databases demonstrate that the proposed attention mechanism is a general, effective, and efficient framework, which achieves superior bag and image classification performance over other state-of-the-art MIL methods while obtaining higher instance precision and recall than previous attention mechanisms. Source code is available at https://github.com/xsshi2015/Loss-Attention.
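One plausible reading of the core idea can be sketched as follows: a single fully connected layer produces instance predictions, and the instance weights are derived from each instance's loss with respect to the bag label, so low-loss instances dominate the bag prediction. The exact weight formula is an assumption; only the parameter sharing between weights and predictions follows the abstract:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def loss_attention_bag(X, W, b, bag_label):
    """Loss-based attention for MIL (simplified illustrative forward pass).

    X: (N, D) instances in one bag; W: (D, K), b: (K,) shared FC layer that
    yields BOTH instance predictions and the scores behind instance weights.
    """
    probs = softmax(X @ W + b, axis=1)            # (N, K) instance predictions
    ce = -np.log(probs[:, bag_label] + 1e-12)     # per-instance cross-entropy loss
    weights = softmax(-ce)                        # low loss -> high attention weight
    bag_probs = weights @ probs                   # (K,) weighted bag prediction
    return weights, probs, bag_probs
```

Because `weights` and `probs` come from the same parameters, training the bag prediction also shapes the instance-level outputs, which is the coupling the paper exploits.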


Author(s):  
Qiuxia Lai ◽  
Yu Li ◽  
Ailing Zeng ◽  
Minhao Liu ◽  
Hanqiu Sun ◽  
...  

The selective visual attention mechanism in the human visual system (HVS) restricts the amount of information to reach visual awareness for perceiving natural scenes, allowing near real-time information processing with limited computational capacity. This kind of selectivity acts as an ‘Information Bottleneck (IB)’, which seeks a trade-off between information compression and predictive accuracy. However, such information constraints are rarely explored in the attention mechanism for deep neural networks (DNNs). In this paper, we propose an IB-inspired spatial attention module for DNN structures built for visual recognition. The module takes as input an intermediate representation of the input image, and outputs a variational 2D attention map that minimizes the mutual information (MI) between the attention-modulated representation and the input, while maximizing the MI between the attention-modulated representation and the task label. To further restrict the information bypassed by the attention map, we quantize the continuous attention scores to a set of learnable anchor values during training. Extensive experiments show that the proposed IB-inspired spatial attention mechanism can yield attention maps that neatly highlight the regions of interest while suppressing backgrounds, and bootstrap standard DNN structures for visual recognition tasks (e.g., image classification, fine-grained recognition, cross-domain classification). The attention maps are interpretable for the decision making of the DNNs as verified in the experiments. Our code is available at this https URL.
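The quantization step described above (snapping continuous attention scores to a small set of learnable anchors) can be sketched in a few lines. This shows only the forward pass; during training the paper's approach would need a gradient estimator (e.g. straight-through), and the anchor values here are placeholders, not the learned ones:

```python
import numpy as np

def quantize_attention(att, anchors):
    """Snap each continuous attention score to its nearest anchor value.

    att: (H, W) attention map in [0, 1]; anchors: (K,) sorted anchor values.
    Restricting scores to K levels caps the information the map can pass on.
    """
    idx = np.abs(att[..., None] - anchors).argmin(axis=-1)  # nearest-anchor index
    return anchors[idx]                                     # quantized map
```

With K anchors, each spatial location can transmit at most log2(K) bits through the attention map, which is how the quantization tightens the information bottleneck.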


2021 ◽  
Vol 8 ◽  
Author(s):  
Weiqing Song ◽  
Pu Huang ◽  
Jing Wang ◽  
Yajuan Shen ◽  
Jian Zhang ◽  
...  

Clinically, red blood cell abnormalities are closely related to tumor diseases, red blood cell diseases, internal medicine, and other conditions. Red blood cell classification is the key to detecting red blood cell abnormalities. Traditionally, red blood cell classification is done manually by doctors, which requires a lot of manpower and produces subjective results. This paper proposes an Attention-based Residual Feature Pyramid Network (ARFPN) to classify 14 types of red blood cells to assist the diagnosis of related diseases. The model performs classification directly on the entire red blood cell image. Meanwhile, a spatial attention mechanism and a channel attention mechanism are combined with residual units to improve the expression of category-related features and achieve accurate feature extraction. In addition, the RoI Align method is used to reduce the loss of spatial symmetry and improve classification accuracy. Five hundred and eighty-eight red blood cell images are used to train and verify the effectiveness of the proposed method. The Channel Attention Residual Feature Pyramid Network (C-ARFPN) model achieves an mAP of 86%, and the Channel and Spatial Attention Residual Feature Pyramid Network (CS-ARFPN) model achieves an mAP of 86.9%. The experimental results indicate that our method can classify more red blood cell types and better adapt to the needs of doctors, thus reducing doctors' workload and improving diagnostic efficiency.
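The combination of channel attention, spatial attention, and a residual unit can be illustrated with a minimal NumPy sketch. The MLP shapes for the channel gate and the avg+max pooling for the spatial gate are simplified stand-ins (real implementations typically use a small convolution for the spatial branch), not the ARFPN design itself:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """x: (C, H, W). Global average pool -> tiny MLP -> per-channel gate."""
    z = x.mean(axis=(1, 2))                       # (C,) channel descriptor
    gate = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))  # (C,) gate in (0, 1)
    return x * gate[:, None, None]

def spatial_attention(x):
    """x: (C, H, W). Channel-pooled statistics gate each spatial location."""
    gate = sigmoid(x.mean(axis=0) + x.max(axis=0))  # (H, W)
    return x * gate[None]

def attention_residual_unit(x, w1, w2):
    """Channel then spatial attention on the residual branch, plus identity."""
    return x + spatial_attention(channel_attention(x, w1, w2))
```

The identity shortcut lets the block fall back to the unattended features, so adding attention cannot erase information the rest of the pyramid depends on.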


Author(s):  
Bei Bei Fan ◽  
He Yang

Current traffic sign detection technology is disturbed by factors such as illumination changes, weather, and camera angle, which makes reliable detection difficult. Traffic sign datasets usually contain a large number of small objects, and the scale variance of these objects is a major challenge for traffic sign detection. In response to these problems, a multi-scale traffic sign detection algorithm based on an attention mechanism is proposed. The attention mechanism is composed of a channel attention mechanism and a spatial attention mechanism. The channel attention mechanism filters out redundant background information, making the network's features more discriminative and improving its ability to recognize traffic signs. The spatial attention mechanism directs the model toward object regions in the image while suppressing non-object and background areas. The model in this paper is validated on the Tsinghua-Tencent 100K dataset and achieves higher accuracy than state-of-the-art traffic sign detection approaches.
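A toy demonstration of the background-suppression effect described above: a spatial gate driven by channel-pooled activations keeps strongly activated (object-like) regions while damping weak (background) regions. The gating function here is a generic sigmoid sketch, not the paper's actual module:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention_gate(feats):
    """feats: (C, H, W). Channel-mean activations drive a per-location gate:
    high-activation (object) locations pass through nearly unchanged, while
    low-activation (background) locations are attenuated."""
    gate = sigmoid(feats.mean(axis=0))  # (H, W) in (0, 1)
    return feats * gate[None], gate
```

On a feature map where only a small "sign" patch is active, the gate at the patch is close to 1 while the featureless background sits at the sigmoid midpoint, illustrating the intended suppression.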


Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 7949
Author(s):  
Mengfan Xue ◽  
Minghao Chen ◽  
Dongliang Peng ◽  
Yunfei Guo ◽  
Huajie Chen

Attention mechanisms have demonstrated great potential in improving the performance of deep convolutional neural networks (CNNs). However, many existing methods are dedicated to developing channel or spatial attention modules with large numbers of parameters, and such complex attention modules inevitably affect CNN performance. In our experiments embedding the Convolutional Block Attention Module (CBAM) in the lightweight YOLOv5s model, CBAM slowed inference, increased model complexity, and reduced average precision, although its Squeeze-and-Excitation (SE) component had a positive impact on the model. To replace the spatial attention module in CBAM and offer a suitable scheme of channel and spatial attention modules, this paper proposes a Spatio-temporal Sharpening Attention Mechanism (SSAM), which sequentially infers intermediate maps along a channel attention module and a Sharpening Spatial Attention (SSA) module. By introducing a sharpening filter into the spatial attention module, we obtain an SSA module with low complexity. To find the best way to combine our SSA module with the SE module or the Efficient Channel Attention (ECA) module in models such as YOLOv5s and YOLOv3-tiny, we perform various replacement experiments; the best scheme embeds channel attention modules in the backbone and neck of the model and integrates SSAM into the YOLO head. We verify the positive effect of our SSAM on two general object detection datasets, VOC2012 and MS COCO2017: one for obtaining a suitable scheme and the other for proving the versatility of our method in complex scenes. Experimental results on the two datasets show clear gains in average precision and detection performance, which demonstrates the usefulness of our SSAM in lightweight YOLO models. Furthermore, visualization results show the advantage of enhanced positioning ability with our SSAM.
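The "sharpening filter in the spatial attention module" idea can be sketched as: pool the feature map across channels, convolve the pooled map with a classic 3x3 sharpening kernel to accentuate edges and small structures, then use a sigmoid of the result as the spatial gate. The specific kernel and the channel-mean pooling are illustrative assumptions, not the published SSA design:

```python
import numpy as np

# Classic 3x3 sharpening kernel (center-heavy Laplacian variant)
SHARPEN = np.array([[0.0, -1.0, 0.0],
                    [-1.0, 5.0, -1.0],
                    [0.0, -1.0, 0.0]])

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_same(x, k):
    """Naive 3x3 'same' convolution with zero padding (kernel is symmetric,
    so correlation and convolution coincide here)."""
    H, W = x.shape
    pad = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = (pad[i:i + 3, j:j + 3] * k).sum()
    return out

def sharpening_spatial_attention(feats):
    """feats: (C, H, W). Channel-pool, sharpen, and gate each location."""
    pooled = feats.mean(axis=0)                 # (H, W) spatial summary
    att = sigmoid(conv2d_same(pooled, SHARPEN)) # sharpened gate in (0, 1)
    return feats * att[None], att
```

Because the sharpening filter is a fixed small kernel, the module adds essentially no parameters, which matches the low-complexity goal stated for SSA.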

