feature map
Recently Published Documents


Total documents: 692 (five years: 230)

H-index: 30 (five years: 6)

Author(s):  
Zhenhua Huang ◽  
Shunzhi Yang ◽  
Meng Chu Zhou ◽  
Zhetao Li ◽  
Zheng Gong ◽  
...  

2022 ◽  
Author(s):  
Jian hua Yang ◽  
Ke Wang ◽  
Ruifeng LI ◽  
Petra Perner
Keyword(s):  

Sensors ◽  
2021 ◽  
Vol 22 (1) ◽  
pp. 98
Author(s):  
Muksimova Shakhnoza ◽  
Umirzakova Sabina ◽  
Mardieva Sevara ◽  
Young-Im Cho

A fire is an extraordinary event that can damage property and significantly affect people's lives. However, many recent studies have identified the early detection of smoke and fire as a challenge. Consequently, various solutions have been proposed for the timely detection of fire events to avoid human casualties. As one such solution, we used an affordable visual detection system. This approach is promising because the importance of early fire detection is widely recognized. In most developed countries, CCTV surveillance systems are installed in almost every public location to take periodic images of a specific area. However, these cameras operate under different ambient lighting conditions, and their images are affected by occlusions, view distortions, varying camera angles, and seasonal changes, all of which reduce the accuracy of currently established models. To address these problems, we developed an approach based on an attention feature map used in a capsule network designed to classify fire and smoke locations at different distances outdoors, given only a single image of fire and smoke as input. The proposed model was designed to overcome two main limitations of the base capsule network, concerning its input and its analysis of large images, and to compensate for the absence of a deep network by using an attention-based approach to improve fire and smoke classification. In terms of practicality, our method is comparable with prior strategies based on machine learning and deep learning. We trained and tested the proposed model on our own datasets collected from different sources. The results show that the model achieved high classification accuracy compared with other modern architectures. Furthermore, they indicate that the proposed approach is robust and stable when classifying smoke and fire in images from outdoor CCTV cameras with different viewpoints.
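The abstract does not say how the attention feature map feeds the capsule layer. The minimal Python sketch below is written under that uncertainty. It assumes a softmax attention over spatial locations that weights primary-capsule vectors, followed by the standard capsule "squash" nonlinearity from the original capsule-network formulation. The names `squash` and `attend_and_squash` are hypothetical and are not the authors' code.

```python
import math

def squash(s):
    """Standard capsule 'squash': keeps the vector's direction and maps its
    length ||s|| to ||s||^2 / (1 + ||s||^2), which lies in [0, 1)."""
    norm_sq = sum(c * c for c in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * c for c in s]

def attend_and_squash(capsules, attention_logits):
    """Hypothetical attention step: softmax the per-location attention logits,
    use them to weight the primary-capsule vectors at those locations, sum the
    weighted vectors, and squash the result into one output capsule."""
    m = max(attention_logits)                      # subtract max for stability
    exps = [math.exp(a - m) for a in attention_logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(capsules[0])
    pooled = [sum(w * cap[d] for w, cap in zip(weights, capsules))
              for d in range(dim)]
    return squash(pooled)

# Two locations with equal attention: the pooled vector is [1.5, 2.0]
# (length 2.5), so the output capsule has length 6.25 / 7.25.
v = attend_and_squash([[3.0, 4.0], [0.0, 0.0]], [0.0, 0.0])
print(math.hypot(*v))
```

Because the squashed length stays below 1, it can be read as a class-presence score (for example, fire or smoke). The attention weights decide which image regions contribute to that score.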


2021 ◽  
Vol 2132 (1) ◽  
pp. 012010
Author(s):  
Guorong Xie ◽  
Rongqi Jiang ◽  
Yi Qu

To alleviate the occlusion problem in single-object tracking, this paper proposes ECO-MHDU, an object tracking algorithm based on the ECO tracker with stronger anti-occlusion performance. First, the algorithm replaces the ResNet network in ECO with a MobileNetV3 lightweight backbone pre-trained on ImageNet. This increases speed while still extracting shallow and deep image features, and the attention mechanism in MobileNetV3 strengthens the algorithm's ability to extract target features. Second, the algorithm applies the DropBlock operation to the acquired feature map, generating a random contiguous mask on the feature map channels to improve the learning of globally robust spatial structure information. Finally, a confidence-based update strategy is introduced into the GMM sample generation space. To improve the quality of training samples, confidence and occlusion checks are designed to identify unreliable tracking states, so that the sample space is not updated with corrupted information. On the occlusion attribute of the OTB100 dataset, ECO-MHDU achieves a success rate of 68.0%, which is 2.3% higher than that of ECO. ECO-MHDU also performs best on the full dataset, with a success rate of 69.3%.
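DropBlock, which the authors apply to the backbone feature map, drops contiguous square regions rather than independent units. The sketch below is a minimal single-channel version in plain Python. It follows the seed-rate formula and the count/count_ones rescaling from the DropBlock paper (Ghiasi et al., 2018). The function name, the defaults, and the use of a seeded `random.Random` are assumptions, not details from this abstract.

```python
import random

def dropblock(feature, drop_prob=0.1, block_size=3, rng=None):
    """DropBlock on one H x W channel (list of lists of floats).

    Contiguous block_size x block_size regions are zeroed, then surviving
    values are rescaled by count / count_ones so the total sum is preserved."""
    if rng is None:
        rng = random.Random(0)
    h, w = len(feature), len(feature[0])
    if drop_prob == 0.0 or block_size > min(h, w):
        return [row[:] for row in feature]
    # Seed rate gamma, chosen so that roughly drop_prob of the units get dropped
    valid = (h - block_size + 1) * (w - block_size + 1)
    gamma = drop_prob / (block_size ** 2) * (h * w) / valid
    mask = [[1] * w for _ in range(h)]
    half = block_size // 2
    # Only sample block centres whose whole block fits inside the map
    for i in range(half, h - (block_size - 1 - half)):
        for j in range(half, w - (block_size - 1 - half)):
            if rng.random() < gamma:
                for di in range(-half, block_size - half):
                    for dj in range(-half, block_size - half):
                        mask[i + di][j + dj] = 0
    kept = sum(map(sum, mask))
    if kept == 0:
        return [[0.0] * w for _ in range(h)]
    scale = (h * w) / kept
    return [[feature[i][j] * mask[i][j] * scale for j in range(w)]
            for i in range(h)]

ones = [[1.0] * 8 for _ in range(8)]
out = dropblock(ones, drop_prob=0.3, block_size=3, rng=random.Random(42))
```

Dropping whole blocks removes spatially correlated evidence, which forces the network to rely on the wider spatial structure of the target. That effect is plausibly why it suits occlusion handling.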


2021 ◽  
Vol E104.D (12) ◽  
pp. 2040-2047
Author(s):  
Akira JINGUJI ◽  
Shimpei SATO ◽  
Hiroki NAKAHARA
Keyword(s):  
Low Cost ◽  

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Min Huang ◽  
Cong Cheng ◽  
Gennaro De Luca

Remote sensing images are often of low quality due to equipment limitations, which results in poor image accuracy and makes it extremely difficult to identify target objects that are blurred or small. The main challenge is that objects in remote sensing images occupy very few pixels. Traditional convolutional networks struggle to extract enough information through local convolution and are easily disturbed by noise points, so they are usually not ideal for classifying and recognizing small targets. The current solution is to process feature map information at multiple scales, but this approach does not consider how the context information of the feature map supplements its semantics. In this work, to enable CNNs to make full use of context information and improve their representation ability, we propose a residual attention function fusion method. This method improves the representation ability of feature maps by fusing contextual feature map information at different scales. We then propose a spatial attention mechanism based on a global pixel-wise convolution response. This mechanism compresses global pixels through convolution, uses the result to weight the original feature map pixels, reduces noise interference, and improves the network's ability to capture globally critical pixel information. In ship recognition experiments on remote sensing image datasets, the proposed network structure improves small-target detection performance. Results on CIFAR-10 and CIFAR-100 show that the attention mechanism is general and practical.
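The abstract describes a spatial attention mechanism that compresses pixel information through convolution and uses the result to weight the original feature map. The minimal sketch below makes two assumptions. First, the compression is a 1x1 convolution across channels followed by a sigmoid. Second, the reweighting has the residual form x * (1 + a), a common residual-attention choice that the abstract does not confirm. All names and weights are hypothetical.

```python
import math

def _sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def spatial_attention(channels, weights, bias=0.0):
    """channels: list of C feature maps, each H x W (lists of floats).
    A 1x1 convolution (one weight per channel plus a bias) compresses the C
    values at every pixel into a single response, and a sigmoid turns that
    response into a per-pixel attention weight in [0, 1]."""
    h, w = len(channels[0]), len(channels[0][0])
    return [[_sigmoid(bias + sum(wc * ch[i][j] for wc, ch in zip(weights, channels)))
             for j in range(w)] for i in range(h)]

def residual_attention(channels, weights, bias=0.0):
    """Reweight every channel pixel-wise in residual form, x * (1 + a), so
    that low attention never completely erases the original features."""
    att = spatial_attention(channels, weights, bias)
    return [[[ch[i][j] * (1.0 + att[i][j]) for j in range(len(ch[0]))]
             for i in range(len(ch))] for ch in channels]

# Two 2 x 2 channels; with zero weights every attention value is sigmoid(0) = 0.5,
# so every feature is scaled by 1.5.
x = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [0.0, 1.0]]]
out = residual_attention(x, [0.0, 0.0])
```

With learned weights, pixels whose cross-channel response is strong are amplified more than isolated noisy pixels. This matches the intuition stated in the abstract.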


Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 7842
Author(s):  
Linlu Zu ◽  
Yanping Zhao ◽  
Jiuqin Liu ◽  
Fei Su ◽  
Yan Zhang ◽  
...  

Mature green tomatoes have a color similar to that of branches and leaves, and some are shaded by branches and leaves or overlapped by other tomatoes, so accurately detecting and locating them is difficult. This paper proposes using the Mask R-CNN algorithm to detect and segment mature green tomatoes. A mobile robot was designed to collect images round the clock under different conditions throughout the greenhouse, so that the captured dataset is not limited to objects preselected by users. After training, ResNet50-FPN was selected as the backbone network. The feature map is then passed through the region proposal network to generate regions of interest (ROIs). ROIAlign bilinear interpolation is used to compute the target region, so that the corresponding region of the feature map is pooled to a fixed size based on the position coordinates of the proposal box. Finally, mature green tomatoes are detected and segmented by parallel branches for ROI classification, bounding box regression, and mask prediction. The trained model performs best at an Intersection over Union (IoU) threshold of 0.5. The experimental results show that the F1-scores of both the bounding box and the mask region reach 92.0%. Although image acquisition was carried out without user observation or preselection and yielded a highly heterogeneous mix of images, the selected Mask R-CNN algorithm still detected mature green tomatoes accurately. The performance of the proposed model was also evaluated in a real greenhouse harvesting environment, which facilitates its direct application in a tomato harvesting robot.
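ROIAlign avoids the coordinate rounding of ROI pooling. It samples the feature map at fractional positions with bilinear interpolation and averages the samples in each output bin. The plain-Python sketch below illustrates that idea for a single channel under three assumptions: boxes are already expressed in feature-map coordinates, each bin uses a fixed number of sample points, and integer indices sit at pixel centres. Library implementations (for example `torchvision.ops.roi_align`) add details such as spatial scaling, an optional half-pixel offset, and adaptive sampling.

```python
import math

def bilinear(fm, y, x):
    """Bilinearly interpolate a 2D map at continuous (y, x); coordinates
    outside the map are clamped to the border."""
    h, w = len(fm), len(fm[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = fm[y0][x0] * (1 - dx) + fm[y0][x1] * dx
    bottom = fm[y1][x0] * (1 - dx) + fm[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align(fm, box, out_h=2, out_w=2, samples=2):
    """box = (y1, x1, y2, x2) as floats (no rounding, unlike ROI pooling).
    Each of the out_h x out_w bins averages samples x samples bilinearly
    interpolated points spread evenly inside the bin."""
    by1, bx1, by2, bx2 = box
    bin_h = (by2 - by1) / out_h
    bin_w = (bx2 - bx1) / out_w
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for si in range(samples):
                for sj in range(samples):
                    y = by1 + (i + (si + 0.5) / samples) * bin_h
                    x = bx1 + (j + (sj + 0.5) / samples) * bin_w
                    acc += bilinear(fm, y, x)
            row.append(acc / (samples * samples))
        out.append(row)
    return out

# On a map whose value equals the column index, each bin returns the x
# coordinate of its centre: bins centred at x = 0.75 and x = 2.25.
fm = [[float(x) for x in range(4)] for _ in range(4)]
pooled = roi_align(fm, (0.0, 0.0, 3.0, 3.0))
```

Every proposal, whatever its size, is mapped to the same fixed grid (2 x 2 here). This fixed grid is what allows the classification, box-regression, and mask branches to share one head.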


2021 ◽  
Author(s):  
◽  
Ibrahim Mohammad Hussain Rahman

The human visual attention (HVA) system encompasses a set of interconnected neurological modules that analyze visual stimuli by attending to salient regions. Two contrasting biological mechanisms exist in the HVA system: bottom-up, data-driven attention and top-down, task-driven attention. The former is mostly responsible for low-level instinctive behaviors, while the latter is responsible for performing complex visual tasks such as target object detection.

Very few computational models have been proposed for top-down attention, mainly for three reasons. First, the functionality of the top-down process involves many influential factors. Second, top-down responses vary from task to task. Finally, many biological aspects of the top-down process are not yet well understood.

For these reasons, it is difficult to devise a generalized top-down model that applies to all high-level visual tasks. Instead, this thesis addresses some outstanding issues in modelling top-down attention for one particular task, target object detection. Target object detection is an essential step in analyzing images to perform further complex visual tasks. It has not been investigated thoroughly in the modelling of top-down saliency and hence constitutes the main application domain of this thesis.

The thesis investigates methods to model top-down attention through various high-level data acquired from images. Furthermore, it investigates different strategies to dynamically combine bottom-up and top-down processes to improve both the detection accuracy and the computational efficiency of existing and new visual attention models. The following techniques and approaches are proposed to address the outstanding issues in modelling top-down saliency:

1. A top-down saliency model that weights low-level attentional features through contextual knowledge of a scene. The proposed model assigns weights to the features of a novel image by extracting a contextual descriptor of the image. The contextual descriptor tunes the weighting of low-level features to maximize detection accuracy. Incorporating context into the feature weighting mechanism improves the quality of the weights assigned to these features.
2. Two modules of target features combined with contextual weighting to improve the detection accuracy of the target object. In this model, two sets of attentional feature weights are learned, one through context and the other through target features. When both sources of knowledge are used to model top-down attention, detection accuracy increases drastically in images with complex backgrounds and a variety of target objects.
3. A top-down and bottom-up attention combination model based on feature interaction. This model combines both processes dynamically by formulating the problem as feature selection. The feature selection exploits the interaction between these features and yields a robust set of features that maximizes both the detection accuracy and the overall efficiency of the system.
4. A feature map quality score estimation model that accurately predicts the detection accuracy score of any previously unseen feature map without the need for ground-truth data. The model extracts various local, global, geometrical, and statistical characteristics from a feature map. These characteristics guide a regression model in estimating the quality of a novel map.
5. A dynamic feature integration framework for combining bottom-up and top-down saliencies at runtime. If the estimation model can accurately predict the quality score of any novel feature map, then feature maps can be integrated dynamically based on the estimated value. We propose two frameworks for feature map integration using the estimation model. The proposed integration framework achieves higher human fixation prediction accuracy with a minimal number of feature maps than is achieved by combining all feature maps.

The proposed works in this thesis provide new directions in modelling top-down saliency for target object detection. In addition, the dynamic approaches to combining top-down and bottom-up attention show considerable improvements over existing approaches in both efficiency and accuracy.
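Contribution 5 integrates feature maps dynamically from their estimated quality scores rather than combining all of them. The minimal sketch below rests on three assumptions. The quality scores have already been produced by some regressor, so here they are simply given numbers. The k highest-scoring maps are kept. The kept maps are fused as a score-weighted average of min-max-normalised maps. The abstract does not give the exact rules of the thesis's two frameworks, so this is an illustration of the idea, not either of them, and all names are hypothetical.

```python
def normalise(fm):
    """Min-max normalise a 2D map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in fm for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0] * len(row) for row in fm]
    return [[(v - lo) / (hi - lo) for v in row] for row in fm]

def integrate(maps, quality, k=2):
    """Fuse the k feature maps with the highest *estimated* quality.

    quality[i] stands in for the score a learned regressor would predict for
    maps[i] without ground truth (hypothetical here). The selected maps are
    normalised and averaged with weights proportional to their scores."""
    chosen = sorted(range(len(maps)), key=lambda i: quality[i], reverse=True)[:k]
    total = sum(quality[i] for i in chosen)
    h, w = len(maps[0]), len(maps[0][0])
    fused = [[0.0] * w for _ in range(h)]
    for i in chosen:
        # Fall back to uniform weights if all selected scores are zero
        wgt = quality[i] / total if total > 0 else 1.0 / len(chosen)
        norm = normalise(maps[i])
        for r in range(h):
            for c in range(w):
                fused[r][c] += wgt * norm[r][c]
    return fused, chosen

a = [[0.0, 1.0], [0.0, 1.0]]   # e.g. a bottom-up map
b = [[0.0, 0.0], [1.0, 1.0]]   # e.g. a low-quality top-down map
c = [[5.0, 5.0], [5.0, 5.0]]   # a flat, uninformative map
fused, chosen = integrate([a, b, c], [0.9, 0.1, 0.5], k=2)
```

Only the selected maps are normalised and summed, so a good quality estimate can save computation as well as filter out misleading maps.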

