Learning Attention for Object Tracking with Adversarial Learning Network

2020 ◽  
Vol 2020 (1) ◽  
Author(s):  
Xu Cheng ◽  
Chen Song ◽  
Yongxiang Gu ◽  
Beijing Chen ◽  
Lin Zhou ◽  
...  

Abstract Artificial intelligence has been widely studied for solving intelligent surveillance analysis and security problems in recent years. Although many multimedia security approaches based on deep learning models have been proposed, their performance still presents challenges that deserve in-depth research. On the one hand, the high computational complexity of current deep learning methods makes them hard to apply in real-time scenarios. On the other hand, it is difficult to obtain video-specific features by fine-tuning the network online with only the object state of the first frame, which fails to capture the rich appearance variations of the object. To solve these two issues, this paper proposes an effective object tracking method with learned attention that achieves object localization and reduces training time within an adversarial learning framework. First, a prediction network is designed to track the object in video sequences. The object positions of the first ten frames are employed to fine-tune the prediction network, fully mining the specific features of the object. Second, the prediction network is integrated into a generative adversarial network framework that randomly generates masks to capture object appearance variations by adaptively dropping out input features. Third, we present a spatial attention mechanism to improve the tracking performance. The proposed network can identify the mask that maintains the most robust features of the object over a long temporal span. Extensive experiments on two large-scale benchmarks demonstrate that the proposed algorithm performs favorably against state-of-the-art methods.
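The abstract's two core mechanisms, random feature-dropout masks and spatial attention, can be illustrated with a short PyTorch sketch. This is a minimal, assumed implementation for illustration only, not the authors' code; the module names, drop probability, and feature sizes are invented.

```python
# Hypothetical sketch: random feature-dropout masks and a simple spatial attention
# block, illustrating the mechanisms the abstract describes (not the paper's code).
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Weights each spatial location of a feature map by a learned attention score."""
    def __init__(self, in_channels):
        super().__init__()
        self.score = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, feats):                      # feats: (B, C, H, W)
        attn = torch.sigmoid(self.score(feats))    # (B, 1, H, W) scores in [0, 1]
        return feats * attn                        # re-weighted features

def random_dropout_mask(feats, drop_prob=0.3):
    """Randomly zeroes spatial positions to simulate appearance variations."""
    b, _, h, w = feats.shape
    keep = (torch.rand(b, 1, h, w, device=feats.device) > drop_prob).float()
    return feats * keep

if __name__ == "__main__":
    x = torch.randn(2, 256, 7, 7)                  # dummy backbone features
    x = random_dropout_mask(x)
    out = SpatialAttention(256)(x)
    print(out.shape)                               # torch.Size([2, 256, 7, 7])
```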


2020 ◽  
Vol 31 (6) ◽  
pp. 681-689
Author(s):  
Jalal Mirakhorli ◽  
Hamidreza Amindavar ◽  
Mojgan Mirakhorli

Abstract Functional magnetic resonance imaging, a neuroimaging technique used in studies of brain disorders and dysfunction, has been improved in recent years by mapping the topology of brain connections, an approach named connectopic mapping. Because healthy and unhealthy brain regions and functions differ only slightly, studying the complex topology of the functional and structural networks of the human brain is highly complicated, especially given the growing number of evaluation measures. One application of deep learning on irregular graphs is to analyze human cognitive functions related to gene expression and the associated distributed spatial patterns. Since the brain's neuronal networks can dynamically hold a variety of solutions with different activity patterns and functional connectivity, both node-centric and graph-centric tasks are involved in this application. In this study, we used an individual generative model and high-order graph analysis to recognize regions of interest in the brain with abnormal connections during task performance and the resting state, and to decompose irregular observations. Accordingly, a high-order Variational Graph Autoencoder framework with a Gaussian distribution was proposed to analyze functional data in brain imaging studies, in which a Generative Adversarial Network is employed to optimize the latent space while learning strong non-rigid graphs from large-scale data. Furthermore, the possible modes of correlation in abnormal brain connections were distinguished. Our goal was to find the degree of correlation between the affected regions and their simultaneous occurrence over time. This can be used to diagnose brain diseases or to demonstrate the nervous system's ability to modify brain topology and exhibit plasticity in response to input stimuli. In this study, we particularly focused on Alzheimer's disease.
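As a rough sketch of the kind of model the abstract names, the snippet below shows a generic variational graph autoencoder encoder with an inner-product decoder in PyTorch. It is an assumption about the general architecture, not the paper's high-order, GAN-optimized implementation; layer names and dimensions are hypothetical.

```python
# Minimal sketch (assumed, not the authors' implementation) of a variational graph
# autoencoder: two graph-convolution layers produce mean and log-variance, and
# connection probabilities are reconstructed from an inner product of latent vectors.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VGAEEncoder(nn.Module):
    def __init__(self, in_dim, hid_dim, lat_dim):
        super().__init__()
        self.w0 = nn.Linear(in_dim, hid_dim, bias=False)
        self.w_mu = nn.Linear(hid_dim, lat_dim, bias=False)
        self.w_logvar = nn.Linear(hid_dim, lat_dim, bias=False)

    def forward(self, x, adj_norm):                # adj_norm: normalized adjacency (N, N)
        h = F.relu(adj_norm @ self.w0(x))          # first graph convolution
        mu = adj_norm @ self.w_mu(h)               # mean of the latent Gaussian
        logvar = adj_norm @ self.w_logvar(h)       # log-variance of the latent Gaussian
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return z, mu, logvar

def decode(z):
    """Reconstruct connection probabilities from latent node embeddings."""
    return torch.sigmoid(z @ z.t())
```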


2020 ◽  
Author(s):  
Yuan Yuan ◽  
Lei Lin

Satellite image time series (SITS) classification is a major research topic in remote sensing and is relevant for a wide range of applications. Deep learning approaches have been commonly employed for SITS classification and have provided state-of-the-art performance. However, deep learning methods suffer from overfitting when labeled data is scarce. To address this problem, we propose a novel self-supervised pre-training scheme to initialize a Transformer-based network by utilizing large-scale unlabeled data. In detail, the model is asked to predict randomly contaminated observations given an entire time series of a pixel. The main idea of our proposal is to leverage the inherent temporal structure of satellite time series to learn general-purpose spectral-temporal representations related to land cover semantics. Once pre-training is completed, the pre-trained network can be further adapted to various SITS classification tasks by fine-tuning all the model parameters on small-scale task-related labeled data. In this way, the general knowledge and representations about SITS can be transferred to a label-scarce task, thereby improving the generalization performance of the model as well as reducing the risk of overfitting. Comprehensive experiments have been carried out on three benchmark datasets over large study areas. Experimental results demonstrate the effectiveness of the proposed method, leading to classification accuracy improvements of 1.91% to 6.69%. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
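The masked-prediction pretext task described above can be sketched in a few lines of Python. The corruption strategy here (additive Gaussian noise at randomly chosen time steps) and the function names are assumptions made for illustration; the paper's exact contamination scheme may differ.

```python
# Hedged sketch of the pretext task: randomly corrupt a fraction of the observations
# in one pixel's spectral time series; the network is trained to reconstruct the
# original values at the corrupted steps.
import numpy as np

def contaminate(series, corrupt_ratio=0.15, noise_scale=0.5, rng=None):
    """series: (T, bands) array holding one pixel's reflectance time series."""
    rng = rng or np.random.default_rng()
    t = series.shape[0]
    mask = rng.random(t) < corrupt_ratio           # which time steps to corrupt
    corrupted = series.copy()
    corrupted[mask] += rng.normal(0.0, noise_scale, size=(mask.sum(), series.shape[1]))
    return corrupted, mask                         # model predicts series[mask] from corrupted

# Usage idea: x_corrupt, m = contaminate(x); loss = mse(model(x_corrupt)[m], x[m])
```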


2020 ◽  
Author(s):  
Tuan Pham

Chest X-rays have been found to be very promising for assessing COVID-19 patients, especially for resolving emergency-department and urgent-care-center overcapacity. Deep-learning (DL) methods in artificial intelligence (AI) play a dominant role as high-performance classifiers in the detection of the disease using chest X-rays. While many new DL models are being developed for this purpose, this study aimed to investigate the fine-tuning of pretrained convolutional neural networks (CNNs) for the classification of COVID-19 using chest X-rays. Three pretrained CNNs, namely AlexNet, GoogleNet, and SqueezeNet, were selected and fine-tuned without data augmentation to carry out 2-class and 3-class classification tasks using 3 public chest X-ray databases. In comparison with other recently developed DL models, the 3 pretrained CNNs achieved very high classification results in terms of accuracy, sensitivity, specificity, precision, F1 score, and area under the receiver-operating-characteristic curve. AlexNet, GoogleNet, and SqueezeNet require the least training time among pretrained DL models, yet with a suitable selection of training parameters these networks can achieve excellent classification results without data augmentation. The findings contribute to the urgent need to contain the pandemic by facilitating the deployment of AI tools that are fully automated and readily available in the public domain for rapid implementation.
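A minimal sketch of this kind of fine-tuning setup, assuming the torchvision model zoo and ImageNet weights, is shown below: load each pretrained CNN and replace its final classification layer to match the number of COVID-19 classes. The class count and the choice of trainable parameters are assumptions, not the study's exact configuration.

```python
# Sketch (assumptions: torchvision models, ImageNet weights) of fine-tuning the
# three pretrained CNNs named in the abstract for a 3-class chest X-ray task.
import torch.nn as nn
from torchvision import models

num_classes = 3                                         # e.g. COVID-19 / pneumonia / normal

alexnet = models.alexnet(pretrained=True)
alexnet.classifier[6] = nn.Linear(4096, num_classes)    # swap the ImageNet head

googlenet = models.googlenet(pretrained=True)
googlenet.fc = nn.Linear(googlenet.fc.in_features, num_classes)

squeezenet = models.squeezenet1_0(pretrained=True)
squeezenet.classifier[1] = nn.Conv2d(512, num_classes, kernel_size=1)
squeezenet.num_classes = num_classes
# All parameters stay trainable, matching "fine-tuning" rather than feature extraction.
```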


Database ◽  
2019 ◽  
Vol 2019 ◽  
Author(s):  
Tao Chen ◽  
Mingfen Wu ◽  
Hexi Li

Abstract The automatic extraction of meaningful relations from biomedical literature or clinical records is crucial in various biomedical applications. Most of the current deep learning approaches for medical relation extraction require large-scale training data to prevent overfitting of the training model. We propose using a pre-trained model and a fine-tuning technique to improve these approaches without additional time-consuming human labeling. First, we present the architecture of Bidirectional Encoder Representations from Transformers (BERT), an approach for pre-training a model on large-scale unstructured text. We then combine BERT with a one-dimensional convolutional neural network (1d-CNN) to fine-tune the pre-trained model for relation extraction. Extensive experiments on three datasets, namely the BioCreative V chemical disease relation corpus, the traditional Chinese medicine literature corpus, and the i2b2 2012 temporal relation challenge corpus, show that the proposed approach achieves state-of-the-art results (giving relative improvements of 22.2%, 7.77%, and 38.5% in F1 score, respectively, compared with a traditional 1d-CNN classifier). The source code is available at https://github.com/chentao1999/MedicalRelationExtraction.
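A BERT-plus-1d-CNN relation classifier of the kind described can be sketched with the Hugging Face transformers library. This is an illustrative assumption about the architecture, not the code released at the link above; the checkpoint name, kernel size, and channel count are placeholders.

```python
# Illustrative sketch: BERT encodes the sentence, a 1-D convolution with max-pooling
# summarizes the token representations, and a linear layer predicts the relation label.
import torch
import torch.nn as nn
from transformers import BertModel

class BertCnnRelationClassifier(nn.Module):
    def __init__(self, num_relations, kernel_size=3, conv_channels=256):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.conv = nn.Conv1d(self.bert.config.hidden_size, conv_channels, kernel_size)
        self.classifier = nn.Linear(conv_channels, num_relations)

    def forward(self, input_ids, attention_mask):
        hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        feats = torch.relu(self.conv(hidden.transpose(1, 2)))   # (B, C, T')
        pooled = feats.max(dim=2).values                        # max-pool over time
        return self.classifier(pooled)                          # relation logits
```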


Author(s):  
C. Xiao ◽  
A. Yilmaz ◽  
S. Lia

Despite good recent performance, visual tracking is still an open area of research, especially when the target undergoes serious appearance changes that are not included in the model. In this paper, we therefore replace the appearance model with a concept model that is learned from large-scale datasets using a deep learning network. The concept model is a combination of high-level semantic information learned from myriads of objects with various appearances. In our tracking method, we generate the target's concept by combining the learned object concepts from the classification task. We also demonstrate that the last convolutional feature map can be used to generate a heat map highlighting the possible location of the given target in new frames. Finally, in the proposed tracking framework, we use the target image, the search image cropped from the new frame, and their heat maps as input to a localization network to find the final target position. Compared to other state-of-the-art trackers, the proposed method shows comparable and at times better performance in real time.
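The heat-map step can be illustrated with a small NumPy sketch in the spirit of class activation mapping; treating the weighting this way is an assumption about the mechanism rather than the authors' exact method, and the function name is hypothetical.

```python
# Sketch: weight the last convolutional feature map by the classifier weights of the
# target's concept (class) and sum over channels to highlight likely target locations.
import numpy as np

def concept_heat_map(feature_map, class_weights):
    """feature_map: (C, H, W) last conv activations; class_weights: (C,) for one class."""
    heat = np.tensordot(class_weights, feature_map, axes=1)   # weighted sum -> (H, W)
    heat = np.maximum(heat, 0)                                # keep positive evidence
    return heat / (heat.max() + 1e-8)                         # normalize to [0, 1]
```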


2020 ◽  
Author(s):  
Yuan Yuan ◽  
Lei Lin

Satellite image time series (SITS) classification is a major research topic in remote sensing and is relevant for a wide range of applications. Deep learning approaches have been commonly employed for SITS classification and have provided state-of-the-art performance. However, deep learning methods suffer from overfitting when labeled data is scarce. To address this problem, we propose a novel self-supervised pre-training scheme to initialize a Transformer-based network by utilizing large-scale unlabeled data. In detail, the model is asked to predict randomly contaminated observations given an entire time series of a pixel. The main idea of our proposal is to leverage the inherent temporal structure of satellite time series to learn general-purpose spectral-temporal representations related to land cover semantics. Once pre-training is completed, the pre-trained network can be further adapted to various SITS classification tasks by fine-tuning all the model parameters on small-scale task-related labeled data. In this way, the general knowledge and representations about SITS can be transferred to a label-scarce task, thereby improving the generalization performance of the model as well as reducing the risk of overfitting. Comprehensive experiments have been carried out on three benchmark datasets over large study areas. Experimental results demonstrate the effectiveness of the proposed method, leading to classification accuracy improvements of 2.38% to 5.27%. The code and the pre-trained model will be available at https://github.com/linlei1214/SITS-BERT upon publication. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.


2019 ◽  
Vol 2019 ◽  
pp. 1-9
Author(s):  
Han Zhang ◽  
Yuanbo Guo ◽  
Tao Li

In order to obtain high-quality, large-scale labelled data for information security research, we propose a new approach that combines a generative adversarial network with the BiLSTM-Attention-CRF model to obtain labelled data from crowd annotations. We use the generative adversarial network to find common features in the crowd annotations and then combine them with domain dictionary and sentence dependency features as additional inputs to the BiLSTM-Attention-CRF model, which is then used to carry out named entity recognition on the crowdsourced data. Finally, we create a dataset to evaluate our models using information security data. The experimental results show that our model has better performance than the other baseline models.
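A rough sketch of the tagging backbone this abstract describes appears below: a BiLSTM over word embeddings concatenated with extra feature vectors, followed by per-token attention that produces emission scores for a CRF layer (omitted for brevity). All module names and dimensions are assumptions, not the paper's implementation.

```python
# Hedged sketch of a BiLSTM-Attention tagger that accepts additional feature vectors
# (e.g. adversarially learned common features, dictionary and dependency features).
import torch
import torch.nn as nn

class BiLstmAttentionTagger(nn.Module):
    def __init__(self, vocab_size, emb_dim, extra_dim, hidden_dim, num_tags):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim + extra_dim, hidden_dim,
                            bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)
        self.emit = nn.Linear(2 * hidden_dim, num_tags)    # emission scores for a CRF

    def forward(self, tokens, extra_feats):                # tokens: (B, T); extra: (B, T, extra_dim)
        x = torch.cat([self.embed(tokens), extra_feats], dim=-1)
        h, _ = self.lstm(x)                                # (B, T, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)       # attention over time steps
        h = h * weights                                    # re-weighted token states
        return self.emit(h)                                # feed these into a CRF layer
```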

