Visual Object Multimodality Tracking Based on Correlation Filters for Edge Computing

2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Guosheng Yang ◽  
Qisheng Wei

In recent years, visual object tracking has become a very active research field, divided mainly into correlation filter-based tracking and deep learning-based tracking (e.g., deep convolutional neural networks and Siamese neural networks). Deep learning-based tracking algorithms require a large amount of computation and are usually deployed on expensive graphics cards. However, for the numerous monitoring devices in the Internet of Things, it is difficult to capture all moving targets on every device in real time, so hierarchical processing is necessary: correlation filter-based tracking is used in insensitive areas to relieve local computing pressure, while in sensitive areas the video stream is uploaded to a faster cloud computing platform that runs algorithms based on deep features. In this paper, we focus on correlation filter-based tracking. Among correlation filter-based trackers, the discriminative scale space tracker (DSST) is one of the most popular and typical and has been successfully applied in many fields. However, DSST still leaves room for improvement. First, the algorithm does not explicitly handle target rotation. Second, extracting histogram of oriented gradient (HOG) features from the many patches centered at the target position, as required for accurate scale estimation, imposes a very heavy computational load. To address these two problems, we introduce an alterable patch number for target scale tracking and a space search for target rotation tracking into the standard DSST method, and we propose a visual object multimodality tracker based on correlation filters (MTCF) that simultaneously copes with in-plane translation, scale, and rotation of the tracked target and obtains the target's position, scale, and attitude angle at the same time. Finally, experiments on the Visual Tracker Benchmark data set demonstrate the effectiveness of the proposed algorithms in multimodality tracking.
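To make the correlation-filter machinery concrete, the following is a minimal NumPy sketch of the response computation and DSST-style scale search that trackers in this family build on. It illustrates the general principle only, not the paper's MTCF; the filter, patch sizes, and scale grid are assumptions.

```python
import numpy as np

def correlation_response(filter_hat, patch):
    """Response map of a correlation filter over a feature patch.

    filter_hat: frequency-domain filter (np.fft.fft2 of the learned template).
    patch: feature patch with the same shape as the filter.
    """
    # Multiplying by the conjugate filter in the frequency domain
    # equals cross-correlation in the spatial domain.
    return np.real(np.fft.ifft2(np.conj(filter_hat) * np.fft.fft2(patch)))

def best_scale(filter_hat, patches_by_scale, scales):
    """DSST-style scale search: pick the scale whose resampled patch
    yields the highest correlation peak.

    patches_by_scale: candidate patches already cropped around the target
    and resized to the filter size, one per scale.
    """
    peaks = [correlation_response(filter_hat, p).max() for p in patches_by_scale]
    return scales[int(np.argmax(peaks))]

# Toy usage: a 64x64 template evaluated over 17 candidate scales.
rng = np.random.default_rng(0)
template = rng.standard_normal((64, 64))
filter_hat = np.fft.fft2(template)
scales = np.linspace(0.9, 1.1, 17)
patches = [template + 0.1 * rng.standard_normal((64, 64)) for _ in scales]
print(best_scale(filter_hat, patches, scales))
```

The paper's "alterable patch number" amounts to varying the length of the scale grid per frame, trading scale resolution against the cost of HOG extraction over many patches.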

Electronics ◽  
2021 ◽  
Vol 10 (13) ◽  
pp. 1514
Author(s):  
Seung-Ho Lim ◽  
WoonSik William Suh ◽  
Jin-Young Kim ◽  
Sang-Young Cho

Optimizing hardware processors and systems for deep learning operations such as Convolutional Neural Networks (CNNs) on resource-limited embedded devices is an active research area. To run an optimized deep neural network model with the limited computational units and memory of an embedded device, it is necessary to quickly apply various configurations of hardware modules to various deep neural network models and find the optimal combination. An Electronic System Level (ESL) simulator based on SystemC is very useful for rapid hardware modeling and verification. In this paper, we designed and implemented a Deep Learning Accelerator (DLA) that performs Deep Neural Network (DNN) operations on a RISC-V Virtual Platform implemented in SystemC, enabling rapid and diverse analysis of deep learning operations on embedded devices based on RISC-V, a recently emerging embedded processor. The developed RISC-V-based DLA prototype can analyze hardware requirements for a given CNN data set through configuration of the CNN DLA architecture, and the platform can run RISC-V compiled software and execute a real neural network model such as Darknet. We ran the Darknet CNN model on the developed DLA prototype and confirmed that its computational overhead and inference errors can be analyzed by examining the DLA architecture across various data sets.
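As a rough illustration of the kind of hardware-requirement analysis described above (not the paper's DLA configuration tool), a back-of-envelope cost model can estimate per-layer compute and memory for a CNN; the layer shapes below are hypothetical.

```python
def conv_layer_cost(in_ch, out_ch, k, out_h, out_w, bytes_per_value=1):
    """Back-of-envelope cost of one convolution layer on an accelerator.

    Returns (multiply-accumulate ops, weight bytes, output activation bytes);
    bytes_per_value=1 assumes 8-bit quantized values.
    """
    macs = in_ch * out_ch * k * k * out_h * out_w
    weight_bytes = in_ch * out_ch * k * k * bytes_per_value
    act_bytes = out_ch * out_h * out_w * bytes_per_value
    return macs, weight_bytes, act_bytes

# Hypothetical first layers of a small Darknet-style CNN at 224x224 input.
layers = [
    (3, 16, 3, 224, 224),
    (16, 32, 3, 112, 112),
    (32, 64, 3, 56, 56),
]
total_macs = sum(conv_layer_cost(*l)[0] for l in layers)
print(f"total MACs: {total_macs / 1e6:.1f} M")
```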


2020 ◽  
Vol 2020 ◽  
pp. 1-14
Author(s):  
Guangpeng Fan ◽  
Feixiang Chen ◽  
Danyu Chen ◽  
Yan Li ◽  
Yanqi Dong

In geological surveys, the recognition and classification of rock lithology is an important task. Recognition based on rock thin sections entails a long recognition period and high cost, its accuracy cannot be guaranteed, and it offers no practical solution in the field. As communication devices with multiple sensors, smartphones are carried by most geological survey workers. In this paper, a smartphone application based on a convolutional neural network is developed: the phone's camera takes photos of rocks, and the rock types and lithology are identified quickly and accurately. This paper proposes a method for rapidly and accurately recognizing rock lithology in the field. Based on ShuffleNet, a lightweight convolutional neural network used in deep learning, combined with transfer learning, a recognition model for rock images was established and then deployed to the smartphone. A smartphone application for identifying rock lithology was designed and developed to verify its usability and accuracy. The results show that the recognition model achieved 97.65% accuracy on the PC validation data set. Accuracy on the smartphone test data set was 95.30%, with an average single-image recognition time of 786 milliseconds (maximum 1,045 milliseconds, minimum 452 milliseconds), and images recognized with accuracy above 96% accounted for 95% of the test data set. This paper presents a new solution for the rapid and accurate recognition of rock lithology in field geological surveys, meeting the need of survey personnel to identify rock lithology quickly and accurately during field operations.
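The ShuffleNet-plus-transfer-learning recipe described above is standard; a minimal PyTorch sketch follows. It is not the authors' published training code, and the class count is an assumption.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_ROCK_CLASSES = 6  # assumption; the abstract does not state the class count

# Transfer learning: start from ImageNet weights, freeze the backbone,
# and train only a new classifier head on the rock image data set.
model = models.shufflenet_v2_x1_0(weights="IMAGENET1K_V1")
for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, NUM_ROCK_CLASSES)  # trainable head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
```

After training, such a model is typically exported (e.g., to TorchScript or a mobile runtime) for on-device inference, consistent with the sub-second recognition times reported.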


2019 ◽  
Vol 1 ◽  
pp. 1-1
Author(s):  
Tee-Ann Teo

Abstract. Deep learning is a kind of machine learning technology that utilizes a deep neural network to learn a promising model from a large training data set. The Convolutional Neural Network (CNN) has been successfully applied to image segmentation and classification with highly accurate results. A CNN applies multiple kernels (also called filters) to extract image features via image convolution, and it can determine multiscale features through multiple layers of convolution and pooling. The variety of the training data plays an important role in obtaining a reliable CNN model. Benchmark training data for road mark extraction, such as the KITTI Vision Benchmark Suite, focuses mainly on close-range imagery because a close-range image is easier to obtain than an airborne image. This study aims to transfer road mark training data from a mobile lidar system to aerial orthoimagery in Fully Convolutional Networks (FCN). Transferring training data from a ground-based system to an airborne system may reduce the effort of producing a large training data set.

This study uses FCN technology and aerial orthoimagery to localize road marks within road regions. The road regions are first extracted from a 2-D large-scale vector map. The input aerial orthoimage has 10 cm spatial resolution, and non-road regions are masked out before road mark localization. The training data are road mark polygons originally digitized from ground-based mobile lidar and prepared for road mark extraction with a mobile mapping system. This study reuses these training data for road mark extraction from aerial orthoimagery. The digitized training road marks are transformed to road polygons based on mapping coordinates. As the detail of ground-based lidar is much better than that of the airborne system, parking lots partially occluded in the aerial orthoimage can also be obtained from the ground-based system. The labels (also called annotations) for the FCN comprise road regions, non-road regions, and road marks. The size of a training batch is 500 by 500 pixels (50 m by 50 m on the ground), and 75 batches are used for training. After the FCN training stage, an independent aerial orthoimage (Figure 1a) is used to predict the road marks. The FCN results provide initial road mark regions (Figure 1b). Because road marks usually show higher reflectance than road asphalt, this study uses this characteristic to refine the road marks (Figure 1c) by a binary classification inside each initial road mark region.

Comparing the automatically extracted road marks (Figure 1c) with manually digitized road marks (Figure 1d) shows that most road marks can be extracted using the training set from the ground-based system. This study also selects an area of 600 m × 200 m for quantitative analysis. Of the 371 reference road marks, 332 were extracted by the proposed scheme, a completeness of 89%. The preliminary experiment demonstrates that most road marks can be successfully extracted by the proposed scheme, so training data from a ground-based mapping system can be utilized with airborne orthoimagery of similar spatial resolution.
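The reflectance-based refinement step lends itself to a short sketch: threshold the bright pixels inside the FCN's initial regions. This is one plausible reading of the binary classification described (using an Otsu threshold as an assumption), not the study's exact implementation.

```python
import numpy as np
from skimage.filters import threshold_otsu

def refine_road_marks(ortho_gray, initial_mask):
    """Keep only bright pixels inside the FCN's initial road-mark regions.

    Road marks reflect more than asphalt, so an automatic (Otsu) threshold
    over the pixels of the predicted regions separates mark from pavement.
    """
    refined = np.zeros_like(initial_mask, dtype=bool)
    pixels = ortho_gray[initial_mask]
    if pixels.size:
        refined[initial_mask] = ortho_gray[initial_mask] > threshold_otsu(pixels)
    return refined

# Completeness as reported in the study: 332 of 371 reference marks.
print(f"completeness: {332 / 371:.0%}")
```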


2019 ◽  
Author(s):  
Dan MacLean

Abstract: Gene regulatory networks that control gene expression are widely studied, yet the interactions that make them up are difficult to predict from high-throughput data. Deep learning methods such as convolutional neural networks can perform surprisingly good classifications on a variety of data types, and matrix-like gene expression profiles would seem to be ideal input data for deep learning approaches. In this short study I compiled training sets of expression data using the Arabidopsis AtGenExpress global stress expression data set and known transcription factor-target interactions from the Arabidopsis PLACE database. I built and optimised convolutional neural networks, with the best model providing 95% classification accuracy on a held-out validation set. Investigation of the activations within this model revealed that classification was based on positive correlation of expression profiles over short sections. This result shows that a convolutional neural network can both make classifications and reveal the basis of those classifications for gene expression data sets, indicating that it is a useful and interpretable tool for exploratory classification of biological data. The final model is available for download and as a web application.
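A minimal Keras sketch of the kind of setup described, pairing a transcription factor profile with a candidate target profile as a small 2-row "image" for binary classification, is shown below. The profile length and layer sizes are assumptions, not the author's published architecture.

```python
from tensorflow import keras
from tensorflow.keras import layers

PROFILE_LEN = 128  # assumption: number of expression conditions per gene

# Each example stacks a TF profile and a candidate target profile into a
# 2 x PROFILE_LEN matrix; the label is interacts / does not interact.
inputs = keras.Input(shape=(2, PROFILE_LEN, 1))
x = layers.Conv2D(32, (2, 8), activation="relu")(inputs)  # spans both profiles
x = layers.MaxPooling2D((1, 4))(x)
x = layers.Flatten()(x)
x = layers.Dense(64, activation="relu")(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

A kernel spanning both rows, as here, naturally picks up local co-variation between the two profiles, which is consistent with the activation analysis reported (classification driven by positive correlation over short sections).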


2021 ◽  
Author(s):  
Masaki Ikuta

Many algorithms and methods have been proposed for Computed Tomography (CT) image reconstruction, particularly with the recent surge of interest in machine learning and deep learning methods. Most recently proposed methods are, however, limited to image domain processing, where deep learning is used to learn the mapping from a noisy image data set to a true image data set. While deep learning-based methods can produce higher quality images than conventional model-based post-processing algorithms, they have limitations: used in the image domain, they cannot compensate for information lost during forward and backward projection in CT image reconstruction, especially in the presence of high noise. In this paper, we propose a new Recurrent Neural Network (RNN) architecture for CT image reconstruction. We propose the Gated Momentum Unit (GMU), extended from the Gated Recurrent Unit (GRU) but specifically designed for image processing inverse problems. This new RNN cell performs iterative optimization with accelerated convergence. The GMU has a few gates to regulate information flow; the gates decide to keep important long-term information and discard insignificant short-term detail. In addition, the GMU has a likelihood term and a prior term analogous to Iterative Reconstruction (IR). The likelihood term helps ensure estimated images are consistent with the observation data, while the prior term keeps the likelihood term from overfitting each individual observation. We conducted a synthetic image study along with a real CT image study to demonstrate that the proposed method achieves the highest Peak Signal to Noise Ratio (PSNR) and Structural Similarity (SSIM), and we showed that the algorithm converges faster than other well-known methods.
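The exact GMU cell is defined in the paper; purely as a sketch of the gating-plus-momentum idea it combines, under the assumption that a gate blends the old momentum with a new descent direction built from likelihood and prior gradients:

```python
def gmu_step(x, m, grad_likelihood, grad_prior, gate, lr=0.1):
    """One gated-momentum descent step for iterative reconstruction (sketch).

    x: current image estimate; m: momentum state (arrays of the same shape).
    grad_likelihood: gradient of the data-fidelity (sinogram consistency) term.
    grad_prior: gradient of the regularizer that curbs overfitting to noise.
    gate: values in (0, 1) blending old momentum with the new direction,
    loosely analogous to a GRU update gate (the real GMU learns its gates).
    """
    grad = grad_likelihood + grad_prior
    m = gate * m + (1.0 - gate) * grad  # keep long-term info, drop short-term detail
    return x - lr * m, m
```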


Sensors ◽  
2020 ◽  
Vol 20 (2) ◽  
pp. 547
Author(s):  
Abu Md Niamul Taufique ◽  
Breton Minnehan ◽  
Andreas Savakis

In recent years, deep learning-based visual object trackers have achieved state-of-the-art performance on several visual object tracking benchmarks. However, most tracking benchmarks focus on ground-level videos, whereas aerial tracking presents a new set of challenges. In this paper, we compare ten trackers based on deep learning techniques on four aerial datasets. We choose top-performing trackers utilizing different approaches, specifically tracking by detection, discriminative correlation filters, Siamese networks, and reinforcement learning. In our experiments, we use a subset of the OTB2015 dataset with aerial-style videos; the UAV123 dataset without synthetic sequences; the UAV20L dataset, which contains 20 long sequences; and the DTB70 dataset as our benchmark datasets. We compare the advantages and disadvantages of different trackers in the different tracking situations encountered in aerial data. Our findings indicate that the trackers perform significantly worse on aerial datasets than on standard ground-level videos. We attribute this effect to smaller target size, camera motion, significant camera rotation with respect to the target, out-of-view movement, and clutter in the form of occlusions or similar-looking distractors near the tracked object.
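Benchmarks in this family are typically scored with OTB-style success plots. As a small self-contained sketch (standard metric, not code from the paper):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two [x, y, w, h] boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def success_curve(pred_boxes, gt_boxes, thresholds=np.linspace(0, 1, 21)):
    """Fraction of frames whose IoU exceeds each threshold (OTB success plot)."""
    ious = np.array([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    return np.array([(ious > t).mean() for t in thresholds])

# Toy usage: a tracker that drifts slightly from ground truth.
gt = [[10, 10, 40, 40]] * 5
pred = [[12, 11, 40, 40]] * 5
print(success_curve(pred, gt).mean())  # area under the success curve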


2020 ◽  
Vol 2020 ◽  
pp. 1-13 ◽  
Author(s):  
Suxia Cui ◽  
Yu Zhou ◽  
Yonghui Wang ◽  
Lujun Zhai

Recently, human curiosity has expanded from the land to the sky and the sea. Besides sending people to explore the ocean and outer space, robots are designed for tasks dangerous to living creatures. Take ocean exploration as an example: many projects and competitions on the design of Autonomous Underwater Vehicles (AUVs) have attracted wide interest. The authors of this article learned the necessity of a platform upgrade from a previous AUV design project and would like to share the experience of extending one task, fish detection. Most embedded systems have been improved by fast-growing computing and sensing technologies, which makes it possible for them to incorporate more and more complicated algorithms. In an AUV, after acquiring surrounding information from sensors, how to perceive and analyse that information for better judgement is one of the challenges. The processing procedure can mimic a human being's learning routines, and an advanced system with more computing power can support deep learning features, which exploit many neural network algorithms to simulate human brains. In this paper, a convolutional neural network (CNN) based fish detection method is proposed. The training data set was collected from the Gulf of Mexico with a digital camera. To fit this unique need, three optimization approaches were applied to the CNN: data augmentation, network simplification, and training process speed-up. Data augmentation transformations provided more learning samples; the network was simplified to suit the embedded platform; and the speed-up made the training process more time efficient. Experimental results showed that the proposed model is promising and has the potential to be extended to other underwater objects.
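Data augmentation of the kind mentioned is commonly expressed as a transform pipeline; the sketch below is illustrative (the abstract does not specify which transforms were used), with flips and mild color jitter chosen to mimic underwater lighting and turbidity changes.

```python
from torchvision import transforms

# Illustrative augmentation pipeline for underwater fish imagery (assumed
# transforms, not the paper's exact configuration).
train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.2),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
])
```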


2021 ◽  
Vol 87 (8) ◽  
pp. 577-591
Author(s):  
Fengpeng Li ◽  
Jiabao Li ◽  
Wei Han ◽  
Ruyi Feng ◽  
Lizhe Wang

Inspired by the outstanding achievements of deep learning, supervised deep learning representation methods for high-spatial-resolution remote sensing image scene classification have obtained state-of-the-art performance. However, supervised deep learning representation methods need a considerable amount of labeled data to capture class-specific features, which limits their application when only a few labeled training samples are available. An unsupervised deep learning representation method for high-resolution remote sensing image scene classification is proposed in this work to address this issue. The proposed method, based on contrastive learning, narrows the distance between positive view pairs (color channels belonging to the same image) and widens the gaps between negative view pairs (color channels from different images) to obtain class-specific representations of the input data without any supervised information. The classifier uses the features extracted by the convolutional neural network (CNN)-based feature extractor, together with the labeled information of the training data, to set up the space of each category, and then makes predictions on the test set using linear regression. Compared with existing unsupervised deep learning representation methods for high-resolution remote sensing image scene classification, the contrastive learning CNN achieves state-of-the-art performance on three benchmark data sets of different scales: the small-scale RSSCN7 data set, the midscale aerial image data set, and the large-scale NWPU-RESISC45 data set.
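The channel-pair objective described maps naturally onto an InfoNCE-style contrastive loss; the PyTorch sketch below shows that standard formulation under the assumption that matching rows of the two embedding batches come from different color channels of the same image. It is not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def channel_contrastive_loss(z_a, z_b, temperature=0.1):
    """InfoNCE-style loss over two channel views of the same image batch.

    z_a, z_b: (N, D) embeddings of different color channels of the same
    N images. Matching rows are positives; all other pairs are negatives.
    """
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature  # (N, N) cosine-similarity matrix
    labels = torch.arange(z_a.size(0), device=z_a.device)
    # Diagonal entries (same image, different channel) are the positives.
    return F.cross_entropy(logits, labels)
```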


2021 ◽  
Author(s):  
Yuqi Wang ◽  
Tianyuan Liu ◽  
Di Zhang

Abstract: Research on the supercritical carbon dioxide (S-CO2) Brayton cycle has gradually become a hot spot in recent years. The off-design performance of the turbine is an important reference for analyzing the cycle under variable operating conditions. With the development of deep learning technology, surrogate models based on neural networks have received extensive attention. To improve on the inefficiency of traditional off-design analyses, this research establishes a data-driven deep learning off-design aerodynamic prediction model for an S-CO2 centrifugal turbine based on a deep convolutional neural network. The network can rapidly and adaptively predict aerodynamic performance for varying blade profiles and operating conditions, and it can illustrate the underlying mechanism through field reconstruction results for the predicted performance. The training results show that the off-design aerodynamic prediction convolutional neural network (OAP-CNN) reduces the mean and maximum efficiency prediction errors compared with traditional Gaussian Process Regression (GPR) and an Artificial Neural Network (ANN). For off-design conditions, pressure and temperature distributions with acceptable error can be obtained without a CFD calculation. Besides, the influence of off-design parameters on efficiency and power can be conveniently acquired, providing a reference for an optimized operation strategy. Analyzing the sensitivity of the OAP-CNN to training data set size shows that the prediction accuracy is acceptable when the fraction of training samples exceeds 50%; the minimum error appears at a training fraction of 0.8, with mean and maximum errors of 1.46% and 6.42%, respectively. In summary, this research provides a precise and fast aerodynamic performance prediction model for off-design analyses of S-CO2 turbomachinery and the Brayton cycle.
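A CNN surrogate of the kind described typically combines an image-like encoding of the blade profile with scalar operating conditions; the PyTorch sketch below shows one such layout. The input encoding, layer sizes, and the two-condition assumption are illustrative, not the OAP-CNN architecture.

```python
import torch
import torch.nn as nn

class SurrogateCNN(nn.Module):
    """Sketch of a CNN surrogate: blade-profile image plus scalar operating
    conditions in, predicted efficiency out (assumed structure)."""

    def __init__(self, n_conditions=2):  # e.g. pressure ratio and rotational speed
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(32 + n_conditions, 64), nn.ReLU(),
            nn.Linear(64, 1),  # predicted efficiency
        )

    def forward(self, profile_img, conditions):
        x = self.features(profile_img)          # encode blade geometry
        return self.head(torch.cat([x, conditions], dim=1))

# Toy usage: batch of 4 profiles at 64x64 with 2 operating-condition scalars.
model = SurrogateCNN()
out = model(torch.randn(4, 1, 64, 64), torch.randn(4, 2))
print(out.shape)  # torch.Size([4, 1])
```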


2021 ◽  
Author(s):  
Weiwei Xing ◽  
Weibin Liu ◽  
Jun Wang ◽  
Shunli Zhang ◽  
Lihui Wang ◽  
...  
