Deep Learning Framework for Vehicle and Pedestrian Detection in Rural Roads on an Embedded GPU

Luis Barba-Guaman; José Eugenio Naranjo; Anthony Ortiz

doi:10.3390/electronics9040589

Deep Learning Framework for Vehicle and Pedestrian Detection in Rural Roads on an Embedded GPU

Electronics ◽

10.3390/electronics9040589 ◽

2020 ◽

Vol 9 (4) ◽

pp. 589 ◽

Cited By ~ 3

Author(s):

Luis Barba-Guaman ◽

José Eugenio Naranjo ◽

Anthony Ortiz

Keyword(s):

Neural Networks ◽

Computer Vision ◽

Deep Learning ◽

Embedded System ◽

Rural Areas ◽

High Performance ◽

Processing Time ◽

Graphics Processing Unit ◽

Pedestrian Detection ◽

Processing Unit

Object detection, one of the most fundamental and challenging problems in computer vision. Nowadays some dedicated embedded systems have emerged as a powerful strategy for deliver high processing capabilities including the NVIDIA Jetson family. The aim of the present work is the recognition of objects in complex rural areas through an embedded system, as well as the verification of accuracy and processing time. For this purpose, a low power embedded Graphics Processing Unit (Jetson Nano) has been selected, which allows multiple neural networks to be run in simultaneous and a computer vision algorithm to be applied for image recognition. As well, the performance of these deep learning neural networks such as ssd-mobilenet v1 and v2, pednet, multiped and ssd-inception v2 has been tested. Moreover, it was found that the accuracy and processing time were in some cases improved when all the models suggested in the research were applied. The pednet network model provides a high performance in pedestrian recognition, however, the sdd-mobilenet v2 and ssd-inception v2 models are better at detecting other objects such as vehicles in complex scenarios.

Download Full-text

Detection of Inflatable Boats and People in Thermal Infrared with Deep Learning Methods

Sensors ◽

10.3390/s21165330 ◽

2021 ◽

Vol 21 (16) ◽

pp. 5330

Author(s):

Marcin Łukasz Kowalski ◽

Norbert Pałka ◽

Jarosław Młyńczak ◽

Mateusz Karol ◽

Elżbieta Czerwińska ◽

...

Keyword(s):

Neural Networks ◽

Feature Extraction ◽

Deep Learning ◽

Processing Time ◽

Graphics Processing Unit ◽

Weather Conditions ◽

Processing Unit ◽

Long Time ◽

Financial Interests ◽

Graphics Processing

Smuggling of drugs and cigarettes in small inflatable boats across border rivers is a serious threat to the EU’s financial interests. Early detection of such threats is challenging due to difficult and changing environmental conditions. This study reports on the automatic detection of small inflatable boats and people in a rough wild terrain in the infrared thermal domain. Three acquisition campaigns were carried out during spring, summer, and fall under various weather conditions. Three deep learning algorithms, namely, YOLOv2, YOLOv3, and Faster R-CNN working with six different feature extraction neural networks were trained and evaluated in terms of performance and processing time. The best performance was achieved with Faster R-CNN with ResNet101, however, processing requires a long time and a powerful graphics processing unit.

Download Full-text

Artificial Intelligence & Machine Learning in Computer Vision Applications

Embedded Selforganising Systems ◽

10.14464/ess71432 ◽

2020 ◽

Vol 7 (1) ◽

pp. 2-3

Author(s):

Shadi Saleh

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Neural Networks ◽

Computer Vision ◽

Deep Learning ◽

Large Scale ◽

Short Term Memory ◽

Graphics Processing Unit ◽

Multimedia Data ◽

Processing Unit

Deep learning and machine learning innovations are at the core of the ongoing revolution in Artificial Intelligence for the interpretation and analysis of multimedia data. The convergence of large-scale datasets and more affordable Graphics Processing Unit (GPU) hardware has enabled the development of neural networks for data analysis problems that were previously handled by traditional handcrafted features. Several deep learning architectures such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Long Short Term Memory (LSTM)/Gated Recurrent Unit (GRU), Deep Believe Networks (DBN), and Deep Stacking Networks (DSNs) have been used with new open source software and libraries options to shape an entirely new scenario in computer vision processing.

Download Full-text

SWIRL: High-performance many-core CPU code generation for deep neural networks

The International Journal of High Performance Computing Applications ◽

10.1177/1094342019866247 ◽

2019 ◽

Vol 33 (6) ◽

pp. 1275-1289 ◽

Cited By ~ 3

Author(s):

Anand Venkat ◽

Tharindu Rusira ◽

Raj Barik ◽

Mary Hall ◽

Leonard Truong

Keyword(s):

Neural Networks ◽

Language Processing ◽

Code Generation ◽

High Performance ◽

Deep Neural Networks ◽

Graphics Processing Unit ◽

Processing Unit ◽

Data Movement ◽

Central Processing ◽

The Status

Deep neural networks (DNNs) have demonstrated effectiveness in many domains including object recognition, speech recognition, natural language processing, and health care. Typically, the computations involved in DNN training and inferencing are time consuming and require efficient implementations. Existing frameworks such as TensorFlow, Theano, Torch, Cognitive Tool Kit (CNTK), and Caffe enable Graphics Processing Unit (GPUs) as the status quo devices for DNN execution, leaving Central Processing Unit (CPUs) behind. Moreover, existing frameworks forgo or limit cross layer optimization opportunities that have the potential to improve performance by significantly reducing data movement through the memory hierarchy. In this article, we describe an alternative approach called SWIRL, a compiler that provides high-performance CPU implementations for DNNs. SWIRL is built on top of the existing domain-specific language (DSL) for DNNs called LATTE. SWIRL separates DNN specification and its schedule using predefined transformation recipes for tensors and layers commonly found in DNN layers. These recipes synergize with DSL constructs to generate high-quality fused, vectorized, and parallelized code for CPUs. On an Intel Xeon Platinum 8180M CPU, SWIRL achieves performance comparable with Tensorflow integrated with MKL-DNN; on average 1.00× of Tensorflow inference and 0.99× of Tensorflow training. It also outperforms the original LATTE compiler on average by 1.22× and 1.30× on inference and training, respectively.

Download Full-text

A Framework for Simulating and Estimating the State and Functional Topology of Complex Dynamic Geometric Networks

Neural Computation ◽

10.1162/neco_a_00065 ◽

2011 ◽

Vol 23 (1) ◽

pp. 183-214 ◽

Cited By ~ 11

Author(s):

Marius Buibas ◽

Gabriel A. Silva

Keyword(s):

Neural Networks ◽

High Performance ◽

Graphics Processing Unit ◽

Signal Propagation ◽

Cellular Neural Networks ◽

Standard Test ◽

The State ◽

Processing Unit ◽

Complex Dynamic ◽

Functional Topology

We introduce a framework for simulating signal propagation in geometric networks (networks that can be mapped to geometric graphs in some space) and developing algorithms that estimate (i.e., map) the state and functional topology of complex dynamic geometric networks. Within the framework, we define the key features typically present in such networks and of particular relevance to biological cellular neural networks: dynamics, signaling, observation, and control. The framework is particularly well suited for estimating functional connectivity in cellular neural networks from experimentally observable data and has been implemented using graphics processing unit high-performance computing. Computationally, the framework can simulate cellular network signaling close to or faster than real time. We further propose a standard test set of networks to measure performance and compare different mapping algorithms.

Download Full-text

Prediksi Kelas Jamak dengan Deep Learning Berbasis Graphics Processing Units

Jurnal Kajian Ilmiah ◽

10.31599/jki.v20i1.71 ◽

2020 ◽

Vol 20 (1) ◽

pp. 67-76

Author(s):

Rahmadya Trias Handayanto ◽

Herlawati Herlawati

Keyword(s):

Neural Networks ◽

Support Vector Machine ◽

Artificial Neural Networks ◽

Deep Learning ◽

Decision Tree ◽

Graphics Processing Unit ◽

Support Vector ◽

Processing Unit ◽

Class 1 ◽

Graphics Processing

For the first time, machine learning did the classical classification process using two classes (bi-class) such as class -1 and class +1, 0 and 1, or the form of categories such as true and false. Famous methods used are Artificial Neural Networks (ANN) and Support Vector Machine (SVM). The current development was a problem with more than two classes, known as multi-class classes. For SVM sometimes the plural classes are overcome by doing a gradual process like a decision tree (DT) method. Meanwhile, ANN has experienced rapid development and is currently being developed with a large number of layers with the new activation functions, i.e. the rectified linear units (ReLu), and the probabilistic-based activation, i.e. softmax, including its optimizer methods (adam, sgd, and others). Then the term changed to Deep Learning (DL). This study aimed to compare two well-known methods (DL and SVM) in classifying multiple classes. The number of DL layers was six with the neuron composition are 128, 64, 32, 8, 4, and 3, while SVM uses a radial kernel base function with gamma and c respectively 0.7 and 5. Besides, this study intends to compare the use of the Graphics Processing Unit (GPU) available on Google Interactive Notebook (Google Colab), an online Python language programming application. The results showed that DL accuracy outperformed SVM but required large computational resources, with the accuracy for DL and SVM are 99% and 98%, respectively. However, the use of the GPU can overcome these problems and is proven to increase the speed of the process as much as 47 times. Keywords: Artificial Neural Networks, Graphics Processing Unit, Google Interactive Notebook, Rectified Linear units, Support Vector Machine. Abstrak Di awal perkembangannya mesin pembelajaran melakukan proses klasikfikasi menggunakan dua kelas (bi-class) misalnya kelas -1 dan kelas +1, 0 dan 1, atau bentuk kategori seperti benar dan salah. Metode terkenal yang digunakan adalah Jaringan Syaraf Tiruan (JST) dan Support Vector Machine (SVM). Perkembangan selanjutnya adalah problem dengan kelas yang lebih dari dua kelas, dikenal dengan istilah kelas jamak (multi-class). Untuk SVM terkadang kelas jamak diatasi dengan melakukan proses berjenjang mirip pohon keputusan (decision tree). Sementara itu JST telah mengalami perkembangan yang pesat dan saat ini sudah dikembangkan dengan jumlah layer yang banyak disertai dengan fungsi-fungsi aktivasi terkini seperti rectified linear unit (ReLu), dan softmax yang berbasis probabilistik, termasuk juga metode-metode optimizernya (adam, sgd, dan lain-lain). Kemudian istilahnya berubah menjadi Deep Learning (DL). Penelitian ini mencoba membandingkan dua metode terkenal (DL dan SVM) dalam melakukan klasifikasi kelas jamak. Jumlah layer DL sebanyak enam dengan masing-masing neuron sebesar 128, 64, 32, 8, 4, dan 3, sementara SVM menggunakan kernel radial basis function dengan gamma dan c berturut-turut 0.7 dan 5. Selain itu penelitian ini bermaksud membandingkan penggunaan Graphics Processing Unit (GPU) yang tersedia di Google Interactive Notebook (Google Colab), sebuah aplikasi online pemrograman bahasa Python. Hasil penelitian menunjukan akurasi DL unggul tipis dibanding SVM namun memerlukan sumber daya komputasi yang besar masing-masing dengan akurasi 99% dan 98%. Namun penggunaan GPU mampu mengatasi permasalahan tersebut dan terbukti meningkatkan kecepatan proses sebanyak 47 kali. Kata kunci: Jaringan Syaraf Tiruan, Graphics Processing Unit, Google Interactive Notebook, Rectified Linear units, Support Vector Machine.

Download Full-text

Aspects of programming for implementation of convolutional neural networks on multisystem HPC architectures

Journal of Physics Conference Series ◽

10.1088/1742-6596/2062/1/012016 ◽

2021 ◽

Vol 2062 (1) ◽

pp. 012016

Author(s):

Sunil Pandey ◽

Naresh Kumar Nagwani ◽

Shrish Verma

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Learning ◽

Convolutional Neural Network ◽

High Performance Computing ◽

Convolutional Neural Networks ◽

High Performance ◽

Processing Unit ◽

Computational Performance ◽

Performance Computing

Abstract The training of deep learning convolutional neural networks is extremely compute intensive and takes long times for completion, on all except small datasets. This is a major limitation inhibiting the widespread adoption of convolutional neural networks in real world applications despite their better image classification performance in comparison with other techniques. Multidirectional research and development efforts are therefore being pursued with the objective of boosting the computational performance of convolutional neural networks. Development of parallel and scalable deep learning convolutional neural network implementations for multisystem high performance computing architectures is important in this background. Prior analysis based on computational experiments indicates that a combination of pipeline and task parallelism results in significant convolutional neural network performance gains of up to 18 times. This paper discusses the aspects which are important from the perspective of implementation of parallel and scalable convolutional neural networks on central processing unit based multisystem high performance computing architectures including computational pipelines, convolutional neural networks, convolutional neural network pipelines, multisystem high performance computing architectures and parallel programming models.

Download Full-text

Computational Complexity Reduction of Neural Networks of Brain Tumor Image Segmentation by Introducing Fermi–Dirac Correction Functions

Entropy ◽

10.3390/e23020223 ◽

2021 ◽

Vol 23 (2) ◽

pp. 223

Author(s):

Yen-Ling Tai ◽

Shin-Jhe Huang ◽

Chien-Chang Chen ◽

Henry Horng-Shing Lu

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Computational Complexity ◽

High Performance ◽

Low Cost ◽

Structural Complexity ◽

Correction Function ◽

Computational Time ◽

Learning Methods ◽

Band Theory

Nowadays, deep learning methods with high structural complexity and flexibility inevitably lean on the computational capability of the hardware. A platform with high-performance GPUs and large amounts of memory could support neural networks having large numbers of layers and kernels. However, naively pursuing high-cost hardware would probably drag the technical development of deep learning methods. In the article, we thus establish a new preprocessing method to reduce the computational complexity of the neural networks. Inspired by the band theory of solids in physics, we map the image space into a noninteraction physical system isomorphically and then treat image voxels as particle-like clusters. Then, we reconstruct the Fermi–Dirac distribution to be a correction function for the normalization of the voxel intensity and as a filter of insignificant cluster components. The filtered clusters at the circumstance can delineate the morphological heterogeneity of the image voxels. We used the BraTS 2019 datasets and the dimensional fusion U-net for the algorithmic validation, and the proposed Fermi–Dirac correction function exhibited comparable performance to other employed preprocessing methods. By comparing to the conventional z-score normalization function and the Gamma correction function, the proposed algorithm can save at least 38% of computational time cost under a low-cost hardware architecture. Even though the correction function of global histogram equalization has the lowest computational time among the employed correction functions, the proposed Fermi–Dirac correction function exhibits better capabilities of image augmentation and segmentation.

Download Full-text

Deep learning for denoising

Geophysics ◽

10.1190/geo2018-0668.1 ◽

2019 ◽

Vol 84 (6) ◽

pp. V333-V350 ◽

Cited By ~ 15

Author(s):

Siwei Yu ◽

Jianwei Ma ◽

Wenlong Wang

Keyword(s):

Neural Network ◽

Deep Learning ◽

Graphics Processing Unit ◽

Random Noise ◽

Stochastic Gradient Descent ◽

Processing Unit ◽

Noise Attenuation ◽

Optimal Parameters ◽

Training Set ◽

Unknown Variance

Compared with traditional seismic noise attenuation algorithms that depend on signal models and their corresponding prior assumptions, removing noise with a deep neural network is trained based on a large training set in which the inputs are the raw data sets and the corresponding outputs are the desired clean data. After the completion of training, the deep-learning (DL) method achieves adaptive denoising with no requirements of (1) accurate modelings of the signal and noise or (2) optimal parameters tuning. We call this intelligent denoising. We have used a convolutional neural network (CNN) as the basic tool for DL. In random and linear noise attenuation, the training set is generated with artificially added noise. In the multiple attenuation step, the training set is generated with the acoustic wave equation. The stochastic gradient descent is used to solve the optimal parameters for the CNN. The runtime of DL on a graphics processing unit for denoising has the same order as the [Formula: see text]-[Formula: see text] deconvolution method. Synthetic and field results indicate the potential applications of DL in automatic attenuation of random noise (with unknown variance), linear noise, and multiples.

Download Full-text

Off-the-shelf deep learning is not enough, and requires parsimony, Bayesianity, and causality

npj Computational Materials ◽

10.1038/s41524-020-00487-0 ◽

2021 ◽

Vol 7 (1) ◽

Author(s):

Rama K. Vasudevan ◽

Maxim Ziatdinov ◽

Lukas Vlcek ◽

Sergei V. Kalinin

Keyword(s):

Neural Networks ◽

Computer Vision ◽

Deep Learning ◽

Bayesian Methods ◽

Deep Neural Networks ◽

Applied Research ◽

Modern Science ◽

Generative Models ◽

Knowledge Development ◽

Physical Constraints

AbstractDeep neural networks (‘deep learning’) have emerged as a technology of choice to tackle problems in speech recognition, computer vision, finance, etc. However, adoption of deep learning in physical domains brings substantial challenges stemming from the correlative nature of deep learning methods compared to the causal, hypothesis driven nature of modern science. We argue that the broad adoption of Bayesian methods incorporating prior knowledge, development of solutions with incorporated physical constraints and parsimonious structural descriptors and generative models, and ultimately adoption of causal models, offers a path forward for fundamental and applied research.

Download Full-text

A lightweight approach to performance portability with targetDP

The International Journal of High Performance Computing Applications ◽

10.1177/1094342016682071 ◽

2016 ◽

Vol 32 (2) ◽

pp. 288-301

Author(s):

Alan Gray ◽

Kevin Stratford

Keyword(s):

Particle Physics ◽

Message Passing ◽

Graphics Processing Units ◽

High Performance ◽

Large Scale ◽

Message Passing Interface ◽

Graphics Processing Unit ◽

Processing Unit ◽

Performance Portability ◽

Graphics Processing

Leading high performance computing systems achieve their status through use of highly parallel devices such as NVIDIA graphics processing units or Intel Xeon Phi many-core CPUs. The concept of performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer which allows grid-based applications to target data parallel hardware in a platform agnostic manner. We demonstrate the effectiveness of our pragmatic approach by presenting performance results for a complex fluid application (with which the model was co-designed), plus separate lattice quantum chromodynamics particle physics code. For each application, a single source code base is seen to achieve portable performance, as assessed within the context of the Roofline model. TargetDP can be combined with Message Passing Interface (MPI) to allow use on systems containing multiple nodes: we demonstrate this through provision of scaling results on traditional and graphics processing unit-accelerated large scale supercomputers.

Download Full-text