Towards Autonomous Drone Racing without GPU Using an OAK-D Smart Camera

Sensors ◽  
2021 ◽  
Vol 21 (22) ◽  
pp. 7436
Author(s):  
Leticia Oyuki Rojas-Perez ◽  
Jose Martinez-Carranza

Recent advances have shown for the first time that it is possible to beat a human in a drone race with an autonomous drone. However, that solution relies heavily on external sensors, specifically on a motion capture system. A truly autonomous solution instead demands performing computationally intensive tasks such as gate detection, drone localisation, and state estimation on board. To this end, other solutions rely on specialised hardware such as graphics processing units (GPUs), whose onboard versions are not as powerful as those available for desktop and server computers. An alternative is to combine specialised hardware with smart sensors capable of processing specific tasks on the chip, alleviating the need for the onboard processor to perform these computations. Motivated by this, we present the initial results of adapting a novel smart camera, known as the OpenCV AI Kit or OAK-D, as part of a solution for autonomous drone racing (ADR) running entirely on board. This smart camera performs neural inference on the chip without using a GPU. It can also perform depth estimation with a stereo rig and run neural network models using images from a 4K colour camera as the input. Additionally, seeking to limit the payload to 200 g, we present a new 3D-printed design of the camera's back case, reducing the original weight by 40% and thus enabling the drone to carry the camera in tandem with a host onboard computer, the Intel Compute Stick, on which we run a controller based on gate detection. The latter is performed with a neural model running on the OAK-D at an operation frequency of 40 Hz, enabling the drone to fly at a speed of 2 m/s. We deem these initial results promising for the development of a truly autonomous solution that runs intensive computational tasks fully on board.
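For orientation, the following is a minimal sketch of how such an on-chip detection pipeline is typically assembled with the DepthAI Python API that drives the OAK-D. The model blob name, input size, and result decoding are hypothetical stand-ins, not the authors' gate-detection code.

# Minimal sketch: run a detection network on-chip with an OAK-D via the
# DepthAI Python API. "gate_detector.blob" and its 416x416 input are
# hypothetical stand-ins for the authors' gate-detection model.
import depthai as dai

pipeline = dai.Pipeline()

# Colour camera node; preview sized to match the network input.
cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(416, 416)
cam.setInterleaved(False)

# Neural network node running entirely on the camera's VPU (no host GPU).
nn = pipeline.create(dai.node.NeuralNetwork)
nn.setBlobPath("gate_detector.blob")  # hypothetical compiled model
cam.preview.link(nn.input)

# Stream inference results back to the host computer.
xout = pipeline.create(dai.node.XLinkOut)
xout.setStreamName("nn")
nn.out.link(xout.input)

with dai.Device(pipeline) as device:
    q = device.getOutputQueue("nn", maxSize=4, blocking=False)
    while True:
        result = q.get()  # inference tensor computed on the camera itself
        # ... decode gate detections and feed the flight controller ...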

2018 ◽  
Vol 8 (8) ◽  
pp. 1290 ◽  
Author(s):  
Beata Mrugalska

Increasing expectations of industrial system reliability require the development of more effective and robust fault diagnosis methods. The paper presents a framework for improving the quality of neural models applied for fault detection purposes. In particular, the proposed approach starts with an adaptation of the modified quasi-outer-bounding algorithm to non-linear neural network models. Subsequently, its convergence is proven using the quadratic boundedness paradigm. The obtained algorithm is then equipped with a sequential D-optimum experimental design mechanism allowing a gradual reduction of the neural model uncertainty. Finally, a robust fault detection framework is proposed that uses the neural network uncertainty description as adaptive thresholds.
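As a generic illustration of the adaptive-threshold idea (not the paper's quasi-outer-bounding algorithm), the sketch below flags a fault whenever the measured output leaves the uncertainty band around the neural model's prediction; the signals and band values are toy assumptions.

# Generic illustration: residual-based fault detection with adaptive
# thresholds derived from model uncertainty. y_hat(k) is the neural model
# output and delta(k) its uncertainty band; a fault is flagged whenever
# the measured output y(k) leaves that band.
import numpy as np

def detect_faults(y, y_hat, delta):
    """Flag samples where |y - y_hat| exceeds the adaptive threshold delta."""
    residual = np.abs(y - y_hat)
    return residual > delta

# Toy example with a time-varying threshold.
y     = np.array([1.00, 1.10, 1.20, 2.50, 1.15])
y_hat = np.array([1.02, 1.08, 1.18, 1.20, 1.12])
delta = np.array([0.10, 0.10, 0.12, 0.12, 0.10])
print(detect_faults(y, y_hat, delta))  # -> [False False False  True False]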


2021 ◽  
Vol 38 (2) ◽  
Author(s):  
Nicholas Torres Okita ◽  
Tiago A. Coimbra ◽  
José Ribeiro ◽  
Martin Tygel

ABSTRACT. The usage of graphics processing units is already known as an alternative to traditional multi-core CPU processing, offering performance dozens of times faster on parallel tasks. Another new computing paradigm is the use of cloud computing as a replacement for traditional in-house clusters, enabling seemingly unlimited computational power, no maintenance costs, and cutting-edge technology, dynamically on user demand. Previously, these two tools were used to accelerate the estimation of Common Reflection Surface (CRS) traveltime parameters, in both the zero-offset and finite-offset domains, delivering very satisfactory results, with large time savings from GPU devices alongside cost savings on the cloud. This work extends those results by using GPUs on the cloud to accelerate the Offset Continuation Trajectory (OCT) traveltime parameter estimation. The results show that the time and cost savings from the use of GPU devices are even larger than those seen in the CRS results, being up to fifty times faster and sixty times cheaper. This analysis reaffirms that it is possible to save both time and money when using GPU devices on the cloud, and concludes that the larger the data sets and the more computationally intensive the traveltime operators, the larger the improvements.

Keywords: cloud computing, GPU, seismic processing.


2020 ◽  
Author(s):  
Nairit Sur ◽  
Leonardo Cristella ◽  
Adriano Di Florio ◽  
Vincenzo Mastrapasqua

Abstract The demand for computational resources is steadily increasing in experimental high energy physics as the current collider experiments continue to accumulate huge amounts of data and physicists pursue more complex and ambitious analysis strategies. This is especially true in the fields of hadron spectroscopy and flavour physics, where the analyses often depend on complex multidimensional unbinned maximum-likelihood fits, with several dozen free parameters, aimed at studying the internal structure of hadrons. Graphics processing units (GPUs) represent one of the most sophisticated and versatile parallel computing architectures and are becoming a popular toolkit for high energy physicists to meet their computational demands. GooFit is an open-source tool interfacing ROOT/RooFit to the CUDA platform on NVIDIA GPUs that acts as a bridge between the MINUIT minimization algorithm and a parallel processor, allowing probability density functions to be estimated on multiple cores simultaneously. In this article, a full-fledged amplitude analysis framework developed using GooFit is tested for its speed and reliability. The four-dimensional fitter framework, one of the first of its kind to be built on GooFit, is geared towards the search for exotic tetraquark states in the [[EQUATION]] decays and can also be seamlessly adapted for other similar analyses. The GooFit fitter, running on GPUs, shows a remarkable improvement in computing speed compared to a ROOT/RooFit implementation of the same analysis running on multi-core CPU clusters. Furthermore, it shows sensitivity to components with small contributions to the overall fit. It has the potential to be a powerful tool for sensitive and computationally intensive physics analyses.
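To give a flavour of the GooFit programming model, here is a minimal sketch patterned on GooFit's published Python examples: a one-dimensional Gaussian fit whose likelihood is evaluated on the GPU. This is not the authors' four-dimensional amplitude fitter, and constructor signatures may differ across GooFit versions.

# Minimal sketch of GooFit's Python interface: MINUIT drives the fit while
# the PDF is evaluated in parallel on the GPU via CUDA.
from goofit import *
import numpy as np

xvar = Observable("xvar", -5, 5)

# Fill an unbinned data set with toy samples.
data = UnbinnedDataSet(xvar)
for v in np.random.normal(0.2, 1.1, 10000):
    if -5 < v < 5:
        xvar.value = v
        data.addEvent()

# Fit parameters: name, initial value, lower limit, upper limit.
mean  = Variable("mean", 0, -5, 5)
sigma = Variable("sigma", 1, 0.1, 3)

gauss = GaussianPdf("gauss", xvar, mean, sigma)
gauss.setData(data)
FitManager(gauss).fit()  # MINUIT minimization, PDF evaluated on the GPU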


2020 ◽  
Author(s):  
Wen-Hsien Chang ◽  
Han-Kuei Wu ◽  
Lun-chien Lo ◽  
William W. L. Hsiao ◽  
Hsueh-Ting Chu ◽  
...  

Abstract
Background: Traditional Chinese medicine (TCM) describes physiological and pathological changes inside and outside the human body through the application of four methods of diagnosis. One of the four, tongue diagnosis, is widely used by TCM physicians, since it allows direct observations that avoid the discrepancies of a patient's reported history and, as such, provides clinically important, objective evidence. The clinical significance of tongue features has been explored in both TCM and modern medicine. However, TCM physicians may interpret the features displayed by the same tongue differently, so intra- and inter-observer agreement is relatively low. If an automated interpretation system could be developed, more consistent results could be obtained, and learning could also be more efficient. This study applies a recently developed deep learning method to the classification of tongue features and indicates the regions where the features are located.
Methods: A large number of tongue photographs with labeled fissures were used. Transfer learning was conducted using the ImageNet-pretrained ResNet50 model to determine whether tongue fissures were present in a tongue photograph. Neural network models often lack interpretability, and users cannot understand how the model determines the presence of tongue fissures. Therefore, Gradient-weighted Class Activation Mapping (Grad-CAM) was also applied to mark the tongue features directly on the tongue image.
Results: Only 6 epochs were trained in this study, and no graphics processing units (GPUs) were used; each epoch took less than 4 minutes to train. Accuracy on the test set was approximately 70%. After model training was completed, Grad-CAM was applied to localize the tongue fissures in each image. The neural network model not only determined whether tongue fissures existed, but also allowed users to see the tongue fissure regions.
Conclusions: This study demonstrates how to apply transfer learning with the ImageNet-pretrained ResNet50 model for the identification and localization of tongue fissures. The neural network model built in this study provides the interpretability and intuitiveness often lacking in general neural network models, improving its feasibility for clinical application.
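As a rough illustration of the recipe in the Methods, the sketch below fine-tunes an ImageNet-pretrained ResNet50 as a binary fissure classifier with tf.keras. The directory layout ("tongue_photos/" with one subfolder per class), head architecture, and hyperparameters are assumptions rather than the authors' exact setup; Grad-CAM would then be computed from the gradients of the fissure output with respect to the last convolutional layer.

# Sketch: transfer learning from ImageNet-pretrained ResNet50 to a binary
# tongue-fissure classifier. Dataset path and head layout are hypothetical.
import tensorflow as tf

base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False)
base.trainable = False  # freeze the pretrained backbone; train only the head

inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.applications.resnet50.preprocess_input(inputs)
x = base(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # fissure yes/no
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Hypothetical directory layout: tongue_photos/<class_name>/*.jpg
train_ds = tf.keras.utils.image_dataset_from_directory(
    "tongue_photos/", image_size=(224, 224), batch_size=32)
model.fit(train_ds, epochs=6)  # the study trained only 6 epochs, CPU-only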


Diagnostics ◽  
2020 ◽  
Vol 10 (10) ◽  
pp. 744
Author(s):  
Krit Sriporn ◽  
Cheng-Fa Tsai ◽  
Chia-En Tsai ◽  
Paohsi Wang

Medical tools used to bolster decision-making by medical specialists who offer malaria treatment include image processing equipment and computer-aided diagnostic systems. With these methods, malaria images can be employed to identify and detect malaria and to monitor the symptoms of malaria patients, although atypical cases may need more time for assessment. This research used 7000 images to verify and analyse the Xception, Inception-V3, ResNet-50, NasNetMobile, VGG-16 and AlexNet models. These are prevalent convolutional neural network models for precise image classification; a rotation-based augmentation method was used to improve performance on the training and validation datasets. In the evaluation of these models for classifying malaria disease from thin blood smear images, Xception, using the state-of-the-art activation function (Mish) and optimizer (Nadam), proved the most effective. In terms of performance, a combined score of 99.28% was achieved for recall, accuracy, precision, and the F1 measure. A further 10% of images, outside the training and testing datasets, were then evaluated with this model. Notable aspects for improving a computer-aided diagnostic system towards an optimal malaria detection approach were identified, supported by a 98.86% accuracy level.
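To make the reported combination concrete, here is a minimal sketch assuming a two-class thin-smear classifier: an Xception backbone compiled with the Nadam optimizer and a manually defined Mish activation (x · tanh(softplus(x))) in the head. The head layout, class count, and input size are illustrative assumptions, not the authors' exact architecture.

# Sketch: Xception backbone with a Mish-activated head and the Nadam
# optimizer. Inputs are assumed already preprocessed with
# tf.keras.applications.xception.preprocess_input.
import tensorflow as tf

def mish(x):
    # Mish activation: x * tanh(softplus(x))
    return x * tf.math.tanh(tf.math.softplus(x))

base = tf.keras.applications.Xception(
    weights="imagenet", include_top=False, input_shape=(299, 299, 3))

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation=mish),      # Mish-activated head
    tf.keras.layers.Dense(2, activation="softmax"),   # infected / uninfected
])
model.compile(optimizer=tf.keras.optimizers.Nadam(),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])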


Author(s):  
В.А. Пятакович ◽  
В.Ф. Рычкова ◽  
Н.Г. Левченко

Models of neural and neuro-fuzzy network comparison criteria for diagnostics and pattern classification tasks. A set of criteria for evaluating the properties of artificial neural and neuro-fuzzy networks is proposed. It includes criteria of diversity, overfitting, elasticity, equifinality, robustness to noise, and emergence, as well as prescribed monotonicity for neural model construction. Applying the proposed criteria in practice makes it possible to automate the construction, analysis, and comparison of neural models for solving diagnostics and pattern classification problems. A solution is also proposed for increasing the efficiency of the parametric synthesis of neural network models of complex systems, supporting informed decisions on the classification of underwater targets. The scientific novelty of the work is that, for the first time, a set of criterion models is proposed that characterizes such properties of neural and neuro-fuzzy networks as diversity, overfitting, elasticity, equifinality, robustness to noise, and emergence, which makes it possible to automate the analysis of properties and the comparison of neural network and neuro-fuzzy models in diagnostics and pattern classification tasks. The paper thus solves the pressing problem of automating the analysis and comparison of neural network models.


2015 ◽  
Vol 8 (9) ◽  
pp. 2815-2827 ◽  
Author(s):  
S. Xu ◽  
X. Huang ◽  
L.-Y. Oey ◽  
F. Xu ◽  
H. Fu ◽  
...  

Abstract. Graphics processing units (GPUs) are an attractive solution in many scientific applications due to their high performance. However, most existing GPU conversions of climate models use GPUs for only a few computationally intensive regions. In the present study, we redesign the mpiPOM (a parallel version of the Princeton Ocean Model) with GPUs. Specifically, we first convert the model from its original Fortran form to a new Compute Unified Device Architecture C (CUDA-C) code, then we optimize the code on each of the GPUs, the communications between the GPUs, and the I/O between the GPUs and the central processing units (CPUs). We show that the performance of the new model on a workstation containing four GPUs is comparable to that on a powerful cluster with 408 standard CPU cores, and it reduces the energy consumption by a factor of 6.8.


2020 ◽  
Author(s):  
Ryan N Gutenkunst

Extracting insight from population genetic data often demands computationally intensive modeling. dadi is a popular program for fitting models of demographic history and natural selection to such data. Here, I show that running dadi on a Graphics Processing Unit (GPU) can speed computation by orders of magnitude compared to the CPU implementation, with minimal user burden. This speed increase enables the analysis of more complex models, which motivated the extension of dadi to four- and five-population models. Remarkably, dadi performs almost as well on inexpensive consumer-grade GPUs as on expensive server-grade GPUs. GPU computing thus offers large and accessible benefits to the community of dadi users. This functionality is available in dadi version 2.1.0.
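For context, a typical dadi fit changes very little when moved to the GPU; the sketch below follows the standard dadi workflow with a single-call GPU toggle, reflecting the paper's claim of minimal user burden. The data file name and model choice are illustrative assumptions, and the toggle name follows dadi 2.1.0's documentation (treat it as an assumption if your version differs).

# Sketch: a standard dadi demographic fit with GPU execution switched on.
import dadi

dadi.cuda_enabled(True)  # route computation onto the GPU (dadi >= 2.1.0)

# Hypothetical frequency-spectrum file and a stock two-epoch model.
fs = dadi.Spectrum.from_file("example.fs")
func = dadi.Demographics1D.two_epoch
func_ex = dadi.Numerics.make_extrap_log_func(func)

# Fit (nu, T) by maximum likelihood, extrapolating over grid sizes.
popt = dadi.Inference.optimize_log(
    [1.0, 0.1], fs, func_ex, pts=[40, 50, 60],
    lower_bound=[1e-2, 0], upper_bound=[100, 10])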

