An Improved Back-Projection Algorithm for GNSS-R BSAR Imaging Based on CPU and GPU Platform

2021
Vol 13 (11)
pp. 2107
Author(s):
Shiyu Wu
Zhichao Xu
Feng Wang
Dongkai Yang
Gongjian Guo

Global Navigation Satellite System Reflectometry Bistatic Synthetic Aperture Radar (GNSS-R BSAR) is becoming increasingly important in remote sensing because of its low power, low mass, low cost, and real-time global coverage capability. The Back Projection Algorithm (BPA) is usually selected as the GNSS-R BSAR imaging algorithm because it can process echo signals from complex geometric configurations. However, its huge computational cost is a challenge for its application in GNSS-R BSAR. Graphics Processing Units (GPUs) provide an efficient computing platform for GNSS-R BSAR processing. In this paper, a solution accelerating the BPA of GNSS-R BSAR using a GPU is proposed to improve imaging efficiency, and a matching pre-processing program is proposed to synchronize the direct and echo signals to improve imaging quality. To process the hundreds of gigabytes of data collected over a long synthetic aperture in fixed-station mode, a stream processing structure is used to overcome the limited GPU memory. To improve imaging efficiency, the imaging task is divided into pre-processing and the BPA, which are performed on the Central Processing Unit (CPU) and GPU, respectively, and a pixel-oriented parallel processing method is adopted in the back projection to avoid memory access conflicts caused by the large data volume. The improved BPA with a long synthetic aperture time is verified through simulations and experiments with the GPS L5 signal. The results show that the proposed acceleration solution produces a 600 m × 600 m image with an 1800 s synthetic aperture time in approximately 128.04 s, about 156 times faster than the pure CPU framework, while retaining the same imaging quality as the existing processing solution.
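As a rough illustration of the pixel-oriented parallel scheme described above, the following CUDA sketch assigns one thread per image pixel and lets each thread integrate over all pulses, so no two threads ever write the same output cell. It is a minimal sketch under assumed names (PulseGeom, rangeComp, and the geometry parameters) and a simplified flat-ground geometry, not the authors' implementation; the CPU pre-processing and the stream processing of pulse chunks are omitted.

```cuda
// Minimal sketch of pixel-oriented parallel back projection (not the authors' code).
// One thread per image pixel; each thread integrates over all slow-time pulses, so
// threads never write to the same output cell and no atomic operations are needed.
#include <cuda_runtime.h>
#include <cuComplex.h>

struct PulseGeom {        // hypothetical per-pulse geometry produced by the CPU pre-processing
    float3 tx;            // GNSS transmitter position at this pulse
    float3 rx;            // receiver position at this pulse
};

__global__ void backproject(const cuFloatComplex* rangeComp,  // range-compressed echo [nPulses x nRange]
                            const PulseGeom* geom,
                            cuFloatComplex* image,             // output image [ny x nx]
                            int nx, int ny, int nPulses, int nRange,
                            float x0, float y0, float dx, float dy,
                            float rangeRes, float r0, float wavelength)
{
    int ix = blockIdx.x * blockDim.x + threadIdx.x;
    int iy = blockIdx.y * blockDim.y + threadIdx.y;
    if (ix >= nx || iy >= ny) return;

    float3 p = make_float3(x0 + ix * dx, y0 + iy * dy, 0.0f);   // ground-plane pixel
    cuFloatComplex acc = make_cuFloatComplex(0.0f, 0.0f);

    for (int k = 0; k < nPulses; ++k) {
        float rt = norm3df(geom[k].tx.x - p.x, geom[k].tx.y - p.y, geom[k].tx.z - p.z);
        float rr = norm3df(geom[k].rx.x - p.x, geom[k].rx.y - p.y, geom[k].rx.z - p.z);
        float bistatic = rt + rr;                            // bistatic range for this pixel and pulse
        int bin = (int)roundf((bistatic - r0) / rangeRes);   // nearest range bin
        if (bin < 0 || bin >= nRange) continue;
        float phase = 6.2831853f * bistatic / wavelength;    // carrier-phase compensation (sign convention may differ)
        cuFloatComplex w = make_cuFloatComplex(cosf(phase), sinf(phase));
        acc = cuCaddf(acc, cuCmulf(rangeComp[(size_t)k * nRange + bin], w));
    }
    image[(size_t)iy * nx + ix] = acc;
}

// Typical launch: dim3 block(16, 16); dim3 grid((nx+15)/16, (ny+15)/16);
// backproject<<<grid, block>>>(...);  A streaming variant for limited GPU memory would
// process pulses in chunks and add into the image buffer instead of overwriting it.
```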

Author(s):  
Liam Dunn
Patrick Clearwater
Andrew Melatos
Karl Wette

Abstract The F-statistic is a detection statistic used widely in searches for continuous gravitational waves with terrestrial, long-baseline interferometers. A new implementation of the F-statistic is presented which accelerates the existing "resampling" algorithm using graphics processing units (GPUs). The new implementation runs between 10 and 100 times faster than the existing implementation on central processing units without sacrificing numerical accuracy. The utility of the GPU implementation is demonstrated on a pilot narrowband search for four newly discovered millisecond pulsars in the globular cluster Omega Centauri using data from the second Laser Interferometer Gravitational-Wave Observatory observing run. The computational cost is 17.2 GPU-hours using the new implementation, compared to 1092 core-hours with the existing implementation.
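The resampling F-statistic algorithm spends much of its time on Fourier transforms of the resampled data streams, which is the part that maps naturally onto a GPU. The sketch below only illustrates that pattern with a batched cuFFT plan; it is not the implementation described in the paper, and the sizes (nSamples, nBatch) are placeholder assumptions.

```cuda
// Illustrative only (not the paper's GPU code): many equal-length FFTs over resampled
// time series can be planned and executed as one batched transform with cuFFT.
#include <cufft.h>
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const int nSamples = 1 << 20;   // length of each resampled series (assumed)
    const int nBatch   = 16;        // number of series processed together (assumed)

    cufftComplex* d_data;
    cudaMalloc(&d_data, sizeof(cufftComplex) * nSamples * nBatch);
    // ... resampled, heterodyned data would be copied or generated here ...

    cufftHandle plan;
    int n[1] = { nSamples };
    // One plan for all series: rank-1 transforms, contiguous layout, nBatch of them.
    cufftPlanMany(&plan, 1, n, nullptr, 1, nSamples, nullptr, 1, nSamples,
                  CUFFT_C2C, nBatch);
    cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);   // in-place forward FFTs
    cudaDeviceSynchronize();

    // The F-statistic would then be formed per frequency bin from the transformed
    // streams (omitted here).
    cufftDestroy(plan);
    cudaFree(d_data);
    printf("done\n");
    return 0;
}
```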


Author(s):  
Jan Svedin
Anders Bernland
Andreas Gustafsson
Eric Claar
John Luong

Abstract This paper describes a small unmanned aerial vehicle (UAV)-based synthetic aperture radar (SAR) system using low-cost radar (5–6 GHz), position (GNSS/RTK), and attitude (IMU) sensors for the generation of high-resolution images. Measurements using straight as well as highly curved flight trajectories and varying flight speeds are presented, showing range and cross-range lobe widths close to the theoretical limits. An analysis of the improvements obtained by using the attitude angles (roll, pitch, and yaw) to correct for the relative offsets in antenna position as the UAV moves is included. A capability to generate SAR images onboard with the back-projection algorithm has been implemented using a GPU-accelerated single-board computer. Generated images are transmitted to the ground over a Wi-Fi data link.
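The attitude correction mentioned above amounts to rotating a fixed lever arm from the body frame into the navigation frame before back projection. The following sketch shows one common way to do this with a roll/pitch/yaw (ZYX) rotation; the variable names and numerical values are illustrative assumptions, not taken from the paper.

```cuda
// Minimal sketch (not the paper's code): correct each measured position using the UAV
// attitude. The fixed lever arm from the GNSS/RTK antenna to the radar antenna, known in
// the body frame, is rotated into the navigation frame and added to the measured position.
#include <cmath>
#include <cstdio>

struct Vec3 { double x, y, z; };

// Body-to-navigation rotation, ZYX convention: yaw, then pitch, then roll.
Vec3 rotateBodyToNav(Vec3 v, double roll, double pitch, double yaw) {
    double cr = cos(roll),  sr = sin(roll);
    double cp = cos(pitch), sp = sin(pitch);
    double cy = cos(yaw),   sy = sin(yaw);
    Vec3 r;
    r.x = (cy*cp)*v.x + (cy*sp*sr - sy*cr)*v.y + (cy*sp*cr + sy*sr)*v.z;
    r.y = (sy*cp)*v.x + (sy*sp*sr + cy*cr)*v.y + (sy*sp*cr - cy*sr)*v.z;
    r.z = (-sp)  *v.x + (cp*sr)            *v.y + (cp*cr)            *v.z;
    return r;
}

int main() {
    Vec3 gnssPos  = {12.3, -4.5, 30.0};     // measured GNSS/RTK antenna position (assumed values)
    Vec3 leverArm = {0.15, 0.00, -0.05};    // radar antenna offset in the body frame (assumed)
    double roll = 0.05, pitch = -0.02, yaw = 1.40;   // IMU attitude in radians (assumed)

    Vec3 off = rotateBodyToNav(leverArm, roll, pitch, yaw);
    printf("corrected antenna position: %.3f %.3f %.3f\n",
           gnssPos.x + off.x, gnssPos.y + off.y, gnssPos.z + off.z);
    return 0;
}
```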


2016
Vol 26 (03)
pp. 1650013
Author(s):
Omar Abdelkafi
Lhassane Idoumghar
Julien Lepagnot

The computational power requirements of real-world optimization problems begin to exceed the general performance of the Central Processing Unit (CPU). The modeling of such problems is in constant evolution and requires more computational power. Solving them is expensive in computation time, and even metaheuristics, well known for their efficiency, begin to be unsuitable for the increasing amount of data. Recently, thanks to the advent of languages such as CUDA, the development of parallel metaheuristics on Graphics Processing Unit (GPU) platforms to solve combinatorial problems such as the Quadratic Assignment Problem (QAP) has received growing interest. The QAP is one of the most studied NP-hard problems and is known for its high computational cost. In this paper, we survey several of the most important metaheuristic approaches for the QAP, focusing on parallel metaheuristics using the GPU.
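A recurring building block in GPU metaheuristics for the QAP is the parallel evaluation of the 2-exchange neighbourhood using the standard O(n) delta-cost formula per swap. The kernel below is a generic, hypothetical sketch of that idea (one thread per candidate swap), not code from any of the surveyed papers.

```cuda
// Hypothetical sketch: evaluate the cost change of every swap (r, s) of a QAP permutation p
// in parallel, with flow matrix f and distance matrix d stored row-major. The best move is
// then selected on the host or by a reduction kernel.
#include <cuda_runtime.h>

__global__ void qapDeltaKernel(const int* f, const int* d, const int* p,
                               int n, long long* delta /* n*n, only r < s filled */)
{
    int r = blockIdx.y * blockDim.y + threadIdx.y;
    int s = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= n || s >= n || r >= s) return;

    int pr = p[r], ps = p[s];
    long long val = (long long)f[r*n+r] * (d[ps*n+ps] - d[pr*n+pr])
                  + (long long)f[r*n+s] * (d[ps*n+pr] - d[pr*n+ps])
                  + (long long)f[s*n+r] * (d[pr*n+ps] - d[ps*n+pr])
                  + (long long)f[s*n+s] * (d[pr*n+pr] - d[ps*n+ps]);
    for (int k = 0; k < n; ++k) {
        if (k == r || k == s) continue;
        int pk = p[k];
        val += (long long)f[k*n+r] * (d[pk*n+ps] - d[pk*n+pr])
             + (long long)f[k*n+s] * (d[pk*n+pr] - d[pk*n+ps])
             + (long long)f[r*n+k] * (d[ps*n+pk] - d[pr*n+pk])
             + (long long)f[s*n+k] * (d[pr*n+pk] - d[ps*n+pk]);
    }
    delta[r*n + s] = val;
}
```

A typical launch would use a 2D grid, e.g. `dim3 block(16,16); dim3 grid((n+15)/16,(n+15)/16);`, so the whole neighbourhood of size n(n-1)/2 is evaluated in one kernel call.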


2021
Vol 119
pp. 07002
Author(s):
Youness Rtal
Abdelkader Hadjoudja

Graphics Processing Units (GPUs) are microprocessors on graphics cards dedicated to displaying and manipulating graphics data, and they are now present in all modern graphics cards. Within a few years, these microprocessors have become potent tools for massively parallel computing. They are practical instruments in several fields, such as image processing, video and audio encoding and decoding, and the solution of physical systems with one or more unknowns. Their advantages are faster processing and lower energy consumption than the central processing unit (CPU). In this paper, we define and implement the Lagrange polynomial interpolation method on the GPU and the CPU to calculate the sodium density at different temperatures Ti, using the NVIDIA CUDA C parallel programming model, which can increase computational performance by harnessing the power of the GPU. The objective of this study is to compare the performance of the Lagrange interpolation method implemented on CPU and GPU processors and to deduce the efficiency of using GPUs for parallel computing.
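A minimal sketch of the kind of comparison described above: each GPU thread evaluates the Lagrange interpolating polynomial at one query temperature from tabulated (Ti, density) support points. The support values and problem sizes are illustrative assumptions, not the paper's data; the same `lagrange()` routine can be timed in a serial CPU loop for the comparison.

```cuda
// Sketch: Lagrange polynomial interpolation of sodium density, one query point per thread.
#include <cuda_runtime.h>
#include <cstdio>

__host__ __device__
double lagrange(const double* T, const double* rho, int n, double t) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        double L = 1.0;                       // i-th Lagrange basis polynomial at t
        for (int j = 0; j < n; ++j)
            if (j != i) L *= (t - T[j]) / (T[i] - T[j]);
        sum += rho[i] * L;
    }
    return sum;
}

__global__ void lagrangeKernel(const double* T, const double* rho, int n,
                               const double* query, double* out, int m) {
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k < m) out[k] = lagrange(T, rho, n, query[k]);
}

int main() {
    const int n = 4;                                     // tabulated support points (assumed)
    double hT[n]   = { 400.0, 600.0, 800.0, 1000.0 };    // temperatures in K
    double hRho[n] = { 919.0, 874.0, 828.0,  781.0 };    // approximate liquid-sodium densities, kg/m^3
    const int m = 1024;                                  // number of query temperatures
    double *hQ = new double[m], *hOut = new double[m];
    for (int k = 0; k < m; ++k) hQ[k] = 400.0 + 600.0 * k / (m - 1);

    double *dT, *dRho, *dQ, *dOut;
    cudaMalloc(&dT, n*sizeof(double));  cudaMalloc(&dRho, n*sizeof(double));
    cudaMalloc(&dQ, m*sizeof(double));  cudaMalloc(&dOut, m*sizeof(double));
    cudaMemcpy(dT, hT, n*sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dRho, hRho, n*sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dQ, hQ, m*sizeof(double), cudaMemcpyHostToDevice);

    lagrangeKernel<<<(m + 255)/256, 256>>>(dT, dRho, n, dQ, dOut, m);
    cudaMemcpy(hOut, dOut, m*sizeof(double), cudaMemcpyDeviceToHost);
    printf("rho(%.0f K) ~ %.1f kg/m^3\n", hQ[m/2], hOut[m/2]);

    cudaFree(dT); cudaFree(dRho); cudaFree(dQ); cudaFree(dOut);
    delete[] hQ; delete[] hOut;
    return 0;
}
```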


SIMULATION
2016
Vol 93 (1)
pp. 69-84
Author(s):
Shailesh Tamrakar
Paul Richmond
Roshan M D’Souza

Agent-based models (ABMs) are increasingly being used to study population dynamics in complex systems, such as the human immune system. Previously, Folcik et al. (The basic immune simulator: an agent-based model to study the interactions between innate and adaptive immunity. Theor Biol Med Model 2007; 4: 39) developed a Basic Immune Simulator (BIS) and implemented it using the Recursive Porous Agent Simulation Toolkit (RePast) ABM simulation framework. However, frameworks such as RePast are designed to execute serially on central processing units and therefore cannot efficiently handle large model sizes. In this paper, we report on our implementation of the BIS using FLAME GPU, a parallel computing ABM simulator designed to execute on graphics processing units. To benchmark our implementation, we simulate the response of the immune system to a viral infection of generic tissue cells. We compare our results with those obtained from the original RePast implementation to verify statistical accuracy. We observe that our implementation has a 13× performance advantage over the original RePast implementation.
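ABMs such as the BIS map well to GPUs because every agent applies the same state-transition rule, so one thread can update one agent and millions of agents advance per step in parallel. The kernel below is a generic illustration of that pattern with a made-up cell-infection rule; it is neither the FLAME GPU API nor the BIS model itself.

```cuda
// Generic illustration of a per-agent update step (hypothetical rule, not the BIS model).
#include <cuda_runtime.h>

enum CellState { HEALTHY = 0, INFECTED = 1, DEAD = 2 };

struct Cell {
    int   state;
    float virusLoad;   // local viral concentration seen by this cell (assumed field)
};

__global__ void stepCells(Cell* cells, int nCells, float infectThreshold,
                          float replicationRate, float lethalLoad)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= nCells) return;

    Cell c = cells[i];
    if (c.state == HEALTHY && c.virusLoad > infectThreshold) {
        c.state = INFECTED;                       // exposure above threshold infects the cell
    } else if (c.state == INFECTED) {
        c.virusLoad *= replicationRate;           // virus replicates inside infected cells
        if (c.virusLoad > lethalLoad) c.state = DEAD;
    }
    cells[i] = c;
}
```

In a framework such as FLAME GPU, interactions between agents (for example, local viral spread) are typically mediated by the framework's messaging mechanisms rather than by direct access to neighbouring agents' memory.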


2010
Vol 133 (2)
Author(s):
Tobias Brandvik
Graham Pullan

A new three-dimensional Navier–Stokes solver for flows in turbomachines has been developed. The new solver is based on the latest version of the Denton codes but has been implemented to run on graphics processing units (GPUs) instead of the traditional central processing unit. The change in processor enables an order-of-magnitude reduction in run-time due to the higher performance of the GPU. Scaling results for a 16-node GPU cluster are also presented, showing almost linear scaling for typical turbomachinery cases. For validation purposes, a test case consisting of a three-stage turbine with complete hub and casing leakage paths is described. Good agreement is obtained with previously published experimental results. The simulation runs in less than 10 min on a cluster with four GPUs.
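Structured-grid flow solvers of this kind update each cell from a small, fixed neighbourhood, which is exactly the access pattern GPUs execute efficiently. The kernel below is a deliberately simplified, generic stencil update standing in for a flux-balance step; it has no connection to the Denton code base.

```cuda
// Illustrative stencil update: one thread per interior cell of a 2D array, reading only
// the four face neighbours (a stand-in for a real flux-balance residual).
#include <cuda_runtime.h>

__global__ void smooth2D(const float* in, float* out, int ni, int nj)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i <= 0 || j <= 0 || i >= ni - 1 || j >= nj - 1) return;   // skip boundary cells

    int idx = j * ni + i;
    out[idx] = 0.25f * (in[idx - 1] + in[idx + 1] + in[idx - ni] + in[idx + ni]);
}

// Typical launch for a 1024 x 1024 block of cells:
//   dim3 block(16, 16);
//   dim3 grid((1024 + 15) / 16, (1024 + 15) / 16);
//   smooth2D<<<grid, block>>>(d_in, d_out, 1024, 1024);
```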


Author(s):  
Ana Moreton-Fernandez
Hector Ortega-Arranz
Arturo Gonzalez-Escribano

Nowadays the use of hardware accelerators, such as graphics processing units or Xeon Phi coprocessors, is key to solving computationally costly problems that require high-performance computing. However, programming efficient solutions for these kinds of devices is a very complex task that relies on the manual management of memory transfers and configuration parameters. The programmer has to carry out a deep study of the particular data that needs to be computed at each moment, across different computing platforms, while also considering architectural details. We introduce the controller concept as an abstract entity that allows the programmer to easily manage communications and kernel-launching details on hardware accelerators in a transparent way. This model also provides the possibility of defining and launching central processing unit kernels on multi-core processors with the same abstraction and methodology used for the accelerators. It internally combines different native programming models and technologies to exploit the potential of each kind of device. Additionally, the model allows the programmer to simplify the proper selection of values for several configuration parameters that can be chosen when a kernel is launched. This is done through a qualitative characterization process of the kernel code to be executed. Finally, we present the implementation of the controller model in a prototype library, together with its application in several case studies. Its use has led to reductions in development and porting costs, with significantly low overheads in execution times when compared to manually programmed and optimized solutions that directly use CUDA and OpenMP.
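As a rough sketch of what a controller-style abstraction can look like on the host side, assuming CUDA as the backing technology, the object below hides allocation, transfers, and launch configuration behind a small interface. The class and method names are invented for illustration and do not reproduce the authors' library.

```cuda
// Hypothetical, minimal controller-style wrapper: calling code never touches
// cudaMalloc/cudaMemcpy or kernel-launch syntax directly.
#include <cuda_runtime.h>
#include <cstddef>

__global__ void scaleKernel(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

class Controller {
public:
    explicit Controller(int device) { cudaSetDevice(device); }

    float* upload(const float* host, size_t n) {          // allocate + copy to device
        float* dev = nullptr;
        cudaMalloc(&dev, n * sizeof(float));
        cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);
        return dev;
    }
    void download(float* host, const float* dev, size_t n) {
        cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    }
    void runScale(float* dev, int n, float factor, int blockSize = 256) {
        scaleKernel<<<(n + blockSize - 1) / blockSize, blockSize>>>(dev, n, factor);
        cudaDeviceSynchronize();                           // launch configuration chosen here
    }
    void release(float* dev) { cudaFree(dev); }
};
```

In the model described in the abstract, the same kind of interface would also dispatch CPU kernels and pick configuration parameters from a characterization of the kernel; this sketch shows only the memory-and-launch encapsulation idea.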


PLoS ONE
2021
Vol 16 (8)
pp. e0255397
Author(s):
Moritz Knolle
Georgios Kaissis
Friederike Jungmann
Sebastian Ziegelmayer
Daniel Sasse
...  

The success of deep learning in recent years has arguably been driven by the availability of large datasets for training powerful predictive algorithms. In medical applications, however, the sensitive nature of the data limits the collection and exchange of large-scale datasets. Privacy-preserving and collaborative learning systems can enable the successful application of machine learning in medicine. However, collaborative protocols such as federated learning require the frequent transfer of parameter updates over a network. To enable the deployment of such protocols to a wide range of systems with varying computational performance, efficient deep learning architectures for resource-constrained environments are required. Here we present MoNet, a small, highly optimized neural-network-based segmentation algorithm leveraging efficient multi-scale image features. MoNet is a shallow, U-Net-like architecture based on repeated, dilated convolutions with decreasing dilation rates. We apply and test our architecture on the challenging clinical tasks of pancreatic segmentation in computed tomography (CT) images as well as brain tumor segmentation in magnetic resonance imaging (MRI) data. We assess our model’s segmentation performance and demonstrate that it performs on par with the compared architectures while offering superior out-of-sample generalization, outperforming larger architectures on an independent validation set with significantly fewer parameters. We furthermore confirm the suitability of our architecture for federated learning applications by demonstrating a substantial reduction in the serialized model storage requirement as a surrogate for network data transfer. Finally, we evaluate MoNet’s inference latency on the central processing unit (CPU) to determine its utility in environments without access to graphics processing units. Our implementation is publicly available as free and open-source software.
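The core operation of the architecture described above is the dilated convolution, in which the 3×3 taps are spread apart by a dilation rate so that the receptive field grows without adding parameters. The single-channel CUDA kernel below is a hedged illustration of that operation only, not the MoNet implementation (which stacks many such layers over multiple channels).

```cuda
// Illustrative single-channel 3x3 convolution with a configurable dilation rate.
#include <cuda_runtime.h>

__global__ void dilatedConv3x3(const float* in, const float* w /* 9 weights */,
                               float* out, int height, int width, int dilation)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float acc = 0.0f;
    for (int ky = -1; ky <= 1; ++ky) {
        for (int kx = -1; kx <= 1; ++kx) {
            int sx = x + kx * dilation;              // dilation spreads the 3x3 taps apart
            int sy = y + ky * dilation;
            if (sx < 0 || sy < 0 || sx >= width || sy >= height) continue;   // zero padding
            acc += in[sy * width + sx] * w[(ky + 1) * 3 + (kx + 1)];
        }
    }
    out[y * width + x] = acc;
}

// A MoNet-like block, as described in the abstract, would chain such layers with
// decreasing dilation rates (e.g. 8, 4, 2, 1) and a nonlinearity between them, while
// each layer still uses only 9 weights per input/output channel pair.
```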


Sensors
2019
Vol 19 (9)
pp. 2124
Author(s):
Yingzhong Tian
Xining Liu
Long Li
Wenbin Wang

Iterative closest point (ICP) is a method commonly used to perform scan matching and registration. Although it is a simple and robust algorithm, it is still computationally expensive, which is a crucial challenge especially in real-time applications such as the simultaneous localization and mapping (SLAM) problem. For these reasons, this paper presents a new method for accelerating ICP with an assisted intensity. Unlike conventional ICP, the proposed method reduces the computational cost and avoids divergence. An initial guess of the relative rigid-body transformation is computed with the assisted intensity, and a target function is proposed to determine the best initial transformation guess based on the statistics of the spatial distances and intensity residuals. The method also reduces the number of iterations: Anderson acceleration is utilized to increase the iteration speed, which performs better than the Picard iteration procedure. The proposed algorithm runs in real time on a single central processing unit (CPU) thread and is therefore suitable for robots with limited computational resources. To validate the approach, the proposed method is evaluated on the SEMANTIC3D.NET benchmark dataset. According to the comparative results, the proposed method achieves better accuracy and robustness than conventional ICP methods.
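Anderson acceleration can be grafted onto any fixed-point iteration x_{k+1} = g(x_k), such as the ICP update. The sketch below shows the depth-1 variant of the mixing rule on a toy map so the formula itself is visible; the real ICP step (correspondence search plus rigid-transform fit), the intensity-assisted initialization, and any safeguards used in the paper are all abstracted away.

```cuda
// Hedged sketch (not the paper's code): Anderson acceleration of depth 1 for a generic
// fixed-point iteration. In the accelerated ICP above, x would hold the 6-DoF transform
// parameters and g() would be one ICP alignment step; here g() is a toy placeholder.
#include <vector>
#include <cstdio>

using Vec = std::vector<double>;

static double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Placeholder fixed-point map; a real ICP step would re-associate points and solve for
// the best rigid transform here.
static Vec g(const Vec& x) {
    Vec y(x.size());
    for (size_t i = 0; i < x.size(); ++i) y[i] = 0.5 * x[i] + 1.0;   // toy contraction, fixed point = 2
    return y;
}

int main() {
    Vec x = {0, 0, 0, 0, 0, 0};          // initial transform parameters (toy values)
    Vec gxPrev, fPrev;

    for (int k = 0; k < 20; ++k) {
        Vec gx = g(x);
        Vec f(x.size());                  // residual f_k = g(x_k) - x_k
        for (size_t i = 0; i < x.size(); ++i) f[i] = gx[i] - x[i];

        Vec xNext = gx;                   // plain Picard step as the fallback
        if (k > 0) {
            Vec df(x.size());
            for (size_t i = 0; i < x.size(); ++i) df[i] = f[i] - fPrev[i];
            double denom = dot(df, df);
            if (denom > 1e-12) {
                double theta = dot(f, df) / denom;    // Anderson(1) mixing coefficient
                for (size_t i = 0; i < x.size(); ++i)
                    xNext[i] = gx[i] - theta * (gx[i] - gxPrev[i]);
            }
        }
        gxPrev = gx; fPrev = f; x = xNext;
    }
    printf("x[0] after acceleration: %.6f (toy fixed point is 2)\n", x[0]);
    return 0;
}
```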

