Modelling realistic ballast shape to study the lateral pull behaviour using GPU computing

EPJ Web of Conferences ◽

10.1051/epjconf/202124906003 ◽

2021 ◽

Vol 249 ◽

pp. 06003

Author(s):

François Nader ◽

Patrick Pizette ◽

Nicolin Govender ◽

Daniel N. Wilke ◽

Jean-François Ferellec

Keyword(s):

Particle Shape ◽

Gpu Computing ◽

Computational Cost ◽

Processing Unit ◽

Engineering Structures ◽

Stick Slip ◽

Pull Test ◽

Central Processing ◽

Polyhedral Shape ◽

Spherical Grains

The Discrete Element Method has proven to be an efficient way to model engineering structures built from granular materials and to study their response under various conditions. However, the computational cost of the simulations increases rapidly as the number of particles and the complexity of the particle shapes increase. An affordable way to make such problems computationally tractable is to use graphics processing units (GPUs) for computing. Modern GPUs offer up to 10,496 compute cores, which allows far greater parallelisation than the 32 cores offered by high-end central processing units (CPUs). This study outlines the application of BlazeDEM-GPU, using an RTX 2080Ti GPU (4352 cores), to investigate the influence of the modelling of particle shape on the lateral pull behaviour of granular ballast systems used in railway applications. The idea is to validate the model and show the benefits of simulating non-spherical shapes in future large-scale tests. The algorithm created to generate the ballast shapes from real grain scans, using polyhedral shape approximations of varying degrees of complexity, is presented. The particle size is modelled to scale. A preliminary investigation of the effect of grain shape is conducted, in which a sleeper lateral pull test is carried out in a sample of spherical grains and a sample of cubic grains. Preliminary results show that elementary polyhedral shape representations (cubic) recreate some of the characteristic responses in the lateral pull test, such as stick/slip phenomena and force chain distributions, which looks promising for future work on railway simulations. These responses cannot be recreated with simple spherical grains unless heuristics are added, which requires additional calibration and approximations. The significant reduction in time when using non-spherical grains also implies that larger granular systems can be investigated.


An efficient solution for fast generation of multi-GNSS real-time products

10.5194/egusphere-egu21-8306 ◽

2021 ◽

Author(s):

Hongjie Zheng ◽

Hanyu Chang ◽

Yongqiang Yuan ◽

Qingyun Wang ◽

Yuhao Li ◽

...

Keyword(s):

Data Processing ◽

Real Time ◽

Processing Time ◽

Efficient Solution ◽

Gpu Computing ◽

Sampling Rate ◽

Precise Orbit Determination ◽

Processing Unit ◽

Processing Efficiency ◽

Central Processing

Global navigation satellite systems (GNSS) play an indispensable role in providing positioning, navigation and timing (PNT) services to global users. Over the past few years, GNSS have developed rapidly, with abundant networks, modern constellations, and multi-frequency observations. To take full advantage of multi-constellation and multi-frequency GNSS, several new mathematical models have been developed, such as multi-frequency ambiguity resolution (AR) and uncombined data processing with raw observations. In addition, new GNSS products including the uncalibrated phase delay (UPD), the observable signal bias (OSB), and the integer recovery clock (IRC) have been generated and provided by analysis centers to support advanced GNSS applications. However, the increasing number of GNSS observations poses a great challenge to the fast generation of multi-constellation and multi-frequency products. In this study, we propose an efficient solution for the fast updating of multi-GNSS real-time products that makes full use of advanced computing techniques. Firstly, instead of the traditional vector operations, the "level-3 operations" (matrix by matrix) of the Basic Linear Algebra Subprograms (BLAS) are used as much as possible in the Least Square (LSQ) processing, which improves efficiency thanks to central processing unit (CPU) optimization and faster memory data transmission. Furthermore, most steps of multi-GNSS data processing are transformed from serial mode to parallel mode to take advantage of the multi-core CPU architecture and graphics processing unit (GPU) computing resources. Moreover, we choose the OpenBLAS library for matrix computation because it performs well in a parallel environment. The proposed method is then validated on a 3.30 GHz AMD CPU with 6 cores. The results demonstrate that the proposed method can substantially improve the processing efficiency of multi-GNSS product generation. For the precise orbit determination (POD) solution with 150 ground stations and 128 satellites (GPS/BDS/Galileo/GLONASS/QZSS) in ionosphere-free (IF) mode, the processing time can be shortened from 50 to 10 minutes, which can guarantee the hourly updating of multi-GNSS ultra-rapid orbit products. The processing time of uncombined POD can also be reduced by about 80%. Meanwhile, the multi-GNSS real-time clock products can easily be generated at a 5-second or even higher sampling rate. In addition, the processing efficiency of UPD and OSB products can be increased by a factor of 4 to 6.
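The contrast between per-observation vector operations and a single matrix-by-matrix product can be sketched for the weighted least-squares normal equations. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the problem size and the synthetic data are hypothetical, and NumPy is used here in place of direct BLAS calls (its `@` operator is typically dispatched to an optimized BLAS routine when NumPy is linked against one).

```python
import numpy as np

def normal_equations_rowwise(A, w, y):
    """Accumulate N = A^T W A and b = A^T W y one observation at a time."""
    n_params = A.shape[1]
    N = np.zeros((n_params, n_params))
    b = np.zeros(n_params)
    for a_i, w_i, y_i in zip(A, w, y):
        N += w_i * np.outer(a_i, a_i)   # rank-1 update per observation
        b += w_i * y_i * a_i
    return N, b

def normal_equations_blocked(A, w, y):
    """Form the same N and b with one matrix-matrix and one matrix-vector product."""
    Aw = A * w[:, None]                 # scale each row by its weight
    return A.T @ Aw, Aw.T @ y

# Hypothetical synthetic problem: 2000 observations, 30 parameters, no noise.
rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 30))
x_true = rng.standard_normal(30)
w = rng.uniform(0.5, 2.0, size=2000)
y = A @ x_true

N1, b1 = normal_equations_rowwise(A, w, y)
N2, b2 = normal_equations_blocked(A, w, y)
x_hat = np.linalg.solve(N2, b2)         # least-squares estimate of the parameters
```

Both routines produce the same normal matrix; the blocked form simply hands the whole accumulation to one library call, which is where cache-friendly and multi-threaded BLAS kernels can help.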


Graphics processing unit implementation of the F-statistic for continuous gravitational wave searches

Classical and Quantum Gravity ◽

10.1088/1361-6382/ac4616 ◽

2021 ◽

Author(s):

Liam Dunn ◽

Patrick Clearwater ◽

Andrew Melatos ◽

Karl Wette

Keyword(s):

Gravitational Wave ◽

Graphics Processing Units ◽

Graphics Processing Unit ◽

Computational Cost ◽

Processing Unit ◽

Central Processing ◽

Long Baseline ◽

Using Data ◽

Graphics Processing ◽

Gpu Implementation

The F-statistic is a detection statistic used widely in searches for continuous gravitational waves with terrestrial, long-baseline interferometers. A new implementation of the F-statistic is presented which accelerates the existing "resampling" algorithm using graphics processing units (GPUs). The new implementation runs between 10 and 100 times faster than the existing implementation on central processing units without sacrificing numerical accuracy. The utility of the GPU implementation is demonstrated on a pilot narrowband search for four newly discovered millisecond pulsars in the globular cluster Omega Centauri using data from the second Laser Interferometer Gravitational-Wave Observatory observing run. The computational cost is 17.2 GPU-hours using the new implementation, compared to 1092 core-hours with the existing implementation.


Intensity-Assisted ICP for Fast Registration of 2D-LIDAR

Sensors ◽

10.3390/s19092124 ◽

2019 ◽

Vol 19 (9) ◽

pp. 2124 ◽

Cited By ~ 3

Author(s):

Yingzhong Tian ◽

Xining Liu ◽

Long Li ◽

Wenbin Wang

Keyword(s):

Real Time ◽

Computational Cost ◽

Target Function ◽

Picard Iteration ◽

Processing Unit ◽

Central Processing ◽

Localization And Mapping ◽

Initial Transformation ◽

Comparative Results ◽

Rigid Body Transformation

Iterative closest point (ICP) is a method commonly used to perform scan-matching and registration. Although it is a simple and robust algorithm, it is still computationally expensive, which is a crucial challenge in real-time applications such as the simultaneous localization and mapping (SLAM) problem. For these reasons, this paper presents a new intensity-assisted method for accelerating ICP. Unlike conventional ICP, the method is designed to reduce the computational cost and avoid divergence. An initial guess of the relative rigid-body transformation is computed with the help of intensity. Moreover, a target function is proposed to determine the best initial transformation guess based on statistics of the spatial distances and intensity residuals. The method also aims to reduce the number of iterations: Anderson acceleration is used to speed up the iteration, as it performs better than the Picard iteration procedure. The proposed algorithm runs in real time on a single central processing unit (CPU) thread. Hence, it is suitable for robots with limited computational resources. The proposed method is evaluated on the SEMANTIC3D.NET benchmark dataset. According to the comparative results, the proposed method has better accuracy and robustness than conventional ICP methods.
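The difference between Picard iteration and Anderson acceleration can be seen on a toy scalar fixed-point problem. The sketch below is not the paper's ICP solver: the map `g(x) = cos(x)`, the history depth of one, the tolerance and all function names are assumptions chosen only to keep the example small.

```python
import math

def picard(g, x0, tol=1e-10, max_iter=500):
    """Plain fixed-point (Picard) iteration x <- g(x); returns (x, evaluations of g)."""
    x = x0
    for k in range(1, max_iter + 1):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new, k
        x = x_new
    return x, max_iter

def anderson_depth1(g, x0, tol=1e-10, max_iter=500):
    """Anderson acceleration with history depth 1; returns (x, evaluations of g)."""
    g_prev = g(x0)
    f_prev = g_prev - x0              # residual of the first iterate
    x = g_prev                        # the first step is a plain Picard step
    for k in range(2, max_iter + 1):
        g_x = g(x)
        f = g_x - x                   # current residual
        if abs(f) < tol:
            return g_x, k
        denom = f - f_prev
        if denom == 0.0:
            x_new = g_x               # fall back to a Picard step
        else:
            gamma = f / denom         # minimises |f - gamma * (f - f_prev)|
            x_new = g_x - gamma * (g_x - g_prev)
        g_prev, f_prev = g_x, f
        x = x_new
    return x, max_iter

x_picard, n_picard = picard(math.cos, 1.0)
x_anderson, n_anderson = anderson_depth1(math.cos, 1.0)
```

Both loops reach the same fixed point of `cos`, but the accelerated version reuses the previous residual to extrapolate and needs far fewer evaluations of `g`. In an ICP setting each evaluation of `g` would correspond to a full matching-and-alignment step, which is why cutting the iteration count matters.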


CPU AND GPU (CUDA) TEMPLATE MATCHING COMPARISON / CPU IR GPU (CUDA) PALYGINIMAS VYKDANT ŠABLONŲ ATITIKTIES ALGORITMĄ

Mokslas - Lietuvos ateitis ◽

10.3846/mla.2014.16 ◽

2014 ◽

Vol 6 (2) ◽

pp. 129-133

Author(s):

Evaldas Borcovas ◽

Gintautas Daunys

Keyword(s):

Template Matching ◽

Gpu Computing ◽

Computing Time ◽

Processing Unit ◽

Compute Unified Device Architecture ◽

Central Processing ◽

Device Architecture ◽

Cuda Technology ◽

Dual Core ◽

Template Size

Image processing, computer vision and other complicated optical information processing algorithms require large resources. It is often desired to execute algorithms in real time. It is hard to fulfill such requirements with a single CPU. The CUDA technology proposed by NVidia enables the programmer to use the GPU resources in the computer. The current research was carried out with an Intel Pentium Dual-Core T4500 2.3 GHz processor with 4 GB of DDR3 RAM (CPU I), an NVidia GeForce GT320M CUDA-compatible graphics card (GPU I), an Intel Core I5-2500K 3.3 GHz processor with 4 GB of DDR3 RAM (CPU II), and an NVidia GeForce GTX 560 CUDA-compatible graphics card (GPU II). The additional libraries OpenCV 2.1 and the CUDA-compatible OpenCV 2.4.0 were used for the testing. The main tests were made with the standard function MatchTemplate from the OpenCV libraries. The algorithm uses a main image and a template, and the influence of both was tested. The main image and the template were resized, and the algorithm's computing time and performance in Gtpix/s were measured. According to the results, GPU computing on the hardware mentioned above is up to 24 times faster when processing a large amount of information. When the images are small, the performance of the CPU and the GPU is not significantly different. The choice of template size influences the computation on the CPU. The difference in computing time between the GPUs can be explained by the number of cores they have.

Summary (translated from Lithuanian): Image processing, computer vision and other complex algorithms that process optical information use large computing resources. These algorithms often need to run in real time. Solving this task with the capacity of a single CPU (central processing unit) alone is difficult. The CUDA (Compute Unified Device Architecture) technology proposed by nVidia makes it possible to use GPU (graphics processing unit) resources. Two different CPUs were chosen for the study, the Intel Pentium Dual-Core T4500 and the Intel Core I5 2500K, together with two GPUs, the nVidia GeForce GT320M and the NVidia GeForce 560. The image processing libraries OpenCV 2.1 and OpenCV 2.4 were used. The template matching algorithm was chosen for the study. The algorithm requires the image to be analysed and a template image of the object being sought. During the study, the image and template sizes were varied and their effect on the algorithm's execution time and the number of operations per second was observed. The results indicate that, when processing a large amount of data, the GPU executes the algorithm up to 24 times faster than the CPU alone. With a small amount of data, the difference between CPU and GPU is minimal. Comparing the two GPUs, the computing speed was observed to be directly proportional to the number of GPU cores. In our study the faster GPU had 16 times more cores, so the computations ran 16 times faster.
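A template matching pass of the kind benchmarked here can be sketched in a few lines. This is a plain-Python illustration rather than the OpenCV code the authors timed: the function name, the tiny test image and the choice of the sum of squared differences as the similarity measure are assumptions (OpenCV offers several measures, and the abstract does not say which one was used).

```python
def match_template_ssd(image, template):
    """Return (row, col, score) of the best match by sum of squared differences."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = None
    # Slide the template over every position where it fits entirely in the image.
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            ssd = 0
            for dr in range(th):
                for dc in range(tw):
                    diff = image[r + dr][c + dc] - template[dr][dc]
                    ssd += diff * diff
            # Keep the window with the smallest difference (first one wins ties).
            if best is None or ssd < best[2]:
                best = (r, c, ssd)
    return best

# Hypothetical 5 x 6 grayscale image containing the 2 x 2 template at row 2, column 4.
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 2, 0, 0, 0],
    [0, 3, 4, 0, 9, 8],
    [0, 0, 0, 0, 7, 6],
    [0, 0, 0, 0, 0, 0],
]
template = [[9, 8], [7, 6]]
best_row, best_col, best_score = match_template_ssd(image, template)
```

The work grows with the number of window positions times the template area, and every window is scored independently of the others. That independence is what lets a GPU assign windows to separate threads, and it is consistent with the reported pattern of large speedups on big images and little difference on small ones.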


Effective Implementation of Edge-Preserving Filtering on CPU Microarchitectures

Applied Sciences ◽

10.3390/app8101985 ◽

2018 ◽

Vol 8 (10) ◽

pp. 1985 ◽

Cited By ~ 5

Author(s):

Yoshihiro Maeda ◽

Norishige Fukushima ◽

Hiroshi Matsuo

Keyword(s):

Computational Cost ◽

Bilateral Filter ◽

Processing Unit ◽

Normal Numbers ◽

Edge Preserving ◽

Central Processing ◽

Local Means ◽

Kernel Weights ◽

Computational Performance ◽

Non Local

In this paper, we propose acceleration methods for edge-preserving filtering. The filters naturally produce denormalized numbers, which are defined in IEEE Standard 754. Processing denormalized numbers has a higher computational cost than processing normal numbers; thus, the computational performance of edge-preserving filtering is severely diminished. We propose approaches that prevent denormalized numbers from occurring and thereby accelerate the filters. Moreover, we verify an effective vectorization of edge-preserving filtering, based on changes in the microarchitectures of central processing units, by carefully treating kernel weights. The experimental results show that the proposed methods are up to five times faster than the straightforward implementation of bilateral filtering and non-local means filtering, while the filters maintain high accuracy. In addition, we show an effective vectorization for each central processing unit microarchitecture. The implementation of the bilateral filter is up to 14 times faster than that of OpenCV. The proposed methods and the vectorization are practical for real-time tasks such as image editing.
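How a kernel weight ends up in the denormalized range can be shown with a Gaussian range weight of the kind used in bilateral filtering. The sketch below is an assumption-laden illustration, not the paper's method: the function names and parameter values are hypothetical, it uses Python's double-precision floats (the range is reached much sooner in single precision), and replacing tiny weights with zero is only one possible way to avoid denormalized values.

```python
import math
import sys

SMALLEST_NORMAL = sys.float_info.min   # smallest positive normal double

def range_weight(diff, sigma):
    """Gaussian range-kernel weight exp(-diff^2 / (2 sigma^2))."""
    return math.exp(-(diff * diff) / (2.0 * sigma * sigma))

def is_subnormal(x):
    """True when x is non-zero but smaller in magnitude than any normal double."""
    return 0.0 < abs(x) < SMALLEST_NORMAL

def range_weight_flushed(diff, sigma):
    """Same weight, but values below the normal range are replaced by exactly zero."""
    w = range_weight(diff, sigma)
    return w if w >= SMALLEST_NORMAL else 0.0

w_raw = range_weight(190.0, 5.0)            # exp(-722): non-zero but denormalized
w_safe = range_weight_flushed(190.0, 5.0)   # replaced by 0.0
w_near = range_weight_flushed(10.0, 5.0)    # exp(-2): an ordinary normal number
```

Pure Python will not show the slowdown itself, since that comes from how hardware floating-point units handle denormalized operands. The example only shows where such values arise: a large intensity difference combined with a small `sigma` pushes the weight below the normal range without reaching zero.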


An Improved Back-Projection Algorithm for GNSS-R BSAR Imaging Based on CPU and GPU Platform

Remote Sensing ◽

10.3390/rs13112107 ◽

2021 ◽

Vol 13 (11) ◽

pp. 2107

Author(s):

Shiyu Wu ◽

Zhichao Xu ◽

Feng Wang ◽

Dongkai Yang ◽

Gongjian Guo

Keyword(s):

Graphics Processing Units ◽

Low Cost ◽

Computational Cost ◽

Satellite System ◽

Projection Algorithm ◽

Synthetic Aperture ◽

Processing Unit ◽

Back Projection ◽

Imaging Quality ◽

Central Processing

Global Navigation Satellite System Reflectometry Bistatic Synthetic Aperture Radar (GNSS-R BSAR) is becoming increasingly important in remote sensing because of its low power, low mass, low cost, and real-time global coverage capability. The Back Projection Algorithm (BPA) is usually selected as the GNSS-R BSAR imaging algorithm because it can process echo signals from complex geometric configurations. However, its huge computational cost is a challenge for its application in GNSS-R BSAR. Graphics Processing Units (GPUs) provide an efficient computing platform for GNSS-R BSAR processing. In this paper, a solution that accelerates the BPA of GNSS-R BSAR using a GPU is proposed to improve imaging efficiency, and a matching pre-processing program is proposed to synchronize direct and echo signals to improve imaging quality. To process the hundreds of gigabytes of data collected by a long-time synthetic aperture in fixed station mode, a stream processing structure is used to work around the limited GPU memory. To improve imaging efficiency, the imaging task is divided into pre-processing and BPA, which are performed on the Central Processing Unit (CPU) and the GPU, respectively, and a pixel-oriented parallel processing method is adopted in back projection to avoid memory access conflicts caused by excessive data volume. The improved BPA with a long synthetic aperture time is verified through simulation and through experiments on the GPS-L5 signal. The results show that the proposed acceleration takes approximately 128.04 s to produce a 600 m × 600 m image with an 1800 s synthetic aperture time, 156 times less than a pure CPU framework; in addition, the imaging quality of the existing processing solution is retained.


Implementation of Real Time Hybrid Simulation Based on GPU Computing

10.21203/rs.3.rs-596198/v1 ◽

2021 ◽

Author(s):

Zhenyun Tang ◽

Xiaohui Dong ◽

Zhenbao Li ◽

Xiuli Du

Keyword(s):

Real Time ◽

Degrees Of Freedom ◽

Gpu Computing ◽

Dynamic Performance ◽

Hybrid Simulation ◽

Shaking Table ◽

Processing Unit ◽

Engineering Structures ◽

Element Analysis ◽

Complex Engineering

By combining physical experiments with numerical simulation, real-time hybrid simulation (RTHS) can enlarge the dimensions of testing specimens and improve testing accuracy. However, because of limited computing capacity, the maximum number of degrees of freedom of the numerical substructure in reported RTHS tests is less than 2000. This cannot meet the testing requirements for evaluating the dynamic performance of large and complex engineering structures. Taking advantage of the Parallel Computing Toolbox (PCT) in MATLAB and the high-performance computing of graphics processing units (GPUs), an RTHS framework based on MATLAB and GPU was established in this work. Using this framework, a soil-structure interaction (SSI) system was tested by shaking-table-based RTHS. Meanwhile, the dynamic response of this SSI system was simulated by finite element analysis. The comparison of simulation and testing results demonstrated that the proposed testing framework can implement RTHS testing successfully. Using this method, the maximum number of degrees of freedom of the numerical substructure can reach 27,000, which significantly enhances the capacity of RTHS testing for large and complex engineering structures.


Accelerating computational discovery of porous solids through improved navigation of energy-structure-function maps

Science Advances ◽

10.1126/sciadv.abi4763 ◽

2021 ◽

Vol 7 (33) ◽

pp. eabi4763

Author(s):

Edward O. Pyzer-Knapp ◽

Linjiang Chen ◽

Graeme M. Day ◽

Andrew I. Cooper

Keyword(s):

Structure Function ◽

Computational Cost ◽

Bayesian Optimization ◽

Porous Solids ◽

Energy Structure ◽

Processing Unit ◽

Central Processing ◽

Property Data ◽

Computational Discovery ◽

The Cost

While energy-structure-function (ESF) maps are a powerful new tool for in silico materials design, the cost of acquiring an ESF map for many properties is too high for routine integration into high-throughput virtual screening workflows. Here, we propose the next evolution of the ESF map. This uses parallel Bayesian optimization to selectively acquire energy and property data, generating the same levels of insight at a fraction of the computational cost. We use this approach to obtain a two orders of magnitude speedup on an ESF study that focused on the discovery of molecular crystals for methane capture, saving more than 500,000 central processing unit hours from the original protocol. By accelerating the acquisition of insight from ESF maps, we pave the way for the use of these maps in automated ultrahigh-throughput screening pipelines by greatly reducing the opportunity risk associated with the choice of system to calculate.


DEVELOPING PARALLEL COMPUTING ALGORITHMS USING GPU’S TO DETERMINE OIL AND GAS RESERVES PRESENTED IN THE UPSTREAM (EXPLORATION) SECTOR

Proceedings of the International Conference on Emerging Trends in Engineering & Technology (IConETech-2020) ◽

10.47412/mruu5197 ◽

2020 ◽

Author(s):

Stefan Boodoo ◽

Ajay Joshi

Keyword(s):

High Performance ◽

Oil And Gas ◽

Gpu Computing ◽

Graphics Processing Unit ◽

Reservoir Rock ◽

Processing Unit ◽

Potential Wells ◽

Central Processing ◽

Rock Formations ◽

Graphics Processing

Oil and gas companies keep exploring every possible new method to increase the likelihood of finding a commercial hydrocarbon-bearing prospect. Well logging generates gigabytes of data from various probes and sensors. After processing, the data for a prospective reservoir indicate areas of oil, gas, water and reservoir rock. Incorporating High Performance Computing (HPC) methodologies will allow thousands of potential wells to be assessed for their hydrocarbon-bearing potential. This study presents the use of Graphics Processing Unit (GPU) computing as another method of analyzing probable reserves. Raw well log data from the Kansas Geological Society (1999-2018) forms the basis of the data analysis. Parallel algorithms are developed using Nvidia's Compute Unified Device Architecture (CUDA). The results show a 5 times speedup using an Nvidia GeForce GT 330M GPU compared to an Intel Core i7 740QM Central Processing Unit (CPU). The processed results display, by depth, areas of shale and rock formations as well as water, oil and/or gas reserves.


Implementation of Real Time Hybrid Simulation Based On GPU Computing

10.21203/rs.3.rs-596198/v2 ◽

2021 ◽

Author(s):

Zhenyun Tang ◽

Xiaohui Dong ◽

Zhenbao Li ◽

Xiuli Du

Keyword(s):

Real Time ◽

Degrees Of Freedom ◽

Gpu Computing ◽

Dynamic Performance ◽

Hybrid Simulation ◽

Shaking Table ◽

Processing Unit ◽

Engineering Structures ◽

Element Analysis ◽

Complex Engineering

By combining physical experiments with numerical simulation, real-time hybrid simulation (RTHS) can enlarge the dimensions of testing specimens and improve testing accuracy. However, because of limited computing capacity, the maximum number of degrees of freedom of the numerical substructure in reported RTHS tests is less than 7000. This cannot meet the testing requirements for evaluating the dynamic performance of large and complex engineering structures. Taking advantage of the Parallel Computing Toolbox (PCT) in MATLAB and the high-performance computing of graphics processing units (GPUs), an RTHS framework based on MATLAB and GPU was established in this work. Using this framework, a soil-structure interaction (SSI) system was tested by shaking-table-based RTHS. Meanwhile, the dynamic response of this SSI system was simulated by finite element analysis. The comparison of simulation and testing results demonstrated that the proposed testing framework can implement RTHS testing successfully. Using this method, the maximum number of degrees of freedom of the numerical substructure can reach 27,000, which significantly enhances the capacity of RTHS testing for large and complex engineering structures.
