parallel acceleration
Recently Published Documents

TOTAL DOCUMENTS: 95 (FIVE YEARS: 36)
H-INDEX: 10 (FIVE YEARS: 3)

2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
WenYu Feng ◽  
YuanFan Zhu ◽  
JunTai Zheng ◽  
Han Wang

YOLO-Tiny is a lightweight object detection model derived from the original “You only look once” (YOLO) model; it simplifies the network structure and reduces the number of parameters, which makes it suitable for real-time applications. Although the YOLO-Tiny series, which includes YOLOv3-Tiny and YOLOv4-Tiny, can achieve real-time performance on a powerful GPU, it remains challenging to leverage this approach for real-time object detection on embedded computing devices, such as those in small intelligent trajectory cars. To obtain real-time, high-accuracy performance on these embedded devices, a novel lightweight object detection network called embedded YOLO is proposed in this paper. First, a new backbone network structure, the ASU-SPP network, is proposed to enhance the effectiveness of low-level features. Then, we design a simplified neck network module, PANet-Tiny, that reduces computational complexity. Finally, in the detection head module, we use depthwise separable convolution to reduce the number of convolution stacks. In addition, the number of channels is reduced to 96 so that the module can exploit the parallel acceleration offered by most inference frameworks. With its lightweight design, the proposed embedded YOLO model has only 3.53 M parameters, and its average processing speed reaches 155.1 frames per second, as verified on the Baidu smart-car target detection task. At the same time, its detection accuracy is 6% higher than that of YOLOv3-Tiny and YOLOv4-Tiny.
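To make the head design concrete, the sketch below shows the depthwise separable convolution pattern on 96 channels mentioned in the abstract. It is a generic PyTorch illustration, not the authors' released code; the layer layout, activation choice, and 52×52 feature-map size are assumptions.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 convolution followed by a pointwise 1x1 convolution.
    The 96-channel width follows the abstract; everything else is illustrative."""
    def __init__(self, channels=96):
        super().__init__()
        # groups == in_channels makes the 3x3 convolution depthwise
        self.depthwise = nn.Conv2d(channels, channels, kernel_size=3,
                                   padding=1, groups=channels, bias=False)
        # 1x1 convolution mixes information across channels
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# A standard 3x3 convolution on 96 channels needs 96*96*3*3 = 82,944 weights;
# the depthwise separable version needs 96*3*3 + 96*96 = 10,080.
x = torch.randn(1, 96, 52, 52)
print(DepthwiseSeparableConv()(x).shape)   # torch.Size([1, 96, 52, 52])
```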


Author(s):  
Zhuo Ren ◽  
Yu Gu ◽  
Chuanwen Li ◽  
FangFang Li ◽  
Ge Yu

Abstract. Hyperspace hashing, which is often applied to NoSQL databases, builds indexes by mapping objects with multiple attributes to a multidimensional space. It can accelerate queries on secondary attributes, not just primary keys. In recent years, the rich computing resources of GPUs have provided an opportunity to implement high-performance hyperspace hashing. In this study, we construct GHSH, a fully concurrent, dynamic hyperspace hash table for the GPU. By using atomic operations instead of locking, we make our approach highly parallel and lock-free. We propose a special concurrency control strategy that ensures wait-free read operations. Our data structure is designed with GPU-specific hardware characteristics in mind. We also propose a warp-level pre-combination data-sharing strategy to obtain high parallel acceleration. Experiments on an NVIDIA RTX 2080 Ti GPU suggest that GHSH performs about 20-100x faster than its CPU counterpart. Specifically, GHSH performs updates at up to 396 M updates/s and processes search queries at up to 995 M queries/s. Compared to other GPU hash tables, which cannot conduct queries on non-key attributes, GHSH demonstrates comparable building and retrieval performance.
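As context for how hyperspace hashing serves secondary-attribute queries, here is a minimal single-threaded Python sketch of the idea: each attribute is hashed independently, and the tuple of per-attribute hashes is the bucket coordinate in a multidimensional grid. It is a CPU toy under assumed attribute names and bucket counts, not the lock-free GPU structure described above.

```python
from itertools import product

class HyperspaceHash:
    def __init__(self, attributes, buckets_per_dim=8):
        self.attributes = attributes      # e.g. ("key", "city", "age")
        self.n = buckets_per_dim
        self.table = {}                   # bucket coordinate -> list of objects

    def _coord(self, obj):
        # One hash per attribute; the tuple is the bucket's grid coordinate.
        return tuple(hash(obj[a]) % self.n for a in self.attributes)

    def insert(self, obj):
        self.table.setdefault(self._coord(obj), []).append(obj)

    def search(self, **known):
        # Fix the coordinate where an attribute value is known; enumerate all
        # positions where it is not. The buckets to scan form a hyperplane of
        # the grid rather than the whole table.
        dims = [
            [hash(known[a]) % self.n] if a in known else range(self.n)
            for a in self.attributes
        ]
        for coord in product(*dims):
            for obj in self.table.get(coord, []):
                if all(obj[a] == v for a, v in known.items()):
                    yield obj

# With 3 attributes and 8 buckets per dimension, a query that fixes one
# secondary attribute scans 64 of the 512 buckets instead of the full table.
h = HyperspaceHash(("key", "city", "age"))
h.insert({"key": 1, "city": "Oslo", "age": 30})
h.insert({"key": 2, "city": "Lund", "age": 41})
print(list(h.search(city="Lund")))
```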


2021 ◽  
Author(s):  
Tsubasa Kotani ◽  
Masatoshi Yamauchi ◽  
Hans Nilsson ◽  
Gabriella Stenberg-Wieser ◽  
Martin Wieser ◽  
...  

The ESA/Rosetta spacecraft studied comet 67P/Churyumov-Gerasimenko for two years. The Rosetta Plasma Consortium's Ion Composition Analyser (RPC/ICA) detected water ions of cometary origin that are accelerated to > 100 eV. The majority of them are interpreted as undergoing ordinary pick-up acceleration by the solar wind electric field, perpendicular to the magnetic field, during low comet activity [1,2]. As the comet approaches the Sun, a cometary magnetosphere forms, into which the solar wind cannot intrude.

However, some water ions are accelerated to > 1 keV even inside the magnetosphere [3]. Using two years of RPC/ICA data [4], we investigate acceleration events > 1 keV during which no solar wind is observed, and classify dispersion events with respect to the directions of the Sun, the comet, and the magnetic field. The majority of these water ions show reversed energy-angle dispersion. The investigation also shows that these ions flow along the (enhanced) magnetic field, indicating that parallel acceleration occurs in the magnetosphere.

In this meeting, we present a statistical analysis and discuss a possible acceleration mechanism.

References

[1] H. Nilsson et al., MNRAS 469, 252 (2017), doi:10.1093/mnras/stx1491
[2] G. Nicolaou et al., MNRAS 469, 339 (2017), doi:10.1093/mnras/stx1621
[3] T. Kotani et al., EPSC, EPSC2020-576 (2020), doi:10.5194/epsc2020-576
[4] H. Nilsson et al., Space Sci. Rev. 128, 671 (2007), doi:10.1007/s11214-006-9031-z
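For a sense of scale of the energy thresholds quoted above, the short calculation below converts 100 eV and 1 keV into speeds for a water-group ion (mass ≈ 18 u). It is a back-of-the-envelope illustration added here, not part of the authors' analysis.

```python
import math

U_KG = 1.66054e-27           # atomic mass unit in kg
EV_J = 1.60218e-19           # electron volt in joules
m_h2o = 18.0 * U_KG          # mass of a water-group ion (electron mass neglected)

for energy_ev in (100.0, 1000.0):
    v = math.sqrt(2.0 * energy_ev * EV_J / m_h2o)   # E = (1/2) m v^2
    print(f"{energy_ev:6.0f} eV  ->  {v / 1e3:5.1f} km/s")

# ~32.7 km/s at 100 eV and ~103.5 km/s at 1 keV, i.e. far above the roughly
# km/s outflow speed of the neutral gas from which the ions are born.
```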


2021 ◽  
Vol 14 (2) ◽  
pp. 843-857
Author(s):  
Pavel Perezhogin ◽  
Ilya Chernov ◽  
Nikolay Iakovlev

Abstract. In this paper, we present a parallel version of the finite-element model of the Arctic Ocean (FEMAO) configured for the White Sea and based on MPI technology. This model consists of two main parts: an ocean dynamics model and a surface ice dynamics model. These parts differ greatly in the number of computations because the complexity of the ocean part depends on the bottom depth, while that of the sea-ice component does not. In the first step, we decided to locate both submodels on the same CPU cores with a common horizontal partition of the computational domain. The model domain is divided into small blocks, which are distributed over the CPU cores using Hilbert-curve balancing. Partitioning of the model domain is static (i.e., computed during the initialization stage). There are three baseline options: a single block per core, balancing of 2D computations, and balancing of 3D computations. After showing parallel acceleration for particular ocean and ice procedures, we construct the common partition, which minimizes the joint imbalance of both submodels. Our novelty is the use of arrays shared by all blocks that belong to a CPU core, instead of allocating separate arrays for each block as is usually done. Computations on a CPU core are restricted by the masks of non-land grid nodes and the block–core correspondence. This approach allows us to implement parallel computations in the model that are as simple as with the usual decomposition into squares, but with the added benefit of load balancing. We demonstrate parallel acceleration on up to 996 cores for the model with a resolution of 500×500×39 in the ocean component and 43 sea-ice scalars, and we carry out a detailed analysis of the effect of different partitions on the model runtime.
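To illustrate the partitioning strategy, here is a minimal Python sketch of Hilbert-curve load balancing under stated assumptions: blocks on a 2^k × 2^k grid are ordered along the Hilbert curve and then cut into contiguous chunks of roughly equal total weight (e.g., the number of wet grid points per block). It shows the general technique only, not the FEMAO source code; the grid size, weights, and core count are made up.

```python
import random

def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve on an n x n grid (n a power of 2)."""
    d, s = 0, n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # rotate/flip the quadrant so lower-order bits are traversed consistently
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def hilbert_partition(blocks, weights, n_cores, n):
    """Sort blocks along the Hilbert curve, then cut the sequence into contiguous
    chunks of roughly equal total weight; returns a block -> core mapping."""
    ordered = sorted(blocks, key=lambda b: hilbert_index(n, *b))
    total = sum(weights[b] for b in ordered)
    owner, core, acc = {}, 0, 0.0
    for b in ordered:
        owner[b] = core
        acc += weights[b]
        # move to the next core once its share of the total weight is reached
        if core < n_cores - 1 and acc >= total * (core + 1) / n_cores:
            core += 1
    return owner

# Toy example: a 16 x 16 grid of blocks, weight = number of wet levels (assumed data).
random.seed(1)
n = 16
blocks = [(i, j) for i in range(n) for j in range(n)]
weights = {b: random.randint(0, 39) for b in blocks}   # 0 = land-only block
owner = hilbert_partition(blocks, weights, n_cores=8, n=n)
loads = [sum(weights[b] for b in blocks if owner[b] == c) for c in range(8)]
print(loads)   # per-core load, roughly balanced
```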


Author(s):  
Mitchell Nelson ◽  
Zachary Sorenson ◽  
Joseph M. Myre ◽  
Jason Sawin ◽  
David Chiu
