efficient parallelization Latest Research Papers

Fast nanopore sequencing data analysis with SLOW5

Nanopore sequencing depends on the FAST5 file format, which does not allow efficient parallel analysis. Here we introduce SLOW5, an alternative format engineered for efficient parallelization and acceleration of nanopore data analysis. In a case study of DNA methylation profiling of a human genome, analysis runtime is reduced from more than two weeks to approximately 10.5 h on a typical high-performance computer. SLOW5 files are approximately 25% smaller than FAST5 files, and the format delivers consistent improvements across different computer architectures.
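The abstract does not describe the SLOW5 layout itself. The general idea behind a parallel-friendly read format is that each read can be handled as an independent record, so reads can be split into batches and processed concurrently. The Python sketch below illustrates only that idea, under a simplified, hypothetical record layout (one read per tab-separated line: read ID, number of samples, comma-separated raw signal). It is not the SLOW5 specification, and `mean_signal` is a placeholder for real per-read analysis.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical simplified record layout (an assumption, NOT the SLOW5 spec):
# read_id <TAB> number_of_samples <TAB> comma-separated raw signal
def parse_record(line):
    read_id, n_samples, signal = line.rstrip("\n").split("\t")
    values = [int(v) for v in signal.split(",")]
    if len(values) != int(n_samples):
        raise ValueError(f"{read_id}: expected {n_samples} samples, got {len(values)}")
    return read_id, values


def mean_signal(line):
    # Placeholder per-read analysis: the mean raw signal level of one read.
    read_id, values = parse_record(line)
    return read_id, sum(values) / len(values)


def process_batch(batch):
    # Each batch is independent of the others, so batches can run concurrently.
    return [mean_signal(line) for line in batch]


def analyse(lines, batch_size=2, workers=4, executor_cls=ThreadPoolExecutor):
    # Skip header lines (starting with '#') and blank lines.
    records = [line for line in lines if line.strip() and not line.startswith("#")]
    batches = [records[i:i + batch_size] for i in range(0, len(records), batch_size)]
    with executor_cls(max_workers=workers) as pool:
        return {read_id: value
                for batch_result in pool.map(process_batch, batches)
                for read_id, value in batch_result}
```

`ThreadPoolExecutor` keeps the sketch portable; for CPU-bound parsing in CPython, passing `executor_cls=ProcessPoolExecutor` would be the more usual choice, which works here because `process_batch` is defined at module level and can be pickled.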

A Parallel Approach of the Enhanced Craig–Bampton Method

Mathematics, 2021, Vol 9 (24), pp. 3278. DOI: 10.3390/math9243278

Author(s): Petr Pařík, Jin-Gyun Kim, Martin Isoz, Chang-uk Ahn

Keyword(s): Linear Algebra, Graph Partitioning, Large Scale, Parallel Implementation, Parallel Architecture, Structural Vibration, Accuracy Improvement, Vibration Problems, Efficient Parallelization, Reduced Matrices

The enhanced Craig–Bampton (ECB) method is a novel extension of the original Craig–Bampton (CB) method, which has been widely used for component mode synthesis (CMS). By compensating for the residual modes that the CB method neglects, the ECB method dramatically improves the accuracy of the reduced matrices without increasing the number of eigenbasis vectors. However, treating the residual flexibility adds computational cost. In this paper, an efficient parallelization of the ECB method is presented to address this issue and extend its applicability to large-scale structural vibration problems. A new ECB formulation within a substructuring strategy is derived to achieve better scalability. The parallel implementation is based on the OpenMP parallel architecture. METIS graph partitioning and the Linear Algebra Package (LAPACK) are used for automated algebraic partitioning and computational linear algebra, respectively. Numerical examples are presented to evaluate the accuracy, scalability, and capability of the proposed parallel ECB method. Based on this work, the ECB method can be expected to compute efficiently while retaining its accuracy improvement.
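For orientation, the classical CB reduction that ECB extends can be summarized in standard textbook notation. This is not the paper's new substructured ECB formulation. Boundary degrees of freedom u_b are retained; interior degrees of freedom u_i are represented by static constraint modes Psi plus the k lowest fixed-interface modes Phi_k (mass-normalized); the residual flexibility F_r of the truncated modes is the quantity that CB neglects and that ECB compensates for.

```latex
\begin{aligned}
\begin{bmatrix}\mathbf{u}_b\\ \mathbf{u}_i\end{bmatrix}
&\approx \mathbf{T}\begin{bmatrix}\mathbf{u}_b\\ \mathbf{q}\end{bmatrix},
\qquad
\mathbf{T} = \begin{bmatrix}\mathbf{I} & \mathbf{0}\\ \boldsymbol{\Psi} & \boldsymbol{\Phi}_k\end{bmatrix},
\qquad
\boldsymbol{\Psi} = -\mathbf{K}_{ii}^{-1}\mathbf{K}_{ib},\\
\mathbf{K}_{ii}\boldsymbol{\phi}_j &= \lambda_j \mathbf{M}_{ii}\boldsymbol{\phi}_j,\quad j = 1,\dots,k,
\qquad
\mathbf{K}_r = \mathbf{T}^{\mathsf T}\mathbf{K}\mathbf{T},\quad
\mathbf{M}_r = \mathbf{T}^{\mathsf T}\mathbf{M}\mathbf{T},\\
\mathbf{F}_r &= \mathbf{K}_{ii}^{-1} - \boldsymbol{\Phi}_k\boldsymbol{\Lambda}_k^{-1}\boldsymbol{\Phi}_k^{\mathsf T},
\qquad \boldsymbol{\Phi}_k^{\mathsf T}\mathbf{M}_{ii}\boldsymbol{\Phi}_k = \mathbf{I}.
\end{aligned}
```

The last line follows from K_ii^{-1} = Phi Lambda^{-1} Phi^T over all mass-normalized modes: subtracting the k retained modes leaves the flexibility contributed by the truncated ones.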

Discovery of typical subsequences of time series on graphical processor

Numerical Methods and Programming (Vychislitel'nye Metody i Programmirovanie), 2021, pp. 344-359. DOI: 10.26089/nummet.v22r423

Author(s): М.Л. Цымблер, А.И. Гоглачев

Keyword(s): Time Series, High Performance, Life Support, Technical Condition, Wide Range, CUDA Programming, Time Series Mining, Intelligent Management, The Given, Efficient Parallelization

Discovery of typical subsequences is one of the topical problems of time series mining. The task is to find a set of subsequences that adequately represents the given time series, that is, reflects the course of the process or phenomenon the series describes. Solving this problem makes it possible to summarize and visualize large time series in a wide range of applications: monitoring the technical condition of complex machines and mechanisms, intelligent management of life support systems, monitoring indicators of functional diagnostics of the human body, and others. The recently proposed snippet concept formalizes a typical time series subsequence as follows. A snippet of a time series is a subsequence to which many other subsequences of the series are similar with respect to a specialized similarity measure based on the Euclidean distance. Although the snippet discovery algorithm produces adequate results for time series from a wide range of subject domains, it has high computational complexity. In this article, we propose a novel parallel algorithm for snippet discovery on a GPU. Parallelization is performed with the CUDA programming technology. We developed data structures that enable efficient parallelization of the computations on the GPU. The experimental results confirm the high performance of the proposed algorithm.
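The greedy selection behind snippet discovery can be illustrated with a short, sequential Python sketch. It is a simplification, not the authors' GPU algorithm: it uses plain Euclidean distance profiles in place of the specialized Euclidean-based similarity measure mentioned above, takes non-overlapping length-m segments as candidates, and the function names are hypothetical. Each step picks the candidate that most reduces the total "uncovered" distance over all positions of the series.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_profile(series, candidate):
    # Distance from one candidate to every subsequence of the same length.
    m = len(candidate)
    return [euclidean(candidate, series[j:j + m]) for j in range(len(series) - m + 1)]


def find_snippets(series, m, k):
    """Greedily pick k snippet start positions among non-overlapping length-m segments."""
    candidates = list(range(0, len(series) - m + 1, m))
    if not candidates:
        return []
    # Profiles are mutually independent, so they could be computed in parallel.
    profiles = [distance_profile(series, series[i:i + m]) for i in candidates]
    covered = [math.inf] * len(profiles[0])  # best distance achieved so far at each position
    chosen = []
    for _ in range(min(k, len(candidates))):
        best, best_area = None, math.inf
        for idx, profile in enumerate(profiles):
            if idx in chosen:
                continue
            # Total distance if this candidate were added to the chosen set.
            area = sum(min(c, p) for c, p in zip(covered, profile))
            if area < best_area:
                best, best_area = idx, area
        chosen.append(best)
        covered = [min(c, p) for c, p in zip(covered, profiles[best])]
    return [candidates[idx] for idx in chosen]
```

For a series with a flat low segment, a flat high segment, and another low segment, the first snippet found is the low pattern and the second is the high one.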

HGP4CNN: an efficient parallelization framework for training convolutional neural networks on modern GPUs

The Journal of Supercomputing, 2021. DOI: 10.1007/s11227-021-03746-z

Author(s): Hao Fu, Shanjiang Tang, Bingsheng He, Ce Yu, Jizhou Sun

Keyword(s): Neural Networks, Convolutional Neural Networks, Efficient Parallelization

Features of the implementation of an efficient parallel computation algorithm for modeling the icing of a swept wing with a GLC-305 airfoil

Proceedings of the Institute for System Programming of RAS, 2021, Vol 33 (5), pp. 249-258. DOI: 10.15514/ispras-2021-33(5)-15

Author(s): Konstantin Borisovich Koshelev, Andrei Vladimirovich Osipov, Sergei Vladimirovich Strijhak

Keyword(s): Fixed Number, The Body, Ice Formation, Swept Wing, The Third, Ice Surface, Calculation Results, Computation Algorithm, Efficient Parallelization, Good Agreement

The paper considers the capabilities of the ICELIB library, developed at ISP RAS, for modeling ice formation on aircraft surfaces. As a test case for assessing the accuracy of modeling the physical processes that arise during aircraft operation, the surface of a swept wing with a GLC-305 profile was studied. The possibilities of an efficient parallelization algorithm using a liquid film model, a dynamic mesh, and the geometric method of bisectors are discussed. The ICELIB library is a collection of three solvers. The first solver, iceFoam1, is intended for preliminary estimation of the icing zones on the fuselage surface and the aircraft’s swept wing. It neglects the change in the geometric shape of the body under study, treating the ice thickness as negligible. This version of the solver has no restrictions on the number of cores used for parallelization. The second solver, iceDyMFoam2, is designed to simulate the formation of two types of ice, smooth (“glaze ice”) and loose (“rime ice”), whose shapes are often complex and irregular. It takes into account the effect of the changing body shape on the icing process. Its limitations are related to the specifics of constructing the mesh near the boundary layer of the streamlined body. Different algorithms, each optimized for its case, are used to move the front and back edges of the film. The performance gain is limited and is achieved with a fixed number of cores. The third solver, iceDyMFoam3, also accounts for the effect of changes in the solid surface during ice formation on the icing process itself. For smooth ice formation with complex ice surface shapes, this latest solver is still less capable than the second one. In the third version, a somewhat simplified and more uniform approach is still used to calculate the motion of both boundaries of the ice film. The calculation results for “rime ice” on various airfoils and the swept wing were compared with experimental data from M. Papadakis, and good agreement with the experimental results was obtained.

The Backward Photon Mapping for the Realistic Image Rendering

2020, pp. paper8-1-paper8-12. DOI: 10.51130/graphicon-2020-2-3-8

Author(s): Dmitry Zhdanov, Andrey Zhdanov

Keyword(s): Mapping Method, Single Image, Photon Mapping, Realistic Rendering, Speed Up, Realistic Image, Image Pixels, Realistic Image Rendering, Efficient Parallelization, Backward

The paper is devoted to realistic rendering methods based on bidirectional stochastic ray tracing with photon maps. A study of the backward photon mapping method, which accounts for both caustics and indirect illumination, is presented. By using backward photon maps, the authors reduced the amount of data that must be stored in the photon maps, which sped up the calculation of indirect luminance. Methods for constructing a tree of backward photon maps are considered, together with the efficient parallelization methods used in the algorithms that accumulate and form the backward photon maps and that trace forward and backward rays during rendering. Methods for estimating the attained luminance error of the designed rendering method, both for individual image pixels and for the entire image, are presented. Rendering results obtained with the developed methods and algorithms are shown.
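The abstract mentions estimating luminance error per pixel and for the whole image but does not give the estimator. One generic approach for a stochastic (Monte Carlo) renderer, shown here as an assumption and not as the authors' method, is the standard error of the sample mean per pixel, combined into a relative error of the total image luminance. The function names are hypothetical, and pixel estimates are assumed independent.

```python
import math


def pixel_estimate(samples):
    """Return (mean luminance, standard error of the mean) for one pixel's samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples per pixel")
    mean = sum(samples) / n
    # Unbiased sample variance of the per-sample luminance values.
    variance = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return mean, math.sqrt(variance / n)


def image_relative_error(pixels):
    """Relative standard error of the total image luminance.

    Assumes pixel estimates are statistically independent, so their
    variances add.
    """
    estimates = [pixel_estimate(samples) for samples in pixels]
    total = sum(mean for mean, _ in estimates)
    total_se = math.sqrt(sum(se ** 2 for _, se in estimates))
    return total_se / total if total > 0 else math.inf
```

For two pixels with samples `[0.0, 2.0]` and `[1.0, 1.0]`, the per-pixel standard errors are 1.0 and 0.0, and the relative error of the total luminance (2.0) is 0.5.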

Efficient Parallelization of a Genetic Algorithm Solution on the Traveling Salesman Problem with Multi-core and Many-core Systems

International Journal of Engineering, 2020, Vol 33 (7). DOI: 10.5829/ije.2020.33.07a.12

Keyword(s): Genetic Algorithm, Traveling Salesman Problem, Traveling Salesman, The Traveling Salesman Problem, Many Core, Efficient Parallelization

A2G2: A Python wrapper to perform very large alignments in semi-conserved regions

2020. DOI: 10.1101/2020.05.21.109009. Cited by ~1

Author(s): Jose Sergio Hleap, Melania E. Cristescu, Dirk Steinke

Keyword(s): Supplementary Information, Command Line, Reference Region, Consensus Sequences, Link Type, Large Numbers, Conserved Genes, Local Reference, Supplementary Material, Efficient Parallelization

Summary: Amplicons to Global Gene (A2G2) is a Python wrapper that uses MAFFT and an “Amplicon to Gene” strategy to align very large numbers of sequences while improving alignment accuracy. It is specifically developed for conserved genes, where traditional aligners introduce a significant number of gaps. A2G2 leverages the add-sequences option of MAFFT to align the sequences to a global reference gene and a local reference region. Both references can be consensus sequences from trusted sources. Efficient parallelization of these tasks allows A2G2 to align a very large number of sequences (> 500K) in a reasonable amount of time. A2G2 can be imported in Python for easier integration with other software, or run from the command line.

Availability: A2G2 is implemented in Python 3 (3.6) and depends on MAFFT being available. Other package requirements are listed in the requirements.txt file at https://github.com/jshleap/A2G. A2G2 is also available via PyPI (https://pypi.org/project/A2G). It is licensed under the LGPLv3.

Supplementary information: Supplementary material is available on GitHub as a Jupyter notebook.
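The abstract does not show how A2G2 distributes its MAFFT runs. One plausible pattern, sketched here as an assumption and not as A2G2's actual code or API, is to split the input sequences into chunks and run an independent MAFFT `--add` job per chunk against a fixed reference alignment, with `--keeplength` keeping the reference columns unchanged so the per-chunk outputs stay column-compatible. `--add`, `--keeplength` and `--thread` are MAFFT options; the helper names and the exact option combination are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, name, seq = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(seq)))
            name, seq = line[1:], []
        else:
            seq.append(line)
    if name is not None:
        records.append((name, "".join(seq)))
    return records


def chunk(records, size):
    # Independent chunks: each can be aligned to the reference on its own.
    return [records[i:i + size] for i in range(0, len(records), size)]


def to_fasta(records):
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


def mafft_add_cmd(new_fasta, reference_alignment, threads=1):
    # Hypothetical helper: add new sequences to an existing alignment
    # without changing the reference alignment's length.
    return ["mafft", "--thread", str(threads), "--add", new_fasta,
            "--keeplength", reference_alignment]


def run_all(cmds, workers=4):
    # Threads suffice: the heavy work happens in separate MAFFT processes.
    def run(cmd):
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, cmds))
```

In practice each chunk would be written to its own FASTA file (for example with `to_fasta`), one command built per file with `mafft_add_cmd`, and the aligned outputs returned by `run_all` merged afterwards; running it requires MAFFT on the `PATH`.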

Efficient parallelization of multilevel fast multipole algorithm for electromagnetic simulation on many-core SW26010 processor

The Journal of Supercomputing, 2020. DOI: 10.1007/s11227-020-03308-9

Author(s): Wei-Jia He, Ming-Lin Yang, Wu Wang, Xin-Qing Sheng

Keyword(s): Electromagnetic Simulation, Fast Multipole, Multilevel Fast Multipole Algorithm, Many Core, Fast Multipole Algorithm, Efficient Parallelization

An Efficient Parallelization Strategy For The Adaptive Integral Method Based On Graph Partitioning

2020 14th European Conference on Antennas and Propagation (EuCAP), 2020. DOI: 10.23919/eucap48036.2020.9135887. Cited by ~1

Author(s): Damian Marek, Shashwat Sharma, Piero Triverio

Keyword(s): Graph Partitioning, Integral Method, Parallelization Strategy, Efficient Parallelization

efficient parallelization
Recently Published Documents

Fast nanopore sequencing data analysis with SLOW5

A Parallel Approach of the Enhanced Craig–Bampton Method

Discovery of typical subsequences of time series on graphical processor

HGP4CNN: an efficient parallelization framework for training convolutional neural networks on modern GPUs

Features of the implementation of an efficient parallel computation algorithm for modeling the icing of a swept wing with a GLC-305 airfoil

The Backward Photon Mapping for the Realistic Image Rendering

Efficient Parallelization of a Genetic Algorithm Solution on the Traveling Salesman Problem with Multi-core and Many-core Systems

A2G2: A Python wrapper to perform very large alignments in semi-conserved regions

Efficient parallelization of multilevel fast multipole algorithm for electromagnetic simulation on many-core SW26010 processor

An Efficient Parallelization Strategy For The Adaptive Integral Method Based On Graph Partitioning
