computational kernel
Recently Published Documents

TOTAL DOCUMENTS: 32 (FIVE YEARS: 11)
H-INDEX: 7 (FIVE YEARS: 1)

2022 · Vol 15 (3) · pp. 1-32
Author(s): Nikolaos Alachiotis, Panagiotis Skrimponis, Manolis Pissadakis, Dionisios Pnevmatikatos

Disaggregated computer architectures eliminate resource fragmentation in next-generation datacenters by enabling virtual machines to employ resources such as CPUs, memory, and accelerators that are physically located on different servers. While this paves the way for highly compute- and/or memory-intensive applications to potentially deploy all CPU and/or memory resources in a datacenter, it poses a major challenge to the efficient deployment of hardware accelerators: input/output data can reside on different servers than the ones hosting accelerator resources, requiring time- and energy-consuming remote data transfers that diminish the gains of hardware acceleration. Targeting a disaggregated datacenter architecture similar to the IBM dReDBox prototype, the present work explores the potential of deploying custom acceleration units, implemented in FPGA technology, adjacent to the disaggregated-memory controller on memory bricks (in dReDBox terminology) to reduce data movement and improve performance and energy efficiency when reconstructing large phylogenies (evolutionary relationships among organisms). A fundamental computational kernel is the Phylogenetic Likelihood Function (PLF), which dominates the total execution time (up to 95%) of widely used maximum-likelihood methods. Numerous efforts to boost PLF performance over the years have focused on accelerating computation; since the PLF is a data-intensive, memory-bound operation, however, performance remains limited by data movement, and memory disaggregation only exacerbates the problem.
We describe two near-memory processing models, one that addresses the problem of workload distribution to memory bricks, which is particularly tailored toward larger genomes (e.g., plants and mammals), and one that reduces overall memory requirements through memory-side data interpolation transparently to the application, thereby allowing the phylogeny size to scale to a larger number of organisms without requiring additional memory.
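The abstract does not reproduce the kernel itself; as a rough illustration, the core PLF update (Felsenstein's pruning step) combines the conditional likelihood vectors (CLVs) of two child nodes into the parent's. A minimal NumPy sketch with toy shapes makes the memory-bound character visible: each site streams large CLVs through only a handful of floating-point operations.

```python
import numpy as np

def plf_update(clv_left, clv_right, p_left, p_right):
    """One pruning step of the PLF: combine the conditional likelihood
    vectors (CLVs) of two child nodes into the parent CLV.
    Shapes: CLVs are (sites, states); P matrices are (states, states)."""
    # Only a few multiply-adds are performed per CLV value read, so the
    # kernel's arithmetic intensity is low and performance is bound by
    # memory traffic rather than compute.
    return (clv_left @ p_left.T) * (clv_right @ p_right.T)

# Toy example: 4 DNA states, 1000 alignment sites.
rng = np.random.default_rng(0)
sites, states = 1000, 4
clv_l = rng.random((sites, states))
clv_r = rng.random((sites, states))
p = np.full((states, states), 0.25)  # placeholder transition matrix
parent = plf_update(clv_l, clv_r, p, p)
print(parent.shape)  # (1000, 4)
```

For real genomes the CLVs are orders of magnitude larger, which is why moving this computation next to the memory holding the CLVs, rather than moving the CLVs to the accelerator, pays off.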


2021 · Vol 20 (5s) · pp. 1-24
Author(s): Daniele Parravicini, Davide Conficconi, Emanuele Del Sozzo, Christian Pilato, Marco D. Santambrogio

Regular Expression (RE) matching is a computational kernel used in several applications. Since RE complexity and data volumes are steadily increasing, hardware acceleration is also gaining attention for this problem. Existing approaches have limited flexibility, as they require a different implementation for each RE. On the other hand, it is complex to map efficient RE representations such as non-deterministic finite-state automata onto software-programmable engines or parallel architectures. In this work, we present CICERO, an end-to-end framework composed of a domain-specific architecture and a companion compilation framework for RE matching. Our solution is suitable for many applications, such as genomics/proteomics and natural language processing. CICERO aims at exploiting the intrinsic parallelism of non-deterministic representations of REs. CICERO can trade off accelerator efficiency and processor flexibility thanks to its programmable architecture and its compilation framework. We implemented CICERO prototypes on an embedded FPGA, achieving up to 28.6× and 20.8× better energy efficiency than embedded and mainstream processors, respectively. Since CICERO is a programmable architecture, it can also be implemented as a custom ASIC that is orders of magnitude more energy-efficient than mainstream processors.
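CICERO's instruction set and datapath are not reproduced here. The Python sketch below shows only the underlying technique the abstract names: state-set simulation of a non-deterministic finite automaton, where every active state advances on each input symbol at once. That set-wide step is the intrinsic parallelism a hardware engine can exploit. The NFA encoding is a hypothetical toy, not CICERO's format.

```python
def epsilon_closure(states, epsilon):
    """Expand a state set with all states reachable via epsilon moves."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in epsilon.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def nfa_match(text, transitions, epsilon, start, accept):
    """Simulate the NFA on text; all active states step 'in parallel'.
    transitions[state][symbol] -> set of next states."""
    current = epsilon_closure({start}, epsilon)
    for ch in text:
        nxt = set()
        for s in current:          # in hardware, this whole loop is
            nxt |= transitions.get(s, {}).get(ch, set())  # one cycle
        current = epsilon_closure(nxt, epsilon)
    return bool(current & accept)

# Toy NFA for the RE "a(b|c)*d":
trans = {0: {"a": {1}}, 1: {"b": {1}, "c": {1}, "d": {2}}}
print(nfa_match("abccbd", trans, {}, 0, {2}))  # True
print(nfa_match("abce", trans, {}, 0, {2}))    # False
```

Unlike DFA-based matchers, this representation never suffers state-count blowup, at the cost of tracking a set of states per step, which is exactly the work a parallel architecture can absorb.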


2021 · Author(s): Vinicius Cruzeiro, Madushanka Manathunga, Kenneth M. Merz Jr., Andreas Goetz

The quantum mechanics/molecular mechanics (QM/MM) approach is an essential and well-established tool in computational chemistry that has been widely applied to a myriad of biomolecular problems in the literature. In this publication, we report the integration of the QUantum Interaction Computational Kernel (QUICK) program as an engine to perform electronic structure calculations in QM/MM simulations with AMBER. This integration is available through either a file-based interface (FBI) or an application programming interface (API). Since QUICK is an open-source GPU-accelerated code with multi-GPU parallelization, users can take advantage of “free of charge” GPU acceleration in their QM/MM simulations. In this work, we discuss implementation details and give usage examples. We also investigate energy conservation in typical QM/MM simulations performed in the microcanonical ensemble. Finally, benchmark results for two representative systems, the N-methylacetamide (NMA) molecule and the photoactive yellow protein (PYP) in bulk water, show the performance of QM/MM simulations with QUICK and AMBER using varying numbers of CPU cores and GPUs. Our results highlight the acceleration obtained from a single GPU or multiple GPUs: we observed speedups of up to 38x for a single GPU vs. a single CPU core and of up to 2.6x when comparing four GPUs to a single GPU. Results also reveal speedups of up to 3.5x when the API is used instead of the FBI.
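For context, the reported multi-GPU speedups translate directly into parallel efficiency. The helper below is only a reading aid for the numbers above, not part of QUICK or AMBER:

```python
def parallel_efficiency(speedup, n_workers):
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / n_workers

# The abstract reports up to 2.6x on four GPUs relative to one GPU,
# i.e. 65% parallel efficiency:
print(f"{parallel_efficiency(2.6, 4):.0%}")  # 65%
```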


2021 · Author(s): Madushanka Manathunga, Chi Jin, Vinicius Cruzeiro, Yipu Miao, Dawei Mu, ...

We report a new multi-GPU capable ab initio Hartree-Fock/density functional theory implementation integrated into the open-source QUantum Interaction Computational Kernel (QUICK) program. Details of the load balancing algorithms for electron repulsion integrals and exchange-correlation quadrature across multiple GPUs are described. Benchmarking studies carried out on up to 4 GPU nodes, each containing 4 NVIDIA V100-SXM2 GPUs, demonstrate that our implementation is capable of achieving excellent load balancing and high parallel efficiency. For representative medium to large size protein/organic molecular systems, the observed efficiencies remained above 86%. The accelerations on NVIDIA A100, P100, and K80 platforms also realized parallel efficiencies higher than 74%, paving the way for large-scale ab initio electronic structure calculations.
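The abstract does not spell out the balancing algorithms. One common way to statically balance integral work across devices is a greedy longest-processing-time (LPT) assignment over per-batch cost estimates, sketched below; the batch granularity and costs are illustrative assumptions, not QUICK's actual scheme.

```python
import heapq

def lpt_assign(costs, n_gpus):
    """Longest-processing-time-first: assign each work batch (e.g. a
    block of electron repulsion integrals with an estimated cost) to
    the currently least-loaded GPU, largest batches first."""
    heap = [(0.0, g, []) for g in range(n_gpus)]  # (load, gpu_id, batches)
    heapq.heapify(heap)
    for i in sorted(range(len(costs)), key=lambda i: -costs[i]):
        load, g, batches = heapq.heappop(heap)  # least-loaded GPU
        batches.append(i)
        heapq.heappush(heap, (load + costs[i], g, batches))
    return sorted(heap, key=lambda t: t[1])     # order by GPU id

# Hypothetical per-batch cost estimates:
costs = [9.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0]
for load, gpu, batches in lpt_assign(costs, 4):
    print(f"GPU {gpu}: load {load}, batches {batches}")
```

With these toy costs every GPU ends up with a load of 9.0; in practice the quality of the balance, and hence the parallel efficiency, depends on how well per-batch costs can be estimated.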


2020 · Author(s): Madushanka Manathunga, Yipu Miao, Dawei Mu, Andreas Goetz, Kenneth M. Merz Jr.

We present the details of a GPU-capable exchange-correlation (XC) scheme integrated into the open-source QUantum Interaction Computational Kernel (QUICK) program. Our implementation features an octree-based numerical grid point partitioning scheme, GPU-enabled grid pruning and basis/primitive function prescreening, and fully GPU-capable XC energy and gradient algorithms. Benchmarking against the CPU version demonstrated that the GPU implementation is capable of delivering impressive performance while retaining excellent accuracy. For small to medium size protein/organic molecular systems, the realized speedups in double-precision XC energy and gradient computation on an NVIDIA V100 GPU were 60- to 80-fold and 140- to 780-fold, respectively, as compared to the serial CPU implementation. The acceleration gained in density functional theory calculations from a single V100 GPU significantly exceeds that of a modern CPU with 40 cores running in parallel.
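The octree partitioning can be sketched generically: recursively split the bounding box of the quadrature grid into octants until each leaf holds few enough points to map onto a GPU work unit, which also localizes the basis functions that need to be evaluated per region. The NumPy code below is an illustrative reconstruction under those assumptions, not QUICK's implementation.

```python
import numpy as np

def octree_partition(points, max_leaf=64, lo=None, hi=None):
    """Recursively split 3D grid points into octants until each leaf
    holds at most max_leaf points. Leaves map naturally to GPU thread
    blocks and allow per-region basis function prescreening."""
    if lo is None:
        lo, hi = points.min(axis=0), points.max(axis=0)
    if len(points) <= max_leaf:
        return [points]
    mid = (lo + hi) / 2.0
    leaves = []
    for octant in range(8):            # 3 bits select low/high per axis
        mask = np.ones(len(points), dtype=bool)
        nlo, nhi = lo.copy(), hi.copy()
        for axis in range(3):
            if octant >> axis & 1:
                mask &= points[:, axis] >= mid[axis]
                nlo[axis] = mid[axis]
            else:
                mask &= points[:, axis] < mid[axis]
                nhi[axis] = mid[axis]
        if mask.any():
            leaves += octree_partition(points[mask], max_leaf, nlo, nhi)
    return leaves

rng = np.random.default_rng(1)
pts = rng.random((1000, 3))            # toy stand-in for quadrature points
leaves = octree_partition(pts, max_leaf=64)
print(len(leaves), "leaves, largest holds", max(len(l) for l in leaves))
```

Because each point falls into exactly one octant at every level, the leaves form a disjoint cover of the grid, which is what makes per-leaf pruning safe.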
