computational kernel
Recently Published Documents

TOTAL DOCUMENTS: 32 (FIVE YEARS: 11)
H-INDEX: 7 (FIVE YEARS: 1)

2022 · Vol 15 (3) · pp. 1-32
Author(s): Nikolaos Alachiotis, Panagiotis Skrimponis, Manolis Pissadakis, Dionisios Pnevmatikatos

Disaggregated computer architectures eliminate resource fragmentation in next-generation datacenters by enabling virtual machines to employ resources such as CPUs, memory, and accelerators that are physically located on different servers. While this paves the way for highly compute- and/or memory-intensive applications to potentially deploy all CPU and/or memory resources in a datacenter, it poses a major challenge to the efficient deployment of hardware accelerators: input/output data can reside on different servers than the ones hosting accelerator resources, requiring time- and energy-consuming remote data transfers that diminish the gains of hardware acceleration. Targeting a disaggregated datacenter architecture similar to the IBM dReDBox prototype, the present work explores the potential of deploying custom acceleration units, implemented in FPGA technology, adjacent to the disaggregated-memory controller on memory bricks (in dReDBox terminology) to reduce data movement and improve performance and energy efficiency when reconstructing large phylogenies (evolutionary relationships among organisms). A fundamental computational kernel is the Phylogenetic Likelihood Function (PLF), which dominates the total execution time (up to 95%) of widely used maximum-likelihood methods. Numerous efforts to boost PLF performance over the years have focused on accelerating computation; since the PLF is a data-intensive, memory-bound operation, however, performance remains limited by data movement, and memory disaggregation only exacerbates the problem.
We describe two near-memory processing models, one that addresses the problem of workload distribution to memory bricks, which is particularly tailored toward larger genomes (e.g., plants and mammals), and one that reduces overall memory requirements through memory-side data interpolation transparently to the application, thereby allowing the phylogeny size to scale to a larger number of organisms without requiring additional memory.
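The abstract does not reproduce the kernel itself; as a rough illustration, the core PLF update (Felsenstein's pruning step) combines the conditional likelihood vectors (CLVs) of two child nodes into the parent's. A minimal NumPy sketch with toy shapes makes the memory-bound character visible: each site streams large CLVs through only a handful of floating-point operations.

```python
import numpy as np

def plf_update(clv_left, clv_right, p_left, p_right):
    """One pruning step of the PLF: combine the conditional likelihood
    vectors (CLVs) of two child nodes into the parent CLV.
    Shapes: CLVs are (sites, states); P matrices are (states, states)."""
    # Only a few multiply-adds are performed per CLV value read, so the
    # kernel's arithmetic intensity is low and performance is bound by
    # memory traffic rather than compute.
    return (clv_left @ p_left.T) * (clv_right @ p_right.T)

# Toy example: 4 DNA states, 1000 alignment sites.
rng = np.random.default_rng(0)
sites, states = 1000, 4
clv_l = rng.random((sites, states))
clv_r = rng.random((sites, states))
p = np.full((states, states), 0.25)  # placeholder transition matrix
parent = plf_update(clv_l, clv_r, p, p)
print(parent.shape)  # (1000, 4)
```

For real genomes the CLVs are orders of magnitude larger, which is why moving this computation next to the memory holding the CLVs, rather than moving the CLVs to the accelerator, pays off.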


2021 · Vol 20 (5s) · pp. 1-24
Author(s): Daniele Parravicini, Davide Conficconi, Emanuele Del Sozzo, Christian Pilato, Marco D. Santambrogio

Regular Expression (RE) matching is a computational kernel used in several applications. Since RE complexity and data volumes are steadily increasing, hardware acceleration is also gaining attention for this problem. Existing approaches have limited flexibility, as they require a different implementation for each RE. On the other hand, it is complex to map efficient RE representations such as non-deterministic finite-state automata onto software-programmable engines or parallel architectures. In this work, we present CICERO, an end-to-end framework composed of a domain-specific architecture and a companion compilation framework for RE matching. Our solution is suitable for many applications, such as genomics/proteomics and natural language processing. CICERO aims at exploiting the intrinsic parallelism of non-deterministic representations of REs. CICERO can trade off accelerator efficiency and processor flexibility thanks to its programmable architecture and its compilation framework. We implemented CICERO prototypes on an embedded FPGA, achieving up to 28.6× and 20.8× better energy efficiency than embedded and mainstream processors, respectively. Since CICERO is a programmable architecture, it can also be implemented as a custom ASIC that is orders of magnitude more energy-efficient than mainstream processors.
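CICERO's instruction set and datapath are not reproduced here. The Python sketch below shows only the underlying technique the abstract names: state-set simulation of a non-deterministic finite automaton, where every active state advances on each input symbol at once. That set-wide step is the intrinsic parallelism a hardware engine can exploit. The NFA encoding is a hypothetical toy, not CICERO's format.

```python
def epsilon_closure(states, epsilon):
    """Expand a state set with all states reachable via epsilon moves."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in epsilon.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def nfa_match(text, transitions, epsilon, start, accept):
    """Simulate the NFA on text; all active states step 'in parallel'.
    transitions[state][symbol] -> set of next states."""
    current = epsilon_closure({start}, epsilon)
    for ch in text:
        nxt = set()
        for s in current:          # in hardware, this whole loop is
            nxt |= transitions.get(s, {}).get(ch, set())  # one cycle
        current = epsilon_closure(nxt, epsilon)
    return bool(current & accept)

# Toy NFA for the RE "a(b|c)*d":
trans = {0: {"a": {1}}, 1: {"b": {1}, "c": {1}, "d": {2}}}
print(nfa_match("abccbd", trans, {}, 0, {2}))  # True
print(nfa_match("abce", trans, {}, 0, {2}))    # False
```

Unlike DFA-based matchers, this representation never suffers state-count blowup, at the cost of tracking a set of states per step, which is exactly the work a parallel architecture can absorb.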


2021 · Author(s): Vinicius Cruzeiro, Madushanka Manathunga, Kenneth M. Merz Jr., Andreas Goetz

The quantum mechanics/molecular mechanics (QM/MM) approach is an essential and well-established tool in computational chemistry that has been widely applied to a myriad of biomolecular problems in the literature. In this publication, we report the integration of the QUantum Interaction Computational Kernel (QUICK) program as an engine to perform electronic structure calculations in QM/MM simulations with AMBER. This integration is available through either a file-based interface (FBI) or an application programming interface (API). Since QUICK is an open-source GPU-accelerated code with multi-GPU parallelization, users can take advantage of “free of charge” GPU acceleration in their QM/MM simulations. In this work, we discuss implementation details and give usage examples. We also investigate energy conservation in typical QM/MM simulations performed in the microcanonical ensemble. Finally, benchmark results for two representative systems, the N-methylacetamide (NMA) molecule and the photoactive yellow protein (PYP) in bulk water, show the performance of QM/MM simulations with QUICK and AMBER using varying numbers of CPU cores and GPUs. Our results highlight the acceleration obtained from a single GPU or multiple GPUs: we observed speedups of up to 38x for a single GPU vs. a single CPU core and of up to 2.6x when comparing four GPUs to a single GPU. Results also reveal speedups of up to 3.5x when the API is used instead of the FBI.
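For context, the reported multi-GPU speedups translate directly into parallel efficiency. The helper below is only a reading aid for the numbers above, not part of QUICK or AMBER:

```python
def parallel_efficiency(speedup, n_workers):
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / n_workers

# The abstract reports up to 2.6x on four GPUs relative to one GPU,
# i.e. 65% parallel efficiency:
print(f"{parallel_efficiency(2.6, 4):.0%}")  # 65%
```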


2021 · Author(s): Madushanka Manathunga, Chi Jin, Vinicius Cruzeiro, Yipu Miao, Dawei Mu, ...

We report a new multi-GPU capable ab initio Hartree-Fock/density functional theory implementation integrated into the open-source QUantum Interaction Computational Kernel (QUICK) program. Details of the load balancing algorithms for electron repulsion integrals and exchange-correlation quadrature across multiple GPUs are described. Benchmarking studies carried out on up to 4 GPU nodes, each containing 4 NVIDIA V100-SXM2 GPUs, demonstrate that our implementation is capable of achieving excellent load balancing and high parallel efficiency. For representative medium to large size protein/organic molecular systems, the observed efficiencies remained above 86%. The accelerations on NVIDIA A100, P100, and K80 platforms also realized parallel efficiencies higher than 74%, paving the way for large-scale ab initio electronic structure calculations.
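The abstract does not spell out the balancing algorithms. One common way to statically balance integral work across devices is a greedy longest-processing-time (LPT) assignment over per-batch cost estimates, sketched below; the batch granularity and costs are illustrative assumptions, not QUICK's actual scheme.

```python
import heapq

def lpt_assign(costs, n_gpus):
    """Longest-processing-time-first: assign each work batch (e.g. a
    block of electron repulsion integrals with an estimated cost) to
    the currently least-loaded GPU, largest batches first."""
    heap = [(0.0, g, []) for g in range(n_gpus)]  # (load, gpu_id, batches)
    heapq.heapify(heap)
    for i in sorted(range(len(costs)), key=lambda i: -costs[i]):
        load, g, batches = heapq.heappop(heap)  # least-loaded GPU
        batches.append(i)
        heapq.heappush(heap, (load + costs[i], g, batches))
    return sorted(heap, key=lambda t: t[1])     # order by GPU id

# Hypothetical per-batch cost estimates:
costs = [9.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0]
for load, gpu, batches in lpt_assign(costs, 4):
    print(f"GPU {gpu}: load {load}, batches {batches}")
```

With these toy costs every GPU ends up with a load of 9.0; in practice the quality of the balance, and hence the parallel efficiency, depends on how well per-batch costs can be estimated.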


2020 · Author(s): Madushanka Manathunga, Yipu Miao, Dawei Mu, Andreas Goetz, Kenneth M. Merz Jr.

We present the details of a GPU-capable exchange-correlation (XC) scheme integrated into the open-source QUantum Interaction Computational Kernel (QUICK) program. Our implementation features an octree-based numerical grid point partitioning scheme, GPU-enabled grid pruning and basis/primitive function prescreening, and fully GPU-capable XC energy and gradient algorithms. Benchmarking against the CPU version demonstrated that the GPU implementation is capable of delivering impressive performance while retaining excellent accuracy. For small to medium size protein/organic molecular systems, the realized speedups in double-precision XC energy and gradient computation on an NVIDIA V100 GPU were 60- to 80-fold and 140- to 780-fold, respectively, as compared to the serial CPU implementation. The acceleration gained in density functional theory calculations from a single V100 GPU significantly exceeds that of a modern CPU with 40 cores running in parallel.
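The octree partitioning can be sketched generically: recursively split the bounding box of the quadrature grid into octants until each leaf holds few enough points to map onto a GPU work unit, which also localizes the basis functions that need to be evaluated per region. The NumPy code below is an illustrative reconstruction under those assumptions, not QUICK's implementation.

```python
import numpy as np

def octree_partition(points, max_leaf=64, lo=None, hi=None):
    """Recursively split 3D grid points into octants until each leaf
    holds at most max_leaf points. Leaves map naturally to GPU thread
    blocks and allow per-region basis function prescreening."""
    if lo is None:
        lo, hi = points.min(axis=0), points.max(axis=0)
    if len(points) <= max_leaf:
        return [points]
    mid = (lo + hi) / 2.0
    leaves = []
    for octant in range(8):            # 3 bits select low/high per axis
        mask = np.ones(len(points), dtype=bool)
        nlo, nhi = lo.copy(), hi.copy()
        for axis in range(3):
            if octant >> axis & 1:
                mask &= points[:, axis] >= mid[axis]
                nlo[axis] = mid[axis]
            else:
                mask &= points[:, axis] < mid[axis]
                nhi[axis] = mid[axis]
        if mask.any():
            leaves += octree_partition(points[mask], max_leaf, nlo, nhi)
    return leaves

rng = np.random.default_rng(1)
pts = rng.random((1000, 3))            # toy stand-in for quadrature points
leaves = octree_partition(pts, max_leaf=64)
print(len(leaves), "leaves, largest holds", max(len(l) for l in leaves))
```

Because each point falls into exactly one octant at every level, the leaves form a disjoint cover of the grid, which is what makes per-leaf pruning safe.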
