Study of Fine-grained Nested Parallelism in CDCL SAT Solvers

2021 ◽  
Vol 8 (3) ◽  
pp. 1-18
Author(s):  
James Edwards ◽  
Uzi Vishkin

Boolean satisfiability (SAT) is an important and computationally demanding problem with applications in many domains. However, most work on parallelizing SAT solvers has focused on coarse-grained, mostly embarrassingly parallel, approaches. Here, we study fine-grained parallelism that can speed up existing sequential SAT solvers, which all happen to be of the Conflict-Driven Clause Learning (CDCL) variety. We show the potential for speedups of up to 382× across a variety of problem instances. We hope that these results will stimulate future research, particularly with respect to an open computer-architecture problem we present.
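
The paper reports measurements rather than code; as a rough illustration of what fine-grained parallelism inside a CDCL solver can mean, the sketch below parallelizes one round of unit propagation (Boolean constraint propagation) across clauses. All names are illustrative, Python threads only model the structure (they would not yield real speedup under the GIL), and this is not the authors' implementation.

```python
# Illustrative sketch, not the authors' implementation: fine-grained
# parallelism inside one CDCL step by classifying all clauses in
# parallel, then committing forced (unit) literals sequentially.
from concurrent.futures import ThreadPoolExecutor

def clause_status(clause, assignment):
    """Return 'sat', 'conflict', a forced unit literal, or None."""
    unassigned = []
    for lit in clause:               # DIMACS-style signed integer literals
        val = assignment.get(abs(lit))
        if val is None:
            unassigned.append(lit)
        elif (lit > 0) == val:
            return "sat"             # clause already satisfied
    if not unassigned:
        return "conflict"            # every literal is falsified
    if len(unassigned) == 1:
        return unassigned[0]         # unit clause: this literal is forced
    return None                      # still unresolved

def parallel_bcp(clauses, assignment, workers=8):
    """Repeat: scan clauses in parallel (the fine-grained part), then
    apply forced literals sequentially, until fixpoint or conflict."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while True:
            statuses = list(pool.map(lambda c: clause_status(c, assignment), clauses))
            if "conflict" in statuses:
                return "conflict"
            forced = [s for s in statuses if isinstance(s, int)]
            if not forced:
                return "stable"
            for lit in forced:
                var, val = abs(lit), lit > 0
                if assignment.get(var, val) != val:
                    return "conflict"     # two units disagree
                assignment[var] = val

# Hypothetical usage: decide x1 = False, then propagate.
clauses = [[1, 2], [-1, 2], [-2, 3]]
assignment = {1: False}
print(parallel_bcp(clauses, assignment), assignment)
# -> stable {1: False, 2: True, 3: True}
```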

Author(s):  
Zhuliang Yao ◽  
Shijie Cao ◽  
Wencong Xiao ◽  
Chen Zhang ◽  
Lanshun Nie

In trained deep neural networks, unstructured pruning can remove redundant weights to lower storage cost; however, it requires customized hardware to speed up practical inference. Another line of work accelerates sparse model inference on general-purpose hardware by adopting coarse-grained sparsity, pruning or regularizing consecutive weights for efficient computation, but this often sacrifices model accuracy. In this paper, we propose a novel fine-grained sparsity approach, Balanced Sparsity, that achieves high model accuracy with efficient execution on commodity hardware. Our approach is adapted to the highly parallel architecture of GPUs, showing strong potential for sparsity in the wide deployment of deep learning services. Experimental results show that Balanced Sparsity achieves up to 3.1x practical speedup for model inference on GPUs while retaining the same high model accuracy as fine-grained sparsity.
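
As a minimal sketch of the balanced fine-grained pruning idea described above, the snippet below splits every weight row into fixed-size blocks and zeroes the same number of smallest-magnitude entries in each block, so GPU thread groups see identical amounts of work. Function names and the block size are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of balanced fine-grained pruning (illustrative, not
# the paper's code): every block of each row keeps the same number of
# largest-magnitude weights, so GPU thread groups stay load-balanced.
import numpy as np

def balanced_prune(weights, block_size=32, sparsity=0.5):
    pruned = weights.copy()
    drop_per_block = int(block_size * sparsity)
    rows, cols = weights.shape
    assert cols % block_size == 0, "row length must divide into blocks"
    for r in range(rows):
        for start in range(0, cols, block_size):
            block = pruned[r, start:start + block_size]   # view into pruned
            drop = np.argsort(np.abs(block))[:drop_per_block]
            block[drop] = 0.0                             # zero the smallest
    return pruned

W = np.random.randn(4, 64).astype(np.float32)
W_sparse = balanced_prune(W, block_size=32, sparsity=0.5)
assert (W_sparse == 0).sum() == W.size // 2   # 50% zeros, evenly distributed
```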


Author(s):  
JIANYONG CHEN ◽  
QIUZHEN LIN ◽  
QINGBIN HU

In this paper, a novel clonal algorithm for multiobjective optimization (NCMO) is presented, built on improved search operators: a dynamic mutation probability, a dynamic simulated binary crossover (D-SBX) operator, and a hybrid mutation operator combining Gaussian and polynomial mutations (GP-HM). The main idea behind these operators is to perform coarser-grained search at the initial stage in order to speed up convergence toward the Pareto-optimal front. Once the solutions get close to the Pareto-optimal front, finer-grained search is performed in order to reduce the gaps between the solutions and the front. To this end, a cooling schedule is adopted that gradually reduces the operator parameters to a minimal threshold, keeping a desirable balance between fine-grained and coarse-grained search. By this means, the exploratory capabilities of NCMO are enhanced. Simulation results show that NCMO performs remarkably well when compared with various recently developed state-of-the-art multiobjective optimization algorithms.
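
A hedged sketch of the cooling-schedule idea described above: an operator parameter decays from an aggressive initial value toward a minimal threshold as generations progress, shifting search from coarse-grained to fine-grained. The exponential decay form and all constants here are assumptions, not the paper's exact schedule.

```python
# Hedged sketch of a cooling schedule for a search operator parameter;
# the exponential form and constants are assumptions, not the paper's.
import math

def cooled_parameter(p_init, p_min, generation, max_gen, decay=3.0):
    """Decay a parameter (e.g., mutation probability) from p_init toward
    the floor p_min, shifting search from coarse- to fine-grained."""
    t = generation / max_gen                  # progress in [0, 1]
    return max(p_min, p_init * math.exp(-decay * t))

for g in (0, 50, 100):                        # early, middle, late
    print(g, round(cooled_parameter(0.9, 0.05, g, 100), 3))
# -> 0 0.9 / 50 0.201 / 100 0.05
```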


Author(s):  
K. Liagkouras ◽  
K. Metaxiotis

In this paper, we present a novel Interval-Based Mutation (IBMU) operator. The proposed operator performs coarse-grained search at the initial stage in order to speed up convergence toward more promising regions of the search landscape; more fine-grained search is then performed in order to guide the solutions toward the Pareto front. Computational experiments indicate that the proposed mutation operator outperforms conventional approaches on several well-known benchmark problems.
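
The following is a minimal sketch of what an interval-based mutation can look like: a gene is perturbed uniformly within an interval around its current value, and the interval shrinks with the generation count so exploration is coarse early and fine late. The linear shrinking rule is an illustrative assumption, not necessarily IBMU's exact definition.

```python
# Illustrative interval-based mutation: the mutation interval around a
# gene shrinks with the generation count (linear rule assumed here;
# IBMU's exact rule is defined in the paper).
import random

def interval_mutate(x, lower, upper, generation, max_gen):
    width = (upper - lower) * max(1.0 - generation / max_gen, 0.01)
    lo = max(lower, x - width / 2.0)          # clamp interval to bounds
    hi = min(upper, x + width / 2.0)
    return random.uniform(lo, hi)

print(interval_mutate(0.5, 0.0, 1.0, generation=0,  max_gen=100))  # coarse
print(interval_mutate(0.5, 0.0, 1.0, generation=95, max_gen=100))  # fine
```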


Author(s):  
Sebastião Miranda ◽  
Jonas Feldt ◽  
Frederico Pratas ◽  
Ricardo A Mata ◽  
Nuno Roma ◽  
...  

A novel perturbative Monte Carlo mixed quantum mechanics (QM)/molecular mechanics (MM) approach has recently been developed to simulate molecular systems in complex environments. However, the accuracy required to simulate such complex molecular systems efficiently usually comes at the cost of long execution times. To alleviate this problem, a new parallelization strategy for multi-level Monte Carlo molecular simulations on heterogeneous systems is proposed here. It simultaneously exploits fine-grained (data-level), coarse-grained (Markov-chain-level), and task-grained (pure QM, pure MM, and QM/MM procedures) parallelism to ensure efficient execution on heterogeneous systems composed of central processing units and multiple, possibly different, graphical processing units. This is achieved by using the OpenCL library together with appropriate dynamic load-balancing schemes. In an evaluation with real benchmarking data, a speed-up of 56x was observed in the computational bottleneck, which results in a global speed-up of 38x for the whole simulation, reducing the time of a typical simulation from 80 hours to only 2 hours.
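
As a loose sketch of the dynamic load-balancing idea (and only of its structure; the actual system is built on OpenCL, not Python threads), one worker per device can pull Markov-chain tasks from a shared queue, so faster devices naturally process more tasks. All names here are placeholders.

```python
# Structural sketch only: dynamic load balancing across heterogeneous
# devices via a shared task queue (the real system uses OpenCL; names
# and the execute() callback are placeholders).
import queue
import threading

def run_heterogeneous(tasks, devices, execute):
    q = queue.Queue()
    for t in tasks:
        q.put(t)
    results, lock = [], threading.Lock()

    def worker(device):
        while True:
            try:
                task = q.get_nowait()        # pull the next MC task
            except queue.Empty:
                return                       # queue drained, worker exits
            out = execute(device, task)      # pure QM, pure MM, or QM/MM step
            with lock:
                results.append(out)

    threads = [threading.Thread(target=worker, args=(d,)) for d in devices]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results                           # faster devices did more tasks
```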


2014 ◽  
Vol 988 ◽  
pp. 695-701
Author(s):  
Hai Jie Yu ◽  
Sheng Su

We survey the literature on collaborative pricing and scheduling decisions from both coarse-grained and fine-grained views, and find that jointly deciding pricing and production scheduling can increase total profits by more than 30%. The main ideas and a research framework for the problem are proposed.


10.29007/1tc6 ◽  
2018 ◽  
Author(s):  
Natarajan Shankar

Modularity plays a central role in logical reasoning. We want to be able to reuse proofs, proof patterns, theories, and specialized reasoning procedures. Architectures that support modularity have been developed at all levels of inference: SAT solvers, theory solvers, combination solvers and rewriters, SMT solvers, simplifiers, rewriters, and tactics-based interactive theorem provers. Prior work has mostly focused on fine-grained modular inference. However, with the availability of a diverse range of high-quality inference tools, it has become important to systematically integrate these big components into robust toolchains. At SRI, we have been developing a framework called the Evidential Tool Bus (ETB) as a distributed platform for the coarse-grained integration of inference components into flexible, scriptable workflows. The talk describes the architecture of ETB along with some motivating applications.


Author(s):  
Melki Sadekh Mansuan ◽  
Benfano Soewito

The purpose of this research was to address several problems in the rendering process, such as slow rendering times and complex calculations, which make rendering inefficient. This research analyzed the efficiency of the rendering process through an experimental study implementing a distributed rendering system with fine-grained and coarse-grained parallel decomposition in a computer laboratory. The primary data used were the rendering times obtained from rendering three animation scenes. A descriptive analysis compared performance using the parallel performance metrics of speedup and efficiency. The results show that the distributed rendering method succeeds in increasing rendering speed, with a speedup of 9.43 and a processor-utilization efficiency of 94%, when applied to the problem of slow rendering times.
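
The reported figures are mutually consistent under the standard definition of parallel efficiency as speedup divided by processor count; assuming the experiment used 10 processors (an inference from the numbers, not stated above):

```latex
E = \frac{S}{N} = \frac{9.43}{10} = 0.943 \approx 94\%
```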


Author(s):  
Rudranarayan M. Mukherjee ◽  
Paul Crozier ◽  
Kurt S. Anderson

This is the second in a series of two papers on using multibody dynamics algorithms and methods for coarse-grained molecular dynamics simulations. The first paper presented the theoretical discussion of this topic; this paper presents results obtained from simulating several biomolecular and bulk-material systems using multibody dynamics algorithms. The systems studied include water boxes, alkane chains, alanine dipeptide, and carboxyl-terminal fragments of the Calmodulin, Ribosomal, and Rhodopsin proteins. The atomistic representations of these systems include several thousand degrees of freedom, and results of several nanosecond-scale simulations are presented. The stability and validity of the simulations are studied through conservation of energy, thermodynamic properties, and conformational analysis. In these simulations, a speedup of an order of magnitude is realized under conservative error bounds. A discussion is presented of the open-source software developed to facilitate future research combining multibody dynamics with molecular dynamics.


2015 ◽  
Vol 2015 ◽  
pp. 1-12
Author(s):  
D. C. Kiran ◽  
S. Gurunarayanan ◽  
Janardan Prasad Misra ◽  
Abhijeet Nawal

This work discusses various compiler-level global scheduling techniques for multicore processors. The main contribution is to delegate the job of exploiting fine-grained parallelism to the compiler, thereby reducing hardware overhead and programming complexity. This goal is achieved by decomposing a sequential program into multiple subblocks and constructing a subblock dependency graph (SDG). The proposed schedulers select subblocks from the SDG and schedule them on different cores while ensuring the correct order of execution of subblocks. In conjunction with these parallelization techniques, locality optimizations are performed to minimize communication overhead between the cores. The observed results indicate better and more balanced speedup per watt.
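
A minimal sketch of scheduling subblocks from an SDG onto cores, under assumed data structures (predecessor lists and per-subblock costs): a subblock becomes ready once all its predecessors finish, and ready subblocks are assigned to the earliest-free core. The paper's actual schedulers also perform locality optimizations, which this sketch omits.

```python
# Illustrative list scheduling of a subblock dependency graph (SDG);
# data structures and cost model are assumptions, not the paper's.
import heapq

def schedule_sdg(deps, cost, num_cores):
    """deps maps each subblock to its predecessors; cost gives its
    estimated runtime. Ready subblocks go to the earliest-free core."""
    indeg = {b: len(ps) for b, ps in deps.items()}
    succs = {b: [] for b in deps}
    for b, ps in deps.items():
        for p in ps:
            succs[p].append(b)
    finish = {}
    cores = [(0.0, c) for c in range(num_cores)]   # (free_at, core_id)
    heapq.heapify(cores)
    ready = [b for b, d in indeg.items() if d == 0]
    order = []
    while ready:
        b = ready.pop(0)
        free_at, core = heapq.heappop(cores)
        # a subblock may start only after its core is free AND all
        # predecessor subblocks have finished (correct execution order)
        start = max(free_at, max((finish[p] for p in deps[b]), default=0.0))
        finish[b] = start + cost[b]
        order.append((b, core, start, finish[b]))
        heapq.heappush(cores, (finish[b], core))
        for s in succs[b]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return order

# Hypothetical SDG: B and C depend on A; D depends on B and C.
deps = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
cost = {"A": 2.0, "B": 3.0, "C": 1.0, "D": 2.0}
for row in schedule_sdg(deps, cost, num_cores=2):
    print(row)
```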

