Global Scheduling Heuristics for Multicore Architecture

2015 · Vol 2015 · pp. 1-12
Author(s): D. C. Kiran, S. Gurunarayanan, Janardan Prasad Misra, Abhijeet Nawal

This work discusses compiler-level global scheduling techniques for multicore processors. Its main contribution is to delegate the job of exploiting fine-grained parallelism to the compiler, thereby reducing hardware overhead and programming complexity. This goal is achieved by decomposing a sequential program into multiple subblocks and constructing a subblock dependency graph (SDG). The proposed schedulers select subblocks from the SDG and schedule them on different cores while ensuring the correct order of execution. In conjunction with these parallelization techniques, locality optimizations are performed to minimize communication overhead between the cores. The observed results indicate a better and more balanced speed-up per watt.
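The abstract gives no code; as a rough sketch of the scheduling idea, assuming subblocks, their dependencies, and cost estimates have already been extracted by the compiler (all names below are hypothetical, not the paper's), a greedy list scheduler over an SDG might look like this:

```python
from collections import defaultdict
import heapq

def schedule_sdg(subblocks, deps, costs, num_cores):
    """Greedy list scheduling of a subblock dependency graph (SDG).

    subblocks : list of subblock ids
    deps      : dict mapping a subblock to the subblocks it depends on
    costs     : dict mapping a subblock to its estimated execution cost
    Returns {core: [subblocks in execution order]}.
    """
    indegree = {b: len(deps.get(b, [])) for b in subblocks}
    users = defaultdict(list)               # reverse edges: b -> subblocks needing b
    for b, ds in deps.items():
        for d in ds:
            users[d].append(b)

    ready = [b for b in subblocks if indegree[b] == 0]
    loads = [(0.0, core) for core in range(num_cores)]  # (accumulated cost, core)
    heapq.heapify(loads)
    schedule = defaultdict(list)

    while ready:
        b = ready.pop(0)                     # dependencies satisfied: safe to run
        load, core = heapq.heappop(loads)    # least-loaded core keeps cores balanced
        schedule[core].append(b)
        heapq.heappush(loads, (load + costs[b], core))
        for u in users[b]:                   # release subblocks whose deps are done
            indegree[u] -= 1
            if indegree[u] == 0:
                ready.append(u)
    return dict(schedule)

# Example: B2 and B3 depend on B1; B4 joins their results.
print(schedule_sdg(["B1", "B2", "B3", "B4"],
                   {"B2": ["B1"], "B3": ["B1"], "B4": ["B2", "B3"]},
                   {"B1": 2, "B2": 3, "B3": 3, "B4": 1}, num_cores=2))
```

This sketch deliberately omits the locality optimizations the paper pairs with scheduling; a fuller implementation would also weigh inter-core communication cost when picking a core.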

2021 · Vol 8 (3) · pp. 1-18
Author(s): James Edwards, Uzi Vishkin

Boolean satisfiability (SAT) is an important, performance-hungry problem with applications in many domains. However, most work on parallelizing SAT solvers has focused on coarse-grained, mostly embarrassingly parallel approaches. Here, we study fine-grained parallelism that can speed up existing sequential SAT solvers, all of which happen to be of the so-called conflict-driven clause learning (CDCL) variety. We show the potential for speedups of up to 382× across a variety of problem instances. We hope that these results will stimulate future research, particularly with respect to an open computer architecture problem we present.
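The article reports measurements rather than code. Purely as an illustration of where fine-grained parallelism enters a CDCL solver, the hypothetical sketch below scans all clauses in parallel during one step of Boolean constraint propagation; Python threads are used only for exposition (CPython's GIL prevents a real speedup here), and none of the names come from the paper:

```python
from concurrent.futures import ThreadPoolExecutor

def clause_status(clause, assignment):
    """Classify one clause under a partial assignment.

    clause     : tuple of non-zero ints, negative = negated variable
    assignment : dict var -> bool for assigned variables
    Returns ("sat", None), ("conflict", clause), ("unit", literal), or ("open", None).
    """
    unassigned = []
    for lit in clause:
        var, want = abs(lit), lit > 0
        if var not in assignment:
            unassigned.append(lit)
        elif assignment[var] == want:
            return ("sat", None)             # some literal is already true
    if not unassigned:
        return ("conflict", clause)          # every literal is false
    if len(unassigned) == 1:
        return ("unit", unassigned[0])       # forced literal: must be made true
    return ("open", None)

def parallel_bcp_step(clauses, assignment, workers=4):
    """One fine-grained step: evaluate every clause in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda c: clause_status(c, assignment), clauses))
    units = [p for kind, p in results if kind == "unit"]
    conflicts = [p for kind, p in results if kind == "conflict"]
    return units, conflicts

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3), with x1 = True
print(parallel_bcp_step([(1, 2), (-1, 3), (-2, -3)], {1: True}))  # ([3], [])
```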


Author(s): Zhuliang Yao, Shijie Cao, Wencong Xiao, Chen Zhang, Lanshun Nie

In trained deep neural networks, unstructured pruning can remove redundant weights to lower storage cost, but practical inference speedups then require customized hardware. Another line of work accelerates sparse-model inference on general-purpose hardware by adopting coarse-grained sparsity, pruning or regularizing consecutive weights for efficient computation; this, however, often sacrifices model accuracy. In this paper, we propose a novel fine-grained sparsity approach, Balanced Sparsity, that achieves high model accuracy and efficiency on commodity hardware. Our approach matches the high degree of parallelism of GPUs, making sparsity practical for widely deployed deep learning services. Experimental results show that Balanced Sparsity achieves up to 3.1x practical speedup for model inference on GPUs while retaining the same high model accuracy as fine-grained sparsity.
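As a minimal sketch of the balanced-sparsity pruning rule described above, assuming it amounts to splitting each weight row into equal-sized blocks and keeping the same number of largest-magnitude entries in every block (the function names are ours, not the paper's):

```python
import numpy as np

def balanced_prune(weights, num_blocks, sparsity):
    """Balanced-sparsity pruning: each row is split into `num_blocks`
    equal-sized blocks, and the smallest-magnitude weights are zeroed
    inside every block so that each block keeps an identical number of
    non-zeros -- the property that lets GPU threads process blocks in
    lockstep without load imbalance."""
    rows, cols = weights.shape
    assert cols % num_blocks == 0, "row length must divide evenly into blocks"
    block = cols // num_blocks
    keep = block - int(round(block * sparsity))   # non-zeros kept per block

    w = weights.reshape(rows, num_blocks, block)
    # Indices of the `keep` largest-magnitude entries in each block.
    idx = np.argsort(np.abs(w), axis=2)[:, :, block - keep:]
    mask = np.zeros_like(w, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=2)
    return np.where(mask, w, 0.0).reshape(rows, cols)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 16))
Wp = balanced_prune(W, num_blocks=4, sparsity=0.75)
# Every length-4 block in every row now has exactly one non-zero.
print((Wp.reshape(4, 4, 4) != 0).sum(axis=2))
```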


Author(s): Shanshan Yu, Jicheng Zhang, Ju Liu, Xiaoqing Zhang, Yafeng Li, ...

In order to solve the problem of distributed denial-of-service (DDoS) attack detection in software-defined networks, we propose a cooperative DDoS attack detection scheme based on entropy and ensemble learning. The method places a coarse-grained preliminary detection module, based on entropy, in the edge switch to monitor the network status in real time and report to the controller if any abnormality is found. A fine-grained, precise attack detection module is designed in the controller, where an ensemble-learning algorithm further identifies abnormal traffic accurately. In this framework, the idle computing capability of the edge switches is fully utilized, following the design idea of edge computing, to offload part of the detection task from the control plane to the data plane. Simulation results for two common DDoS attack methods, ICMP and SYN flooding, show that the system effectively detects DDoS attacks and greatly reduces the southbound communication overhead, the burden on the controller, and the detection delay.
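The paper gives no code for the edge-switch module; a minimal sketch of the entropy test it builds on could look like the following, where the window contents, field choice, and threshold are illustrative assumptions rather than the paper's values:

```python
import math
from collections import Counter

def dst_ip_entropy(packets):
    """Shannon entropy of destination IPs in one observation window.
    A DDoS flood concentrates traffic on few destinations, which
    drives this entropy sharply down (spoofed sources, conversely,
    drive source-IP entropy up)."""
    counts = Counter(p["dst"] for p in packets)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

def coarse_detect(window, threshold=1.0):
    """Edge-switch side: flag the window as suspicious when entropy
    drops below `threshold`, reporting it to the controller for
    fine-grained (ensemble-learning) inspection."""
    h = dst_ip_entropy(window)
    return h < threshold, h

normal = [{"dst": f"10.0.0.{i % 8}"} for i in range(64)]   # spread over 8 hosts
attack = [{"dst": "10.0.0.1"} for _ in range(64)]          # flood on one victim
print(coarse_detect(normal))   # (False, 3.0)
print(coarse_detect(attack))   # (True, 0.0)
```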


Author(s): Luis Miguel Pinho, Brad Moore, Stephen Michell, S. Tucker Taft

Author(s): Hao Yu, Sen Yang, Shenghuo Zhu

In distributed training of deep neural networks, parallel mini-batch SGD is widely used to speed up training with multiple workers: the workers sample local stochastic gradients in parallel, a single server aggregates all gradients to obtain their average, and each worker's local model is updated with an SGD step using the averaged gradient. Ideally, parallel mini-batch SGD achieves a linear speed-up of training time (with respect to the number of workers) compared with SGD on a single worker. In practice, however, this linear scalability is significantly limited by the growing cost of gradient communication as more workers are involved. Model averaging, which periodically averages the individual models trained on parallel workers, is another common practice for distributed training of deep neural networks, dating back to (Zinkevich et al. 2010) and (McDonald, Hall, and Mann 2010). Compared with parallel mini-batch SGD, the communication overhead of model averaging is significantly lower. Impressively, a large body of experimental work has verified that model averaging can still achieve a good training speed-up as long as the averaging interval is carefully controlled. However, why such a simple heuristic works so well has remained a mystery in theory. This paper provides a thorough and rigorous theoretical study of why model averaging can work as well as parallel mini-batch SGD with significantly less communication overhead.
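As a self-contained illustration of the communication trade-off, the following simulation of model averaging (local SGD) on a toy quadratic objective is a sketch under our own assumptions, not the paper's experimental setup:

```python
import numpy as np

def local_sgd(grad, x0, workers=4, rounds=20, local_steps=8, lr=0.1, seed=0):
    """Model averaging, simulated serially: each of `workers` keeps a
    local copy of the model, takes `local_steps` independent SGD steps
    on noisy gradients, and only then are the copies averaged -- one
    communication per round instead of one per step, as in parallel
    mini-batch SGD (local_steps=1 recovers that baseline)."""
    rng = np.random.default_rng(seed)
    models = np.tile(np.asarray(x0, dtype=float), (workers, 1))
    for _ in range(rounds):
        for w in range(workers):
            for _ in range(local_steps):          # no communication here
                noise = rng.normal(scale=0.1, size=models[w].shape)
                models[w] -= lr * (grad(models[w]) + noise)
        models[:] = models.mean(axis=0)           # the single averaging step
    return models[0]

# Toy objective f(x) = 0.5 * ||x||^2, so grad(x) = x; optimum at 0.
x = local_sgd(grad=lambda x: x, x0=[5.0, -3.0])
print(x)   # close to [0, 0]
```

The averaging interval (`local_steps`) is exactly the knob the abstract says must be carefully controlled: larger values cut communication further but let the local models drift apart between averaging steps.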


Sensors · 2020 · Vol 20 (4) · pp. 1088
Author(s): Mohammad Ali, Mohammad-Reza Sadeghi, Ximeng Liu

Wireless Body Area Networks (WBANs) are a highly promising technology enabling health providers to remotely monitor vital parameters of patients via tiny wearable and implantable sensors. In a WBAN, medical data is collected by several tiny sensors and usually transmitted to a server side (e.g., a cloud service provider) for long-term storage and online/offline processing. However, because health data includes sensitive personal information, confidentiality and fine-grained access control are necessary to preserve the privacy of patients. In this paper, we design an attribute-based encryption (ABE) scheme with lightweight encryption and decryption mechanisms. Our scheme enables tiny sensors to encrypt the collected data under an access-control policy using very few computational operations. The computational overhead on users in the decryption phase is also lightweight, as most of the operations are performed by the cloud server. In comparison with several state-of-the-art ABE schemes, our encryption mechanism is more than 100 times faster, and the communication overhead decreases significantly. We provide a security definition for the new primitive and prove its security in the standard model under the hardness assumption of the decisional bilinear Diffie-Hellman (DBDH) problem.
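For reference, the DBDH assumption underlying the proof is standard and can be stated as follows (notation ours, not drawn from the paper):

```latex
% Decisional Bilinear Diffie-Hellman (DBDH), standard form.
% G, G_T: cyclic groups of prime order p; g generates G;
% e : G \times G \to G_T is an efficiently computable bilinear map.
%
% Given (g, g^a, g^b, g^c, Z) with a, b, c drawn uniformly from Z_p,
% decide whether Z = e(g,g)^{abc} or Z = e(g,g)^z for uniform z in Z_p.
\[
  \mathrm{Adv}^{\mathrm{DBDH}}_{\mathcal{A}}
  = \Bigl|
      \Pr\bigl[\mathcal{A}\bigl(g, g^{a}, g^{b}, g^{c}, e(g,g)^{abc}\bigr) = 1\bigr]
    - \Pr\bigl[\mathcal{A}\bigl(g, g^{a}, g^{b}, g^{c}, e(g,g)^{z}\bigr) = 1\bigr]
    \Bigr|
\]
% The assumption: this advantage is negligible in the security
% parameter for every probabilistic polynomial-time adversary A.
```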


Author(s): Jianyong Chen, Qiuzhen Lin, Qingbin Hu

In this paper, a novel clonal algorithm for multiobjective optimization (NCMO) is presented, built from improved search operators: a dynamic mutation probability, a dynamic simulated binary crossover (D-SBX) operator, and a hybrid mutation operator combining Gaussian and polynomial mutations (the GP-HM operator). The main idea of these operators is to perform coarser-grained search at the initial stage in order to speed up convergence toward the Pareto-optimal front; once the solutions approach the front, finer-grained search is performed to reduce the remaining gaps between the solutions and the front. To this end, a cooling schedule is adopted that gradually reduces the operator parameters to a minimal threshold, the aim being to keep a desirable balance between coarse-grained and fine-grained search. By this means, the exploratory capabilities of NCMO are enhanced. Simulation results show that NCMO performs remarkably well when compared with various state-of-the-art multiobjective optimization algorithms developed recently.
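As an illustration of such a cooling schedule, the sketch below anneals a mutation probability linearly down to a floor; the linear form and the numbers are our assumptions, not necessarily the paper's exact schedule:

```python
def cooled(initial, minimum, generation, max_generations):
    """Cooling schedule in the spirit of NCMO's dynamic operators:
    a parameter starts high (coarse-grained, exploratory search) and
    decays toward a floor (fine-grained search near the Pareto front).
    The floor keeps the parameter from vanishing entirely."""
    frac = generation / max_generations
    return max(minimum, initial - (initial - minimum) * frac)

for gen in [0, 100, 250, 400, 500]:
    pm = cooled(initial=0.5, minimum=0.02, generation=gen, max_generations=400)
    print(f"generation {gen:3d}: mutation probability = {pm:.3f}")
# Probability falls from 0.500 to the 0.020 floor and stays there.
```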


2010 · Vol 5 (4) · pp. 291-304
Author(s): Lars Baunegaard With Jensen, Anders Kjær-Nielsen, Karl Pauwels, Jeppe Barsøe Jessen, Marc Van Hulle, ...
