Global Scheduling Heuristics for Multicore Architecture

2015 · Vol 2015 · pp. 1-12
Author(s): D. C. Kiran, S. Gurunarayanan, Janardan Prasad Misra, Abhijeet Nawal

This work discusses compiler-level global scheduling techniques for multicore processors. Its main contribution is to delegate the job of exploiting fine-grained parallelism to the compiler, thereby reducing hardware overhead and programming complexity. This goal is achieved by decomposing a sequential program into multiple subblocks and constructing a subblock dependency graph (SDG). The proposed schedulers select subblocks from the SDG and schedule them on different cores while ensuring the correct order of execution. In conjunction with these parallelization techniques, locality optimizations are performed to minimize communication overhead between the cores. The observed results indicate a better and more balanced speed-up per watt.
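The abstract gives no code; as a rough sketch of the scheduling idea, assuming subblocks, their dependencies, and cost estimates have already been extracted by the compiler (all names below are hypothetical, not the paper's), a greedy list scheduler over an SDG might look like this:

```python
from collections import defaultdict
import heapq

def schedule_sdg(subblocks, deps, costs, num_cores):
    """Greedy list scheduling of a subblock dependency graph (SDG).

    subblocks : list of subblock ids
    deps      : dict mapping a subblock to the subblocks it depends on
    costs     : dict mapping a subblock to its estimated execution cost
    Returns {core: [subblocks in execution order]}.
    """
    indegree = {b: len(deps.get(b, [])) for b in subblocks}
    users = defaultdict(list)               # reverse edges: b -> subblocks needing b
    for b, ds in deps.items():
        for d in ds:
            users[d].append(b)

    ready = [b for b in subblocks if indegree[b] == 0]
    loads = [(0.0, core) for core in range(num_cores)]  # (accumulated cost, core)
    heapq.heapify(loads)
    schedule = defaultdict(list)

    while ready:
        b = ready.pop(0)                     # dependencies satisfied: safe to run
        load, core = heapq.heappop(loads)    # least-loaded core keeps cores balanced
        schedule[core].append(b)
        heapq.heappush(loads, (load + costs[b], core))
        for u in users[b]:                   # release subblocks whose deps are done
            indegree[u] -= 1
            if indegree[u] == 0:
                ready.append(u)
    return dict(schedule)

# Example: B2 and B3 depend on B1; B4 joins their results.
print(schedule_sdg(["B1", "B2", "B3", "B4"],
                   {"B2": ["B1"], "B3": ["B1"], "B4": ["B2", "B3"]},
                   {"B1": 2, "B2": 3, "B3": 3, "B4": 1}, num_cores=2))
```

This sketch deliberately omits the locality optimizations the paper pairs with scheduling; a fuller implementation would also weigh inter-core communication cost when picking a core.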

2021 · Vol 8 (3) · pp. 1-18
Author(s): James Edwards, Uzi Vishkin

Boolean satisfiability (SAT) is an important, performance-hungry problem with applications in many domains. However, most work on parallelizing SAT solvers has focused on coarse-grained, mostly embarrassingly parallel approaches. Here, we study fine-grained parallelism that can speed up existing sequential SAT solvers, all of which happen to be of the so-called conflict-driven clause learning (CDCL) variety. We show the potential for speedups of up to 382× across a variety of problem instances. We hope that these results will stimulate future research, particularly with respect to an open computer architecture problem we present.
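The article reports measurements rather than code. Purely as an illustration of where fine-grained parallelism enters a CDCL solver, the hypothetical sketch below scans all clauses in parallel during one step of Boolean constraint propagation; Python threads are used only for exposition (CPython's GIL prevents a real speedup here), and none of the names come from the paper:

```python
from concurrent.futures import ThreadPoolExecutor

def clause_status(clause, assignment):
    """Classify one clause under a partial assignment.

    clause     : tuple of non-zero ints, negative = negated variable
    assignment : dict var -> bool for assigned variables
    Returns ("sat", None), ("conflict", clause), ("unit", literal), or ("open", None).
    """
    unassigned = []
    for lit in clause:
        var, want = abs(lit), lit > 0
        if var not in assignment:
            unassigned.append(lit)
        elif assignment[var] == want:
            return ("sat", None)             # some literal is already true
    if not unassigned:
        return ("conflict", clause)          # every literal is false
    if len(unassigned) == 1:
        return ("unit", unassigned[0])       # forced literal: must be made true
    return ("open", None)

def parallel_bcp_step(clauses, assignment, workers=4):
    """One fine-grained step: evaluate every clause in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda c: clause_status(c, assignment), clauses))
    units = [p for kind, p in results if kind == "unit"]
    conflicts = [p for kind, p in results if kind == "conflict"]
    return units, conflicts

# (x1 or x2) and (not x1 or x3) and (not x2 or not x3), with x1 = True
print(parallel_bcp_step([(1, 2), (-1, 3), (-2, -3)], {1: True}))  # ([3], [])
```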


Author(s): Zhuliang Yao, Shijie Cao, Wencong Xiao, Chen Zhang, Lanshun Nie

In trained deep neural networks, unstructured pruning can remove redundant weights to lower storage cost, but practical inference speedups then require customized hardware. Another line of work accelerates sparse-model inference on general-purpose hardware by adopting coarse-grained sparsity, pruning or regularizing consecutive weights for efficient computation; this, however, often sacrifices model accuracy. In this paper, we propose a novel fine-grained sparsity approach, Balanced Sparsity, that achieves high model accuracy and efficiency on commodity hardware. Our approach matches the high degree of parallelism of GPUs, making sparsity practical for widely deployed deep learning services. Experimental results show that Balanced Sparsity achieves up to 3.1x practical speedup for model inference on GPUs while retaining the same high model accuracy as fine-grained sparsity.
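As a minimal sketch of the balanced-sparsity pruning rule described above, assuming it amounts to splitting each weight row into equal-sized blocks and keeping the same number of largest-magnitude entries in every block (the function names are ours, not the paper's):

```python
import numpy as np

def balanced_prune(weights, num_blocks, sparsity):
    """Balanced-sparsity pruning: each row is split into `num_blocks`
    equal-sized blocks, and the smallest-magnitude weights are zeroed
    inside every block so that each block keeps an identical number of
    non-zeros -- the property that lets GPU threads process blocks in
    lockstep without load imbalance."""
    rows, cols = weights.shape
    assert cols % num_blocks == 0, "row length must divide evenly into blocks"
    block = cols // num_blocks
    keep = block - int(round(block * sparsity))   # non-zeros kept per block

    w = weights.reshape(rows, num_blocks, block)
    # Indices of the `keep` largest-magnitude entries in each block.
    idx = np.argsort(np.abs(w), axis=2)[:, :, block - keep:]
    mask = np.zeros_like(w, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=2)
    return np.where(mask, w, 0.0).reshape(rows, cols)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 16))
Wp = balanced_prune(W, num_blocks=4, sparsity=0.75)
# Every length-4 block in every row now has exactly one non-zero.
print((Wp.reshape(4, 4, 4) != 0).sum(axis=2))
```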


Author(s): Shanshan Yu, Jicheng Zhang, Ju Liu, Xiaoqing Zhang, Yafeng Li, ...

In order to solve the problem of distributed denial-of-service (DDoS) attack detection in software-defined networks, we propose a cooperative DDoS attack detection scheme based on entropy and ensemble learning. The method places a coarse-grained preliminary detection module, based on entropy, in the edge switch to monitor the network status in real time and report to the controller if any abnormality is found. A fine-grained, precise attack detection module is designed in the controller, where an ensemble-learning algorithm further identifies abnormal traffic accurately. In this framework, the idle computing capability of the edge switches is fully utilized, following the design idea of edge computing, to offload part of the detection task from the control plane to the data plane. Simulation results for two common DDoS attack methods, ICMP and SYN flooding, show that the system effectively detects DDoS attacks and greatly reduces the southbound communication overhead, the burden on the controller, and the detection delay.
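The paper gives no code for the edge-switch module; a minimal sketch of the entropy test it builds on could look like the following, where the window contents, field choice, and threshold are illustrative assumptions rather than the paper's values:

```python
import math
from collections import Counter

def dst_ip_entropy(packets):
    """Shannon entropy of destination IPs in one observation window.
    A DDoS flood concentrates traffic on few destinations, which
    drives this entropy sharply down (spoofed sources, conversely,
    drive source-IP entropy up)."""
    counts = Counter(p["dst"] for p in packets)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

def coarse_detect(window, threshold=1.0):
    """Edge-switch side: flag the window as suspicious when entropy
    drops below `threshold`, reporting it to the controller for
    fine-grained (ensemble-learning) inspection."""
    h = dst_ip_entropy(window)
    return h < threshold, h

normal = [{"dst": f"10.0.0.{i % 8}"} for i in range(64)]   # spread over 8 hosts
attack = [{"dst": "10.0.0.1"} for _ in range(64)]          # flood on one victim
print(coarse_detect(normal))   # (False, 3.0)
print(coarse_detect(attack))   # (True, 0.0)
```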


Author(s): Luis Miguel Pinho, Brad Moore, Stephen Michell, S. Tucker Taft

Author(s): Hao Yu, Sen Yang, Shenghuo Zhu

In distributed training of deep neural networks, parallel mini-batch SGD is widely used to speed up training with multiple workers: the workers sample local stochastic gradients in parallel, a single server aggregates all gradients to obtain their average, and each worker's local model is updated with an SGD step using the averaged gradient. Ideally, parallel mini-batch SGD achieves a linear speed-up of training time (with respect to the number of workers) compared with SGD on a single worker. In practice, however, this linear scalability is significantly limited by the growing cost of gradient communication as more workers are involved. Model averaging, which periodically averages the individual models trained on parallel workers, is another common practice for distributed training of deep neural networks, dating back to (Zinkevich et al. 2010) and (McDonald, Hall, and Mann 2010). Compared with parallel mini-batch SGD, the communication overhead of model averaging is significantly lower. Impressively, a large body of experimental work has verified that model averaging can still achieve a good training speed-up as long as the averaging interval is carefully controlled. However, why such a simple heuristic works so well has remained a mystery in theory. This paper provides a thorough and rigorous theoretical study of why model averaging can work as well as parallel mini-batch SGD with significantly less communication overhead.
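As a self-contained illustration of the communication trade-off, the following simulation of model averaging (local SGD) on a toy quadratic objective is a sketch under our own assumptions, not the paper's experimental setup:

```python
import numpy as np

def local_sgd(grad, x0, workers=4, rounds=20, local_steps=8, lr=0.1, seed=0):
    """Model averaging, simulated serially: each of `workers` keeps a
    local copy of the model, takes `local_steps` independent SGD steps
    on noisy gradients, and only then are the copies averaged -- one
    communication per round instead of one per step, as in parallel
    mini-batch SGD (local_steps=1 recovers that baseline)."""
    rng = np.random.default_rng(seed)
    models = np.tile(np.asarray(x0, dtype=float), (workers, 1))
    for _ in range(rounds):
        for w in range(workers):
            for _ in range(local_steps):          # no communication here
                noise = rng.normal(scale=0.1, size=models[w].shape)
                models[w] -= lr * (grad(models[w]) + noise)
        models[:] = models.mean(axis=0)           # the single averaging step
    return models[0]

# Toy objective f(x) = 0.5 * ||x||^2, so grad(x) = x; optimum at 0.
x = local_sgd(grad=lambda x: x, x0=[5.0, -3.0])
print(x)   # close to [0, 0]
```

The averaging interval (`local_steps`) is exactly the knob the abstract says must be carefully controlled: larger values cut communication further but let the local models drift apart between averaging steps.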


Sensors · 2020 · Vol 20 (4) · pp. 1088
Author(s): Mohammad Ali, Mohammad-Reza Sadeghi, Ximeng Liu

Wireless Body Area Networks (WBANs) are a highly promising technology enabling health providers to remotely monitor vital parameters of patients via tiny wearable and implantable sensors. In a WBAN, medical data is collected by several tiny sensors and usually transmitted to a server side (e.g., a cloud service provider) for long-term storage and online/offline processing. However, because health data includes sensitive personal information, confidentiality and fine-grained access control are necessary to preserve the privacy of patients. In this paper, we design an attribute-based encryption (ABE) scheme with lightweight encryption and decryption mechanisms. Our scheme enables tiny sensors to encrypt the collected data under an access-control policy using very few computational operations. The computational overhead on users in the decryption phase is also lightweight, as most of the operations are performed by the cloud server. In comparison with several state-of-the-art ABE schemes, our encryption mechanism is more than 100 times faster, and the communication overhead decreases significantly. We provide a security definition for the new primitive and prove its security in the standard model under the hardness assumption of the decisional bilinear Diffie-Hellman (DBDH) problem.
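For reference, the DBDH assumption underlying the proof is standard and can be stated as follows (notation ours, not drawn from the paper):

```latex
% Decisional Bilinear Diffie-Hellman (DBDH), standard form.
% G, G_T: cyclic groups of prime order p; g generates G;
% e : G \times G \to G_T is an efficiently computable bilinear map.
%
% Given (g, g^a, g^b, g^c, Z) with a, b, c drawn uniformly from Z_p,
% decide whether Z = e(g,g)^{abc} or Z = e(g,g)^z for uniform z in Z_p.
\[
  \mathrm{Adv}^{\mathrm{DBDH}}_{\mathcal{A}}
  = \Bigl|
      \Pr\bigl[\mathcal{A}\bigl(g, g^{a}, g^{b}, g^{c}, e(g,g)^{abc}\bigr) = 1\bigr]
    - \Pr\bigl[\mathcal{A}\bigl(g, g^{a}, g^{b}, g^{c}, e(g,g)^{z}\bigr) = 1\bigr]
    \Bigr|
\]
% The assumption: this advantage is negligible in the security
% parameter for every probabilistic polynomial-time adversary A.
```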


Author(s): Jianyong Chen, Qiuzhen Lin, Qingbin Hu

In this paper, a novel clonal algorithm for multiobjective optimization (NCMO) is presented, built from improved search operators: a dynamic mutation probability, a dynamic simulated binary crossover (D-SBX) operator, and a hybrid mutation operator combining Gaussian and polynomial mutations (the GP-HM operator). The main idea of these operators is to perform coarser-grained search at the initial stage in order to speed up convergence toward the Pareto-optimal front; once the solutions approach the front, finer-grained search is performed to reduce the remaining gaps between the solutions and the front. To this end, a cooling schedule is adopted that gradually reduces the operator parameters to a minimal threshold, the aim being to keep a desirable balance between coarse-grained and fine-grained search. By this means, the exploratory capabilities of NCMO are enhanced. Simulation results show that NCMO performs remarkably well when compared with various state-of-the-art multiobjective optimization algorithms developed recently.
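As an illustration of such a cooling schedule, the sketch below anneals a mutation probability linearly down to a floor; the linear form and the numbers are our assumptions, not necessarily the paper's exact schedule:

```python
def cooled(initial, minimum, generation, max_generations):
    """Cooling schedule in the spirit of NCMO's dynamic operators:
    a parameter starts high (coarse-grained, exploratory search) and
    decays toward a floor (fine-grained search near the Pareto front).
    The floor keeps the parameter from vanishing entirely."""
    frac = generation / max_generations
    return max(minimum, initial - (initial - minimum) * frac)

for gen in [0, 100, 250, 400, 500]:
    pm = cooled(initial=0.5, minimum=0.02, generation=gen, max_generations=400)
    print(f"generation {gen:3d}: mutation probability = {pm:.3f}")
# Probability falls from 0.500 to the 0.020 floor and stays there.
```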


2010 · Vol 5 (4) · pp. 291-304
Author(s): Lars Baunegaard With Jensen, Anders Kjær-Nielsen, Karl Pauwels, Jeppe Barsøe Jessen, Marc Van Hulle, ...
