A Flexible Parallel Runtime for Large Scale Block-Based Matrix Multiplication

Author(s):  
Keyan Liu ◽  
Shaohua Song ◽  
Ningnan Zhou ◽  
Yanyu Ma
Electronics ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 253
Author(s):  
Yosang Jeong ◽  
Hoon Ryu

The non-equilibrium Green’s function (NEGF) is being utilized in the field of nanoscience to predict transport behaviors of electronic devices. This work explores how much performance improvement can be driven for quantum transport simulations with the aid of manycore computing, where the core numerical operation involves a recursive process of matrix multiplication. Major techniques adopted for performance enhancement are data restructuring, matrix tiling, thread scheduling, and offload computing, and we present technical details on how they are applied to optimize the performance of simulations in computing hardware, including Intel Xeon Phi Knights Landing (KNL) systems and NVIDIA general-purpose graphics processing unit (GPU) devices. With a target structure of a silicon nanowire that consists of 100,000 atoms and is described with an atomistic tight-binding model, the effects of optimization techniques on the performance of simulations are rigorously tested in a KNL node equipped with two Quadro GV100 GPU devices, and we observe that computation is accelerated by a factor of up to ∼20 against the unoptimized case. The feasibility of handling large-scale workloads in a huge computing environment is also examined with nanowire simulations in a wide energy range, where good scalability is procured up to 2048 KNL nodes.
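The matrix tiling mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `tiled_matmul` and the `tile` parameter are hypothetical, and plain Python lists stand in for the dense blocks that an optimized KNL or GPU kernel would process. The idea shown is only that the product is accumulated one small block at a time, so that each block of the operands can be reused while it is still in fast memory.

```python
def tiled_matmul(a, b, tile=2):
    """Multiply a (n x k) by b (k x m), visiting the operands tile by tile.

    `tile` is an illustrative block edge length, not a tuned value.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    # Outer three loops walk over tiles of C, and over the shared dimension.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # Inner three loops multiply one tile of A by one tile of B
                # and accumulate the result into the matching tile of C.
                for i in range(i0, min(i0 + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for p in range(p0, min(p0 + tile, k)):
                        a_ip, row_b = row_a[p], b[p]
                        for j in range(j0, min(j0 + tile, m)):
                            row_c[j] += a_ip * row_b[j]
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(tiled_matmul(a, b))
```

The result does not depend on the tile size; only the order in which partial products are accumulated changes, which is what makes tiling a pure performance optimization.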


2020 ◽  
Vol 38 (5) ◽  
pp. 6445-6455
Author(s):  
Malay Kumar ◽  
Vaibhav Mishra ◽  
Anurag Shukla ◽  
Munendra Singh ◽  
Manu Vardhan

2011 ◽  
Vol 21 (03) ◽  
pp. 279-299 ◽  
Author(s):  
I-HSIN CHUNG ◽  
CHE-RUNG LEE ◽  
JIAZHENG ZHOU ◽  
YEH-CHING CHUNG

As the high performance computing systems scale up, mapping the tasks of a parallel application onto physical processors to allow efficient communication becomes one of the critical performance issues. Existing algorithms were usually designed to map applications with regular communication patterns. Their mapping criterion usually overlooks the size of communicated messages, which is the primary factor of communication time. In addition, most of their time complexities are too high to process large scale problems. In this paper, we present a hierarchical mapping algorithm (HMA), which is capable of mapping applications with irregular communication patterns. It first partitions tasks according to their run-time communication information. The tasks that communicate with each other more frequently are regarded as strongly connected. Based on their connectivity strength, the tasks are partitioned into supernodes based on the algorithms in spectral graph theory. The hierarchical partitioning reduces the mapping algorithm complexity to achieve scalability. Finally, the run-time communication information will be used again in fine tuning to explore better mappings. With the experiments, we show how the mapping algorithm helps to reduce the point-to-point communication time for the PDGEMM, a ScaLAPACK matrix multiplication computation kernel, up to 20% and the AMG2006, a tier 1 application of the Sequoia benchmark, up to 7%.
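The abstract above argues that a mapping criterion should account for message size. A minimal sketch of such a criterion follows; it is not the HMA algorithm itself, and it does not perform the spectral partitioning step. The names `mapping_cost` and `ring_distance`, the ring topology, and the traffic figures are all assumptions made for illustration. The cost of a mapping is taken as the bytes exchanged by each task pair multiplied by the hop distance between the processors they are placed on.

```python
def ring_distance(num_procs):
    """Return a hop-distance function for processors arranged in a ring (assumed topology)."""
    def distance(p, q):
        diff = abs(p - q)
        return min(diff, num_procs - diff)
    return distance


def mapping_cost(traffic, placement, distance):
    """Sum bytes * hops over every communicating task pair.

    traffic:   {(task_a, task_b): bytes exchanged}
    placement: {task: processor}
    """
    return sum(
        nbytes * distance(placement[src], placement[dst])
        for (src, dst), nbytes in traffic.items()
    )


if __name__ == "__main__":
    # Hypothetical traffic: tasks 0-1 and 2-3 exchange large messages.
    traffic = {(0, 1): 100, (1, 2): 10, (2, 3): 100, (0, 3): 10}
    dist = ring_distance(4)
    near = {0: 0, 1: 1, 2: 2, 3: 3}  # heavy pairs placed on adjacent processors
    far = {0: 0, 1: 2, 2: 1, 3: 3}   # heavy pairs placed two hops apart
    print(mapping_cost(traffic, near, dist), mapping_cost(traffic, far, dist))
```

Under this measure, a placement that keeps the heavily communicating pairs adjacent scores lower than one that separates them, which is the kind of difference a fine-tuning pass could search for.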


2020 ◽  
Vol 66 ◽  
pp. 101994 ◽  
Author(s):  
Wei Fan ◽  
Lianyu Zheng ◽  
Wei Ji ◽  
Xun Xu ◽  
Lihui Wang ◽  
...  

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Abdelouahid Derhab ◽  
Mohamed Guerroumi ◽  
Mohamed Belaoued ◽  
Omar Cheikhrouhou

Multicontroller software-defined networks have been widely adopted to enable management of large-scale networks. However, they are vulnerable to several attacks including false data injection, which creates topology inconsistency among controllers. To deal with this issue, we propose BMC-SDN, a security architecture that integrates blockchain and multicontroller SDN and divides the network into several domains. Each SDN domain is managed by one master controller that communicates through blockchain with the masters of the other domains. The master controller creates blocks of network flow updates, and its redundant controllers validate the new block based on a proposed reputation mechanism. The reputation mechanism rates the controllers, i.e., block creator and voters, after each voting operation using constant and combined adaptive fading reputation strategies. The evaluation results demonstrate a fast and optimal detection of fraudulent flow rule injection.


Author(s):  
Wagner Al Alam ◽  
Francisco Carvalho Junior

The efforts to make cloud computing suitable for the requirements of HPC applications have motivated us to design HPC Shelf, a cloud computing platform of services for building and deploying parallel computing systems for large-scale parallel processing. We introduce Alite, the system of contextual contracts of HPC Shelf, aimed at selecting component implementations according to requirements of applications, features of targeting parallel computing platforms (e.g. clusters), QoS (Quality-of-Service) properties and cost restrictions. It is evaluated through a small-scale case study employing a component-based framework for matrix-multiplication based on the BLAS library.


Author(s):  
Kunpeng Zhang ◽  
Wendy Moe

For decades, brand managers have monitored brand health with the use of consumer surveys, which have been refined to address issues related to sampling bias, response bias, leading questions, etc. However, with the advance of Web 2.0 and the internet, consumers have turned to social media to express their opinions on a variety of topics and, subsequently, have generated an extremely large amount of interaction data with brands. Analyzing these publicly available data to measure brand health has attracted great research attention. In this study, we focus on developing a method to measure brand favorability while accounting for the measure biases exhibited by social media posters. Specifically, we propose a probabilistic graphical model–based collective inference framework and implement a block-based Markov chain Monte Carlo sampling technique to obtain an adjusted brand favorability measure that is correlated with traditional survey-based measures used by brands. To demonstrate the effectiveness of our model, we evaluate it using more than 3,300 brands and about 205 million unique users that interact with those brands collected through Facebook. Our model performs very well, providing brand managers with a new method to more accurately measure consumer opinions toward the brand using social media data.

