Implementation with comparison of system performance in different parallel processing configuration systems using matrix multiplication

2022 ◽  
Author(s):  
Dhurgham A. Habeeban ◽  
Yahya M. Al-Mayali

2013 ◽  
Vol 718-720 ◽  
pp. 2125-2130 ◽  
Author(s):  
Xiao Qun Sun ◽  
Qiang Wu ◽  
Xu Wen Li ◽  
Xin Zheng

This paper introduces performance metrics for DSP parallel processing systems and presents a coarse-grained speedup model for DSP parallel processing structures. A quantitative study is carried out based on the system performance indices and the features of the target program. The study simulates and analyzes how different communication protocols and different degrees of parallelism affect the performance of the parallel processing structure. Directions for optimizing the parallel processing system are proposed.
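As a rough illustration of how the degree of parallelism and communication cost interact in a coarse-grained speedup model, the sketch below uses a generic Amdahl-style formula with an added linear communication term. This is an assumed textbook-style model, not the model from the paper; the function names and the parameters `parallel_fraction` and `comm_cost_per_extra_proc` are hypothetical.

```python
def coarse_grained_speedup(t_serial, parallel_fraction, p, comm_cost_per_extra_proc):
    """Speedup S(p) = T(1) / T(p) under an assumed model:
    T(p) = (1 - f) * T1 + f * T1 / p + c * (p - 1),
    where f is the parallelizable fraction and c is a per-processor
    communication cost (hypothetical, for illustration only)."""
    if p < 1:
        raise ValueError("p must be >= 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    t_parallel = ((1.0 - parallel_fraction) * t_serial
                  + parallel_fraction * t_serial / p
                  + comm_cost_per_extra_proc * (p - 1))
    return t_serial / t_parallel


def efficiency(speedup, p):
    """Parallel efficiency E(p) = S(p) / p."""
    return speedup / p


def best_degree_of_parallelism(t_serial, parallel_fraction, comm_cost, max_p):
    """Return the p in [1, max_p] that maximizes speedup under the model above."""
    return max(range(1, max_p + 1),
               key=lambda p: coarse_grained_speedup(t_serial, parallel_fraction, p, comm_cost))
```

With a nonzero communication cost, speedup under this model rises, peaks, and then falls as processors are added, so there is a best degree of parallelism rather than "more is always better"; for example, `best_degree_of_parallelism(100, 0.9, 0.5, 32)` picks p = 13.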


Computation ◽  
2021 ◽  
Vol 9 (8) ◽  
pp. 86
Author(s):  
Eduardo Patricio Estévez Ruiz ◽  
Giovanny Eduardo Caluña Chicaiza ◽  
Fabian Rodolfo Jiménez Patiño ◽  
Joaquín Cayetano López Lago ◽  
Saravana Prakash Thirumuruganandham

Optimizing high-performance computing (HPC) systems based on performance factors and bottlenecks is essential for designing an HPC infrastructure with the best characteristics at a reasonable cost. Such insight can only be achieved through a detailed analysis of existing HPC systems and the execution of their workloads. “Quinde I” is Ecuador's only, and most powerful, supercomputer and is currently ranked third in South America. It was built with IBM Power 8 servers. In this work, we measured its performance using various HPC parameters and compared the results with theoretical values and with values obtained from tests on similar models. To measure its performance, we compiled and ran several benchmarks with optimization flags specific to Power 8 to obtain the maximum performance achievable with the hardware configuration installed by the vendor. The benchmark inputs were varied to analyze their impact on system performance. In addition, we compiled and compared the performance of two dense matrix multiplication algorithms, SRUMMA and DGEMM.
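Comparing measured matrix-multiplication performance with theoretical values usually means counting 2*m*n*k floating-point operations for an (m x k) by (k x n) product and dividing by the run time. The sketch below is a plain-Python reference version of the BLAS DGEMM operation C := alpha*A*B + beta*C (no transposes), plus a GFLOP/s and percent-of-peak calculation. It is far slower than an optimized DGEMM and is neither SRUMMA nor the benchmark setup used in the paper; the helper names and any `peak_gflops` value passed in are assumptions.

```python
import time


def dgemm_reference(alpha, A, B, beta, C):
    """Unoptimized reference for the BLAS DGEMM operation C := alpha*A@B + beta*C
    (no transposes). Matrices are lists of row lists; returns a new matrix."""
    m, k = len(A), len(A[0])
    if len(B) != k or len(C) != m or len(C[0]) != len(B[0]):
        raise ValueError("incompatible matrix shapes")
    n = len(B[0])
    result = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += A[i][p] * B[p][j]  # one multiply + one add per term
            row.append(alpha * acc + beta * C[i][j])
        result.append(row)
    return result


def gemm_flops(m, n, k):
    """Conventional flop count for an (m x k) @ (k x n) product: 2*m*n*k."""
    return 2 * m * n * k


def measure_gflops(n, repeats=3):
    """Time dgemm_reference on n x n matrices and return the best GFLOP/s."""
    A = [[1.0] * n for _ in range(n)]
    B = [[0.5] * n for _ in range(n)]
    C = [[0.0] * n for _ in range(n)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dgemm_reference(1.0, A, B, 0.0, C)
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-12)  # guard against a zero timer reading
    return gemm_flops(n, n, n) / best / 1e9


def percent_of_peak(measured_gflops, peak_gflops):
    """Achieved fraction of a theoretical peak, as a percentage."""
    return 100.0 * measured_gflops / peak_gflops
```

A theoretical peak is commonly estimated as cores times clock rate times floating-point operations per cycle per core; no value for a specific machine is assumed here.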


2018 ◽  
Vol 18 (13&14) ◽  
pp. 1095-1114
Author(s):  
Zongyuan Zhang ◽  
Zhijin Guan ◽  
Hong Zhang ◽  
Haiying Ma ◽  
Weiping Ding

To realize linear nearest neighbor (LNN) architectures for quantum circuits and to reduce the quantum cost of linear reversible quantum circuits, a method for synthesizing and optimizing linear reversible quantum circuits based on matrix multiplication over the structure of the quantum circuit is proposed. The method obtains the matrix representation of a linear quantum circuit by multiplying the matrices of the different parts of the whole circuit. An LNN realization that inserts SWAP gates is proposed, and the equivalence of two ways of inserting the SWAP gates is proved. Rules for eliminating SWAP gates between two overlapping adjacent quantum gates in different cases are proposed; these reduce the quantum cost of the circuits after the LNN architecture is realized. We also propose a parallel algorithm to reduce the running time for large-scale quantum circuits. Experiments show that the quantum cost is improved by 34.31% on average, and the GPU-based algorithm achieves a speedup of up to 4 times over the CPU-based algorithm. For the large-scale benchmark circuits in RevLib, the average time optimization ratio of the parallel algorithm is 95.57% compared with the serial algorithm.
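Linear reversible circuits built from CNOT gates act on basis states as invertible matrices over GF(2), so the matrix of a whole circuit is the product of its gate matrices, and a SWAP of two adjacent qubits can be written as three CNOTs. The sketch below illustrates this representation and a naive way to make a long-range CNOT nearest-neighbor by routing its target with adjacent SWAPs. It is an illustrative assumption, not the synthesis algorithm or SWAP-elimination rules from the paper, and all function names are hypothetical.

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def matmul_gf2(X, Y):
    """Matrix product over GF(2): multiply with AND, add with XOR (sum mod 2)."""
    return [[sum(X[i][p] & Y[p][j] for p in range(len(Y))) % 2
             for j in range(len(Y[0]))] for i in range(len(X))]


def cnot_matrix(n, control, target):
    """CNOT maps bit vector x to x with x[target] ^= x[control]:
    identity plus a 1 at row target, column control."""
    M = identity(n)
    M[target][control] = 1
    return M


def circuit_matrix(n, gates):
    """gates: (control, target) pairs in time order; later gates multiply on the left."""
    M = identity(n)
    for c, t in gates:
        M = matmul_gf2(cnot_matrix(n, c, t), M)
    return M


def is_lnn(gates):
    """True if every CNOT acts on adjacent qubits."""
    return all(abs(c - t) == 1 for c, t in gates)


def swap_as_cnots(a, b):
    """SWAP(a, b) decomposed into three CNOTs."""
    return [(a, b), (b, a), (a, b)]


def lnn_cnot(control, target):
    """Hypothetical helper: realize CNOT(control, target) with adjacent gates only,
    by swapping the target next to the control, applying the CNOT, and swapping back."""
    step = 1 if target > control else -1
    swaps = []
    q = target
    while abs(q - control) > 1:
        swaps.append((q - step, q))
        q -= step
    gates = []
    for a, b in swaps:
        gates += swap_as_cnots(a, b)
    gates.append((control, control + step))
    for a, b in reversed(swaps):
        gates += swap_as_cnots(a, b)
    return gates
```

Checking `circuit_matrix(3, lnn_cnot(0, 2)) == cnot_matrix(3, 0, 2)` confirms that the routed circuit is equivalent to the original gate, at the cost of six extra CNOTs; reducing that overhead is what SWAP-elimination rules such as those in the abstract aim at.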


2017 ◽  
Vol 131 (4) ◽  
pp. 337-347 ◽  
Author(s):  
Gesa Feenders ◽  
Yoko Kato ◽  
Katharina M. Borzeszkowski ◽  
Georg M. Klump

1960 ◽  
Author(s):  
S. Seidenstein ◽  
R. Chernikoff ◽  
F. V. Taylor

Author(s):  
Christopher Wickens ◽  
Jack Isreal ◽  
Gregory McCarthy ◽  
Daniel Gopher ◽  
Emanuel Donchin
