scholarly journals High-Performance Modular Multiplication on the Cell Processor

Author(s):  
Joppe W. Bos
2009 ◽  
Vol 17 (1-2) ◽  
pp. 135-151 ◽  
Author(s):  
Guochun Shi ◽  
Volodymyr V. Kindratenko ◽  
Ivan S. Ufimtsev ◽  
Todd J. Martinez ◽  
James C. Phillips ◽  
...  

The Cell Broadband Engine architecture is a revolutionary processor architecture well suited for many scientific codes. This paper reports on an effort to implement several traditional high-performance scientific computing applications on the Cell Broadband Engine processor, including molecular dynamics, quantum chromodynamics and quantum chemistry codes. The paper discusses data and code restructuring strategies necessary to adapt the applications to the intrinsic properties of the Cell processor and demonstrates performance improvements achieved on the Cell architecture. It concludes with the lessons learned and provides practical recommendations on optimization techniques that are believed to be most appropriate.


2009 ◽  
Vol 2 (4) ◽  
pp. 81-91 ◽  
Author(s):  
Hashir Karim Kidwai ◽  
Fadi N. Sibai ◽  
Tamer Rabie

In the world of multi-core processors, the STI Cell Broadband Engine (BE) stands out as a heterogeneous 9-core processor with a PowerPC host processor (PPE) and 8 synergic processor engines (SPEs). The Cell BE architecture is designed to improve upon conventional processors in graphics and related areas by integrating 8 computation engines each with multiple execution units and large register sets to achieve a high performance per area return. In this paper, we discuss the parallelization, implementation and performance evaluation of an edge detection image processing application based on the Roberts edge detector on the Cell BE. The authors report the edge detection performance measured on a computer with one Cell processor and with varying numbers of synergic processor engines enabled. These results are compared to the results obtained on the Cell’s single PPE with all 8 SPEs disabled. The results indicate that edge detection performs 10 times faster on the Cell BE than on modern RISC processors.


2011 ◽  
Vol 20 (03) ◽  
pp. 531-548 ◽  
Author(s):  
KOOROUSH MANOCHEHRI ◽  
BABAK SADEGHIYAN ◽  
SAADAT POURMOZAFARI

Modular calculations are widely used in many applications, especially in public key cryptography. Such operations are very time consuming, due to their long operands. To improve the performance of these calculations, many methods have been introduced. Montgomery modular multiplication is an example of such a solution to enhance the performance of modular multiplication and modular exponentiation. The radix-2 version of this method is simple and fast for hardware implementation, where multi-operand adders are required for its implementation. So far, Carry-Save-Adder (CSA) gives the best performance for multi-addition. In this paper, we propose a new recoding method for the Montgomery modular multiplier to enhance its performance. This is done through replacing CSA blocks with new blocks that have better performances than CSA in multi-addition calculations. With this replacement, we can theoretically have up to 40% reduction in area gates. In our experiments, we obtained 5.8% area reduction and 3% speed improvement in a hardware implementation. The idea behind our proposed method is the use of bitwise subtraction operator, where no carry propagation is needed. This recoding method of operands can also be used in many aspects of computer arithmetic, algorithms and computational hardware, such as multiplication, exponentiation and etc., in order to enhance their performances.


Sign in / Sign up

Export Citation Format

Share Document