GP-SIMD Processing-in-Memory

2015 ◽  
Vol 11 (4) ◽  
pp. 1-26 ◽  
Author(s):  
Amir Morad ◽  
Leonid Yavits ◽  
Ran Ginosar
1992 ◽  
Vol 02 (03) ◽  
pp. 227-245 ◽  
Author(s):  
YOSHIHIRO FUJITA ◽  
NOBUYUKI YAMASHITA ◽  
SHIN-ICHIRO OKAZAKI

This paper presents the architectural features and performance of an Integrated Memory Array Processor (IMAP) LSI, which integrates a large-capacity memory and a one-dimensional SIMD processor array on a single chip. The IMAP has a conventional memory interface, almost the same as that of a dual-port video RAM, with an extension for operational input. SIMD processing is carried out on the IMAP chip by the internal processor array, while higher-level processing is performed concurrently by external processors through the random access memory port. In addition to the basic IMAP architecture, this paper describes the orthogonal IMAP, an extension of that architecture. The basic IMAP uses a conventional memory cell, while the orthogonal IMAP uses an orthogonal memory for holding images.


Author(s):  
Meilian Xu ◽  
Parimala Thulasiraman ◽  
Ruppa K. Thulasiram

This chapter uses two scientific computing kernels to illustrate the challenges of designing parallel algorithms for a heterogeneous multi-core processor, the Cell Broadband Engine processor (Cell/B.E.). It describes the limitation of current parallel systems that use single-core processors as building blocks. This limitation degrades the performance of applications with data-intensive and computation-intensive kernels such as Finite Difference Time Domain (FDTD) and Fast Fourier Transform (FFT). FDTD is a regular problem with a nearest-neighbour communication pattern under a synchronization constraint. FFT based on an indirect swap network (ISN) modifies the data mapping of the traditional Cooley-Tukey butterfly network to improve data locality, thereby reducing communication and synchronization overhead. The authors aim to exploit the Cell/B.E. by designing parallel FDTD and ISN-based parallel FFT algorithms that take into account its unique features, such as its eight SIMD processing units on a single chip and its high-speed on-chip bus.
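
The nearest-neighbour pattern and synchronization constraint mentioned for FDTD can be illustrated with a minimal one-dimensional sketch. This is not the chapter's Cell/B.E. implementation: the function name `fdtd_1d_step`, the staggered-grid layout, and the update coefficient `c` are assumptions chosen only for illustration.

```python
def fdtd_1d_step(e, h, c=0.5):
    """Advance a 1D staggered-grid field pair by one time step.

    Illustrative assumptions: e has n grid points, h has n - 1 points
    located between them, and c is a hypothetical update coefficient.
    """
    n = len(e)
    # Each h value depends only on its two neighbouring e values.
    new_h = [h[i] + c * (e[i + 1] - e[i]) for i in range(n - 1)]
    # Synchronization point: every h update must be complete before
    # any e value is updated, because e reads the new h neighbours.
    new_e = list(e)
    for i in range(1, n - 1):  # boundary points are held fixed
        new_e[i] = e[i] + c * (new_h[i] - new_h[i - 1])
    return new_e, new_h


if __name__ == "__main__":
    e0 = [0, 0, 1, 0, 0]
    h0 = [0, 0, 0, 0]
    e1, h1 = fdtd_1d_step(e0, h0)
    print(e1, h1)
```

In a parallel version, each processing unit would own a contiguous slice of the grid and exchange only its edge values with its neighbours at each half step, which is why the communication pattern is regular.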


Author(s):  
Ilya V. Afanasyev ◽  
Vadim V. Voevodin ◽  
Vladimir V. Voevodin ◽  
Kazuhiko Komatsu ◽  
Hiroaki Kobayashi

2007 ◽  
Vol 2007 ◽  
pp. 1-9 ◽  
Author(s):  
Kai Zeng ◽  
Erwei Bai ◽  
Ge Wang

High computational cost is a severe limitation in CT reconstruction for clinical applications that need real-time feedback. A primary example is bolus-chasing computed tomography (CT) angiography (BCA), which we have been developing for the past several years. To accelerate reconstruction with the filtered backprojection (FBP) method, specialized hardware or graphics cards can be used. However, specialized hardware is expensive and inflexible. The graphics processing unit (GPU) in a current graphics card can only reconstruct images at reduced precision and is not easy to program. In this paper, an acceleration scheme is proposed based on a multi-core PC. The proposed scheme integrates several techniques, including utilization of geometric symmetry, optimization of data structures, single-instruction multiple-data (SIMD) processing, multithreaded computation, and an Intel C++ compiler. Our scheme maintains the original precision and involves no data exchange between the GPU and CPU. The merits of our scheme are demonstrated in numerical experiments against the traditional implementation. Our scheme achieves a speedup of about 40, which can be improved severalfold using the latest quad-core processors.

