High-Performance Image Reconstruction (HPIR) in Three Dimensions

Author(s):  
Olivier Bockenbach ◽  
Michael Knaup ◽  
Sven Steckmann ◽  
Marc Kachelrieß

Commonly used in medical imaging for diagnostic purposes, in luggage scanning, and in industrial non-destructive testing, Computed Tomography (CT) is an imaging technique that provides cross sections of an object from measurements taken at different angular positions around the object. CT reconstruction, also referred to as Image Reconstruction (IR), is known to be a very compute-intensive problem. In its simplest form, the computational load grows as O(M × N³), where M represents the number of measurements taken around the object and N is the dimension of the object. Furthermore, research institutes report that the increase in processing power required by CT consistently outpaces Moore's Law. At the same time, the changing workflow in hospitals requires obtaining CT images faster, with better quality, and from a lower dose; in some cases, real-time reconstruction is needed. High-Performance Image Reconstruction (HPIR) must therefore be used to meet the performance requirements imposed by the use of modern CT reconstruction algorithms in hospitals. Traditionally, this problem had been solved by designing application-specific hardware. Nowadays, the evolution of technology makes it possible to use commercial off-the-shelf (COTS) components. Typical HPIR platforms can be built around multicore processors such as the Cell Broadband Engine (CBE), general-purpose graphics processing units (GPGPUs), or field programmable gate arrays (FPGAs). These platforms exhibit different levels of parallelism for implementing CT reconstruction algorithms. They also differ in how the computation can be carried out, potentially requiring drastic changes in the way an algorithm is implemented. Furthermore, because of their COTS nature, it is not always easy to take best advantage of a given platform, and compromises have to be made. Finally, a fully fledged reconstruction platform also includes the data acquisition interface as well as the visualization of the reconstructed slices. These parts are the areas of excellence of FPGAs and GPGPUs. However, more often than not, the processing power available in those units exceeds the requirements of a given pipeline, and the remaining real estate and processing power can be used for the core of the reconstruction pipeline. Indeed, several design options can be considered for a given algorithm, with yet another set of compromises.
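
The O(M × N³) scaling can be made concrete with a toy voxel-driven backprojector. The sketch below (Python, parallel-beam geometry, nearest-neighbour interpolation, all names hypothetical) is not the authors' pipeline; it only illustrates why the triple voxel loop, repeated for each of the M views, dominates the cost.

    # Toy voxel-driven backprojection (parallel-beam, nearest-neighbour).
    # The loop structure makes the O(M * N^3) cost explicit: for each of the
    # M views, every voxel of the N x N x N volume receives one update.
    import numpy as np

    def backproject(projections, angles, N):
        """projections: (M, N, N) array of projection views; angles: M view angles in radians."""
        volume = np.zeros((N, N, N))
        centre = (N - 1) / 2.0
        for view, theta in zip(projections, angles):      # M views
            c, s = np.cos(theta), np.sin(theta)
            for ix in range(N):                           # N^3 voxel updates per view
                for iy in range(N):
                    # detector column hit by this (x, y) column of voxels
                    t = (ix - centre) * c + (iy - centre) * s + centre
                    col = int(np.clip(np.rint(t), 0, N - 1))
                    for iz in range(N):                   # z maps directly to detector rows
                        volume[ix, iy, iz] += view[col, iz]
        return volume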

2010 ◽  
Vol 18 (1) ◽  
pp. 1-33 ◽  
Author(s):  
Andre R. Brodtkorb ◽  
Christopher Dyken ◽  
Trond R. Hagen ◽  
Jon M. Hjelmervik ◽  
Olaf O. Storaasli

Node-level heterogeneous architectures have become attractive during the last decade for several reasons: compared to traditional symmetric CPUs, they offer high peak performance and are energy and/or cost efficient. With the increase of fine-grained parallelism in high-performance computing, as well as the introduction of parallelism in workstations, there is an acute need for a good overview and understanding of these architectures. We give an overview of the state-of-the-art in heterogeneous computing, focusing on three commonly found architectures: the Cell Broadband Engine Architecture, graphics processing units (GPUs), and field programmable gate arrays (FPGAs). We present a review of hardware, available software tools, and an overview of state-of-the-art techniques and algorithms. Furthermore, we present a qualitative and quantitative comparison of the architectures, and give our view on the future of heterogeneous computing.


2018 ◽  
Vol 28 (04) ◽  
pp. 1850016 ◽  
Author(s):  
Christian Schmitt ◽  
Moritz Schmid ◽  
Sebastian Kuckuk ◽  
Harald Köstler ◽  
Jürgen Teich ◽  
...  

Field programmable gate arrays (FPGAs) are rapidly gaining popularity as an accelerator technology, not only in the field of high-performance computing (HPC). However, they use a completely different programming paradigm and tool set compared to central processing units (CPUs) or even graphics processing units (GPUs), adding extra development steps and requiring special knowledge, which hinders widespread use in scientific computing. To bridge this programmability gap, domain-specific languages (DSLs) are a popular choice for generating low-level implementations from an abstract algorithm description. In this work, we demonstrate our approach for generating numerical solver implementations based on the multigrid method for FPGAs from the same code base that is also used to generate code for CPUs using a hybrid MPI/OpenMP parallelization. Our approach yields a hardware design that can compute up to 11 V-cycles per second for an input grid size of 4096 × 4096, with the solution on the coarsest grid obtained using the conjugate gradient (CG) method, on a mid-range FPGA, beating vectorized, multi-threaded execution on an Intel Xeon processor.
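
To make the solver structure described here concrete, the following sketch shows a textbook multigrid V-cycle for a 1D Poisson problem, with a small conjugate gradient (CG) solve on the coarsest level. It is a minimal CPU illustration in Python with assumed grid sizes and smoother parameters; it is not the generated FPGA design or the authors' code generator output.

    # Textbook V-cycle for -u'' = f on [0, 1] with homogeneous Dirichlet
    # boundaries; damped Jacobi smoothing, full-weighting restriction,
    # linear-interpolation prolongation, and CG on the coarsest grid.
    import numpy as np

    def apply_A(u, h):
        """Matrix-free 1D Laplacian; boundary rows are left at zero."""
        Au = np.zeros_like(u)
        Au[1:-1] = (2.0 * u[1:-1] - u[:-2] - u[2:]) / (h * h)
        return Au

    def smooth(u, f, h, iters=2, omega=0.8):
        """Damped Jacobi pre-/post-smoother (interior points only)."""
        for _ in range(iters):
            u[1:-1] += omega * 0.5 * (h * h * f[1:-1] - 2.0 * u[1:-1] + u[:-2] + u[2:])
        return u

    def cg(matvec, b, tol=1e-10, maxiter=500):
        """Plain conjugate gradient; used only on the coarsest level."""
        x = np.zeros_like(b)
        r = b - matvec(x)
        p = r.copy()
        rho = r @ r
        if np.sqrt(rho) < tol:
            return x
        for _ in range(maxiter):
            Ap = matvec(p)
            alpha = rho / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            rho_new = r @ r
            if np.sqrt(rho_new) < tol:
                break
            p = r + (rho_new / rho) * p
            rho = rho_new
        return x

    def v_cycle(u, f, h, coarsest=33):
        u = smooth(u, f, h)                               # pre-smoothing
        r = f - apply_A(u, h)                             # fine-grid residual
        r[0] = r[-1] = 0.0                                # Dirichlet values stay fixed
        rc = r[::2].copy()                                # full-weighting restriction
        rc[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
        H = 2.0 * h
        if rc.size <= coarsest:
            ec = cg(lambda v: apply_A(v, H), rc)          # coarsest-grid solve with CG
        else:
            ec = v_cycle(np.zeros_like(rc), rc, H, coarsest)
        e = np.zeros_like(u)
        e[::2] = ec                                       # prolongation: injection plus
        e[1:-1:2] = 0.5 * (ec[:-1] + ec[1:])              # linear interpolation
        return smooth(u + e, f, h)                        # correction + post-smoothing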


Algorithms ◽  
2019 ◽  
Vol 12 (2) ◽  
pp. 28 ◽  
Author(s):  
Wael Deabes

This paper presents the hardware implementation of a stand-alone Electrical Capacitance Tomography (ECT) system employing a Field Programmable Gate Array (FPGA). The image reconstruction algorithms of the ECT system demand intensive computation and fast processing of a large number of measurements. The inner product of large vectors is at the core of the majority of these algorithms. Therefore, a reconfigurable segmented parallel inner-product architecture for parallel matrix multiplication is proposed. In addition, hardware/software codesign targeting an FPGA System-on-Chip (SoC) is applied to achieve high performance. The development of the hardware/software codesign is carried out with commercial tools to adjust the software algorithms and parameters of the system. The ECT system is used in this work to monitor the characteristics of the molten metal in the Lost Foam Casting (LFC) process. The hardware system consists of capacitive sensors, wireless nodes, and an FPGA module. The experimental results reveal high stability and accuracy when building the ECT system on the FPGA architecture. The proposed system achieves high performance in terms of processing speed and a compact hardware design.
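
The segmented parallel inner product can be pictured as follows: the two vectors are split into fixed-size segments, one partial dot product is computed per segment (mimicking parallel multiply-accumulate lanes on the FPGA), and the partial sums are then reduced. The Python sketch below is a behavioural model with assumed segment counts and sizes, not the paper's RTL architecture; it also shows how the operation slots into a matrix-vector product of the kind used by iterative ECT reconstruction.

    # Behavioural model of a segmented parallel inner product.
    import numpy as np

    def segmented_dot(a, b, num_segments=8):
        """Dot product computed as num_segments independent partial sums
        (each partial sum stands in for one parallel MAC lane), followed
        by a final reduction (the adder tree in hardware)."""
        assert a.shape == b.shape
        chunks_a = np.array_split(a, num_segments)
        chunks_b = np.array_split(b, num_segments)
        partial = [np.dot(ca, cb) for ca, cb in zip(chunks_a, chunks_b)]  # parallel lanes
        return np.sum(partial)                                            # adder tree

    def matvec(S, x, num_segments=8):
        """Matrix-vector product built from row-wise segmented inner products,
        the core operation of many ECT reconstruction algorithms."""
        return np.array([segmented_dot(row, x, num_segments) for row in S])

    # Example: one step of a simple Landweber-style ECT update, g ~ S @ c,
    # with sensitivity matrix S and capacitance vector g (sizes and step
    # size are arbitrary, for illustration only).
    rng = np.random.default_rng(0)
    S = rng.random((66, 1024))
    g = rng.random(66)
    c = np.zeros(1024)
    c = c + 0.1 * matvec(S.T, g - matvec(S, c))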


2020 ◽  
Vol 3 ◽  
Author(s):  
A. Bocci ◽  
V. Innocente ◽  
M. Kortelainen ◽  
F. Pantaleo ◽  
M. Rovere

The High-Luminosity upgrade of the Large Hadron Collider (LHC) will see the accelerator reach an instantaneous luminosity of 7 × 10³⁴ cm⁻² s⁻¹ with an average pileup of 200 proton-proton collisions. These conditions will pose an unprecedented challenge to the online and offline reconstruction software developed by the experiments. The computational complexity will exceed by far the expected increase in processing power for conventional CPUs, demanding an alternative approach. Industry and High-Performance Computing (HPC) centers are successfully using heterogeneous computing platforms to achieve higher throughput and better energy efficiency by matching each job to the most appropriate architecture. In this paper we describe the results of a heterogeneous implementation of the pixel track and vertex reconstruction chain on Graphics Processing Units (GPUs). The framework has been designed and developed to be integrated in the CMS reconstruction software, CMSSW. The speed-up achieved by leveraging GPUs allows more complex algorithms to be executed, obtaining better physics output and a higher throughput.


Author(s):  
Santosh Bhattacharyya

Three-dimensional microscopic structures play an important role in the understanding of various biological and physiological phenomena. Structural details of neurons, such as the density, caliber, and volumes of dendrites, are important in understanding the physiological and pathological functioning of nervous systems. Even so, many of the widely used stains in biology and neurophysiology are absorbing stains, such as horseradish peroxidase (HRP), and yet most of the iterative, constrained 3D optical image reconstruction research has concentrated on fluorescence microscopy. It is clear that iterative, constrained 3D image reconstruction methodologies are needed for transmitted-light brightfield (TLB) imaging as well. One of the difficulties in doing so, in the past, has been in determining the point spread function of the system. We have been developing several variations of iterative, constrained image reconstruction algorithms for TLB imaging. Some of our early testing with one of them was reported previously. These algorithms are based on a linearized model of TLB imaging.
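
As a rough illustration of the class of methods described here (not the authors' algorithm, whose linearized TLB model and constraints are specific to the paper), the sketch below runs a generic Landweber-type iteration with a non-negativity constraint for a known point spread function, using FFT-based convolution.

    # Generic constrained iterative deconvolution sketch: estimate f from
    # g = PSF * f + noise by repeating  f <- clip(f + alpha * PSF^T (g - PSF f), 0).
    import numpy as np

    def convolve(x, otf):
        """Circular convolution via the FFT; otf is the FFT of the centred PSF."""
        return np.real(np.fft.ifftn(np.fft.fftn(x) * otf))

    def constrained_landweber(g, psf, alpha=1.0, iters=50):
        """g and psf are arrays of the same shape; the PSF is assumed to be
        centred in its array and normalised to unit sum (so alpha < 2 is stable)."""
        otf = np.fft.fftn(np.fft.ifftshift(psf))
        f = np.zeros_like(g)
        for _ in range(iters):
            residual = g - convolve(f, otf)                      # mismatch under the blur model
            f = f + alpha * convolve(residual, np.conj(otf))     # adjoint (correlation) step
            f = np.clip(f, 0.0, None)                            # non-negativity constraint
        return f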


2021 ◽  
Vol 47 (2) ◽  
pp. 1-28
Author(s):  
Goran Flegar ◽  
Hartwig Anzt ◽  
Terry Cojean ◽  
Enrique S. Quintana-Ortí

The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic in high-end GPUs (graphics processing units) has motivated numerous efforts aiming at carefully reducing the working precision in order to speed up the computations. For algorithms whose performance is bound by the memory bandwidth, the idea of compressing their data before (and after) memory accesses has received considerable attention. One idea is to store an approximate operator, such as a preconditioner, in lower than working precision, hopefully without impacting the algorithm's output. We realize the first high-performance implementation of an adaptive precision block-Jacobi preconditioner, which selects the precision format used to store the preconditioner data on the fly, taking into account the numerical properties of the individual preconditioner blocks. We implement the adaptive block-Jacobi preconditioner as production-ready functionality in the Ginkgo linear algebra library, considering not only the precision formats that are part of the IEEE standard, but also customized formats which optimize the length of the exponent and significand to the characteristics of the preconditioner blocks. Experiments run on a state-of-the-art GPU accelerator show that our implementation offers attractive runtime savings.
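
The idea can be sketched as follows (Python, IEEE formats only, with an assumed condition-number rule of thumb; the actual selection criterion and the custom exponent/significand formats are described in the paper and implemented in Ginkgo, not here): each diagonal block is inverted in working precision, the precision used to store its inverse is chosen from the block's conditioning, and the preconditioner is always applied in working precision.

    # Sketch of an adaptive precision block-Jacobi preconditioner.
    import numpy as np

    def build_adaptive_block_jacobi(A, block_starts):
        """Invert each diagonal block and pick a storage precision per block
        from a simple condition-number heuristic (thresholds are illustrative)."""
        blocks = []
        for lo, hi in zip(block_starts[:-1], block_starts[1:]):
            B = A[lo:hi, lo:hi]
            B_inv = np.linalg.inv(B)                 # computed in working precision
            kappa = np.linalg.cond(B)
            if kappa < 1e2:
                stored = B_inv.astype(np.float16)    # well-conditioned: half precision
            elif kappa < 1e6:
                stored = B_inv.astype(np.float32)    # moderate: single precision
            else:
                stored = B_inv.astype(np.float64)    # ill-conditioned: keep double
            blocks.append((lo, hi, stored))
        return blocks

    def apply_preconditioner(blocks, r):
        """z = M^{-1} r, converting each stored block back to working precision."""
        z = np.empty_like(r, dtype=np.float64)
        for lo, hi, stored in blocks:
            z[lo:hi] = stored.astype(np.float64) @ r[lo:hi]
        return z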


2011 ◽  
Vol 28 (1) ◽  
pp. 1-14 ◽  
Author(s):  
W. van Straten ◽  
M. Bailes

dspsr is a high-performance, open-source, object-oriented, digital signal processing software library and application suite for use in radio pulsar astronomy. Written primarily in C++, the library implements an extensive range of modular algorithms that can optionally exploit both multiple-core processors and general-purpose graphics processing units. After over a decade of research and development, dspsr is now stable and in widespread use in the community. This paper presents a detailed description of its functionality, justification of major design decisions, analysis of phase-coherent dispersion removal algorithms, and demonstration of performance on some contemporary microprocessor architectures.
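
Phase-coherent dispersion removal works on the raw (baseband) voltages: the data are Fourier transformed, multiplied by the inverse of the interstellar medium's dispersion transfer function, and transformed back. The sketch below is a bare-bones Python illustration, not dspsr's implementation; the dispersion constant, sideband, and sign conventions are assumptions and would need to be checked against the actual data format.

    # Bare-bones coherent dedispersion of one channel of complex baseband data.
    # Real pipelines process the data in overlapping blocks (overlap-save);
    # this sketch transforms the whole array at once.
    import numpy as np

    # Dispersion constant expressed for frequencies in Hz:
    # ~4.148808e15 Hz^2 s cm^3 / pc (the familiar 4.148808e3 MHz^2 s cm^3 / pc rescaled).
    D_CONST = 4.148808e15

    def coherent_dedisperse(x, dm, f_center, bandwidth):
        """x: complex baseband samples; dm in pc/cm^3; f_center, bandwidth in Hz."""
        n = x.size
        f = np.fft.fftfreq(n, d=1.0 / bandwidth)          # offsets from the band centre
        # Phase of the dedispersion kernel (the sign depends on the Fourier and
        # sideband conventions in use; flip it if test pulses come out more dispersed).
        phase = 2.0 * np.pi * D_CONST * dm * f**2 / (f_center**2 * (f_center + f))
        kernel = np.exp(1j * phase)
        return np.fft.ifft(np.fft.fft(x) * kernel)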


2020 ◽  
Vol 28 (6) ◽  
pp. 829-847
Author(s):  
Hua Huang ◽  
Chengwu Lu ◽  
Lingli Zhang ◽  
Weiwei Wang

The projection data obtained using the computed tomography (CT) technique are often incomplete and inconsistent owing to the radiation exposure and practical environment of the CT process, which may lead to a few-view reconstruction problem. Reconstructing an object from few projection views is often an ill-posed inverse problem. To solve such problems, regularization is an effective technique, in which the ill-posed problem is approximated by a family of neighboring well-posed problems. In this study, we considered ℓ1/2 regularization to solve such ill-posed problems. Subsequently, the half thresholding algorithm was employed to solve the ℓ1/2 regularization-based problem. The convergence analysis of the proposed method was performed, and the error bound between the reference image and the reconstructed image was clarified. Finally, the stability of the proposed method was analyzed. The results of numerical experiments demonstrated that the proposed method can outperform classical reconstruction algorithms in terms of noise suppression and preserving the details of the reconstructed image.
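
For orientation, the half thresholding iteration for an ℓ1/2-regularized least-squares problem min ||Ax − b||² + λ||x||_{1/2}^{1/2} is commonly written as x^{k+1} = H(x^k + μ Aᵀ(b − Ax^k)), where H is the half-thresholding operator. The sketch below follows the form popularized in the ℓ1/2 thresholding literature (Xu et al.); the threshold constants and the step and regularization parameters are quoted from that literature rather than from this paper, so treat them as assumptions.

    # Half thresholding iteration for l_{1/2}-regularized least squares (sketch).
    import numpy as np

    def half_threshold(z, lam):
        """Component-wise half-thresholding operator H_{lam,1/2}(z)."""
        out = np.zeros_like(z)
        thresh = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
        mask = np.abs(z) > thresh
        zm = z[mask]
        phi = np.arccos((lam / 8.0) * (np.abs(zm) / 3.0) ** (-1.5))
        out[mask] = (2.0 / 3.0) * zm * (1.0 + np.cos(2.0 * np.pi / 3.0 - 2.0 * phi / 3.0))
        return out

    def half_thresholding_algorithm(A, b, lam=1e-3, mu=None, iters=200):
        """Iterate x <- H_{lam*mu,1/2}( x + mu * A^T (b - A x) )."""
        if mu is None:
            mu = 1.0 / np.linalg.norm(A, 2) ** 2        # step size below 1 / ||A||^2
        x = np.zeros(A.shape[1])
        for _ in range(iters):
            z = x + mu * A.T @ (b - A @ x)              # gradient step on the data term
            x = half_threshold(z, lam * mu)             # half-thresholding step
        return x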


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Zhifang Wu ◽  
Binwei Guo ◽  
Bin Huang ◽  
Xinzhong Hao ◽  
Ping Wu ◽  
...  

To evaluate the quantification accuracy of different positron emission tomography-computed tomography (PET/CT) reconstruction algorithms, we measured the recovery coefficient (RC) and contrast recovery (CR) in phantom studies. The results played a guiding role in the partial-volume-effect correction (PVC) for the subsequent clinical evaluations. The PET images were reconstructed with four different methods: ordered subsets expectation maximization (OSEM), OSEM with time-of-flight (TOF), OSEM with TOF and point spread function (PSF), and Bayesian penalized likelihood (BPL, known as Q.Clear in GE Healthcare PET/CT systems). In clinical studies, SUVmax and SUVmean (the maximum and mean of the standardized uptake values, SUVs) of 75 small pulmonary nodules (sub-centimeter group: < 10 mm; medium-size group: 10–25 mm) were measured from 26 patients. The results show that Q.Clear produced higher RC and CR values, which can improve quantification accuracy compared with the other methods (P < 0.05), except for the RC of the 37 mm sphere (P > 0.05). The SUVs of sub-centimeter fludeoxyglucose (FDG)-avid pulmonary nodules reconstructed with Q.Clear showed highly significant differences from those reconstructed with the other algorithms (P < 0.001). After performing the PVC, highly significant differences (P < 0.001) still existed between the SUVmean measured by Q.Clear and those measured by the other algorithms. Our results suggest that the Q.Clear reconstruction algorithm improves the quantification accuracy towards the true uptake, which potentially promotes diagnostic confidence and treatment response evaluation with PET/CT imaging, especially for sub-centimeter pulmonary nodules. For small lesions, PVC is essential.
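
As a reminder of how such phantom-derived quantities are typically used (the exact definitions and the PVC procedure applied in the study are given in the paper; the formulas below are the common textbook ones and should be treated as assumptions), the recovery coefficient relates measured to true activity concentration, and a simple RC-based partial-volume correction divides the measured SUV by the RC of a sphere of matching size.

    # Common definitions (assumed, not taken from the paper):
    #   RC  = measured activity concentration in a sphere / true concentration
    #   CR  = (measured hot/background ratio - 1) / (true hot/background ratio - 1)
    #   PVC: SUV_corrected = SUV_measured / RC(lesion size)
    import numpy as np

    def recovery_coefficient(measured_conc, true_conc):
        return measured_conc / true_conc

    def contrast_recovery(measured_sphere, measured_background, true_ratio):
        return ((measured_sphere / measured_background) - 1.0) / (true_ratio - 1.0)

    def apply_pvc(suv_measured, lesion_diameter_mm, rc_diameters_mm, rc_values):
        """Correct a measured SUV with an RC curve from the phantom study,
        interpolating the RC at the lesion diameter."""
        rc = np.interp(lesion_diameter_mm, rc_diameters_mm, rc_values)
        return suv_measured / rc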

