A Scalable Large Format Display Based on Zero Client Processor

This paper proposes a zero client module for Large Format Display (LFD) systems used in display walls. The increased resolution of modern LFDs requires a high-bandwidth channel and a high-performance display controller to transfer image data to the monitor. The key idea is to use a Gigabit Ethernet based daisy chain to transfer the image data, which provides sufficient bandwidth for the transfer. We implement the LFD system using the zero client module and LFD monitors.
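
The bandwidth question raised in this abstract can be estimated with simple arithmetic. The sketch below is only an illustration: the abstract does not state the resolution, colour depth, frame rate, or whether the image data is compressed, so the 1920x1080, 24 bits per pixel, and 60 fps figures and all function names are assumptions chosen for the example.

```python
# Assumed nominal Gigabit Ethernet line rate; the usable payload rate is lower
# once protocol overhead is taken into account.
GIGABIT_ETHERNET_BPS = 1_000_000_000


def bits_per_frame(width, height, bits_per_pixel=24):
    """Size of one uncompressed frame in bits."""
    return width * height * bits_per_pixel


def raw_bandwidth_bps(width, height, fps, bits_per_pixel=24):
    """Bandwidth needed to stream uncompressed frames, in bits per second."""
    return bits_per_frame(width, height, bits_per_pixel) * fps


def max_uncompressed_fps(width, height, bits_per_pixel=24,
                         link_bps=GIGABIT_ETHERNET_BPS):
    """Upper bound on uncompressed frames per second that fit on the link."""
    return link_bps / bits_per_frame(width, height, bits_per_pixel)


# Hypothetical example values, not taken from the paper.
full_hd_60 = raw_bandwidth_bps(1920, 1080, fps=60)
full_hd_fps = max_uncompressed_fps(1920, 1080)
print(f"1920x1080 @ 60 fps, 24 bpp: {full_hd_60 / 1e9:.2f} Gbit/s raw")
print(f"Uncompressed frames per second on a 1 Gbit/s link: {full_hd_fps:.1f}")
```

A budget of this kind shows how resolution, colour depth, and update rate trade off against a fixed link rate. How the paper's module actually encodes and schedules image data is not described in the abstract.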

SPECTRA: A program for processing electron images of crystals

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100121065 ◽

1992 ◽

Vol 50 (1) ◽

pp. 132-133

Author(s):

M.F. Schmid ◽

R. Dargahi ◽

M. W. Tam

Keyword(s):

Lattice Distortion ◽

Data Transfer ◽

Reciprocal Lattice ◽

Data Entry ◽

Image Data ◽

Distortion Correction ◽

Specimen Preparation ◽

Electron Image ◽

Computer Controlled ◽

Program Modules

Electron crystallography is an emerging field for structure determination, as evidenced by a number of membrane proteins that have been solved to near-atomic resolution. Advances in specimen preparation and in data acquisition with a 400 kV microscope by computer-controlled spot scanning mean that our ability to record electron image data will outstrip our capacity to analyze it. The computed Fourier transform of these images must be processed in order to provide a direct measurement of the amplitudes and phases needed for 3-D reconstruction. In anticipation of this processing bottleneck, we have written a program that incorporates a menu- and mouse-driven procedure for auto-indexing and refining the reciprocal lattice parameters in the computed transform from an image of a crystal. It is linked to subsequent steps of image processing by a system of databases and spawned child processes; data transfer between different program modules no longer requires manual data entry. The progress of the reciprocal lattice refinement is monitored visually and quantitatively. If desired, the processing is carried through the lattice distortion correction (unbending) steps automatically.
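
As a rough illustration of what indexing and refining a reciprocal lattice involves, the sketch below assigns integer indices (h, k) to measured spot positions using an approximate basis, then refits the two basis vectors by least squares. It is not the SPECTRA algorithm, which the abstract does not describe; the function names, the spot coordinates, and the starting basis are all hypothetical.

```python
def index_spots(spots, a, b):
    """Assign integer (h, k) to each spot by solving spot = h*a + k*b and rounding."""
    det = a[0] * b[1] - a[1] * b[0]
    indices = []
    for x, y in spots:
        h = (x * b[1] - y * b[0]) / det  # Cramer's rule for the 2x2 system
        k = (a[0] * y - a[1] * x) / det
        indices.append((round(h), round(k)))
    return indices


def refine_lattice(spots, indices):
    """Least-squares fit of basis vectors a, b so that h*a + k*b matches the spots."""
    shh = sum(h * h for h, k in indices)
    shk = sum(h * k for h, k in indices)
    skk = sum(k * k for h, k in indices)
    det = shh * skk - shk * shk
    if det == 0:
        raise ValueError("indexed spots do not span two dimensions")

    def solve(values):
        # Normal equations for one coordinate (x or y) of both basis vectors.
        sh = sum(h * v for (h, k), v in zip(indices, values))
        sk = sum(k * v for (h, k), v in zip(indices, values))
        return (sh * skk - shk * sk) / det, (shh * sk - shk * sh) / det

    ax, bx = solve([x for x, _ in spots])
    ay, by = solve([y for _, y in spots])
    return (ax, ay), (bx, by)


# Hypothetical spot positions (pixels from the transform origin), generated from
# the basis a = (10.3, 1.2), b = (-0.8, 9.6).
spots = [(10.3, 1.2), (-0.8, 9.6), (9.5, 10.8), (21.4, -7.2), (-11.9, 18.0)]
rough_a, rough_b = (10.0, 1.0), (-1.0, 10.0)  # assumed starting guess
indices = index_spots(spots, rough_a, rough_b)
refined_a, refined_b = refine_lattice(spots, indices)
print(indices, refined_a, refined_b)
```

In practice the two steps would be iterated, with the refined basis used to index weaker or higher-order spots that the rough guess could not assign reliably.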

High-Bandwidth Tactical-Network Data Analysis in a High-Performance-Computing (HPC) Environment: Transport Protocol (Transmission Control Protocol/User Datagram Protocol [TCP/UDP]) Analysis

10.21236/ada621268 ◽

2015 ◽

Author(s):

Kenneth D. Renard ◽

James R. Adametz

Keyword(s):

Data Analysis ◽

High Performance Computing ◽

High Performance ◽

Transmission Control Protocol ◽

Network Data ◽

Transport Protocol ◽

Transmission Control ◽

Control Protocol ◽

High Bandwidth ◽

Performance Computing

High-Bandwidth Tactical-Network Data Analysis in a High-Performance-Computing (HPC) Environment: Device Status Data

10.21236/ada626790 ◽

2015 ◽

Author(s):

Brian Panneton ◽

Brendan Tauras ◽

Christopher Wancowicz ◽

Sean Coyne

Keyword(s):

Data Analysis ◽

High Performance Computing ◽

High Performance ◽

Network Data ◽

High Bandwidth ◽

Status Data ◽

Performance Computing

Compiler-directed scratchpad memory data transfer optimization for multithreaded applications on a heterogeneous many-core architecture

The Journal of Supercomputing ◽

10.1007/s11227-021-03853-x ◽

2021 ◽

Author(s):

Xiaohan Tao ◽

Jianmin Pang ◽

Jinlong Xu ◽

Yu Zhu

Keyword(s):

Energy Consumption ◽

High Performance ◽

Scientific Computing ◽

Data Transfer ◽

Performance Model ◽

Experimental Result ◽

Transfer Model ◽

Scratchpad Memory ◽

On Chip ◽

Many Core

The heterogeneous many-core architecture plays an important role in the fields of high-performance computing and scientific computing. It uses accelerator cores with on-chip memories to improve performance and reduce energy consumption. Scratchpad memory (SPM) is a kind of fast on-chip memory with lower energy consumption than a hardware cache. However, data transfer between SPM and off-chip memory can be managed only by a programmer or compiler. In this paper, we propose a compiler-directed multithreaded SPM data transfer model (MSDTM) to optimize the process of data transfer in a heterogeneous many-core architecture. We use compile-time analysis to classify data accesses, check dependences and determine the allocation of data transfer operations. We further present a data transfer performance model to derive the optimal granularity of data transfer and select the most profitable data transfer strategy. We implement the proposed MSDTM on the GCC compiler and evaluate it on Sunway TaihuLight with selected test cases from benchmarks and scientific computing applications. The experimental results show that the proposed MSDTM improves the application execution time by 5.49× and achieves an energy saving of 5.16× on average.
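
To make the idea of a transfer performance model concrete, the sketch below uses a deliberately simple cost model: each transfer costs a fixed startup latency plus size divided by bandwidth, and double buffering overlaps the next transfer with the current computation at the cost of twice the SPM footprint. This is not the MSDTM model from the paper, whose details the abstract does not give; every parameter value and function name is an assumption, and all chunks are treated as equal in size.

```python
import math


def total_time(total_bytes, chunk_bytes, latency_s, bandwidth_bytes_per_s,
               compute_s_per_byte, double_buffered):
    """Estimated time to stream and process total_bytes in fixed-size chunks."""
    n_chunks = math.ceil(total_bytes / chunk_bytes)
    t_xfer = latency_s + chunk_bytes / bandwidth_bytes_per_s
    t_comp = chunk_bytes * compute_s_per_byte
    if double_buffered:
        # The first chunk must arrive before computing starts; afterwards each
        # transfer overlaps with the computation on the previous chunk.
        return t_xfer + (n_chunks - 1) * max(t_xfer, t_comp) + t_comp
    return n_chunks * (t_xfer + t_comp)


def best_granularity(total_bytes, spm_bytes, latency_s, bandwidth_bytes_per_s,
                     compute_s_per_byte, min_chunk=256):
    """Search power-of-two chunk sizes and both strategies; return the fastest."""
    best = None
    chunk = min_chunk
    while chunk <= spm_bytes:
        for double_buffered in (False, True):
            footprint = 2 * chunk if double_buffered else chunk
            if footprint > spm_bytes:
                continue  # strategy does not fit in the scratchpad
            t = total_time(total_bytes, chunk, latency_s, bandwidth_bytes_per_s,
                           compute_s_per_byte, double_buffered)
            if best is None or t < best[0]:
                best = (t, chunk, double_buffered)
        chunk *= 2
    return best


# Hypothetical machine parameters: 1 MiB of data, 64 KiB SPM, 1 microsecond
# transfer startup latency, 1 GB/s bandwidth, 2 ns of compute per byte.
best_time, best_chunk, use_double_buffer = best_granularity(
    total_bytes=1 << 20, spm_bytes=64 * 1024, latency_s=1e-6,
    bandwidth_bytes_per_s=1e9, compute_s_per_byte=2e-9)
print(best_time, best_chunk, use_double_buffer)
```

With these assumed numbers the search settles on a small double-buffered chunk and not the largest one that fits, because once computation hides the transfer, larger chunks only lengthen the unhidden first transfer.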

I-DMAC: An Intelligent DMA Controller for Utilization-Aware Video Streaming used in AI Applications

10.54216/jcim.080203 ◽

2021 ◽

pp. 60-70

Author(s):

Piyush Kumar Shukla ◽

Prashant Kumar Shukla

Keyword(s):

Video Processing ◽

High Performance ◽

Data Transfer ◽

Direct Memory Access ◽

Large Data ◽

Video Frame ◽

Microprocessor System ◽

Bulk Data ◽

Xilinx Fpga ◽

Vhdl Code

Interpreting large data streams requires repeated high-performance transfers, which overload microprocessor Systems on Chip (SoCs). A direct memory access controller (DMAC) addresses this by performing bulk data transfers without the CPU's involvement. In this work, we created an intelligent DMAC (I-DMAC) for accessing video processing data without using CPUs. The model includes a bus selection module, user control signal, status register, DMA-supported address, and AXI-PCI subsystems for improved video frame analysis. These modules are experimentally verified on a Xilinx FPGA SoC architecture using VHDL code simulation, and the results are compared with the E-DMAC model.

PhasorNet A High Performance Network Communications Architecture for Synchrophasor Data Transfer in Wide Area Monitoring, Protection and Control Applications

2007 iREP Symposium - Bulk Power System Dynamics and Control - VII. Revitalizing Operational Reliability ◽

10.1109/irep.2007.4410566 ◽

2007 ◽

Cited By ~ 2

Author(s):

K A Fahid ◽

Prasanth Gopalakrishnan ◽

Sushil Cherian

Keyword(s):

High Performance ◽

Data Transfer ◽

Wide Area ◽

Control Applications ◽

Wide Area Monitoring ◽

Network Communications ◽

Protection And Control ◽

And Control ◽

Area Monitoring

High-Level Synthesis Design for Stencil Computations on FPGA with High Bandwidth Memory

Electronics ◽

10.3390/electronics9081275 ◽

2020 ◽

Vol 9 (8) ◽

pp. 1275

Author(s):

Changdao Du ◽

Yoshiki Yamaguchi

Keyword(s):

Programming Languages ◽

High Performance ◽

Design Space Exploration ◽

Scale Up ◽

High Level Synthesis ◽

Stencil Computations ◽

Temporal Domain ◽

High Bandwidth ◽

Promising Solution ◽

High Level

Due to performance and energy requirements, FPGA-based accelerators have become a promising solution for high-performance computations. Meanwhile, with the help of high-level synthesis (HLS) compilers, FPGAs can be programmed using common programming languages such as C, C++, or OpenCL, thereby improving design efficiency and portability. Stencil computations are significant kernels in various scientific applications. In this paper, we introduce an architecture design for implementing stencil kernels on a state-of-the-art FPGA with high bandwidth memory (HBM). Traditional FPGAs are usually equipped with external memory, e.g., DDR3 or DDR4, which limits the design space exploration in the spatial domain of stencil kernels. Therefore, many previous studies relied mainly on exploiting parallelism in the temporal domain to work around the bandwidth limitations. In our approach, we scale up the design performance by considering both the spatial and temporal parallelism of the stencil kernel equally. We also discuss the design portability among different HLS compilers. We use typical stencil kernels to evaluate our design on a Xilinx U280 FPGA board and compare the results with other existing studies. By adopting our method, developers can apply a broad range of parallelization strategies based on specific FPGA resources to improve performance.
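
The distinction between spatial and temporal parallelism in a stencil kernel can be shown in software. The sketch below is a plain Python illustration, not the paper's HLS design: it applies a 3-point averaging stencil to a 1-D array, first as ordinary full-array sweeps, then with overlapped tiling, where each tile (spatial decomposition) loads a halo wide enough to advance several time steps locally (temporal blocking). The stencil, the fixed boundary values, and all names are assumptions made for the example.

```python
def step(u):
    """One sweep of a 3-point averaging stencil; the two end cells stay fixed."""
    if len(u) < 3:
        return list(u)
    inner = [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]
    return [u[0]] + inner + [u[-1]]


def naive(u, steps):
    """Reference version: sweep the whole array once per time step."""
    for _ in range(steps):
        u = step(u)
    return u


def temporally_blocked(u, steps, tile):
    """Overlapped tiling: each tile advances `steps` time steps independently."""
    n = len(u)
    out = [0.0] * n
    for start in range(0, n, tile):
        end = min(start + tile, n)
        # A halo of `steps` cells per side keeps stale halo edges from reaching
        # the tile interior; at the array ends the true boundary is used.
        lo = max(0, start - steps)
        hi = min(n, end + steps)
        local = u[lo:hi]
        for _ in range(steps):
            local = step(local)
        out[start:end] = local[start - lo:end - lo]
    return out


grid = [float(i % 7) for i in range(50)]  # arbitrary test data
reference = naive(grid, steps=4)
blocked = temporally_blocked(grid, steps=4, tile=8)
print(max(abs(a - b) for a, b in zip(reference, blocked)))
```

Each tile reads its input once and performs all time steps before writing back, which trades redundant halo computation for fewer passes over memory. The tiles do not depend on one another, so they could be processed in parallel.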

Reconfigurable platform for 3D-panoramic telepresence system for mobile applications

10.32920/ryerson.14652129.v1 ◽

2021 ◽

Author(s):

Artur Saakov

Keyword(s):

High Performance ◽

Outer Space ◽

Nuclear Industry ◽

Human Beings ◽

Reconfigurable Logic ◽

Logic Devices ◽

Hazardous Environments ◽

High Bandwidth ◽

Reconfigurable Platform ◽

Bomb Disposal

The concept of telepresence allows human beings to interact with hazardous environments and situations without facing any actual risks. Examples include the nuclear industry, outer space and underwater operations, mining, bomb disposal and firefighting. Recent progress in digital system technology, especially in reconfigurable logic devices (e.g. FPGAs), allows the effective implementation of advanced embedded systems characterized by high-performance data processing and high-bandwidth communication. However, most existing telepresence systems do not benefit from these advancements. Therefore, the goal of this work was to develop a concept and architecture of the platform for the 3D-Panoramic Telepresence System for mobile robotic applications based on reconfigurable logic devices. During the development process, two versions of the system were implemented. The first system focused on feasibility testing of the major components of the proposed architecture. Based on the experimental results obtained on the first prototype and their analysis, a set of recommendations was derived for an updated version of the system. These recommendations were incorporated into the implementation of the second and final version of the system.