Hardware Resource and Computational Density Efficient CNN Accelerator Design Based on FPGA

Block-structured adaptive mesh refinement (AMR) provides the basis for the temporal and spatial discretization strategy for a number of Exascale Computing Project applications in the areas of accelerator design, additive manufacturing, astrophysics, combustion, cosmology, multiphase flow, and wind plant modeling. AMReX is a software framework that provides a unified infrastructure with the functionality needed for these and other AMR applications to be able to effectively and efficiently utilize machines from laptops to exascale architectures. AMR reduces the computational cost and memory footprint compared to a uniform mesh while preserving accurate descriptions of different physical processes in complex multiphysics algorithms. AMReX supports algorithms that solve systems of partial differential equations in simple or complex geometries and those that use particles and/or particle–mesh operations to represent component physical processes. In this article, we will discuss the core elements of the AMReX framework such as data containers and iterators as well as several specialized operations to meet the needs of the application projects. In addition, we will highlight the strategy that the AMReX team is pursuing to achieve highly performant code across a range of accelerator-based architectures for a variety of different applications.


Accelerator design for the Cornell High Energy Synchrotron Source upgrade

Physical Review Accelerators and Beams ◽

10.1103/physrevaccelbeams.22.021602 ◽

2019 ◽

Vol 22 (2) ◽

Cited By ~ 4

Author(s):

J. Shanks ◽

J. Barley ◽

S. Barrett ◽

M. Billing ◽

G. Codner ◽

...

Keyword(s):

High Energy ◽

Synchrotron Source ◽

Accelerator Design


2.5 MeV CW 4-vane RFQ accelerator design for BNCT applications

Nuclear Instruments and Methods in Physics Research Section A Accelerators Spectrometers Detectors and Associated Equipment ◽

10.1016/j.nima.2017.11.042 ◽

2018 ◽

Vol 883 ◽

pp. 57-74 ◽

Cited By ~ 4

Author(s):

Xiaowen Zhu ◽

Hu Wang ◽

Yuanrong Lu ◽

Zhi Wang ◽

Kun Zhu ◽

...

Keyword(s):

Accelerator Design


Servo Control System of Electrical Discharge Machining Based on PMAC

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.926-930.1497 ◽

2014 ◽

Vol 926-930 ◽

pp. 1497-1500

Author(s):

Xu Yang Chu ◽

Gang Liu ◽

Chun Mei Wang ◽

Kai Zhu ◽

Da Yun Chen ◽

...

Keyword(s):

Control System ◽

Electrical Discharge ◽

Electrical Discharge Machining ◽

Control Function ◽

Servo Control ◽

Human Computer Interface ◽

Real Time Control ◽

Time Control ◽

Servo Control System ◽

Hardware Resource

This paper describes the principles of a servo control system for electrical discharge machining based on PMAC. To meet the processing requirements, a PMAC-based servo control system is put forward. The control system combines hardware resources with the PMAC real-time control function. The development process of the human-computer interface is also elaborated.


Operating System for Runtime Reconfigurable Multiprocessor Systems

International Journal of Reconfigurable Computing ◽

10.1155/2011/121353 ◽

2011 ◽

Vol 2011 ◽

pp. 1-16 ◽

Cited By ~ 16

Author(s):

Diana Göhringer ◽

Michael Hübner ◽

Etienne Nguepi Zeutebouo ◽

Jürgen Becker

Keyword(s):

Operating System ◽

Resource Management ◽

Multiprocessor System ◽

Task Mapping ◽

Access Port ◽

Novel Approach ◽

Hardware Resource ◽

Hardware Architectures ◽

On Chip ◽

Internal Configuration

Operating systems traditionally handle the task scheduling of one or more application instances on processor-like hardware architectures. RAMPSoC, a novel runtime-adaptive multiprocessor System-on-Chip, exploits dynamic reconfiguration on FPGAs to generate, start and terminate hardware and software tasks. The hardware tasks have to be transferred to the reconfigurable hardware via a configuration access port. The software tasks can be loaded into the local memory of the respective IP core either via the configuration access port or via the on-chip communication infrastructure (e.g. a Network-on-Chip). Recent series of Xilinx FPGAs, such as Virtex-5, provide two Internal Configuration Access Ports, which cannot be accessed simultaneously. To prevent conflicts, access to these ports as well as hardware resource management need to be controlled, e.g. by a special-purpose operating system running on an embedded processor. For that purpose, and to handle the relations between temporally and spatially scheduled operations, the novel approach of an operating system is of high importance. This special-purpose operating system, called CAP-OS (Configuration Access Port-Operating System), is presented in this paper; it provides clients of the configuration port with priority-based access scheduling, hardware task mapping and resource management.
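The priority-based access scheduling mentioned in the abstract can be illustrated with a minimal sketch. It is not the CAP-OS implementation: the class name `PortScheduler`, its methods, the client names, and the convention that a lower number means higher priority are all assumptions made for illustration. The sketch only shows how requests to a port that admits one client at a time could be serialized by priority.

```python
import heapq
import itertools


class PortScheduler:
    """Hypothetical scheduler that serializes requests to one shared port."""

    def __init__(self):
        self._queue = []
        # Monotonic counter: breaks ties so equal priorities are served FIFO.
        self._counter = itertools.count()

    def request(self, client, priority):
        # Assumption of this sketch: lower number = higher priority.
        heapq.heappush(self._queue, (priority, next(self._counter), client))

    def grant_next(self):
        # Return the next client allowed to use the port, or None if idle.
        if not self._queue:
            return None
        _, _, client = heapq.heappop(self._queue)
        return client


sched = PortScheduler()
sched.request("load_sw_task_A", priority=2)
sched.request("reconfigure_hw_task_B", priority=0)
sched.request("load_sw_task_C", priority=2)
order = [sched.grant_next() for _ in range(3)]
print(order)
```

Because only one request is granted at a time, two clients never hold the port simultaneously; the tie-breaking counter keeps the order deterministic among requests of equal priority.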


A Literature Survey on Algorithms and Hardware Architectures of Max-Log-MAP Demapping

Journal of Circuits, Systems and Computers ◽

10.1142/s021812662230001x ◽

2021 ◽

pp. 2230001

Author(s):

Mostafa Rizk ◽

Amer Baghdadi ◽

Michel Jézéquel

Keyword(s):

Research Effort ◽

High Order ◽

Performance Criteria ◽

Hardware Complexity ◽

Map Algorithm ◽

Resource Requirements ◽

Hardware Resource ◽

Modulation Schemes ◽

And Performance ◽

Log Map

Emergent wireless communication standards, which are employed in different transmission environments, support various modulation schemes. High-order constellations are targeted to achieve high bandwidth efficiency. However, the complexity of the symbol-by-symbol Maximum A Posteriori (MAP) algorithm increases dramatically for these high-order modulation schemes. In order to reduce the hardware complexity, the suboptimal Max-Log-MAP algorithm, which is the direct transformation of the MAP algorithm into the logarithmic domain, is implemented instead. In the literature, a great deal of research effort has been invested in Max-Log-MAP demapping. Several simplifications have been presented to suit specific constellations. In addition, the hardware implementations dedicated to Max-Log-MAP demapping vary greatly in terms of design choices, supported flexibility and performance criteria, making them a challenge to compare. This paper explores the published Max-Log-MAP algorithm simplifications and existing hardware demapper designs and presents an extensive review of the current literature. In-depth comparisons are drawn amongst the designs, and key performance characteristics are described, namely achieved throughput, hardware resource requirements and flexibility. This survey should facilitate fair comparisons of future designs and point to opportunities for improving the design of Max-Log-MAP demappers.
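To make the Max-Log-MAP idea concrete, the following is a minimal sketch that is not taken from the surveyed designs. It assumes an AWGN channel with noise variance `noise_var`, equiprobable symbols, and a hypothetical Gray-mapped QPSK table named `QPSK`. Under those assumptions, the log of each sum of exponentials in the MAP log-likelihood ratio is approximated by its largest term, so each bit's LLR reduces to a difference of two minimum squared distances.

```python
import math

# Hypothetical Gray-mapped QPSK table: bit label -> unit-energy symbol.
QPSK = {
    (0, 0): complex(1, 1) / math.sqrt(2),
    (0, 1): complex(1, -1) / math.sqrt(2),
    (1, 0): complex(-1, 1) / math.sqrt(2),
    (1, 1): complex(-1, -1) / math.sqrt(2),
}


def max_log_map_llrs(y, constellation, noise_var):
    """Approximate per-bit LLRs for one received sample y.

    Positive LLR favours bit value 0 (a sign convention assumed here).
    """
    n_bits = len(next(iter(constellation)))
    llrs = []
    for k in range(n_bits):
        # Closest symbol whose k-th bit is 0, and closest whose k-th bit is 1.
        d0 = min(abs(y - s) ** 2 for b, s in constellation.items() if b[k] == 0)
        d1 = min(abs(y - s) ** 2 for b, s in constellation.items() if b[k] == 1)
        # Max-log approximation: keep only the dominant term of each sum.
        llrs.append((d1 - d0) / noise_var)
    return llrs


llrs = max_log_map_llrs(complex(0.9, -0.2), QPSK, noise_var=0.5)
print(llrs)
```

The exhaustive search over all symbols is the straightforward form; the simplifications the survey refers to typically avoid this search for specific constellations, which this sketch does not attempt.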
