Hardware Resource and Computational Density Efficient CNN Accelerator Design Based on FPGA

Author(s):  
Xiang Chen ◽  
Jindong Li ◽  
Yong Zhao
Author(s):  
Rachit Nigam ◽  
Sachille Atapattu ◽  
Samuel Thomas ◽  
Zhijing Li ◽  
Theodore Bauer ◽  
...  
Keyword(s):  

Author(s):  
Weiqun Zhang ◽  
Andrew Myers ◽  
Kevin Gott ◽  
Ann Almgren ◽  
John Bell

Block-structured adaptive mesh refinement (AMR) provides the basis for the temporal and spatial discretization strategy for a number of Exascale Computing Project applications in the areas of accelerator design, additive manufacturing, astrophysics, combustion, cosmology, multiphase flow, and wind plant modeling. AMReX is a software framework that provides a unified infrastructure with the functionality needed for these and other AMR applications to be able to effectively and efficiently utilize machines from laptops to exascale architectures. AMR reduces the computational cost and memory footprint compared to a uniform mesh while preserving accurate descriptions of different physical processes in complex multiphysics algorithms. AMReX supports algorithms that solve systems of partial differential equations in simple or complex geometries and those that use particles and/or particle–mesh operations to represent component physical processes. In this article, we will discuss the core elements of the AMReX framework such as data containers and iterators as well as several specialized operations to meet the needs of the application projects. In addition, we will highlight the strategy that the AMReX team is pursuing to achieve highly performant code across a range of accelerator-based architectures for a variety of different applications.


Author(s):  
J. Shanks ◽  
J. Barley ◽  
S. Barrett ◽  
M. Billing ◽  
G. Codner ◽  
...  

2014 ◽  
Vol 926-930 ◽  
pp. 1497-1500
Author(s):  
Xu Yang Chu ◽  
Gang Liu ◽  
Chun Mei Wang ◽  
Kai Zhu ◽  
Da Yun Chen ◽  
...  

This paper describe the principles of Servo Control System of Electrical Discharge Machining Based on PMAC .To meet the requirements of processing and Put forward based on PMAC servo control system. This control system combines hardware resource with PMAC real-time control function. Elaborate human-computer interface developing process.


2011 ◽  
Vol 2011 ◽  
pp. 1-16 ◽  
Author(s):  
Diana Göhringer ◽  
Michael Hübner ◽  
Etienne Nguepi Zeutebouo ◽  
Jürgen Becker

Operating systems traditionally handle the task scheduling of one or more application instances on processor-like hardware architectures. RAMPSoC, a novel runtime adaptive multiprocessor System-on-Chip, exploits the dynamic reconfiguration on FPGAs to generate, start and terminate hardware and software tasks. The hardware tasks have to be transferred to the reconfigurable hardware via a configuration access port. The software tasks can be loaded into the local memory of the respective IP core either via the configuration access port or via the on-chip communication infrastructure (e.g. a Network-on-Chip). Recent-series of Xilinx FPGAs, such as Virtex-5, provide two Internal Configuration Access Ports, which cannot be accessed simultaneously. To prevent conflicts, the access to these ports as well as the hardware resource management needs to be controlled, e.g. by a special-purpose operating system running on an embedded processor. For that purpose and to handle the relations between temporally and spatially scheduled operations, the novel approach of an operating system is of high importance. This special purpose operating system, called CAP-OS (Configuration Access Port-Operating System), which will be presented in this paper, supports the clients using the configuration port with the services of priority-based access scheduling, hardware task mapping and resource management.


Author(s):  
Mostafa Rizk ◽  
Amer Baghdadi ◽  
Michel Jézéquel

Emergent wireless communication standards, which are employed in different transmission environments, support various modulation schemes. High-order constellations are targeted to achieve high bandwidth efficiency. However, the complexity of the symbol-by-symbol Maximum A Posteriori (MAP) algorithm increases dramatically for these high-order modulation schemes. In order to reduce the hardware complexity, the suboptimal Max-Log-MAP, which is the direct transformation of the MAP algorithm into logarithmic domain, is alternatively implemented. In the literature, a great deal of research effort has been invested into Max-Log-MAP demapping. Several simplifications are presented to meet with specific constellations. In addition, the hardware implementations dedicated for Max-Log-MAP demapping vary greatly in terms of design choices, supported flexibility and performance criteria, making them a challenge to compare. This paper explores the published Max-Log-MAP algorithm simplifications and existing hardware demapper designs and presents an extensive review of the current literature. In-depth comparisons are drawn amongst the designs and different key performance characteristics are described, namely, achieved throughput, hardware resource requirements and flexibility. This survey should facilitate fair comparisons of future designs, as well as opportunities for improving the design of Max-Log-MAP demappers.


Sign in / Sign up

Export Citation Format

Share Document