Similarity-Aware Architecture/Compiler Co-Designed Context-Reduction Framework for Modulo-Scheduled CGRA

Modulo-scheduled coarse-grained reconfigurable array (CGRA) processors have shown their potential for exploiting loop-level parallelism at high energy efficiency. However, these CGRAs need frequent reconfiguration during execution, which incurs large area and power overhead for context memory and context fetching. To tackle this challenge, this paper uses an architecture/compiler co-designed method for context reduction. From the architecture perspective, we carefully partition the context into several subsections and, whenever a new context is fetched, fetch only the subsections that differ from the previous context word. We package each differing subsection with an opcode and an index value to form a context-fetching primitive (CFP), and explore the hardware design space by providing centralized and distributed CFP-fetching CGRAs to support this CFP-based context-fetching scheme. From the software side, we develop a similarity-aware tuning algorithm and integrate it into state-of-the-art modulo scheduling and memory access conflict optimization algorithms. The whole compilation flow efficiently improves the similarity between contexts in each PE in order to reduce both context-fetching latency and context footprint. Experimental results show that our HW/SW co-designed framework improves area efficiency and energy efficiency by up to 34% and 21%, respectively, with only 2% performance overhead.
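
The CFP idea in the abstract above can be illustrated with a minimal sketch. It is not the paper's hardware or encoding: the equal-width subsections, the single "UPDATE" opcode, and the names `CFP`, `split`, `join`, `encode`, and `apply_cfps` are all assumptions made for illustration. The sketch only shows the general principle of fetching the subsections that differ from the previous context word.

```python
from typing import List, NamedTuple


class CFP(NamedTuple):
    """Hypothetical context-fetching primitive: opcode + index + payload."""
    opcode: str  # assumed opcode; this sketch only ever emits "UPDATE"
    index: int   # which subsection of the context word to overwrite
    value: int   # new contents of that subsection


def split(word: int, n_sub: int, sub_bits: int) -> List[int]:
    """Cut a context word into n_sub equal-width subsections (low bits first)."""
    mask = (1 << sub_bits) - 1
    return [(word >> (i * sub_bits)) & mask for i in range(n_sub)]


def join(subs: List[int], sub_bits: int) -> int:
    """Reassemble a context word from its subsections."""
    word = 0
    for i, sub in enumerate(subs):
        word |= sub << (i * sub_bits)
    return word


def encode(prev: int, new: int, n_sub: int, sub_bits: int) -> List[CFP]:
    """Emit one CFP per subsection that differs from the previous context."""
    old_subs = split(prev, n_sub, sub_bits)
    new_subs = split(new, n_sub, sub_bits)
    return [CFP("UPDATE", i, n)
            for i, (o, n) in enumerate(zip(old_subs, new_subs)) if o != n]


def apply_cfps(prev: int, cfps: List[CFP], n_sub: int, sub_bits: int) -> int:
    """Rebuild the new context from the previous one plus the fetched CFPs."""
    subs = split(prev, n_sub, sub_bits)
    for cfp in cfps:
        subs[cfp.index] = cfp.value
    return join(subs, sub_bits)


# Two 32-bit contexts that differ in a single 8-bit subsection.
prev_ctx, new_ctx = 0x12345678, 0x12AB5678
cfps = encode(prev_ctx, new_ctx, n_sub=4, sub_bits=8)
rebuilt = apply_cfps(prev_ctx, cfps, n_sub=4, sub_bits=8)
```

In this toy case only one primitive is fetched instead of the full word, which is why more similar consecutive contexts mean fewer fetches. Under this assumed encoding, the saving depends on how the opcode and index bits compare with the subsection width.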

Exploiting loop-level parallelism on coarse-grained reconfigurable architectures using modulo scheduling

IEE Proceedings - Computers and Digital Techniques ◽

10.1049/ip-cdt:20030833 ◽

2003 ◽

Vol 150 (5) ◽

pp. 255 ◽

Cited By ~ 31

Author(s):

B. Mei ◽

S. Vernalde ◽

D. Verkest ◽

H. De Man ◽

R. Lauwereins

Keyword(s):

Coarse Grained ◽

Reconfigurable Architectures ◽

Modulo Scheduling ◽

Loop Level ◽

Level Parallelism

CGRA MODULO SCHEDULING FOR ACHIEVING BETTER PERFORMANCE AND INCREASED EFFICIENCY

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ◽

10.17762/turcomat.v12i4.1225 ◽

2021 ◽

Vol 12 (4) ◽

pp. 1400-1413

Author(s):

Siva Sankara Phani T. et al.

Keyword(s):

High Performance ◽

High Energy ◽

Coarse Grained ◽

Modulo Scheduling ◽

Stable Algorithm ◽

Mapping Algorithms ◽

Mapping Algorithm ◽

Rearrangement Mechanism ◽

Planning Algorithm ◽

Better Than

Coarse-grained reconfigurable architectures (CGRAs) are an effective solution for speeding up compute-intensive workloads owing to their high energy efficiency and flexibility. Implementing loops on CGRAs in a timely manner has been one of the hardest problems in this area. Modulo scheduling (MS) has proven productive for implementing loops on CGRAs. However, current MS algorithms still struggle to map large and irregular loops to CGRAs within a reasonable compilation time when computational and routing resources are restricted. This is mainly due to a lack of awareness of major mapping constraints and a time-consuming approach to solving the temporal and spatial mapping using CGRA buffer resources. This work aims to improve the performance and compilation robustness of the CGRA modulo scheduling algorithm. The CGRA MS problem is divided into a temporal problem and a spatial problem, and the mechanisms between the two have to be reorganized. We present a detailed, systematic mapping flow that addresses the temporal mapping problem with a powerful buffer algorithm and efficient connection and computation constraints. We create a fast, stable algorithm for spatial mapping with a retransmission and rearrangement mechanism. Our MS algorithm can map loops to CGRAs with higher performance and shorter compilation time. The results show that, given the same compilation budget, our mapping algorithm achieves a better compilation rate. Performance improves by 5% to 14% over the standard CGRA mapping algorithms available.

Exploiting loop-level parallelism on coarse-grained reconfigurable architectures using modulo scheduling

2003 Design, Automation and Test in Europe Conference and Exhibition ◽

10.1109/date.2003.1253623 ◽

2003 ◽

Cited By ~ 88

Author(s):

Bingfeng Mei ◽

S. Vernalde ◽

D. Verkest ◽

H. De Man ◽

R. Lauwereins

Keyword(s):

Coarse Grained ◽

Reconfigurable Architectures ◽

Modulo Scheduling ◽

Loop Level ◽

Level Parallelism

P20 phosphor on polyimide as a large area high-resolution transmission screen for slow scan CCD systems

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100136477 ◽

1995 ◽

Vol 53 ◽

pp. 20-21

Author(s):

C. C. Ahn ◽

S. Karnes ◽

M. Lvovsky ◽

C. M. Garland ◽

H. A. Atwater ◽

...

Keyword(s):

Mechanical Stability ◽

High Energy ◽

Practical Implementation ◽

Line Pair ◽

Modulation Transfer ◽

Large Area ◽

Ccd Imaging ◽

Transmission Electron ◽

The One ◽

Beam Broadening

The bane of CCD imaging systems for transmission electron microscopy at intermediate and high voltages has been their relatively poor modulation transfer function (MTF), or line pair resolution. The problem originates primarily with the phosphor screen. On the one hand, screens should be thick so that as many incident electrons as possible are converted to photons, yielding a high detective quantum efficiency (DQE). The MTF diminishes as a function of scintillator thickness, however, and to some extent as a function of fluorescence within the scintillator substrates. Fan has noted that the use of a thin layer of phosphor beneath a self-supporting 2 μm thick Al substrate might provide the most appropriate compromise between high DQE and MTF in transmission electron microscopes that operate at higher voltages. Monte Carlo simulations of high-energy electron trajectories reveal that only slight beam broadening occurs within this thickness of Al film. Consequently, the MTF is limited predominantly by broadening within the thin phosphor underlayer. There are difficulties, however, in the practical implementation of this design, associated mostly with the mechanical stability of the Al support film.

Reactions in Activated Peroxide Systems and their Influences on Bleaching Performance

Mini-Reviews in Organic Chemistry ◽

10.2174/1570193x17999201029191747 ◽

2020 ◽

Vol 17 ◽

Author(s):

Xiaoyan Wang ◽

Jinmei Du ◽

Changhai Xu

Keyword(s):

Aqueous Solution ◽

Hydrogen Peroxide ◽

Energy Efficiency ◽

Textile Industry ◽

High Energy ◽

Side Reactions ◽

Competitive Reactions ◽

High Energy Efficiency ◽

Hydrolysis Of ◽

Bleach Activators

Activated peroxide systems are formed by adding so-called bleach activators to an aqueous solution of hydrogen peroxide. They were developed in the 1970s for use in domestic laundry because of their high energy efficiency, and were introduced to the textile industry at the beginning of the 21st century as an approach to reducing the extensive energy consumption of bleaching. In activated peroxide systems, bleach activators undergo perhydrolysis to generate more kinetically active peracids that enable bleaching under milder conditions, while hydrolysis of the bleach activators and decomposition of the peracids may occur as side reactions that weaken the bleaching efficiency. This mini-review aims to summarize these competitive reactions in activated peroxide systems and their influence on bleaching performance.

Diffuse γ-ray emission toward the massive star-forming region, W40

Astronomy and Astrophysics ◽

10.1051/0004-6361/202037580 ◽

2020 ◽

Vol 639 ◽

pp. A80

Author(s):

Xiao-Na Sun ◽

Rui-Zhi Yang ◽

Yun-Feng Liang ◽

Fang-Kun Peng ◽

Hai-Ming Zhang ◽

...

Keyword(s):

Cosmic Ray ◽

High Energy ◽

Young Star ◽

Gas Content ◽

Large Area ◽

Stellar Cluster ◽

Production Region ◽

Star Forming ◽

Photon Index ◽

Young Star Clusters

We report the detection of a high-energy γ-ray signal towards the young star-forming region W40. Using 10-yr Pass 8 data from the Fermi Large Area Telescope (Fermi-LAT), we extracted an extended γ-ray excess region with a significance of ~18σ. The radiation has a spectrum with a photon index of 2.49 ± 0.01. The spatial correlation with the ionized gas content favors a hadronic origin of the γ-ray emission. The total cosmic-ray (CR) proton energy in the γ-ray production region is estimated to be of the order of 10^47 erg. However, this could be a small fraction of the total energy released in CRs by local accelerators, presumably massive stars, over the lifetime of the system. If so, W40, together with earlier detections of γ-rays from the Cygnus cocoon, Westerlund 1, Westerlund 2, NGC 3603, and 30 Dor C, supports the hypothesis that young star clusters are effective CR factories. The unique aspect of this result is that the γ-ray emission is detected, for the first time, from a stellar cluster itself, rather than from the surrounding “cocoons”.

High energy efficiency ventilation to limit COVID-19 contagion in school environments

Energy and Buildings ◽

10.1016/j.enbuild.2021.110882 ◽

2021 ◽

pp. 110882

Author(s):

Luigi Schibuola ◽

Chiara Tambani

Keyword(s):

Energy Efficiency ◽

High Energy ◽

School Environments ◽

High Energy Efficiency

Degradation Investigation of Electrocatalyst in Proton Exchange Membrane Fuel Cell at a High Energy Efficiency

Molecules ◽

10.3390/molecules26133932 ◽

2021 ◽

Vol 26 (13) ◽

pp. 3932

Author(s):

Jie Song ◽

Qing Ye ◽

Kun Wang ◽

Zhiyuan Guo ◽

Meiling Dou

Keyword(s):

Energy Efficiency ◽

Single Cell ◽

Proton Exchange Membrane ◽

Cell Voltage ◽

High Energy ◽

Proton Exchange ◽

Pt Catalyst ◽

Operation Conditions ◽

High Efficient ◽

Exchange Membrane

The development of highly efficient stacks is critical for the widespread application of proton exchange membrane fuel cells (PEMFCs) in transportation and stationary power plants. Currently, the favorable operating conditions for PEMFCs are a single-cell voltage between 0.65 and 0.7 V, corresponding to an energy efficiency lower than 57%. In the long term, PEMFCs need to be operated at higher voltage to increase the energy efficiency and thus improve fuel economy for transportation and stationary applications. Herein, a PEMFC single cell was investigated to demonstrate its capability of working at a voltage and energy efficiency higher than 0.8 V and 65%, respectively. It was demonstrated that the PEMFC suffered significant performance degradation after 64 h of operation. The cell voltage declined by more than 13% at a current density of 1000 mA cm−2, due to electrode deactivation. The high operating potential of the cathode leads to corrosion of the carbon support, which in turn causes detachment of Pt nanoparticles and results in significant Pt agglomeration. The catalytic surface area of the cathode Pt available for oxygen reduction is thus reduced, and the cell performance decreases. Therefore, an electrochemically stable Pt catalyst is highly desirable for efficient PEMFCs operated at cell voltages higher than 0.8 V.
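
The voltage and efficiency pairs quoted in the abstract above (0.7 V with 57%, 0.8 V with 65%) are consistent with a simple voltage-efficiency ratio. The sketch below assumes, without confirmation from the abstract, that efficiency is taken as cell voltage divided by the standard reversible H2/O2 cell voltage of about 1.229 V, with complete fuel utilization. The constant `E_REF_V` and the function name are illustrative.

```python
# Assumed reference: standard reversible H2/O2 cell voltage at 25 °C.
# The abstract does not state which reference voltage the authors used.
E_REF_V = 1.229


def voltage_efficiency(cell_voltage_v: float) -> float:
    """Voltage efficiency as a fraction, assuming 100% fuel utilization."""
    return cell_voltage_v / E_REF_V


for v in (0.65, 0.70, 0.80):
    print(f"{v:.2f} V -> {voltage_efficiency(v) * 100:.1f} %")
```

Under these assumptions, raising the operating point from 0.7 V to 0.8 V corresponds to roughly eight percentage points of efficiency, which matches the gap between the two figures in the abstract.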

The Growth of Semiconductor Thin Films Studied by RHEED

Australian Journal of Physics ◽

10.1071/ph900583 ◽

1990 ◽

Vol 43 (5) ◽

pp. 583

Author(s):

GL Price

Keyword(s):

Thin Films ◽

Surface Science ◽

High Vacuum ◽

High Energy ◽

Ultra High Vacuum ◽

Large Area ◽

Growth Modes ◽

Semiconductor Thin Films ◽

Wide Range ◽

Recent Developments

Recent developments in the growth of semiconductor thin films are reviewed. The emphasis is on growth by molecular beam epitaxy (MBE). Results obtained by reflection high energy electron diffraction (RHEED) are employed to describe the different kinds of growth processes and the types of materials which can be constructed. MBE is routinely capable of heterostructure growth to atomic precision with a wide range of materials, including III-V, IV, and II-VI semiconductors, metals, ceramics such as high-Tc materials, and organics. As the growth proceeds in ultra high vacuum, MBE can take advantage of surface science techniques such as Auger, RHEED and SIMS. RHEED is the essential in-situ probe, since the final crystal quality is strongly dependent on the surface reconstruction during growth. RHEED can also be used to calibrate the growth rate, monitor growth kinetics, and distinguish between various growth modes. A major new area is lattice-mismatched growth, where attempts are being made to construct heterostructures between materials of different lattice constants, such as GaAs on Si. Also described are the new techniques of migration enhanced epitaxy and tilted superlattice growth. Finally, some comments are given on the means of preparing large-area, thin samples from MBE-grown films for analysis by other techniques, using capping, etching and liftoff.

UltraSynth: Insights of a CGRA Integration into a Control Engineering Environment

Journal of Signal Processing Systems ◽

10.1007/s11265-021-01641-7 ◽

2021 ◽

Author(s):

Dennis Wolf ◽

Andreas Engel ◽

Tajas Ruschke ◽

Andreas Koch ◽

Christian Hochberger

Keyword(s):

Computing System ◽

Coarse Grained ◽

Instruction Level Parallelism ◽

Control Engineering ◽

Processing Elements ◽

Actual Application ◽

Reconfigurable Arrays ◽

Engineering Environment ◽

On Chip ◽

Level Parallelism

Coarse Grained Reconfigurable Arrays (CGRAs) or Architectures are a concept for hardware accelerators based on the idea of distributing workload over Processing Elements. These processors exploit instruction level parallelism while being energy efficient due to their simple internal structure. However, their incorporation into a complete computing system raises severe challenges at the hardware and software levels. This article evaluates in detail a CGRA integrated into a control engineering environment targeting a Xilinx Zynq System on Chip (SoC). Besides the actual application execution performance, the practicability of the configuration toolchain is validated. Challenges of the real-world integration are discussed and practical insights are highlighted.
