You Only Traverse Twice: A YOTT Placement, Routing, and Timing Approach for CGRAs

2021 ◽  
Vol 20 (5s) ◽  
pp. 1-25
Author(s):  
Michael Canesche ◽  
Westerley Carvalho ◽  
Lucas Reis ◽  
Matheus Oliveira ◽  
Salles Magalhães ◽  
...  

Coarse-grained reconfigurable architecture (CGRA) mapping involves three main steps: placement, routing, and timing. Mapping is an NP-complete problem, and a common strategy is to decouple the process into these separate steps. This work focuses on the placement step, and its aim is to propose a technique that is both reasonably fast and leads to high-performance solutions. Furthermore, a near-optimal placement simplifies the subsequent routing and timing steps. Exact solutions cannot find placements in a reasonable execution time as input designs grow in size. Heuristic solutions include meta-heuristics, such as Simulated Annealing (SA), and fast, straightforward greedy heuristics based on graph traversal. However, as these approaches are probabilistic and the design space is large, it is not easy to provide both run-time efficiency and good solution quality. We propose a graph traversal heuristic that provides the best of both: high-quality placements similar to SA and the execution time of graph traversal approaches. Our placement introduces novel ideas based on the “you only traverse twice” (YOTT) approach, which performs a two-step graph traversal. The first traversal generates annotated data to guide the second step, which greedily performs the placement, node by node, aided by the annotated data and the target architecture constraints. We introduce three new concepts to implement this technique: I/O and reconvergence annotation, degree matching, and look-ahead placement. Our analysis of this approach explores the placement execution time/quality trade-offs. We point out insights on how to analyze graph properties during dataflow mapping. Our results show that YOTT is 60.6×, 9.7×, and 2.3× faster than a high-quality SA, bounding-box SA (VPR), and multi-single traversal placements, respectively. Furthermore, YOTT reduces the average wire length and the maximal FIFO size (an additional timing requirement on CGRAs) needed to avoid delay mismatches in fully pipelined architectures.
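
As a rough illustration of the two-traversal idea (annotate, then place greedily), here is a minimal sketch; it is not the authors' YOTT implementation, and the annotations, ordering heuristic, and cost function are assumptions chosen for brevity:

from collections import deque

def annotate(graph, io_nodes):
    """First traversal: BFS from the I/O nodes records, for every node,
    its distance to the nearest I/O and its degree (illustrative annotations).
    graph: undirected adjacency dict, node -> list of neighbours."""
    dist = {n: None for n in graph}
    queue = deque((n, 0) for n in io_nodes)
    while queue:
        node, d = queue.popleft()
        if dist[node] is not None:
            continue
        dist[node] = d
        for neigh in graph[node]:
            queue.append((neigh, d + 1))
    for n in dist:                      # nodes unreachable from any I/O
        if dist[n] is None:
            dist[n] = len(graph)
    degree = {n: len(graph[n]) for n in graph}
    return dist, degree

def greedy_place(graph, dist, degree, grid_size):
    """Second traversal: place nodes one by one (closest to I/O first,
    higher degree first), choosing the free grid cell with the smallest
    total Manhattan distance to already-placed neighbours.
    Assumes grid_size * grid_size >= number of nodes."""
    placement = {}
    free = {(x, y) for x in range(grid_size) for y in range(grid_size)}
    order = sorted(graph, key=lambda n: (dist[n], -degree[n]))
    for node in order:
        placed_neigh = [placement[m] for m in graph[node] if m in placement]
        def cost(cell):
            return sum(abs(cell[0] - px) + abs(cell[1] - py)
                       for px, py in placed_neigh)
        best = min(free, key=cost)
        placement[node] = best
        free.remove(best)
    return placement

# toy dataflow graph (undirected adjacency) with two I/O nodes
g = {"in": ["a"], "a": ["in", "b", "c"], "b": ["a", "out"],
     "c": ["a", "out"], "out": ["b", "c"]}
dist, degree = annotate(g, ["in", "out"])
print(greedy_place(g, dist, degree, grid_size=3))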

Author(s):  
Minjing Dong ◽  
Hanting Chen ◽  
Yunhe Wang ◽  
Chang Xu

Network pruning is widely applied to deep CNN models due to their heavy computation costs, and it achieves high performance by keeping important weights while removing redundancy. Pruning redundant weights directly may hurt the global information flow, which suggests that an efficient sparse network should take graph properties into account. Thus, instead of paying more attention to preserving important weights, we focus on the pruned architecture itself. We propose to use graph entropy as the measurement, which exhibits useful properties for crafting high-quality neural graphs and enables us to propose an efficient algorithm to construct them as the initial network architecture. Our algorithm can be easily implemented and deployed to different popular CNN models, achieving better trade-offs.
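
For illustration, one common notion of graph entropy, the entropy of a graph's degree distribution, can be computed as below; the paper's exact measure and construction algorithm may differ:

import math
from collections import Counter

def degree_entropy(adjacency):
    """Entropy of the degree distribution of an undirected graph
    (one common notion of graph entropy; not necessarily the paper's).
    adjacency: dict mapping node -> iterable of neighbours."""
    degrees = [len(neigh) for neigh in adjacency.values()]
    counts = Counter(degrees)
    total = len(degrees)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Toy comparison of two sparse graphs on four nodes
regular = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
irregular = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(degree_entropy(regular))    # uniform degrees -> zero entropy
print(degree_entropy(irregular))  # mixed degrees -> positive entropy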


Author(s):  
Vianney Kengne Tchendji ◽  
Jean Frederic Myoupo ◽  
Gilles Dequen

In this paper, the authors highlight the existence of close relations between the execution time, the efficiency and the number of communication rounds in a family of CGM-based parallel algorithms for the optimal binary search tree problem (OBST); these three parameters cannot be simultaneously improved. The family of CGM (Coarse Grained Multicomputer) algorithms they derive is based on Knuth's sequential solution, which runs in O(n²) time and space, where n is the size of the problem. These CGM algorithms use p processors, each with its own local memory. In general, the authors show that the running time and the number of communication rounds of each algorithm are governed by the granularity of their model and by a parameter that depends on n and p. One special choice of this parameter yields a load-balanced CGM-based parallel algorithm, while another choice yields an algorithm with a better execution time but without any load balancing, and with a number of communication rounds that is no better than the first algorithm's. The authors show that the granularity plays a crucial role in the different techniques they use to partition the problem, and they study the impact of each scheduling algorithm. To the best of their knowledge, this is the first unified method to derive a set of parameter-dependent CGM-based parallel algorithms for the OBST problem.
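
For reference, Knuth's sequential dynamic program, on which the CGM algorithms above are based, can be sketched as follows (a simplified version that only considers successful-search frequencies; variable names are ours, not the authors'):

def optimal_bst_cost(freq):
    """Knuth's O(n^2) dynamic program for the optimal binary search tree.
    freq[i] is the access frequency of the i-th key (keys assumed sorted).
    Returns the minimal total weighted search cost.
    Simplified sketch: unsuccessful-search (dummy-key) probabilities are omitted."""
    n = len(freq)
    INF = float("inf")
    # prefix sums of frequencies for O(1) range-weight queries
    pref = [0] * (n + 1)
    for i, f in enumerate(freq):
        pref[i + 1] = pref[i] + f

    cost = [[0] * (n + 1) for _ in range(n + 2)]
    root = [[0] * (n + 1) for _ in range(n + 2)]
    for i in range(1, n + 1):          # single-key intervals [i, i]
        cost[i][i] = freq[i - 1]
        root[i][i] = i
    for length in range(2, n + 1):     # interval length
        for i in range(1, n - length + 2):
            j = i + length - 1
            weight = pref[j] - pref[i - 1]
            best, best_r = INF, i
            # Knuth's monotonicity: the optimal root of [i, j] lies
            # between the optimal roots of [i, j-1] and [i+1, j]
            for r in range(root[i][j - 1], root[i + 1][j] + 1):
                left = cost[i][r - 1] if r > i else 0
                right = cost[r + 1][j] if r < j else 0
                if left + right + weight < best:
                    best, best_r = left + right + weight, r
            cost[i][j], root[i][j] = best, best_r
    return cost[1][n]

print(optimal_bst_cost([34, 8, 50]))   # small sanity check -> 142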


Author(s):  
José M. Cadenas ◽  
Ma Carmen Garrido ◽  
Enrique Muñoz ◽  
Carlos Cruz-Corona ◽  
David A. Pelta ◽  
...  

One of the tasks we can carry out in artificial intelligence is the optimization of the possible solutions of a problem. In optimization problems we search for the best solution, or one good enough, among a large set of alternatives. Such problems are common in daily life: every person constantly solves optimization problems, e.g., finding the quickest way from home to work while taking traffic restrictions into account. Humans can solve these problems efficiently because they are easy enough. Nevertheless, problems can be far more complex, for example reducing the fuel consumption of a fleet of planes, and computational algorithms are required to tackle them. A first approach to solving them is exhaustive search. In theory this method always finds the solution, but it is not efficient because its execution time grows exponentially with the size of the problem. Heuristics were proposed to improve on it. Heuristics are intelligent techniques, methods or procedures that use expert knowledge to solve tasks; they aim for high performance with respect to both solution quality and the resources used. Metaheuristics, a term first used by Fred Glover in 1986 (Glover, 1986), arose to improve heuristics and can be defined as (Melián, Moreno & Moreno, 2003) ‘intelligent strategies for designing and improving very general heuristic procedures with a high performance’. Since Glover the field has been developed extensively. The current trend is to design new metaheuristics that improve the solutions to given problems. Another very interesting line, however, is to reuse existing metaheuristics in a coordinated system. In this article we present two different methods following this line.
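
As a toy illustration of the gap between exhaustive search and a greedy heuristic (not taken from the article), consider a four-city routing instance:

from itertools import permutations

# Brute-force search examines every tour, so its cost grows factorially,
# while a greedy nearest-neighbour heuristic returns a (possibly
# sub-optimal) tour almost instantly.
dist = {  # symmetric distances between four cities
    ("A", "B"): 2, ("A", "C"): 9, ("A", "D"): 10,
    ("B", "C"): 6, ("B", "D"): 4, ("C", "D"): 8,
}
def d(x, y):
    return dist.get((x, y)) or dist.get((y, x))

cities = ["A", "B", "C", "D"]

def tour_length(order):
    return sum(d(order[i], order[(i + 1) % len(order)]) for i in range(len(order)))

# Exhaustive search: optimal, but O(n!) tours to examine
best = min(permutations(cities), key=tour_length)

# Greedy heuristic: always jump to the nearest unvisited city
route, left = ["A"], set(cities) - {"A"}
while left:
    nxt = min(left, key=lambda c: d(route[-1], c))
    route.append(nxt)
    left.remove(nxt)

print(tour_length(best), tour_length(route))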


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Marco S. Nobile ◽  
Federico Fontana ◽  
Luca Manzoni ◽  
Paolo Cazzaniga ◽  
Giancarlo Mauri ◽  
...  

Self-assembling processes are ubiquitous phenomena that drive the organization and the hierarchical formation of complex molecular systems. The investigation of assembling dynamics, emerging from the interactions among biomolecules like amino acids and polypeptides, is fundamental to determine how a mixture of simple objects can yield a complex structure at the nano-scale level. In this paper we present HyperBeta, a novel open-source software that exploits an innovative algorithm based on hyper-graphs to efficiently identify and graphically represent the dynamics of β-sheet formation. Differently from existing tools, HyperBeta directly manipulates data generated by means of coarse-grained molecular dynamics simulation tools (GROMACS), performed using the MARTINI force field. Coarse-grained molecular structures are visualized using HyperBeta's proprietary real-time high-quality 3D engine, which provides a plethora of analysis tools and statistical information, controlled by means of an intuitive event-based graphical user interface. The high-quality renderer relies on a variety of visual cues to improve the readability and interpretability of distance and depth relationships between peptides. We show that HyperBeta is able to track β-sheet formation in coarse-grained molecular dynamics simulations, and it provides a completely new and efficient means for investigating the kinetics of these nano-structures. HyperBeta will therefore facilitate biotechnological and medical research where these structural elements play a crucial role, such as the development of novel high-performance biomaterials in tissue engineering, or a better comprehension of the molecular mechanisms at the basis of complex pathologies like Alzheimer's disease.
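
As a purely illustrative sketch (not HyperBeta's actual algorithm), one way to group peptides into hyperedges from coarse-grained bead coordinates is to build a contact graph and treat each connected aggregate as a hyperedge; HyperBeta additionally exploits orientation and alignment cues, and the cutoff value below is an arbitrary assumption:

import itertools

def detect_sheet_hyperedges(positions, cutoff=0.55):
    """Group peptides whose coarse-grained backbone beads lie within
    `cutoff` (nm) of each other and represent each group as a hyperedge.
    positions: dict mapping peptide id -> list of (x, y, z) bead coordinates."""
    def close(p, q):
        return any(
            sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5 < cutoff
            for u in positions[p] for v in positions[q]
        )
    # pairwise contacts between peptides
    contacts = {p: set() for p in positions}
    for p, q in itertools.combinations(positions, 2):
        if close(p, q):
            contacts[p].add(q)
            contacts[q].add(p)
    # each connected component of the contact graph becomes one hyperedge,
    # i.e. one candidate beta-sheet aggregate
    hyperedges, seen = [], set()
    for p in positions:
        if p in seen:
            continue
        stack, comp = [p], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(contacts[node] - comp)
        seen |= comp
        if len(comp) > 1:
            hyperedges.append(frozenset(comp))
    return hyperedges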


Crystals ◽  
2021 ◽  
Vol 11 (3) ◽  
pp. 235
Author(s):  
Shuqi Zhao ◽  
Tongtong Yu ◽  
Ziming Wang ◽  
Shilei Wang ◽  
Limei Wei ◽  
...  

Two-dimensional (2D) materials, driven by their unique electronic and optoelectronic properties, have opened up possibilities for various applications. Large, high-quality single crystals are essential to fabricate high-performance 2D devices for practical applications. Herein, IV-V 2D GeP single crystals of high quality and large size (20 × 15 × 5 mm³) were successfully grown by the Bi flux growth method. The crystalline quality of GeP was confirmed by high-resolution X-ray diffraction (HRXRD), Laue diffraction, electron probe microanalysis (EPMA) and Raman spectroscopy. Additionally, the intrinsic anisotropic optical properties were investigated in detail by angle-resolved polarized Raman spectroscopy (ARPRS) and transmission spectra. Furthermore, we fabricated high-performance photodetectors based on GeP, which present a relatively large photocurrent of over 3 mA. More generally, our results will significantly contribute to the use of GeP crystals in a wide range of optoelectronic applications.


2021 ◽  
Author(s):  
Lixiang Han ◽  
Mengmeng Yang ◽  
Peiting Wen ◽  
Wei Gao ◽  
Nengjie Huo ◽  
...  

One-dimensional (1D)–two-dimensional (2D) van der Waals (vdWs) mixed-dimensional heterostructures, with the advantages of atomically sharp interfaces, high quality and good compatibility, have attracted tremendous attention in recent years. The...


Author(s):  
Kersten Schuster ◽  
Philip Trettner ◽  
Leif Kobbelt

We present a numerical optimization method to find highly efficient (sparse) approximations for convolutional image filters. Using a modified parallel tempering approach, we solve a constrained optimization that maximizes approximation quality while strictly staying within a user-prescribed performance budget. The results are multi-pass filters where each pass computes a weighted sum of bilinearly interpolated sparse image samples, exploiting hardware acceleration on the GPU. We systematically decompose the target filter into a series of sparse convolutions, trying to find good trade-offs between approximation quality and performance. Since our sparse filters are linear and translation-invariant, they do not exhibit the aliasing and temporal coherence issues that often appear in filters working on image pyramids. We show several applications, ranging from simple Gaussian or box blurs to the emulation of sophisticated Bokeh effects with user-provided masks. Our filters achieve high performance as well as high quality, often providing significant speed-up at acceptable quality even for separable filters. The optimized filters can be baked into shaders and used as a drop-in replacement for filtering tasks in image processing or rendering pipelines.
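
A minimal, unoptimized sketch of the per-pass evaluation described above, i.e. a weighted sum of bilinearly interpolated sparse samples; the real method runs on the GPU and chooses the tap positions and weights via parallel tempering, whereas the taps below are made-up values:

import numpy as np

def bilinear_sample(img, y, x):
    """Sample a 2D float image at fractional coordinates using
    bilinear interpolation with edge clamping."""
    h, w = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    fy, fx = y - y0, x - x0
    y0, y1 = np.clip([y0, y0 + 1], 0, h - 1)
    x0, x1 = np.clip([x0, x0 + 1], 0, w - 1)
    return ((1 - fy) * (1 - fx) * img[y0, x0] + (1 - fy) * fx * img[y0, x1]
            + fy * (1 - fx) * img[y1, x0] + fy * fx * img[y1, x1])

def sparse_pass(img, taps):
    """One pass of a sparse filter: each output pixel is a weighted sum
    of a few bilinearly interpolated samples. taps: list of (dy, dx, weight)."""
    out = np.zeros_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = sum(wgt * bilinear_sample(img, y + dy, x + dx)
                            for dy, dx, wgt in taps)
    return out

# Hypothetical example: two chained sparse passes roughly emulating a small blur
img = np.random.rand(32, 32)
pass1 = [(-0.5, -0.5, 0.25), (-0.5, 0.5, 0.25), (0.5, -0.5, 0.25), (0.5, 0.5, 0.25)]
pass2 = [(0.0, 0.0, 0.5), (1.5, 0.0, 0.25), (-1.5, 0.0, 0.25)]
blurred = sparse_pass(sparse_pass(img, pass1), pass2)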


Crystals ◽  
2021 ◽  
Vol 11 (3) ◽  
pp. 229
Author(s):  
Roberto Bergamaschini ◽  
Elisa Vitiello

The quest for high-performance and scalable devices required for next-generation semiconductor applications inevitably passes through the fabrication of high-quality materials and complex designs [...]


2015 ◽  
Vol 3 (38) ◽  
pp. 19294-19298 ◽  
Author(s):  
Xichang Bao ◽  
Qianqian Zhu ◽  
Meng Qiu ◽  
Ailing Yang ◽  
Yujin Wang ◽  
...  

High-quality CH3NH3PbI3 perovskite films were directly prepared in air on simply treated ITO glass under a relative humidity lower than 30%.

