You Only Traverse Twice: A YOTT Placement, Routing, and Timing Approach for CGRAs

2021 ◽  
Vol 20 (5s) ◽  
pp. 1-25
Author(s):  
Michael Canesche ◽  
Westerley Carvalho ◽  
Lucas Reis ◽  
Matheus Oliveira ◽  
Salles Magalhães ◽  
...  

Coarse-grained reconfigurable architecture (CGRA) mapping involves three main steps: placement, routing, and timing. Mapping is an NP-complete problem, and a common strategy is to decouple the process into these separate steps. This work focuses on the placement step, and its aim is to propose a technique that is both reasonably fast and leads to high-performance solutions. Furthermore, a near-optimal placement simplifies the subsequent routing and timing steps. Exact solutions cannot find placements in a reasonable execution time as input designs grow in size. Heuristic solutions include meta-heuristics, such as Simulated Annealing (SA), and fast, straightforward greedy heuristics based on graph traversal. However, as these approaches are probabilistic and the design space is large, it is not easy to provide both run-time efficiency and good solution quality. We propose a graph traversal heuristic that provides the best of both: high-quality placements similar to SA and the execution time of graph traversal approaches. Our placement introduces novel ideas based on the “you only traverse twice” (YOTT) approach, which performs a two-step graph traversal. The first traversal generates annotated data to guide the second step, which greedily performs the placement, node by node, aided by the annotated data and the target architecture constraints. We introduce three new concepts to implement this technique: I/O and reconvergence annotation, degree matching, and look-ahead placement. Our analysis of this approach explores the placement execution time/quality trade-offs. We point out insights on how to analyze graph properties during dataflow mapping. Our results show that YOTT is 60.6×, 9.7×, and 2.3× faster than a high-quality SA, bounding-box SA (VPR), and multi-single traversal placements, respectively. Furthermore, YOTT reduces the average wire length and the maximal FIFO size (an additional timing requirement on CGRAs) needed to avoid delay mismatches in fully pipelined architectures.
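
As a rough illustration of the two-traversal idea (annotate, then place greedily), here is a minimal sketch; it is not the authors' YOTT implementation, and the annotations, ordering heuristic, and cost function are assumptions chosen for brevity:

from collections import deque

def annotate(graph, io_nodes):
    """First traversal: BFS from the I/O nodes records, for every node,
    its distance to the nearest I/O and its degree (illustrative annotations).
    graph: undirected adjacency dict, node -> list of neighbours."""
    dist = {n: None for n in graph}
    queue = deque((n, 0) for n in io_nodes)
    while queue:
        node, d = queue.popleft()
        if dist[node] is not None:
            continue
        dist[node] = d
        for neigh in graph[node]:
            queue.append((neigh, d + 1))
    for n in dist:                      # nodes unreachable from any I/O
        if dist[n] is None:
            dist[n] = len(graph)
    degree = {n: len(graph[n]) for n in graph}
    return dist, degree

def greedy_place(graph, dist, degree, grid_size):
    """Second traversal: place nodes one by one (closest to I/O first,
    higher degree first), choosing the free grid cell with the smallest
    total Manhattan distance to already-placed neighbours.
    Assumes grid_size * grid_size >= number of nodes."""
    placement = {}
    free = {(x, y) for x in range(grid_size) for y in range(grid_size)}
    order = sorted(graph, key=lambda n: (dist[n], -degree[n]))
    for node in order:
        placed_neigh = [placement[m] for m in graph[node] if m in placement]
        def cost(cell):
            return sum(abs(cell[0] - px) + abs(cell[1] - py)
                       for px, py in placed_neigh)
        best = min(free, key=cost)
        placement[node] = best
        free.remove(best)
    return placement

# toy dataflow graph (undirected adjacency) with two I/O nodes
g = {"in": ["a"], "a": ["in", "b", "c"], "b": ["a", "out"],
     "c": ["a", "out"], "out": ["b", "c"]}
dist, degree = annotate(g, ["in", "out"])
print(greedy_place(g, dist, degree, grid_size=3))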

Author(s):  
Minjing Dong ◽  
Hanting Chen ◽  
Yunhe Wang ◽  
Chang Xu

Network pruning is widely applied to deep CNN models due to their heavy computation costs, and it achieves high performance by keeping important weights while removing redundancy. Pruning redundant weights directly may hurt the global information flow, which suggests that an efficient sparse network should take graph properties into account. Thus, instead of paying more attention to preserving important weights, we focus on the pruned architecture itself. We propose to use graph entropy as the measurement, which exhibits useful properties for crafting high-quality neural graphs and enables us to propose an efficient algorithm to construct them as the initial network architecture. Our algorithm can be easily implemented and deployed to different popular CNN models, achieving better trade-offs.
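
For illustration, one common notion of graph entropy, the entropy of a graph's degree distribution, can be computed as below; the paper's exact measure and construction algorithm may differ:

import math
from collections import Counter

def degree_entropy(adjacency):
    """Entropy of the degree distribution of an undirected graph
    (one common notion of graph entropy; not necessarily the paper's).
    adjacency: dict mapping node -> iterable of neighbours."""
    degrees = [len(neigh) for neigh in adjacency.values()]
    counts = Counter(degrees)
    total = len(degrees)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Toy comparison of two sparse graphs on four nodes
regular = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
irregular = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(degree_entropy(regular))    # uniform degrees -> zero entropy
print(degree_entropy(irregular))  # mixed degrees -> positive entropy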


Author(s):  
Vianney Kengne Tchendji ◽  
Jean Frederic Myoupo ◽  
Gilles Dequen

In this paper, the authors highlight the existence of close relations between the execution time, the efficiency and the number of communication rounds in a family of CGM-based parallel algorithms for the optimal binary search tree problem (OBST); these three parameters cannot be simultaneously improved. The family of CGM (Coarse Grained Multicomputer) algorithms they derive is based on Knuth's sequential solution, which runs in O(n²) time and space, where n is the size of the problem. These CGM algorithms use p processors, each with its own local memory. In general, the authors show that the running time and the number of communication rounds of each algorithm are governed by the granularity of their model and by a parameter that depends on n and p. One special choice of this parameter yields a load-balanced CGM-based parallel algorithm, while another choice yields an algorithm with a better execution time but without any load balancing, and with a number of communication rounds that is no better than the first algorithm's. The authors show that the granularity plays a crucial role in the different techniques they use to partition the problem, and they study the impact of each scheduling algorithm. To the best of their knowledge, this is the first unified method to derive a set of parameter-dependent CGM-based parallel algorithms for the OBST problem.
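
For reference, Knuth's sequential dynamic program, on which the CGM algorithms above are based, can be sketched as follows (a simplified version that only considers successful-search frequencies; variable names are ours, not the authors'):

def optimal_bst_cost(freq):
    """Knuth's O(n^2) dynamic program for the optimal binary search tree.
    freq[i] is the access frequency of the i-th key (keys assumed sorted).
    Returns the minimal total weighted search cost.
    Simplified sketch: unsuccessful-search (dummy-key) probabilities are omitted."""
    n = len(freq)
    INF = float("inf")
    # prefix sums of frequencies for O(1) range-weight queries
    pref = [0] * (n + 1)
    for i, f in enumerate(freq):
        pref[i + 1] = pref[i] + f

    cost = [[0] * (n + 1) for _ in range(n + 2)]
    root = [[0] * (n + 1) for _ in range(n + 2)]
    for i in range(1, n + 1):          # single-key intervals [i, i]
        cost[i][i] = freq[i - 1]
        root[i][i] = i
    for length in range(2, n + 1):     # interval length
        for i in range(1, n - length + 2):
            j = i + length - 1
            weight = pref[j] - pref[i - 1]
            best, best_r = INF, i
            # Knuth's monotonicity: the optimal root of [i, j] lies
            # between the optimal roots of [i, j-1] and [i+1, j]
            for r in range(root[i][j - 1], root[i + 1][j] + 1):
                left = cost[i][r - 1] if r > i else 0
                right = cost[r + 1][j] if r < j else 0
                if left + right + weight < best:
                    best, best_r = left + right + weight, r
            cost[i][j], root[i][j] = best, best_r
    return cost[1][n]

print(optimal_bst_cost([34, 8, 50]))   # small sanity check -> 142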


Author(s):  
José M. Cadenas ◽  
Ma Carmen Garrido ◽  
Enrique Muñoz ◽  
Carlos Cruz-Corona ◽  
David A. Pelta ◽  
...  

One of the tasks we can carry out in artificial intelligence is the optimization of the possible solutions of a problem. In optimization problems we search for the best solution, or one good enough, among a large set of alternatives. Such problems are common in daily life: every person constantly solves optimization problems, e.g., finding the quickest way from home to work while taking traffic restrictions into account. Humans can solve these problems efficiently because they are easy enough. Nevertheless, problems can be far more complex, for example reducing the fuel consumption of a fleet of planes, and computational algorithms are required to tackle them. A first approach to solving them is exhaustive search. In theory this method always finds the solution, but it is not efficient because its execution time grows exponentially with the size of the problem. Heuristics were proposed to improve on it. Heuristics are intelligent techniques, methods or procedures that use expert knowledge to solve tasks; they aim for high performance with respect to both solution quality and the resources used. Metaheuristics, a term first used by Fred Glover in 1986 (Glover, 1986), arose to improve heuristics and can be defined as (Melián, Moreno & Moreno, 2003) ‘intelligent strategies for designing and improving very general heuristic procedures with a high performance’. Since Glover the field has been developed extensively. The current trend is to design new metaheuristics that improve the solutions to given problems. Another very interesting line, however, is to reuse existing metaheuristics in a coordinated system. In this article we present two different methods following this line.
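
As a toy illustration of the gap between exhaustive search and a greedy heuristic (not taken from the article), consider a four-city routing instance:

from itertools import permutations

# Brute-force search examines every tour, so its cost grows factorially,
# while a greedy nearest-neighbour heuristic returns a (possibly
# sub-optimal) tour almost instantly.
dist = {  # symmetric distances between four cities
    ("A", "B"): 2, ("A", "C"): 9, ("A", "D"): 10,
    ("B", "C"): 6, ("B", "D"): 4, ("C", "D"): 8,
}
def d(x, y):
    return dist.get((x, y)) or dist.get((y, x))

cities = ["A", "B", "C", "D"]

def tour_length(order):
    return sum(d(order[i], order[(i + 1) % len(order)]) for i in range(len(order)))

# Exhaustive search: optimal, but O(n!) tours to examine
best = min(permutations(cities), key=tour_length)

# Greedy heuristic: always jump to the nearest unvisited city
route, left = ["A"], set(cities) - {"A"}
while left:
    nxt = min(left, key=lambda c: d(route[-1], c))
    route.append(nxt)
    left.remove(nxt)

print(tour_length(best), tour_length(route))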


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Marco S. Nobile ◽  
Federico Fontana ◽  
Luca Manzoni ◽  
Paolo Cazzaniga ◽  
Giancarlo Mauri ◽  
...  

Self-assembling processes are ubiquitous phenomena that drive the organization and the hierarchical formation of complex molecular systems. The investigation of assembling dynamics, emerging from the interactions among biomolecules like amino acids and polypeptides, is fundamental to determine how a mixture of simple objects can yield a complex structure at the nano-scale level. In this paper we present HyperBeta, a novel open-source software that exploits an innovative algorithm based on hyper-graphs to efficiently identify and graphically represent the dynamics of β-sheet formation. Differently from existing tools, HyperBeta directly manipulates data generated by means of coarse-grained molecular dynamics simulation tools (GROMACS), performed using the MARTINI force field. Coarse-grained molecular structures are visualized using HyperBeta's proprietary real-time high-quality 3D engine, which provides a plethora of analysis tools and statistical information, controlled by means of an intuitive event-based graphical user interface. The high-quality renderer relies on a variety of visual cues to improve the readability and interpretability of distance and depth relationships between peptides. We show that HyperBeta is able to track β-sheet formation in coarse-grained molecular dynamics simulations, and it provides a completely new and efficient means for investigating the kinetics of these nano-structures. HyperBeta will therefore facilitate biotechnological and medical research where these structural elements play a crucial role, such as the development of novel high-performance biomaterials in tissue engineering, or a better comprehension of the molecular mechanisms at the basis of complex pathologies like Alzheimer's disease.
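
As a purely illustrative sketch (not HyperBeta's actual algorithm), one way to group peptides into hyperedges from coarse-grained bead coordinates is to build a contact graph and treat each connected aggregate as a hyperedge; HyperBeta additionally exploits orientation and alignment cues, and the cutoff value below is an arbitrary assumption:

import itertools

def detect_sheet_hyperedges(positions, cutoff=0.55):
    """Group peptides whose coarse-grained backbone beads lie within
    `cutoff` (nm) of each other and represent each group as a hyperedge.
    positions: dict mapping peptide id -> list of (x, y, z) bead coordinates."""
    def close(p, q):
        return any(
            sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5 < cutoff
            for u in positions[p] for v in positions[q]
        )
    # pairwise contacts between peptides
    contacts = {p: set() for p in positions}
    for p, q in itertools.combinations(positions, 2):
        if close(p, q):
            contacts[p].add(q)
            contacts[q].add(p)
    # each connected component of the contact graph becomes one hyperedge,
    # i.e. one candidate beta-sheet aggregate
    hyperedges, seen = [], set()
    for p in positions:
        if p in seen:
            continue
        stack, comp = [p], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(contacts[node] - comp)
        seen |= comp
        if len(comp) > 1:
            hyperedges.append(frozenset(comp))
    return hyperedges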


Crystals ◽  
2021 ◽  
Vol 11 (3) ◽  
pp. 235
Author(s):  
Shuqi Zhao ◽  
Tongtong Yu ◽  
Ziming Wang ◽  
Shilei Wang ◽  
Limei Wei ◽  
...  

Two-dimensional (2D) materials, driven by their unique electronic and optoelectronic properties, have opened up possibilities for various applications. Large, high-quality single crystals are essential to fabricate high-performance 2D devices for practical applications. Herein, IV-V 2D GeP single crystals of high quality and large size (20 × 15 × 5 mm³) were successfully grown by the Bi flux growth method. The crystalline quality of GeP was confirmed by high-resolution X-ray diffraction (HRXRD), Laue diffraction, electron probe microanalysis (EPMA) and Raman spectroscopy. Additionally, the intrinsic anisotropic optical properties were investigated in detail by angle-resolved polarized Raman spectroscopy (ARPRS) and transmission spectra. Furthermore, we fabricated high-performance photodetectors based on GeP, which present a relatively large photocurrent of over 3 mA. More generally, our results will significantly contribute to the use of GeP crystals in a wide range of optoelectronic applications.


2021 ◽  
Author(s):  
Lixiang Han ◽  
Mengmeng Yang ◽  
Peiting Wen ◽  
Wei Gao ◽  
Nengjie Huo ◽  
...  

One-dimensional (1D)–two-dimensional (2D) van der Waals (vdWs) mixed-dimensional heterostructures, with the advantages of atomically sharp interfaces, high quality and good compatibility, have attracted tremendous attention in recent years. The...


Author(s):  
Kersten Schuster ◽  
Philip Trettner ◽  
Leif Kobbelt

We present a numerical optimization method to find highly efficient (sparse) approximations for convolutional image filters. Using a modified parallel tempering approach, we solve a constrained optimization that maximizes approximation quality while strictly staying within a user-prescribed performance budget. The results are multi-pass filters where each pass computes a weighted sum of bilinearly interpolated sparse image samples, exploiting hardware acceleration on the GPU. We systematically decompose the target filter into a series of sparse convolutions, trying to find good trade-offs between approximation quality and performance. Since our sparse filters are linear and translation-invariant, they do not exhibit the aliasing and temporal coherence issues that often appear in filters working on image pyramids. We show several applications, ranging from simple Gaussian or box blurs to the emulation of sophisticated Bokeh effects with user-provided masks. Our filters achieve high performance as well as high quality, often providing significant speed-up at acceptable quality even for separable filters. The optimized filters can be baked into shaders and used as a drop-in replacement for filtering tasks in image processing or rendering pipelines.
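
A minimal, unoptimized sketch of the per-pass evaluation described above, i.e. a weighted sum of bilinearly interpolated sparse samples; the real method runs on the GPU and chooses the tap positions and weights via parallel tempering, whereas the taps below are made-up values:

import numpy as np

def bilinear_sample(img, y, x):
    """Sample a 2D float image at fractional coordinates using
    bilinear interpolation with edge clamping."""
    h, w = img.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    fy, fx = y - y0, x - x0
    y0, y1 = np.clip([y0, y0 + 1], 0, h - 1)
    x0, x1 = np.clip([x0, x0 + 1], 0, w - 1)
    return ((1 - fy) * (1 - fx) * img[y0, x0] + (1 - fy) * fx * img[y0, x1]
            + fy * (1 - fx) * img[y1, x0] + fy * fx * img[y1, x1])

def sparse_pass(img, taps):
    """One pass of a sparse filter: each output pixel is a weighted sum
    of a few bilinearly interpolated samples. taps: list of (dy, dx, weight)."""
    out = np.zeros_like(img)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            out[y, x] = sum(wgt * bilinear_sample(img, y + dy, x + dx)
                            for dy, dx, wgt in taps)
    return out

# Hypothetical example: two chained sparse passes roughly emulating a small blur
img = np.random.rand(32, 32)
pass1 = [(-0.5, -0.5, 0.25), (-0.5, 0.5, 0.25), (0.5, -0.5, 0.25), (0.5, 0.5, 0.25)]
pass2 = [(0.0, 0.0, 0.5), (1.5, 0.0, 0.25), (-1.5, 0.0, 0.25)]
blurred = sparse_pass(sparse_pass(img, pass1), pass2)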


Crystals ◽  
2021 ◽  
Vol 11 (3) ◽  
pp. 229
Author(s):  
Roberto Bergamaschini ◽  
Elisa Vitiello

The quest for high-performance and scalable devices required for next-generation semiconductor applications inevitably passes through the fabrication of high-quality materials and complex designs [...]


2015 ◽  
Vol 3 (38) ◽  
pp. 19294-19298 ◽  
Author(s):  
Xichang Bao ◽  
Qianqian Zhu ◽  
Meng Qiu ◽  
Ailing Yang ◽  
Yujin Wang ◽  
...  

High-quality CH3NH3PbI3 perovskite films were directly prepared in air on simply treated ITO glass under a relative humidity lower than 30%.

