High-performance optimizations on tiled many-core embedded systems: a matrix multiplication case study

2013 ◽  
Vol 66 (1) ◽  
pp. 431-487 ◽  
Author(s):  
Arslan Munir ◽  
Farinaz Koushanfar ◽  
Ann Gordon-Ross ◽  
Sanjay Ranka
Author(s):  
Haoyuan Ying ◽  
Klaus Hofmann ◽  
Thomas Hollstein

Due to the growing demand for high performance and low power in embedded systems, many-core architectures have been proposed as the most suitable solutions. As the design focus of many-core embedded systems shifts from computation-centric to communication-centric, Network-on-Chip (NoC) is one of the best interconnect techniques for such architectures because of its scalability and high communication bandwidth. Formalized and optimized system-level design methods for NoC-based many-core embedded systems are needed to improve system performance and reduce power consumption. To examine these design optimization methods in depth, this chapter presents a case study of optimizing many-core embedded systems based on a 3-Dimensional (3D) NoC with an irregular vertical link distribution topology, through task mapping, core placement, routing, and topology generation. Results of cycle-accurate simulation experiments support the validity and efficiency of the design methods. For the case study configuration, applying the design optimization methods saves up to 60% of the vertical links while maintaining system efficiency, compared with 3D NoCs that have full vertical link connection.


2011 ◽  
Vol 21 (02) ◽  
pp. 245-272 ◽  
Author(s):  
DUANE MERRILL ◽  
ANDREW GRIMSHAW

The need to rank and order data is pervasive, and many algorithms are fundamentally dependent upon sorting and partitioning operations. Prior to this work, GPU stream processors have been perceived as challenging targets for problems with dynamic and global data-dependences such as sorting. This paper presents: (1) a family of very efficient parallel algorithms for radix sorting; and (2) our allocation-oriented algorithmic design strategies that match the strengths of GPU processor architecture to this genre of dynamic parallelism. We demonstrate multiple factors of speedup (up to 3.8x) compared to state-of-the-art GPU sorting. We also reverse the performance differentials observed between GPU and multi/many-core CPU architectures by recent comparisons in the literature, including those with 32-core CPU-based accelerators. Our average sorting rates exceed 1B 32-bit keys/sec on a single GPU microprocessor. Our sorting passes are constructed from a very efficient parallel prefix scan "runtime" that incorporates three design features: (1) kernel fusion for locally generating and consuming prefix scan data; (2) multi-scan for performing multiple related, concurrent prefix scans (one for each partitioning bin); and (3) flexible algorithm serialization for avoiding unnecessary synchronization and communication within algorithmic phases, allowing us to construct a single implementation that scales well across all generations and configurations of programmable NVIDIA GPUs.
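The role of prefix scan in a radix sorting pass can be illustrated with a minimal sequential sketch. This is not the authors' GPU implementation: the function names (`exclusive_scan`, `radix_sort_pass`, `radix_sort`), the 4-bit digit width, and the restriction to non-negative 32-bit integer keys are all assumptions made for illustration. Each pass counts keys per partitioning bin, runs an exclusive prefix scan over the counts to obtain each bin's starting offset, and then scatters the keys stably.

```python
from itertools import accumulate


def exclusive_scan(counts):
    """Exclusive prefix sum: out[i] is the sum of counts[0..i-1]."""
    return [0] + list(accumulate(counts))[:-1]


def radix_sort_pass(keys, shift, radix_bits=4):
    """One stable partitioning pass on the digit starting at bit `shift`."""
    num_bins = 1 << radix_bits
    mask = num_bins - 1

    # Histogram: how many keys fall into each bin for this digit.
    counts = [0] * num_bins
    for key in keys:
        counts[(key >> shift) & mask] += 1

    # The scan turns per-bin counts into per-bin starting offsets.
    offsets = exclusive_scan(counts)

    # Stable scatter: keys with equal digits keep their relative order.
    out = [0] * len(keys)
    for key in keys:
        digit = (key >> shift) & mask
        out[offsets[digit]] = key
        offsets[digit] += 1
    return out


def radix_sort(keys, key_bits=32, radix_bits=4):
    """Least-significant-digit radix sort (assumes 0 <= key < 2**key_bits)."""
    keys = list(keys)
    for shift in range(0, key_bits, radix_bits):
        keys = radix_sort_pass(keys, shift, radix_bits)
    return keys
```

On a GPU the histogram, scan, and scatter steps run in parallel across threads; the sketch only shows how the scan output determines where each key lands.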


Information ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 120
Author(s):  
Augusto Y. Horita ◽  
Denis S. Loubach ◽  
Ricardo Bonna

Sophisticated, high-performance embedded systems are present in an increasing number of application domains. In this context, formal design methods have been studied to make the development process robust and scalable. Models of computation (MoCs) allow an application to be modeled at a high abstraction level on a formal basis. This enables analysis before the application moves to the implementation phase. Different tools and frameworks supporting MoCs have been developed. Some of them can simulate the models and also verify their functionality and feasibility before the next design steps. In view of this, we present a novel method for analyzing and identifying possible automation approaches applicable to embedded systems design flows supported by formal models of computation. A comprehensive case study shows the potential and applicability of our method.


Electronics ◽  
2018 ◽  
Vol 7 (11) ◽  
pp. 312 ◽  
Author(s):  
Vanessa Vargas ◽  
Pablo Ramos ◽  
Raoul Velazco

Currently, there is special interest in validating the use of Commercial-Off-The-Shelf (COTS) multi/many-core processors for critical applications thanks to their high performance, low power consumption, and affordability. However, the continuous shrinking of transistor geometry and the increasing complexity of these devices dramatically affect their sensitivity to natural radiation and thus diminish their reliability. One of the most common effects produced by natural radiation is the Single Event Upset, a bit-flip in memory content that produces unexpected results at the application level. For this reason, manufacturers and users implement hardware and software error-mitigation techniques on multi/many-core processors. In this context, the present work evaluates a new fault-tolerance approach based on N-Modular Redundancy (NMR) and partitioning, called NMR-MPar, by means of 14 MeV neutron radiation ground testing, in order to emulate the effects of high-energy neutrons present at avionics altitudes. For evaluation purposes, a case study is implemented on the 28 nm CMOS KALRAY MPPA-256 many-core processor running two complementary benchmark applications: a distributed Matrix Multiplication and the Traveling Salesman Problem. Radiation experiments were conducted in the GENEPI2 particle accelerator. The correctness of the application results when an error is detected confirms the approach's effectiveness and supports its use in avionics applications.
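The voting idea behind N-Modular Redundancy can be sketched generically as follows. This is not the NMR-MPar implementation, which also relies on partitioning; the function name `nmr_vote`, the status strings, and the requirement that replica results be hashable values are assumptions made for illustration. N replicas compute the same result, and a strict majority decides the output.

```python
from collections import Counter


def nmr_vote(results):
    """Majority vote over the results of N redundant replicas.

    Returns (value, status), where status is:
      "ok"            - all replicas agree
      "corrected"     - a strict majority agrees, at least one replica differs
      "uncorrectable" - no strict majority exists (value is None)
    Results must be hashable (e.g. ints, or tuples for matrix rows).
    """
    if not results:
        raise ValueError("at least one replica result is required")

    value, count = Counter(results).most_common(1)[0]
    if count == len(results):
        return value, "ok"
    if 2 * count > len(results):
        return value, "corrected"
    return None, "uncorrectable"
```

With three replicas (triple modular redundancy), a single corrupted result such as a bit-flipped value is outvoted by the two agreeing replicas, while three mutually different results are reported as uncorrectable.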


Author(s):  
A. Ferrerón Labari ◽  
D. Suárez Gracia ◽  
V. Viñals Yúfera

In recent years, embedded systems have evolved to offer capabilities previously found only in high-performance systems. Portable devices already have on-chip multiprocessors (such as the PowerPC 476FP or ARM Cortex A9 MP), usually multi-threaded, and a powerful on-chip multi-level cache memory hierarchy. As most of these systems are battery-powered, power consumption becomes a critical issue. Achieving high performance and low power consumption is a highly complex challenge for which some proposals have already been made. Suarez et al. proposed a new on-chip cache hierarchy, the LP-NUCA (Low Power NUCA), which reduces access latency by taking advantage of NUCA (Non-Uniform Cache Architecture) properties. The key points are decoupling the functionality and using three specialized on-chip networks. This structure has proved efficient for data hierarchies, achieving good performance and reducing energy consumption. On the other hand, instruction caches have different requirements and characteristics than data caches, which conflict with the requirements of low-power embedded systems, especially in SMT (simultaneous multi-threading) environments. We want to study the benefits of using small tiled caches for the instruction hierarchy, so we propose a new design, ID-LP-NUCAs. Thus, we need to completely re-evaluate our previous design in terms of structure design, interconnection networks (including topologies, flow control, and routing), content management (with special interest in hardware/software content allocation policies), and structure sharing. In CMP (chip multiprocessor) environments with parallel workloads, coherence plays an important role and must be taken into consideration.


2019 ◽  
pp. 123-130

Scientific research in mechanical engineering, covering manufacturing machine slate, soil tillage, sowing, and harvesting in line with the agrotechnical requirements for cultivating and transporting plants, is taken into account in manufacturing the local universal tractor, which is designed around high ergonomic indicators. This research includes developing and mastering new types of high-performance, energy-saving machines, creating multifunctional machines that cultivate soil while performing several planting operations simultaneously, and integrating agricultural machine designs. For this reason, this article explores the use of case studies in teaching agricultural terminology by analyzing research in machine building. The case study method was first used in 1870 at Harvard Law School in the United States. The article also gives examples of agricultural machine-building terms and discusses teaching terminology, case methods, the case study process, and the case study method itself. Research in mechanical engineering and the use of case studies in teaching terminology are also analyzed. In addition, the requirements for developing case study tasks are given with regard to their practical didactic nature. We also give case study models that allow us to analyze and evaluate students' activities.


Energies ◽  
2021 ◽  
Vol 14 (12) ◽  
pp. 3716
Author(s):  
Francesco Causone ◽  
Rossano Scoccia ◽  
Martina Pelle ◽  
Paola Colombo ◽  
Mario Motta ◽  
...  

Cities and nations worldwide are committing to energy- and carbon-neutral objectives that imply a huge contribution from buildings. High-performance targets, either zero energy or zero carbon, are typically difficult for single buildings to reach, but groups of properly managed buildings might reach these ambitious goals. For this purpose, we need tools and experience to model, monitor, manage, and optimize buildings and their neighborhood-level systems. The paper describes the activities pursued for the deployment of an advanced energy management system for a multi-carrier energy grid of an existing neighborhood in the area of Milan. The activities included: (i) development of a detailed monitoring plan, (ii) deployment of the monitoring plan, and (iii) development of a virtual model of the neighborhood and simulation of its energy performance. Comparisons against early-stage energy monitoring data proved promising, and the generation system showed high efficiency (EER equal to 5.84), to be further exploited.


2021 ◽  
Vol 36 (1) ◽  
pp. 33-43
Author(s):  
Jian-Bin Fang ◽  
Xiang-Ke Liao ◽  
Chun Huang ◽  
De-Zun Dong
