Study on Explicit Memory Management for CBEA Green Computing Architecture

2011 ◽  
Vol 374-377 ◽  
pp. 2078-2081
Author(s):  
Guo Fu Feng ◽  
Ming Wang ◽  
Ming Chen ◽  
Tao Chi

Heterogeneous multi-core processors are attractive for power-efficient green computing because of their ability to meet varied resource requirements. The multi-level memory hierarchy of the Cell Broadband Engine Architecture (CBEA), which requires explicit management by software, poses significant challenges for both performance and programming. In this paper, based on an analysis of the characteristics of the architecture, we implement four access methods and a corresponding access library with a uniform memory access interface. Besides improving performance over existing techniques, the memory access library can collect profile information about memory management for further performance optimization. Experimental results show that the proposed method outperforms related work and that the profile information it provides helps programmers optimize application performance.

2014 ◽  
Vol 22 (2) ◽  
pp. 75-91 ◽  
Author(s):  
Robert Gerstenberger ◽  
Maciej Besta ◽  
Torsten Hoefler

Modern interconnects offer remote direct memory access (RDMA) features. Yet, most applications rely on explicit message passing for communication despite its unwanted overheads. The MPI-3.0 standard defines a programming interface for exploiting RDMA networks directly; however, its scalability and practicability have yet to be demonstrated in practice. In this work, we develop scalable bufferless protocols that implement the MPI-3.0 specification. Our protocols support scaling to millions of cores with negligible memory consumption while providing the highest performance and minimal overheads. To arm programmers, we provide a spectrum of performance models for all critical functions and demonstrate the usability of our library and models with several application studies with up to half a million processes. We show that our design is comparable to, or better than, UPC and Fortran Coarrays in terms of latency, bandwidth, and message rate. We also demonstrate application performance improvements with comparable programming complexity.


Author(s):  
Aleix Roca Nonell ◽  
Balazs Gerofi ◽  
Leonardo Bautista-Gomez ◽  
Dominique Martinet ◽  
Vicenç Beltran Querol ◽  
...  

Author(s):  
Eduardo H. M. Cruz ◽  
Matthias Diener ◽  
Laércio L. Pilla ◽  
Philippe O. A. Navaux

Current and future architectures rely on thread-level parallelism to sustain performance growth. These architectures have introduced a complex memory hierarchy, consisting of several cores organized hierarchically with multiple cache levels and NUMA nodes. These memory hierarchies affect the performance and energy efficiency of parallel applications because they increase the importance of memory access locality. In order to improve locality, the analysis of the memory access behavior of parallel applications is critical for mapping threads and data. Nevertheless, most previous work relies on indirect information about the memory accesses, or does not combine thread and data mapping, resulting in less accurate mappings. In this paper, we propose the Sharing-Aware Memory Management Unit (SAMMU), an extension to the memory management unit that allows it to detect the memory access behavior in hardware. With this information, the operating system can perform online mapping without any previous knowledge about the behavior of the application. In the evaluation with a wide range of parallel applications (NAS Parallel Benchmarks and PARSEC Benchmark Suite), performance was improved by up to 35.7% (10.0% on average) and energy efficiency was improved by up to 11.9% (4.1% on average). These improvements resulted from a substantial reduction of cache misses and interconnection traffic.
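The idea of using observed memory accesses to guide thread and data mapping can be illustrated with a small software sketch. This is not SAMMU, which detects accesses in hardware; the trace format of `(thread, page)` pairs and the function names `sharing_matrix` and `map_pages` are assumptions made only for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def sharing_matrix(trace, n_threads):
    """Count, for each thread pair, the number of pages both threads touched."""
    threads_per_page = defaultdict(set)
    for thread, page in trace:
        threads_per_page[page].add(thread)
    matrix = [[0] * n_threads for _ in range(n_threads)]
    for threads in threads_per_page.values():
        for a, b in combinations(sorted(threads), 2):
            matrix[a][b] += 1
            matrix[b][a] += 1
    return matrix


def map_pages(trace, thread_to_node):
    """Place each page on the NUMA node that accessed it most often."""
    accesses = defaultdict(Counter)
    for thread, page in trace:
        accesses[page][thread_to_node[thread]] += 1
    # Ties are broken by the lowest node id to keep the result deterministic.
    return {
        page: min(counts, key=lambda node: (-counts[node], node))
        for page, counts in accesses.items()
    }
```

In such a scheme, thread pairs with large entries in the sharing matrix are candidates for placement on cores that share a cache, while `map_pages` gives a simple data-mapping policy once the thread placement is fixed.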


2014 ◽  
Vol 651-653 ◽  
pp. 3-6
Author(s):  
Xiao Rui Guan ◽  
Da Lei Zhang ◽  
You Hai Jin

The application performance of TA2 titanium, S44660 super ferrite stainless steel, and B30 cupronickel, which are widely used in power plants, was investigated using electrochemical and mechanical tests. The results show that the corrosion resistance of B30 is significantly lower than that of TA2 and S44660, and that the corrosion resistance of S44660 is superior to that of TA2. The yield strength and tensile strength of S44660 are higher than those of TA2 and B30. When the thickness of the cooling tubes, the flow rate of the cooling water, and the clean coefficient are taken into account, the thermal conductivity of the three materials differs little. The shock resistance of S44660 is better than that of TA2 and B30. S44660 contains a small amount of Ni, which greatly improves the anti-cracking ability of the base metal and the weld bead.


2017 ◽  
Vol 2017 ◽  
pp. 1-8 ◽  
Author(s):  
Lucio Marinelli ◽  
Carlo Trompetto ◽  
Stefania Canneva ◽  
Laura Mori ◽  
Flavio Nobili ◽  
...  

Learning new information is crucial in daily activities and occurs continuously during a subject’s lifetime. Retention of learned material is required for later recall and reuse, although learning capacity is limited and interference between consecutively learned information may occur. Learning processes are impaired in Parkinson’s disease (PD); however, little is known about the processes related to retention and interference. The aim of this study is to investigate the retention and anterograde interference using a declarative sequence learning task in drug-naive patients in the disease’s early stages. Eleven patients with PD and eleven age-matched controls learned a visuomotor sequence, SEQ1, during Day1; the following day, retention of SEQ1 was assessed and, immediately after, a new sequence of comparable complexity, SEQ2, was learned. The comparison of the learning rates of SEQ1 on Day1 and SEQ2 on Day2 assessed the anterograde interference of SEQ1 on SEQ2. We found that SEQ1 performance improved in both patients and controls on Day2. Surprisingly, controls learned SEQ2 better than SEQ1, suggesting the absence of anterograde interference and the occurrence of learning optimization, a process that we defined as “learning how to learn.” Patients with PD lacked such improvement, suggesting defective performance optimization processes.


2021 ◽  
Vol 5 (OOPSLA) ◽  
pp. 1-26
Author(s):  
Arjun Pitchanathan ◽  
Christian Ulmann ◽  
Michel Weber ◽  
Torsten Hoefler ◽  
Tobias Grosser

Presburger arithmetic provides the mathematical core for the polyhedral compilation techniques that drive analytical cache models, loop optimization for ML and HPC, formal verification, and even hardware design. Polyhedral compilation is widely regarded as being slow due to the potentially high computational cost of the underlying Presburger libraries. Researchers typically use these libraries as powerful black-box tools, but the perceived internal complexity of these libraries, caused by the use of C as the implementation language and a focus on end-user-facing documentation, holds back broader performance-optimization efforts. With FPL, we introduce a new library for Presburger arithmetic built from the ground up in modern C++. We carefully document its internal algorithmic foundations, use lightweight C++ data structures to minimize memory management costs, and deploy transprecision computing across the entire library to effectively exploit machine integers and vector instructions. On a newly-developed comprehensive benchmark suite for Presburger arithmetic, we show a 5.4x speedup in total runtime over the state-of-the-art library isl in its default configuration and 3.6x over a variant of isl optimized with element-wise transprecision computing. We expect that the availability of a well-documented and fast Presburger library will accelerate the adoption of polyhedral compilation techniques in production compilers.
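The transprecision idea mentioned above can be illustrated with a minimal sketch: attempt a computation within the 64-bit machine-integer range and fall back to arbitrary precision only when an overflow is detected. This is not FPL's code or API; the names `dot_int64_checked` and `dot_transprecision` are hypothetical, and Python integers with explicit range checks stand in for machine integers.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


class Int64Overflow(Exception):
    """Raised when an intermediate value leaves the signed 64-bit range."""


def _check(value):
    # Simulates the overflow check a machine-integer fast path would perform.
    if value < INT64_MIN or value > INT64_MAX:
        raise Int64Overflow(value)
    return value


def dot_int64_checked(a, b):
    """Dot product in which every product and partial sum must fit in int64."""
    total = 0
    for x, y in zip(a, b):
        total = _check(total + _check(x * y))
    return total


def dot_transprecision(a, b):
    """Return (result, path): try the int64 path first, then arbitrary precision."""
    try:
        return dot_int64_checked(a, b), "int64"
    except Int64Overflow:
        # Slow path: Python's built-in integers are arbitrary precision.
        return sum(x * y for x, y in zip(a, b)), "bigint"
```

For small coefficients the call stays on the fast path; only inputs whose intermediate values exceed 64 bits pay for the arbitrary-precision fallback.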


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Fei Cao ◽  
Michael Z. Q. Chen ◽  
Yinlong Hu

In this paper, the seismic base isolation problem for all low-complexity networks containing one inerter, one spring, and one damper is studied based on a multi-degree-of-freedom model. The analytical solutions for the H2 performance optimization are derived, and the traditional tuned mass damper (TMD) is employed for comparison. Extensive numerical simulations are performed to verify the effectiveness of the obtained results. The results show that, for different seismic wave excitations, some isolators perform better than the TMD in controlling the displacement of the main structure. Moreover, as the TMD mass ratio increases, the isolation performance of the inerter-based isolators becomes increasingly better than that of the TMD.

