efficient memory
Recently Published Documents

TOTAL DOCUMENTS: 329 (five years: 67)
H-INDEX: 23 (five years: 3)
Author(s): Shashi Kant Ratnakar, Subhajit Sanfui, Deepak Sharma

Abstract: Topology optimization has been successful in generating optimal topologies for various structures arising in real-world applications. Since these applications can have complex and large domains, topology optimization incurs a high computational cost due to the unstructured meshes used to discretize these domains and their finite element analysis (FEA). This paper addresses this challenge by developing three GPU-based element-by-element strategies targeting unstructured all-hexahedral meshes for the matrix-free preconditioned conjugate gradient (PCG) finite element solver. These strategies perform the sparse matrix-vector multiplication (SpMV) arising in the FEA solver by allocating multiple GPU compute threads per element, and they use the GPU's shared memory for efficient memory transactions. The proposed strategies are tested with the solid isotropic material with penalization (SIMP) method on four examples of 3D structural topology optimization. Results demonstrate speedups of up to 8.2× over standard GPU-based SpMV strategies from the literature.
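The core of such a matrix-free solver is applying the stiffness operator element by element, so the global sparse matrix is never assembled. A minimal single-threaded sketch of that kernel (names like `ebe_matvec`, `elems`, and `ke` are illustrative, not from the paper; on a GPU the per-element loop becomes parallel threads and the scatter becomes atomic adds):

```python
import numpy as np

def ebe_matvec(elems, ke, x):
    """y = K @ x computed element by element, without assembling K.

    elems : (n_el, nodes_per_el) int array of global node ids per element
    ke    : (nodes_per_el, nodes_per_el) element stiffness (1 dof per node here)
    x     : (n_nodes,) input vector
    """
    y = np.zeros_like(x)
    for conn in elems:               # on a GPU: one or more threads per element
        y_local = ke @ x[conn]       # gather nodal values, local multiply
        np.add.at(y, conn, y_local)  # scatter-add (atomic adds on a GPU)
    return y

# Tiny 1D bar with two 2-node elements; dense K would be tridiagonal.
elems = np.array([[0, 1], [1, 2]])
ke = np.array([[1.0, -1.0], [-1.0, 1.0]])
x = np.array([1.0, 2.0, 3.0])
print(ebe_matvec(elems, ke, x))
```

Inside a PCG loop, this routine replaces every product with the assembled matrix, which is what makes the approach memory-light and amenable to per-element GPU parallelism.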


Author(s): Yunhee Woo, Dongyoung Kim, Jaemin Jeong, Jeong-Gun Lee

2021, Vol. 16 (1)
Author(s): Peng Fang, Fang Wang, Zhan Shi, Dan Feng, Qianxu Yi, et al.

2021, Vol. 17 (3), pp. 1-32
Author(s): Xingda Wei, Rong Chen, Haibo Chen, Binyu Zang

RDMA (Remote Direct Memory Access) has gained considerable interest for network-attached in-memory key-value stores. However, traversing the remote tree-based index of an ordered key-value store over RDMA becomes a critical obstacle, causing an order-of-magnitude slowdown and limited scalability due to multiple network round trips. Using an index cache with conventional wisdom (caching partial data and traversing it locally) usually has limited effect because of unavoidable capacity misses, massive random accesses, and costly cache invalidations. We argue that a machine learning (ML) model is a perfect cache structure for the tree-based index, termed a learned cache. Based on it, we design and implement XStore, an RDMA-based ordered key-value store with a new hybrid architecture that retains a tree-based index at the server to handle dynamic workloads (e.g., inserts) and leverages a learned cache at the client to handle static workloads (e.g., gets and scans). The key idea is to decouple ML model retraining from index updating by maintaining a layer of indirection from logical to actual positions of key-value pairs, which allows a stale learned cache to continue predicting a correct position for a lookup key. XStore ensures correctness using a validation mechanism with a fallback path and further uses speculative execution to minimize the cost of cache misses. Evaluations with YCSB benchmarks and production workloads show that a single XStore server can achieve over 80 million read-only requests per second, outperforming state-of-the-art RDMA-based ordered key-value stores (namely DrTM-Tree, Cell, and eRPC+Masstree) by up to 5.9× (from 3.7×). For workloads with inserts, XStore still provides up to 3.5× (from 2.7×) throughput speedup, achieving 53M reqs/s. The learned cache also reduces client-side memory usage and provides an efficient memory-performance tradeoff, e.g., saving 99% of memory at the cost of 20% of peak throughput.
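The learned-cache idea can be illustrated without RDMA: a model predicts a position with a bounded error, the lookup validates within that error window, and a fallback path (standing in for the server-side tree traversal) handles misses or stale predictions. This is a heavily simplified, single-machine sketch; the class name, the trivial linear model, and the error-window scheme are illustrative assumptions, not XStore's actual implementation:

```python
import bisect

class LearnedCache:
    """Toy learned cache over a sorted key-value layout."""

    def __init__(self, keys, values):
        self.entries = sorted(zip(keys, values))   # sorted logical layout
        ks = [k for k, _ in self.entries]
        n = len(ks)
        # "Train" a trivial linear model: position ~= slope*key + intercept.
        self.slope = (n - 1) / (ks[-1] - ks[0]) if ks[-1] != ks[0] else 0.0
        self.intercept = -self.slope * ks[0]
        # The maximum prediction error bounds the search window.
        self.err = max(abs(i - self._predict(k)) for i, k in enumerate(ks))

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def get(self, key):
        pos = self._predict(key)
        lo = max(0, pos - self.err)
        hi = min(len(self.entries), pos + self.err + 1)
        # Validation: check the predicted window actually holds the key.
        for k, v in self.entries[lo:hi]:
            if k == key:
                return v
        # Fallback path: full binary search, standing in for asking the server.
        i = bisect.bisect_left(self.entries, (key,))
        if i < len(self.entries) and self.entries[i][0] == key:
            return self.entries[i][1]
        return None

cache = LearnedCache([1, 5, 9, 13], ["a", "b", "c", "d"])
print(cache.get(9))   # hit via the model's predicted window
print(cache.get(7))   # miss: falls back, returns None
```

The indirection layer in the paper plays the role the sorted array plays here: because the model predicts a logical position rather than a physical address, entries can move without immediately invalidating the model.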


2021, Vol. 16 (2), pp. 1-9
Author(s): Stephanie Soldavini, Christian Pilato

The never-ending demand for high performance and energy efficiency is pushing designers towards increasing levels of heterogeneity and specialization in modern computing systems. In such systems, creating efficient memory architectures is one of the major opportunities for optimizing modern workloads (e.g., computer vision, machine learning, and graph analytics) that are extremely data-driven. However, designers need proper design methods to tackle the increasing design complexity and to address new challenges, such as the security and privacy of the data to be processed. This paper overviews current trends in the design of domain-specific memory architectures. Domain-specific architectures are tailored to a given application domain, introducing hardware accelerators and custom memory modules while maintaining a certain level of flexibility. We describe the major components, common challenges, and state-of-the-art design methodologies for building domain-specific memory architectures. We also discuss the most relevant research projects, providing a classification based on our main topics.


Author(s): Liangchao Guo, Boyuan Mu, Ming-Zheng Li, Baidong Yang, Ruo-Si Chen, et al.

Author(s):  
B Ajay Kumar

DSP systems typically perform a large number of multiplications because they process many discrete signals. Combinational multiplier circuits consume considerable power, since they contain many intermediate blocks (usually full adders and AND gates); they also occupy more area and incur more delay, and there is usually a tradeoff between area and delay. To make the multiplier more efficient, memory-based multipliers are often preferred. Memory-based multipliers use techniques such as anti-symmetric product coding (APC) and odd multiple storage (OMS), in which the precomputed products are stored in a lookup table (LUT) according to the chosen technique. To optimize the required memory, we combine the APC and OMS techniques for better storage and retrieval of data. In this project, we show how the combined technique improves multiplier performance: it reduces the LUT size to one-fourth that of a standard LUT. It is demonstrated that the proposed LUT architecture for small input sizes can be used to perform high-precision multiplication efficiently with input operand decomposition.
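The roughly four-fold LUT reduction can be seen in a small behavioral model. For a 5-bit input X, APC codes it as u = X - 16 so that A*X = A*u + 16*A and A*(-u) = -(A*u), halving the table; OMS then stores only odd |u|, deriving even multiples by shifting, halving it again (8 stored words instead of 32). This is a behavioral sketch under those textbook definitions, not the paper's hardware architecture; the coefficient and function names are mine:

```python
A = 13   # fixed coefficient (illustrative)
L = 5    # input width in bits

# OMS: the LUT stores only A*m for odd m in 1..15 -> 8 entries instead of 32.
lut = {m: A * m for m in range(1, 2 ** (L - 1), 2)}

def apc_oms_multiply(x):
    """Compute A * x for 0 <= x < 2**L using the reduced LUT."""
    assert 0 <= x < 2 ** L
    base = A << (L - 1)          # the constant 16*A added back at the end (APC)
    u = x - 2 ** (L - 1)         # APC offset code, u in [-16, 15]
    if u == 0:
        return base
    sign = -1 if u < 0 else 1
    m, shift = abs(u), 0
    while m % 2 == 0:            # OMS: peel powers of two off the multiple
        m >>= 1
        shift += 1
    return sign * (lut[m] << shift) + base

print(apc_oms_multiply(31))      # equals A * 31
```

Note that |u| = 16 needs no extra entry: it reduces to lut[1] shifted left four times, which is why the stored odd multiples suffice for the whole input range.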


Algorithms, 2021, Vol. 14 (7), pp. 215
Author(s): Faraz Bhatti, Thomas Greiner

A plenoptic camera based system captures the light field, which can be exploited to estimate the 3D depth of the scene. This process generally involves a significant number of recurrent operations and thus requires high computational power. A general-purpose processor, due to its largely sequential architecture, therefore suffers from long execution times. A desktop graphics processing unit (GPU) can be employed to resolve this problem; however, it is an expensive solution with respect to power consumption and therefore cannot be used in mobile applications with low energy budgets. In this paper, we propose a modified plenoptic depth estimation algorithm that works on a single frame recorded by the camera, together with a corresponding FPGA-based hardware design. For this purpose, the algorithm is restructured for parallelization and pipelining. In combination with efficient memory access, the results show good performance and lower power consumption compared to other systems.

