memory system Latest Research Papers

GPU Domain Specialization via Composable On-Package Architecture

ACM Transactions on Architecture and Code Optimization ◽

10.1145/3484505 ◽

2022 ◽

Vol 19 (1) ◽

pp. 1-23

Author(s):

Yaosheng Fu ◽

Evgeny Bolotin ◽

Niladrish Chatterjee ◽

David Nellans ◽

Stephen W. Keckler

Keyword(s):

Deep Learning ◽

Memory System ◽

Design Reuse ◽

Application Domain ◽

Precision Matrix ◽

Practical Solution ◽

Optimal Configurations ◽

Gpu Architecture ◽

With Memory ◽

Cache Capacity

As GPUs scale their low-precision matrix math throughput to boost deep learning (DL) performance, they upset the balance between math throughput and memory system capabilities. We demonstrate that a converged GPU design trying to address diverging architectural requirements between FP32 (or larger)-based HPC and FP16 (or smaller)-based DL workloads results in sub-optimal configurations for either of the application domains. We argue that a Composable On-PAckage GPU (COPA-GPU) architecture to provide domain-specialized GPU products is the most practical solution to these diverging requirements. A COPA-GPU leverages multi-chip-module disaggregation to support maximal design reuse, along with memory system specialization per application domain. We show how a COPA-GPU enables DL-specialized products by modular augmentation of the baseline GPU architecture with up to 4× higher off-die bandwidth, 32× larger on-package cache, and 2.3× higher DRAM bandwidth and capacity, while conveniently supporting scaled-down HPC-oriented designs. This work explores the microarchitectural design necessary to enable composable GPUs and evaluates the benefits composability can provide to HPC, DL training, and DL inference. We show that when compared to a converged GPU design, a DL-optimized COPA-GPU featuring a combination of 16× larger cache capacity and 1.6× higher DRAM bandwidth scales per-GPU training and inference performance by 31% and 35%, respectively, and reduces the number of GPU instances by 50% in scale-out training scenarios.

Software Hint-Driven Data Management for Hybrid Memory in Mobile Systems

ACM Transactions on Embedded Computing Systems ◽

10.1145/3494536 ◽

2022 ◽

Vol 21 (1) ◽

pp. 1-18

Author(s):

Fei Wen ◽

Mian Qin ◽

Paul Gratz ◽

Narasimha Reddy

Keyword(s):

Time Window ◽

Memory Systems ◽

Memory Device ◽

Memory System ◽

Mobile Systems ◽

Data Migration ◽

Time Data ◽

Cooperative Approach ◽

Hybrid Memory ◽

Data Objects

Hybrid memory systems, composed of emerging non-volatile memory (NVM) and DRAM, have been proposed to address the growing memory demand of current mobile applications. Recently emerging NVM technologies, such as phase-change memory (PCM), memristors, and 3D XPoint, have higher capacity density, minimal static power consumption, and lower cost per GB. However, NVM has longer access latency and more limited write endurance than DRAM. The different characteristics of the two memory classes pose a new challenge for memory system design. Ideally, pages should be placed in or migrated between the two types of memory according to the data objects' access properties. Prior system software approaches exploit program information from the OS, but at the cost of high software latency incurred by the related kernel processes. Hardware approaches can avoid these latencies; however, the hardware's view is constrained to a short time window of recent memory requests because of limited on-chip resources. In this work, we propose OpenMem: a hardware-software cooperative approach that combines the execution-time advantages of pure hardware approaches with knowledge of data object properties in a global scope. First, we built a hardware-based memory manager unit (HMMU) that learns short-term access patterns by online profiling and executes data migration efficiently. Then, we built a heap memory manager for heterogeneous memory systems that allows the programmer to directly customize each data object's allocation to a favorable memory device within the presumed object life cycle. With the programmer's hints guiding data placement at allocation time, data objects with similar properties are grouped together to reduce unnecessary page migrations. We implemented the whole system on an FPGA board with embedded ARM processors. In testing with a set of benchmark applications from SPEC 2017 and PARSEC, experimental results show that OpenMem reduces energy consumption by 44.6% with only a 16% performance degradation compared to an all-DRAM memory system. The number of writes to the NVM is reduced by 14% versus the HMMU-only configuration, extending the NVM device lifetime.
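
The allocation-time placement described above can be illustrated with a minimal sketch. This is not the OpenMem implementation: the `Hint` values, the `Pool` and `HintedHeap` classes, and the fallback rule are all hypothetical names and assumptions chosen for illustration. The abstract says only that programmer hints guide each object's placement at allocation time.

```python
from dataclasses import dataclass, field
from enum import Enum


class Hint(Enum):
    """Hypothetical programmer hints; the abstract does not name them."""
    HOT = "hot"    # assumed: frequently accessed or write-heavy object
    COLD = "cold"  # assumed: rarely accessed, read-mostly object


@dataclass
class Pool:
    """A fixed-capacity memory device modelled as a byte counter."""
    name: str
    capacity: int
    used: int = 0
    objects: dict = field(default_factory=dict)

    def try_alloc(self, obj_id: str, size: int) -> bool:
        # Refuse the allocation if it would exceed the device capacity.
        if self.used + size > self.capacity:
            return False
        self.used += size
        self.objects[obj_id] = size
        return True


class HintedHeap:
    """Toy heap manager: place each object according to its hint."""

    def __init__(self, dram_bytes: int, nvm_bytes: int):
        self.dram = Pool("DRAM", dram_bytes)
        self.nvm = Pool("NVM", nvm_bytes)

    def alloc(self, obj_id: str, size: int, hint: Hint) -> str:
        # Assumed policy: HOT objects prefer DRAM, COLD objects prefer NVM,
        # and either falls back to the other device when its choice is full.
        if hint is Hint.HOT:
            order = (self.dram, self.nvm)
        else:
            order = (self.nvm, self.dram)
        for pool in order:
            if pool.try_alloc(obj_id, size):
                return pool.name
        raise MemoryError(f"no room for object {obj_id!r} ({size} bytes)")


heap = HintedHeap(dram_bytes=100, nvm_bytes=1000)
print(heap.alloc("index", 60, Hint.HOT))    # fits in DRAM
print(heap.alloc("cache", 60, Hint.HOT))    # DRAM is full, falls back to NVM
print(heap.alloc("archive", 10, Hint.COLD)) # placed in NVM directly
```

Grouping objects by hint at allocation time means objects with similar access behaviour share a device from the start, which is the effect the abstract credits for fewer page migrations. The sketch omits the hardware-side online profiling and migration entirely.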

Molecular Docking for Ligand-Receptor Binding Process Based on Heterogeneous Computing

Scientific Programming ◽

10.1155/2022/9197606 ◽

2022 ◽

Vol 2022 ◽

pp. 1-13

Author(s):

Jianhua Li ◽

Guanlong Liu ◽

Zhiyuan Zhen ◽

Zihao Shen ◽

Shiliang Li ◽

...

Keyword(s):

Molecular Docking ◽

Receptor Binding ◽

Distributed Memory ◽

Memory System ◽

Parallel Program ◽

Coarse Grained ◽

Flexible Docking ◽

Binding Process ◽

Parallel Scheme ◽

Computationally Intensive

Molecular docking aims to predict possible drug candidates for many diseases, and it is computationally intensive. In particular, when simulating the ligand-receptor binding process, the binding pocket of the receptor is divided into subcubes, and docking the ligand into every cube produces many molecular docking tasks, which are extremely time-consuming. In this study, we propose a heterogeneous parallel scheme of molecular docking for the binding process of ligand to receptor to accelerate the simulation. The parallel scheme includes two layers of parallelism: a coarse-grained layer implemented with the message-passing interface (MPI) and a fine-grained layer focused on the graphics processing unit (GPU). At the coarse-grained layer, a docking task inside one lattice is assigned to one unique MPI process, and a grouped master-slave mode is used to allocate and schedule the tasks. Meanwhile, at the fine-grained layer, GPU accelerators undertake the computationally intensive computing of scoring functions and related conformation spatial transformations in a single docking task. The results of the experiments for the ligand-receptor binding process show that on a multicore server with GPUs the parallel program achieves a speedup as high as 45 times in flexible docking and as high as 54.5 times in semiflexible docking, and that on a distributed memory system the docking times for both flexible and semiflexible docking decrease as the number of nodes used by the parallel program increases. The scalability of the parallel program is also verified on multiple nodes of a distributed memory system and is approximately linear.
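
The coarse-grained layer described above can be sketched as a two-level assignment of docking tasks. This is a plain-Python simulation rather than MPI code, and it is not the authors' scheduler: the function name `schedule`, the round-robin split across groups, and the "next idle worker" rule inside each group are all assumptions made for illustration, as are the per-task costs.

```python
import heapq


def schedule(task_costs, num_groups, workers_per_group):
    """Simulate a grouped master-worker assignment of docking tasks.

    task_costs[i] is an assumed run time for the docking task of subcube i.
    Returns a dict mapping (group, worker) to the list of task ids it runs.
    """
    # Level 1 (assumed): tasks are dealt round-robin to the groups.
    groups = [[] for _ in range(num_groups)]
    for task_id in range(len(task_costs)):
        groups[task_id % num_groups].append(task_id)

    plan = {}
    for g, tasks in enumerate(groups):
        # Level 2 (assumed): the group's master hands the next task to
        # whichever worker becomes idle first. The heap holds
        # (time_when_idle, worker_index) pairs.
        idle = [(0.0, w) for w in range(workers_per_group)]
        heapq.heapify(idle)
        for task_id in tasks:
            busy_until, w = heapq.heappop(idle)
            plan.setdefault((g, w), []).append(task_id)
            heapq.heappush(idle, (busy_until + task_costs[task_id], w))
    return plan


# One slow task and five fast ones, a single group with two workers.
costs = [4.0, 1.0, 1.0, 1.0, 1.0, 1.0]
for slot, tasks in sorted(schedule(costs, 1, 2).items()):
    print(slot, tasks)
```

In a real MPI program each worker would be a separate process and each task would launch the GPU kernels for scoring and conformation transforms; the sketch only shows how dynamic hand-out keeps a worker stuck on a slow subcube from holding up the rest of its group.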

Construction of Polymer Materials with Specific Responses to Violet and Green Lights and Their Potential Applications in Artificial Visual Memory System

Journal of Materials Chemistry C ◽

10.1039/d1tc04803a ◽

2022 ◽

Author(s):

Guan Wang ◽

Hua Li ◽

Qijian Zhang ◽

Fengjuan Zhu ◽

Junwei Yuan ◽

...

Keyword(s):

Visible Light ◽

Visual Memory ◽

Memory System ◽

Polymer Materials ◽

Electrical Signals ◽

Potential Applications ◽

Light Signals ◽

And Robotics

Artificial retinal materials play an important role in vision repair and robotics. However, the inability to convert characteristic visible light signals into electrical signals has seriously hindered the development and...

Scalable Data Management on Hybrid Memory System for Deep Neural Network Applications

10.1109/bigdata52589.2021.9671309 ◽

2021 ◽

Author(s):

Wei Rang ◽

Donglin Yang ◽

Zhimin Li ◽

Dazhao Cheng

Keyword(s):

Neural Network ◽

Data Management ◽

Deep Neural Network ◽

Memory System ◽

Hybrid Memory ◽

Network Applications ◽

Neural Network Applications

OpenMem: Hardware/Software Cooperative Management for Mobile Memory System

10.1109/dac18074.2021.9586186 ◽

2021 ◽

Author(s):

Fei Wen ◽

Mian Qin ◽

Paul Gratz ◽

Narasimha Reddy

Keyword(s):

Memory System ◽

Cooperative Management

BRAHMS: Beyond Conventional RRAM-based Neural Network Accelerators Using Hybrid Analog Memory System

10.1109/dac18074.2021.9586247 ◽

2021 ◽

Author(s):

Tao Song ◽

Xiaoming Chen ◽

Xiaoyu Zhang ◽

Yinhe Han

Keyword(s):

Neural Network ◽

Memory System ◽

Analog Memory

Predicting the trend of infectious diseases using grey self-memory system model: a case study of the incidence of tuberculosis

Public Health ◽

10.1016/j.puhe.2021.09.025 ◽

2021 ◽

Vol 201 ◽

pp. 108-114

Author(s):

Xiaojun Guo ◽

Houxue Shen ◽

Sifeng Liu ◽

Naiming Xie ◽

Yingjie Yang ◽

...

Keyword(s):

Infectious Diseases ◽

Memory System ◽

System Model

Extensive cortical functional connectivity of the human hippocampal memory system

Cortex ◽

10.1016/j.cortex.2021.11.014 ◽

2021 ◽

Author(s):

Qing Ma ◽

Edmund T. Rolls ◽

Chu-Chung Huang ◽

Wei Cheng ◽

Jianfeng Feng

Keyword(s):

Functional Connectivity ◽

Memory System

Neural Network Modeling and Organization of a Hierarchical Associative Memory System

Journal of Machinery Manufacture and Reliability ◽

10.3103/s1052618821080148 ◽

2021 ◽

Vol 50 (8) ◽

pp. 735-742

Author(s):

I. V. Stepanyan

Keyword(s):

Neural Network ◽

Associative Memory ◽

Network Modeling ◽

Memory System ◽

Neural Network Modeling
