memory architectures
Recently Published Documents

TOTAL DOCUMENTS: 453 (FIVE YEARS: 77)
H-INDEX: 26 (FIVE YEARS: 4)

2021 ◽ Vol 18 (4) ◽ pp. 1-24
Author(s): Sriseshan Srikanth, Anirudh Jain, Thomas M. Conte, Erik P. Debenedictis, Jeanine Cook

Sparse data applications have irregular access patterns that stymie modern memory architectures. Although hyper-sparse workloads have received considerable attention in the past, the moderately sparse workloads prevalent in machine learning, graph processing, and HPC have not. Whereas the former can bypass the cache hierarchy, the latter fit in the cache. This article makes the observation that intelligent, near-processor cache management can improve bandwidth utilization for data-irregular accesses, thereby accelerating moderately sparse workloads. We propose SortCache, a processor-centric approach to accelerating sparse workloads that introduces accelerators leveraging the on-chip cache subsystem, with minimal programmer intervention.
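
To make the locality argument concrete, here is a minimal Python sketch of the effect that near-processor reordering of irregular accesses exploits. This is not the SortCache design itself; the line size, element size, index range, and the single-line cache model are our illustrative assumptions.

```python
import random

CACHE_LINE_BYTES = 64   # assumed line size
ELEM_BYTES = 8          # assumed 64-bit elements
ELEMS_PER_LINE = CACHE_LINE_BYTES // ELEM_BYTES

def line_fetches(indices):
    """Count cache-line fetches for a gather over `indices`, modeling a
    degenerate cache that retains only the most recently used line."""
    fetches, last = 0, None
    for i in indices:
        line = i // ELEMS_PER_LINE
        if line != last:
            fetches += 1
            last = line
    return fetches

# Moderately sparse: 10,000 accesses land in a 16 Ki-element table.
idx = [random.randrange(1 << 14) for _ in range(10_000)]
print("irregular order:", line_fetches(idx))         # roughly one fetch per access
print("sorted order:   ", line_fetches(sorted(idx))) # many accesses share a line
```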


Algorithms ◽ 2021 ◽ Vol 14 (12) ◽ pp. 342
Author(s): Alessandro Varsi, Simon Maskell, Paul G. Spirakis

Resampling is a well-known statistical algorithm that is commonly applied in the context of Particle Filters (PFs) in order to perform state estimation for non-linear, non-Gaussian dynamic models. As the models become more complex and accurate, the run-time of PF applications grows accordingly. Parallel computing can help to address this. However, resampling (and, hence, PFs as well) necessarily involves a bottleneck, the redistribution step, which is notoriously challenging to parallelize using textbook parallel computing techniques. A state-of-the-art redistribution takes O((log₂ N)²) computations on Distributed Memory (DM) architectures, which most supercomputers adopt, whereas redistribution can be performed in O(log₂ N) on Shared Memory (SM) architectures, such as GPUs or mainstream CPUs. In this paper, we propose a novel parallel redistribution for DM that achieves O(log₂ N) time complexity. We also present empirical results indicating that our novel approach outperforms the O((log₂ N)²) approach.
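
For context on the step being parallelized, here is a minimal sequential sketch of systematic resampling in Python/NumPy; the subsequent copying of particles into their new slots is the redistribution step the paper accelerates. The function and variable names are ours, not the paper's.

```python
import numpy as np

def systematic_resample(weights, rng=None):
    """Return indices of particles to replicate, with multiplicity
    proportional to their normalized weights (systematic resampling)."""
    if rng is None:
        rng = np.random.default_rng()
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n  # one stratified draw
    cdf = np.cumsum(weights)
    cdf[-1] = 1.0  # guard against floating-point round-off
    return np.searchsorted(cdf, positions)

# Redistribution is the physical copy into the new slots:
# particles = particles[systematic_resample(w)]
```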


2021 ◽ Vol 20 (5s) ◽ pp. 1-23
Author(s): Mario Günzel, Christian Hakert, Kuan-Hsun Chen, Jian-Jia Chen

Dynamic power management (DPM) reduces the power consumption of a computing system when it idles by switching the system into a low-power state for hibernation. When all processors in the system share the same component, e.g., a shared memory, powering off this component during hibernation is only possible when all processors idle at the same time. For a real-time system, the schedulability property has to be guaranteed on every processor, especially if idle intervals are actively introduced. In this work, we consider real-time systems with hybrid shared-memory architectures, which consist of shared volatile memory (VM) and non-volatile memory (NVM). Energy-efficient execution is achieved by applying DPM to turn off all memories during the hibernation mode. To this end, we first explore hybrid memory architectures and propose a task model with configurable hibernation overheads. We then propose a multi-processor procrastination algorithm (HEART), based on partitioned earliest-deadline-first (pEDF) scheduling. Our algorithm reduces energy consumption by actively enlarging the hibernation time: it forces all processors to idle simultaneously, without violating the schedulability condition, so that the system can enter the hibernation state in which the shared memories are turned off. Through extensive evaluation of HEART, we demonstrate (1) the increase in potential hibernation time and the corresponding decrease in energy consumption, and (2) that our algorithm is not only more general but also more energy-efficient than the state of the art in most cases.
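
For background, below is a toy Python sketch of the pEDF admission that HEART builds on, using first-fit-decreasing packing; the procrastination logic that aligns idle intervals across processors is the paper's contribution and is omitted here. The task fields and the packing heuristic are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    wcet: float    # worst-case execution time
    period: float  # implicit deadline equals the period

    @property
    def util(self) -> float:
        return self.wcet / self.period

def pedf_first_fit(tasks, n_cores):
    """First-fit-decreasing partitioning; EDF on a single core is feasible
    for implicit-deadline tasks iff the core's total utilization is <= 1."""
    loads = [0.0] * n_cores
    mapping = {}
    for t in sorted(tasks, key=lambda t: t.util, reverse=True):
        for c in range(n_cores):
            if loads[c] + t.util <= 1.0:
                loads[c] += t.util
                mapping[t] = c
                break
        else:
            return None  # this heuristic found no feasible partition
    return mapping

# The lower the per-core loads, the more slack a procrastination scheme
# can convert into simultaneous hibernation intervals.
```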


Author(s): Nathan Eli Miller, Zheng Wang, Saurabh Dash, Asif Islam Khan, Saibal Mukhopadhyay

PLoS ONE ◽ 2021 ◽ Vol 16 (9) ◽ pp. e0257047
Author(s): Adrián Lamela, Óscar G. Ossorio, Guillermo Vinuesa, Benjamín Sahelices

Non-volatile memory technology is now available in commodity hardware. This technology can be used as the backing memory behind an external DRAM cache without needing to modify the software. However, the higher read and write latencies of non-volatile memory may exacerbate the memory wall problem. In this work we present a novel off-chip prefetch technique based on a Hidden Markov Model that specifically addresses the latency problem caused by the complexity of off-chip memory access patterns. First, we present a thorough analysis of off-chip memory access patterns to characterize their complexity in multicore processors. Based on this study, we propose a prefetching module located in the LLC that uses two small tables and whose computational complexity is linear in the number of computing threads. Our Markov-based technique is able to track and cluster several simultaneous groups of memory accesses coming from multiple concurrent threads in a multicore processor. It can quickly identify complex address groups and trigger prefetches with very high accuracy. Our simulations show an improvement of up to 76% in the hit ratio of an off-chip DRAM cache for multicore architectures over the conventional prefetch technique (G/DC). Moreover, the overhead of prefetch requests (failed prefetches) is reduced by 48% in single-core simulations and by 83% in multicore simulations.
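
As a simplified illustration of the prediction principle, a first-order Markov predictor over cache-line addresses is sketched below; the paper's module additionally clusters concurrent access streams per thread and bounds its state to two small tables. All names here are hypothetical.

```python
from collections import defaultdict, Counter

class MarkovPrefetcher:
    """Predict the next cache line from first-order transition counts."""

    def __init__(self):
        self.table = defaultdict(Counter)  # line -> counts of successor lines
        self.prev = None

    def access(self, line):
        """Record an access; return the line to prefetch, or None."""
        if self.prev is not None:
            self.table[self.prev][line] += 1  # learn the observed transition
        self.prev = line
        successors = self.table[line]
        return successors.most_common(1)[0][0] if successors else None

# Usage sketch:
# pf = MarkovPrefetcher()
# for line in access_trace:
#     hint = pf.access(line)  # issue a prefetch for `hint` if not None
```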


2021 ◽ Author(s): Minxuan Zhou, Guoyang Chen, Mohsen Imani, Saransh Gupta, Weifeng Zhang, ...

2021 ◽ Vol 16 (2) ◽ pp. 1-9
Author(s): Stephanie Soldavini, Christian Pilato

The never-ending demand for high performance and energy efficiency is pushing designers towards an increasing level of heterogeneity and specialization in modern computing systems. In such systems, creating efficient memory architectures is one of the major opportunities for optimizing modern workloads (e.g., computer vision, machine learning, graph analytics, etc.) that are extremely data-driven. However, designers demand proper design methods to tackle the increasing design complexity and to address several new challenges, like the security and privacy of the data to be processed. This paper overviews the current trend in the design of domain-specific memory architectures. Domain-specific architectures are tailored to the given application domain, with the introduction of hardware accelerators and custom memory modules, while maintaining a certain level of flexibility. We describe the major components, the common challenges, and the state-of-the-art design methodologies for building domain-specific memory architectures. We also discuss the most relevant research projects, providing a classification based on our main topics.

