memory wall
Recently Published Documents


TOTAL DOCUMENTS

52
(FIVE YEARS 16)

H-INDEX

13
(FIVE YEARS 2)

Author(s):  
Qin Li ◽  
Peiyan Dong ◽  
Zijie Yu ◽  
Changlu Liu ◽  
Fei Qiao ◽  
...  
Keyword(s):  

Micromachines ◽  
2020 ◽  
Vol 11 (12) ◽  
pp. 1075
Author(s):  
Tao Cai ◽  
Qingjian He ◽  
Dejiao Niu ◽  
Fuli Chen ◽  
Jie Wang ◽  
...  

The non-volatile memory (NVM) device is a useful way to solve the memory wall in computers. However, the current I/O software stack in operating systems becomes a performance bottleneck for applications based on NVM devices, especially for key–value stores. We analyzed the characteristics of key–value stores and NVM devices and designed a new embedded key–value store for an NVM device simulator named PMEKV. The embedded processor in NVM devices was used to manage key–value pairs to reduce the data transfer between NVM devices and key–value applications. Meanwhile, it also cut down the data copy between the user space and the kernel space in the operating system to alleviate the I/O software stacks on the efficiency of key–value stores. The architecture, data layout, management strategy, new interface and log strategy of PMEKV are given. Finally, a prototype of PMEKV was implemented based on PMEM. We used YCSB to test and compare it with Redis, MongDB, and Memcache. Meanwhile, the Redis for PMEM named PMEM-Redis and PMEM-KV were also used to test and compared with PMEKV. The results show that PMEKV had the advantage of throughput and adaptability compared with the current key–value stores.


2020 ◽  
Vol 14 (3) ◽  
pp. 241-254
Author(s):  
Chen Luo ◽  
Michael J. Carey

Log-Structured Merge-trees (LSM-trees) have been widely used in modern NoSQL systems. Due to their out-of-place update design, LSM-trees have introduced memory walls among the memory components of multiple LSM-trees and between the write memory and the buffer cache. Optimal memory allocation among these regions is non-trivial because it is highly workload-dependent. Existing LSM-tree implementations instead adopt static memory allocation schemes due to their simplicity and robustness, sacrificing performance. In this paper, we attempt to break down these memory walls in LSM-based storage systems. We first present a memory management architecture that enables adaptive memory management. We then present a partitioned memory component structure with new flush policies to better exploit the write memory to minimize the write cost. To break down the memory wall between the write memory and the buffer cache, we further introduce a memory tuner that tunes the memory allocation between these two regions. We have conducted extensive experiments in the context of Apache AsterixDB using the YCSB and TPC-C benchmarks and we present the results here.


2020 ◽  
Vol 10 (3) ◽  
pp. 28
Author(s):  
John Reuben

As we approach the end of Moore’s law, many alternative devices are being explored to satisfy the performance requirements of modern integrated circuits. At the same time, the movement of data between processing and memory units in contemporary computing systems (‘von Neumann bottleneck’ or ‘memory wall’) necessitates a paradigm shift in the way data is processed. Emerging resistance switching memories (memristors) show promising signs to overcome the ‘memory wall’ by enabling computation in the memory array. Majority logic is a type of Boolean logic which has been found to be an efficient logic primitive due to its expressive power. In this review, the efficiency of majority logic is analyzed from the perspective of in-memory computing. Recently reported methods to implement majority gate in Resistive RAM array are reviewed and compared. Conventional CMOS implementation accommodated heterogeneity of logic gates (NAND, NOR, XOR) while in-memory implementation usually accommodates homogeneity of gates (only IMPLY or only NAND or only MAJORITY). In view of this, memristive logic families which can implement MAJORITY gate and NOT (to make it functionally complete) are to be favored for in-memory computing. One-bit full adders implemented in memory array using different logic primitives are compared and the efficiency of majority-based implementation is underscored. To investigate if the efficiency of majority-based implementation extends to n-bit adders, eight-bit adders implemented in memory array using different logic primitives are compared. Parallel-prefix adders implemented in majority logic can reduce latency of in-memory adders by 50–70% when compared to IMPLY, NAND, NOR and other similar logic primitives.


2020 ◽  
Vol 29 (7) ◽  
pp. 078504
Author(s):  
Xiaohe Huang ◽  
Chunsen Liu ◽  
Yu-Gang Jiang ◽  
Peng Zhou
Keyword(s):  

Micromachines ◽  
2020 ◽  
Vol 11 (5) ◽  
pp. 496 ◽  
Author(s):  
John Reuben

The flow of data between processing and memory units in contemporary computing systems is their main performance and energy-efficiency bottleneck, often referred to as the ‘von Neumann bottleneck’ or ‘memory wall’. Emerging resistance switching memories (memristors) show promising signs to overcome the ‘memory wall’ by enabling computation in the memory array. Majority logic is a type of Boolean logic, and in many nanotechnologies, it has been found to be an efficient logic primitive. In this paper, a technique is proposed to implement a majority gate in a memory array. The majority gate is realised in an energy-efficient manner as a memory R E A D operation. The proposed logic family disintegrates arithmetic operations to majority and NOT operations which are implemented as memory R E A D and W R I T E operations. A 1-bit full adder can be implemented in 6 steps (memory cycles) in a 1T–1R array, which is faster than I M P L Y , N A N D , N O R and other similar logic primitives.


2020 ◽  
Vol 69 (5) ◽  
pp. 722-733
Author(s):  
Chun-Feng Wu ◽  
Yuan-Hao Chang ◽  
Ming-Chang Yang ◽  
Tei-Wei Kuo

Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1681 ◽  
Author(s):  
Milena Andrighetti ◽  
Giovanna Turvani ◽  
Giulia Santoro ◽  
Marco Vacca ◽  
Andrea Marchesin ◽  
...  

To live in the information society means to be surrounded by billions of electronic devices full of sensors that constantly acquire data. This enormous amount of data must be processed and classified. A solution commonly adopted is to send these data to server farms to be remotely elaborated. The drawback is a huge battery drain due to high amount of information that must be exchanged. To compensate this problem data must be processed locally, near the sensor itself. But this solution requires huge computational capabilities. While microprocessors, even mobile ones, nowadays have enough computational power, their performance are severely limited by the Memory Wall problem. Memories are too slow, so microprocessors cannot fetch enough data from them, greatly limiting their performance. A solution is the Processing-In-Memory (PIM) approach. New memories are designed that can elaborate data inside them eliminating the Memory Wall problem. In this work we present an example of such a system, using as a case of study the Bitmap Indexing algorithm. Such algorithm is used to classify data coming from many sources in parallel. We propose a hardware accelerator designed around the Processing-In-Memory approach, that is capable of implementing this algorithm and that can also be reconfigured to do other tasks or to work as standard memory. The architecture has been synthesized using CMOS technology. The results that we have obtained highlights that, not only it is possible to process and classify huge amount of data locally, but also that it is possible to obtain this result with a very low power consumption.


Author(s):  
Milena Andrighetti ◽  
Giovanna Turvani ◽  
Giulia Santoro ◽  
Marco Vacca ◽  
Andrea Marchesin ◽  
...  

To live in the information society means to be surrounded by billions of electronic devices full of sensors that constantly acquire data. This enormous amount of data must be processed and classified. A solution commonly adopted is to send these data to server farms to be remotely elaborated. The drawback is a huge battery drain due to high amount of information that must be exchanged. To compensate this problem data must be processed locally, near the sensor itself. But this solution requires huge computational capabilities. While microprocessors, even mobile ones, nowadays have enough computational power, their performance are severely limited by the Memory Wall problem. Memories are too slow, so microprocessors cannot fetch enough data from them, greatly limiting their performance. A solution is the Processing-In-Memory (PIM) approach. New memories are designed that are able to elaborate data inside them eliminating the Memory Wall problem. In this work we present an example of such system, using as a case of study the Bitmap Indexing algorithm. Such algorithm is used to classify data coming from many sources in parallel. We propose an hardware accelerator designed around the Processing-In-Memory approach, that is capable of implementing this algorithm and that can also be reconfigured to do other tasks or to work as standard memory. The architecture has been synthesized using CMOS technology. The results that we have obtained highlights that, not only it is possible to process and classify huge amount of data locally, but also that it is possible to obtain this result with a very low power consumption.


Sign in / Sign up

Export Citation Format

Share Document