Memory bottleneck
Recently Published Documents

Total documents: 28 (five years: 9)
H-index: 7 (five years: 1)

2022 · Vol. 18 (2) · pp. 1-24
Author(s): Saman Froehlich, Saeideh Shirinzadeh, Rolf Drechsler

Resistive Random Access Memory (ReRAM) is an emerging non-volatile memory technology. Besides its low power consumption and high scalability, its inherent computation capabilities make ReRAM especially interesting for future computer architectures. Merging computation into the memory is a promising approach to overcoming the memory bottleneck. To perform computations in ReRAM, efficient synthesis strategies for Boolean functions have to be developed. In this article, we give a thorough presentation of how to employ the parallel computing capabilities of ReRAM for the synthesis of functions given in state-of-the-art graph-based representations such as AIGs or BDDs. Additionally, we introduce a new graph-based representation called m-And-Inverter Graphs (m-AIGs), which allows us to fully exploit the computing capabilities of ReRAM. Our simulations show that the proposed approaches outperform state-of-the-art synthesis strategies, and that m-AIGs are superior to the standard AIG representation for ReRAM-based synthesis.
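To make the graph representations concrete, here is a minimal sketch of an AIG, the starting point of the synthesis flow described above: a DAG of two-input AND nodes with optional inversion on each edge. The class layout and names are illustrative, not the authors' implementation, and m-AIGs are not reproduced here since the abstract does not define them.

```python
# Minimal And-Inverter Graph (AIG) sketch: every internal node is a
# two-input AND; inversion is an attribute of the incoming edges.
class Node:
    def __init__(self, left=None, right=None, linv=False, rinv=False, var=None):
        self.left, self.right = left, right   # child nodes (None for inputs)
        self.linv, self.rinv = linv, rinv     # edge inversion flags
        self.var = var                        # input variable name, if a leaf

    def eval(self, assignment):
        if self.var is not None:              # primary input
            return assignment[self.var]
        l = self.left.eval(assignment) ^ self.linv
        r = self.right.eval(assignment) ^ self.rinv
        return l and r

# f = (a AND NOT b): one AND node with an inverted right edge.
a, b = Node(var="a"), Node(var="b")
f = Node(left=a, right=b, rinv=True)
assert f.eval({"a": True, "b": False}) is True
assert f.eval({"a": True, "b": True}) is False
```

Synthesis for ReRAM then amounts to traversing such a graph and emitting one in-memory operation (or a parallel batch of them) per node, which is where the representation's depth and node count directly determine cycle count.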


2022 · Vol. 18 (1) · pp. 1-23
Author(s): Jianhui Han, Xiang Fei, Zhaolin Li, Youhui Zhang

Memristor-based processing-in-memory architecture is a promising solution to the memory bottleneck in neural network (NN) processing. A major challenge for the programmability of such architectures is the automatic compilation of high-level NN workloads, spanning diverse operators, onto memristor-based hardware whose programming interfaces may expose different granularities. This article proposes a source-to-source compilation framework for such memristor-based NN accelerators, which can automatically detect and map multiple NN operators based on the flexible and rich representation capability of the polyhedral model. In contrast to previous studies, it supports pipeline generation to exploit the parallelism in NN workloads and thereby leverage hardware resources for higher efficiency. An evaluation on synthetic kernels and NN benchmarks demonstrates that the proposed framework reliably detects and maps the target operators. Case studies on typical memristor-based architectures also show its generality across architectural designs. The evaluation further shows that, compared with existing polyhedral-based compilation frameworks that do not support pipelined execution, performance improves by an order of magnitude with pipelining, which underscores the necessity of this improvement.
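To see why pipeline generation matters, a back-of-the-envelope model (with made-up stage counts and timings, not the paper's measurements) compares sequential and pipelined execution of a multi-stage NN workload:

```python
# Toy cost model of pipelining NN operators across in-memory compute
# units. Stage count, batch count, and per-stage time are illustrative.
def sequential_time(batches, stages, t_stage=1.0):
    # each batch traverses all stages before the next batch starts
    return batches * stages * t_stage

def pipelined_time(batches, stages, t_stage=1.0):
    # classic pipeline latency: fill the pipe once, then one result per step
    return (stages + batches - 1) * t_stage

B, S = 256, 8
print(sequential_time(B, S))   # 2048.0
print(pipelined_time(B, S))    # 263.0 -> ~7.8x speedup at these sizes
```

As the number of batches grows relative to the number of stages, the speedup approaches the stage count, which is consistent with the order-of-magnitude improvement reported above.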


2021
Author(s): Barak Hoffer, Nicolás Wainstein, Christopher M. Neumann, Eric Pop, Eilam Yalon, ...

Stateful logic is a digital processing-in-memory technique that could address the von Neumann memory bottleneck while maintaining backward compatibility with von Neumann architectures. In stateful logic, memory cells themselves perform the logic operations, without reading or moving any data outside the memory array. This has previously been demonstrated with several resistive memory types, but not with commercially available phase-change memory (PCM). Here we present the first implementation of stateful logic using PCM. We experimentally demonstrate four logic gate types (NOR, IMPLY, OR, NIMP) using commonly used PCM materials and crossbar-compatible structures. Our stateful logic gates form a functionally complete set, which enables sequential execution of any logic function within the memory and paves the way to PCM-based digital processing-in-memory systems.
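The in-place gate semantics can be sketched in a few lines. The following toy model uses the standard stateful-logic definitions of IMPLY and an IMPLY-based NOR; it is a bit-level simulation, not the authors' PCM circuit:

```python
from itertools import product

class Array:
    """Toy model of stateful logic: bits live in cells (resistance
    states) and gates update cells in place; nothing is read out."""
    def __init__(self):
        self.c = {}

    def write(self, name, bit):
        self.c[name] = int(bit)

    def imply(self, p, q):
        # IMPLY primitive: q <- (NOT p) OR q, computed in place
        self.c[q] = int((not self.c[p]) or self.c[q])

    def nor(self, a, b, s, out):
        # NOR(a, b) from IMPLY plus initialization, using work cell s
        self.write(s, 0)
        self.imply(a, s)    # s   = NOT a
        self.imply(s, b)    # b   = a OR b   (destructive on b)
        self.write(out, 0)
        self.imply(b, out)  # out = NOT (a OR b) = NOR(a, b)

# exhaustive check against the truth table
for a, b in product((0, 1), repeat=2):
    m = Array()
    m.write("a", a); m.write("b", b)
    m.nor("a", "b", "s", "out")
    assert m.c["out"] == int(not (a or b))
```

Because NOR (or IMPLY plus initialization) is functionally complete, any Boolean function can be executed as a sequence of such in-array steps, which is the claim the abstract makes for the PCM gate set.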


2021 · Vol. 16 (2) · pp. 1-17
Author(s): Paulo Cesar Santos, Francis Birck Moreira, Aline Santana Cordeiro, Sairo Raoní Santos, Tiago Rodrigo Kepe, ...

One of the main challenges for modern processors is data transfer between the processor and memory: such data movement incurs high latency and high energy consumption. In this context, Near-Data Processing (NDP) proposals have started to gain acceptance as accelerator devices that alleviate the memory bottleneck by moving instructions to where the data resides. The first proposals date back to the 1990s, but only in the 2010s did the number of papers addressing NDP grow noticeably, coinciding with the appearance of 3D-stacked chips that combine logic and memory layers. This survey presents a brief history of these accelerators, focusing on the application domains migrated to near-data and on the proposed architectures. We also introduce a new taxonomy that classifies such architectural proposals according to their data distance.


Sensors · 2020 · Vol. 20 (21) · pp. 6229
Author(s): Saike Zhu, Lidan Wang, Zhekang Dong, Shukai Duan

Convolution operations consume much of the time and energy in deep learning algorithms, and convolution is commonly used to remove noise or to extract the edges of an image. Under data-intensive conditions, however, frequent execution of these algorithms places a significant memory/communication burden on the computing system. This paper proposes a circuit based on a spin-memristor crossbar array to address these problems. First, a logic switch based on spin memristors is proposed, which controls the memristor crossbar array. Second, a new type of spin-memristor crossbar array with peripheral circuits is proposed, which realizes the multiply-and-add operations of convolution and significantly alleviates the computational memory bottleneck. Finally, color image filtering and edge extraction are simulated. By calculating the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) of the resulting images, the processing effects of different operators are compared and the correctness of the circuit is verified.
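The multiply-and-add idea can be sketched as follows: kernel weights are stored as crossbar conductances, image patches are applied as voltages, and each column current is a dot product by Ohm's and Kirchhoff's laws. The code below is an illustrative NumPy model of that mapping, not the proposed circuit:

```python
import numpy as np

# Crossbar multiply-accumulate: with weights stored as conductances G
# (siemens) and inputs applied as voltages v (volts), each column
# current is i_j = sum_k v_k * G[k, j] -- a dot product done by physics.
def crossbar_mvm(G, v):
    return v @ G                     # Kirchhoff current summation per column

# A 3x3 kernel applied via the crossbar: unroll each image patch into a
# 9-vector; one crossbar column holds the 9 kernel weights.
def conv2d_via_crossbar(img, kernel):
    kh, kw = kernel.shape
    G = kernel.reshape(-1, 1)        # one column of conductances
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = img[y:y+kh, x:x+kw].reshape(-1)
            out[y, x] = crossbar_mvm(G, patch)[0]
    return out

sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)  # edge extraction
img = np.random.rand(8, 8)
edges = conv2d_via_crossbar(img, sobel_x)
# sanity check against direct convolution
ref = np.array([[np.sum(img[y:y+3, x:x+3] * sobel_x) for x in range(6)]
                for y in range(6)])
assert np.allclose(edges, ref)
```

In the actual hardware all patches sharing a kernel reuse the same programmed column, so the weights never move: only input voltages and output currents cross the array boundary.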


Author(s): Chen Yang, Jingyu Zhang, Qi Chen, Yi Xu, Cimang Lu

Pedestrian recognition has achieved state-of-the-art performance thanks to the progress of recent convolutional neural networks (CNNs). However, mainstream CNN models are too complicated for emerging Computing-In-Memory (CIM) architectures to implement in hardware, because their enormous parameter counts and massive intermediate processing results can incur a severe memory bottleneck. This paper proposes a design methodology called Parameter Substitution with Nodes Compensation (PSNC) to significantly reduce the parameters of a CNN model without degrading inference accuracy. Based on the PSNC methodology, an ultra-lightweight convolutional neural network (UL-CNN) is designed. UL-CNN is a convolutional neural network specially optimized for a flash-based CIM architecture (Conv-Flash) and applied to pedestrian recognition. Running UL-CNN on Conv-Flash achieves inference accuracy of up to 94.7%. Compared to LeNet-5, with similar operation counts and accuracy, UL-CNN requires less than 37% of LeNet-5's parameters on the same benchmark dataset. Such parameter reduction can dramatically speed up training and economize on-chip storage, as well as reduce the power consumed by memory accesses. With the aid of UL-CNN, the Conv-Flash architecture provides the best energy efficiency among the compared platforms (CPU, GPU, FPGA, etc.), consuming only 2.2 × 10⁻⁵ J to complete pedestrian recognition for one frame.
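The headline comparison is a parameter count against LeNet-5. As a point of reference, the classic LeNet-5 layer shapes give the count below; UL-CNN's own layer shapes are not given in the abstract, so only the 37% budget is derived:

```python
# How such a parameter comparison is computed: conv layers contribute
# (k*k*C_in + 1)*C_out parameters, fully connected layers (N_in + 1)*N_out.
# These are the classic LeNet-5 shapes; UL-CNN's are not public here.
def conv_params(k, c_in, c_out):
    return (k * k * c_in + 1) * c_out

def fc_params(n_in, n_out):
    return (n_in + 1) * n_out

lenet5 = (conv_params(5, 1, 6)      # C1
          + conv_params(5, 6, 16)   # C3
          + fc_params(400, 120)     # C5 (as FC over flattened 5x5x16)
          + fc_params(120, 84)      # F6
          + fc_params(84, 10))      # output
print(lenet5)                       # 61706
budget = int(0.37 * lenet5)         # ~22831: the ballpark UL-CNN stays under
```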


Micromachines · 2020 · Vol. 11 (6) · pp. 622
Author(s): Hasan Erdem Yantır, Ahmed M. Eltawil, Khaled N. Salama

Traditional computer architectures suffer severely from the bottleneck between processing elements and memory, which is the biggest barrier to their scalability. Meanwhile, the amount of data that applications must process is growing rapidly, especially in the era of big data and artificial intelligence. This pushes computer architecture design toward more data-centric principles. Therefore, new paradigms such as in-memory and near-memory processing have begun to emerge to counteract the memory bottleneck by bringing memory closer to computation or integrating the two. Associative processors are a promising candidate for in-memory computation, combining processor and memory in the same location to alleviate the memory bottleneck. One class of applications that requires iterative processing of huge amounts of data is stencil codes, for which associative processors can therefore provide a substantial advantage. As a demonstration, two in-memory associative processor architectures for 2D stencil codes are proposed, implemented in both emerging memristor and traditional SRAM technologies. The proposed architecture achieves promising efficiency on a variety of stencil applications and thus proves its applicability to scientific stencil computing.
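A 5-point Jacobi relaxation is a representative example of the stencil codes targeted here: every interior cell is rewritten from a fixed neighborhood, sweep after sweep, which is exactly the data-parallel, iteration-heavy pattern that suits word-parallel in-memory execution. A plain NumPy version for reference:

```python
import numpy as np

# 5-point Jacobi stencil: each sweep replaces every interior cell with
# the average of its four neighbors. The same few arithmetic operations
# are applied to every word of a large array, over many iterations.
def jacobi_step(u):
    v = u.copy()
    v[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                            u[1:-1, :-2] + u[1:-1, 2:])
    return v

u = np.zeros((64, 64))
u[0, :] = 1.0                     # fixed boundary condition
for _ in range(500):
    u = jacobi_step(u)
```

On an associative processor, each such sweep maps to a short sequence of array-wide compare/write passes applied to all rows at once, so runtime scales with the stencil's operation count rather than the grid size.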


2019 · Vol. 2019 · pp. 1-12
Author(s): Shijie Li, Xiaolong Shen, Yong Dou, Shice Ni, Jinwei Xu, ...

Recently, machine learning, and especially deep learning, has become a core algorithmic tool widely used in fields such as natural language processing, speech recognition, and object recognition. At the same time, more and more applications are moving to wearable and mobile devices. However, traditional deep learning methods such as the convolutional neural network (CNN) and its variants consume large amounts of memory, which makes these powerful methods difficult to apply on memory-limited mobile platforms. To solve this problem, we present a novel memory-management strategy called mmCNN. With this method, a trained large-size CNN can be deployed on a platform of any memory size, such as a GPU, an FPGA, or a memory-limited mobile device. In our experiments, we run feed-forward CNN inference within extremely small memory budgets (as low as 5 MB) on a GPU platform. The results show that our method saves more than 98% of memory compared to a traditional CNN implementation, and more than 90% compared to the state-of-the-art related work vDNN (virtualized deep neural networks). Our work improves the computing scalability of lightweight applications and breaks the memory bottleneck of using deep learning on memory-limited devices.
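The general idea behind memory-capped execution can be illustrated with a simple peak-memory model: keep only the current layer's weights and input/output activations resident and stage everything else off-device. The numbers and the staging policy below are illustrative, not mmCNN's actual algorithm:

```python
# Peak-memory model for layer-by-layer CNN inference. layer_bytes holds
# (weights, activations) per layer in MB; values are made up.
def peak_memory(layer_bytes, keep_all=True):
    if keep_all:                       # naive: everything resident at once
        return sum(w + a for w, a in layer_bytes)
    peak = 0.0
    prev_act = 0.0
    for w, a in layer_bytes:           # staged: this layer's weights plus
        peak = max(peak, w + prev_act + a)  # its input and output buffers
        prev_act = a
    return peak

layers = [(0.5, 12.0), (2.0, 6.0), (8.0, 3.0), (16.0, 1.0)]
print(peak_memory(layers, keep_all=True))    # 48.5 MB resident
print(peak_memory(layers, keep_all=False))   # 20.0 MB peak
```

Under such a policy the device-side footprint is bounded by the largest single layer rather than by the whole network, which is what makes budgets as small as a few megabytes feasible.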


2019 · Vol. 94 · pp. 03013
Author(s): Jong-Il Park, Kwi Woo Park, Chansik Park

In this paper, we design and implement a GPU-based non-coherent GPS signal tracking module for a real-time SDR. When the CPU and GPU are used simultaneously, the signal tracking module must be designed with the memory bottleneck between the two processors in mind. The basic non-coherent module, which accumulates the 1 ms correlation value 20 times, is changed to accumulate M = 20/N values computed over N ms units. In experiments with real GPS signals, the computational performance at N = 20 improves by 80% compared to N = 1. As a result, the SDR implemented on a notebook computer can track more channels simultaneously in real time.
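The accumulation scheme reads as follows: instead of summing the power of twenty 1 ms correlations, the module coherently sums N consecutive 1 ms complex correlator outputs and then accumulates the power M = 20/N times, which reduces the number of values crossing the CPU/GPU boundary. A small simulation with synthetic correlator outputs (illustrative signal and noise levels, not the paper's data):

```python
import numpy as np

# Non-coherent accumulation over N ms coherent blocks: coherently sum N
# consecutive 1 ms complex correlator outputs, then accumulate the
# power of the M = 20/N resulting blocks.
rng = np.random.default_rng(0)
corr_1ms = 1.0 + (rng.normal(0, 1.0, 20) +
                  1j * rng.normal(0, 1.0, 20))   # 20 noisy 1 ms outputs

def noncoherent_power(c, N):
    M = len(c) // N                        # M = 20 / N accumulations
    blocks = c.reshape(M, N).sum(axis=1)   # coherent sum within each block
    return np.sum(np.abs(blocks) ** 2)     # non-coherent power accumulation

for N in (1, 4, 20):
    print(N, noncoherent_power(corr_1ms, N))
```

Larger N trades more coherent integration (fewer, larger transfers and squarings) against sensitivity to data-bit transitions within the 20 ms bit period, which is why N must divide 20.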

