SIMPLER MAGIC: Synthesis and Mapping of In-Memory Logic Executed in a Single Row to Improve Throughput

2020 ◽  
Author(s):  
Rotem Ben-Hur ◽  
Ronny Ronen ◽  
Ameer Haj-Ali ◽  
Debjyoti Bhattacharjee ◽  
Adi Eliahu ◽  
...  

In-memory processing can dramatically improve the latency and energy consumption of computing systems by minimizing the data transfer between the memory and the processor. Efficient execution of processing operations within the memory is therefore an important objective in modern computer architecture. This paper presents a novel automatic framework for efficiently implementing arbitrary combinational logic functions within a memristive memory. Using tools from logic design, graph theory, and compiler register allocation, we developed SIMPLER (Synthesis and In-memory MaPping of Logic Execution in a single Row), a tool that optimizes the execution of in-memory logic operations for throughput and area. Given a logic function, SIMPLER automatically generates a sequence of atomic Memristor-Aided loGIC (MAGIC) NOR operations and maps them efficiently onto a single size-limited memory row, reusing cells to save area when needed. This approach fully exploits the parallelism offered by MAGIC NOR gates: multiple instances of the logic function can be executed concurrently, each compressed into a single row of the memory. This property makes SIMPLER an attractive candidate for designing in-memory Single Instruction, Multiple Data (SIMD) operations. Compared to previous work, which optimizes latency rather than throughput for a single function, SIMPLER achieves an average throughput improvement of 435×. When previous tools are parallelized similarly to SIMPLER, SIMPLER achieves at least 5× higher throughput, with a 23× improvement in area and a 20× improvement in area efficiency. These improvements more than compensate for the increase in latency (up to 17% on average).
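The single-row mapping idea in this abstract can be illustrated with a small Python sketch. It is not the SIMPLER tool or its algorithm: the XOR netlist, the function names `map_to_row` and `run_simd`, and the first-free-cell allocation policy are all assumptions made for illustration. The sketch models a MAGIC NOR as "initialize the output cell to 1, then clear it if any input is 1", assigns each gate output to a cell in a size-limited row, and releases a cell once its value is never read again.

```python
from typing import Dict, List, Sequence, Tuple

Gate = Tuple[str, Tuple[str, ...]]  # (output name, input names)

# Hypothetical netlist for XOR(a, b) using only NOR gates.
# A one-input NOR acts as NOT.
XOR_NETLIST: List[Gate] = [
    ("n1", ("a", "b")),     # NOR(a, b)
    ("n2", ("a",)),         # NOT a
    ("n3", ("b",)),         # NOT b
    ("n4", ("n2", "n3")),   # a AND b
    ("out", ("n1", "n4")),  # a XOR b
]


def map_to_row(netlist: Sequence[Gate], inputs: Sequence[str], row_size: int):
    """Assign every signal to a cell of one row, reusing dead cells.

    Greedy first-free allocation, used here only as a stand-in for a
    real mapping heuristic. Input cells are never released.
    """
    last_use: Dict[str, int] = {}
    for step, (_, ins) in enumerate(netlist):
        for name in ins:
            last_use[name] = step

    cell_of = {name: i for i, name in enumerate(inputs)}
    free = list(range(len(inputs), row_size))
    schedule = []
    for step, (out, ins) in enumerate(netlist):
        if not free:
            raise ValueError("row too small for this netlist")
        cell_of[out] = free.pop(0)  # output cell differs from all live cells
        schedule.append((cell_of[out], tuple(cell_of[n] for n in ins)))
        # Release intermediates that are never read again.
        for name in set(ins):
            if name not in inputs and last_use[name] == step:
                free.append(cell_of[name])
    return schedule, cell_of[netlist[-1][0]]


def run_simd(schedule, rows: List[List[int]]) -> None:
    """Apply the same gate sequence to every row (one gate per parallel step)."""
    for out_cell, in_cells in schedule:
        for row in rows:
            row[out_cell] = 1  # modeled initialization of the output cell
            if any(row[c] for c in in_cells):
                row[out_cell] = 0  # NOR evaluates to 0 if any input is 1


# Four rows compute XOR on four different input pairs with one schedule.
schedule, out_cell = map_to_row(XOR_NETLIST, ["a", "b"], row_size=6)
rows = [[a, b, 0, 0, 0, 0] for a in (0, 1) for b in (0, 1)]
run_simd(schedule, rows)
results = [row[out_cell] for row in rows]
print(results)
```

In this toy case the five gate outputs plus two inputs would need seven cells without reuse, while the reuse step lets the netlist fit in a six-cell row. Every row executes the same schedule, which is the SIMD-style behavior the abstract describes.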


2012 ◽  
pp. 502-516
Author(s):  
Muzhou Xiong ◽  
Hai Jin

In this chapter, two algorithms are presented for supporting efficient data transfer in the Grid environment. From a node’s perspective, multiple data transfer channels can be formed by selecting other nodes as relays. One algorithm requires the sender to know the global connection information, while the other does not. Experimental results indicate that both algorithms transfer data efficiently under various circumstances.


Author(s):  
S.N. John ◽  
A.A. Anoprienko ◽  
C.U. Ndujiuba

This chapter provides solutions for increasing the efficiency of data transfer in modern computer network applications and computing network environments based on the TCP/IP protocol suite. Simulation modeling was the basic research method. A simulation model was developed for designing and analyzing computer networks based on the TCP/IP protocol suite; it takes into account the specific features of the protocol implementations and their impact on the efficiency of data transfer in local and corporate networks. A method is proposed for increasing the performance of TCP/IP-based computer networks by refining their data transfer modes. This allows more efficient use of computer networks and network applications without additional expenditure on network infrastructure. In practice, the results of this research enable a significant increase in the efficiency of data transfer in computer network environments. An example is the network of Donetsk National Technical University.


2018 ◽  
Vol 232 ◽  
pp. 01046
Author(s):  
Wan Qiao ◽  
Dake Liu

In this paper, we propose a flexible, scalable BP Polar decoding application-specific instruction set processor (PASIP) that supports multiple code lengths (64 to 4096) and any code rate. High throughput and sufficient programmability are achieved by a single-instruction-multiple-data (SIMD) based architecture and specially designed Polar decoding acceleration instructions. Synthesis results using 65 nm CMOS technology show that the total area of PASIP is 2.71 mm². PASIP provides a maximum throughput of 1563 Mbps (for N = 1024) at an operating frequency of 400 MHz. Comparison with state-of-the-art Polar decoders reveals PASIP’s high area efficiency.


2013 ◽  
Vol 791-793 ◽  
pp. 1845-1849
Author(s):  
Xu Dong Fang ◽  
Yu Hua Tang ◽  
Jun Jie Wu

With the realization of physical memristors, using memristors to perform stateful logic operations has been demonstrated to be feasible. In such operations, memristors simultaneously serve as latches and logic gates, enabling in-situ computing that may open a new computing paradigm for computer architecture. In this paper, we first analyze two typical memristive stateful logic gates to reveal the working mechanism of stateful logic, then review recent research on memristive stateful logic, and finally discuss its pros and cons. We conclude that stateful logic promises a novel computing paradigm that may revolutionize conventional computer architecture, while its development is currently subject to the state drift problem and is constrained by the lack of a general design methodology and of physical verification.


Author(s):  
Yu Shi ◽  
Jian Li ◽  
Zhize Li

Gradient Boosted Decision Trees (GBDT) is a very successful ensemble learning algorithm widely used across a variety of applications. Recently, several variants of GBDT training algorithms and implementations have been designed and heavily optimized in popular open-source toolkits including XGBoost, LightGBM and CatBoost. In this paper, we show that both the accuracy and efficiency of GBDT can be further enhanced by using more complex base learners. Specifically, we extend gradient boosting to use piecewise linear regression trees (PL Trees), instead of piecewise constant regression trees, as base learners. We show that PL Trees can accelerate the convergence of GBDT and improve its accuracy. We also propose optimization tricks that substantially reduce the training time of PL Trees with little sacrifice of accuracy. Moreover, we propose several implementation techniques to speed up our algorithm on modern computer architectures with powerful Single Instruction Multiple Data (SIMD) parallelism. The experimental results show that GBDT with PL Trees can provide very competitive testing accuracy with comparable or less training time.
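The difference between piecewise constant and piecewise linear leaves can be seen in a toy Python sketch. This is not the paper's training algorithm and it performs no boosting: it fits a single one-split tree on one feature, with the split threshold fixed by hand. The helper names (`fit_constant`, `fit_linear`, `fit_stump`, `sse`) and the synthetic data are assumptions for illustration only.

```python
from statistics import mean


def fit_constant(xs, ys):
    """Piecewise constant leaf: predict the mean target."""
    c = mean(ys)
    return lambda x: c


def fit_linear(xs, ys):
    """Piecewise linear leaf: ordinary least-squares line on one feature."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = cov / var if var else 0.0
    return lambda x: my + slope * (x - mx)


def fit_stump(xs, ys, threshold, fit_leaf):
    """One-split tree with a fixed threshold and a pluggable leaf model."""
    left = [(x, y) for x, y in zip(xs, ys) if x < threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x >= threshold]
    f_left = fit_leaf(*zip(*left))
    f_right = fit_leaf(*zip(*right))
    return lambda x: f_left(x) if x < threshold else f_right(x)


def sse(model, xs, ys):
    """Sum of squared errors of a model on the data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys))


# Synthetic target: a line with a kink at x = 1.
xs = [i / 10 for i in range(20)]
ys = [2 * x if x < 1 else 3 - x for x in xs]

const_err = sse(fit_stump(xs, ys, 1.0, fit_constant), xs, ys)
lin_err = sse(fit_stump(xs, ys, 1.0, fit_linear), xs, ys)
print(const_err, lin_err)
```

On this data the linear leaves reproduce the target almost exactly with one split, whereas constant leaves leave a large residual that further trees would have to absorb. That is the intuition behind the faster convergence the abstract claims, shown here only on a hand-picked example.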

