SIMPLER MAGIC: Synthesis and Mapping of In-Memory Logic Executed in a Single Row to Improve Throughput

In-memory processing can dramatically improve the latency and energy consumption of computing systems by minimizing the data transfer between the memory and the processor. Efficient execution of processing operations within the memory is therefore a well-motivated objective in modern computer architecture. This paper presents a novel automatic framework for efficient implementation of arbitrary combinational logic functions within a memristive memory. Using tools from logic design, graph theory and compiler register allocation technology, we developed SIMPLER (Synthesis and In-memory MaPping of Logic Execution in a single Row), a tool that optimizes the execution of in-memory logic operations in terms of throughput and area. Given a logical function, SIMPLER automatically generates a sequence of atomic Memristor-Aided loGIC (MAGIC) NOR operations and efficiently locates them within a single size-limited memory row, reusing cells to save area when needed. This approach fully exploits the parallelism offered by the MAGIC NOR gates: multiple instances of the logic function can be performed concurrently, each compressed into a single row of the memory. This makes SIMPLER an attractive candidate for designing in-memory Single Instruction, Multiple Data (SIMD) operations. Compared to previous work (which optimizes latency rather than throughput for a single function), SIMPLER achieves an average throughput improvement of 435×. When previous tools are parallelized similarly to SIMPLER, SIMPLER achieves at least 5× higher throughput, with a 23× improvement in area and a 20× improvement in area efficiency. These improvements more than compensate for the increase in latency (up to 17% on average).
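The idea of placing a NOR-only netlist into one size-limited row while recycling cells can be illustrated with a small sketch. This is not the SIMPLER algorithm itself: the netlist format, the function names `map_to_row` and `run`, and the greedy first-free-cell policy are all assumptions made for illustration. The sketch also ignores the initialization step that a reused MAGIC output cell would need, and it lets input cells be overwritten once nothing reads them any more.

```python
from typing import Dict, List, Tuple

# Hypothetical netlist format: gate name -> names of its NOR inputs.
# Gates must be listed in topological order (dicts keep insertion order).
Netlist = Dict[str, Tuple[str, ...]]


def map_to_row(netlist: Netlist, inputs: List[str], outputs: List[str],
               row_size: int):
    """Greedily assign every gate to a cell of one row, reusing dead cells."""
    # Count the remaining readers of each signal; a cell is recycled at zero.
    readers = {name: 0 for name in list(netlist) + inputs}
    for ins in netlist.values():
        for s in ins:
            readers[s] += 1
    for o in outputs:
        readers[o] += 1  # keeps output cells from ever being freed

    cell_of = {name: i for i, name in enumerate(inputs)}
    free = list(range(len(inputs), row_size))
    schedule = []  # list of (output cell, input cells), one NOR per step
    for gate, ins in netlist.items():
        if not free:
            raise ValueError("row too small for this greedy mapping")
        cell = free.pop(0)
        cell_of[gate] = cell
        schedule.append((cell, tuple(cell_of[s] for s in ins)))
        for s in ins:
            readers[s] -= 1
            if readers[s] == 0:
                free.append(cell_of[s])  # value is dead, cell is reusable
    return schedule, cell_of


def run(schedule, row: List[int]) -> List[int]:
    """Execute the NOR schedule on a row of bits, one gate per step."""
    for out, ins in schedule:
        row[out] = int(not any(row[i] for i in ins))
    return row


# XOR built from five NOR gates (n5 is a one-input NOR, i.e. NOT).
XOR = {
    "n1": ("a", "b"),
    "n2": ("a", "n1"),
    "n3": ("b", "n1"),
    "n4": ("n2", "n3"),
    "n5": ("n4",),
}
schedule, cell_of = map_to_row(XOR, ["a", "b"], ["n5"], row_size=4)
```

Without reuse this netlist would need seven cells (two inputs plus five gates); with reuse the greedy mapping fits in four. Because every row executes the same schedule, many rows could hold independent operand pairs and be processed in lockstep, which is the SIMD-style use the abstract describes.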


Molecular Dynamics Performance Evaluation with Modern Computer Architecture

Lecture Notes in Computer Science - Numerical Computations: Theory and Algorithms ◽ 10.1007/978-3-030-40616-5_26 ◽ 2020 ◽ pp. 322-329

Author(s): Emanuele Breuza ◽ Giorgio Colombo ◽ Daniele Gregori ◽ Filippo Marchetti

Keyword(s): Molecular Dynamics ◽ Performance Evaluation ◽ Computer Architecture ◽ Modern Computer


Optimization Algorithms for Data Transfer in the Grid Environment

Grid and Cloud Computing ◽ 10.4018/978-1-4666-0879-5.ch210 ◽ 2012 ◽ pp. 502-516

Author(s): Muzhou Xiong ◽ Hai Jin

Keyword(s): Data Transfer ◽ Optimization Algorithms ◽ Experimental Results ◽ Grid Environment ◽ Transfer Data ◽ Multiple Data ◽ Transfer Channel ◽ Efficient Data ◽ Global Connection

In this chapter, two algorithms are presented for supporting efficient data transfer in the Grid environment. From a node’s perspective, a multiple data transfer channel can be formed by selecting some other nodes as relays in data transfer. One algorithm requires the sender to be aware of the global connection information, while the other does not. Experimental results indicate that both algorithms can transfer data efficiently under various circumstances.


TCP/IP Protocol-Based Model for Increasing the Efficiency of Data Transfer in Computer Networks

Integrated Models for Information Communication Systems and Networks ◽ 10.4018/978-1-4666-2208-1.ch006 ◽ 2013 ◽ pp. 116-134

Author(s): S.N. John ◽ A.A. Anoprienko ◽ C.U. Ndujiuba

Keyword(s): Simulation Model ◽ Computer Networks ◽ Computer Network ◽ Data Transfer ◽ Basic Method ◽ Imitation Model ◽ Corporate Networks ◽ Network Applications ◽ Modern Computer ◽ Model And Simulation

This chapter provides solutions for increasing the efficiency of data transfer in modern computer network applications and computing network environments based on the TCP/IP protocol suite. Imitation modeling and simulation served as the basic research method. A simulation model was developed for designing and analyzing computer networks based on the TCP/IP protocol suite; it accounts for the exact features of the protocol implementations and their impact on the efficiency of data transfer in local and corporate networks. A method for increasing the performance of TCP/IP-based computer networks is proposed, based on improving their data transfer modes. This allows more efficient use of computer networks and network applications without additional expenditure on network infrastructure. In practice, the results of this research enable a significant increase in the efficiency of data transfer in computer network environments. An example is the network of Donetsk National Technical University.


Modern computer architecture teaching and learning support: An experience in evaluation

International Conference on Information Society (i-Society 2011) ◽ 10.1109/i-society18435.2011.5978481 ◽ 2011 ◽ Cited By ~ 3

Author(s): Besim Mustafa

Keyword(s): Computer Architecture ◽ Teaching And Learning ◽ Learning Support ◽ Modern Computer


A scalable ASIP for BP Polar decoding with multiple code lengths

MATEC Web of Conferences ◽ 10.1051/matecconf/201823201046 ◽ 2018 ◽ Vol 232 ◽ pp. 01046

Author(s): Wan Qiao ◽ Dake Liu

Keyword(s): CMOS Technology ◽ Single Instruction Multiple Data ◽ Instruction Set ◽ Maximum Throughput ◽ Specific Instruction ◽ Area Efficiency ◽ Multiple Data ◽ High Area ◽ Multiple Code ◽ Application Specific

In this paper, we propose a flexible, scalable BP Polar decoding application-specific instruction set processor (PASIP) that supports multiple code lengths (64 to 4096) and any code rate. High throughput and sufficient programmability are achieved by the single-instruction-multiple-data (SIMD) based architecture and specially designed Polar decoding acceleration instructions. The synthesis result using 65 nm CMOS technology shows that the total area of PASIP is 2.71 mm². PASIP provides a maximum throughput of 1563 Mbps (for N = 1024) at an operating frequency of 400 MHz. Comparison with state-of-the-art Polar decoders reveals PASIP’s high area efficiency.
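As a rough worked example, the figures quoted in this abstract can be combined into two derived quantities. The abstract does not define area efficiency; the calculation below assumes the common definition of throughput divided by area, and the per-cycle figure simply divides the peak throughput by the clock frequency.

```latex
\[
\text{area efficiency} \approx \frac{1563\ \text{Mbps}}{2.71\ \text{mm}^2}
  \approx 577\ \text{Mbps/mm}^2,
\qquad
\frac{1563\ \text{Mbit/s}}{400\ \text{MHz}} \approx 3.9\ \text{decoded bits per cycle}
\]
```

Both values apply only to the N = 1024 operating point reported in the abstract.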


Scheduling Divisible Loads on Bus Networks with Start-Up Costs by Utilizing Multiple Data Transfer Streams: PORI

2007 International Conference on Parallel Processing (ICPP 2007) ◽ 10.1109/icpp.2007.75 ◽ 2007 ◽ Cited By ~ 1

Author(s): Jie Hu ◽ Raymond Klefstad

Keyword(s): Data Transfer ◽ Divisible Loads ◽ Multiple Data ◽ Start Up ◽ Bus Networks


A Review on Memristive Stateful Logic

Advanced Materials Research ◽ 10.4028/www.scientific.net/amr.791-793.1845 ◽ 2013 ◽ Vol 791-793 ◽ pp. 1845-1849

Author(s): Xu Dong Fang ◽ Yu Hua Tang ◽ Jun Jie Wu

Keyword(s): Computer Architecture ◽ Design Methodology ◽ Logic Gates ◽ Working Mechanism ◽ General Design ◽ Computing Paradigm ◽ Pros And Cons ◽ Logic Operations ◽ Conventional Computer

With the realization of physical memristors, using memristors to perform stateful logic operations has been demonstrated to be feasible. In such operations, memristors simultaneously serve as latches and logic gates, thus enabling in-situ computing, which may open a new computing paradigm for computer architecture. In this paper, we first analyze two typical memristive stateful logic gates to reveal the working mechanism of stateful logic, then review recent research on memristive stateful logic, and finally discuss the pros and cons of stateful logic. We conclude that stateful logic promises a novel computing paradigm that may revolutionize conventional computer architecture, while its development is currently subject to the state drift problem and is constrained by the lack of a general design methodology and of physical verification.


Gradient Boosting with Piece-Wise Linear Regression Trees

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽ 10.24963/ijcai.2019/476 ◽ 2019 ◽ Cited By ~ 3

Author(s): Yu Shi ◽ Jian Li ◽ Zhize Li

Keyword(s): Linear Regression ◽ Learning Algorithm ◽ Piecewise Linear ◽ Regression Trees ◽ Gradient Boosting ◽ Training Algorithms ◽ Training Time ◽ Modern Computer ◽ Multiple Data ◽ Boosted Decision Trees

Gradient Boosted Decision Trees (GBDT) is a very successful ensemble learning algorithm widely used across a variety of applications. Recently, several variants of GBDT training algorithms and implementations have been designed and heavily optimized in some very popular open-source toolkits, including XGBoost, LightGBM and CatBoost. In this paper, we show that both the accuracy and efficiency of GBDT can be further enhanced by using more complex base learners. Specifically, we extend gradient boosting to use piecewise linear regression trees (PL Trees), instead of piecewise constant regression trees, as base learners. We show that PL Trees can accelerate the convergence of GBDT and improve its accuracy. We also propose some optimization tricks to substantially reduce the training time of PL Trees, with little sacrifice of accuracy. Moreover, we propose several implementation techniques to speed up our algorithm on modern computer architectures with powerful Single Instruction Multiple Data (SIMD) parallelism. The experimental results show that GBDT with PL Trees can provide very competitive testing accuracy with comparable or less training time.
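The difference between piecewise constant and piecewise linear base learners can be seen on a single one-split tree. The sketch below is not the paper's algorithm and involves no boosting: the helper names (`sse_constant`, `sse_linear`, `best_stump`) and the toy data are assumptions chosen for illustration. It fits a depth-one tree on one feature, once with a constant per leaf and once with a least-squares line per leaf, and compares the resulting squared error.

```python
def sse_constant(xs, ys):
    """Squared error of the best constant prediction (the mean) in a leaf."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def sse_linear(xs, ys):
    """Squared error of the least-squares line fitted inside a leaf."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # all x identical: a line degenerates to a constant
        return sse_constant(xs, ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def best_stump(xs, ys, leaf_sse):
    """Try every split position; return (total error, split threshold)."""
    pairs = sorted(zip(xs, ys))
    best = (float("inf"), None)
    for i in range(2, len(pairs) - 1):  # keep at least two points per leaf
        left, right = pairs[:i], pairs[i:]
        err = (leaf_sse(*zip(*left)) + leaf_sse(*zip(*right)))
        threshold = (left[-1][0] + right[0][0]) / 2
        if err < best[0]:
            best = (err, threshold)
    return best


# Toy target y = |x|: two linear pieces, so one split with linear leaves
# can fit it exactly, while constant leaves cannot.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [abs(x) for x in xs]

const_err, const_split = best_stump(xs, ys, sse_constant)
linear_err, linear_split = best_stump(xs, ys, sse_linear)
```

On this toy target the linear-leaf stump reaches essentially zero error with one split, whereas the constant-leaf stump leaves a sizeable residual that further trees would have to absorb, which is the intuition behind the faster convergence claimed in the abstract.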


Towards a modern computer architecture curriculum

FIE'99 Frontiers in Education. 29th Annual Frontiers in Education Conference. Designing the Future of Science and Engineering Education. Conference Proceedings (IEEE Cat. No.99CH37011) ◽ 10.1109/fie.1999.839250 ◽ 2003 ◽ Cited By ~ 3

Author(s): A. Clements ◽ A. Shvartsman ◽ W.K. King ◽ Chao Lu ◽ D. Dupont

Keyword(s): Computer Architecture ◽ Modern Computer
