scholarly journals Design and Implementation of 6-Stage 64-bit MIPS Pipelined Architecture

Pipelining is the concept of overlapping of multiple instructions to perform their operations to optimize the time and ability of hardware units. This paper presents the design and implementation of 6 stage pipelined architecture for High performance 64-bit Microprocessor without Interlocked Pipeline Stages (MIPS) based Reduced Instruction set computing (RISC) processor. In this work, combining efforts of pre-fetching unit, forwarding unit, Branch and Jump predicting unit, Hazard unit are used to reduce the hazards. Low power unit is used to minimize the power. Cache Memories, other devices and especially balancing pipeline stages optimize the Speed in this work. DDR4 SDRAM (Double Data Rate type4 Synchronous Dynamic Random Access Memory) controller is employed in this pipeline to achieve high-speed data transfers and to manage the entire system efficiently. Low power, Low delay Flip flops are used in pipeline registers that implicitly enhance the performance of the system. The proposed method provides better results compared to the existing models. The simulation and synthesis results of the proposed Architecture are evaluated by Xilinx 14.7 software and supporting graphs are plotted through MATLAB tool

Nanophotonics ◽  
2020 ◽  
Vol 10 (2) ◽  
pp. 937-945
Author(s):  
Ruihuan Zhang ◽  
Yu He ◽  
Yong Zhang ◽  
Shaohua An ◽  
Qingming Zhu ◽  
...  

AbstractUltracompact and low-power-consumption optical switches are desired for high-performance telecommunication networks and data centers. Here, we demonstrate an on-chip power-efficient 2 × 2 thermo-optic switch unit by using a suspended photonic crystal nanobeam structure. A submilliwatt switching power of 0.15 mW is obtained with a tuning efficiency of 7.71 nm/mW in a compact footprint of 60 μm × 16 μm. The bandwidth of the switch is properly designed for a four-level pulse amplitude modulation signal with a 124 Gb/s raw data rate. To the best of our knowledge, the proposed switch is the most power-efficient resonator-based thermo-optic switch unit with the highest tuning efficiency and data ever reported.


2014 ◽  
Vol 971-973 ◽  
pp. 1581-1585 ◽  
Author(s):  
Jun Liu ◽  
Yan Tian ◽  
Wei Hao ◽  
Lei Qu

In order to meet the request of high-speed data exchange in embedded systems, this paper details the high-speed SRIO (Serial RapidIO) interface protocol and the process of SRIO access timing between the local endpoint devices and the remote endpoint devices. And also we implement the design of the new high-performance RapidIO interconnection between DSP and FPGA. Through the performance testing of SRIO data transmission system, experimental results show that the design can stably transfer data at high speed between processors.


Author(s):  
Sai Venkatramana Prasada G.S ◽  
G. Seshikala ◽  
S. Niranjana

Background: This paper presents the comparative study of power dissipation, delay and power delay product (PDP) of different full adders and multiplier designs. Methods: Full adder is the fundamental operation for any processors, DSP architectures and VLSI systems. Here ten different full adder structures were analyzed for their best performance using a Mentor Graphics tool with 180nm technology. Results: From the analysis result high performance full adder is extracted for further higher level designs. 8T full adder exhibits high speed, low power delay and low power delay product and hence it is considered to construct four different multiplier designs, such as Array multiplier, Baugh Wooley multiplier, Braun multiplier and Wallace Tree multiplier. These different structures of multipliers were designed using 8T full adder and simulated using Mentor Graphics tool in a constant W/L aspect ratio. Conclusion: From the analysis, it is concluded that Wallace Tree multiplier is the high speed multiplier but dissipates comparatively high power. Baugh Wooley multiplier dissipates less power but exhibits more time delay and low PDP.


2005 ◽  
Vol 15 (02) ◽  
pp. 459-476
Author(s):  
C. PATRICK YUE ◽  
JAEJIN PARK ◽  
RUIFENG SUN ◽  
L. RICK CARLEY ◽  
FRANK O'MAHONY

This paper presents the low-power circuit techniques suitable for high-speed digital parallel interfaces each operating at over 10 Gbps. One potential application for such high-performance I/Os is the interface between the channel IC and the magnetic read head in future compact hard disk systems. First, a crosstalk cancellation technique using a novel data encoding scheme is introduced to suppress electromagnetic interference (EMI) generated by the adjacent parallel I/Os . This technique is implemented utilizing a novel 8-4-PAM signaling with a data look-ahead algorithm. The key circuit components in the high-speed interface transceiver including the receive sampler, the phase interpolator, and the transmitter output driver are described in detail. Designed in a 0.13-μm digital CMOS process, the transceiver consumes 310 mW per 10-Gps channel from a I-V supply based on simulation results. Next, a 20-Gbps continuous-time adaptive passive equalizer utilizing on-chip lumped RLC components is described. Passive equalizers offer the advantages of higher bandwidth and lower power consumption compared with conventional designs using active filter. A low-power, continuous-time servo loop is designed to automatically adjust the equalizer frequency response for the optimal gain compensation. The equalizer not only adapts to different channel characteristics, but also accommodates temperature and process variations. Implemented in a 0.25-μm, 1P6M BiCMOS process, the equalizer can compensate up to 20 dB of loss at 10 GHz while only consumes 32 mW from a 2.5-V supply.


2013 ◽  
Vol 1538 ◽  
pp. 291-302
Author(s):  
Edward Yi Chang ◽  
Hai-Dang Trinh ◽  
Yueh-Chin Lin ◽  
Hiroshi Iwai ◽  
Yen-Ku Lin

ABSTRACTIII-V compounds such as InGaAs, InAs, InSb have great potential for future low power high speed devices (such as MOSFETs, QWFETs, TFETs and NWFETs) application due to their high carrier mobility and drift velocity. The development of good quality high k gate oxide as well as high k/III-V interfaces is prerequisite to realize high performance working devices. Besides, the downscaling of the gate oxide into sub-nanometer while maintaining appropriate low gate leakage current is also needed. The lack of high quality III-V native oxides has obstructed the development of implementing III-V based devices on Si template. In this presentation, we will discuss our efforts to improve high k/III-V interfaces as well as high k oxide quality by using chemical cleaning methods including chemical solutions, precursors and high temperature gas treatments. The electrical properties of high k/InSb, InGaAs, InSb structures and their dependence on the thermal processes are also discussed. Finally, we will present the downscaling of the gate oxide into sub-nanometer scale while maintaining low leakage current and a good high k/III-V interface quality.


2012 ◽  
Vol 479-481 ◽  
pp. 65-70
Author(s):  
Xiao Hui Zhang ◽  
Liu Qing ◽  
Mu Li

Based on the target detection of alignment template, the paper designs a lane alignment template by using correlation matching method, and combines with genetic algorithm for template stochastic matching and optimization to realize the lane detection. In order to solve the real-time problem of lane detection algorithm based on genetic algorithm, this paper uses the high performance multi-core DSP chip TMS320C6474 as the core, combines with high-speed data transmission technology of Rapid10, realizes the hardware parallel processing of the lane detection algorithm. By Rapid10 bus, the data transmission speed between the DSP and the DSP can reach 3.125Gbps, it basically realizes transmission without delay, and thereby solves the high speed transmission of the large data quantity between processor. The experimental results show that, no matter the calculated lane line, or the running time is better than the single DSP and PC at the parallel C6474 platform. In addition, the road detection is accurate and reliable, and it has good robustness.


2009 ◽  
Vol 17 (1-2) ◽  
pp. 43-57 ◽  
Author(s):  
Michael Kistler ◽  
John Gunnels ◽  
Daniel Brokenshire ◽  
Brad Benton

In this paper we present the design and implementation of the Linpack benchmark for the IBM BladeCenter QS22, which incorporates two IBM PowerXCell 8i1processors. The PowerXCell 8i is a new implementation of the Cell Broadband Engine™2 architecture and contains a set of special-purpose processing cores known as Synergistic Processing Elements (SPEs). The SPEs can be used as computational accelerators to augment the main PowerPC processor. The added computational capability of the SPEs results in a peak double precision floating point capability of 108.8 GFLOPS. We explain how we modified the standard open source implementation of Linpack to accelerate key computational kernels using the SPEs of the PowerXCell 8i processors. We describe in detail the implementation and performance of the computational kernels and also explain how we employed the SPEs for high-speed data movement and reformatting. The result of these modifications is a Linpack benchmark optimized for the IBM PowerXCell 8i processor that achieves 170.7 GFLOPS on a BladeCenter QS22 with 32 GB of DDR2 SDRAM memory. Our implementation of Linpack also supports clusters of QS22s, and was used to achieve a result of 11.1 TFLOPS on a cluster of 84 QS22 blades. We compare our results on a single BladeCenter QS22 with the base Linpack implementation without SPE acceleration to illustrate the benefits of our optimizations.


Sign in / Sign up

Export Citation Format

Share Document