A New VLSI Architecture of Parallel Multiplier–Accumulator Based on Radix-2 Modified Booth Algorithm

International Journal of Instrumentation Control and Automation ◽

10.47893/ijica.2011.1036 ◽

2011 ◽

pp. 196-202

Author(s):

P.Sasi Bala ◽

S. Raghavendra

Keyword(s):

High Speed ◽

High Performance ◽

Vlsi Architecture ◽

Alpha Power ◽

Clock Frequency ◽

Parallel Multiplier ◽

Standard Design ◽

Overall Performance ◽

And Performance ◽

Least Significant Bits

In this paper, we propose a new architecture of multiplier-and-accumulator (MAC) for high-speed arithmetic. Performance is improved by combining multiplication with accumulation and devising a hybrid type of carry save adder (CSA). Because the accumulator, which has the largest delay in a MAC, is merged into the CSA, the overall performance is elevated. The proposed CSA tree uses a 1’s-complement-based radix-2 modified Booth’s algorithm (MBA) and has a modified array for sign extension in order to increase the bit density of the operands. The CSA propagates the carries to the least significant bits of the partial products and generates the least significant bits in advance to decrease the number of input bits of the final adder. Also, the proposed MAC accumulates the intermediate results as sum and carry bits instead of the output of the final adder, which makes it possible to optimize the pipeline scheme to improve performance. The proposed architecture was synthesized with 250, 180, 130, and 90 nm standard CMOS libraries. Based on theoretical and experimental estimation, we analyzed results such as the amount of hardware resources, the delay, and the pipelining scheme. We used Sakurai’s alpha power law for the delay modeling. The proposed MAC showed properties superior to the standard design in many ways, and about twice the performance of previous research at a similar clock frequency. We expect that the proposed MAC can be adapted to various fields requiring high performance, such as signal processing.
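
The idea of accumulating in sum-and-carry form, with a single carry-propagating addition at the end, can be illustrated with a minimal Python sketch. It is not the paper's architecture: it uses plain unsigned shift-and-add partial products rather than Booth recoding, and the names `carry_save_add`, `mac_carry_save`, and the `width` parameter are hypothetical.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compressor: reduce three addends to a sum word and a carry word.

    For non-negative integers, a + b + c == sum_bits + carry_bits.
    """
    sum_bits = a ^ b ^ c                             # per-bit sum, no carry propagation
    carry_bits = ((a & b) | (b & c) | (a & c)) << 1  # per-bit majority, shifted one place
    return sum_bits, carry_bits


def mac_carry_save(pairs, width: int = 16) -> int:
    """Accumulate x * y over all pairs, keeping the accumulator as (sum, carry)."""
    acc_sum, acc_carry = 0, 0
    for x, y in pairs:
        # Assumption: unsigned shift-and-add partial products, no Booth recoding.
        for bit in range(width):
            if (y >> bit) & 1:
                acc_sum, acc_carry = carry_save_add(acc_sum, acc_carry, x << bit)
    # The only carry-propagating addition happens once, at the very end.
    return acc_sum + acc_carry
```

For example, `mac_carry_save([(3, 5), (7, 9)])` accumulates both products without ever resolving carries inside the loop; only the final `acc_sum + acc_carry` plays the role of the final adder.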

VLSI ARCHITECTURE OF PARALLEL MULTIPLIER– ACCUMULATOR BASED ON RADIX-2 MODIFIED BOOTH ALGORITHM

International Journal of Electronics and Electrical Engineering ◽

10.47893/ijeee.2012.1009 ◽

2012 ◽

pp. 40-46

Author(s):

Mr.M.V. Sathish ◽

Mrs. Sailaja

Keyword(s):

Signal Processing ◽

High Speed ◽

High Performance ◽

Vlsi Architecture ◽

Clock Frequency ◽

Parallel Multiplier ◽

Hybrid Type ◽

Standard Design ◽

Overall Performance ◽

And Performance

This paper presents a new architecture of multiplier-and-accumulator (MAC) for high-speed arithmetic. Performance is improved by combining multiplication with accumulation and devising a hybrid type of carry save adder (CSA). Because the accumulator, which has the largest delay in a MAC, is merged into the CSA, the overall performance is elevated. The proposed CSA tree uses a 1’s-complement-based radix-2 modified Booth’s algorithm (MBA) and has a modified array for sign extension in order to increase the bit density of the operands. The proposed MAC showed properties superior to the standard design in many ways, and about twice the performance of previous research at a similar clock frequency. We expect that the proposed MAC can be adapted to various fields requiring high performance, such as signal processing.

WATT MATTERS MOST? DESIGN SPACE EXPLORATION OF HIGH-PERFORMANCE MICROPROCESSORS FOR POWER-PERFORMANCE EFFICIENCY

Journal of Circuits System and Computers ◽

10.1142/s0218126607003721 ◽

2007 ◽

Vol 16 (03) ◽

pp. 357-378

Author(s):

PEDRO TRANCOSO

Keyword(s):

High Performance ◽

Design Space Exploration ◽

High Sensitivity ◽

Clock Frequency ◽

Power Performance ◽

Large Power ◽

Multiple Parameters ◽

And Performance ◽

The One ◽

Power Awareness

Computer systems have evolved significantly in recent years, leading to high-performance systems. This, however, has come at the cost of large power dissipation. As such, power-awareness has become a major factor in processor design. Therefore, it is important to have a complete understanding of the power and performance behavior of all processor components. To achieve this, the current work presents a comprehensive analysis of power-performance efficiency for different high-end microarchitecture configurations using three different workloads: multimedia, scientific, and database. The objectives of this work are: (1) to analyze and compare the power-performance efficiency for different workloads; (2) to present a sensitivity analysis for the microarchitecture parameters in order to identify which ones are more sensitive to changes in terms of power-performance efficiency; and (3) to propose power-performance efficient configurations for each workload. The simulation results show that the multimedia workload achieves the highest efficiency, but the database workload is the most sensitive to parameter changes. In addition, the results show that the parameter sensitivity depends significantly on the workload. While the issue width and clock frequency present very high sensitivity across all workloads (approximately 100%), for the database workload the first-level instruction cache size shows an even higher sensitivity (149%). The correct configuration of these microarchitecture parameters is essential. A careless configuration of a single parameter from a baseline setup may result in a loss of power-performance efficiency of up to 99%. Finally, carefully tuning multiple parameters simultaneously may result in gains of up to 154% over the power-performance efficiency of the baseline configuration.

Development of High Performance Moisture Separator Reheater

ASME 2009 Power Conference ◽

10.1115/power2009-81092 ◽

2009 ◽

Cited By ~ 2

Author(s):

Issaku Fujita ◽

Kotaro Machii ◽

Teruaki Sakata

Keyword(s):

High Speed ◽

Nuclear Power ◽

High Performance ◽

Power Plants ◽

Tube Bundle ◽

Separation Performance ◽

Heating Steam ◽

Severe Erosion ◽

Moisture Separator ◽

And Performance

Moisture Separator Reheaters (MSRs) of nuclear power plants, especially first-generation units (which began commercial operation between 1970 and 1982), have suffered from various problems such as severe erosion, deterioration of moisture separation performance, and drain subcooling. To solve these problems and improve performance, an improved MSR was developed. In the new MSR, a high-performance SS439 stainless steel round-type tube bundle is applied, in which the heating steam distribution is optimized by an orifice plate in order to minimize drain subcooling. Based on a CFD approach, the cycle steam distribution was optimized and the application of FAC-resistant material to the internal parts of the MSR was determined. As a result, the pressure drop was reduced by 0.6% relative to the HP turbine exhaust pressure. Moisture separation performance was improved by the latest chevron-type separator. In this separator, a reverse pressure arises locally at the drainage area because a pronounced longitudinal pressure distribution is formed by the high-speed steam flow in the manifold. A new moisture separation structure was therefore developed that takes into account the influence of this reverse pressure on separator performance.

Multiplier Design and Performance Estimation with Distributed Arithmetic Algorithm

International Journal of Computer and Communication Technology ◽

10.47893/ijcct.2016.1345 ◽

2016 ◽

pp. 90-95

Author(s):

M. Suhasini ◽

K. Prabhu Kumar ◽

P. Srinivas

Keyword(s):

Power Consumption ◽

High Speed ◽

Performance Estimation ◽

Switching Activity ◽

Distributed Arithmetic ◽

Hybrid Type ◽

Arithmetic Algorithm ◽

Overall Performance ◽

Sign Extension ◽

And Performance

This paper presents a new architecture of multiplier-and-accumulator (MAC) for high-speed arithmetic. Performance is improved by combining multiplication with accumulation and devising a hybrid type of carry save adder (CSA). Because the accumulator, which has the largest delay in a MAC, is merged into the CSA, the overall performance is elevated. The proposed CSA tree uses a 1’s-complement-based radix-2 modified Booth’s algorithm (MBA) and has a modified array for sign extension in order to increase the bit density of the operands. Moreover, the design statistically reduces power consumption depending on the data switching activity.

IMPLEMENTATION OF A REDUCED COMPLEXITY HIGH PERFORMANCE DATA ACQUISITION CHIP USING 0.18 MICRON TECHNOLOGY

SYNCHROINFO JOURNAL ◽

10.36724/2664-066x-2021-7-3-22-26 ◽

2021 ◽

Vol 7 (3) ◽

pp. 22-26

Author(s):

Hai P. Le ◽

Aladin Azyegh ◽

Jugdutt Singh ◽

...

Keyword(s):

Low Power ◽

Data Acquisition ◽

High Speed ◽

High Performance ◽

Modern Science ◽

Digital Data ◽

Clock Frequency ◽

Flash Adc ◽

Analog Signals ◽

Wide Range

Data acquisition (DAQ) in the general sense is the process of collecting information from the real world. For engineers and scientists, this data is mostly numerical and is usually collected, stored and analysed using computers. However, most input signals cannot be read directly by digital computers, because they are generally analog signals distinguished by continuous values, while computers can only recognise digital signals containing only the on/off levels. DAQ systems are therefore necessary, as they perform the translation from analog signals to digital data. For this reason, they have become significant in a wide range of applications in modern science and technology [1]. The paper presents the design of a 12-bit high-speed low-power Data Acquisition (DAQ) chip. The designs of the building block components are aimed at high accuracy along with high speed and low power dissipation. A modified flash Analog-to-Digital Converter (ADC) was used instead of the traditional flash ADC. The proposed DAQ chip operates at a 1 GHz master clock frequency and achieves a sampling speed of 125 MS/s. It dissipates only 64.9 mW of power, compared to 97.2 mW when a traditional flash ADC is used.

AFBV: A High-Performance Network Flow Classification Method for Multi-Dimensional Fields and FPGA Implementation

Journal of Circuits System and Computers ◽

10.1142/s0218126619502372 ◽

2019 ◽

Vol 28 (14) ◽

pp. 1950237

Author(s):

Ling Zheng ◽

Zhiliang Qiu ◽

Weina Wang ◽

Weitao Pan ◽

Shiyong Sun ◽

...

Keyword(s):

High Throughput ◽

Network Flow ◽

High Speed ◽

High Performance ◽

Clock Frequency ◽

Pipeline Architecture ◽

Exact Matching ◽

Flow Classification ◽

Rule Sets ◽

Bit Vector

Network flow classification is a key function in high-speed switches and routers. It directly determines the performance of network devices. With the development of the Internet and various kinds of applications, flow classification needs to support multi-dimensional fields and large rule sets, and to sustain a high throughput. Software-based classification cannot meet performance requirements as high as 100 Gbps. FPGA-based flow classification methods can achieve a very high throughput. However, range matching is still challenging. To address this, this paper proposes a range supported bit vector (RSBV) method. First, the characteristics of range matching are analyzed, then the rules are pre-encoded and stored in memory. Second, the fields of an input packet header are used as addresses to read the memory, and the result of range matching is derived through pipelined Boolean operations. On this basis, a bit vector for any type of field (AFBV) is further proposed, which efficiently supports flow classification for multi-dimensional fields, including exact matching, longest prefix matching, range matching, and arbitrary wildcard matching. The proposed methods are implemented on an FPGA platform. Through a two-dimensional pipeline architecture, the AFBV can operate at a high clock frequency and can achieve a processing speed of more than 100 Gbps. Simulation results show that for a rule set of 512-bit width and 1 k rules, the AFBV can achieve a throughput of 520 million packets per second (MPPS). The performance is improved by 44% compared with FSBV and 30% compared with Stride BV. The power consumption is reduced by about 43% compared with a TCAM solution.
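
The general bit-vector idea described above (pre-encode rules into memory, use header fields as addresses, combine the results with Boolean operations) can be sketched in a few lines of Python. This is only the generic technique with one direct lookup table per field; it is not the RSBV or AFBV encoding from the paper, and `build_tables`, `classify`, and the rule layout (one inclusive `(lo, hi)` range per field) are hypothetical.

```python
from functools import reduce


def build_tables(rules, field_bits):
    """Pre-encode rules into one lookup table per field.

    tables[f][v] is an integer bit vector whose bit i is set when rule i
    accepts value v in field f. Each rule is a tuple of inclusive (lo, hi)
    ranges, one per field.
    """
    tables = []
    for f, bits in enumerate(field_bits):
        table = [0] * (1 << bits)
        for i, rule in enumerate(rules):
            lo, hi = rule[f]
            for v in range(lo, hi + 1):
                table[v] |= 1 << i
        tables.append(table)
    return tables


def classify(tables, header):
    """Return the index of the first matching rule, or None if none matches."""
    # Each header field addresses its table; the vectors are ANDed together.
    vectors = (tables[f][v] for f, v in enumerate(header))
    match = reduce(lambda acc, vec: acc & vec, vectors)
    if match == 0:
        return None
    # Isolate the lowest set bit: the lowest rule index has highest priority here.
    return (match & -match).bit_length() - 1
```

With two 4-bit fields and the rules `[((0, 3), (5, 5)), ((2, 10), (0, 15)), ((0, 15), (0, 15))]`, calling `classify(build_tables(rules, [4, 4]), (5, 7))` selects rule 1, because rule 0 rejects the first field and rule 1 is the earliest rule that accepts both.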

Design of Low Power CMOS Comparator using 180nm Technology for ADC Application

Circulation in Computer Science ◽

10.22632/ccs-2017-mcsp027 ◽

2017 ◽

Vol MCSP2017 (01) ◽

pp. 11-13

Author(s):

Truptimayee Behera ◽

Ritisnigdha Das

Keyword(s):

Low Power ◽

Power Dissipation ◽

High Speed ◽

High Performance ◽

Input Voltage ◽

Clock Frequency ◽

Nmos Transistor ◽

Body Effect ◽

Low Power Dissipation ◽

Low Power Cmos

In our design of a high-performance CMOS comparator using GPDK 180 nm technology, we optimize these parameters. We analyse the transient response of the schematic design, calculate the gain in AC analysis, and measure the power dissipation. The circuit is built using PMOS and NMOS transistors with a body effect. A plot of phase and gain is also discussed in the paper. Finally, a test schematic is built, and transient analysis for an input voltage of 2 V is performed using Cadence Virtuoso. Simulation results are presented, and they show that this design can work at a high clock frequency of 200 MHz. The design has low power dissipation.

An Asynchronous Low Power and High Performance VLSI Architecture for Viterbi Decoder Implemented with Quasi Delay Insensitive Templates

The Scientific World JOURNAL ◽

10.1155/2015/621012 ◽

2015 ◽

Vol 2015 ◽

pp. 1-13 ◽

Cited By ~ 2

Author(s):

T. Kalavathi Devi ◽

Sakthivel Palaniappan

Keyword(s):

Low Power ◽

Communication Systems ◽

High Speed ◽

High Performance ◽

Large Scale ◽

Forward Error Correction ◽

Vlsi Architecture ◽

Convolutional Codes ◽

Viterbi Decoder ◽

Asynchronous Design

Convolutional codes are widely used as Forward Error Correction (FEC) codes in digital communication systems. For decoding convolutional codes at the receiver end, the Viterbi decoder is often the preferred choice. This decoder meets the demands of high speed and low power. At present, the design of a competent system in Very Large Scale Integration (VLSI) technology requires these VLSI parameters to be finely defined. The proposed asynchronous method focuses on reducing the power consumption of the Viterbi decoder for various constraint lengths using asynchronous modules. The asynchronous designs are based on commonly used Quasi Delay Insensitive (QDI) templates, namely, Precharge Half Buffer (PCHB) and Weak Conditioned Half Buffer (WCHB). The functionality of the proposed asynchronous design is simulated and verified using Tanner Spice (TSPICE) in 0.25 µm, 65 nm, and 180 nm technologies of Taiwan Semiconductor Manufacturing Company (TSMC). The simulation results illustrate that the asynchronous design techniques achieve a 25.21% power reduction compared to the synchronous design and operate at a speed of 475 MHz.
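
For readers unfamiliar with the decoding algorithm itself, the following is a minimal software sketch of hard-decision Viterbi decoding. It says nothing about the asynchronous QDI hardware described above. The rate-1/2 code with constraint length 3 and generators (7, 5) in octal is an assumed textbook example, and `encode` and `viterbi_decode` are hypothetical names.

```python
K = 3                      # constraint length (assumed example)
G = (0b111, 0b101)         # generator polynomials (7, 5) octal (assumed example)
N_STATES = 1 << (K - 1)


def parity(x: int) -> int:
    return bin(x).count("1") & 1


def encode(bits):
    """Rate-1/2 convolutional encoder starting from the all-zero state."""
    state, out = 0, []
    for b in bits:
        reg = (b << (K - 1)) | state          # newest bit sits in the MSB
        out.extend(parity(reg & g) for g in G)
        state = reg >> 1
    return out


def viterbi_decode(received, n_bits):
    """Hard-decision Viterbi decoding with a Hamming-distance branch metric."""
    inf = float("inf")
    metrics = [0] + [inf] * (N_STATES - 1)    # encoder is known to start in state 0
    paths = [[] for _ in range(N_STATES)]
    for t in range(n_bits):
        symbol = received[2 * t:2 * t + 2]
        new_metrics = [inf] * N_STATES
        new_paths = [None] * N_STATES
        for state in range(N_STATES):
            if metrics[state] == inf:
                continue                      # state not reachable yet
            for b in (0, 1):
                reg = (b << (K - 1)) | state
                expected = [parity(reg & g) for g in G]
                nxt = reg >> 1
                m = metrics[state] + sum(e != r for e, r in zip(expected, symbol))
                if m < new_metrics[nxt]:      # keep only the survivor path
                    new_metrics[nxt] = m
                    new_paths[nxt] = paths[state] + [b]
        metrics, paths = new_metrics, new_paths
    best = min(range(N_STATES), key=lambda s: metrics[s])
    return paths[best]
```

Storing a full path per state keeps the sketch short; hardware decoders typically use a traceback or register-exchange memory instead, which is one of the places where power is spent.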

Dual Die Package Design Strategy and Performance

Advances in Electronic Packaging, Parts A, B, and C ◽

10.1115/ipack2005-73391 ◽

2005 ◽

Cited By ~ 1

Author(s):

Mahadevan Suryakumar ◽

Lu-Vong T. Phan ◽

Mathew Ma ◽

Wajahat Ahmed

Keyword(s):

Power Efficiency ◽

High Speed ◽

High Performance ◽

Clock Cycle ◽

Average Power ◽

Cost Effective ◽

Design Strategy ◽

Leakage Power ◽

Memory Accesses ◽

And Performance

The alarming growth in power has presented numerous packaging challenges for high-performance processors. The average power consumed by a processor is the sum of dynamic and leakage power. The dynamic power is proportional to V^2, while the leakage current (and therefore leakage power) is proportional to V^b, where V is the voltage and b>1 for modern processes. This means that lowering the voltage reduces the energy consumed per clock cycle but also reduces the maximum frequency at which the processor can operate. Since reducing voltage reduces power faster than it reduces frequency, integrating more cores into the processor would result in better performance/power efficiency but would generate more memory accesses, driving a need for larger caches and high-speed signaling [1]. In addition, the design goal of creating a unified package pinout for both single-core and multicore product flavors adds a further constraint on creating a cost-effective package solution for both market segments. This paper discusses the design strategy and performance of a dual die package that optimizes package performance for cost.
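
The voltage-scaling argument can be made concrete with a rough sketch. It rests on assumptions that are not stated in the abstract: dynamic power is taken as proportional to V^2 times frequency, the maximum frequency is taken as proportional to V, leakage is ignored, and the workload is assumed to parallelize perfectly across cores. The function names are hypothetical.

```python
def relative_dynamic_power(v_scale: float, cores: int = 1) -> float:
    """Dynamic power relative to one core at nominal voltage.

    Assumes P_dyn ~ V^2 * f with f ~ V, so each core scales as V^3.
    Leakage power is ignored in this sketch.
    """
    return cores * v_scale ** 3


def relative_throughput(v_scale: float, cores: int = 1) -> float:
    """Throughput relative to one core at nominal voltage.

    Assumes frequency ~ V and perfect scaling across cores.
    """
    return cores * v_scale
```

Under these assumptions, two cores at 80% of nominal voltage give `relative_throughput(0.8, 2)` of about 1.6 for a `relative_dynamic_power(0.8, 2)` of about 1.02, which is the kind of trade-off that motivates multicore (and hence dual die) designs.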

Impact of Modern Virtualization Methods on Timing Precision and Performance of High-Speed Applications

Future Internet ◽

10.3390/fi11080179 ◽

2019 ◽

Vol 11 (8) ◽

pp. 179 ◽

Cited By ~ 1

Author(s):

Veronika Kirova ◽

Kirill Karpov ◽

Eduard Siemens ◽

Irina Zander ◽

Oksana Vasylenko ◽

...

Keyword(s):

Virtual Environments ◽

Virtual Environment ◽

High Speed ◽

High Performance ◽

Estimation Accuracy ◽

Network Applications ◽

High Speed Network ◽

Timing Precision ◽

And Performance ◽

The Impact

The presented work is the result of extended research and analysis on the precision of timing methods, their efficiency in different virtual environments, and the impact of timing precision on the performance of high-speed network applications. We investigated how timer hardware is shared among heavily CPU- and I/O-bound tasks on a virtualized OS as well as on a bare OS. By replacing the invoked timing methods within a well-known application for estimation of available path bandwidth, we provide an analysis of their impact on estimation accuracy. We show that timer overhead and precision are crucial for high-performance network applications, and that the use of low-precision timing methods, e.g., with the delays and overheads introduced by virtualization, degrades the virtual environment. Furthermore, we provide confirmation that, by using the methods we developed specifically for both precise timing operations and AvB estimation, it is possible to overcome the inefficiency of standard time-related operations and the overhead that comes with virtualization. The impact of negative virtualization factors was investigated in five different environments to identify the best-suited virtual environment for high-speed network applications.
