scholarly journals A New VLSI Architecture of Parallel Multiplier–Accumulator Based on Radix-2 Modified Booth Algorithm

Author(s):  
P.Sasi Bala ◽  
S. Raghavendra

In this paper, we proposed a new architecture of multiplier-and-accumulator (MAC) for high-speed arithmetic.By combining multiplication with accumulation and devising a hybrid type of carry save adder (CSA), the performance was improved. Since the accumulator that has the largest delay in MAC was merged into CSA, the overall performance was elevated. The proposed CSA tree uses 1’s-complement-based radix-2 modified Booth’s algorithm (MBA) and has the modified array for the sign extension in order to increase the bit density of the operands. The CSA propagates the carries to the least significant bits of the partial products and generates the least significant bits in advance to decrease the number of the input bits of the final adder. Also, the proposed MAC accumulates the intermediate results in the type of sum and carry bits instead of the output of the final adder, which made it possible to optimize the pipeline scheme to improve the performance. The proposed architecture was synthesized with 250, 180 and 130 m, and 90 nm standard CMOS library. Based on the theoretical and experimental estimation, we analyzed the results such as the amount of hardware resources, delay, and pipelining scheme. We used Sakurai’s alpha power law for the delay modeling. The proposed MAC showed the superior properties to the standard design in many ways and performance twice as much as the previous research in the similar clock frequency. We expect that the proposed MAC can be adapted to various fields requiring high performance such as the signal processing areas.

Author(s):  
Mr.M.V. Sathish ◽  
Mrs. Sailaja

A new architecture of multiplier-andaccumulator (MAC) for high-speed arithmetic. By combining multiplication with accumulation and devising a hybrid type of carry save adder (CSA), the performance was improved. Since the accumulator that has the largest delay in MAC was merged into CSA, the overall performance was elevated. The proposing method CSA tree uses 1’s-complement-based radix-2 modified Booth’s algorithm (MBA) and has the modified array for the sign extension in order to increase the bit density of the operands. The proposed MAC showed the superior properties to the standard design in many ways and performance twice as much as the previous research in the similar clock frequency. We expect that the proposed MAC can be adapted to various fields requiring high performance such as the signal processing areas.


2007 ◽  
Vol 16 (03) ◽  
pp. 357-378
Author(s):  
PEDRO TRANCOSO

Computer systems have evolved significantly in the last years leading to high-performance systems. This, however, has come with a cost of large power dissipation. As such, power-awareness has become a major factor in processor design. Therefore, it is important to have a complete understanding of the power and performance behavior of all processor components. In order to achieve this, the current work presents a comprehensive analysis of power-performance efficiency for different high-end microarchitecture configurations using three different workloads: multimedia, scientific, and database. The objectives of this work are: (1) to analyze and compare the power-performance efficiency for different workloads; (2) to present a sensitivity analysis for the microarchitecture parameters in order to identify which ones are more sensitive to changes in terms of power-performance efficiency; and (3) to propose power-performance efficient configurations for each workload. The simulation results show that the multimedia workload is the one achieving the highest efficiency but the database workload is the most sensitive to parameter changes. In addition, the results also show that the parameter sensitivity depends significantly on the workload. While the issue width and clock frequency present very high sensitivity across all workloads (approximately 100%), for the database workload, the first-level instruction cache size shows an even higher sensitivity (149%). The correct configuration of these microarchitecture parameters is essential. A careless configuration of a single parameter from a baseline setup may result in a loss of the power-performance efficiency of up to 99%. Finally, carefully tuning multiple parameters simultaneously may result in gains up to 154% over the power-performance efficiency of the baseline configuration.


Author(s):  
Issaku Fujita ◽  
Kotaro Machii ◽  
Teruaki Sakata

Moisture Separator Reheaters (MSRs) of Nuclear power plants, especially 1st generation type (commercial operation started from between 1970 and 1982), has been suffered from various problems like severe erosion, moisture separation performance deterioration, drain sub cooling. To solve these problems and performance improvement, improved MSR was developed. At the new MSR, high performance SS439 stainless steel round type tube bundle was applied, where heating steam distribution is optimized by orifice plate in order to minimize the drain sub cooling. Based on the CFD approach, cycle steam distribution was optimized and FAC resistant material application for the internal parts of MSRs was determined. As a result, pressure drop was reduced by 0.6% against the HP turbine exhaust pressure. Performance of moisture separation was improved by the latest chevron type separator. Where, the reverse pressure is locally caused at the drainage area of the separator because remarkable longitudinal pressure distribution is formed by the high-speed steam flow in the manifold. Then, a new moisture separation structure was developed in consideration of the influence that this reverse pressure gave to the separator performance.


Author(s):  
M. Suhasini ◽  
K. Prabhu Kumar ◽  
P. Srinivas

A new architecture of multiplier-and-accumulator (MAC) for high-speed arithmetic. By combining multiplication with accumulation and devising a hybrid type of carry save adder (CSA), the performance was improved. Since the accumulator that has the largest delay in MAC was merged into CSA, the overall performance was elevated. The proposed CSA tree uses 1’scomplement- based radix-2 modified Booth’s algorithm (MBA) and has the modified array for the sign extension in order to increase the bit density of the operands. Moreover, depending on data switching activity statistically reduce the power consumption.


2021 ◽  
Vol 7 (3) ◽  
pp. 22-26
Author(s):  
Hai P. Le ◽  
◽  
Aladin Azyegh ◽  
Jugdutt Singh ◽  
◽  
...  

Data acquisition (DAQ) in the general sense is the process of collecting information from the real world. For engineers and scientists, this data is mostly numerical and is usually collected, stored and analysed using computers. However, most of the input signals cannot be read directly by digital computers. Because they are generally analog signals distinguished by continuous values, while computers can only recognise digital signals containing only the on/off levels. DAQ systems are therefore inevitably necessary, as they include the translation requirements from analog signals to digital data. For this reason, they have become significant in wide range of applications in modern science and technology [1]. The paper precents the disign of a 12-bit high-speed low-power Data Acquisition (DAQ) Chip. In this paper, the disigns of the building block components are aimed at high-accuracy along with high-speed and low power dissipation. A modifided flash Analog-to-Digital converter (ADC) was used instead of the traditional flash proposed DAQ chip operates at 1 GHz master clock frequency and achieves a sampling speed of 125 MS/s. It dissipates only 64.9 mW of power as compared to 97.2 mW when traditional flash ADC was used.


2019 ◽  
Vol 28 (14) ◽  
pp. 1950237
Author(s):  
Ling Zheng ◽  
Zhiliang Qiu ◽  
Weina Wang ◽  
Weitao Pan ◽  
Shiyong Sun ◽  
...  

Network flow classification is a key function in high-speed switches and routers. It directly determines the performance of network devices. With the development of the Internet and various kinds of applications, the flow classification needs to support multi-dimensional fields, large rule sets, and sustain a high throughput. Software-based classification cannot meet the performance requirement as high as 100 Gbps. FPGA-based flow classification methods can achieve a very high throughput. However, the range matching is still challenging. For this, this paper proposes a range supported bit vector (RSBV) method. First, the characteristic of range matching is analyzed, then the rules are pre-encoded and stored in memory. Second, the fields of an input packet header are used as addresses to read the memory, and the result of range matching is derived through pipelined Boolean operations. On this basis, bit vector for any types of fields (AFBV) is further proposed, which supports the flow classification for multi-dimensional fields efficiently, including exact matching, longest prefix matching, range matching, and arbitrary wildcard matching. The proposed methods are implemented in FPGA platform. Through a two-dimensional pipeline architecture, the AFBV can operate at a high clock frequency and can achieve a processing speed of more than 100 Gbps. Simulation results show that for a rule set of 512-bit width and 1[Formula: see text]k rules, the AFBV can achieve a throughput of 520 million packets per second (MPPS). The performance is improved by 44% compared with FSBV and 30% compared with Stride BV. The power consumption is reduced by about 43% compared with TCAM solution.


2017 ◽  
Vol MCSP2017 (01) ◽  
pp. 11-13
Author(s):  
Truptimayee Behera ◽  
Ritisnigdha Das

In our design of CMOS comparator with high performance using GPDK 180nm technology we optimize these parameters. We analyse the transient response of the schematic design and the gain is calculated in AC analysis and also we measure the power dissipation. The circuit is built by using PMOS and NMOS transistor with a body effect. A plot of phase and gain also discussed in the paper. Finally a test schematic is built and transient analysis for an input voltage of 2V is measured using Cadence virtuoso. Simulation results are presented and it shows that this design can work under high speed clock frequency 200MHz. The design has low power dissipation.


2015 ◽  
Vol 2015 ◽  
pp. 1-13 ◽  
Author(s):  
T. Kalavathi Devi ◽  
Sakthivel Palaniappan

Convolutional codes are comprehensively used as Forward Error Correction (FEC) codes in digital communication systems. For decoding of convolutional codes at the receiver end, Viterbi decoder is often used to have high priority. This decoder meets the demand of high speed and low power. At present, the design of a competent system in Very Large Scale Integration (VLSI) technology requires these VLSI parameters to be finely defined. The proposed asynchronous method focuses on reducing the power consumption of Viterbi decoder for various constraint lengths using asynchronous modules. The asynchronous designs are based on commonly used Quasi Delay Insensitive (QDI) templates, namely, Precharge Half Buffer (PCHB) and Weak Conditioned Half Buffer (WCHB). The functionality of the proposed asynchronous design is simulated and verified using Tanner Spice (TSPICE) in 0.25 µm, 65 nm, and 180 nm technologies of Taiwan Semiconductor Manufacture Company (TSMC). The simulation result illustrates that the asynchronous design techniques have 25.21% of power reduction compared to synchronous design and work at a speed of 475 MHz.


Author(s):  
Mahadevan Suryakumar ◽  
Lu-Vong T. Phan ◽  
Mathew Ma ◽  
Wajahat Ahmed

The alarming growth of power increase has presented numerous packaging challenges for high performance processors. The average power consumed by a processor is the sum of dynamic and leakage power. The dynamic power is proportional to V^2, while the leakage current (therefore leakage power) is proportional to V^b where V is the voltage and b>1 for modern processes. This means lowering voltage reduces energy consumed per clock cycle but reduces the maximum frequency at which the processor can operate at. Since reducing voltage reduces power faster than it does frequency, integrating more cores into the processor would result in better performance/power efficiency but would generate more memory accesses, driving a need for larger cache and high speed signaling [1]. In addition, the design goal to create unified package pinout for both single core and multicore product flavors adds additional constraint to create a cost effective package solution for both market segments. This paper discusses the design strategy and performance of dual die package to optimize package performance for cost.


2019 ◽  
Vol 11 (8) ◽  
pp. 179 ◽  
Author(s):  
Veronika Kirova ◽  
Kirill Karpov ◽  
Eduard Siemens ◽  
Irina Zander ◽  
Oksana Vasylenko ◽  
...  

The presented work is a result of extended research and analysis on timing methods precision, their efficiency in different virtual environments and the impact of timing precision on the performance of high-speed networks applications. We investigated how timer hardware is shared among heavily CPU- and I/O-bound tasks on a virtualized OS as well as on bare OS. By replacing the invoked timing methods within a well-known application for estimation of available path bandwidth, we provide the analysis of their impact on estimation accuracy. We show that timer overhead and precision are crucial for high-performance network applications, and low-precision timing methods usage, e.g., the delays and overheads issued by virtualization result in the degradation of the virtual environment. Furthermore, in this paper, we provide confirmation that, by using the methods we intentionally developed for both precise timing operations and AvB estimation, it is possible to overcome the inefficiency of standard time-related operations and overhead that comes with the virtualization. The impacts of negative virtualization factors were investigated in five different environments to define the most optimal virtual environment for high-speed network applications.


Sign in / Sign up

Export Citation Format

Share Document