ASYNCHRONOUS INSTRUCTION CACHE MEMORY FOR AVERAGE-CASE PERFORMANCE

2014 ◽  
Vol 23 (05) ◽  
pp. 1450063 ◽  
Author(s):  
Je-Hoon Lee ◽  
Hyun Gug Cho

This paper presents an asynchronous instruction cache memory designed for average-case performance rather than worst-case performance. Even though the proposed instruction cache design is based on a fixed delay model, it achieves high throughput by employing a new memory segmentation technique that divides the cache memory cell arrays into multiple memory segments. Conventional bit-line memory segmentation divides a whole memory system into multiple segments of the same size. In contrast, we propose a new bit-line segmentation technique in which the cache memory consists of multiple segments that all share the same delay bound. We use resistor-capacitor (R-C) modeling of the bit-line delay for the content addressable memory–random access memory (CAM–RAM) structure in a cache in order to estimate the total bit-line delay. We then choose the number of segments to trade off throughput against the complexity of the cache system. We synthesized a 128 KB cache memory with 1 to 16 segments using the Hynix 0.35-μm CMOS process. The simulation results show that our implementation with dividing factors of 4 and 16 reduces the average cache access time by 28% and 35%, respectively, compared to the non-segmented counterpart system. They also show that our implementation reduces the average cache access time by 11% and 17% compared to a bit-line segmented cache consisting of the same number of equally sized segments.
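
The bit-line delay that drives this segmentation choice can be approximated with a simple Elmore R-C model. The sketch below illustrates that estimation under assumed per-cell parasitics; all names and numbers (r_per_cell, c_per_cell, the 1024-row array) are illustrative assumptions, not figures from the paper, and it uses the conventional equal-size split for simplicity rather than the paper's equal-delay segments.

```python
# Minimal sketch of Elmore-style R-C bit-line delay estimation for a
# segmented memory array. Parameter values are assumed, not from the paper.

def elmore_bitline_delay(n_rows: int, r_per_cell: float, c_per_cell: float) -> float:
    """Elmore delay of a uniform distributed R-C bit-line with n_rows cells."""
    # For a uniform RC ladder the Elmore delay is
    # sum_{i=1..n} (i * R) * C = R * C * n * (n + 1) / 2, i.e. O(n^2).
    return r_per_cell * c_per_cell * n_rows * (n_rows + 1) / 2

def segmented_delay(n_rows: int, n_segments: int, r: float, c: float) -> float:
    """Worst-case delay when the bit-line is cut into equal-size segments:
    only one segment's distributed RC is active per access."""
    return elmore_bitline_delay(n_rows // n_segments, r, c)

if __name__ == "__main__":
    R, C = 50.0, 2e-15                      # ohms / farads per cell (assumed)
    for k in (1, 4, 16):
        d = segmented_delay(1024, k, R, C)
        print(f"{k:2d} segments: worst-case bit-line delay ~ {d * 1e12:.1f} ps")
```

Because the distributed R-C delay grows roughly quadratically with the number of cells on the line, even a modest dividing factor shortens the dominant delay term substantially.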


Data or instructions that are used regularly are saved in the cache so that they can be retrieved quickly, which increases cache performance. When evaluating the execution of multi-core systems, the role of the cache memory is very important. A multicore processor is a single circuit in which two or more processor cores are combined to enhance performance and perform multiple tasks. This paper describes the performance of cache memory in terms of cache access time, miss rate, and miss penalty. Cache mapping methods are designed to increase cache performance, but they face many difficulties. Various methods and algorithms are used to mitigate these difficulties. This paper presents a study of recent competing processors to evaluate cache memory performance.
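
The three quantities this study measures combine in the standard average memory access time (AMAT) relation; a small sketch follows, with assumed, illustrative numbers rather than figures from the paper.

```python
# Sketch of the standard average memory access time (AMAT) relation that ties
# together access time, miss rate, and miss penalty. Numbers are illustrative.

def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """AMAT = hit time + miss rate * miss penalty (all times in cycles)."""
    return hit_time + miss_rate * miss_penalty

if __name__ == "__main__":
    # e.g. a 2-cycle L1 hit, 5% miss rate, 100-cycle penalty to main memory
    print(f"AMAT = {amat(2, 0.05, 100):.1f} cycles")
```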


Author(s):  
Sunil Pathak

Background: Significant work has been presented on identifying suspects, gathering information, and examining video from CCTV footage. This research work aims to recognize suspicious activities, i.e., object exchange, entry of a new person, peeping into another's answer sheet, and person exchange, from video captured by a surveillance camera during examinations. This requires face recognition, hand recognition, and detection of contact between the face and hands of the same person as well as between different people. Methods: Segmented frames are given as input to obtain foreground images with the help of Gaussian filtering and a background modeling method. These foreground images are then given to an activity recognition model to detect normal or suspicious activity. Results: Accuracy rate, precision, and recall are calculated for activity detection and contact detection in the best case, average case, and worst case. Simulation results are compared across scenarios such as material exchange, position exchange, introduction of a new person, face and hand detection, and a multi-person scenario. Conclusion: In this paper, a framework is prepared for suspect detection. This framework promises to bring about a significant change in the field of security surveillance in the education domain.
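
A minimal sketch of the foreground-extraction step named in Methods is given below, assuming OpenCV; the MOG2 subtractor stands in for the paper's (unspecified) background modeling method, and the kernel size is an arbitrary choice.

```python
# Sketch of the foreground-extraction step: Gaussian filtering followed by
# background modeling. MOG2 is a stand-in for the paper's background model.
import cv2

def extract_foreground(video_path: str):
    cap = cv2.VideoCapture(video_path)
    bg_model = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=True)
    masks = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Gaussian filtering suppresses sensor noise before subtraction.
        blurred = cv2.GaussianBlur(frame, (5, 5), 0)
        # The background model yields a foreground mask that would feed the
        # downstream activity-recognition stage.
        masks.append(bg_model.apply(blurred))
    cap.release()
    return masks
```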


2018 ◽  
Vol 27 (07) ◽  
pp. 1850116
Author(s):  
Yuanxin Bao ◽  
Wenyuan Li

A high-speed, low-supply-sensitivity temperature sensor is presented for thermal monitoring of a system on a chip (SoC). The proposed sensor transforms temperature into a complementary-to-absolute-temperature (CTAT) frequency and then into a digital code. A CTAT voltage reference supplies a temperature-sensitive ring oscillator, which enhances temperature sensitivity and conversion rate. To reduce the supply sensitivity, an operational amplifier with unity gain for the power supply is proposed. A frequency-to-digital converter with piecewise linear fitting converts the frequency into the digital code corresponding to temperature and corrects nonlinearity. These additional characteristics distinguish the design from conventional oscillator-based temperature sensors. The sensor is fabricated in a 180 nm CMOS process and occupies a small area of 0.048 mm² excluding bondpads. After a one-point calibration, the sensor achieves an inaccuracy of ±1.5 °C from −45 °C to 85 °C under a supply voltage of 1.4–2.4 V, showing a worst-case supply sensitivity of 0.5 °C/V. The sensor maintains a high conversion rate of 45 kS/s with a fine resolution of 0.25 °C/LSB, which is suitable for SoC thermal monitoring. Under a supply voltage of 1.8 V, the maximum energy consumption per conversion is only 7.8 nJ at −45 °C.
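
To make the frequency-to-digital conversion step concrete, here is a sketch of piecewise linear fitting from a CTAT oscillator frequency to temperature; the calibration knots below are hypothetical values, not data from the paper.

```python
# Sketch of piecewise linear fitting from oscillator frequency to temperature.
# The (frequency, temperature) knots are hypothetical; frequency falls with
# temperature because the oscillator is complementary to absolute temperature.
import bisect

KNOTS = [(900.0, -45.0), (700.0, 0.0), (520.0, 45.0), (360.0, 85.0)]

def freq_to_temp(f_khz: float) -> float:
    """Linearly interpolate temperature between the two surrounding knots."""
    freqs = [-k[0] for k in KNOTS]          # negate so the list is ascending
    i = bisect.bisect_left(freqs, -f_khz)
    i = min(max(i, 1), len(KNOTS) - 1)      # clamp to a valid segment
    (f0, t0), (f1, t1) = KNOTS[i - 1], KNOTS[i]
    return t0 + (t1 - t0) * (f_khz - f0) / (f1 - f0)

print(freq_to_temp(600.0))  # temperature estimate for a 600 kHz reading
```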


2014 ◽  
Vol 2014 ◽  
pp. 1-11
Author(s):  
Wei Zhou ◽  
Zilong Tan ◽  
Shaowen Yao ◽  
Shipu Wang

Resource location in structured P2P systems has a critical influence on system performance. Existing analytical studies of the Chord protocol have shown some potential improvements in performance. In this paper, a new splay-tree-based Chord structure called SChord is proposed to improve the efficiency of locating resources. We consider a novel implementation of the Chord finger table (routing table) based on the splay tree. This approach extends the Chord finger table with additional routing entries. An adaptive routing algorithm is proposed for the implementation, and it can be shown that the hop count is significantly reduced without introducing any other protocol overheads. We analyze the hop count of the adaptive routing algorithm as compared to Chord variants, and demonstrate sharp upper and lower bounds in both worst-case and average-case settings. In addition, we theoretically analyze the hop reduction in SChord and show that SChord can significantly reduce the routing hops compared to Chord. Several simulations are presented to evaluate the performance of the algorithm and support our analytical findings. The simulation results confirm the efficiency of SChord.
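
For context, here is a minimal sketch of the standard Chord finger table that SChord extends; SChord's splay-tree layer and extra routing entries are not reproduced, only the baseline structure the paper builds on.

```python
# Minimal sketch of a standard Chord finger table on a small example ring.

M = 8                         # identifier bits, so the ring has 2**M slots
ring = sorted([3, 20, 45, 87, 130, 201, 250])   # example node identifiers

def successor(key: int) -> int:
    """First node clockwise from key on the identifier ring."""
    for node in ring:
        if node >= key:
            return node
    return ring[0]            # wrap around past the largest identifier

def finger_table(n: int):
    """finger[i] = successor((n + 2**i) mod 2**M) for i = 0..M-1."""
    return [successor((n + 2 ** i) % 2 ** M) for i in range(M)]

print(finger_table(20))       # routing entries held by node 20
```

In SChord, the splay-tree organization would let frequently routed targets migrate toward the root, so repeated lookups traverse fewer hops than with the plain table above.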


Algorithmica ◽  
2021 ◽  
Author(s):  
Jie Zhang

Apart from the principles and methodologies inherited from Economics and Game Theory, the studies in Algorithmic Mechanism Design typically employ the worst-case analysis and design of approximation schemes of Theoretical Computer Science. For instance, the approximation ratio, which is the canonical measure of evaluating how well an incentive-compatible mechanism approximately optimizes the objective, is defined in the worst-case sense: it compares the performance of the optimal mechanism against the performance of a truthful mechanism over all possible inputs. In this paper, we take the average-case analysis approach and tackle one of the primary motivating problems in Algorithmic Mechanism Design: the scheduling problem (Nisan and Ronen, in: Proceedings of the 31st annual ACM symposium on theory of computing (STOC), 1999). One version of this problem, which includes a verification component, was studied by Koutsoupias (Theory Comput Syst 54(3):375–387, 2014). It was shown that the problem has a tight approximation ratio bound of $(n+1)/2$ for the single-task setting, where n is the number of machines. We show, however, that when the costs of the machines for executing the task follow any independent and identical distribution, the average-case approximation ratio of the mechanism given by Koutsoupias (Theory Comput Syst 54(3):375–387, 2014) is upper bounded by a constant. This positive result asymptotically separates the average-case ratio from the worst-case ratio. It indicates that the optimal mechanism devised for a worst-case guarantee works well on average.
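
To illustrate what an average-case approximation ratio measures, the Monte Carlo sketch below samples i.i.d. machine costs and averages the ratio of a randomized allocation's expected makespan to the optimum. The allocation rule used (probability proportional to 1/cost) is a hypothetical stand-in for illustration only, not the mechanism of Koutsoupias (2014).

```python
# Monte Carlo estimate of an average-case approximation ratio for single-task
# scheduling. The allocation rule here is an illustrative stand-in, NOT the
# Koutsoupias (2014) mechanism.
import random

def avg_case_ratio(n_machines: int, trials: int = 20_000) -> float:
    total = 0.0
    for _ in range(trials):
        costs = [random.uniform(0.1, 1.0) for _ in range(n_machines)]
        weights = [1.0 / c for c in costs]          # allocation weights
        # expected makespan of the randomized allocation:
        # sum_i p_i * c_i with p_i proportional to 1/c_i
        expected = sum(c * w for c, w in zip(costs, weights)) / sum(weights)
        total += expected / min(costs)              # optimum is the min cost
    return total / trials

for n in (2, 5, 10):
    print(n, round(avg_case_ratio(n), 3))
```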


2010 ◽  
Vol 5 (1) ◽  
pp. 78-88 ◽  
Author(s):  
Marcelo Porto ◽  
André Silva ◽  
Sergo Almeida ◽  
Eduardo Da Costa ◽  
Sergio Bampi

This paper presents a real-time HDTV (High Definition Television) architecture for Motion Estimation (ME) using efficient adder compressors. The architecture is based on the Quarter Sub-sampled Diamond Search (QSDS) algorithm with the Dynamic Iteration Control (DIC) algorithm. The main characteristic of the proposed architecture is the large number of Processing Units (PUs) used to calculate the SAD (Sum of Absolute Differences) metric. The internal structures of the PUs contain a large number of addition operations to calculate the SADs. In this paper, efficient 4-2 and 8-2 adder compressors are used in the PU architecture to achieve the performance needed to process HDTV video in real time at 30 frames per second. These adder compressors enable the simultaneous addition of 4 and 8 operands, respectively. The PUs using adder compressors were applied to the ME architecture. The implemented architecture was described in VHDL and synthesized to FPGA and, with the Leonardo Spectrum tool, to the TSMC 0.18 μm CMOS standard cell technology. Synthesis results indicate that the new QSDS-DIC architecture achieves the best performance and enables gains of 12% in terms of processing rate. The architecture achieves real time for full HDTV (1920x1080 pixels), processing 65 frames per second in the worst case and 269 HDTV frames per second in the average case.
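
The SAD metric the Processing Units compute is plain element-wise arithmetic; the sketch below shows it in software form, while the paper's contribution is the 4-2 and 8-2 adder-compressor hardware that sums these terms in parallel.

```python
# Sketch of the SAD (sum of absolute differences) metric used in motion
# estimation: compare a current-frame block against a candidate block.

def sad(block_a, block_b) -> int:
    """SAD between two equally sized pixel blocks (lists of rows)."""
    return sum(
        abs(pa - pb)
        for row_a, row_b in zip(block_a, block_b)
        for pa, pb in zip(row_a, row_b)
    )

current   = [[10, 12], [11, 13]]     # 2x2 block from the current frame
candidate = [[ 9, 15], [11, 10]]     # candidate block in the reference frame
print(sad(current, candidate))       # -> 7
```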


2010 ◽  
Vol DMTCS Proceedings vol. AM,... (Proceedings) ◽  
Author(s):  
Thomas Fernique ◽  
Damien Regnault

This paper introduces a Markov process inspired by the problem of quasicrystal growth. It acts over dimer tilings of the triangular grid by randomly performing local transformations, called $\textit{flips}$, which do not increase the number of identical adjacent tiles (this number can be thought of as the tiling energy). Fixed points of such a process play the role of quasicrystals. We are here interested in the worst-case expected number of flips to converge towards a fixed point. Numerical experiments suggest a $\Theta(n^2)$ bound, where $n$ is the number of tiles of the tiling. We prove a $O(n^{2.5})$ upper bound and discuss the gap between this bound and the previous one. We also briefly discuss the average case.
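
As a toy illustration of the quantities involved (not the paper's triangular-grid dimer setting), the sketch below runs a much-simplified one-dimensional analog: "energy" counts identical adjacent symbols, a flip here is a local change that strictly decreases the energy (a simplification of the paper's non-increasing flips), and we count flips until a fixed point is reached.

```python
# Much-simplified 1D analog of energy-decreasing flip dynamics: flip single
# symbols of a binary string until no flip can lower the energy, counting
# flips along the way. Illustrative only; not the dimer-tiling process.
import random

def energy(s):
    """Number of identical adjacent symbols (the analog of tiling energy)."""
    return sum(a == b for a, b in zip(s, s[1:]))

def run(n: int, seed: int = 0) -> int:
    rng = random.Random(seed)
    s = [rng.randint(0, 1) for _ in range(n)]
    flips = 0
    while True:
        cands = [i for i in range(n)
                 if energy(s[:i] + [1 - s[i]] + s[i + 1:]) < energy(s)]
        if not cands:
            return flips          # fixed point: no energy-decreasing flip left
        i = rng.choice(cands)
        s[i] = 1 - s[i]
        flips += 1

print([run(64, seed) for seed in range(3)])   # flip counts for three runs
```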


2006 ◽  
Vol 6 (6) ◽  
pp. 483-494
Author(s):  
T. Tulsi ◽  
L.K. Grover ◽  
A. Patel

The standard quantum search lacks a feature, enjoyed by many classical algorithms, of having a fixed point, i.e., monotonic convergence towards the solution. Recently a fixed-point quantum search algorithm has been discovered, referred to as the Phase-\pi/3 search algorithm, which gets around this limitation. While searching a database for a target state, this algorithm reduces the error probability from \epsilon to \epsilon^{2q+1} using q oracle queries, which has since been proved to be asymptotically optimal. A different algorithm is presented here, which has the same worst-case behavior as the Phase-\pi/3 search algorithm but much better average-case behavior. Furthermore, the new algorithm gives \epsilon^{2q+1} convergence for all integral q, whereas the Phase-\pi/3 search algorithm requires q to be (3^{n}-1)/2 with n a positive integer. In the new algorithm, the operations are controlled by two ancilla qubits, and fixed-point behavior is achieved by irreversible measurement operations applied to these ancillas. It is an example of how measurement can allow us to bypass some restrictions imposed by unitarity on quantum computing.
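
The quoted error-suppression rates can be checked numerically: n levels of the Phase-\pi/3 recursion cube the error each time, which matches \epsilon^{2q+1} exactly when q = (3^n - 1)/2. A short sketch:

```python
# Numeric check of the error-suppression relation: n recursion levels of the
# Phase-pi/3 algorithm cube the error each time, i.e. eps -> eps**(3**n),
# and 3**n = 2q + 1 for q = (3**n - 1) / 2 oracle queries.

eps = 0.2
for n in range(1, 5):
    q = (3 ** n - 1) // 2                         # queries for n levels
    assert 3 ** n == 2 * q + 1                    # exponents agree exactly
    print(f"n={n}: q={q:3d} queries, error {eps ** (2 * q + 1):.3e}")
```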

