AN INTERCONNECT ALLOCATION ALGORITHM FOR PERFORMANCE-DRIVEN DATAPATH SYNTHESIS

1996 ◽  
Vol 06 (04) ◽  
pp. 403-423
Author(s):  
YOUNG-NO KIM ◽  
HAE DONG LEE ◽  
SUN YOUNG HWANG

This paper presents the design of a performance-driven interconnect allocation algorithm. The proposed algorithm is based on the idea that data transfer time can be reduced by balancing the load on specific hardware modules along possible critical paths, so that the clock period can be minimized. By performing load balancing only for the communication lines on critical paths, the proposed algorithm generates interconnection structures with minimal delay. Experimental results confirm the effectiveness of the algorithm, which constructs interconnection structures with minimized clock periods for several benchmark circuits from the literature.
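The abstract does not detail the authors' algorithm, but the underlying load-balancing idea can be illustrated with a standard greedy scheme: assign each critical-path data transfer to the least-loaded interconnect line, keeping the maximum per-line load (a proxy for the clock period) as small as possible. The function name, loads, and line count below are hypothetical illustrations, not the paper's method.

```python
# Illustrative greedy load balancing: assign each critical-path data
# transfer to the currently least-loaded interconnect line, so the
# maximum per-line load (a proxy for clock period) stays small.
# Names and load values are hypothetical, not from the paper.

def balance_transfers(transfer_loads, num_lines):
    """Assign transfer loads to lines; return (assignments, max load)."""
    lines = [[] for _ in range(num_lines)]
    totals = [0] * num_lines
    # Placing larger transfers first (the LPT rule) tightens the bound.
    for load in sorted(transfer_loads, reverse=True):
        i = totals.index(min(totals))  # least-loaded line so far
        lines[i].append(load)
        totals[i] += load
    return lines, max(totals)

lines, worst = balance_transfers([4, 3, 3, 2, 2, 2], num_lines=2)
```

On this toy input the greedy rule splits the 16 units of load perfectly into 8 per line; in general it guarantees a maximum load within a constant factor of optimal.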

2021 ◽  
Vol 11 (15) ◽  
pp. 7169
Author(s):  
Mohamed Allouche ◽  
Tarek Frikha ◽  
Mihai Mitrea ◽  
Gérard Memmi ◽  
Faten Chaabane

To bridge the current gap between the expectations placed on Blockchain and its intensive computation constraints, the present paper advances a lightweight processing solution, based on a load-balancing architecture, compatible with lightweight/embedded processing paradigms. In this way, the execution of complex operations is securely delegated to an off-chain general-purpose computing machine, while the core Blockchain operations are kept on-chain. The illustrations correspond to an on-chain Tezos configuration and to a multiprocessor ARM embedded platform (a Raspberry Pi). Performance is assessed in terms of security, execution time, and CPU consumption when performing a visual document fingerprinting task. It is thus demonstrated that the advanced solution makes it possible for a computation-intensive application to be deployed under severely constrained computation and memory resources, as set by a Raspberry Pi 3. The experimental results show that up to nine Tezos nodes can be deployed on a single Raspberry Pi 3 and that the limitation derives not from memory but from computation resources. The execution time with a limited number of fingerprints is 40% higher than with a classical PC solution (value computed with 95% relative error lower than 5%).


2021 ◽  
Vol 11 (14) ◽  
pp. 6486
Author(s):  
Mei-Ling Chiang ◽  
Wei-Lun Su

NUMA multi-core systems divide system resources into several nodes. When a load imbalance between cores occurs, the kernel scheduler's load-balancing mechanism migrates threads between cores or across NUMA nodes. A migrated thread then needs remote memory access to reach memory on its previous node, which degrades performance. Threads to be migrated must be selected effectively and efficiently, since the related operations run in the critical path of the kernel scheduler. This study focuses on improving inter-node load balancing for multithreaded applications. We propose a thread-aware selection policy that considers the distribution of threads on nodes for each thread group when migrating one thread for inter-node load balancing. The policy selects a thread whose group has the least exclusive thread distribution, i.e., whose members are spread most evenly across nodes; migrating such a thread has the least influence on the group's data mapping and thread mapping. We further devise several enhancements that eliminate superfluous evaluations for multithreaded processes, making the selection procedure more efficient. Experimental results for the commonly used PARSEC 3.0 benchmark suite show that the modified Linux kernel with the proposed selection policy increases performance by 10.7% compared with the unmodified Linux kernel.
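The selection idea can be sketched as a scoring problem: among candidate threads on an overloaded node, prefer one whose thread group is already spread evenly across nodes, so its migration disturbs the group's placement the least. The data structures and spread metric below are hypothetical simplifications, not the kernel implementation.

```python
# Hypothetical sketch of a thread-aware selection policy: among candidate
# threads, pick the one whose thread group is spread most evenly across
# NUMA nodes, so migrating it disturbs the group's data/thread mapping
# the least. Groups, IDs, and per-node counts are made-up data.

def group_imbalance(node_counts):
    """Spread metric: gap between most- and least-populated node."""
    return max(node_counts) - min(node_counts)

def pick_thread(candidates, group_placement):
    """candidates: list of (thread_id, group_id) pairs.
    group_placement: group_id -> list of per-node thread counts."""
    return min(candidates,
               key=lambda c: group_imbalance(group_placement[c[1]]))[0]

# Group "A" sits entirely on node 0; group "B" is evenly spread.
placement = {"A": [4, 0], "B": [2, 2]}
chosen = pick_thread([(10, "A"), (11, "B")], placement)
```

Here thread 11 is chosen because group "B" has zero imbalance, matching the paper's preference for groups with the least exclusive distribution.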


2021 ◽  
Vol 9 (1) ◽  
pp. 17-24
Author(s):  
Mia Syafrina ◽  
Fandy Bestario Harlan

Construction projects are generally among the highest-risk businesses, especially shipbuilding projects. These risks can be reduced by minimizing their potential sources. This study aims to identify potentially high-risk activities and prevent delays in the completion of ship construction using the Critical Path Method (CPM) at PT. XYZ. With CPM, critical paths can be given more attention so that they do not disrupt ship construction projects. It also serves as a form of anticipation: if a delay occurs, the project can be rescheduled.
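CPM itself is a standard technique: a forward pass computes earliest start times, a backward pass computes latest start times, and activities with zero float form the critical path. A minimal sketch follows, using a small hypothetical activity network rather than PT. XYZ's actual schedule.

```python
# Minimal Critical Path Method (CPM) sketch: forward pass computes the
# earliest start of each activity, backward pass the latest start;
# activities with zero float are critical. The toy network is hypothetical.

def cpm(activities):
    """activities: name -> (duration, list of predecessor names),
    listed in topological order. Returns (project_duration, critical set)."""
    earliest = {}
    for name, (dur, preds) in activities.items():
        earliest[name] = max((earliest[p] + activities[p][0] for p in preds),
                             default=0)
    finish = max(earliest[n] + activities[n][0] for n in activities)
    latest = {n: finish - activities[n][0] for n in activities}
    for name in reversed(list(activities)):
        for p in activities[name][1]:
            latest[p] = min(latest[p], latest[name] - activities[p][0])
    critical = {n for n in activities if earliest[n] == latest[n]}
    return finish, critical

# Two paths A->B->D and A->C->D; B is longer than C, so A, B, D are critical.
net = {"A": (2, []), "B": (4, ["A"]), "C": (1, ["A"]), "D": (3, ["B", "C"])}
duration, critical = cpm(net)
```

Delaying any activity in the critical set delays the whole project, which is why, as the study notes, these activities deserve the closest monitoring.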


2017 ◽  
Vol 21 (1) ◽  
pp. 3
Author(s):  
Burhan Khurshid

Generalized Parallel Counters (GPCs) are frequently used in constructing high-speed compressor trees. Previous work has focused on achieving efficient mapping of GPCs on FPGAs by using a combination of general look-up table (LUT) fabric and specialized fast carry chains. The resulting structures are purely combinational and cannot be efficiently pipelined to achieve the potential FPGA performance. In this paper, we take an alternate approach and eliminate the fast carry chain from the GPC structure. We present a heuristic that maps GPCs on FPGAs using only general LUT fabric. The resulting GPCs are then easily retimed by placing registers at the fan-out nodes of each LUT. We have applied our heuristic to various GPCs reported in prior work. Our heuristic successfully eliminates the carry chain from the GPC structure with the same LUT count in most cases. Experimental results on Xilinx Kintex-7 FPGAs show a considerable reduction in critical path and dynamic power dissipation with the same area utilization in most cases.


Stroke ◽  
2017 ◽  
Vol 48 (suppl_1) ◽  
Author(s):  
Tzu-Ching Wu ◽  
Navdeep Sangha ◽  
Feryal N Elorr ◽  
Edgar Olivas ◽  
Christy M Ankrom ◽  
...  

Background: The transfer process for patients with large vessel occlusions from a community hospital to an intra-arterial therapy (IAT)-capable center often involves multiple teams of physicians and administrative personnel, leading to delays in care. Objective: We compared time metrics for spoke drip-and-ship telemedicine (TM) patients transferred for IAT to comprehensive stroke centers (CSC) in two different health systems: Kaiser Permanente (KP), an integrated health care system whose spokes lie within a 50-mile range and transfer by ambulance, vs UTHealth (UTH), where patients are transferred by helicopter from varying health systems up to 200 miles from the hub. Methods: We retrospectively identified patients in the KP and UTH networks transferred from TM spokes to the CSC (KP: 6 spokes; UTH: 17 spokes). From 9/15 to 4/16, a total of 79 TM patients (KP: 28 patients; UTH: 51 patients) were transferred to the respective hubs for evaluation of IAT. Baseline clinical data, transfer metrics, and IAT metrics were abstracted. Results: On average, it takes ~90 minutes for a TM patient to arrive at the CSC hub once accepted by the transfer center. Patients in the KP network arrive at the hub faster than UTH patients, but IAT metrics/outcomes are comparable. Over 50% of the patients did not undergo IAT on hub arrival, mostly due to lack of clot on CTA (20/45) or symptom improvement (9/45). Conclusion: In two large, yet different TM networks, the transfer time from spoke to hub needs to be shortened. Areas for improvement include spoke arrival to transfer acceptance and transfer acceptance to hub arrival. A prospective study is underway to develop best-practice time parameters for this complex process of identifying and transferring patients eligible for IAT.


2012 ◽  
pp. 502-516
Author(s):  
Muzhou Xiong ◽  
Hai Jin

In this chapter, two algorithms are presented for supporting efficient data transfer in the Grid environment. From a node's perspective, multiple data transfer channels can be formed by selecting other nodes as relays for the transfer. One algorithm requires the sender to be aware of global connection information, while the other does not. Experimental results indicate that both algorithms can transfer data efficiently under various circumstances.


2019 ◽  
Vol 9 (1) ◽  
pp. 5
Author(s):  
Mini Jayakrishnan ◽  
Alan Chang ◽  
Tony Tae-Hyoung Kim

Energy-efficient semiconductor chips are in high demand to cater to the needs of today's smart products. Advanced technology nodes insert high design margins to deal with rising variations, at the cost of power, area, and performance. Existing run-time resilience techniques are not cost-effective due to the additional circuits involved. In this paper, we propose a design-time resilience technique using a clock-stretched flip-flop to redistribute the available slack in the processor pipeline to the critical paths. We use the opportunistic slack to redesign the critical fan-in logic using logic reshaping, better-than-worst-case sigma-corner libraries, and multi-bit flip-flops to achieve power and area savings. Experimental results show that we can tune the logic and the library to obtain significant power and area savings of 69% and 15%, respectively, in the execute pipeline stage of the processor compared with the traditional worst-case design. In contrast, existing run-time resilience hardware incurs 36% power and 2% area overhead.

