AN INTERCONNECT ALLOCATION ALGORITHM FOR PERFORMANCE-DRIVEN DATAPATH SYNTHESIS

1996 ◽  
Vol 06 (04) ◽  
pp. 403-423
Author(s):  
YOUNG-NO KIM ◽  
HAE DONG LEE ◽  
SUN YOUNG HWANG

This paper presents the design of a performance-driven interconnect allocation algorithm. The proposed algorithm is based on the idea that data transfer time can be reduced by balancing the load on specific hardware modules along possible critical paths, so that the clock period can be minimized. By performing load balancing only for the communication lines on critical paths, the proposed algorithm generates interconnection structures with minimal delay. Experimental results confirm the effectiveness of the algorithm, which constructs interconnection structures with minimized clock periods for several benchmark circuits from the literature.
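The abstract does not detail the authors' algorithm, but the underlying load-balancing idea can be illustrated with a standard greedy scheme: assign each critical-path data transfer to the least-loaded interconnect line, keeping the maximum per-line load (a proxy for the clock period) as small as possible. The function name, loads, and line count below are hypothetical illustrations, not the paper's method.

```python
# Illustrative greedy load balancing: assign each critical-path data
# transfer to the currently least-loaded interconnect line, so the
# maximum per-line load (a proxy for clock period) stays small.
# Names and load values are hypothetical, not from the paper.

def balance_transfers(transfer_loads, num_lines):
    """Assign transfer loads to lines; return (assignments, max load)."""
    lines = [[] for _ in range(num_lines)]
    totals = [0] * num_lines
    # Placing larger transfers first (the LPT rule) tightens the bound.
    for load in sorted(transfer_loads, reverse=True):
        i = totals.index(min(totals))  # least-loaded line so far
        lines[i].append(load)
        totals[i] += load
    return lines, max(totals)

lines, worst = balance_transfers([4, 3, 3, 2, 2, 2], num_lines=2)
```

On this toy input the greedy rule splits the 16 units of load perfectly into 8 per line; in general it guarantees a maximum load within a constant factor of optimal.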

2021 ◽  
Vol 11 (15) ◽  
pp. 7169
Author(s):  
Mohamed Allouche ◽  
Tarek Frikha ◽  
Mihai Mitrea ◽  
Gérard Memmi ◽  
Faten Chaabane

To bridge the current gap between the expectations placed on Blockchain and its intensive computation constraints, the present paper advances a lightweight processing solution, based on a load-balancing architecture, compatible with lightweight/embedded processing paradigms. In this way, the execution of complex operations is securely delegated to an off-chain general-purpose computing machine, while the core Blockchain operations are kept on-chain. The illustrations correspond to an on-chain Tezos configuration and to a multiprocessor ARM embedded platform (a Raspberry Pi). Performance is assessed in terms of security, execution time, and CPU consumption when performing a visual document fingerprinting task. It is thus demonstrated that the advanced solution makes it possible for a computation-intensive application to be deployed under severely constrained computation and memory resources, as set by a Raspberry Pi 3. The experimental results show that up to nine Tezos nodes can be deployed on a single Raspberry Pi 3 and that the limitation derives not from memory but from computation resources. The execution time with a limited number of fingerprints is 40% higher than with a classical PC solution (value computed with 95% relative error lower than 5%).


2021 ◽  
Vol 11 (14) ◽  
pp. 6486
Author(s):  
Mei-Ling Chiang ◽  
Wei-Lun Su

NUMA multi-core systems divide system resources into several nodes. When a load imbalance between cores occurs, the kernel scheduler's load-balancing mechanism migrates threads between cores or across NUMA nodes. A migrated thread then needs remote memory access to reach memory on its previous node, which degrades performance. Threads to be migrated must be selected effectively and efficiently, since the related operations run in the critical path of the kernel scheduler. This study focuses on improving inter-node load balancing for multithreaded applications. We propose a thread-aware selection policy that considers the distribution of threads on nodes for each thread group when migrating one thread for inter-node load balancing. The policy selects a thread whose group has the least exclusive thread distribution, i.e., whose members are spread most evenly across nodes; migrating such a thread has the least influence on the group's data mapping and thread mapping. We further devise several enhancements that eliminate superfluous evaluations for multithreaded processes, making the selection procedure more efficient. Experimental results for the commonly used PARSEC 3.0 benchmark suite show that the modified Linux kernel with the proposed selection policy increases performance by 10.7% compared with the unmodified Linux kernel.
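The selection idea can be sketched as a scoring problem: among candidate threads on an overloaded node, prefer one whose thread group is already spread evenly across nodes, so its migration disturbs the group's placement the least. The data structures and spread metric below are hypothetical simplifications, not the kernel implementation.

```python
# Hypothetical sketch of a thread-aware selection policy: among candidate
# threads, pick the one whose thread group is spread most evenly across
# NUMA nodes, so migrating it disturbs the group's data/thread mapping
# the least. Groups, IDs, and per-node counts are made-up data.

def group_imbalance(node_counts):
    """Spread metric: gap between most- and least-populated node."""
    return max(node_counts) - min(node_counts)

def pick_thread(candidates, group_placement):
    """candidates: list of (thread_id, group_id) pairs.
    group_placement: group_id -> list of per-node thread counts."""
    return min(candidates,
               key=lambda c: group_imbalance(group_placement[c[1]]))[0]

# Group "A" sits entirely on node 0; group "B" is evenly spread.
placement = {"A": [4, 0], "B": [2, 2]}
chosen = pick_thread([(10, "A"), (11, "B")], placement)
```

Here thread 11 is chosen because group "B" has zero imbalance, matching the paper's preference for groups with the least exclusive distribution.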


2021 ◽  
Vol 9 (1) ◽  
pp. 17-24
Author(s):  
Mia Syafrina ◽  
Fandy Bestario Harlan

Construction projects are generally among the highest-risk businesses, especially shipbuilding projects. These risks can be reduced by minimizing their potential sources. This study aims to identify potentially high-risk activities and prevent delays in the completion of ship construction using the Critical Path Method (CPM) at PT. XYZ. With CPM, critical paths can be given more attention so that they do not disrupt ship construction projects. It also serves as a form of anticipation: if a delay occurs, the project can be rescheduled.
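CPM itself is a standard technique: a forward pass computes earliest start times, a backward pass computes latest start times, and activities with zero float form the critical path. A minimal sketch follows, using a small hypothetical activity network rather than PT. XYZ's actual schedule.

```python
# Minimal Critical Path Method (CPM) sketch: forward pass computes the
# earliest start of each activity, backward pass the latest start;
# activities with zero float are critical. The toy network is hypothetical.

def cpm(activities):
    """activities: name -> (duration, list of predecessor names),
    listed in topological order. Returns (project_duration, critical set)."""
    earliest = {}
    for name, (dur, preds) in activities.items():
        earliest[name] = max((earliest[p] + activities[p][0] for p in preds),
                             default=0)
    finish = max(earliest[n] + activities[n][0] for n in activities)
    latest = {n: finish - activities[n][0] for n in activities}
    for name in reversed(list(activities)):
        for p in activities[name][1]:
            latest[p] = min(latest[p], latest[name] - activities[p][0])
    critical = {n for n in activities if earliest[n] == latest[n]}
    return finish, critical

# Two paths A->B->D and A->C->D; B is longer than C, so A, B, D are critical.
net = {"A": (2, []), "B": (4, ["A"]), "C": (1, ["A"]), "D": (3, ["B", "C"])}
duration, critical = cpm(net)
```

Delaying any activity in the critical set delays the whole project, which is why, as the study notes, these activities deserve the closest monitoring.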


2017 ◽  
Vol 21 (1) ◽  
pp. 3
Author(s):  
Burhan Khurshid

Generalized Parallel Counters (GPCs) are frequently used in constructing high-speed compressor trees. Previous work has focused on achieving efficient mapping of GPCs on FPGAs by using a combination of general look-up table (LUT) fabric and specialized fast carry chains. The resulting structures are purely combinational and cannot be efficiently pipelined to achieve the potential FPGA performance. In this paper, we take an alternate approach and eliminate the fast carry chain from the GPC structure. We present a heuristic that maps GPCs on FPGAs using only general LUT fabric. The resulting GPCs are then easily retimed by placing registers at the fan-out nodes of each LUT. We have applied our heuristic to various GPCs reported in prior work. Our heuristic successfully eliminates the carry chain from the GPC structure with the same LUT count in most cases. Experimental results on Xilinx Kintex-7 FPGAs show a considerable reduction in critical path and dynamic power dissipation with the same area utilization in most cases.


Stroke ◽  
2017 ◽  
Vol 48 (suppl_1) ◽  
Author(s):  
Tzu-Ching Wu ◽  
Navdeep Sangha ◽  
Feryal N Elorr ◽  
Edgar Olivas ◽  
Christy M Ankrom ◽  
...  

Background: The transfer process for patients with large vessel occlusions from a community hospital to an intra-arterial therapy (IAT)-capable center often involves multiple teams of physicians and administrative personnel, leading to delays in care. Objective: We compared time metrics for spoke drip-and-ship telemedicine (TM) patients transferred for IAT to comprehensive stroke centers (CSC) in two different health systems: Kaiser Permanente (KP), an integrated health care system whose spokes lie within a 50-mile range and transfer by ambulance, vs UTHealth (UTH), where patients are transferred by helicopter from varying health systems up to 200 miles from the hub. Methods: We retrospectively identified patients in the KP and UTH networks transferred from TM spokes to the CSC (KP: 6 spokes; UTH: 17 spokes). From 9/15 to 4/16, a total of 79 TM patients (KP: 28 patients; UTH: 51 patients) were transferred to the respective hubs for evaluation of IAT. Baseline clinical data, transfer metrics, and IAT metrics were abstracted. Results: On average, it takes ~90 minutes for a TM patient to arrive at the CSC hub once accepted by the transfer center. Patients in the KP network arrive at the hub faster than UTH patients, but IAT metrics/outcomes are comparable. Over 50% of the patients did not undergo IAT on hub arrival, mostly due to lack of clot on CTA (20/45) or symptom improvement (9/45). Conclusion: In two large, yet different TM networks, the transfer time from spoke to hub needs to be shortened. Areas for improvement include spoke arrival to transfer acceptance and transfer acceptance to hub arrival. A prospective study is underway to develop best-practice time parameters for this complex process of identifying and transferring patients eligible for IAT.


2012 ◽  
pp. 502-516
Author(s):  
Muzhou Xiong ◽  
Hai Jin

In this chapter, two algorithms are presented for supporting efficient data transfer in the Grid environment. From a node's perspective, multiple data transfer channels can be formed by selecting other nodes as relays for the transfer. One algorithm requires the sender to be aware of global connection information, while the other does not. Experimental results indicate that both algorithms can transfer data efficiently under various circumstances.


2019 ◽  
Vol 9 (1) ◽  
pp. 5
Author(s):  
Mini Jayakrishnan ◽  
Alan Chang ◽  
Tony Tae-Hyoung Kim

Energy-efficient semiconductor chips are in high demand to cater to the needs of today's smart products. Advanced technology nodes insert high design margins to deal with rising variations, at the cost of power, area, and performance. Existing run-time resilience techniques are not cost-effective due to the additional circuits involved. In this paper, we propose a design-time resilience technique using a clock-stretched flip-flop to redistribute the available slack in the processor pipeline to the critical paths. We use the opportunistic slack to redesign the critical fan-in logic using logic reshaping, better-than-worst-case sigma-corner libraries, and multi-bit flip-flops to achieve power and area savings. Experimental results show that we can tune the logic and the library to obtain significant power and area savings of 69% and 15%, respectively, in the execute pipeline stage of the processor compared with the traditional worst-case design. In contrast, existing run-time resilience hardware incurs 36% power and 2% area overhead.

