Buffer Allocation
Recently Published Documents


TOTAL DOCUMENTS: 279 (FIVE YEARS: 35)

H-INDEX: 28 (FIVE YEARS: 2)

2021 ◽  
Vol 20 (5s) ◽  
pp. 1-24
Author(s):  
Xinyi Zhang ◽  
Yawen Wu ◽  
Peipei Zhou ◽  
Xulong Tang ◽  
Jingtong Hu

Multi-head self-attention (the attention mechanism) has been employed in a variety of fields such as machine translation, language modeling, and image processing due to its superiority in feature extraction and sequential data analysis. This strength stems from the large number of parameters and the sophisticated model architecture behind the attention mechanism. To deploy the attention mechanism efficiently on resource-constrained devices, existing works propose to reduce the model size by building a customized smaller model or by compressing a large standard model. A customized smaller model is usually optimized for a specific task and requires effort in model parameter exploration. Model compression reduces model size without hurting the robustness of the model architecture and can be applied efficiently to different tasks. The compressed weights in the model are usually regularly shaped (e.g., rectangular), but the dimension sizes vary (e.g., the rectangles differ in height and width). Such a compressed attention mechanism can be deployed efficiently on CPU/GPU platforms because their memory and computing resources can be assigned flexibly on demand. However, on Field-Programmable Gate Arrays (FPGAs), the data buffer allocation and the computing kernel are fixed at run time to achieve maximum energy efficiency. After compression, the weights are much smaller and differ in size, which leads to inefficient utilization of the FPGA on-chip buffer. Moreover, the differing weight heights and widths may lead to inefficient execution of the FPGA computing kernel. Because of the large number of weights in the attention mechanism, building a unique buffer and computing kernel for each compressed weight on an FPGA is not feasible. In this work, we jointly consider the impact of compression on buffer allocation and on the required computing kernel while compressing the attention mechanism. A novel structural pruning method with memory footprint awareness is proposed, and the associated FPGA accelerator is designed.
The experimental results show that our work can compress the Transformer (an attention-based model) by 95x. The developed accelerator fully utilizes the FPGA resources, processing the sparse attention mechanism with a run-time throughput of 1.87 TOPS on a ZCU102 FPGA.
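The core idea of buffer-aware structural pruning can be illustrated with a toy sketch. The function below is hypothetical, not the paper's algorithm: it prunes whole fixed-size blocks of a weight matrix by L1 norm, so that every surviving block has the same shape and can map onto a single fixed on-chip buffer tile.

```python
# Hypothetical sketch of memory-footprint-aware block pruning: zero out
# whole (bh x bw) blocks so that all surviving blocks share one shape
# matching a fixed FPGA on-chip buffer tile.

def block_prune(W, bh, bw, keep_ratio):
    """Keep the top `keep_ratio` fraction of (bh, bw) blocks, ranked by
    L1 norm; zero the rest. Matrix dims must be multiples of bh and bw."""
    rows, cols = len(W), len(W[0])
    assert rows % bh == 0 and cols % bw == 0
    scores = []
    for bi in range(rows // bh):
        for bj in range(cols // bw):
            s = sum(abs(W[bi * bh + r][bj * bw + c])
                    for r in range(bh) for c in range(bw))
            scores.append((s, bi, bj))
    scores.sort(reverse=True)
    n_keep = max(1, int(len(scores) * keep_ratio))
    kept = {(bi, bj) for _, bi, bj in scores[:n_keep]}
    pruned = [[0.0] * cols for _ in range(rows)]
    for bi, bj in kept:
        for r in range(bh):
            for c in range(bw):
                pruned[bi * bh + r][bj * bw + c] = W[bi * bh + r][bj * bw + c]
    return pruned, kept

# Example: keep only the densest 2x2 block (25%) of a 4x4 matrix.
W = [[1, 0, 5, 6],
     [0, 1, 7, 8],
     [2, 0, 0, 1],
     [0, 2, 1, 0]]
pruned, kept = block_prune(W, 2, 2, 0.25)  # kept == {(0, 1)}
```

Because every surviving block has the same (bh, bw) footprint, a single buffer and computing kernel sized for that tile suffices, which is the constraint the abstract describes.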


Author(s):  
Paramanand Patil ◽  
Satyanarayan Padaganur ◽  
Umesh Dixit ◽  
Achyut Yaragal

Author(s):  
José Omar Hernández-Vázquez ◽  
Salvador Hernández-González ◽  
José Israel Hernández-Vázquez ◽  
Vicente Figueroa-Fernández ◽  
Claudia Iveth Cancino de la Fuente

Footwear production is subject to the variability inherent in any process, and producers often need tools that allow them to make the right decisions. This work documents the process of optimizing the buffer allocation in a shoe manufacturing line to minimize the cycle time of the system, applying a metamodeling approach. It was found that the Front sewing operation, and the interaction between the Lining sewing operation and the assembly operation, have the greatest effect on the flow time of the product through the process; the optimal assignment of spaces follows a non-uniform arrangement along the line, saturating the slower stations; and the cycle time behaves non-linearly with respect to the total number of spaces (N) in the line, reaching a minimum at a certain value of N.
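The metamodeling approach can be sketched roughly as follows (the data and coefficients here are invented for illustration, not the paper's results): fit a quadratic response surface to simulated cycle times as a function of the total number of buffer spaces N, then take the vertex of the parabola as the estimated optimal N.

```python
# Illustrative metamodel sketch: quadratic fit of cycle time vs. total
# buffer spaces N, using made-up simulation output.
import numpy as np

# Hypothetical simulation runs: (total buffer spaces N, observed cycle time)
N = np.array([4, 6, 8, 10, 12, 14], dtype=float)
cycle_time = np.array([9.8, 8.1, 7.2, 6.9, 7.1, 7.8])

# cycle_time ~ c2*N^2 + c1*N + c0 (coefficients returned highest degree first)
c2, c1, c0 = np.polyfit(N, cycle_time, 2)

# Convexity (c2 > 0) means the cycle time has a minimum at the vertex.
N_star = -c1 / (2 * c2)
print(f"estimated optimal total buffer spaces: N ~ {N_star:.1f}")
```

This mirrors the abstract's finding that cycle time is non-linear in N and reaches a minimum at a certain value; in a real study the fitted surface would also include station-level terms such as the sewing/assembly interaction.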


Author(s):  
Junqi Hu ◽  
Sigrún Andradóttir ◽  
Hayriye Ayhan

Standard server assignment policies for multi-server queueing stations include the noncollaborative policy, where the servers work in parallel on different jobs, and the fully collaborative policy, where the servers work together on the same job. However, if each job can be decomposed into subtasks with no precedence relationships, a third form of server coordination is possible, named task assignment, where the servers work in parallel on different subtasks of the same job. We identify the task assignment policy that maximizes the long-run average throughput of a queueing station with finite internal buffers when blocked servers can be idled or reassigned to replace, or collaborate with, other servers on unblocked subtasks. We then compare the server coordination policies and show that task assignment is best when the servers are highly specialized; otherwise, the fully collaborative or noncollaborative policy is preferable, depending on whether the synergy level among the servers is high. We also provide numerical results that quantify this comparison. Finally, we address buffer allocation in longer lines with precedence relationships between some of the tasks, and present numerical results suggesting that our single-station comparisons generalize to longer lines.
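A toy numeric comparison of the three coordination policies (the deterministic-rate model below is my own simplification, not the paper's queueing analysis) shows why task assignment wins when servers are highly specialized:

```python
# Toy comparison: two servers, jobs with two subtasks,
# mu[i][j] = deterministic rate of server i on subtask j.

def noncollaborative(mu):
    # Each server processes whole jobs alone (both subtasks in sequence);
    # the station throughput is the sum of the individual rates.
    return sum(1.0 / (1.0 / m[0] + 1.0 / m[1]) for m in mu)

def fully_collaborative(mu, synergy):
    # Both servers work together on each subtask in turn; their rates add,
    # scaled by a synergy factor (synergy < 1 models interference).
    r0 = synergy * (mu[0][0] + mu[1][0])
    r1 = synergy * (mu[0][1] + mu[1][1])
    return 1.0 / (1.0 / r0 + 1.0 / r1)

def task_assignment(mu):
    # Server 0 always does subtask 0 and server 1 subtask 1, in parallel;
    # in the long run the slower subtask is the bottleneck.
    return min(mu[0][0], mu[1][1])

# Highly specialized servers: each is fast only on "its" subtask.
mu_special = [[4.0, 0.5], [0.5, 4.0]]
print(task_assignment(mu_special))           # 4.0
print(noncollaborative(mu_special))          # ~0.89
print(fully_collaborative(mu_special, 0.6))  # 1.35
```

With specialists, forcing each server to do both subtasks (noncollaborative) or dragging both servers through every subtask (collaborative, with sub-additive synergy) wastes their skill, matching the abstract's comparison.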


2021 ◽  
Vol 11 (6) ◽  
pp. 2748
Author(s):  
Dug Hee Moon ◽  
Dong Ok Kim ◽  
Yang Woo Shin

The estimation of the production rate (or throughput) is important in manufacturing system design. Herein, we consider the manufacturing system of an automotive body shop in which two types of car are produced, and one type (the engine-powered car) is gradually replaced by the other (the electric car). In this body shop, two different underbody lines are installed because the underbody structures of the two car types differ completely; the side body line and main body line, however, are shared by both cars. Furthermore, we assume that the underbody lines are reconfigurable as the product mix of the electric car increases. A simulation-based meta-model, in the form of a quadratic polynomial function, is developed to estimate the production rate. In the meta-modelling process, we group some buffer locations and represent each group as one variable to reduce the number of variables in the meta-model. The meta-models are then used to optimize two types of buffer allocation problems, and optimal solutions are obtained easily.
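The grouping-plus-optimization step can be sketched as follows (the two grouped variables and the quadratic coefficients are invented for illustration): each grouped buffer variable stands for several buffer locations, a fitted quadratic meta-model predicts the production rate, and the allocation is optimized by enumeration under a total-buffer constraint.

```python
# Sketch of buffer-location grouping: b1 and b2 are shared buffer sizes
# for two hypothetical groups of locations; the quadratic meta-model
# coefficients below are made up for illustration.
from itertools import product

def throughput(b1, b2):
    # Fitted quadratic meta-model of the production rate (illustrative).
    return 50 + 3.0 * b1 + 2.5 * b2 - 0.20 * b1 * b1 \
           - 0.15 * b2 * b2 - 0.05 * b1 * b2

TOTAL = 15  # total buffer spaces available across both groups

# With only two grouped variables, exhaustive enumeration is cheap.
rate, b1, b2 = max((throughput(x, y), x, y)
                   for x, y in product(range(TOTAL + 1), repeat=2)
                   if x + y <= TOTAL)
print(f"best allocation: group1={b1}, group2={b2}, rate={rate:.2f}")
```

Grouping is what makes this tractable: with one variable per individual buffer location, the search space (and the number of meta-model terms to fit) would grow combinatorially.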


Author(s):  
Shaohui Xi ◽  
James MacGregor Smith ◽  
Qingxin Chen ◽  
Ning Mao ◽  
Huiyu Zhang ◽  
...  

Author(s):  
Ai-Lin Yu ◽  
Hui-Yu Zhang ◽  
Qing-Xin Chen ◽  
Ning Mao ◽  
Shao-Hui Xi
