CAVLCU: an efficient GPU-based implementation of CAVLC

The Journal of Supercomputing ◽

10.1007/s11227-021-04183-8 ◽

2021 ◽

Author(s):

Antonio Fuentes-Alventosa ◽

Juan Gómez-Luna ◽

José Maria González-Linares ◽

Nicolás Guil ◽

R. Medina-Carnicer

Keyword(s):

High Performance ◽

Data Encryption ◽

Entropy Method ◽

Global Memory ◽

Instruction Level Parallelism ◽

Thread Block ◽

Memory Space ◽

Synchronization Mechanism ◽

Memory Accesses ◽

The One

Abstract CAVLC (Context-Adaptive Variable Length Coding) is a high-performance entropy method for video and image compression. It is the most commonly used entropy method in the video standard H.264. In recent years, several hardware accelerators for CAVLC have been designed. In contrast, high-performance software implementations of CAVLC (e.g., GPU-based) are scarce. A high-performance GPU-based implementation of CAVLC is desirable in several scenarios. On the one hand, it can be exploited as the entropy component in GPU-based H.264 encoders, which are a very suitable solution when GPU built-in H.264 hardware encoders lack certain necessary functionality, such as data encryption and information hiding. On the other hand, a GPU-based implementation of CAVLC can be reused in a wide variety of GPU-based compression systems for encoding images and videos in formats other than H.264, such as medical images. This is not possible with hardware implementations of CAVLC, as they are non-separable components of hardware H.264 encoders. In this paper, we present CAVLCU, an efficient implementation of CAVLC on GPU, which is based on four key ideas. First, we use only one kernel to avoid the long-latency global memory accesses required to transmit intermediate results among different kernels, and the costly launches and terminations of additional kernels. Second, we apply an efficient synchronization mechanism for thread-blocks that process adjacent frame regions (in horizontal and vertical dimensions) to share results in global memory space. (In this paper, to prevent confusion, a block of pixels of a frame is referred to simply as a block and a GPU thread block as a thread-block.) Third, we fully exploit the available global memory bandwidth by using vectorized loads to move the quantized transform coefficients directly to registers. Fourth, we use register tiling to implement the zigzag sorting, thus obtaining high instruction-level parallelism.
An exhaustive experimental evaluation showed that our approach is between 2.5× and 5.4× faster than the only state-of-the-art GPU-based implementation of CAVLC.
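To make the zigzag step concrete, here is a minimal sequential sketch in Python of the zigzag scan of one 4×4 block of quantized coefficients, followed by the three counts CAVLC derives from the scanned sequence (total nonzero coefficients, trailing ±1 values capped at three, and zeros preceding the last nonzero coefficient). It is not the paper's GPU kernel: the function names are hypothetical, and the register tiling, vectorized loads, and thread-block synchronization described in the abstract are not modeled.

```python
# Zigzag order for a 4x4 block (frame scan), expressed as raster indices.
ZIGZAG_4X4 = (0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15)


def zigzag_scan(block):
    """Reorder a 4x4 block (list of 4 rows) into zigzag order."""
    flat = [c for row in block for c in row]
    return [flat[i] for i in ZIGZAG_4X4]


def cavlc_stats(scanned):
    """Return (total_coeff, trailing_ones, total_zeros) for a scanned block."""
    nonzero = [c for c in scanned if c != 0]
    total_coeff = len(nonzero)

    # Trailing ones: consecutive +/-1 values at the high-frequency end,
    # counted among nonzero coefficients only and capped at three.
    trailing_ones = 0
    for c in reversed(nonzero):
        if abs(c) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break

    # Total zeros: zeros located before the last nonzero coefficient.
    last = max((i for i, c in enumerate(scanned) if c != 0), default=-1)
    total_zeros = sum(1 for c in scanned[: last + 1] if c == 0)
    return total_coeff, trailing_ones, total_zeros


block = [
    [0, 3, -1, 0],
    [0, -1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
scanned = zigzag_scan(block)
stats = cavlc_stats(scanned)
```

For the sample block, the scanned sequence starts 0, 3, 0, 1, -1, -1, 0, 1, which gives five nonzero coefficients, three trailing ones, and three zeros before the last nonzero coefficient. A full encoder would go on to map these counts and the remaining levels to variable-length codes, which this sketch leaves out.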


Generalized Load Sharing for Homogeneous Networks of Distributed Environment

Journal of Computer Systems Networks and Communications ◽

10.1155/2008/294106 ◽

2008 ◽

Vol 2008 ◽

pp. 1-9 ◽

Cited By ~ 2

Author(s):

A. Satheesh ◽

K. Vimal Kumar ◽

S. Krishnaveni

Keyword(s):

High Performance ◽

Load Sharing ◽

Global Memory ◽

Distributed Environment ◽

Memory Space ◽

Remote Execution ◽

Execution Strategy ◽

Migration Strategies ◽

Overall Performance ◽

Memory Resources

We propose job migration policies that consider effective usage of global memory in addition to CPU load sharing in distributed systems. When a node is identified as lacking sufficient memory space to serve its jobs, one or more of its jobs are migrated to remote nodes with low memory allocations. If the memory space is sufficiently large, the jobs are scheduled by a CPU-based load sharing policy. Following the principle of sharing both CPU and memory resources, we present several load sharing alternatives. Our objective is to reduce the number of page faults caused by unbalanced memory allocations for jobs among distributed nodes, so that the overall performance of a distributed system can be significantly improved. We have conducted trace-driven simulations to compare CPU-based load sharing policies with our policies. We show that our load sharing policies not only improve the performance of memory-bound jobs, but also maintain the same load sharing quality as the CPU-based policies for CPU-bound jobs. Regarding remote execution and preemptive migration strategies, our experiments indicate that strategy selection in load sharing depends on the memory demand of the jobs: remote execution is more effective for memory-bound jobs, and preemptive migration is more effective for CPU-bound jobs. Our CPU-memory-based policy, using either the high-performance or the high-throughput approach together with the remote execution strategy, performs best for both CPU-bound and memory-bound jobs in homogeneous networks of a distributed environment.
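The placement rule described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' policy: the `Node` fields, the `place_job` function, and the `cpu_threshold` parameter are all hypothetical names, and the tie-breaking choices (most free memory, shortest CPU queue) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    mem_total: int   # memory capacity (arbitrary units)
    mem_used: int    # memory currently allocated to jobs
    cpu_queue: int   # number of jobs waiting for the CPU

    def mem_free(self):
        return self.mem_total - self.mem_used


def place_job(home, nodes, job_mem, cpu_threshold=4):
    """Pick the node that should run a job arriving at `home`."""
    # Memory rule: if the home node cannot hold the job, send it to the
    # remote node with the most free memory (assumed tie-breaking rule).
    if home.mem_free() < job_mem:
        remote = [n for n in nodes if n is not home and n.mem_free() >= job_mem]
        if not remote:
            return home  # nowhere better to go; stay and accept paging
        return max(remote, key=lambda n: n.mem_free())

    # CPU rule: memory is sufficient, so fall back to CPU-based load
    # sharing among the nodes that can also hold the job in memory.
    if home.cpu_queue >= cpu_threshold:
        fits = [n for n in nodes if n.mem_free() >= job_mem]
        return min(fits, key=lambda n: n.cpu_queue)
    return home


a = Node("a", mem_total=100, mem_used=90, cpu_queue=1)
b = Node("b", mem_total=100, mem_used=20, cpu_queue=5)
c = Node("c", mem_total=100, mem_used=50, cpu_queue=0)
cluster = [a, b, c]
```

With this sample cluster, a job needing 30 units that arrives at node `a` is moved for memory reasons, while the same job arriving at node `b` is moved for CPU reasons.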


Adaptive Precision Block-Jacobi for High Performance Preconditioning in the Ginkgo Linear Algebra Software

ACM Transactions on Mathematical Software ◽

10.1145/3441850 ◽

2021 ◽

Vol 47 (2) ◽

pp. 1-28

Author(s):

Goran Flegar ◽

Hartwig Anzt ◽

Terry Cojean ◽

Enrique S. Quintana-Ortí

Keyword(s):

Linear Algebra ◽

Graphics Processing Units ◽

High Performance ◽

Numerical Algorithms ◽

Mixed Precision ◽

Before And After ◽

Memory Accesses ◽

Specialized Hardware ◽

The Individual ◽

Graphics Processing

The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic in high-end GPUs (graphics processing units) has motivated numerous efforts aiming at carefully reducing the working precision in order to speed up the computations. For algorithms whose performance is bound by the memory bandwidth, the idea of compressing their data before (and after) memory accesses has received considerable attention. One idea is to store an approximate operator, such as a preconditioner, in lower than working precision, ideally without impacting the algorithm output. We realize the first high-performance implementation of an adaptive precision block-Jacobi preconditioner which selects the precision format used to store the preconditioner data on-the-fly, taking into account the numerical properties of the individual preconditioner blocks. We implement the adaptive block-Jacobi preconditioner as production-ready functionality in the Ginkgo linear algebra library, considering not only the precision formats that are part of the IEEE standard, but also customized formats which optimize the length of the exponent and significand to the characteristics of the preconditioner blocks. Experiments run on a state-of-the-art GPU accelerator show that our implementation offers attractive runtime savings.
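As a rough illustration of per-block precision selection, the sketch below picks the cheapest IEEE storage format for a block from its condition number. The selection rule (condition number times unit roundoff must stay below a tolerance `tau`) and the default tolerance are assumptions made for this example; the abstract does not state the criterion, and none of the names below belong to Ginkgo's API.

```python
# Unit roundoff of the IEEE binary16, binary32 and binary64 formats.
UNIT_ROUNDOFF = (
    ("half", 2.0 ** -11),
    ("single", 2.0 ** -24),
    ("double", 2.0 ** -53),
)


def choose_storage_precision(cond, tau=1e-2):
    """Return the lowest-precision format considered safe for a block.

    `cond` is the condition number of the diagonal block. A format is
    accepted when cond * unit_roundoff <= tau (assumed criterion); double
    precision is the fallback when no cheaper format qualifies.
    """
    for name, u in UNIT_ROUNDOFF:
        if cond * u <= tau:
            return name
    return "double"


# Hypothetical condition numbers of three diagonal blocks.
formats = [choose_storage_precision(k) for k in (10.0, 1e4, 1e8)]
```

Well-conditioned blocks end up in half precision and badly conditioned ones stay in double, so the memory traffic saved depends on how the block condition numbers are distributed.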


Comparative Study of the Field Performances of Pressure-Grouted Micropiles Using Gravity and Packers

Applied Sciences ◽

10.3390/app11156736 ◽

2021 ◽

Vol 11 (15) ◽

pp. 6736

Author(s):

Ong Heo ◽

Yeowon Yoon ◽

Jinung Do

Keyword(s):

Urban Areas ◽

High Performance ◽

Driving Forces ◽

Field Performance ◽

Underground Space ◽

Creep Tests ◽

Deep Foundation ◽

Installation Depth ◽

Pressure Grouting ◽

The One

When underground space requires excavation in areas below the water table, the foundation system suffers from buoyancy, which leads to the uplifting of the superstructure. A deep foundation system can be used; however, in cases where a hard layer is encountered, high driving forces and the corresponding noise cause civil complaints in urban areas. Micropiles can be an effective alternative, due to their high performance despite a short installation depth. Pressurized grouting is used with a packer to induce higher interfacial properties between micropile and soil. In this study, the field performance of micropiles installed using gravitational grouting or pressure-grouted using either a geotextile packer or a rubber packer was comparatively evaluated by tension and creep tests. Micropiles were installed using pressure grouting in weak and fractured zones. As a result, the pressure-grouted micropiles showed more stable and stronger behavior than those installed using gravitational grouting. Moreover, the pressure-grouted micropile installed using the rubber packer showed better performance than the one using the geotextile packer.


AFX Zeolite for Use as a Support of NH3-SCR Catalyst Mining through AICE Joint Research Project of Industries–Academia–Academia

Catalysts ◽

10.3390/catal11020163 ◽

2021 ◽

Vol 11 (2) ◽

pp. 163

Author(s):

Masaru Ogura ◽

Yumiko Shimada ◽

Takeshi Ohnishi ◽

Naoto Nakazawa ◽

Yoshihiro Kubota ◽

...

Keyword(s):

High Performance ◽

Ionic Species ◽

Research Project ◽

Joint Research ◽

Structure Directing Agent ◽

Scr Catalyst ◽

Organic Structure ◽

Nh3 Scr ◽

Joint Research Project ◽

The One

This paper introduces a joint industries–academia–academia research project started by researchers in several automobile companies and universities working on a single theme. Our first target was to find a zeolite for NH3-SCR, that is, zeolite mining. Zeolite AFX, which has the same topology as SSZ-16, was found to be one of these zeolites. SSZ-16 can be synthesized by using an organic structure-directing agent such as 1,1′-tetramethylenebis(1-azonia-4-azabicyclo[2.2.2]octane) (Dab-4), resulting in the formation of Al-rich SSZ-16 with Si/Al below five. We found that AFX crystallized by use of the N,N,N′,N′-tetraethylbicyclo[2.2.2]oct-7-ene-2,3:5,6-dipyrrolidinium ion, called TEBOP in this study, was an analog of SSZ-16 with Si/Al around six and a smaller particle size than SSZ-16. The AFX demonstrated high performance for NH3-SCR as a zeolitic support able to load a large number of divalent Cu ionic species with high hydrothermal stability.


Constructing Large-Scale Genetic Maps Using an Evolutionary Strategy Algorithm

Genetics ◽

10.1093/genetics/165.4.2269 ◽

2003 ◽

Vol 165 (4) ◽

pp. 2269-2282

Author(s):

D Mester ◽

Y Ronin ◽

D Minkov ◽

E Nevo ◽

A Korol

Keyword(s):

Discrete Optimization ◽

High Performance ◽

Large Scale ◽

Simulated Data ◽

Real Data ◽

Genetic Maps ◽

Chromosome 1 ◽

Evolutionary Strategy ◽

Group A ◽

The One

Abstract This article is devoted to the problem of ordering in linkage groups with many dozens or even hundreds of markers. The ordering problem belongs to the field of discrete optimization on a set of all possible orders, amounting to n!/2 for n loci; hence it is considered an NP-hard problem. Several authors attempted to employ the methods developed in the well-known traveling salesman problem (TSP) for multilocus ordering, using the assumption that for a set of linked loci the true order will be the one that minimizes the total length of the linkage group. A novel, fast, and reliable algorithm developed for the TSP and based on evolution-strategy discrete optimization was applied in this study for multilocus ordering on the basis of pairwise recombination frequencies. The quality of derived maps under various complications (dominant vs. codominant markers, marker misclassification, negative and positive interference, and missing data) was analyzed using simulated data with ∼50-400 markers. High performance of the employed algorithm allows systematic treatment of the problem of verification of the obtained multilocus orders on the basis of computing-intensive bootstrap and/or jackknife approaches for detecting and removing questionable marker scores, thereby stabilizing the resulting maps. Parallel calculation technology can easily be adopted for further acceleration of the proposed algorithm. Real data analysis (on maize chromosome 1 with 230 markers) is provided to illustrate the proposed methodology.
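The ordering criterion mentioned here (the true order minimizes the total length of the linkage group) can be illustrated with a tiny exhaustive search. This is not the evolution-strategy algorithm of the article, which targets hundreds of markers; brute force is only feasible for a handful of loci. The function names and the sample recombination-frequency matrix are made up for the example.

```python
from itertools import permutations
from math import factorial


def map_length(order, rf):
    """Sum of pairwise recombination frequencies between adjacent loci."""
    return sum(rf[a][b] for a, b in zip(order, order[1:]))


def count_orders(n):
    """Number of distinct orders of n loci (an order equals its reverse)."""
    return factorial(n) // 2


def best_order_bruteforce(rf):
    """Exhaustively find the order with minimal total length (needs n >= 2)."""
    n = len(rf)
    # Keep one representative of each order/reverse pair.
    candidates = (p for p in permutations(range(n)) if p[0] < p[-1])
    return min(candidates, key=lambda p: map_length(p, rf))


# Made-up symmetric matrix for four loci lying in the order 0-1-2-3.
rf = [
    [0.0, 0.1, 0.2, 0.3],
    [0.1, 0.0, 0.1, 0.2],
    [0.2, 0.1, 0.0, 0.1],
    [0.3, 0.2, 0.1, 0.0],
]
best = best_order_bruteforce(rf)
```

Since `count_orders` grows factorially, exhaustive search stops being practical after roughly a dozen loci, which is why heuristic optimizers borrowed from the TSP are used for large marker sets.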


A Review on the Development of Rotman Lens Antenna

Chinese Journal of Engineering ◽

10.1155/2014/385385 ◽

2014 ◽

Vol 2014 ◽

pp. 1-9 ◽

Cited By ~ 9

Author(s):

Shruti Vashist ◽

M. K. Soni ◽

P. K. Singhal

Keyword(s):

High Performance ◽

Beam Steering ◽

Phase Error ◽

Antenna System ◽

Surveillance Systems ◽

Electronic Warfare ◽

Design Concepts ◽

Low Profile ◽

True Time ◽

The One

Rotman lenses are beguiling devices used in beamforming networks (BFNs). These lenses are generally used in radar surveillance systems to see targets in multiple directions, thanks to their multibeam capability, without physically moving the antenna system. Nowadays these lenses are being integrated into many radars and electronic warfare systems around the world. The antenna should be capable of producing multiple beams that can be steered without changing the orientation of the antenna. Microwave lenses support low phase error, wideband operation, and wide-angle scanning. They are true time delay (TTD) devices that produce frequency-independent beam steering. The emergence of printed lenses in recent years has facilitated the design of high-performance yet low-profile, lightweight, and small-size beamforming networks (BFNs). This paper reviews and analyzes various design concepts used over the years by various researchers to improve the scanning capability of the lens.


Assisted Study of the Human Force-Power Parameters Influences to the Kinematics Motricities Characteristics

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.555.665 ◽

2014 ◽

Vol 555 ◽

pp. 665-672

Author(s):

Adriana Şerban Târgoveţ ◽

Dragoş Ionescu-Bondoc

Keyword(s):

High Performance ◽

Reference Method ◽

Motor Training ◽

Propulsive Force ◽

Specific Technique ◽

Kinematic Parameters ◽

Technical Parameters ◽

The One ◽

Multimodal Process ◽

Block Start

During swimming competitions starting from the block-start platform, a potential hypothesis was noticed: an active multimodal process can make the swimming start more efficient, especially in sprint races, by improving the propulsive force parameters of the lower limbs. Swimming start research from an interdisciplinary perspective (biomechanical, kinematic, informational and statistical) can consolidate and improve the specific technique in accordance with the abilities and psycho-motor qualities of the swimmers. The present study is based on an experiment in which the spatial-temporal and kinematic parameters were processed with the help of a Dartfish program. The evolution of the parameters is studied as a result of a motor training program intended to increase the propulsive force off the block-start. The improvement of spatial-temporal parameters influences the performance and evolution of technical parameters. Initial and final recordings were made on an MLD Station Evo5 with the MLD software MuskelLeistungs Diagnose, from SPSport, SPSportdiagnosegeräte, in order to evaluate the force, the power and the propulsive force. The argumentation of the experimental research is based on the statement: “the spatial characteristics of the motions and actions can be studied for themselves as parameters, characteristics or as a reference method for defining other characteristics, such as velocity or push-off force” [1]. The main purpose of this study is to identify the influence of specific start training on the improvement of the force and kick power of the support foot on the block-start during the classic track start. Given that the track start technique is the same as that of the kick start executed from the international Omega block-start, OSB11, developed in 2009, one assumes that improving the classic track start leads by default to an improvement of the kick start.
Lack of training in this type of start leads to deficient use during competitions and thus to poor performances. There are no kick block-starts in Romania for training high performance athletes who participate in international competitions and, as a consequence, poor results are obtained in sprint races. One assumes that training for this type of start can be successfully carried out only from a block-start similar to the kick one. The block-start model adapted by us under the same biomechanical conditions as those of the international kick start is called “athletic kick”. The training specific to the kick start is carried out only with the optimum use of the kick block-start, the reasons for this being presented by N. Houel, A. Charliac, J. L. Rey, and P. Hellard in the paper “How the swimmer could improve his track start using new Olympic plot” [2].


FPGA implementation and image encryption application of a new PRNG based on a memristive Hopfield neural network with a special activation gradient

Chinese Physics B ◽

10.1088/1674-1056/ac3cb2 ◽

2021 ◽

Author(s):

Fei Yu ◽

Zinan Zhang ◽

Hui Shen ◽

Yuanyuan Huang ◽

Shuo Cai ◽

...

Keyword(s):

Neural Network ◽

Image Encryption ◽

High Performance ◽

Random Sequence ◽

Random Number Generator ◽

Security Analysis ◽

Hopfield Neural Network ◽

Data Encryption ◽

Design Tool ◽

Processing Unit

Abstract In this paper, a memristive Hopfield neural network with a special activation gradient (MHNN) is proposed by adding a suitable memristor to the Hopfield neural network (HNN) with a special activation gradient. The MHNN is simulated, dynamically analyzed, and implemented on an FPGA. Then, a new pseudo-random number generator (PRNG) based on the MHNN is proposed. The post-processing unit of the PRNG is composed of a nonlinear post-processor and an XOR calculator, which effectively ensures the randomness of the PRNG. The experiments in this paper comply with the IEEE 754-1985 high precision 32-bit floating point standard and are done on the Vivado design tool using a Xilinx XC7Z020CLG400-2 FPGA chip and the Verilog-HDL hardware programming language. The random sequence generated by the PRNG proposed in this paper has passed the NIST SP800-22 test suite and security analysis, proving its randomness and high performance. Finally, an image encryption system based on the PRNG is proposed and implemented on an FPGA, which proves the value of the image encryption system in the field of data encryption connected to the Internet of Things (IoT).
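The general shape of a PRNG-driven image cipher can be sketched as a keystream XORed with the pixel bytes. The abstract does not describe the encryption scheme in detail, so this is only a generic stream-cipher illustration: Python's `random.Random` stands in for the MHNN-based generator, it is not cryptographically secure, and the function names are hypothetical.

```python
import random


def keystream(seed, n):
    """Produce n pseudo-random bytes.

    random.Random is a stand-in for the paper's MHNN-based PRNG and must
    not be used for real encryption.
    """
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n))


def xor_cipher(pixels, seed):
    """XOR each pixel byte with the keystream; applying it twice decrypts."""
    ks = keystream(seed, len(pixels))
    return bytes(p ^ k for p, k in zip(pixels, ks))


image = bytes(range(16))            # a tiny 4x4 grayscale "image"
encrypted = xor_cipher(image, seed=1234)
decrypted = xor_cipher(encrypted, seed=1234)
```

Because XOR is its own inverse, the same function with the same seed recovers the original pixels; the security of such a scheme rests entirely on the quality of the generator, which is what statistical test suites such as NIST SP800-22 examine.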


An Image Segmentation Method Based on Adaptability Threshold

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.734-737.2912 ◽

2013 ◽

Vol 734-737 ◽

pp. 2912-2916

Author(s):

Hui Li ◽

Ping He

Keyword(s):

Image Segmentation ◽

Strain Measurement ◽

Maximum Entropy Method ◽

Entropy Method ◽

Important Application ◽

Original Image ◽

Segmentation Method ◽

One Dimensional ◽

The One ◽

Application Fields

Automated strain measurement of deforming sheet metal has become one of the important application fields of computer vision. An image segmentation algorithm based on an adaptive threshold is presented for segmenting images of sheet steel. To validate the proposed method, it is tested and compared with the Otsu method and the one-dimensional maximum entropy method. Experimental results indicate that the method is simple and effective, and that it has the advantage of preserving the main features of the original image.
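For reference, the Otsu baseline named in this abstract chooses the gray level that maximizes the between-class variance of the two resulting pixel classes. The sketch below is a plain-Python version of that baseline, not the adaptive-threshold method the paper proposes (which the abstract does not specify); the function name and the sample data are illustrative.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t maximizing between-class variance.

    Pixels with value <= t form the background class, the rest the
    foreground class. `pixels` is a flat sequence of ints in [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    w_bg = 0          # number of background pixels so far
    sum_bg = 0        # sum of background gray levels so far
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue              # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break                 # every pixel is already background
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


# Two well-separated gray levels: dark background and bright foreground.
sample = [10] * 50 + [200] * 50
t = otsu_threshold(sample)
mask = [p > t for p in sample]
```

On the two-level sample every threshold from 10 to 199 separates the classes equally well, and the strict comparison keeps the first of them.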


Exergy As a Measure of Sustainable Retrofitting of Buildings

Energies ◽

10.3390/en11113139 ◽

2018 ◽

Vol 11 (11) ◽

pp. 3139 ◽

Cited By ~ 2

Author(s):

Carlos Fernández Bandera ◽

Ana Muñoz Mardones ◽

Hu Du ◽

Juan Echevarría Trueba ◽

Germán Ramos Ruiz

Keyword(s):

High Performance ◽

Exergy Analysis ◽

Local Contexts ◽

Primary Selection ◽

Optimization Methodology ◽

The One ◽

Reference Environment ◽

Architectural Characteristics ◽

The University

This study presents a novel optimization methodology for choosing optimal building retrofitting strategies based on the concept of exergy analysis. The study demonstrates that building exergy analysis may open new opportunities in the design of an optimal retrofit solution despite being a theoretical approach based on the high performance of a reverse Carnot cycle. This exergy-based solution is different from the one selected through traditional efficient retrofits, where minimizing energy consumption is the primary selection criterion. The new solution connects the building with the reference environment, which acts as “an unlimited sink or unlimited sources of energy”, and it adapts the building to maximize the intake of energy resources from the reference environment. The building hosting the School of Architecture at the University of Navarra has been chosen as the case study building. The unique architectural appearance and bespoke architectural characteristics of the building limit the choices of retrofitting solutions; therefore, retrofitting solutions on the façade, roof, roof skylight and windows are considered in multi-objective optimization using the jEPlus package. It is remarkable that different retrofitting solutions have been obtained for energy-driven and exergy-driven optimization. Considering the local contexts and all possible reference environments for the building, three “unlimited sinks or unlimited sources of energy” are selected for the case study building to explore exergy-driven optimization: the external air, the ground in the surrounding area and the nearby river. The evidence shows that no matter which reference environment is chosen, an identical envelope retrofitting solution is obtained.
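The role of the reference environment can be illustrated with the textbook relation for the exergy of heat, Ex = Q · (1 − T0/T), where T0 is the reference-environment temperature and T the temperature at which the heat is delivered, both in kelvin. The abstract does not give the formulas used in the study, so applying this relation here is an assumption, and the function names and sample temperatures are illustrative only.

```python
def carnot_factor(t_source_k, t_ref_k):
    """Fraction of heat at t_source_k that is exergy relative to t_ref_k."""
    if t_source_k <= 0 or t_ref_k <= 0:
        raise ValueError("temperatures must be positive kelvin values")
    return 1.0 - t_ref_k / t_source_k


def heat_exergy(q_joules, t_source_k, t_ref_k):
    """Exergy content of q_joules of heat delivered at t_source_k."""
    return q_joules * carnot_factor(t_source_k, t_ref_k)


# 1000 J of heat delivered at 20 degC, evaluated against two hypothetical
# reference environments: outdoor air at 0 degC and ground at 12 degC.
room_k = 293.15
ex_air = heat_exergy(1000.0, room_k, 273.15)
ex_ground = heat_exergy(1000.0, room_k, 285.15)
```

The same quantity of heat carries less exergy when the reference environment is closer in temperature to the room, which is why the choice of reference (air, ground, or river) affects an exergy-driven evaluation.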
