Improving the convergence and parallel memory efficiency of the MLFMM in FEKO

We present new sequential and parallel algorithms for wavelet tree construction based on a new bottom-up technique. Wavelet trees refine the characters represented in a node with increasing depth; our technique uses this structure in the opposite direction, first computing the leaves (the most refined level) and then propagating this information upwards to the root of the tree. We first describe new sequential algorithms, both in RAM and in external memory. Based on these results, we adapt the algorithms to parallel computers, where we address both shared memory and distributed memory settings. In practice, all our algorithms outperform previous ones in both time and memory efficiency, because we can compute all auxiliary information solely from the information obtained while computing the leaves. Most of our algorithms are also adapted to the wavelet matrix, a variant that is particularly suited for large alphabets.
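
The bottom-up idea can be illustrated with a minimal sequential sketch in Python. It is not the authors' implementation: the function names `build_wavelet_tree` and `access`, the level-wise layout (one bit list per level, nodes concatenated in prefix order), and the assumption that the text is a list of integers below `2**bits` are all illustrative choices. The character histogram is computed once for the leaves and then merged pairwise to obtain the node sizes of every higher level.

```python
def build_wavelet_tree(text, bits):
    """Level-wise wavelet tree built bottom-up (illustrative sketch).

    Assumes every character c satisfies 0 <= c < 2**bits.
    Returns one bit list per level, top level first.
    """
    n = len(text)
    levels = [[0] * n for _ in range(bits)]

    # Leaf information: histogram of full characters. The top level is
    # filled in the same scan, since it is just the most significant bit.
    hist = [0] * (1 << bits)
    for i, c in enumerate(text):
        hist[c] += 1
        levels[0][i] = (c >> (bits - 1)) & 1

    # Propagate upwards: merging neighbouring histogram entries yields
    # the counts of all bit prefixes that are one bit shorter.
    for level in range(bits - 1, 0, -1):
        num_nodes = 1 << level
        for p in range(num_nodes):
            hist[p] = hist[2 * p] + hist[2 * p + 1]

        # Exclusive prefix sum: start position of each node on this level.
        borders = [0] * num_nodes
        for p in range(1, num_nodes):
            borders[p] = borders[p - 1] + hist[p - 1]

        # Stable placement of each character's bit inside its node.
        for c in text:
            prefix = c >> (bits - level)
            pos = borders[prefix]
            borders[prefix] += 1
            levels[level][pos] = (c >> (bits - 1 - level)) & 1
    return levels


def access(levels, i):
    """Recover text[i] from the tree (naive linear-time rank via sum)."""
    lo, hi = 0, len(levels[0])
    c = 0
    for bv in levels:
        bit = bv[i]
        zeros = (hi - lo) - sum(bv[lo:hi])
        if bit == 0:
            i = lo + (i - lo) - sum(bv[lo:i])   # rank0 inside the node
            hi = lo + zeros
        else:
            i = lo + zeros + sum(bv[lo:i])      # rank1 inside the node
            lo = lo + zeros
        c = (c << 1) | bit
    return c
```

For example, `build_wavelet_tree([0, 1, 3, 2, 1, 0], 2)` returns `[[0, 0, 1, 1, 0, 0], [0, 1, 1, 0, 1, 0]]`. A practical implementation would use packed bit vectors with rank support instead of Python lists.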

Power and Performance Evaluation of Memory-Intensive Applications

Energies ◽

10.3390/en14144089 ◽

2021 ◽

Vol 14 (14) ◽

pp. 4089

Author(s):

Kaiqiang Zhang ◽

Dongyang Ou ◽

Congfeng Jiang ◽

Yeliang Qiu ◽

Longchuan Yan

Keyword(s):

Energy Efficiency ◽

Energy Consumption ◽

Power Consumption ◽

Job Scheduling ◽

Memory System ◽

Processor Core ◽

Memory Efficiency ◽

And Performance ◽

Reasonable Use ◽

Server System

In terms of power and energy consumption, DRAM plays as key a role in a modern server system as the processors do. Although power-aware scheduling is based on the proportion of energy between DRAM and other components, when running memory-intensive applications, the energy consumption of the whole server system will be significantly affected by the non-energy proportion of DRAM. Furthermore, modern servers usually use a NUMA architecture in place of the original SMP architecture to increase memory bandwidth, so it is important to study the energy efficiency of these two memory architectures. To explore the power consumption characteristics of servers under memory-intensive workloads, this paper evaluates the power consumption and performance of memory-intensive applications on different generations of real rack servers. Through analysis, we find that: (1) Workload intensity and the number of concurrent execution threads affect server power consumption, but a fully utilized memory system does not necessarily yield good energy efficiency. (2) Even if the memory system is not fully utilized, the memory capacity per processor core has a significant impact on application performance and server power consumption. (3) When running memory-intensive applications, memory utilization is not always a good indicator of server power consumption. (4) Reasonable use of the NUMA architecture improves memory energy efficiency significantly. The experimental results show that reasonable use of the NUMA architecture can improve memory efficiency by 16% compared with the SMP architecture, while unreasonable use of the NUMA architecture reduces memory efficiency by 13%. The findings presented in this paper provide useful insights and guidance for system designers and data center operators in energy-efficiency-aware job scheduling and energy conservation.

Foot Gesture Recognition Using High-Compression Radar Signature Image and Deep Learning

Sensors ◽

10.3390/s21113937 ◽

2021 ◽

Vol 21 (11) ◽

pp. 3937

Author(s):

Seungeon Song ◽

Bongseok Kim ◽

Sangdong Kim ◽

Jonghun Lee

Keyword(s):

Deep Learning ◽

Gesture Recognition ◽

Smart Home ◽

Doppler Radar ◽

Radar Images ◽

Radar Signature ◽

Memory Efficiency ◽

High Compression ◽

Value Decomposition ◽

Deep Learning Model

Recently, Doppler radar-based foot gesture recognition has attracted attention as a hands-free tool, but recognizing a variety of foot gestures with Doppler radar remains very challenging. So far, no studies have dealt in depth with recognition of various foot gestures based on Doppler radar and a deep learning model. In this paper, we propose a method of foot gesture recognition using a new high-compression radar signature image and deep learning. The new high-compression radar signature is created by extracting dominant features via Singular Value Decomposition (SVD) processing, and a deep learning AlexNet model is used to recognize four different foot gestures: kicking, swinging, sliding, and tapping. By using a high-compression radar signature instead of the original one, the proposed method reduces the memory required for deep learning training. Original and reconstructed radar images with high compression values of 90%, 95%, and 99% were used as input to the AlexNet model. In the experiments, movements of all four foot gestures and of a rolling baseball were recognized with an accuracy of approximately 98.64%. Owing to the radar’s inherent robustness to the surrounding environment, this foot gesture recognition sensor using Doppler radar and deep learning is expected to be widely useful in the automotive and smart home industries.
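
The SVD-based compression step can be sketched with NumPy as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the function names `compress_svd` and `reconstruct` are hypothetical, and "compression" is assumed here to mean the fraction of singular values that is discarded, which the abstract does not define.

```python
import numpy as np


def compress_svd(image, compression):
    """Keep only the dominant singular triplets of a 2-D image.

    `compression` is ASSUMED to be the fraction of singular values
    discarded (0.9 keeps roughly the top 10%). At least one is kept.
    """
    u, s, vt = np.linalg.svd(image, full_matrices=False)
    k = max(1, int(round(len(s) * (1.0 - compression))))
    return u[:, :k], s[:k], vt[:k, :]


def reconstruct(u, s, vt):
    """Rebuild the low-rank approximation from the kept factors."""
    return (u * s) @ vt


def stored_values(u, s, vt):
    """Number of floats needed to store the truncated factors."""
    return u.size + s.size + vt.size
```

For an image of size m x n, keeping k singular values requires k(m + n + 1) stored numbers instead of mn, which is the source of the memory saving when k is small.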

Improving Memory Efficiency in Heterogeneous MPSoCs through Row-Buffer Locality-aware Forwarding

ACM Transactions on Architecture and Code Optimization ◽

10.1145/3377149 ◽

2020 ◽

Vol 17 (1) ◽

pp. 1-26

Author(s):

Yang Song ◽

Bill Lin

Keyword(s):

Memory Efficiency ◽

Row Buffer Locality

Adjoint-based exact Hessian computation

BIT Numerical Mathematics ◽

10.1007/s10543-020-00833-0 ◽

2021 ◽

Cited By ~ 1

Author(s):

Shin-ichi Ito ◽

Takeru Matsuda ◽

Yuto Miyatake

Keyword(s):

Krylov Subspace ◽

Hessian Matrix ◽

Adjoint System ◽

Coefficient Matrix ◽

Scalar Function ◽

Second Order ◽

Subspace Method ◽

Initial Value ◽

Research Fields ◽

Memory Efficiency

We consider a scalar function depending on a numerical solution of an initial value problem, and its second-derivative (Hessian) matrix with respect to the initial value. The need to extract information from the Hessian, or to solve a linear system having the Hessian as a coefficient matrix, arises in many research fields such as optimization, Bayesian estimation, and uncertainty quantification. From the perspective of memory efficiency, these tasks often employ a Krylov subspace method that does not need to hold the Hessian matrix explicitly and only requires computing the product of the Hessian and a given vector. One way to obtain an approximation of such a Hessian-vector product is to integrate the so-called second-order adjoint system numerically. However, the error in the approximation can be significant even if the numerical integration of the second-order adjoint system is sufficiently accurate. This paper presents a novel algorithm that computes the intended Hessian-vector product exactly and efficiently. To this end, we give a new concise derivation of the second-order adjoint system and show that the intended product can be computed exactly by applying a particular numerical method to the second-order adjoint system. In the discussion, symplectic partitioned Runge–Kutta methods play an essential role.
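
To illustrate why a Krylov subspace method only needs Hessian-vector products, the following sketch solves a linear system with the conjugate gradient method using nothing but a callable `hvp`. It does not implement the paper's adjoint-based algorithm: `conjugate_gradient` and `example_hvp` are hypothetical names, and the hand-coded 2 x 2 symmetric positive definite matrix stands in for a product that would, in the paper's setting, come from the second-order adjoint system.

```python
def dot(a, b):
    """Euclidean inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(hvp, b, tol=1e-10, max_iter=100):
    """Solve H x = b given only v -> H v (H symmetric positive definite).

    The matrix H is never formed or stored; only vectors are kept.
    """
    x = [0.0] * len(b)
    r = list(b)              # residual b - H x for the start x = 0
    p = list(r)              # current search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        hp = hvp(p)
        alpha = rs / dot(p, hp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, hp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def example_hvp(v):
    """Stand-in product for H = [[4, 1], [1, 3]] (illustrative only)."""
    return [4.0 * v[0] + 1.0 * v[1], 1.0 * v[0] + 3.0 * v[1]]
```

Calling `conjugate_gradient(example_hvp, [1.0, 2.0])` approximates the solution (1/11, 7/11). The quality of such a solve depends on how accurately each product is computed, which is the issue the abstract addresses.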
