Multi-core and many-core shared-memory parallel raycasting volume rendering optimization and tuning

Given the computing industry trend of increasing processing capacity by adding more cores to a chip, the focus of this work is tuning the performance of a staple visualization algorithm, raycasting volume rendering, for shared-memory parallelism on multi-core CPUs and many-core GPUs. Our approach is to vary tunable algorithmic settings, along with known algorithmic optimizations and two different memory layouts, and measure performance in terms of absolute runtime and L2 memory cache misses. Our results indicate there is a wide variation in runtime performance on all platforms, as much as 254% for the tunable parameters we test on multi-core CPUs and 265% on many-core GPUs, and the optimal configurations vary across platforms, often in a non-obvious way. For example, our results indicate the optimal configurations on the GPU occur at a crossover point between those that maintain good cache utilization and those that saturate computational throughput. This result is likely to be extremely difficult to predict with an empirical performance model for this particular algorithm because it has an unstructured memory access pattern that varies locally for individual rays and globally for the selected viewpoint. Our results also show that optimal parameters on modern architectures are markedly different from those in previous studies run on older architectures. In addition, given the dramatic performance variation across platforms for both optimal algorithm settings and performance results, there is a clear benefit for production visualization and analysis codes to adopt a strategy for performance optimization through auto-tuning. These benefits will likely become more pronounced in the future as the number of cores per chip and the cost of moving data through the memory hierarchy both increase.
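The auto-tuning strategy the abstract recommends can be illustrated with a minimal sketch of an exhaustive parameter search. This is not the authors' tuning harness: the function names, the parameter names (`tile_size`, `layout`) and the layout labels are hypothetical placeholders, and the cost function stands in for whatever is measured on a given platform (runtime, cache misses).

```python
import itertools
import time


def autotune(cost, search_space):
    """Evaluate `cost` on every configuration and return the cheapest one.

    `search_space` maps each (hypothetical) parameter name to its candidate values.
    """
    names = list(search_space)
    best_config, best_cost = None, float("inf")
    for values in itertools.product(*(search_space[n] for n in names)):
        config = dict(zip(names, values))
        measured = cost(**config)
        if measured < best_cost:
            best_config, best_cost = config, measured
    return best_config, best_cost


def timed(kernel, repeats=3):
    """Wrap a kernel so its cost is the fastest wall-clock time over `repeats` runs."""
    def cost(**config):
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(**config)
            samples.append(time.perf_counter() - start)
        return min(samples)  # the minimum is less sensitive to timing noise
    return cost


# Hypothetical search space: a work-tile size and two memory layouts.
SEARCH_SPACE = {"tile_size": [4, 8, 16, 32], "layout": ["layout_a", "layout_b"]}
```

A real renderer would be passed as `autotune(timed(render), SEARCH_SPACE)`, with `render` accepting the same keyword arguments. Because the best configuration differs across platforms, the search would be rerun on each target machine rather than cached globally.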

Parallelization and performance optimization of calculation in three-dimensional underwater acoustic propagation on modern many-core processor

2017 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC)

DOI: 10.1109/icspcc.2017.8242531

2017

Author(s): Min Xu, Yongxian Wang

Keyword(s): Performance Optimization, Three Dimensional, Acoustic Propagation, Underwater Acoustic, And Performance, Many Core

Roofline Analysis and Performance Optimization of the MGB Hydrological Model

DOI: 10.5753/wscad.2019.8657

2019

Author(s): Henrique Freitas, Celso Luiz Mendes

Keyword(s): Shared Memory, Performance Optimization, Hydrological Model, Model Performance, Model Structure, Multicore Architectures, Performance Improvements, Roofline Model, And Performance, Performance Behavior

The Roofline model gives insight into the performance behavior of applications bounded by either memory or processor limits, providing useful guidelines for performance improvements. This work uses the Roofline model to analyze the MGB model, which simulates hydrological processes in large-scale watersheds. Real-world input data are used to characterize performance on two multicore architectures, one with only CPUs and one with CPUs/GPU. The MGB model performance is improved with optimizations for better memory use, and also with shared-memory (OpenMP) and GPU (OpenACC) parallelism. CPU performance achieves 42.51% and 50.17% of each system’s peak, whereas GPU performance is low due to overheads caused by the MGB model structure.
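For readers unfamiliar with the Roofline model, its bound can be written in a few lines: attainable performance is the smaller of the machine's peak compute rate and the product of arithmetic intensity and peak memory bandwidth. The sketch below uses hypothetical machine numbers, not values from the paper, and the function names are illustrative.

```python
def roofline_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Attainable GFLOP/s for a kernel with the given FLOP/byte intensity."""
    return min(peak_gflops, arithmetic_intensity * peak_bandwidth_gbs)


def ridge_point(peak_gflops, peak_bandwidth_gbs):
    """Arithmetic intensity (FLOP/byte) at which a kernel stops being memory-bound."""
    return peak_gflops / peak_bandwidth_gbs


def is_memory_bound(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """True when the bandwidth roof, not the compute roof, limits the kernel."""
    return arithmetic_intensity < ridge_point(peak_gflops, peak_bandwidth_gbs)


# Hypothetical machine: 100 GFLOP/s peak compute, 50 GB/s peak memory bandwidth.
PEAK_GFLOPS, PEAK_BW_GBS = 100.0, 50.0
```

On this hypothetical machine a kernel with an intensity of 0.5 FLOP/byte sits under the bandwidth roof, which is why memory-use optimizations of the kind the abstract mentions can raise performance before any change to the arithmetic.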

Injury Prevention and Performance Optimization in Soldiers of the Army 101st Airborne/Air Assault Division

DOI: 10.21236/ada618622

2011

Author(s): Scott Lephart

Keyword(s): Injury Prevention, Performance Optimization, And Performance

Implementation and Validation of Sandia Outdoor Photovoltaic Test Method and Performance Model at Arizona State University

2006 IEEE 4th World Conference on Photovoltaic Energy Conference

DOI: 10.1109/wcpec.2006.279947

2006

Cited By ~ 1

Author(s): Bo Li, David King, William Boyson, Govindasamy TamizhMani

Keyword(s): Performance Model, Test Method, State University, Arizona State University, And Performance

Local Calibration of Pavement Mechanistic-Empirical Faulting Reliability using Pavement Management Data

Transportation Research Record Journal of the Transportation Research Board

DOI: 10.1177/03611981211001392

2021

pp. 036119812110013

Author(s): Lucio Salles de Salles, Lev Khazanovich

Keyword(s): Model Calibration, Prediction Models, Plain Concrete, Performance Model, Pavement Management, Pavement Design, Reliability Model, Transverse Joint, Design Characteristics, And Performance

The Pavement ME transverse joint faulting model incorporates mechanistic theories that predict the development of joint faulting in jointed plain concrete pavements (JPCP). The model is calibrated using the Long-Term Pavement Performance database. However, the Mechanistic-Empirical Pavement Design Guide (MEPDG) encourages transportation agencies, such as state departments of transportation, to perform local calibrations of the faulting model included in Pavement ME. Model calibration is a complicated and effort-intensive process that requires high-quality pavement design and performance data. Pavement management data, which is collected regularly and in large amounts, may present higher variability than is desired for faulting performance model calibration. The MEPDG performance prediction models predict pavement distresses with 50% reliability. JPCP are usually designed for high levels of faulting reliability to reduce the likelihood of excessive faulting. For design, improving the faulting reliability model is as important as improving the faulting prediction model. This paper proposes a calibration of the Pavement ME reliability model using pavement management system (PMS) data. It illustrates the proposed approach using PMS data from the Pennsylvania Department of Transportation. Results show an increase in accuracy for faulting predictions using the new reliability model with various design characteristics. Moreover, the new reliability model allows design of JPCP considering higher levels of traffic because of the less conservative predictions.
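The relationship between a 50% reliability prediction and a design-level prediction can be sketched as follows. The sketch assumes normally distributed prediction error, so the value at reliability R is the mean prediction plus the standard normal deviate for R times a standard deviation. The function name and the numeric inputs are hypothetical, and the paper's calibrated standard-deviation model is not given in the abstract.

```python
from statistics import NormalDist


def faulting_at_reliability(mean_faulting, std_faulting, reliability):
    """Faulting not expected to be exceeded with probability `reliability`.

    Assumes normally distributed prediction error (an assumption of this sketch).
    """
    z = NormalDist().inv_cdf(reliability)  # standard normal deviate for R
    return mean_faulting + z * std_faulting


# Hypothetical inputs: mean predicted faulting and its standard deviation, in inches.
MEAN_FAULTING, STD_FAULTING = 0.10, 0.03
```

At 50% reliability the deviate is zero and the result equals the mean prediction. A smaller calibrated standard deviation lowers the design-level value at any reliability above 50%, which is consistent with the less conservative predictions the abstract reports.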

Design and performance optimization of microchannel condensers for electric vehicles

International Journal of Energy Research

DOI: 10.1002/er.6576

2021

Author(s): Ni Liu, Huan Li, Kang Li, Yidong Fang, Lin Su, ...

Keyword(s): Electric Vehicles, Performance Optimization, And Performance

SWIPT in mMIMO system with non-linear energy-harvesting terminals: protocol design and performance optimization

EURASIP Journal on Wireless Communications and Networking

DOI: 10.1186/s13638-019-1378-4

2019

Vol 2019 (1)

Cited By ~ 2

Author(s): Kui Xu, Ming Zhang, Jie Liu, Nan Sha, Wei Xie, ...

Keyword(s): Energy Harvesting, Performance Optimization, Power Transmission, Base Station, Second Phase, Wireless Power, Half Duplex, Non Linear, Two Phases, And Performance

In this paper, we design a simultaneous wireless information and power transfer (SWIPT) protocol for massive multi-input multi-output (mMIMO) systems with non-linear energy-harvesting (EH) terminals. In this system, the base station (BS) serves a set of fixed uplink half-duplex (HD) terminals with non-linear energy harvesters. Considering the non-linearity of practical energy-harvesting circuits, we adopt a realistic non-linear EH model rather than the idealized linear EH model. The proposed SWIPT protocol is divided into two phases. The first phase is designed for terminal EH and downlink training. A beam-domain energy beamforming method is employed for the wireless power transmission. In the second phase, the BS forms two-layer receive beamformers for the reception of signals transmitted by the terminals. To improve the spectral efficiency (SE) of the system, the BS transmit power- and time-switching ratios are optimized. Simulation results show the superiority of the proposed beam-domain SWIPT protocol in SE performance compared with conventional mMIMO SWIPT protocols.
