Multi-core and many-core shared-memory parallel raycasting volume rendering optimization and tuning

Author(s):  
E Wes Bethel ◽  
Mark Howison

Given the computing industry trend of increasing processing capacity by adding more cores to a chip, the focus of this work is tuning the performance of a staple visualization algorithm, raycasting volume rendering, for shared-memory parallelism on multi-core CPUs and many-core GPUs. Our approach is to vary tunable algorithmic settings, along with known algorithmic optimizations and two different memory layouts, and measure performance in terms of absolute runtime and L2 memory cache misses. Our results indicate there is a wide variation in runtime performance on all platforms, as much as 254% for the tunable parameters we test on multi-core CPUs and 265% on many-core GPUs, and the optimal configurations vary across platforms, often in a non-obvious way. For example, our results indicate the optimal configurations on the GPU occur at a crossover point between those that maintain good cache utilization and those that saturate computational throughput. This result is likely to be extremely difficult to predict with an empirical performance model for this particular algorithm because it has an unstructured memory access pattern that varies locally for individual rays and globally for the selected viewpoint. Our results also show that optimal parameters on modern architectures are markedly different from those in previous studies run on older architectures. In addition, given the dramatic performance variation across platforms for both optimal algorithm settings and performance results, there is a clear benefit for production visualization and analysis codes to adopt a strategy for performance optimization through auto-tuning. These benefits will likely become more pronounced in the future as the number of cores per chip and the cost of moving data through the memory hierarchy both increase.
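The auto-tuning strategy argued for above can be illustrated with a minimal sketch: time a kernel over every combination of tunable settings and keep the fastest. This is not the authors' tuner. The `autotune` function, the `toy_kernel` stand-in, and the parameter names `tile` and `layout` (with the values `"row_major"` and `"tiled"`) are hypothetical and chosen only for illustration.

```python
import itertools
import time


def autotune(kernel, param_grid, repeats=3, timer=time.perf_counter):
    """Exhaustively time `kernel` over every combination in `param_grid`.

    Returns (best_config, best_seconds, all_results), where all_results
    is a list of (config, seconds) pairs in sweep order.
    """
    names = sorted(param_grid)
    results = []
    for values in itertools.product(*(param_grid[n] for n in names)):
        config = dict(zip(names, values))
        # Keep the minimum over several runs to damp timing noise.
        best = float("inf")
        for _ in range(repeats):
            start = timer()
            kernel(**config)
            best = min(best, timer() - start)
        results.append((config, best))
    best_config, best_seconds = min(results, key=lambda r: r[1])
    return best_config, best_seconds, results


def toy_kernel(tile, layout):
    # Hypothetical stand-in for a renderer: does some arithmetic whose
    # cost depends on the settings. A real tuner would call the actual
    # rendering code here.
    work = 20000 // tile + (5000 if layout == "row_major" else 0)
    return sum(i * i for i in range(work))


if __name__ == "__main__":
    grid = {"tile": [4, 16, 64], "layout": ["row_major", "tiled"]}
    config, seconds, _ = autotune(toy_kernel, grid)
    print(config, f"{seconds:.6f} s")
```

An exhaustive sweep is the simplest possible search and is only practical for small grids; it is used here because the abstract reports that optimal settings vary across platforms in non-obvious ways, so the sweep has to be rerun on each target machine.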

2019 ◽  
Author(s):  
Henrique Freitas ◽  
Celso Luiz Mendes

The Roofline model gives insights into the performance behavior of applications bounded by either memory or processor limits, providing useful guidelines for performance improvements. This work applies the Roofline model to the analysis of the MGB model, which simulates hydrological processes in large-scale watersheds. Real-world input data are used to characterize the performance on two multicore architectures, one with only CPUs and one with CPUs/GPU. The MGB model performance is improved with optimizations for better memory use, and also with shared-memory (OpenMP) and GPU (OpenACC) parallelism. CPU performance achieves 42.51% and 50.17% of each system’s peak, whereas GPU performance is low due to overheads caused by the MGB model structure.
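As a brief illustration of the bound the Roofline model expresses, attainable performance is the smaller of the processor's peak rate and the product of memory bandwidth and arithmetic intensity. The sketch below uses hypothetical function names and made-up machine numbers; none of the values come from the paper.

```python
def roofline_gflops(peak_gflops, bandwidth_gbs, intensity_flops_per_byte):
    """Attainable GFLOP/s under the Roofline bound."""
    return min(peak_gflops, bandwidth_gbs * intensity_flops_per_byte)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (FLOP/byte) at which the bound changes
    from memory-limited to compute-limited."""
    return peak_gflops / bandwidth_gbs


def percent_of_peak(achieved_gflops, peak_gflops):
    """Achieved performance as a percentage of the machine peak."""
    return 100.0 * achieved_gflops / peak_gflops


if __name__ == "__main__":
    # Hypothetical machine: 100 GFLOP/s peak, 20 GB/s memory bandwidth.
    peak, bandwidth = 100.0, 20.0
    for intensity in (0.5, 2.0, 10.0):
        bound = roofline_gflops(peak, bandwidth, intensity)
        print(f"intensity {intensity} FLOP/byte: bound {bound} GFLOP/s")
    print("ridge point:", ridge_point(peak, bandwidth), "FLOP/byte")
```

A kernel whose arithmetic intensity lies below the ridge point is memory-bound, which is why optimizations for better memory use can raise the achieved fraction of peak.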


Author(s):  
Lucio Salles de Salles ◽  
Lev Khazanovich

The Pavement ME transverse joint faulting model incorporates mechanistic theories that predict development of joint faulting in jointed plain concrete pavements (JPCP). The model is calibrated using the Long-Term Pavement Performance database. However, the Mechanistic-Empirical Pavement Design Guide (MEPDG) encourages transportation agencies, such as state departments of transportation, to perform local calibrations of the faulting model included in Pavement ME. Model calibration is a complicated and effort-intensive process that requires high-quality pavement design and performance data. Pavement management data, which are collected regularly and in large amounts, may present higher variability than is desired for faulting performance model calibration. The MEPDG performance prediction models predict pavement distresses with 50% reliability. JPCP are usually designed for high levels of faulting reliability to reduce the likelihood of excessive faulting. For design, improving the faulting reliability model is as important as improving the faulting prediction model. This paper proposes a calibration of the Pavement ME reliability model using pavement management system (PMS) data. It illustrates the proposed approach using PMS data from the Pennsylvania Department of Transportation. Results show an increase in accuracy for faulting predictions using the new reliability model with various design characteristics. Moreover, the new reliability model allows design of JPCP considering higher levels of traffic because of the less conservative predictions.
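The relationship between the 50% reliability prediction and a design-level prediction can be sketched as follows, assuming the usual form in which the mean prediction is shifted upward by a standard normal deviate times a standard deviation. The function name, the input values, and the treatment of the standard deviation as a given input are assumptions for illustration; the paper's calibrated reliability model is not reproduced here.

```python
from statistics import NormalDist


def faulting_at_reliability(mean_faulting, std_faulting, reliability):
    """Shift a mean (50% reliability) faulting prediction to a chosen
    reliability level, assuming normally distributed prediction error.

    mean_faulting and std_faulting share the same unit (e.g. inches);
    reliability is a probability strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf(reliability)  # standard normal deviate
    return mean_faulting + z * std_faulting


if __name__ == "__main__":
    # Hypothetical values, not taken from the paper.
    mean_in, std_in = 0.10, 0.03
    for level in (0.50, 0.90, 0.95):
        value = faulting_at_reliability(mean_in, std_in, level)
        print(f"reliability {level:.2f}: {value:.4f} in")
```

Under this form, a smaller standard deviation yields a less conservative design value at the same reliability level, which is the mechanism behind the abstract's remark about designing for higher traffic.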


Author(s):  
Kui Xu ◽  
Ming Zhang ◽  
Jie Liu ◽  
Nan Sha ◽  
Wei Xie ◽  
...  

In this paper, we design a simultaneous wireless information and power transfer (SWIPT) protocol for a massive multi-input multi-output (mMIMO) system with non-linear energy-harvesting (EH) terminals. In this system, the base station (BS) serves a set of uplink fixed half-duplex (HD) terminals with non-linear energy harvesters. Considering the non-linearity of practical energy-harvesting circuits, we adopt a realistic non-linear EH model rather than the idealized linear EH model. The proposed SWIPT protocol is divided into two phases. The first phase is designed for terminal EH and downlink training. A beam-domain energy beamforming method is employed for the wireless power transmission. In the second phase, the BS forms two-layer receive beamformers for the reception of signals transmitted by the terminals. To improve the spectral efficiency (SE) of the system, the BS transmit power and the time-switching ratios are optimized. Simulation results show that the proposed beam-domain SWIPT protocol outperforms conventional mMIMO SWIPT protocols in SE.

