Exploiting storage redundancy to speed up randomized shared memory simulations


2019 ◽  
Vol 13 ◽  
Author(s):  
Shraddha D. Oza ◽  
Kalyani R. Joshi

Background: Magnetic resonance (MR) imaging plays a significant role in computer-aided diagnostic systems for remote healthcare. In such systems, the soft textures and tissues within the denoised MR image are classified by the segmentation stage using machine learning algorithms such as the Hidden Markov Model. The quality of the MR image is therefore of extreme importance and is decisive for the accuracy of classification and diagnosis. Objective: To provide real-time medical diagnostics in remote intelligent healthcare setups, this work proposes a CUDA GPU-accelerated bilateral filter for fast denoising of 2D high-resolution knee MR images. Method: To achieve optimized GPU performance with better speed-up, the work implements an improved technique that uses on-chip shared memory in combination with the constant cache. Results: A speed-up of 382x is achieved with the proposed optimization technique, 2.7x that obtained with the shared-memory-only approach. The superior speed-up comes with a 90.6% occupancy index, indicating effective parallelization. The work also justifies the appropriateness of the bilateral filter over other filters for denoising magnetic resonance images. All patents related to GPU-based image denoising are reviewed, and the uniqueness of the proposed technique is confirmed. Conclusion: The results indicate that even for a 64-Mpixel image, the execution time of the proposed implementation is only 334.91 ms, making the performance almost real time. This will contribute to the real-time computer-aided diagnostics requirement under remote critical conditions.
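As a CPU-side reference for what the GPU kernel parallelises, the following is a minimal NumPy sketch of the brute-force 2D bilateral filter (each output pixel is a weighted average whose weights combine a spatial Gaussian and an intensity/range Gaussian). Parameter names and the border handling are illustrative, not taken from the paper.

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_s=1.0, sigma_r=0.1):
    """Brute-force 2D bilateral filter (CPU reference, not the CUDA kernel).

    Weights combine spatial closeness and intensity similarity, so
    smoothing is suppressed across strong edges.
    """
    h, w = img.shape
    pad = np.pad(img, radius, mode='edge')
    out = np.zeros_like(img, dtype=np.float64)
    # Precompute the spatial Gaussian over the filter window.
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma_s ** 2))
    for i in range(h):
        for j in range(w):
            window = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            # Range kernel: penalise intensity differences (edge-preserving).
            rng = np.exp(-(window - img[i, j]) ** 2 / (2.0 * sigma_r ** 2))
            wgt = spatial * rng
            out[i, j] = np.sum(wgt * window) / np.sum(wgt)
    return out
```

On the GPU, the per-pixel window reads are exactly what the paper caches in on-chip shared memory, with the precomputed spatial kernel being a natural fit for constant memory.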


1996 ◽  
Vol 162 (2) ◽  
pp. 245-281 ◽  
Author(s):  
Friedhelm Meyer auf der Heide ◽  
Christian Scheideler ◽  
Volker Stemann

2018 ◽  
Vol 18 (5-6) ◽  
pp. 725-758 ◽  
Author(s):  
IAN P. GENT ◽  
IAN MIGUEL ◽  
PETER NIGHTINGALE ◽  
CIARAN MCCREESH ◽  
PATRICK PROSSER ◽  
...  

As multi-core computing is now standard, it seems irresponsible for constraints researchers to ignore its implications. Researchers need to address a number of issues to exploit parallelism, such as: which constraint algorithms are amenable to parallelisation; whether to use shared-memory or distributed computation; whether to use static or dynamic decomposition; and how best to exploit portfolios and cooperating search. We review the literature and see that we can sometimes do quite well, some of the time, on some instances, but we are far from a general solution. Yet there seems to be little overall guidance on how best to exploit multi-core computers to speed up constraint solving. We hope at least that this survey will provide useful pointers to future researchers wishing to correct this situation.
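The portfolio idea mentioned in the survey, i.e. racing several solver configurations on the same instance and taking the first answer, can be sketched in a few lines. The solver callables here are placeholders, not any real constraint solver's API.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def run_portfolio(solvers, problem):
    """Run several solver configurations on the same problem in parallel
    and return the first answer produced (a portfolio, not a decomposition:
    the solvers race rather than share work)."""
    with ThreadPoolExecutor(max_workers=len(solvers)) as pool:
        futures = [pool.submit(solver, problem) for solver in solvers]
        # Block only until the first configuration finishes.
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()
```

A production portfolio would also cancel the losers; here they simply run to completion in the background, which keeps the sketch short.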


2021 ◽  
Vol 35 (2) ◽  
pp. 16-22
Author(s):  
Su-Gyeong Min ◽  
Sung-Chan Kim

This study evaluates the computational efficiency of the FDS model as a function of its parallel processing mode and domain decomposition method, with the aim of enhancing the computational performance of fire simulation. A single compartment of dimensions 12.0 m × 3.8 m × 3.0 m is considered, along with a rectangular fire source (0.4 m × 0.4 m) fueled by n-heptane. The computational domain was divided into 136,000 cells with a grid size of 0.1 m, and the computational efficiency of each calculation was evaluated by the wall-clock time for a simulation time of 300 s on a computational framework with 24 cores of a single CPU and 256 GB of shared memory. The MPI and hybrid modes of FDS offer greater speed-up capability than the OpenMP mode, and the domain decomposition method used greatly affects the computational efficiency. The maximum speed-up with the OpenMP mode was less than 1.5 for a single computational domain, which indicates that there is an optimal condition for thread assignment and domain decomposition in the OpenMP mode. The present study is expected to contribute toward obtaining effective fire simulation results with limited computing power and time in fire protection engineering.
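The speed-up and efficiency figures quoted above follow from wall-clock times in the usual way. A small sketch, with Amdahl's law included as the idealised upper bound (the numbers in the test are illustrative, not the paper's measured data):

```python
def speedup(t_serial, t_parallel):
    """Speed-up from wall-clock times: S = T1 / Tp."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_workers):
    """Parallel efficiency: E = S / p, where p is the number of cores
    (or MPI processes / OpenMP threads) used."""
    return speedup(t_serial, t_parallel) / n_workers

def amdahl(parallel_fraction, n_workers):
    """Amdahl's law: the ideal upper bound on speed-up when only
    parallel_fraction of the runtime parallelises over n_workers."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_workers)
```

An OpenMP speed-up capped below 1.5 on 24 cores corresponds, by Amdahl's law, to only a small fraction of the runtime actually parallelising, which is consistent with the thread-assignment bottleneck the study identifies.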


2017 ◽  
Vol 4 (9) ◽  
pp. 170436 ◽  
Author(s):  
Gang Mei ◽  
Liangliang Xu ◽  
Nengxiong Xu

This paper focuses on designing and implementing parallel adaptive inverse distance weighting (AIDW) interpolation algorithms on the graphics processing unit (GPU). AIDW is an improved version of standard IDW; it adaptively determines the power parameter according to the spatial distribution pattern of the data points and achieves more accurate predictions than IDW. We first present two versions of the GPU-accelerated AIDW: a naive version that does not use shared memory and a tiled version that takes advantage of it. We implement both versions using two data layouts, structure of arrays and array of aligned structures, in both single and double precision. We then evaluate the performance of parallel AIDW by comparing it with the corresponding serial algorithm on three machines equipped with GT730M, M5000 and K40c GPUs. The experimental results indicate that: (i) there is no significant difference in computational efficiency between the two data layouts; (ii) the tiled version is always slightly faster than the naive version; and (iii) in single precision the achieved speed-up can be up to 763 (on the M5000), while in double precision the highest speed-up obtained is 197 (on the K40c). To benefit the community, all source code and testing data related to the presented parallel AIDW algorithm are publicly available.
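A serial sketch of the idea: plain IDW plus an adaptive choice of the power parameter from the local point density around the query. The adaptation rule below is a deliberate simplification for illustration, not the paper's exact formula.

```python
import numpy as np

def idw(xy, values, query, power):
    """Plain inverse distance weighting at a single query point."""
    d = np.sqrt(((xy - query) ** 2).sum(axis=1))
    if np.any(d == 0):                      # query coincides with a sample
        return float(values[np.argmin(d)])
    w = 1.0 / d ** power
    return float((w * values).sum() / w.sum())

def aidw(xy, values, query, k=4, p_min=1.0, p_max=4.0):
    """Adaptive IDW sketch: raise the power parameter when the k nearest
    samples cluster tightly around the query, so near points dominate;
    a sparse neighbourhood falls back toward a low power (smooth blend).
    The density measure here is a simplification, not the paper's rule."""
    d_all = np.sqrt(((xy - query) ** 2).sum(axis=1))
    d_near = np.sort(d_all)[:k]
    density = d_near.mean() / d_all.mean()  # in (0, 1]: small = dense
    power = p_min + (p_max - p_min) * (1.0 - density)
    return idw(xy, values, query, power)
```

In the GPU versions, the inner distance loop over all data points is what the tiled kernel stages through shared memory, one tile of data points at a time.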


Kybernetes ◽  
1992 ◽  
Vol 21 (7) ◽  
pp. 29-47 ◽  
Author(s):  
K.R. Tout ◽  
D.J. Evans

Applies a parallel backward‐chaining technique to a rule‐based expert system on a shared‐memory multiprocessor system. The condition for a processor to split up its search tree (task‐node) and generate new OR nodes is based on the level in the goal tree at which the task‐node is found. The results indicate satisfactory speed‐up performance for a small number of processors (< 10) and a reasonably large number of rules.
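The OR nodes being split across processors are the alternative rules that could prove a goal; a minimal sequential backward chainer makes that branching explicit. This is an illustrative sketch of the technique, not the authors' implementation, and it assumes an acyclic rule base apart from the guard shown.

```python
def backward_chain(goal, rules, facts, _seen=frozenset()):
    """Depth-first backward chaining: a goal holds if it is a known fact,
    or if some rule concludes it and all of that rule's premises hold.
    Each matching rule is an OR branch; a parallel version would hand
    these branches to different processors."""
    if goal in facts:
        return True
    if goal in _seen:                       # guard against cyclic rules
        return False
    for premises, conclusion in rules:      # OR nodes: alternative rules
        if conclusion == goal and all(
                backward_chain(p, rules, facts, _seen | {goal})
                for p in premises):         # AND nodes: rule premises
            return True
    return False
```

The splitting condition described in the abstract (spawn new tasks only above a certain depth in the goal tree) would go where the loop iterates over matching rules, so that deep, cheap subgoals stay on one processor.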


1999 ◽  
Vol 09 (04) ◽  
pp. 475-485 ◽  
Author(s):  
YOSHIHIDE IGARASHI ◽  
YASUAKI NISHITANI

We propose two modifications of Peterson's n-process mutual exclusion algorithm for the asynchronous multi-writer/reader shared memory model. Either modification speeds up the original n-process algorithm. The running times for the trying regions of the first and second modified algorithms are (2n - 3)c + O(n^3 l) and (n - 1)c + O(n^3 l), respectively, where n is the number of processes, l is an upper bound on the time between two steps, and c is an upper bound on the time that any user spends in the critical region. These running times improve on the running time O(n^2 c + n^4 l) of the original n-process algorithm for the same asynchronous shared memory model.
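For context, the shared-memory model in question can be illustrated with the textbook filter lock, an n-process generalisation of Peterson's 2-process algorithm using only multi-writer/reader registers. This is background for the model, not either of the paper's modified algorithms.

```python
import threading

class FilterLock:
    """Textbook n-process filter lock: a process climbs n-1 levels,
    waiting at each level while it is the victim there and some other
    process is at the same level or higher. At most one process at a
    time passes the top level."""
    def __init__(self, n):
        self.n = n
        self.level = [0] * n        # highest level each process reached
        self.victim = [0] * n       # last process to arrive at each level

    def lock(self, i):
        for lvl in range(1, self.n):
            self.level[i] = lvl
            self.victim[lvl] = i
            # Spin while another process is at this level or above
            # and we are still the victim at this level.
            while (any(self.level[k] >= lvl
                       for k in range(self.n) if k != i)
                   and self.victim[lvl] == i):
                pass

    def unlock(self, i):
        self.level[i] = 0
```

The running-time bounds in the abstract count exactly this kind of busy-waiting in the trying region, in terms of the step-time bound l and the critical-region bound c.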


Author(s):  
Brian Cross

A relatively new entry in the field of microscopy is the Scanning X-Ray Fluorescence Microscope (SXRFM). Using this type of instrument (e.g. the Kevex Omicron X-ray Microprobe), one can obtain multiple elemental x-ray images from the analysis of heterogeneous materials. The SXRFM obtains images by collimating an x-ray beam (e.g. 100 μm diameter) and then scanning the sample with a high-speed x-y stage. To speed up image acquisition, data is acquired "on-the-fly" by slew-scanning the stage along the x-axis, like a TV or SEM scan. To reduce the overhead from "fly-back," the images can be acquired by bi-directional scanning of the x-axis, which leaves very little overhead for re-positioning of the sample stage. The image acquisition rate is dominated by the x-ray acquisition rate; therefore the total x-ray image acquisition rate of the SXRFM is very comparable to an SEM's. Although the x-ray spatial resolution of the SXRFM is worse than an SEM's (say 100 vs. 2 μm), there are several other advantages.
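The bi-directional ("fly-back"-free) scan order is easy to state precisely; a small sketch of the coordinate sequence (the stage-control details themselves are, of course, instrument-specific and not shown):

```python
def serpentine_scan(n_rows, n_cols):
    """Yield (row, col) coordinates in bi-directional (boustrophedon)
    raster order: even rows left-to-right, odd rows right-to-left,
    so the stage never has to fly back to the start of a line."""
    for r in range(n_rows):
        cols = range(n_cols) if r % 2 == 0 else range(n_cols - 1, -1, -1)
        for c in cols:
            yield (r, c)
```

Consecutive coordinates always differ by one step, which is why the re-positioning overhead described above nearly vanishes.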


Author(s):  
A. G. Jackson ◽  
M. Rowe

Diffraction intensities from intermetallic compounds are, in the kinematic approximation, proportional to the scattering amplitude of the element doing the scattering. More detailed calculations have shown that site symmetry and occupation by various atom species also affect the intensity of a diffracted beam [1]. Hence, by measuring the intensities of beams, or their ratios, the occupancy can be estimated. Measurement of the intensity values also allows structure calculations to be made to determine the spatial distribution of the potentials doing the scattering. Thermal effects are also present as a background contribution. Inelastic effects such as loss or absorption/excitation complicate the intensity behavior, and dynamical theory is required to estimate the intensity value.

The dynamic range of currents in diffracted beams can be 10^4 or 10^5:1. Hence, detection of such information requires a means for collecting the intensity over a signal-to-noise range beyond that obtainable with a single film plate, which has a S/N of about 10^3:1. Although such a collection system is not available currently, a simple system consisting of instrumentation on an existing STEM can be used as a proof of concept; it has a S/N of about 255:1, limited by the 8-bit pixel attributes used in the electronics. Use of 24-bit pixel attributes would easily allow the desired noise range to be attained in the processing instrumentation. The S/N of the scintillator used by the photoelectron sensor is about 10^6 to 1, well beyond the S/N goal. The trade-off that must be made is the time for acquiring the signal: the pattern can be obtained in seconds using film plates, compared to 10 to 20 minutes using the digital scan. Parallel acquisition would, of course, speed up this process immensely.
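The bit-depth arithmetic above is just powers of two: 8-bit pixels give 2^8 = 256 quantisation levels (about 255:1), while 24 bits comfortably cover a 10^6:1 range. A one-line helper makes the relation explicit:

```python
import math

def bits_for_dynamic_range(ratio):
    """Smallest number of bits whose 2**b quantisation levels cover a
    given intensity ratio (e.g. a 10**4:1 beam-current range)."""
    return math.ceil(math.log2(ratio))
```

By this count, the 10^4-10^5:1 beam-current range needs 14-17 bits, which is why 8-bit electronics cap the proof-of-concept system well below the goal and 24-bit attributes clear it with room to spare.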

