Optimization of Lattice Boltzmann Simulation With Graphics-Processing-Unit Parallel Computing and the Application in Reservoir Characterization

SPE Journal ◽  
2016 ◽  
Vol 21 (04) ◽  
pp. 1425-1435 ◽  
Author(s):  
Cheng Chen ◽  
Zheng Wang ◽  
Deepak Majeti ◽  
Nick Vrvilo ◽  
Timothy Warburton ◽  
...  

Summary Shale permeability is sufficiently low to require an unconventional scale of stimulation treatments, such as very-large-volume, high-rate, multistage hydraulic-fracturing applications. Upscaling of hydrocarbon-transport processes in shales is challenging because of the low permeability and strong heterogeneity. Rock characterization with high-resolution imaging [X-ray tomography and scanning electron microscopy (SEM)] is usually highly localized and contains significant uncertainties because of the small field of view. Therefore, an effective high-performance-computing method is required to collect information over a larger scale to meet the ergodicity requirement in upscaling. The lattice Boltzmann (LB) method has received significant attention in computational fluid dynamics because of its ability to cope with complicated boundary conditions. The combination of high-resolution imaging and LB simulation is a powerful approach for evaluating the transport properties of a porous medium in a timely manner, on the basis of the numerical solution of the Navier-Stokes equations and Darcy's law. In this work, a graphics-processing-unit (GPU)-enhanced lattice Boltzmann simulator (GELBS) was developed and optimized with GPU parallel computing, exploiting the inherent parallelism of the LB method. Specifically, the LB method was used to implement the computational kernel; a sparse data structure was applied to optimize memory allocation; and the OCCA (Medina et al. 2014) portability library was used, which enables the GELBS codes to target different application-programming interfaces (APIs), including open computing language (OpenCL), compute unified device architecture (CUDA), and open multiprocessing (OpenMP). OpenCL is an open standard for cross-platform parallel computing, CUDA is supported only by NVIDIA devices, and OpenMP is primarily used on central processing units (CPUs). The GPU-accelerated code was approximately 1,000 times faster than the unoptimized serial code and 10 times faster than the parallel code run on a standalone CPU. The CUDA code was slightly faster than the OpenCL code on the NVIDIA GPU because of the extra overhead OpenCL incurs to support heterogeneous platforms. GELBS was validated against analytical solutions, laboratory measurements, and other independent numerical simulators from previous studies, and was shown to have second-order global accuracy. GELBS was then used to analyze thin cuttings extracted from a sandstone reservoir and a shale-gas reservoir. The sandstone permeabilities were found to be relatively isotropic, whereas the shale permeabilities were strongly anisotropic because of the horizontal lamination structure. In the shale cuttings, the average permeability in the horizontal direction was higher than that in the vertical direction by approximately two orders of magnitude. Correlations between porosity and permeability were observed in both rock types. The combination of GELBS and high-resolution imaging methods provides a powerful tool for permeability evaluation when conventional laboratory measurement is impossible because of the small size of the cuttings. The constitutive correlations between geometry and transport properties can be used for upscaling in different rock types. The GPU-optimized code significantly accelerates the computation; thus, many more samples can be analyzed in the same processing time. Consequently, the ergodicity requirement is met, which leads to better reservoir characterization.
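
As a rough illustration of the inherent parallelism the summary describes, the sketch below implements a single D2Q9 BGK collide-and-stream step in CUDA, with one thread per lattice node and a structure-of-arrays layout so that neighboring threads read contiguous memory. This is a minimal sketch under assumed parameters (NX, NY, TAU) on a dense, fully periodic domain; GELBS itself uses a sparse data structure, OCCA-portable kernels, and solid-wall boundary conditions, none of which are shown here.

```cuda
// Minimal D2Q9 BGK collide-and-stream sketch (dense, fully periodic,
// no solid nodes). Illustrative only: GELBS itself uses a sparse data
// structure and OCCA kernels. NX, NY, and TAU are assumed values.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define NX  256
#define NY  256
#define TAU 0.8f   // BGK relaxation time (assumed)

__constant__ int   ex[9] = { 0, 1, 0,-1, 0, 1,-1,-1, 1 };
__constant__ int   ey[9] = { 0, 0, 1, 0,-1, 1, 1,-1,-1 };
__constant__ float w [9] = { 4.f/9, 1.f/9, 1.f/9, 1.f/9, 1.f/9,
                             1.f/36, 1.f/36, 1.f/36, 1.f/36 };

__global__ void collideStream(const float* f_in, float* f_out)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= NX || y >= NY) return;
    int n = y * NX + x;

    // Macroscopic moments (density, velocity) from the distributions.
    float f[9], rho = 0.f, ux = 0.f, uy = 0.f;
    for (int i = 0; i < 9; ++i) {
        f[i] = f_in[i * NX * NY + n];   // SoA layout: coalesced reads
        rho += f[i];
        ux  += f[i] * ex[i];
        uy  += f[i] * ey[i];
    }
    ux /= rho;  uy /= rho;
    float usq = ux * ux + uy * uy;

    // BGK collision followed by a "push" streaming with periodic wrap.
    for (int i = 0; i < 9; ++i) {
        float eu  = ex[i] * ux + ey[i] * uy;
        float feq = w[i] * rho * (1.f + 3.f*eu + 4.5f*eu*eu - 1.5f*usq);
        float fpc = f[i] - (f[i] - feq) / TAU;
        int xn = (x + ex[i] + NX) % NX;
        int yn = (y + ey[i] + NY) % NY;
        f_out[i * NX * NY + yn * NX + xn] = fpc;
    }
}

int main()
{
    size_t cells = (size_t)NX * NY, bytes = 9 * cells * sizeof(float);
    const float w0[9] = { 4.f/9, 1.f/9, 1.f/9, 1.f/9, 1.f/9,
                          1.f/36, 1.f/36, 1.f/36, 1.f/36 };
    std::vector<float> h(9 * cells);
    for (int i = 0; i < 9; ++i)                 // equilibrium at rest, rho = 1
        for (size_t c = 0; c < cells; ++c) h[i * cells + c] = w0[i];

    float *f_in, *f_out;
    cudaMalloc(&f_in, bytes);
    cudaMalloc(&f_out, bytes);
    cudaMemcpy(f_in, h.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16), grid((NX + 15) / 16, (NY + 15) / 16);
    for (int t = 0; t < 100; ++t) {             // ping-pong the two buffers
        collideStream<<<grid, block>>>(f_in, f_out);
        float* tmp = f_in; f_in = f_out; f_out = tmp;
    }
    cudaDeviceSynchronize();
    printf("ran 100 D2Q9 steps on a %dx%d lattice\n", NX, NY);
    cudaFree(f_in);
    cudaFree(f_out);
    return 0;
}
```

The ping-pong between f_in and f_out avoids read-write conflicts during streaming. Because every node's update is independent of the others within a timestep, the kernel maps naturally onto thousands of GPU threads, which is the source of the speedups the summary reports.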

2021 ◽  
Vol 20 (3) ◽  
pp. 1-22 ◽
Author(s):  
David Langerman ◽  
Alan George

High-resolution, low-latency applications in computer vision are ubiquitous in today's world of mixed-reality devices. These devices provide a platform that can leverage the improving technology of depth sensors and embedded accelerators to enable higher-resolution, lower-latency processing of 3D scenes using depth-upsampling algorithms. This research demonstrates that filter-based upsampling algorithms are feasible for mixed-reality applications using low-power hardware accelerators. The authors parallelized and evaluated a depth-upsampling algorithm on two different devices: a reconfigurable-logic FPGA embedded within a low-power SoC, and a fixed-logic embedded graphics processing unit (GPU). Both accelerators are shown to meet the real-time requirement of 11-ms latency for mixed-reality applications.
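
The abstract does not reproduce the exact filter used, but a representative filter-based depth-upsampling method is joint bilateral upsampling (Kopf et al. 2007), in which a high-resolution guide image steers the interpolation of a low-resolution depth map. The CUDA sketch below is a hedged illustration of that idea with one thread per output pixel; the kernel name, scale factor, window radius, and Gaussian sigmas are all assumptions, not values from the paper.

```cuda
// Hedged sketch of filter-based depth upsampling: joint bilateral
// upsampling guided by a high-resolution intensity image. This is a
// representative algorithm, not necessarily the one the authors
// accelerated; SCALE, RADIUS, and the sigmas are assumed values.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define SCALE   4       // upsampling factor (assumed)
#define RADIUS  2       // window radius in low-res pixels (assumed)
#define SIGMA_S 1.0f    // spatial Gaussian sigma (assumed)
#define SIGMA_R 0.1f    // range (intensity) Gaussian sigma (assumed)

__global__ void jointBilateralUpsample(
    const float* depthLo, int wLo, int hLo,   // low-res depth map
    const float* guideHi,                     // high-res guide intensity
    float* depthHi, int wHi, int hHi)         // upsampled output
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= wHi || y >= hHi) return;

    float gx = x / (float)SCALE, gy = y / (float)SCALE; // low-res coords
    float gC = guideHi[y * wHi + x];                    // center guide value
    float sum = 0.f, wsum = 0.f;

    for (int dy = -RADIUS; dy <= RADIUS; ++dy) {
        for (int dx = -RADIUS; dx <= RADIUS; ++dx) {
            int lx = min(max((int)gx + dx, 0), wLo - 1);
            int ly = min(max((int)gy + dy, 0), hLo - 1);
            // Guide sampled at the high-res pixel aligned with (lx, ly).
            float gN = guideHi[(ly * SCALE) * wHi + (lx * SCALE)];
            float ds = (lx - gx) * (lx - gx) + (ly - gy) * (ly - gy);
            float dr = (gN - gC) * (gN - gC);
            float wgt = __expf(-ds / (2.f * SIGMA_S * SIGMA_S)
                               - dr / (2.f * SIGMA_R * SIGMA_R));
            sum  += wgt * depthLo[ly * wLo + lx];
            wsum += wgt;
        }
    }
    depthHi[y * wHi + x] = sum / wsum;  // wsum > 0: weights are positive
}

int main()
{
    const int wLo = 160, hLo = 120;            // synthetic test sizes
    const int wHi = wLo * SCALE, hHi = hLo * SCALE;
    std::vector<float> dLo((size_t)wLo * hLo, 1.0f);  // flat synthetic depth
    std::vector<float> gHi((size_t)wHi * hHi, 0.5f);  // flat synthetic guide

    float *dLoD, *gHiD, *dHiD;
    cudaMalloc(&dLoD, dLo.size() * sizeof(float));
    cudaMalloc(&gHiD, gHi.size() * sizeof(float));
    cudaMalloc(&dHiD, (size_t)wHi * hHi * sizeof(float));
    cudaMemcpy(dLoD, dLo.data(), dLo.size() * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaMemcpy(gHiD, gHi.data(), gHi.size() * sizeof(float),
               cudaMemcpyHostToDevice);

    dim3 block(16, 16), grid((wHi + 15) / 16, (hHi + 15) / 16);
    jointBilateralUpsample<<<grid, block>>>(dLoD, wLo, hLo, gHiD,
                                            dHiD, wHi, hHi);
    cudaDeviceSynchronize();
    printf("upsampled %dx%d depth to %dx%d\n", wLo, hLo, wHi, hHi);
    cudaFree(dLoD); cudaFree(gHiD); cudaFree(dHiD);
    return 0;
}
```

Because each output pixel depends only on a small fixed window, the same structure maps onto either a GPU kernel or a pipelined FPGA datapath, which is what makes both accelerators in the study viable targets for this class of filter.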


2017 ◽  
Vol 29 (3) ◽  
Author(s):  
Simon Lucas Winberg ◽  
Moeko Ramone ◽  
Khagendra Naidoo

The Cape Floristic Kingdom (CFK) is the most diverse floristic kingdom in the world and has been declared an international heritage site. However, it is under threat from wildfires and invasive species. Much of the work of managing this natural resource, such as removing alien vegetation or fighting wildfires, is done by volunteers and casual workers. Many fynbos species, for which the Table Mountain National Park is known, are difficult to identify, particularly for non-expert volunteers. Accurate and fast identification of plant species would be beneficial in these contexts. The Fynbos Leaf Optical Recognition Application (FLORA) was thus developed to assist in the recognition of plants of the CFK. The first version of FLORA was developed as a rapid prototype in MATLAB; it used sequential algorithms to identify plant leaves, and much of this code ran as interpreted M-files. The initial implementation consequently suffered from slow performance and could not run as a lightweight standalone executable, making it cumbersome to deploy. FLORA was thus redeveloped as a standalone C++ application, which was subsequently enhanced by accelerating critical routines on a graphics processing unit (GPU). This paper presents the design and testing of both the C++ version and the GPU-accelerated version of FLORA. Comparative testing was done on all three versions, viz., the original MATLAB prototype, the non-accelerated C++ version, and the GPU-accelerated C++ version, to show the performance and accuracy of each. The accuracy of the predictions remained consistent across versions. The C++ version was noticeably faster than the original prototype, achieving an average speed-up of 8.7× for high-resolution 3456×2304-pixel images. The GPU-accelerated version was faster still, saving 51.85 ms on average for high-resolution images. Such a time saving is perceptible in batch processing, such as rebuilding the feature descriptors for all the leaves in the leaf database. Further work on this project involves testing the system with a wider variety of leaves and trying different machine-learning algorithms for the leaf-prediction routines.
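
The paper does not list here which routines were moved to the GPU, so the CUDA sketch below is only a hedged example of the kind of per-pixel image-processing step that benefits from such acceleration: one-pass grayscale conversion and thresholding to produce a binary leaf mask. The kernel name, the fixed threshold, and the assumption of a dark leaf on a light background are ours, not FLORA's.

```cuda
// Hedged sketch of a per-pixel routine of the kind FLORA could offload
// to the GPU: one-pass grayscale conversion plus thresholding to build
// a binary leaf mask. The kernel name, fixed threshold, and the
// assumption of a dark leaf on a light background are illustrative.
#include <cstdio>
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

__global__ void grayscaleThreshold(
    const uint8_t* rgb,   // interleaved 8-bit RGB input
    uint8_t* mask,        // output: 255 = leaf, 0 = background
    int width, int height, int threshold)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int idx = (y * width + x) * 3;
    // Integer approximation of ITU-R BT.601 luma.
    int gray = (299 * rgb[idx] + 587 * rgb[idx + 1]
                + 114 * rgb[idx + 2]) / 1000;
    // Dark pixels are assumed to be foreground (leaf on light background).
    mask[y * width + x] = (gray < threshold) ? 255 : 0;
}

int main()
{
    const int width = 3456, height = 2304;    // resolution cited in the paper
    std::vector<uint8_t> img((size_t)width * height * 3, 200); // synthetic

    uint8_t *rgbD, *maskD;
    cudaMalloc(&rgbD, img.size());
    cudaMalloc(&maskD, (size_t)width * height);
    cudaMemcpy(rgbD, img.data(), img.size(), cudaMemcpyHostToDevice);

    dim3 block(16, 16), grid((width + 15) / 16, (height + 15) / 16);
    grayscaleThreshold<<<grid, block>>>(rgbD, maskD, width, height, 128);
    cudaDeviceSynchronize();
    printf("segmented a %dx%d image\n", width, height);
    cudaFree(rgbD); cudaFree(maskD);
    return 0;
}
```

Per-pixel operations like this are embarrassingly parallel, which is why batch workloads, such as rebuilding the feature descriptors for every leaf in the database, stand to gain the most from GPU offloading.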

