Quantitative study of computing time of direct/iterative solver for MoM by GPU computing

2013 ◽  
Vol 2 (8) ◽  
pp. 359-364 ◽  
Author(s):  
Keisuke Konno ◽  
Hajime Katsuda ◽  
Kei Yokokawa ◽  
Qiang Chen ◽  
Kunio Sawaya ◽  
...  
2017 ◽  
Vol 73 (6) ◽  
pp. 478-487 ◽  
Author(s):  
Daniel Castaño-Díez

Dynamo is a package for the processing of tomographic data. As a tool for subtomogram averaging, it includes different alignment and classification strategies. Furthermore, its data-management module allows experiments to be organized in groups of tomograms, while offering specialized three-dimensional tomographic browsers that facilitate visualization, location of regions of interest, modelling and particle extraction in complex geometries. Here, a technical description of the package is presented, focusing on its diverse strategies for optimizing computing performance. Dynamo is built upon mbtools (middle layer toolbox), a general-purpose MATLAB library for object-oriented scientific programming specifically developed to underpin Dynamo but usable as an independent tool. Its structure intertwines a flexible MATLAB codebase with precompiled C++ functions that carry the burden of numerically intensive operations. The package can be delivered as a precompiled standalone ready for execution without a MATLAB license. Multicore parallelization on a single node is directly inherited from the high-level parallelization engine provided for MATLAB, automatically imparting a balanced workload among the threads in computationally intense tasks such as alignment and classification, but also in logistic-oriented tasks such as tomogram binning and particle extraction. Dynamo supports the use of graphics processing units (GPUs), yielding considerable speedup factors both for native Dynamo procedures (such as the numerically intensive subtomogram alignment) and procedures defined by the user through its MATLAB-based GPU library for three-dimensional operations. Cloud-based virtual computing environments supplied with a pre-installed version of Dynamo can be publicly accessed through the Amazon Elastic Compute Cloud (EC2), enabling users to rent GPU computing time on a pay-as-you-go basis, thus avoiding upfront investments in hardware and long-term software maintenance.
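Dynamo's own GPU routines are MATLAB/C++ based and are not reproduced here; as a rough illustration of why subtomogram alignment benefits from GPUs, the following is a minimal CuPy sketch (all names are hypothetical) of the FFT-based cross-correlation that such alignment typically relies on.

# Hedged sketch: FFT-based cross-correlation of a subtomogram against a
# reference on the GPU with CuPy. This is NOT Dynamo's API; it only
# illustrates the kind of numerically intensive kernel that benefits from GPUs.
import numpy as np
import cupy as cp

def cross_correlation_peak(volume, reference):
    """Return the peak of the normalized cross-correlation of two 3D volumes."""
    v = cp.asarray(volume, dtype=cp.float32)
    r = cp.asarray(reference, dtype=cp.float32)
    v = (v - v.mean()) / (v.std() + 1e-12)   # zero-mean, unit-variance
    r = (r - r.mean()) / (r.std() + 1e-12)
    # Correlation theorem: corr = IFFT( FFT(v) * conj(FFT(r)) )
    cc = cp.fft.ifftn(cp.fft.fftn(v) * cp.conj(cp.fft.fftn(r))).real
    return float(cc.max() / v.size)

# Example: two random 64^3 volumes standing in for extracted particles.
a = np.random.rand(64, 64, 64).astype(np.float32)
print(cross_correlation_peak(a, np.roll(a, 5, axis=0)))  # a shifted copy scores high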


2014 ◽  
Vol 6 (2) ◽  
pp. 129-133
Author(s):  
Evaldas Borcovas ◽  
Gintautas Daunys

Image processing, computer vision and other complicated optical-information-processing algorithms require large computing resources, and it is often desired to execute them in real time. It is hard to fulfil such requirements with a single CPU. The CUDA technology proposed by NVidia enables the programmer to use the GPU resources of the computer. The present research was made with an Intel Pentium Dual-Core T4500 2.3 GHz processor with 4 GB DDR3 RAM (CPU I) and an NVidia GeForce GT320M CUDA-compatible graphics card (GPU I), and with an Intel Core i5-2500K 3.3 GHz processor with 4 GB DDR3 RAM (CPU II) and an NVidia GeForce GTX 560 CUDA-compatible graphics card (GPU II). The OpenCV 2.1 and CUDA-enabled OpenCV 2.4.0 libraries were used for the testing. The main tests were made with the standard MatchTemplate function from the OpenCV libraries. The algorithm uses a main image and a template, and the influence of both factors was tested: the main image and the template were resized, and the computing time and the performance of the algorithm in Gtpix/s were measured. According to the results, GPU computing on the hardware mentioned above is up to 24 times faster when processing a large amount of information. When the images are small, the performance of the CPU and the GPU does not differ significantly. The choice of the template size influences the computing time on the CPU. The difference in computing time between the two GPUs can be explained by the number of cores they have: in our study the faster GPU had 16 times more cores, and its computations were correspondingly about 16 times faster.
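For readers who want to reproduce the flavour of this benchmark, the sketch below times OpenCV's matchTemplate on synthetic data on the CPU and derives a rough Gtpix/s figure. The image and template sizes are illustrative only; the study's exact harness and its GPU code path are not reproduced.

# Hedged sketch: timing OpenCV's matchTemplate on the CPU and reporting an
# approximate throughput. Sizes are illustrative, not the study's settings.
import time
import cv2
import numpy as np

image = np.random.randint(0, 256, (2048, 2048), dtype=np.uint8)   # "main image"
template = np.random.randint(0, 256, (128, 128), dtype=np.uint8)  # "template"

t0 = time.perf_counter()
result = cv2.matchTemplate(image, template, cv2.TM_CCORR_NORMED)
elapsed = time.perf_counter() - t0

# One crude throughput measure: template pixels visited per output position.
ops = result.size * template.size
print(f"matchTemplate: {elapsed*1e3:.1f} ms, ~{ops/elapsed/1e9:.2f} Gtpix/s")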


Author(s):  
Piotr Sowa ◽  
Jacek Izydorczyk

The article's goal is to overview the challenges and problems on the way from state-of-the-art CUDA-accelerated neural network code to multi-GPU code. For this purpose, the authors describe the journey of porting the existing, fully featured CUDA-accelerated Darknet engine on GitHub to OpenCL. The article presents lessons learned and the techniques that were put in place to make this port happen. There are a few other implementations on GitHub that leverage the OpenCL standard, and a few have tried to port Darknet as well. Darknet is a well-known convolutional neural network (CNN) framework. The authors of this article investigated all aspects of the porting and achieved a fully featured Darknet engine on OpenCL. The effort was focused not only on classification with the YOLO1, YOLO2, and YOLO3 CNN models; it also covered other aspects, such as training neural networks, and benchmarks to look for weak points in the implementation. The GPU computing code substantially improves Darknet computing time compared with the standard CPU version by using underused hardware in existing systems, and if the system is OpenCL-based it is practically hardware-independent. In this article, the authors report comparisons of computation and training performance against the existing CUDA-based Darknet engine on various computers, including single-board computers, and for different CNN use cases. The authors found that the OpenCL version can perform as fast as the CUDA version in the compute aspect, but is slower in memory transfer between RAM (CPU memory) and VRAM (GPU memory); this depends only on the quality of the OpenCL implementation. Moreover, the looser hardware requirements of the OpenCL Darknet can boost applications of DNNs, especially in energy-sensitive applications of Artificial Intelligence (AI) and Machine Learning (ML).
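The RAM-to-VRAM transfer cost identified above can be probed independently of Darknet. Below is a minimal sketch, assuming pyopencl is available; it is not code from the port itself and only times a single host-to-device copy on whatever OpenCL device is selected.

# Hedged sketch: timing a host->device transfer with pyopencl. Not code from
# the Darknet OpenCL port; it only illustrates how the RAM<->VRAM transfer
# cost discussed above can be measured on a given OpenCL device.
import time
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()          # pick an OpenCL platform/device
queue = cl.CommandQueue(ctx)

host = np.random.rand(256, 1024, 1024).astype(np.float32)   # ~1 GiB
dev = cl.Buffer(ctx, cl.mem_flags.READ_WRITE, size=host.nbytes)

t0 = time.perf_counter()
cl.enqueue_copy(queue, dev, host)       # host -> device
queue.finish()                          # wait until the copy completes
elapsed = time.perf_counter() - t0

gib = host.nbytes / 2**30
print(f"host->device: {gib:.2f} GiB in {elapsed:.3f} s ({gib/elapsed:.2f} GiB/s)")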


Author(s):  
Teruo Someya ◽  
Jinzo Kobayashi

Recent progress in electron-mirror microscopy (EMM), e.g. an improvement of its resolving power together with an increase of the magnification, makes it useful for investigating the physics of ferroelectric domains. English has recently observed the domain texture in the surface layer of BaTiO3. The present authors have developed a theory by which one can evaluate small one-dimensional electric fields and/or topographic step heights in crystal surfaces from their EMM pictures. This theory was applied to a quantitative study of the surface pattern of BaTiO3.


Author(s):  
M.A. O'Keefe ◽  
Sumio Iijima

We have extended the multi-slice method of computing many-beam lattice images of perfect crystals to calculations for imperfect crystals using the artificial superlattice approach. Electron waves scattered from faulted regions of crystals are distributed continuously in reciprocal space, and all these waves interact dynamically with each other to give diffuse scattering patterns. In the computation, this continuous distribution can be sampled only at a finite number of regularly spaced points in reciprocal space, and thus finer sampling gives an improved approximation. A larger cell also allows us to defocus the objective lens further before adjacent defect images overlap, producing spurious computational Fourier images. However, smaller cells allow us to sample the direct-space cell more finely; since the two-dimensional arrays in our program are limited to 128×128 and the sampling interval should be less than 1/2 Å (and preferably only 1/4 Å), superlattice sizes are limited to 40 to 60 Å. Apart from finding a compromise superlattice cell size, computing time must be conserved.
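As a quick, hedged illustration of the size constraint quoted above (simple arithmetic only, not the authors' multi-slice program), the maximum real-space cell edge is the array dimension multiplied by the sampling interval:

# Hedged sketch: the superlattice-size constraint described above as plain
# arithmetic. The array size and sampling intervals follow the abstract;
# nothing here is taken from the authors' actual program.
ARRAY_DIM = 128                              # 128 x 128 two-dimensional arrays

for sampling_interval in (0.50, 0.25):       # Angstroms per sample point
    extent = ARRAY_DIM * sampling_interval   # real-space cell edge covered
    print(f"sampling {sampling_interval:.2f} A/pt -> max cell edge {extent:.0f} A")

# 0.50 A/pt gives 64 A and 0.25 A/pt gives 32 A, so a compromise interval in
# between yields the 40-60 A superlattice sizes quoted in the text.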


Author(s):  
P.-F. Staub ◽  
C. Bonnelle ◽  
F. Vergand ◽  
P. Jonnard

Characterizing nanometric structures such as surface segregations or interface phases, both dimensionally and chemically, can be performed efficiently using electron-probe (EP) techniques at very low excitation conditions, i.e. using small incident energies (0.5 < E0 < 5 keV) and low incident overvoltages (1 < U0 < 1.7). In such extreme conditions, classical analytical EP models are generally pushed to their validity limits in terms of accuracy and physical consistency, and Monte Carlo simulations are not convenient as routine tools because of their cost in computing time. In this context, we have developed an intermediate procedure, called IntriX, in which the ionization depth distributions Φ(ρz) are numerically reconstructed by integrating basic macroscopic physical parameters describing the electron beam/matter interaction, all of them being available in pre-established analytical forms. IntriX's procedure consists of dividing the ionization depth distribution into three separate contributions:


1950 ◽  
Vol 16 (1) ◽  
pp. 104-116 ◽  
Author(s):  
Henry D. Janowitz ◽  
Franklin Hollander ◽  
David Orringer ◽  
Milton H. Levy ◽  
Asher Winkelstein ◽  
...  

2011 ◽  
Author(s):  
Douglas R. Polster ◽  
Stephen A. Russo ◽  
David E. Richie ◽  
Susana Quintana Marikle
