Nearest Neighbors Search Using Multi-GPU

2021 ◽  
Author(s):  
Vinícius Nogueira ◽  
Lucas Amorim ◽  
Igor Baratta ◽  
Gabriel Pereira ◽  
Renato Mesquita

Meshless methods are increasingly gaining ground in the study of electromagnetic phenomena as an alternative to traditional mesh-based methods. One of their biggest advantages is the absence of a mesh to describe the simulation domain. Instead, the domain is discretized by spreading nodes along the domain and its boundaries. Thus, meshless methods are based on the interactions of each node with all its neighbors, and determining the neighborhood of the nodes becomes a fundamental task. The k-nearest neighbors (kNN) algorithm is well known and widely used for this purpose, but it becomes a bottleneck for these methods due to its high computational cost. One alternative for reducing this cost is to use spatial partitioning data structures (e.g., a planar grid) that allow pruning during the k-nearest neighbors search. Furthermore, many of the strategies employed for kNN search have been adapted for graphics processing units (GPUs) and can take advantage of their high potential for parallelism. This paper therefore proposes a multi-GPU version of the grid method for solving the kNN problem. Speedups of up to 1.99x and up to 3.94x were achieved using two and four GPUs, respectively, compared with the single-GPU version of the grid method.
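To make the pruning idea concrete, the following is a minimal single-threaded sketch of a grid-based kNN search in 2D. It is an illustration only and not the paper's multi-GPU implementation: the function names `build_grid` and `knn_grid`, the uniform `cell_size`, and the ring-by-ring traversal are all assumptions introduced here.

```python
import math
from collections import defaultdict


def build_grid(points, cell_size):
    """Bucket the indices of 2D points by the integer grid cell containing them."""
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        grid[cell].append(idx)
    return grid


def knn_grid(points, grid, cell_size, query, k):
    """Return the indices of the k points nearest to `query`, closest first."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    cx = math.floor(query[0] / cell_size)
    cy = math.floor(query[1] / cell_size)
    candidates = []  # (distance, index) pairs gathered so far
    ring = 0
    while True:
        # Visit only the cells on the square ring `ring` cells away from the query cell.
        for i in range(cx - ring, cx + ring + 1):
            for j in range(cy - ring, cy + ring + 1):
                if max(abs(i - cx), abs(j - cy)) != ring:
                    continue
                for idx in grid.get((i, j), ()):
                    candidates.append((math.dist(points[idx], query), idx))
        candidates.sort()
        # Every point in an unvisited cell is farther than ring * cell_size,
        # so the search can stop once the k-th candidate is within that radius.
        if len(candidates) >= k and candidates[k - 1][0] <= ring * cell_size:
            return [idx for _, idx in candidates[:k]]
        ring += 1
```

Because each query is independent of the others, a search of this kind parallelizes naturally over queries; how the work is actually distributed across several GPUs is specific to the paper and is not reproduced here.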

Author(s):  
Liam Dunn ◽  
Patrick Clearwater ◽  
Andrew Melatos ◽  
Karl Wette

The F-statistic is a detection statistic used widely in searches for continuous gravitational waves with terrestrial, long-baseline interferometers. A new implementation of the F-statistic is presented which accelerates the existing "resampling" algorithm using graphics processing units (GPUs). The new implementation runs between 10 and 100 times faster than the existing implementation on central processing units without sacrificing numerical accuracy. The utility of the GPU implementation is demonstrated on a pilot narrowband search for four newly discovered millisecond pulsars in the globular cluster Omega Centauri using data from the second Laser Interferometer Gravitational-Wave Observatory observing run. The computational cost is 17.2 GPU-hours using the new implementation, compared to 1092 core-hours with the existing implementation.


2018 ◽  
Vol 29 (01) ◽  
pp. 63-90 ◽  
Author(s):  
Safia Kedad-Sidhoum ◽  
Florence Monna ◽  
Grégory Mounié ◽  
Denis Trystram

More and more parallel computing platforms are built upon hybrid architectures combining multi-core processors (CPUs) and hardware accelerators like General Purpose Graphics Processing Units (GPGPUs). We present in this paper a new method for efficiently scheduling parallel applications with [Formula: see text] CPUs and [Formula: see text] GPGPUs, where each task of the application can be processed either on a usual core (CPU) or on a GPGPU. We consider the problem of scheduling [Formula: see text] independent tasks with the objective of minimizing the time for completing the whole application (makespan). This problem is NP-hard; thus, we present two families of approximation algorithms that can achieve approximation ratios of [Formula: see text] or [Formula: see text] for any integer [Formula: see text] when only one GPGPU is considered, and [Formula: see text] or [Formula: see text] for [Formula: see text] GPGPUs, where [Formula: see text] is an arbitrarily small value which corresponds to the target accuracy of a binary search. The proposed method is based on a dual approximation scheme that uses a dynamic programming algorithm. The associated computational cost for the first (resp. second) family is in [Formula: see text] (resp. [Formula: see text]) per step of dual approximation. The greater the value of parameter [Formula: see text], the better the approximation, but the more expensive the computational cost. Finally, we propose a relaxed version of the algorithm which achieves a running time in [Formula: see text] with a constant approximation bound of [Formula: see text]. This last result is compared to the state-of-the-art algorithm HEFT. The proposed solving method is the first general purpose algorithm for scheduling on hybrid machines with a theoretical performance guarantee that can be used for practical purposes.


Mathematics ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 286 ◽  
Author(s):  
Hamid Saadatfar ◽  
Samiyeh Khosravi ◽  
Javad Hassannataj Joloudari ◽  
Amir Mosavi ◽  
Shahaboddin Shamshirband

The K-nearest neighbors (KNN) machine learning algorithm is a well-known non-parametric classification method. However, like other traditional data mining methods, applying it to big data comes with computational challenges. Indeed, KNN determines the class of a new sample based on the class of its nearest neighbors; however, identifying the neighbors in a large amount of data imposes a computational cost so large that the method is no longer practical on a single computing machine. One of the techniques proposed to make classification methods applicable to large datasets is pruning. LC-KNN is an improved KNN method that first clusters the data into smaller partitions using the K-means clustering method and then, for each new sample, applies KNN on the partition whose center is nearest. However, because the clusters have different shapes and densities, selecting the appropriate cluster is a challenge. In this paper, an approach is proposed to improve the pruning phase of the LC-KNN method by taking these factors into account. The proposed approach helps to choose a more appropriate cluster of data in which to look for the neighbors, thus increasing the classification accuracy. The performance of the proposed approach is evaluated on different real datasets. The experimental results show the effectiveness of the proposed approach and its higher classification accuracy and lower time cost in comparison to other recent relevant methods.
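The baseline LC-KNN procedure described above (cluster first, then search only the partition with the nearest center) can be sketched as follows. This is a rough illustration under stated assumptions: the names `kmeans`, `partition`, and `lc_knn_predict` are hypothetical, the initial centers are passed in explicitly to keep the example deterministic, and the shape- and density-aware cluster selection that the paper proposes is not implemented.

```python
import math
from collections import Counter


def nearest_center(point, centers):
    """Index of the center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda c: math.dist(point, centers[c]))


def kmeans(points, centers, iters=10):
    """Plain Lloyd iterations starting from caller-supplied initial centers."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest_center(p, centers)].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[c]
            for c, g in enumerate(groups)
        ]
    return centers


def partition(points, labels, centers):
    """Group (point, label) pairs by their nearest center."""
    parts = [[] for _ in centers]
    for p, lab in zip(points, labels):
        parts[nearest_center(p, centers)].append((p, lab))
    return parts


def lc_knn_predict(parts, centers, query, k):
    """Majority vote among the k nearest neighbors inside the nearest partition."""
    members = parts[nearest_center(query, centers)]
    if not members:
        raise ValueError("nearest partition is empty")
    members = sorted(members, key=lambda m: math.dist(m[0], query))
    votes = Counter(lab for _, lab in members[:k])
    return votes.most_common(1)[0][0]
```

Only one partition is scanned per query, which is where the time saving comes from; the accuracy risk is that the true neighbors may lie in an adjacent partition, which is the weakness the abstract says the proposed approach addresses.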


2009 ◽  
Vol 26 (7) ◽  
pp. 1410-1414 ◽  
Author(s):  
Feng Gao

To efficiently implement the interpolation methods (e.g., Shepard’s method and its variants) for the radar reflectivity field, a fast method that calculates the k-nearest-neighbor nodes (sampling points in radar volume scan) of the interpolated point (grid point) is described and proved. Several geometric propositions of radar volume scan on which the method is based are suggested and proved. Finally, the computational cost of the method is analyzed. The method is fast enough for real-time applications.
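For readers unfamiliar with Shepard's method, the sketch below shows the basic inverse-distance-weighted estimate restricted to the k nearest nodes. It is an illustration only: the function name `shepard_knn` and the `power` parameter are assumptions, and the neighbors are found by a brute-force sort, not by the fast radar-geometry method the abstract describes.

```python
import math


def shepard_knn(nodes, values, query, k=4, power=2.0):
    """Inverse-distance-weighted estimate at `query` from its k nearest nodes."""
    # Brute-force neighbor search; a fast kNN method would replace this line.
    nearest = sorted(range(len(nodes)), key=lambda i: math.dist(nodes[i], query))[:k]
    num = den = 0.0
    for i in nearest:
        d = math.dist(nodes[i], query)
        if d == 0.0:
            # The query coincides with a node, so return that node's value.
            return values[i]
        w = d ** -power  # closer nodes receive larger weights
        num += w * values[i]
        den += w
    return num / den
```

The cost of each interpolated value is dominated by finding the k nearest nodes, which is why a fast neighbor search matters for real-time use.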


Author(s):  
Javier Crespo ◽  
Roque Corral ◽  
Jesus Pueblas

An implicit harmonic balance method for modeling the unsteady non-linear periodic flow about vibrating airfoils in turbomachinery is presented. As a starting point, an implicit edge-based three-dimensional Reynolds Averaged Navier-Stokes equations solver for unstructured grids that runs both on central processing units (CPUs) and graphics processing units (GPUs) is used. The harmonic balance method performs a spectral discretization of the time derivatives and marches in pseudo-time a new system of equations where the unknowns are the variables at different time samples. The application of the method to vibrating airfoils is discussed. It is shown that a time spectral scheme may achieve the same temporal accuracy at a much lower computational cost than a Backward Finite Difference method at the expense of using more memory. The performance of the implicit solver has been assessed with several application examples. A speed-up factor of 10 is obtained between the spectral and finite difference versions of the code, whereas an additional speed-up factor of 10 is obtained when the code is ported to GPUs, for a total speed-up factor of 100. The performance of the solver on GPUs has been assessed using the 10th standard aeroelastic configuration and a transonic compressor.


2015 ◽  
Vol 138 (3) ◽  
Author(s):  
Javier Crespo ◽  
Roque Corral ◽  
Jesus Pueblas

An implicit harmonic balance (HB) method for modeling the unsteady nonlinear periodic flow about vibrating airfoils in turbomachinery is presented. An implicit edge-based three-dimensional Reynolds-averaged Navier–Stokes equations (RANS) solver for unstructured grids, which runs both on central processing units (CPUs) and graphics processing units (GPUs), is used. The HB method performs a spectral discretization of the time derivatives and marches in pseudotime a new system of equations where the unknowns are the variables at different time samples. The application of the method to vibrating airfoils is discussed. It is shown that a time-spectral scheme may achieve the same temporal accuracy at a much lower computational cost than a backward finite-difference method at the expense of using more memory. The performance of the implicit solver has been assessed with several application examples. A speed-up factor of 10 is obtained between the spectral and finite-difference versions of the code, whereas an additional speed-up factor of 10 is obtained when the code is ported to GPUs, for a total speed-up factor of 100. The performance of the solver on GPUs has been assessed using the tenth standard aeroelastic configuration and a transonic compressor.


2013 ◽  
Vol 392 ◽  
pp. 815-819
Author(s):  
Wei Zhu ◽  
Fang Di ◽  
Jian Li Li ◽  
Li Tian

A de-noising and simplification approach based on spatial connectivity is proposed for the boundary points of a point cloud. First, a grid method is used to represent the spatial topology of the scattered point cloud and to calculate the k-nearest neighbors of each data point. Then, boundary points are extracted according to the uniformity of the point cloud distribution. Next, an algorithm for boundary point simplification is presented to further simplify the boundary points. Consequently, the detail characteristics are well preserved while the boundary points are simplified. The experimental results show that the proposed approach not only preserves the characteristics of both details and boundaries but also achieves de-noising and simplification of the point cloud.


Sensors ◽  
2020 ◽  
Vol 20 (7) ◽  
pp. 1974 ◽  
Author(s):  
Yibin Huang ◽  
Congying Qiu ◽  
Xiaonan Wang ◽  
Shijun Wang ◽  
Kui Yuan

The advent of convolutional neural networks (CNNs) has accelerated the progress of computer vision in many respects. However, the majority of existing CNNs rely heavily on expensive GPUs (graphics processing units) to support large computations. Therefore, CNNs have not yet been widely used to inspect surface defects in the manufacturing field. In this paper, we develop a compact CNN-based model that not only achieves high performance on tiny defect inspection but can also be run on low-frequency CPUs (central processing units). Our model consists of a light-weight (LW) bottleneck and a decoder. Using a pyramid of lightweight kernels, the LW bottleneck provides rich features at a lower computational cost. The decoder is also built in a lightweight way and consists of an atrous spatial pyramid pooling (ASPP) module and depthwise separable convolution layers. These lightweight designs greatly reduce redundant weights and computation. We train our models on groups of surface datasets. The model can successfully classify/segment surface defects on an Intel i3-4010U CPU within 30 ms. Our model obtains accuracy similar to that of MobileNetV2 while requiring less than 1/3 of its FLOPs (floating-point operations) and 1/8 of its weights. Our experiments indicate that CNNs can be compact and hardware-friendly for future applications in automated surface inspection (ASI).


2018 ◽  
Author(s):  
Yvens R. Serpa ◽  
Mária Andréia F. Rodrigues

Graphics applications with visual quality and increasing levels of interactivity have been of fundamental interest. Within this context, visibility culling algorithms restrict the processing to the objects actually visible to the observer, speeding up the scene visualization. However, state-of-the-art solutions still incur a high computational cost, do not scale in complex scenarios, and are limited in generalization. In contrast, this work presents RHView, an innovative generic solution for static and dynamic scenes, which is based on a replicated space-partitioning structure and heuristics. RHView uses novel heuristics for rendering time estimation and for balancing processing cost against triangle removal accuracy, while maintaining interactive frame rates, even in scenes with billions of triangles. It is the only solution currently available to reduce draw calls, one of the factors that have the greatest impact on graphics processing. Systematic tests have shown that RHView can be up to 2.8 times faster than the state-of-the-art algorithms.


2019 ◽  
Author(s):  
Wout Bittremieux ◽  
Kris Laukens ◽  
William Stafford Noble

Open modification searching (OMS) is a powerful search strategy to identify peptides with any type of modification. OMS works by using a very wide precursor mass window to allow modified spectra to match against their unmodified variants, after which the modification types can be inferred from the corresponding precursor mass differences. A disadvantage of this strategy, however, is the large computational cost, because each query spectrum has to be compared against a multitude of candidate peptides. We have previously introduced the ANN-SoLo tool for fast and accurate open spectral library searching. ANN-SoLo uses approximate nearest neighbor indexing to speed up OMS by selecting only a limited number of the most relevant library spectra to compare to an unknown query spectrum. Here we demonstrate how this candidate selection procedure can be further optimized using graphics processing units. Additionally, we introduce a feature hashing scheme to convert high-resolution spectra to low-dimensional vectors. Based on these algorithmic advances, along with low-level code optimizations, the new version of ANN-SoLo is up to an order of magnitude faster than its initial version. This makes it possible to efficiently perform open searches on a large scale to gain a deeper understanding about the protein modification landscape. We demonstrate the computational efficiency and identification performance of ANN-SoLo based on a large data set of the draft human proteome. ANN-SoLo is implemented in Python and C++. It is freely available under the Apache 2.0 license at https://github.com/bittremieux/ANN-SoLo.
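As a rough illustration of feature hashing applied to spectra, the sketch below bins peaks finely by m/z and then hashes each bin index into a short fixed-length vector. It is not ANN-SoLo's code or API: the function name `hash_spectrum`, the `bin_width` and `dim` defaults, and the use of BLAKE2b from the Python standard library as the hash function are all assumptions made for this example.

```python
import hashlib


def hash_spectrum(peaks, bin_width=0.05, dim=64):
    """Map (m/z, intensity) peaks to a fixed-length vector by hashing fine m/z bins."""
    vec = [0.0] * dim
    for mz, intensity in peaks:
        bin_idx = int(mz / bin_width)  # high-resolution bin index
        digest = hashlib.blake2b(bin_idx.to_bytes(8, "little"), digest_size=8).digest()
        slot = int.from_bytes(digest, "little") % dim  # low-dimensional slot
        vec[slot] += intensity  # bins that collide share a slot
    return vec
```

Peaks that fall in the same fine bin always land in the same slot, so similar spectra yield similar vectors, while unrelated bins collide only occasionally. The resulting short vectors are the kind of input a nearest neighbor index can work with.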

