mobile gpu
Recently Published Documents


Total documents: 69 (five years: 8)
H-index: 8 (five years: 0)

Sensors ◽  
2022 ◽  
Vol 22 (2) ◽  
pp. 491
Author(s):  
Woong Seo ◽  
Sanghun Park ◽  
Insung Ihm

Cluster computing has attracted much attention as an effective way of solving large-scale problems. However, only a few attempts have been made to explore mobile computing clusters that can be easily built using commodity smartphones and tablets. To investigate the possibility of mobile cluster-based rendering of large datasets, we developed a mobile GPU ray tracer that renders nontrivial 3D scenes with many millions of triangles at an interactive frame rate on a small-scale mobile cluster. To cope with the limited processing power and memory space, we first present an effective 3D scene representation scheme suitable for mobile GPU rendering. Then, to avoid performance impairment caused by the high latency and low bandwidth of mobile networks, we propose using a static load balancing strategy, which we found to be more appropriate for the vulnerable mobile clustering environment than a dynamic strategy. Our mobile distributed rendering system achieved a few frames per second when ray tracing 1024 × 1024 images, using only 16 low-end smartphones, for large 3D scenes, some with more than 10 million triangles. Through a conceptual demonstration, we also show that the presented rendering scheme can be effectively explored for augmenting real scene images, captured or perceived by augmented and mixed reality devices, with high quality ray-traced images.
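
The static load balancing idea in the abstract above can be illustrated with a minimal sketch. It assumes the image is cut into square tiles that are assigned to devices once, before rendering, with no run-time redistribution. The function name `assign_tiles_static`, the 64-pixel tile size, and the diagonal interleaving rule are illustrative assumptions, not the authors' scheme.

```python
def assign_tiles_static(width, height, tile, num_devices):
    """Assign image tiles to devices once, up front (static balancing).

    Hypothetical helper: tiles are interleaved along diagonals so that each
    device receives tiles spread over the whole image rather than one block.
    Returns {device_id: [(x0, y0, x1, y1), ...]}.
    """
    assignment = {d: [] for d in range(num_devices)}
    for row, y0 in enumerate(range(0, height, tile)):
        for col, x0 in enumerate(range(0, width, tile)):
            # Clamp the tile to the image border for non-multiple sizes.
            x1 = min(x0 + tile, width)
            y1 = min(y0 + tile, height)
            device = (row + col) % num_devices
            assignment[device].append((x0, y0, x1, y1))
    return assignment


# A 1024 x 1024 image on 16 devices, matching the setup in the abstract;
# the tile size of 64 is an assumed value.
plan = assign_tiles_static(1024, 1024, 64, 16)
```

Because the mapping is fixed, devices exchange no scheduling messages while a frame is rendered, which is the property the abstract credits for tolerating high-latency mobile networks.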


Author(s):  
Szymon Szczęsny ◽  
Paweł Pietrzak

This work addresses monitoring vesicle fusions occurring during the exocytosis process, which is the main way of intercellular communication. Certain vesicle behaviors may also indicate certain precancerous conditions in cells. For this purpose we designed a system able to detect two main types of exocytosis, a full fusion and a kiss-and-run fusion, based on data from multiple amperometric sensors at once. It uses many instances of small perceptron neural networks in a massively parallel manner and runs on the Jetson TX2 platform, which uses a GPU for parallel processing. Based on the benchmarking performed, approximately 140,000 sensors can be processed in real time within a sensor sampling period of 10 ms, with an accuracy of 99%. The work includes an analysis of the system performance with varying neural network sizes, input data sizes, and sampling periods of fusion signals.


Computers ◽  
2021 ◽  
Vol 10 (8) ◽  
pp. 104
Author(s):  
Evgeny Ponomarev ◽  
Sergey Matveev ◽  
Ivan Oseledets ◽  
Valery Glukhov

Many deep learning applications are intended to run on mobile devices, and for many of them both accuracy and inference time matter. While the number of FLOPs is usually used as a proxy for neural network latency, it may not be the best choice. To obtain a better approximation of latency, the research community uses lookup tables of all possible layers to calculate inference time on a mobile CPU. This requires only a small number of experiments. Unfortunately, on a mobile GPU, this method is not applicable in a straightforward way and shows low precision. In this work, we consider latency approximation on a mobile GPU as a data- and hardware-specific problem. Our main goal is to construct a convenient Latency Estimation Tool for Investigation (LETI) of neural network inference and to build robust and accurate latency prediction models for each specific task. To achieve this goal, we make tools that provide a convenient way to conduct massive experiments on different target devices, focusing on a mobile GPU. After evaluation of the dataset, one can train the regression model on experimental data and use it for future latency prediction and analysis. We experimentally demonstrate the applicability of such an approach on a subset of the popular NAS-Benchmark 101 dataset for two different mobile GPUs.
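
The lookup-table baseline that the abstract above contrasts with its regression approach can be sketched as follows. This is a minimal illustration, assuming each layer configuration has been timed once in isolation and that network latency is approximated by summing those entries. The function name, the layer keys, and the millisecond values are all hypothetical and are not measurements from the paper.

```python
def estimate_latency_ms(layers, table):
    """Estimate network latency as the sum of per-layer table entries.

    `layers` is a sequence of hashable layer descriptions and `table` maps
    each description to a latency in milliseconds measured in isolation.
    """
    missing = [layer for layer in layers if layer not in table]
    if missing:
        # A lookup table only covers configurations that were benchmarked.
        raise KeyError(f"no measurement for layers: {missing}")
    return sum(table[layer] for layer in layers)


# Hypothetical per-layer measurements: (op, in_channels, out_channels) -> ms.
table = {
    ("conv3x3", 32, 64): 1.5,
    ("relu", 64, 64): 0.25,
    ("conv1x1", 64, 128): 1.5,
}
network = [("conv3x3", 32, 64), ("relu", 64, 64), ("conv1x1", 64, 128)]
total = estimate_latency_ms(network, table)
```

The sum treats layers as independent, which is the assumption the abstract reports as imprecise on a mobile GPU and which motivates fitting a regression model to whole-network measurements instead.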


Agronomy ◽  
2021 ◽  
Vol 11 (5) ◽  
pp. 834
Author(s):  
Chao Qi ◽  
Innocent Nyalala ◽  
Kunjie Chen

Detecting the flowering stage of tea chrysanthemum is a key mechanism of the selective chrysanthemum harvesting robot. However, under complex, unstructured scenarios, such as illumination variation, occlusion, and overlapping, detecting tea chrysanthemum at a specific flowering stage is a real challenge. This paper proposes a highly fused, lightweight detection model named the Fusion-YOLO (F-YOLO) model. First, the input component is equipped with cutout and mosaic, with which the fusion module can better understand the features of the chrysanthemum through slicing. In the backbone component, the Cross-Stage Partial DenseNet (CSPDenseNet) network is used as the main network, and feature fusion modules are added to maximize the gradient flow difference. Next, in the neck component, the Cross-Stage Partial ResNeXt (CSPResNeXt) network is taken as the main network to truncate the redundant gradient flow. Finally, in the head component, the multi-scale fusion network is adopted to aggregate the parameters of two different detection layers from different backbone layers. The results show that the F-YOLO model is superior to state-of-the-art technologies in terms of object detection, that this method can be deployed on a single mobile GPU, and that it will be one of the key technologies for building a selective chrysanthemum harvesting robot system in the future.


Author(s):  
Saman Payvar ◽  
Maxime Pelcat ◽  
Timo D. Hämäläinen

Efficient usage of heterogeneous computing architectures requires distribution of the workload on available processing elements. Traditionally, the mapping is based on information acquired from application profiling and utilized in architecture exploration. To reduce the amount of manual work required, statistical application modeling and architecture modeling can be combined with exploration heuristics. While the application modeling side of the problem has been studied extensively, architecture modeling has received less attention. Linear System Level Architecture (LSLA) is a Model of Architecture that aims at separating the architectural concerns from algorithmic ones when predicting performance. This work builds on the LSLA model and introduces non-linear semantics, specifically to support GPU performance and power modeling, by also modeling the degree of parallelism. The model is evaluated with three signal processing applications with various workload distributions on a desktop GPU and a mobile GPU. The measured average fidelity of the new model is 93% for performance and 84% for power, which is adequate for design space exploration purposes.


Author(s):  
Qiong Bai ◽  
Jingmin Xin ◽  
Hu Ye ◽  
Qinjie Wang ◽  
Peiwen Shi ◽  
...  

2018 ◽  
Vol 33 (5) ◽  
pp. 315-323 ◽  
Author(s):  
Eugene Vasilev ◽  
Dmitry Lachinov ◽  
Anton Grishin ◽  
Vadim Turlapov

A fast procedure for generating a regular tetrahedral finite element mesh for objects with complex-shaped cavities is proposed. Like LBIE-Mesher, the procedure can generate tetrahedral meshes for the volume interior to a polygonal surface, or for an interval volume between two surfaces having a complex shape and defined in STL format. The procedure consists of several stages: generation of a regular tetrahedral mesh that fills the volume of the required object; clipping of the uniform grid parts by a boundary surface; and shifting vertices of the boundary layer to align them onto the surface. We present sequential and parallel implementations of the algorithm and compare their performance with existing generators of tetrahedral grids such as TetGen, NETGEN, and CGAL. The current version of the algorithm using the mobile GPU is about 5 times faster than NETGEN. The source code of the developed software is available on GitHub.
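
The first stage named in the abstract above, filling a volume with a regular tetrahedral mesh, can be illustrated with a small sketch. The abstract does not say which subdivision the authors use; the version below assumes a uniform cube grid in which every cube is split into six tetrahedra along its main diagonal (the Kuhn subdivision). All function names are hypothetical, and the clipping and vertex-shifting stages are not shown.

```python
from itertools import permutations


def kuhn_tetrahedra(origin, size=1):
    """Split one axis-aligned cube into six tetrahedra (Kuhn subdivision).

    Each permutation of the three axes gives a path from the cube's origin
    corner to the opposite corner; the four corners on that path form a tet.
    """
    tets = []
    for perm in permutations(range(3)):
        point = list(origin)
        verts = [tuple(point)]
        for axis in perm:
            point[axis] += size
            verts.append(tuple(point))
        tets.append(tuple(verts))
    return tets


def regular_tet_mesh(nx, ny, nz, size=1):
    """Fill an nx x ny x nz block of cubes with tetrahedra."""
    mesh = []
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                mesh.extend(kuhn_tetrahedra((i * size, j * size, k * size), size))
    return mesh


def six_times_volume(tet):
    """Return 6 * volume of a tetrahedron via the absolute 3x3 determinant."""
    v0, v1, v2, v3 = tet
    a = [v1[n] - v0[n] for n in range(3)]
    b = [v2[n] - v0[n] for n in range(3)]
    c = [v3[n] - v0[n] for n in range(3)]
    det = (a[0] * (b[1] * c[2] - b[2] * c[1])
           - a[1] * (b[0] * c[2] - b[2] * c[0])
           + a[2] * (b[0] * c[1] - b[1] * c[0]))
    return abs(det)


mesh = regular_tet_mesh(2, 2, 2)
```

Each cube is processed independently of its neighbours, which is one reason a grid-based fill of this kind lends itself to a parallel implementation.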

