Mobile GPU: Latest Research Papers

Efficient Ray Tracing of Large 3D Scenes for Mobile Distributed Computing Environments

Cluster computing has attracted much attention as an effective way of solving large-scale problems. However, only a few attempts have been made to explore mobile computing clusters that can be easily built using commodity smartphones and tablets. To investigate the possibility of mobile cluster-based rendering of large datasets, we developed a mobile GPU ray tracer that renders nontrivial 3D scenes with many millions of triangles at an interactive frame rate on a small-scale mobile cluster. To cope with the limited processing power and memory space, we first present an effective 3D scene representation scheme suitable for mobile GPU rendering. Then, to avoid performance impairment caused by the high latency and low bandwidth of mobile networks, we propose using a static load balancing strategy, which we found to be more appropriate for the vulnerable mobile clustering environment than a dynamic strategy. Our mobile distributed rendering system achieved a few frames per second when ray tracing 1024 × 1024 images, using only 16 low-end smartphones, for large 3D scenes, some with more than 10 million triangles. Through a conceptual demonstration, we also show that the presented rendering scheme can be effectively used to augment real scene images, captured or perceived by augmented and mixed reality devices, with high-quality ray-traced images.
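The abstract does not say how the static load balancing divides the image, so the following is only a minimal sketch of one common static scheme: the frame is cut into fixed tiles that are assigned to devices in an interleaved pattern before rendering starts, so no work is renegotiated over the network at run time. The function name `assign_tiles_static` and the 64-pixel tile size are assumptions for illustration; the 1024 × 1024 resolution and 16 devices come from the abstract.

```python
def assign_tiles_static(width, height, tile, num_devices):
    """Map each device id to a fixed list of (x0, y0, x1, y1) image tiles.

    The mapping depends only on the tile grid position, so every device can
    compute its own share locally without talking to a scheduler.
    """
    assignment = {device: [] for device in range(num_devices)}
    for ty, y0 in enumerate(range(0, height, tile)):
        for tx, x0 in enumerate(range(0, width, tile)):
            x1 = min(x0 + tile, width)
            y1 = min(y0 + tile, height)
            # Diagonal interleaving spreads neighbouring tiles over devices.
            device = (tx + ty) % num_devices
            assignment[device].append((x0, y0, x1, y1))
    return assignment


# Resolution and device count from the abstract; tile size is an assumption.
plan = assign_tiles_static(1024, 1024, 64, 16)
tiles_per_device = [len(tiles) for tiles in plan.values()]
```

Interleaving is used here because expensive regions of a scene tend to be spatially clustered, and spreading adjacent tiles across devices evens out the cost without any run-time communication.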

Exocytotic vesicle fusion classification for early disease diagnosis using a mobile GPU microsystem

Neural Computing and Applications ◽ 10.1007/s00521-021-06676-2 ◽ 2021

Author(s): Szymon Szczęsny ◽ Paweł Pietrzak

Keyword(s): Neural Network ◽ Neural Networks ◽ System Performance ◽ Input Data ◽ Sampling Period ◽ Disease Diagnosis ◽ Massively Parallel ◽ Early Disease ◽ Amperometric Sensors ◽ Mobile GPU

This work addresses monitoring vesicle fusions occurring during the exocytosis process, which is the main way of intercellular communication. Certain vesicle behaviors may also indicate certain precancerous conditions in cells. For this purpose we designed a system able to detect two main types of exocytosis, a full fusion and a kiss-and-run fusion, based on data from multiple amperometric sensors at once. It uses many instances of small perceptron neural networks in a massively parallel manner and runs on the Jetson TX2 platform, which uses a GPU for parallel processing. Based on the benchmarking performed, approximately 140,000 sensors can be processed in real time within a sensor sampling period of 10 ms, with an accuracy of 99%. The work includes an analysis of the system performance with varying neural network sizes, input data sizes, and sampling periods of fusion signals.
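As a rough illustration of the "many small networks" idea, the sketch below applies one tiny perceptron instance to each sensor window independently. That independence is what lets a GPU process all sensors in parallel; here a plain loop stands in for it. The network shape (4 inputs, 2 hidden units, 2 outputs), the weights, and the sample windows are untrained placeholders chosen for illustration, not values from the paper. The last lines work out the throughput implied by the reported figures.

```python
import math


def tiny_mlp(window, hidden_w, hidden_b, out_w, out_b):
    """Forward pass of one small perceptron: inputs -> tanh hidden -> 2 scores."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, window)) + b)
              for row, b in zip(hidden_w, hidden_b)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(out_w, out_b)]


def classify_sensors(windows, params):
    """Classify every sensor window independently (parallelisable per sensor)."""
    labels = []
    for window in windows:
        full_score, kiss_score = tiny_mlp(window, *params)
        labels.append("full fusion" if full_score >= kiss_score else "kiss-and-run")
    return labels


# Placeholder (untrained) parameters, for illustration only.
params = (
    [[0.5, 0.5, 0.5, 0.5], [1.0, -1.0, 1.0, -1.0]],  # hidden weights
    [0.0, 0.0],                                      # hidden biases
    [[1.0, 0.0], [0.0, 1.0]],                        # output weights
    [0.0, 0.0],                                      # output biases
)
windows = [[0.1, 0.9, 0.8, 0.2], [0.0, 0.3, 0.0, 0.3]]  # hypothetical samples
labels = classify_sensors(windows, params)

# Throughput implied by the abstract: 140,000 sensors every 10 ms.
sensors, period_ms = 140_000, 10
classifications_per_second = sensors * 1000 // period_ms  # 14,000,000
```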

Latency Estimation Tool and Investigation of Neural Networks Inference on Mobile GPU

Computers ◽ 10.3390/computers10080104 ◽ 2021 ◽ Vol 10 (8) ◽ pp. 104

Author(s): Evgeny Ponomarev ◽ Sergey Matveev ◽ Ivan Oseledets ◽ Valery Glukhov

Keyword(s): Neural Network ◽ Experimental Data ◽ Neural Networks ◽ Deep Learning ◽ Network Inference ◽ Prediction Models ◽ Specific Problem ◽ Research Community ◽ Specific Task ◽ Mobile GPU

Many deep learning applications are intended to run on mobile devices, and for many of them both accuracy and inference time matter. While the number of FLOPs is usually used as a proxy for neural network latency, it may not be the best choice. To obtain a better approximation of latency, the research community uses lookup tables of all possible layers to estimate inference time on a mobile CPU, which requires only a small number of experiments. Unfortunately, on a mobile GPU, this method is not directly applicable and shows low precision. In this work, we consider latency approximation on a mobile GPU as a data- and hardware-specific problem. Our main goal is to construct a convenient Latency Estimation Tool for Investigation (LETI) of neural network inference and to build robust and accurate latency prediction models for each specific task. To achieve this goal, we make tools that provide a convenient way to conduct massive experiments on different target devices, focusing on a mobile GPU. After evaluation of the dataset, one can train a regression model on the experimental data and use it for future latency prediction and analysis. We experimentally demonstrate the applicability of this approach on a subset of the popular NAS-Benchmark 101 dataset for two different mobile GPUs.
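The two estimation styles contrasted above can be sketched as follows. This is not LETI's API: the function names, the per-layer table, and the measurements are all hypothetical, and using the lookup-table sum as the single regression feature is just one simple choice made for illustration. The first function adds up per-layer latencies from a table; the second fits a least-squares line to whole-network measurements so that a raw estimate can be corrected for a specific device.

```python
def lookup_latency(layers, table):
    """Lookup-table estimate: sum of independently measured per-layer latencies."""
    return sum(table[layer] for layer in layers)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical per-layer latencies in milliseconds.
layer_table_ms = {"conv3x3": 1.5, "pool": 0.25, "dense": 0.5}
network = ["conv3x3", "conv3x3", "pool", "dense"]
table_estimate_ms = lookup_latency(network, layer_table_ms)

# Hypothetical experiments: table estimate vs. latency measured on the device.
table_estimates_ms = [1.0, 2.0, 3.0, 4.0]
measured_ms = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(table_estimates_ms, measured_ms)

# Device-specific prediction for the new network.
predicted_ms = slope * table_estimate_ms + intercept
```

A gap between `table_estimate_ms` and `predicted_ms` is the kind of device-specific effect that a plain per-layer sum cannot capture and a model trained on measurements can.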

Detecting the Early Flowering Stage of Tea Chrysanthemum Using the F-YOLO Model

Agronomy ◽ 10.3390/agronomy11050834 ◽ 2021 ◽ Vol 11 (5) ◽ pp. 834

Author(s): Chao Qi ◽ Innocent Nyalala ◽ Kunjie Chen

Keyword(s): Feature Fusion ◽ Gradient Flow ◽ Flowering Stage ◽ Illumination Variation ◽ Detection Model ◽ Multi Scale ◽ Mobile GPU ◽ The Cross ◽ Real Challenge ◽ Harvesting Robot

Detecting the flowering stage of tea chrysanthemum is a key mechanism of the selective chrysanthemum harvesting robot. However, under complex, unstructured scenarios, such as illumination variation, occlusion, and overlapping, detecting tea chrysanthemum at a specific flowering stage is a real challenge. This paper proposes a highly fused, lightweight detection model named the Fusion-YOLO (F-YOLO) model. First, cutout and mosaic input components are added, with which the fusion module can better understand the features of the chrysanthemum through slicing. In the backbone component, the Cross-Stage Partial DenseNet (CSPDenseNet) network is used as the main network, and feature fusion modules are added to maximize the gradient flow difference. Next, in the neck component, the Cross-Stage Partial ResNeXt (CSPResNeXt) network is taken as the main network to truncate the redundant gradient flow. Finally, in the head component, the multi-scale fusion network is adopted to aggregate the parameters of two different detection layers from different backbone layers. The results show that the F-YOLO model is superior to state-of-the-art technologies in terms of object detection, that this method can be deployed on a single mobile GPU, and that it will be one of the key technologies for building a selective chrysanthemum harvesting robot system in the future.

A model of architecture for estimating GPU processing performance and power

Design Automation for Embedded Systems ◽ 10.1007/s10617-020-09244-4 ◽ 2021

Author(s): Saman Payvar ◽ Maxime Pelcat ◽ Timo D. Hämäläinen

Keyword(s): Design Space Exploration ◽ Heterogeneous Computing ◽ System Level ◽ Power Modeling ◽ Average Fidelity ◽ Processing Elements ◽ Mobile GPU ◽ Predicting Performance ◽ Statistical Application ◽ Architecture Modeling

Efficient usage of heterogeneous computing architectures requires distribution of the workload on available processing elements. Traditionally, the mapping is based on information acquired from application profiling and utilized in architecture exploration. To reduce the amount of manual work required, statistical application modeling and architecture modeling can be combined with exploration heuristics. While the application modeling side of the problem has been studied extensively, architecture modeling has received less attention. Linear System Level Architecture (LSLA) is a Model of Architecture that aims at separating the architectural concerns from algorithmic ones when predicting performance. This work builds on the LSLA model and introduces non-linear semantics, specifically to support GPU performance and power modeling, by also modeling the degree of parallelism. The model is evaluated with three signal processing applications with various workload distributions on a desktop GPU and a mobile GPU. The measured average fidelity of the new model is 93% for performance and 84% for power, which can fit design space exploration purposes.

Towards Real-time CNN Inference from a Video Stream on a Mobile GPU (WiP Paper)

The 21st ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems ◽ 10.1145/3372799.3394366 ◽ 2020

Author(s): Chanyoung Oh ◽ Gunju Park ◽ Sumin Kim ◽ Dohee Kim ◽ Youngmin Yi

Keyword(s): Real Time ◽ Video Stream ◽ Mobile GPU

An efficient pedestrian detection network on mobile GPU with millisecond scale

2019 Chinese Automation Congress (CAC) ◽ 10.1109/cac48633.2019.8996619 ◽ 2019

Author(s): Qiong Bai ◽ Jingmin Xin ◽ Hu Ye ◽ Qinjie Wang ◽ Peiwen Shi ◽ ...

Keyword(s): Pedestrian Detection ◽ Mobile GPU

A Multifunction Unit for Matrix, Vector and Elementary Functions Computation in Mobile GPU Shaders

JSTS Journal of Semiconductor Technology and Science ◽ 10.5573/jsts.2019.19.1.097 ◽ 2019 ◽ Vol 19 (1) ◽ pp. 97-108

Author(s): Byeong-Gyu Nam

Keyword(s): Elementary Functions ◽ Mobile GPU ◽ Matrix Vector

Fast tetrahedral mesh generation and segmentation of an atlas-based heart model using a periodic uniform grid

Russian Journal of Numerical Analysis and Mathematical Modelling ◽ 10.1515/rnam-2018-0026 ◽ 2018 ◽ Vol 33 (5) ◽ pp. 315-323 ◽ Cited By ~ 1

Author(s): Eugene Vasilev ◽ Dmitry Lachinov ◽ Anton Grishin ◽ Vadim Turlapov

Keyword(s): Complex Shape ◽ Parallel Implementation ◽ Boundary Surface ◽ Tetrahedral Mesh ◽ Uniform Grid ◽ Element Mesh ◽ Mobile GPU ◽ Polygonal Surface ◽ Tetrahedral Mesh Generation ◽ Fast Procedure

A fast procedure for generating a regular tetrahedral finite element mesh for objects with complex-shaped cavities is proposed. Like LBIE-Mesher, the procedure can generate tetrahedral meshes for the volume interior to a polygonal surface, or for an interval volume between two surfaces that have a complex shape and are defined in STL format. The procedure consists of several stages: generation of a regular tetrahedral mesh that fills the volume of the required object; clipping of the uniform grid parts by a boundary surface; and shifting of the boundary-layer vertices to align them with the surface. We present sequential and parallel implementations of the algorithm and compare their performance with existing generators of tetrahedral grids such as TetGen, NETGEN, and CGAL. The current version of the algorithm using the mobile GPU is about 5 times faster than NETGEN. The source code of the developed software is available on GitHub.
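The first stage, filling a volume with a regular tetrahedral mesh on a uniform grid, can be sketched as below. The abstract does not say which cell subdivision the authors use, so this sketch assumes the Kuhn (Freudenthal) subdivision, a standard way to split each cubic grid cell into six tetrahedra that share the cell's main diagonal. The function names are illustrative, and the clipping and vertex-shifting stages are not shown.

```python
from itertools import permutations, product


def cube_to_tets(origin, h):
    """Split one cubic cell of edge h into 6 tetrahedra (Kuhn subdivision).

    Each tetrahedron walks from the cell origin to the opposite corner,
    stepping along the three axes in one of the 3! = 6 possible orders.
    """
    tets = []
    for order in permutations(range(3)):
        corner = [0, 0, 0]
        verts = [tuple(origin)]
        for axis in order:
            corner[axis] = 1
            verts.append(tuple(origin[i] + h * corner[i] for i in range(3)))
        tets.append(tuple(verts))
    return tets


def tet_volume(a, b, c, d):
    """Volume of a tetrahedron: |det[b - a, c - a, d - a]| / 6."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    det = (u[0] * (v[1] * w[2] - v[2] * w[1])
           - u[1] * (v[0] * w[2] - v[2] * w[0])
           + u[2] * (v[0] * w[1] - v[1] * w[0]))
    return abs(det) / 6.0


def grid_mesh(nx, ny, nz, h):
    """Tetrahedralise every cell of an nx * ny * nz uniform grid."""
    return [tet
            for i, j, k in product(range(nx), range(ny), range(nz))
            for tet in cube_to_tets((i * h, j * h, k * h), h)]


mesh = grid_mesh(2, 2, 2, 0.5)  # unit cube split into 8 cells
total_volume = sum(tet_volume(*tet) for tet in mesh)
```

Because every cell is processed independently and uses the same subdivision, neighbouring cells agree on their shared faces, and the generation step maps naturally onto one GPU thread per cell.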

A Timing Side-Channel Attack on a Mobile GPU

2018 IEEE 36th International Conference on Computer Design (ICCD) ◽ 10.1109/iccd.2018.00020 ◽ 2018 ◽ Cited By ~ 6

Author(s): Elmira Karimi ◽ Zhen Hang Jiang ◽ Yunsi Fei ◽ David Kaeli

Keyword(s): Side Channel ◽ Side Channel Attack ◽ Mobile GPU

