scholarly journals Zero-skipping in CapsNet. Is it worth it?

10.29007/cd8h ◽  
2020 ◽  
Author(s):  
Ramin Sharifi ◽  
Pouya Shiri ◽  
Amirali Baniasadi

Capsule networks (CapsNet) are the next generation of neural networks. CapsNet can be used for classification of data of different types. Today’s General Purpose Graphical Processing Units (GPGPUs) are more capable than before and let us train these complex networks. However, time and energy consumption remains a challenge. In this work, we investigate if skipping trivial operations i.e. multiplication by zero in CapsNet, can possibly save energy. We base our analysis on the number of multiplications by zero detected while training CapsNet on MNIST and Fashion- MNIST datasets.

2015 ◽  
Vol 2015 ◽  
pp. 1-10 ◽  
Author(s):  
Eva Volna ◽  
Martin Kotyrba ◽  
Hashim Habiballa

The paper deals with ECG prediction based on neural networks classification of different types of time courses of ECG signals. The main objective is to recognise normal cycles and arrhythmias and perform further diagnosis. We proposed two detection systems that have been created with usage of neural networks. The experimental part makes it possible to load ECG signals, preprocess them, and classify them into given classes. Outputs from the classifiers carry a predictive character. All experimental results from both of the proposed classifiers are mutually compared in the conclusion. We also experimented with the new method of time series transparent prediction based on fuzzy transform with linguistic IF-THEN rules. Preliminary results show interesting results based on the unique capability of this approach bringing natural language interpretation of particular prediction, that is, the properties of time series.


2014 ◽  
Vol 596 ◽  
pp. 276-279
Author(s):  
Xiao Hui Pan

Graph component labeling, which is a subset of the general graph coloring problem, is a computationally expensive operation in many important applications and simulations. A number of data-parallel algorithmic variations to the component labeling problem are possible and we explore their use with general purpose graphical processing units (GPGPUs) and with the CUDA GPU programming language. We discuss implementation issues and performance results on CPUs and GPUs using CUDA. We evaluated our system with real-world graphs. We show how to consider different architectural features of the GPU and the host CPUs and achieve high performance.


2012 ◽  
Vol 25 (10) ◽  
pp. 1443-1461 ◽  
Author(s):  
Shivani Raghav ◽  
Andrea Marongiu ◽  
Christian Pinto ◽  
Martino Ruggiero ◽  
David Atienza ◽  
...  

Author(s):  
Sajid Nazir ◽  
Shushma Patel ◽  
Dilip Patel

The increased processing power of graphical processing units (GPUs) and the availability of large image datasets has fostered a renewed interest in extracting semantic information from images. Promising results for complex image categorization problems have been achieved using deep learning, with neural networks comprised of many layers. Convolutional neural networks (CNN) are one such architecture which provides more opportunities for image classification. Advances in CNN enable the development of training models using large labelled image datasets, but the hyper parameters need to be specified, which is challenging and complex due to the large number of parameters. A substantial amount of computational power and processing time is required to determine the optimal hyper parameters to define a model yielding good results. This article provides a survey of the hyper parameter search and optimization methods for CNN architectures.


2011 ◽  
Vol 14 (3) ◽  
pp. 603-612 ◽  
Author(s):  
P. A. Crous ◽  
J. E. van Zyl ◽  
Y. Roodt

The Engineering discipline has relied on computers to perform numerical calculations in many of its sub-disciplines over the last decades. The advent of graphical processing units (GPUs), parallel stream processors, has the potential to speed up generic simulations that facilitate engineering applications aside from traditional computer graphics applications, using GPGPU (general purpose programming on the GPU). The potential benefits of exploiting the GPU for general purpose computation require the program to be highly arithmetic intensive and also data independent. This paper looks at the specific application of the Conjugate Gradient method used in hydraulic network solvers on the GPU and compares the results to conventional central processing unit (CPU) implementations. The results indicate that the GPU becomes more efficient as the data set size increases. However, with the current hardware and the implementation of the Conjugate Gradient algorithm, the application of stream processing to hydraulic network solvers is only faster and more efficient for exceptionally large water distribution models, which are seldom found in practice.


Author(s):  
I. Yu. Sesin ◽  
R. G. Bolbakov

General Purpose computing for Graphical Processing Units (GPGPU) technology is a powerful tool for offloading parallel data processing tasks to Graphical Processing Units (GPUs). This technology finds its use in variety of domains – from science and commerce to hobbyists. GPU-run general-purpose programs will inevitably run into performance issues stemming from code branch predication. Code predication is a GPU feature that makes both conditional branches execute, masking the results of incorrect branch. This leads to considerable performance losses for GPU programs that have large amounts of code hidden away behind conditional operators. This paper focuses on the analysis of existing approaches to improving software performance in the context of relieving the aforementioned performance loss. Description of said approaches is provided, along with their upsides, downsides and extents of their applicability and whether they address the outlined problem. Covered approaches include: optimizing compilers, JIT-compilation, branch predictor, speculative execution, adaptive optimization, run-time algorithm specialization, profile-guided optimization. It is shown that the aforementioned methods are mostly catered to CPU-specific issues and are generally not applicable, as far as branch-predication performance loss is concerned. Lastly, we outline the need for a separate performance improving approach, addressing specifics of branch predication and GPGPU workflow.


2020 ◽  
Vol 44 (2) ◽  
pp. 236-243 ◽  
Author(s):  
B.V. Faizov ◽  
V.I. Shakhuro ◽  
V.V. Sanzharov ◽  
A.S. Konushin

The paper studies the possibility of using neural networks for the classification of objects that are few or absent at all in the training set. The task is illustrated by the example of classification of rare traffic signs. We consider neural networks trained using a contrastive loss function and its modifications, also we use different methods for generating synthetic samples for classification problems. As a basic method, the indexing of classes using neural network features is used. A comparison is made of classifiers trained with three different types of synthetic samples and their mixtures with real data. We propose a method of classification of rare traffic signs using a neural network discriminator of rare and frequent signs. The experimental evaluation shows that the proposed method allows rare traffic signs to be classified without significant loss of frequent sign classification quality.


2014 ◽  
Vol 22 (2) ◽  
pp. 125-139 ◽  
Author(s):  
Myoungsoo Jung ◽  
Ellis H. Wilson ◽  
Wonil Choi ◽  
John Shalf ◽  
Hasan Metin Aktulga ◽  
...  

Drawing parallels to the rise of general purpose graphical processing units (GPGPUs) as accelerators for specific high-performance computing (HPC) workloads, there is a rise in the use of non-volatile memory (NVM) as accelerators for I/O-intensive scientific applications. However, existing works have explored use of NVM within dedicated I/O nodes, which are distant from the compute nodes that actually need such acceleration. As NVM bandwidth begins to out-pace point-to-point network capacity, we argue for the need to break from the archetype of completely separated storage. Therefore, in this work we investigate co-location of NVM and compute by varying I/O interfaces, file systems, types of NVM, and both current and future SSD architectures, uncovering numerous bottlenecks implicit in these various levels in the I/O stack. We present novel hardware and software solutions, including the new Unified File System (UFS), to enable fuller utilization of the new compute-local NVM storage. Our experimental evaluation, which employs a real-world Out-of-Core (OoC) HPC application, demonstrates throughput increases in excess of an order of magnitude over current approaches.


Sign in / Sign up

Export Citation Format

Share Document