A dynamic discarding technique to increase speed and preserve accuracy for YOLOv3

Author(s):  
Ignacio Martinez-Alpiste ◽  
Gelayol Golcarenarenji ◽  
Qi Wang ◽  
Jose Maria Alcaraz-Calero

This paper proposes an acceleration technique that minimises unnecessary operations in a state-of-the-art machine learning model, improving processing speed while maintaining accuracy. After a study of the main bottlenecks that degrade the performance of convolutional neural networks, the paper designs and implements a discarding technique for YOLOv3-based algorithms that increases speed without sacrificing accuracy. With the discarding technique applied, YOLOv3 achieves a 22% improvement in speed. The technique was also tested on Tiny-YOLOv3 with three output layers, deployed on an autonomous vehicle for pedestrian detection, where it achieved a 48.7% improvement in speed. The dynamic discarding technique requires only one training process to create the model, which preserves accuracy. The improved detector can readily alert the operator of the autonomous vehicle to apply the emergency brake in order to avoid collisions and consequently save lives.
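
As a rough illustration of the general idea (the paper's exact discarding criterion is not reproduced here), the following Python sketch decodes one YOLOv3 output layer while skipping grid cells whose objectness score falls below a threshold, so the costlier box and class decoding runs only where it can matter. The raw-output layout, the `decode_with_discarding` name, and the anchor values are assumptions made for the example.

```python
import numpy as np

def decode_with_discarding(raw, anchors, obj_thresh=0.25):
    """Decode one YOLOv3 output layer, discarding low-objectness cells early.

    raw: (H, W, A, 5 + C) array of raw outputs per grid cell and anchor:
         tx, ty, tw, th, objectness logit, then C class logits.
    """
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    H, W, A, _ = raw.shape
    obj = sigmoid(raw[..., 4])            # cheap test applied everywhere
    keep = np.argwhere(obj > obj_thresh)  # only these cells are fully decoded
    boxes = []
    for gy, gx, a in keep:
        tx, ty, tw, th = raw[gy, gx, a, :4]
        bx = (gx + sigmoid(tx)) / W       # box centre, normalised to [0, 1]
        by = (gy + sigmoid(ty)) / H
        bw = anchors[a][0] * np.exp(tw)   # anchor-scaled width/height
        bh = anchors[a][1] * np.exp(th)
        cls = sigmoid(raw[gy, gx, a, 5:]) # per-class confidences
        score = obj[gy, gx, a] * cls.max()
        boxes.append((bx, by, bw, bh, score, int(cls.argmax())))
    return boxes

# Example: one 13x13 layer, 3 anchors, 80 classes (illustrative values)
raw = np.random.randn(13, 13, 3, 85)
anchors = [(0.28, 0.22), (0.38, 0.48), (0.90, 0.78)]
boxes = decode_with_discarding(raw, anchors)
```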

2016 ◽  
Vol 21 (9) ◽  
pp. 998-1003 ◽  
Author(s):  
Oliver Dürr ◽  
Beate Sick

Deep learning methods are currently outperforming traditional state-of-the-art computer vision algorithms in diverse applications and have recently even surpassed human performance in object recognition. Here we demonstrate the potential of deep learning methods for high-content screening–based phenotype classification. We trained a deep learning classifier, in the form of convolutional neural networks, on approximately 40,000 publicly available single-cell images from samples treated with compounds from four classes known to lead to different phenotypes. The input data consisted of multichannel images. The construction of appropriate feature definitions was part of the training and was carried out by the convolutional network, without the need for expert knowledge or handcrafted features. We compare our results against the recent state-of-the-art pipeline, in which predefined features are extracted from each cell using specialized software and then fed into various machine learning algorithms (support vector machine, Fisher linear discriminant, random forest) for classification. The performance of all classification approaches is evaluated on an untouched test image set with known phenotype classes. Compared to the best reference machine learning algorithm, the misclassification rate is reduced from 8.9% to 6.6%.
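
For context, the reference pipeline described above (predefined per-cell features fed into classical classifiers) can be sketched as follows with scikit-learn. The synthetic feature matrix merely stands in for the features extracted by specialized software, so the resulting numbers are purely illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for a per-cell feature matrix with 4 phenotype classes.
X, y = make_classification(n_samples=2000, n_features=50, n_informative=20,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)

# The three reference classifiers named in the abstract.
for name, clf in [("SVM", SVC(kernel="rbf")),
                  ("Fisher LDA", LinearDiscriminantAnalysis()),
                  ("Random forest", RandomForestClassifier(n_estimators=500))]:
    clf.fit(X_tr, y_tr)
    err = 1.0 - accuracy_score(y_te, clf.predict(X_te))
    print(f"{name}: misclassification rate = {err:.3f}")
```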


2020 ◽  
pp. 352-361
Author(s):  
P.I. Andon ◽  
A.M. Glybovets ◽  
V.V. Kuryliak ◽  
...  

This paper describes the main areas of research in the development of computer models for the automation of digital image recognition. The concept of a semantic image model is introduced, and a machine learning implementation for automatically constructing such models is described. The semantic model consists of a list of the objects represented in an image together with the relationships between them. The developed model was compared to other solutions and showed better results in all but one case. The performance of the model is attributable to its use of recent advances in machine learning, including convolutional neural networks, transfer learning, Faster R-CNN, and VGG16. Since most of the relationships represented in an image are spatial, the model is designed to exploit this fact, which improves its results.
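
A minimal sketch of what a semantic image model of this kind amounts to, assuming detector output in (label, box) form: the object list comes from a detector such as Faster R-CNN, and coarse spatial relationships are then derived from box geometry. The `spatial_relation` helper and the example detections below are hypothetical.

```python
def spatial_relation(a, b):
    """Coarse spatial relation between boxes a and b, each given as
    (x1, y1, x2, y2) in image coordinates with the y-axis pointing down."""
    if a[2] < b[0]:
        return "left of"
    if a[0] > b[2]:
        return "right of"
    if a[3] < b[1]:
        return "above"
    if a[1] > b[3]:
        return "below"
    return "overlapping"

# Hypothetical detector output (label, box), e.g. from Faster R-CNN:
detections = [("person", (40, 60, 90, 220)), ("bicycle", (100, 140, 210, 230))]

# The semantic model: the object list plus pairwise spatial relations.
objects = [label for label, _ in detections]
relations = [(a, spatial_relation(ba, bb), b)
             for i, (a, ba) in enumerate(detections)
             for b, bb in detections[i + 1:]]
print(objects)    # ['person', 'bicycle']
print(relations)  # [('person', 'left of', 'bicycle')]
```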


2018 ◽  
Vol 7 (3.31) ◽  
pp. 66
Author(s):  
P Syamala Rao ◽  
Dr G.P.SaradhiVarma ◽  
Rajasekhar Mutukuri

Training deep convolutional neural networks on large datasets can take days of GPU time. Self-driving cars require very low latency for pedestrian detection, and image recognition on mobile phones is constrained by limited processing resources. In these situations, the computational speed of convolution determines whether convolutional neural networks are practical. Conventional FFT-based convolution is fast for large filters, but state-of-the-art convolutional neural networks use small 3 × 3 filters. We introduce a new class of fast algorithms for convolutional neural networks based on Winograd's minimal filtering algorithms. The algorithms compute minimal-complexity convolution over small tiles, which increases computing speed with small filters and small batch sizes. We benchmark a GPU implementation of our algorithm on the VGG network and show state-of-the-art throughput at batch sizes from 1 to 64.
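
The core of Winograd's minimal filtering is easiest to see in one dimension: F(2,3) computes two outputs of a 3-tap filter with 4 multiplications instead of the 6 needed by direct convolution, and the 2D F(2 × 2, 3 × 3) algorithm used for 3 × 3 convolution layers nests this construction. A NumPy sketch, verified against direct computation:

```python
import numpy as np

def winograd_f23(d, g):
    """Winograd minimal filtering F(2,3): two outputs of a 3-tap filter g
    from a 4-element input tile d using 4 multiplications instead of 6."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

d = np.random.randn(4)   # input tile
g = np.random.randn(3)   # 3-tap filter
direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                   d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
assert np.allclose(winograd_f23(d, g), direct)
```

In a real implementation the filter-side transforms (the (g0 + g1 + g2)/2 terms) are precomputed once per filter, so the multiplication savings apply to every input tile.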


2021 ◽  
Vol 24 (1) ◽  
Author(s):  
Facundo Manuel Quiroga

Neural networks are currently the state of the art for many tasks. Invariance and same-equivariance are two fundamental properties that characterize how a model reacts to transformations; equivariance is the generalization of both. Equivariance to transformations of the inputs can be a necessary property of the network for certain tasks. Data augmentation and specially designed layers provide ways for networks to learn these properties. However, the mechanisms by which networks encode them are not well understood. We propose several transformational measures to quantify the invariance and same-equivariance of the individual activations of a network. Analysis of the results can yield insights into the encoding and distribution of invariance across all layers of a network. The measures are simple to understand and efficient to run, and have been implemented in an open-source library. We perform experiments to validate the measures and understand their properties, showing their stability and effectiveness. We then use the measures to characterize common network architectures in terms of these properties under affine transformations. Our results show, for example, that the distribution of invariance across the layers of a network has a well-defined structure that depends only on the network design and not on the training process.
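
As an illustration of what such a measure can look like (a simplified variance-ratio variant; the library's exact definitions may differ), the sketch below scores each activation by its variance across transformed versions of the same sample, normalised by its variance across samples. Values near zero indicate transformation-invariant activations.

```python
import numpy as np

def invariance_measure(acts):
    """acts: (n_samples, n_transforms, n_activations) array of activation
    values recorded for each sample under each transformation.
    Returns one score per activation: within-sample variance across
    transformations divided by variance across samples."""
    var_t = acts.var(axis=1).mean(axis=0)          # across transformations
    var_s = acts.mean(axis=1).var(axis=0) + 1e-12  # across samples
    return var_t / var_s

# Toy check: activation 0 ignores the transformation, activation 1 tracks it.
rng = np.random.default_rng(0)
acts = rng.normal(size=(100, 1, 2)).repeat(8, axis=1)
acts[:, :, 1] += rng.normal(size=(100, 8))         # transform-sensitive unit
print(invariance_measure(acts))                    # ~[0.0, ~0.8]
```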


2020 ◽  
Vol 14 (10) ◽  
pp. 1319-1327 ◽  
Author(s):  
Pedro Augusto Pinho Ferraz ◽  
Bernardo Augusto Godinho de Oliveira ◽  
Flávia Magalhães Freitas Ferreira ◽  
Carlos Augusto Paiva da Silva Martins

Author(s):  
Jorge F. Lazo ◽  
Aldo Marzullo ◽  
Sara Moccia ◽  
Michele Catellani ◽  
Benoit Rosa ◽  
...  

Purpose: Ureteroscopy is an efficient endoscopic minimally invasive technique for the diagnosis and treatment of upper tract urothelial carcinoma. During ureteroscopy, automatic segmentation of the hollow lumen is of primary importance, since it indicates the path that the endoscope should follow. In order to obtain an accurate segmentation of the hollow lumen, this paper presents an automatic method based on convolutional neural networks (CNNs). Methods: The proposed method is based on an ensemble of 4 parallel CNNs that simultaneously process single-frame and multi-frame information. Two architectures serve as core models, namely a U-Net based on residual blocks ($m_1$) and Mask-RCNN ($m_2$), which are fed with single still frames $I(t)$. The other two models ($M_1$, $M_2$) are modifications of the former consisting of an additional stage that uses 3D convolutions to process temporal information; $M_1$ and $M_2$ are fed with triplets of frames ($I(t-1)$, $I(t)$, $I(t+1)$) to produce the segmentation for $I(t)$. Results: The proposed method was evaluated on a custom dataset of 11 videos (2673 frames) collected and manually annotated from 6 patients. We obtain a Dice similarity coefficient of 0.80, outperforming previous state-of-the-art methods. Conclusion: The results show that spatial-temporal information can be effectively exploited by the ensemble model to improve hollow lumen segmentation in ureteroscopic images. The method is effective also in the presence of poor visibility, occasional bleeding, or specular reflections.
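
A minimal PyTorch sketch of the ensemble structure: a 3D-convolution stage collapses a frame triplet into a single-frame feature map, and the four sigmoid maps are fused by simple averaging. The fusion rule and the toy convolutions standing in for the $m_1$/$m_2$ cores are our assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TemporalStage(nn.Module):
    """3D-convolution stage that fuses a triplet (I(t-1), I(t), I(t+1))
    into a single-frame feature map for a 2D segmentation core."""
    def __init__(self, ch=3):
        super().__init__()
        # depth-3 kernel with no depth padding collapses 3 frames into 1
        self.conv3d = nn.Conv3d(ch, ch, kernel_size=(3, 3, 3), padding=(0, 1, 1))
    def forward(self, x):                     # x: (B, C, 3, H, W)
        return self.conv3d(x).squeeze(2)      # -> (B, C, H, W)

def ensemble_predict(single_models, temporal_models, frames):
    """Average the sigmoid maps of single-frame and triplet-frame models.
    frames: (B, C, 3, H, W); the centre frame is frames[:, :, 1]."""
    probs = [torch.sigmoid(m(frames[:, :, 1])) for m in single_models]
    probs += [torch.sigmoid(m(frames)) for m in temporal_models]
    return torch.stack(probs).mean(dim=0) > 0.5   # binary lumen mask

# Toy stand-ins for the cores (m1, m2) and their temporal variants (M1, M2):
core = lambda: nn.Conv2d(3, 1, 3, padding=1)
temp = lambda: nn.Sequential(TemporalStage(), nn.Conv2d(3, 1, 3, padding=1))
mask = ensemble_predict([core(), core()], [temp(), temp()],
                        torch.randn(1, 3, 3, 64, 64))
print(mask.shape)   # torch.Size([1, 1, 64, 64])
```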


2021 ◽  
Vol 18 (3) ◽  
pp. 172988142110105
Author(s):  
Jnana Sai Abhishek Varma Gokaraju ◽  
Weon Keun Song ◽  
Min-Ho Ka ◽  
Somyot Kaitwanidvilai

The study investigated object detection and classification based on both Doppler radar spectrograms and vision images using two deep convolutional neural networks. Kinematic models for a walking human and a bird flapping its wings were incorporated into MATLAB simulations to create the data sets. The dynamic simulator identified the final position of each ellipsoidal body segment, taking its rotational motion into consideration in addition to its bulk motion at each sampling point, so as to describe its specific motion naturally. The total motion induced a micro-Doppler effect and created a micro-Doppler signature that varied in response to changes in the input parameters, such as body segment size, velocity, and radar location. Identifying the micro-Doppler signatures of the radar signals returned from the simulator-animated target objects required kinematic modeling based on a short-time Fourier transform analysis of the signals. Both You Only Look Once V3 and Inception V3 were used to detect and classify the objects, rendered in different red, green, and blue colors on black or white backgrounds. The results suggest that clear micro-Doppler signature image-based object recognition can be achieved in low-visibility conditions. This feasibility study demonstrates the potential of Doppler radar in autonomous driving as a backup sensor for cameras in darkness. This study makes the first successful attempt to apply animated kinematic models and their synchronized radar spectrograms to object recognition.
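
The micro-Doppler signature itself is the magnitude of a short-time Fourier transform of the radar return. A hedged sketch with an artificial return, in which a bulk Doppler shift plus one sinusoidal micro-motion term stands in for the ellipsoid-based kinematic simulation (all parameter values are illustrative):

```python
import numpy as np
from scipy.signal import stft

fs = 2000                          # sampling rate, Hz (illustrative)
t = np.arange(0, 2, 1 / fs)
f_bulk, f_limb, beta = 60, 2, 40   # bulk Doppler, limb swing rate, modulation depth

# Complex radar return: bulk motion plus a sinusoidal micro-motion term.
x = np.exp(1j * (2 * np.pi * f_bulk * t + beta * np.sin(2 * np.pi * f_limb * t)))

# Short-time Fourier transform: |Zxx| over (frequency, time) is the
# micro-Doppler signature image fed to the detector.
f, tau, Zxx = stft(x, fs=fs, nperseg=256, noverlap=192, return_onesided=False)
spectrogram = np.abs(Zxx)          # (freq bins, time frames)
print(spectrogram.shape)
```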


2020 ◽  
Vol 6 ◽  
Author(s):  
Jaime de Miguel Rodríguez ◽  
Maria Eugenia Villafañe ◽  
Luka Piškorec ◽  
Fernando Sancho Caparrini

This work presents a methodology for generating novel 3D objects resembling wireframes of building types. These result from reconstructing interpolated locations within the learnt distribution of variational autoencoders (VAEs), a deep generative machine learning model based on neural networks. The data set features a geometry representation scheme based on a 'connectivity map' that is especially suited to expressing the wireframe objects that compose it. Additionally, the input samples are generated through 'parametric augmentation', a strategy proposed in this study that creates coherent variations among the data by enabling a set of parameters to alter representative features of a given building type. In the experiments described in this paper, more than 150,000 input samples belonging to two building types were processed during the training of a VAE model. The main contribution of this paper is to explore parametric augmentation for the generation of large data sets of 3D geometries, showcasing its problems and limitations in the context of neural networks and VAEs. Results show that the generation of interpolated hybrid geometries is a challenging task. Despite the difficulty of the endeavour, promising advances are presented.
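
The interpolation step can be sketched as follows, with a deliberately minimal VAE whose dimensions and layers are placeholders rather than the paper's model: two samples are encoded, and points along the straight line between their latent means are decoded into hybrid geometries.

```python
import torch
import torch.nn as nn

class WireframeVAE(nn.Module):
    """Minimal VAE over flattened 'connectivity map' vectors (hypothetical
    dimensions; the paper's encoding of wireframe edges is more elaborate)."""
    def __init__(self, dim=512, z=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 128), nn.ReLU())
        self.mu, self.logvar = nn.Linear(128, z), nn.Linear(128, z)
        self.dec = nn.Sequential(nn.Linear(z, 128), nn.ReLU(),
                                 nn.Linear(128, dim), nn.Sigmoid())
    def encode(self, x):
        h = self.enc(x)
        return self.mu(h), self.logvar(h)
    def decode(self, z):
        return self.dec(z)

# Interpolated hybrids: encode one sample of each building type, then
# decode points along the line between their latent means.
vae = WireframeVAE()
x_a, x_b = torch.rand(1, 512), torch.rand(1, 512)   # stand-in samples
z_a, z_b = vae.encode(x_a)[0], vae.encode(x_b)[0]
hybrids = [vae.decode((1 - w) * z_a + w * z_b) for w in torch.linspace(0, 1, 5)]
```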


Mathematics ◽  
2021 ◽  
Vol 9 (6) ◽  
pp. 624
Author(s):  
Stefan Rohrmanstorfer ◽  
Mikhail Komarov ◽  
Felix Mödritscher

With the ever-increasing amount of image data, it has become a necessity to automatically find and process the information contained in these images. As fashion is captured in images, the fashion sector provides a perfect foundation for a service or application built on an image classification model. In this article, the state of the art in image classification is analyzed and discussed. Based on this knowledge, four different approaches are implemented to successfully extract features from fashion data. For this purpose, a human-worn fashion dataset of 2567 images was created and then significantly enlarged through image augmentation operations. The results show that convolutional neural networks are the undisputed standard for classifying images, and that TensorFlow is the best library with which to build them. Moreover, through the introduction of dropout layers, data augmentation, and transfer learning, model overfitting was successfully prevented, and the validation accuracy on the created dataset was incrementally improved from an initial 69% to a final 84%. More distinctive apparel such as trousers, shoes, and hats was classified more accurately than other upper-body clothing.
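
The three overfitting countermeasures mentioned (dropout, data augmentation, transfer learning) combine naturally in a TensorFlow/Keras model. In this sketch the MobileNetV2 backbone, the augmentation choices, and the ten-class head are assumptions, not the article's exact architecture.

```python
import tensorflow as tf

# Transfer learning: a pretrained backbone with frozen weights.
base = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                         include_top=False, weights="imagenet")
base.trainable = False

model = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),         # data augmentation
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNetV2 expects [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),                     # dropout against overfitting
    tf.keras.layers.Dense(10, activation="softmax"),  # e.g. 10 apparel classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=20)  # given image datasets
```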


Algorithms ◽  
2021 ◽  
Vol 14 (2) ◽  
pp. 39
Author(s):  
Carlos Lassance ◽  
Vincent Gripon ◽  
Antonio Ortega

Deep Learning (DL) has attracted a lot of attention for its ability to reach state-of-the-art performance in many machine learning tasks. The core principle of DL methods consists of training composite architectures in an end-to-end fashion, where inputs are associated with outputs trained to optimize an objective function. Because of their compositional nature, DL architectures naturally exhibit several intermediate representations of the inputs, which belong to so-called latent spaces. When treated individually, these intermediate representations are usually left unconstrained during the learning process, as it is unclear which properties should be favored. However, when a batch of inputs is processed concurrently, the corresponding set of intermediate representations exhibits relations (what we call a geometry) on which desired properties can be sought. In this work, we show that it is possible to introduce constraints on these latent geometries to address various problems. In more detail, we propose to represent geometries by constructing similarity graphs from the intermediate representations obtained when processing a batch of inputs. By constraining these Latent Geometry Graphs (LGGs), we address the following three problems: (i) reproducing the behavior of a teacher architecture is achieved by mimicking its geometry, (ii) designing efficient embeddings for classification is achieved by targeting specific geometries, and (iii) robustness to deviations on inputs is achieved by enforcing smooth variation of geometry between consecutive latent spaces. Using standard vision benchmarks, we demonstrate the ability of the proposed geometry-based methods to solve the considered problems.
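
A minimal sketch of constructing a latent geometry graph from a batch and using it for teacher-student distillation, problem (i) above. The cosine-similarity graph and the squared-error matching loss are simplified stand-ins for the paper's exact construction.

```python
import torch
import torch.nn.functional as F

def latent_geometry_graph(feats):
    """Similarity graph over a batch of intermediate representations:
    cosine similarities between all pairs of flattened feature vectors."""
    v = F.normalize(feats.flatten(start_dim=1), dim=1)
    return v @ v.t()                      # (B, B) adjacency of the LGG

def geometry_distillation_loss(student_feats, teacher_feats):
    """Distillation via geometry matching: the student is trained so that
    its LGG mimics the teacher's (a simplified squared-error variant)."""
    return F.mse_loss(latent_geometry_graph(student_feats),
                      latent_geometry_graph(teacher_feats))

# Example: a batch of 8 inputs; teacher and student features may have
# different shapes, since only the (B, B) geometries are compared.
loss = geometry_distillation_loss(torch.randn(8, 64, 4, 4),
                                  torch.randn(8, 128, 4, 4))
print(loss.item())
```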

