Unified Framework for Vision Inference on the Edge

2020 ◽  
Author(s):  
Vysakh S Mohan

Edge processing for computer vision systems enables the incorporation of visual intelligence into mobile robotics platforms, and demand for low-power, low-cost, small-form-factor devices is on the rise. This work proposes a unified platform to generate deep learning models compatible with edge devices from Intel, NVIDIA and XaLogic. The platform enables users to create custom data annotations, train neural networks and generate edge-compatible inference models. As a testimony to the tool's ease of use and flexibility, we explore two use cases: a vision-powered prosthetic hand and drone vision. Neural network models for these use cases will be built using the proposed pipeline and open-sourced. Online and offline versions of the tool, together with the corresponding inference modules for edge devices, will also be made public so that users can create custom computer vision use cases.
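
For illustration, below is a minimal sketch of one common route from a trained network to an edge-deployable artifact, assuming a PyTorch workflow and ONNX as the interchange format. The abstract does not describe the platform's internal conversion steps; the MobileNetV2 backbone, output file name and opset version are illustrative assumptions only.

```python
import torch
import torchvision

# A trained network; MobileNetV2 stands in for whatever the platform's
# training stage actually produces (the abstract does not specify a backbone).
model = torchvision.models.mobilenet_v2(weights="DEFAULT").eval()

# A fixed-shape dummy input traces the computation graph for export.
dummy = torch.randn(1, 3, 224, 224)

# ONNX is a common interchange format that Intel (OpenVINO) and NVIDIA (TensorRT)
# toolchains can ingest and optimise for their respective edge devices.
torch.onnx.export(model, dummy, "model.onnx", opset_version=13,
                  input_names=["image"], output_names=["logits"])
```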


Author(s):  
Megha J Panicker ◽  
Vikas Upadhayay ◽  
Gunjan Sethi ◽  
Vrinda Mathur

Image captioning has become one of the most widely used tools of the modern era: built-in applications generate and provide captions for images with the help of deep neural network models. Image captioning is the process of generating a description of an image; it requires recognizing the important objects, their attributes and the relationships among the objects in the image, and producing syntactically and semantically correct sentences. In this paper, we present a deep learning model that describes images and generates captions using computer vision and machine translation. The aim is to detect the different objects found in an image, recognize the relationships between those objects and generate a caption. The dataset used is Flickr8k, the programming language is Python 3, and transfer learning is implemented with the Xception model to demonstrate the proposed experiment. The paper also elaborates on the functions and structure of the various neural networks involved. Generating image captions is an important task at the intersection of computer vision and natural language processing. Image caption generators find applications in image segmentation, as used by Facebook and Google Photos, and their use can be extended to video frames; they can automate the job of a person who has to interpret images, and they have immense scope in helping visually impaired people.
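
Below is a minimal sketch of the merge-style encoder-decoder the abstract describes, assuming a Keras workflow: Xception with ImageNet weights is frozen as a feature extractor, and an LSTM over a partial caption is merged with the image features to predict the next word. The vocabulary size, caption length and layer widths are illustrative assumptions, not the authors' settings.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

vocab_size, max_len = 8000, 34   # hypothetical values derived from Flickr8k captions
feat_dim = 2048                  # size of Xception's globally pooled features

# encoder: Xception with ImageNet weights, frozen, used purely as a feature extractor
encoder = tf.keras.applications.Xception(weights="imagenet", include_top=False, pooling="avg")
encoder.trainable = False

# decoder: merge the image features with a partial caption to predict the next word
img_in = layers.Input(shape=(feat_dim,))                 # precomputed Xception features
img_emb = layers.Dense(256, activation="relu")(layers.Dropout(0.5)(img_in))

seq_in = layers.Input(shape=(max_len,))                  # caption so far, as word indices
seq_emb = layers.Embedding(vocab_size, 256, mask_zero=True)(seq_in)
seq_emb = layers.LSTM(256)(layers.Dropout(0.5)(seq_emb))

merged = layers.add([img_emb, seq_emb])
next_word = layers.Dense(vocab_size, activation="softmax")(
    layers.Dense(256, activation="relu")(merged))

caption_model = Model(inputs=[img_in, seq_in], outputs=next_word)
caption_model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# at inference time, a caption is grown word by word from a start token (greedy or beam search)
```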


2020 ◽  
Vol 45 (3) ◽  
pp. 179-193
Author(s):  
Andrzej Brodzicki ◽  
Michal Piekarski ◽  
Dariusz Kucharski ◽  
Joanna Jaworek-Korjakowska ◽  
Marek Gorgon

Deep learning methods used in machine vision challenges often face a problem with the amount and quality of data. To address this issue, we investigate transfer learning. In this study, we briefly describe the idea and introduce the two main strategies of transfer learning. We also present the widely used neural network models that have performed best in recent ImageNet classification challenges. Furthermore, we briefly describe three different experiments from the computer vision field that confirm the developed algorithms' ability to classify images with an overall accuracy of 87.2-95%. These are state-of-the-art results in melanoma thickness prediction, anomaly detection and Clostridium difficile cytotoxicity classification.
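
The two main transfer learning strategies the study introduces, feature extraction and fine-tuning, can be sketched as follows, assuming a Keras workflow; the ResNet50 backbone, three-class head and learning rate are illustrative assumptions rather than the authors' configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False, pooling="avg")

# Strategy 1: feature extraction -- freeze the backbone, train only a new task head
base.trainable = False
outputs = layers.Dense(3, activation="softmax")(base.output)  # e.g. 3 thickness classes (hypothetical)
model = Model(base.input, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, epochs=5)  # train the head on the small target dataset

# Strategy 2: fine-tuning -- after the head converges, unfreeze the top of the backbone
# and continue training with a much smaller learning rate so the pretrained filters
# are only gently adjusted
base.trainable = True
for layer in base.layers[:-20]:  # keep the early, generic filters frozen
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, epochs=5)
```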


2019 ◽  
Author(s):  
J. Christopher D. Terry ◽  
Helen E. Roy ◽  
Tom A. August

The accurate identification of species in images submitted by citizen scientists is currently a bottleneck for many data uses. Machine learning tools offer the potential to provide rapid, objective and scalable species identification for the benefit of many aspects of ecological science. Currently, most approaches make use of image pixel data only for classification. However, an experienced naturalist would also use a wide variety of contextual information, such as the location and date of recording.

Here, we examine the automated identification of ladybird (Coccinellidae) records from the British Isles submitted to the UK Ladybird Survey, a volunteer-led mass-participation recording scheme. Each image is associated with metadata: a date, location and recorder ID, which can be cross-referenced with other data sources to determine the local weather at the time of recording, habitat types and the experience of the observer. We built multi-input neural network models that synthesise metadata and images to identify records to species level.

We show that machine learning models can effectively harness contextual information to improve the interpretation of images. Against an image-only baseline of 48.2%, we observe a 9.1 percentage-point improvement in top-1 accuracy with a multi-input model, compared to only a 3.6% increase when using an ensemble of image and metadata models. This suggests that contextual data is being used to interpret the image itself, beyond merely providing a prior expectation. Our neural network models appear to use similar pieces of evidence as human naturalists to make identifications.

Metadata is a key tool for human naturalists; we show it can also be harnessed by computer vision systems. Contextualisation offers considerable extra information, particularly for challenging species, even within small and relatively homogeneous areas such as the British Isles. Although complex relationships between disparate sources of information can be profitably interpreted by simple neural network architectures, there is likely considerable room for further progress. Contextualising images has the potential to lead to a step change in the accuracy of automated identification tools, with considerable benefits for the large-scale verification of submitted records.
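
A minimal sketch of a multi-input architecture of the kind described, assuming a Keras workflow: a CNN branch over the image and a dense branch over the encoded metadata are concatenated before classification. The backbone, metadata width and species count are illustrative assumptions, not the authors' model.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

n_species = 20   # hypothetical number of ladybird classes
n_meta = 16      # hypothetical width of the encoded date/location/weather/recorder vector

# image branch: a pretrained CNN reduced to a pooled feature vector
img_in = layers.Input(shape=(224, 224, 3))
img_feat = tf.keras.applications.MobileNetV2(weights="imagenet", include_top=False,
                                             pooling="avg")(img_in)

# metadata branch: contextual record data encoded as a flat numeric vector
meta_in = layers.Input(shape=(n_meta,))
meta_feat = layers.Dense(32, activation="relu")(meta_in)

# fusion: concatenation lets context modulate how the image evidence is interpreted,
# rather than merely re-weighting an image-only prediction (as an ensemble would)
fused = layers.Concatenate()([img_feat, meta_feat])
fused = layers.Dense(128, activation="relu")(fused)
species = layers.Dense(n_species, activation="softmax")(fused)

model = Model(inputs=[img_in, meta_in], outputs=species)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```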


2019 ◽  
Author(s):  
Courtney J Spoerer ◽  
Tim C Kietzmann ◽  
Johannes Mehrer ◽  
Ian Charest ◽  
Nikolaus Kriegeskorte

Deep feedforward neural network models of vision dominate in both computational neuroscience and engineering. The primate visual system, by contrast, contains abundant recurrent connections. Recurrent signal flow enables the recycling of limited computational resources over time, and so might boost the performance of a physically finite brain or model. Here we show: (1) Recurrent convolutional neural network models outperform feedforward convolutional models matched in their number of parameters in large-scale visual recognition tasks on natural images. (2) Setting a confidence threshold, at which recurrent computations terminate and a decision is made, enables flexible trading of speed for accuracy. At a given confidence threshold, the model expends more time and energy on images that are harder to recognise, without requiring additional parameters for deeper computations. (3) The recurrent model's reaction time for an image predicts the human reaction time for the same image better than several parameter-matched and state-of-the-art feedforward models. (4) Across confidence thresholds, the recurrent model emulates the behaviour of feedforward control models in that it achieves the same accuracy at approximately the same computational cost (mean number of floating-point operations). However, the recurrent model can be run longer (higher confidence threshold) and then outperforms parameter-matched feedforward comparison models. These results suggest that recurrent connectivity, a hallmark of biological visual systems, may be essential for understanding the accuracy, flexibility, and dynamics of human visual recognition.

Author summary: Deep neural networks provide the best current models of biological vision and achieve the highest performance in computer vision. Inspired by the primate brain, these models transform the image signals through a sequence of stages, leading to recognition. Unlike brains, in which the outputs of a given computation are fed back into the same computation, these models do not process signals recurrently. The ability to recycle limited neural resources by processing information recurrently could explain the accuracy and flexibility of biological visual systems, which computer vision systems cannot yet match. Here we report that recurrent processing can improve recognition performance compared to similarly complex feedforward networks. Recurrent processing also enabled models to behave more flexibly and trade off speed for accuracy. Like humans, the recurrent network models can compute longer when an object is hard to recognise, which boosts their accuracy. The model's recognition times predicted human recognition times for the same images. The performance and flexibility of recurrent neural network models illustrate that modeling biological vision can help us improve computer vision.
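
A minimal PyTorch sketch of the two key ingredients the abstract describes: convolutional weights recycled over timesteps (lateral recurrence) and an early-exit readout governed by a confidence threshold. The layer sizes and threshold are illustrative assumptions; the models in the paper are substantially larger.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentConvNet(nn.Module):
    """Toy recurrent CNN: one conv layer whose lateral weights are reused each timestep."""

    def __init__(self, n_classes=10, channels=32, max_steps=8):
        super().__init__()
        self.max_steps = max_steps
        self.bottom_up = nn.Conv2d(3, channels, 3, padding=1)       # feedforward drive
        self.lateral = nn.Conv2d(channels, channels, 3, padding=1)  # recurrent weights, shared over time
        self.readout = nn.Linear(channels, n_classes)

    def forward(self, x, threshold=0.9):
        h = F.relu(self.bottom_up(x))
        for t in range(self.max_steps):
            # recycle the same weights: recurrence deepens computation without new parameters
            h = F.relu(self.bottom_up(x) + self.lateral(h))
            logits = self.readout(h.mean(dim=(2, 3)))               # global average pool + readout
            confidence = F.softmax(logits, dim=1).max(dim=1).values
            if bool((confidence > threshold).all()):                # early exit: decision made
                return logits, t + 1
        return logits, self.max_steps                               # ran to the time limit

# harder images tend to need more steps; raising the threshold trades speed for accuracy
model = RecurrentConvNet()
logits, steps_used = model(torch.randn(1, 3, 64, 64), threshold=0.99)
```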


Beverages ◽  
2019 ◽  
Vol 5 (2) ◽  
pp. 33 ◽  
Author(s):  
Claudia Gonzalez Viejo ◽  
Damir D. Torrico ◽  
Frank R. Dunshea ◽  
Sigfredo Fuentes

Artificial neural networks (ANN) have become popular for the optimization and prediction of parameters in foods, beverages, agriculture and medicine. In brewing, they have been explored to develop rapid methods for assessing product quality and acceptability. Different beers (N = 17) were analyzed in triplicate using a robotic pourer, RoboBEER (University of Melbourne, Melbourne, Australia), to assess 15 color- and foam-related parameters using computer vision. The samples were also tested by sensory analysis for acceptability of carbonation mouthfeel, bitterness, flavor and overall liking with 30 consumers using a 9-point hedonic scale. ANN models were developed using 17 different training algorithms, with the 15 color- and foam-related parameters as inputs and the liking of the four descriptors obtained from consumers as targets. Each algorithm was tested using five, seven and ten neurons, and the models were compared to select the best one based on correlation coefficient, slope and performance (mean squared error, MSE). The Bayesian regularization algorithm with seven neurons presented the best correlation (R = 0.98) and the highest performance (MSE = 0.03) with no overfitting. These models may be used as a cost-effective method for the fast screening of beers during processing to assess acceptability more efficiently. The use of RoboBEER, computer vision algorithms and ANN will allow the implementation of an artificial intelligence system for the brewing industry and an assessment of its effectiveness.
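
A minimal sketch of the winning configuration, a network with one hidden layer of seven neurons mapping the 15 RoboBEER parameters to the four liking scores. scikit-learn has no Bayesian-regularisation trainer (MATLAB's trainbr), so L2 weight decay stands in for it here, and the data below are random stand-ins for shape only, not the study's measurements.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# stand-in data: 17 beers x 3 pours, 15 colour/foam features from RoboBEER,
# and mean 9-point liking scores for four descriptors (random here, for shape only)
rng = np.random.default_rng(0)
X = rng.random((51, 15))
y = 1.0 + 8.0 * rng.random((51, 4))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# one hidden layer of seven neurons, as in the best model reported;
# alpha (L2 weight decay) approximates the regularising effect of trainbr
ann = MLPRegressor(hidden_layer_sizes=(7,), alpha=1e-2, max_iter=5000, random_state=0)
ann.fit(X_tr, y_tr)
print("held-out R^2:", ann.score(X_te, y_te))
```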


2021 ◽  
Vol 8 ◽  
Author(s):  
Qinbing Fu ◽  
Xuelong Sun ◽  
Tian Liu ◽  
Cheng Hu ◽  
Shigang Yue

Collision prevention poses a major research and development challenge for intelligent robots and vehicles. This paper investigates the robustness of two state-of-the-art neural network models, inspired by the locust's LGMD-1 and LGMD-2 visual pathways, as fast and low-energy collision alert systems in critical scenarios. Although both neural circuits have been studied and modelled intensively, their capability and robustness in real-time critical traffic scenarios, where real physical crashes happen, have never been systematically investigated, owing to the difficulty and high cost of replicating risky traffic with many crash occurrences. To close this gap, we apply a recently published robotic platform to test the LGMD-inspired visual systems in physical implementations of critical traffic scenarios at low cost and with high flexibility. The proposed visual systems are applied as the only collision-sensing modality in each micro mobile robot, which avoids collisions by abrupt braking. The simulated traffic resembles on-road sections, including intersection and highway scenes, in which the road maps are rendered by coloured artificial pheromones on a wide LCD screen acting as the floor of the arena. The robots, equipped with light sensors at the bottom, can recognise the lanes and signals and tightly follow paths. The emphasis herein is on corroborating the robustness of the LGMD neural system models in different dynamic robot scenes for the timely alerting of potential crashes. This study complements previous experimentation on such bio-inspired computations for collision prediction in more critical physical scenarios, and for the first time demonstrates the robustness of LGMD-inspired visual systems in critical traffic as a reliable collision alert system under constrained computational power. The paper also exhibits a novel, tractable and affordable robotic approach to evaluating online visual systems in dynamic scenes.
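
A heavily simplified sketch of an LGMD-1-style expansion detector over consecutive grayscale frames: excitation from luminance change, laterally spread inhibition, and a pooled membrane potential compared against a spike threshold. The constants and the Gaussian inhibition kernel are illustrative assumptions and omit the delays and feed-forward suppression of the full published models.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def lgmd_response(prev_frame, frame, sigma=1.0, w_inhib=0.6, spike_threshold=0.15):
    """One LGMD-1-style update over two grayscale frames with values in [0, 1]."""
    P = np.abs(frame - prev_frame)          # photoreceptor layer: temporal luminance change
    I = gaussian_filter(P, sigma=sigma)     # inhibition layer: laterally spread excitation
    S = np.maximum(P - w_inhib * I, 0.0)    # summation: excitation minus lateral inhibition
    k = float(S.mean())                     # membrane potential of the single LGMD cell
    return k, k > spike_threshold           # value, plus a collision-alert spike

# an approaching object expands rapidly across frames, so P grows over a widening
# region faster than the inhibition can cancel it, and the cell 'spikes' to brake
```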


2021 ◽  
Vol 14 (1) ◽  
pp. 416
Author(s):  
Mostofa Ahsan ◽  
Sulaymon Eshkabilov ◽  
Bilal Cemek ◽  
Erdem Küçüktopcu ◽  
Chiwon W. Lee ◽  
...  

Deep learning (DL) and computer vision applications in precision agriculture have great potential for identifying and classifying plant and vegetation species. This study presents the applicability of DL modeling with computer vision techniques to analyzing the nutrient levels of four hydroponically grown lettuce cultivars (Lactuca sativa L.), namely Black Seed, Flandria, Rex and Tacitus. Four nutrient concentrations (0, 50, 200 and 300 ppm nitrogen solutions) were prepared and used to grow these cultivars in the greenhouse, and RGB images of the lettuce leaves were captured. The results showed that the developed Visual Geometry Group 16 (VGG16) and VGG19 architectures identified the nutrient levels of the four lettuce cultivars with 87.5 to 100% accuracy. Convolutional neural network models were also implemented to identify the nutrient levels of the studied lettuces for comparison. The developed modeling techniques can be applied to collect real-time nutrient data not only from other lettuce-type cultivars grown in greenhouses but also from those grown in the field; moreover, these approaches can be applied to various lettuce crops for remote sensing purposes. To the best of the authors' knowledge, this is a novel study applying DL techniques to determine nutrient concentrations in lettuce cultivars.
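
A minimal sketch of how the VGG19 variant could be set up for the four nitrogen levels, assuming a Keras workflow and a folder of leaf images organised by class; the directory layout, epoch count and frozen-backbone choice are illustrative assumptions, not the authors' protocol.

```python
import tensorflow as tf
from tensorflow.keras import layers

# hypothetical layout: leaf_images/{0ppm,50ppm,200ppm,300ppm}/*.jpg
train_ds = tf.keras.utils.image_dataset_from_directory(
    "leaf_images", image_size=(224, 224), batch_size=32)
train_ds = train_ds.map(
    lambda x, y: (tf.keras.applications.vgg19.preprocess_input(x), y))

base = tf.keras.applications.VGG19(weights="imagenet", include_top=False, pooling="avg")
base.trainable = False  # reuse the ImageNet filters; train only the classification head

model = tf.keras.Sequential([
    base,
    layers.Dense(4, activation="softmax"),  # one output per nitrogen concentration
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=10)
```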

