Real-Time Lightweight CNN in Robots with Very Limited Computational Resources: Detecting Ball in NAO

Author(s):  
Qingqing Yan ◽  
Shu Li ◽  
Chengju Liu ◽  
Qijun Chen
Author(s):  
Ngozi V. Uti ◽  
Richard Fox

In recent years, mobile phones have become the de facto system of communication across the planet. Mobile phones have helped increase economic growth and improve critical-response capabilities in many parts of the world, and they are increasingly being used for data transmission. However, little academic research has addressed the specific problem of streaming real-time video originating from the cameras of mobile devices over cell phone networks. Many factors complicate this problem, including the limited computational resources of mobile phones, the low and variable bandwidth of cell phone networks, and the need for video compression and streaming algorithms that can be supported by both the mobile phones and the cell phone networks. This chapter examines the problems involved and discusses ongoing research on the topic. Its main goal is to identify the real-time constraints and challenges of compressing and streaming video from mobile devices, with the aim of designing efficient video compression and streaming techniques that work within the limited computational resources and bandwidth available to mobile devices.


2020 ◽  
Vol 10 (14) ◽  
pp. 4959
Author(s):  
Reda Belaiche ◽  
Yu Liu ◽  
Cyrille Migniot ◽  
Dominique Ginhac ◽  
Fan Yang

Micro-Expression (ME) recognition is a hot topic in computer vision, as it presents a gateway to capturing and understanding daily human emotions. It is nonetheless a challenging problem because MEs are typically transient (lasting less than 200 ms) and subtle. Recent advances in machine learning enable new and effective methods for diverse computer vision tasks. In particular, deep learning techniques trained on large datasets outperform classical machine learning approaches that rely on hand-crafted features. Even though available datasets for spontaneous ME are scarce and much smaller, off-the-shelf Convolutional Neural Networks (CNNs) still achieve satisfactory classification results. However, these networks are demanding in terms of memory consumption and computational resources. This poses great challenges when deploying CNN-based solutions in applications such as driver monitoring and comprehension recognition in virtual classrooms, which demand fast and accurate recognition. As these networks were initially designed for tasks in other domains, they are over-parameterized and need to be optimized for ME recognition. In this paper, we propose a new network based on the well-known ResNet18, which we optimize for ME classification in two ways. Firstly, we reduce the depth of the network by removing residual layers. Secondly, we introduce a more compact representation of the optical flow used as input to the network. We present extensive experiments and demonstrate that the proposed network obtains accuracy comparable to state-of-the-art methods while significantly reducing the required memory. Our best classification accuracy was 60.17% on the challenging composite dataset containing five objective classes. Our method takes only 24.6 ms to classify an ME video clip (less than the occurrence time of the shortest ME, which lasts 40 ms). Our CNN design is therefore suitable for real-time embedded applications with limited memory and computing resources.
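The abstract names two concrete optimizations; the depth reduction in particular is easy to illustrate. Below is a minimal PyTorch sketch, assuming torchvision's stock ResNet18 with its two deepest residual stages (layer3, layer4) removed; the class name, stage choice, and input shape are illustrative, not the paper's exact design.

```python
# Illustrative sketch of a depth-reduced ResNet18 for ME classification.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class ShallowMEResNet(nn.Module):
    """ResNet18 with the two deepest residual stages removed (hypothetical)."""
    def __init__(self, num_classes=5):
        super().__init__()
        base = resnet18(weights=None)
        # Keep the stem and the first two residual stages only.
        self.features = nn.Sequential(
            base.conv1, base.bn1, base.relu, base.maxpool,
            base.layer1, base.layer2,   # layer3 / layer4 removed
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        # layer2 of ResNet18 outputs 128 channels.
        self.fc = nn.Linear(128, num_classes)

    def forward(self, x):  # x: optical-flow tensor, e.g. (N, 3, 224, 224)
        x = self.features(x)
        x = self.pool(x).flatten(1)
        return self.fc(x)

model = ShallowMEResNet(num_classes=5)
print(model(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 5])
```

The compact optical-flow input representation the paper mentions would replace the raw-image tensor here; its exact encoding is described in the paper.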


2021 ◽  
Vol 2120 (1) ◽  
pp. 012025
Author(s):  
J N Goh ◽  
S K Phang ◽  
W J Chew

Abstract Real-time aerial map stitching from aerial images has been accomplished through many different methods. One of the popular approaches is feature-based: detect features in two or more images and match them to produce a map. There are several feature-based methods, such as ORB, SIFT, SURF, KAZE, AKAZE and BRISK; they detect features and compute a homography matrix from the matched features to stitch the images. The aim of this project is to further optimize an existing image stitching algorithm so that it can run in real time while the UAV captures images in flight. First, we propose to replace the singular value decomposition method in the RANSAC algorithm with a matrix multiplication method. Next, we propose to change the workflow for detecting image features to increase the map stitching rate. The proposed algorithm was implemented and tested with an online aerial image dataset containing 100 images at a resolution of 640 × 480. We achieved an update rate of 1.45 Hz, compared to 0.69 Hz for the original image stitching algorithm: an improvement of more than two-fold. The method introduced in this paper thus successfully speeds up the map stitching process.
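For reference, the baseline feature-based pipeline the paper optimizes can be sketched with OpenCV. This sketch uses the standard `cv2.findHomography` RANSAC estimator rather than the paper's matrix-multiplication variant, and the overlay blending is deliberately naive.

```python
# Minimal OpenCV sketch of feature-based pair stitching (ORB + RANSAC homography).
import cv2
import numpy as np

def stitch_pair(img_a, img_b, max_features=1000):
    # Detect and describe ORB features in both images.
    orb = cv2.ORB_create(nfeatures=max_features)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)

    # Brute-force Hamming matching, best matches first.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)

    # Estimate the homography from matched keypoints with RANSAC.
    src = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    # Warp one image into the other's frame and overlay (no blending).
    h, w = img_b.shape[:2]
    warped = cv2.warpPerspective(img_a, H, (w * 2, h))
    warped[0:h, 0:w] = img_b
    return warped
```

In a real-time loop, incoming UAV frames would be stitched incrementally onto the growing map rather than pairwise as shown here.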


Author(s):  
M. M. Nawaf ◽  
J.-M. Boï ◽  
D. Merad ◽  
J.-P. Royer ◽  
P. Drap

This paper details the hardware and software design and realization of a hand-held stereo embedded system for underwater imaging. The designed system can run most image processing techniques smoothly in real time. The developed functions provide direct visual feedback on the quality of the captured images, which helps the operator take appropriate action regarding movement speed and lighting conditions. The proposed functionalities can easily be customized or upgraded, and new functions can easily be added thanks to the supported libraries. Furthermore, by connecting the designed system to a more powerful computer, real-time visual odometry can run on the captured images to provide live navigation and a site coverage map. We use a visual odometry method adapted to systems with low computational resources and long autonomy. The system was tested in a real context, demonstrated its robustness, and opens promising perspectives for further work.


Author(s):  
Carmen Cotelo ◽  
María Aránzazu Amo Baladrón ◽  
Roland Aznar ◽  
Pablo Lorente ◽  
Pablo Rey ◽  
...  

This paper describes the implementation of a pan-European operational oceanography service coexisting with non-operational research jobs executed on the same computational resources. The complexity of designing a good operational service increases when the resources are shared with other computational workloads; however, once the implementation is achieved, the result is an optimised and robust service. Computational resources and other necessary services are permanently monitored in order to detect and solve potential problems in real time. These resources can be used by other researchers during the intervals in which they are not needed by the operational service. The goal of this work is to guarantee the time-to-solution of the operational execution on a shared computing environment without a large impact on the researchers' jobs.


Author(s):  
Umar Asif ◽  
Jianbin Tang ◽  
Stefan Harrer

Recent research on grasp detection has focused on improving accuracy through deep CNN models, but at the cost of large memory and computational resources. In this paper, we propose an efficient CNN architecture which produces high grasp detection accuracy in real time while maintaining a compact model design. To achieve this, we introduce a CNN architecture termed GraspNet which has two main branches: i) an encoder branch which downsamples an input image using our novel Dilated Dense Fire (DDF) modules - squeeze and dilated convolutions with dense residual connections; ii) a decoder branch which upsamples the output of the encoder branch to the original image size using deconvolutions and fuse connections. We evaluated GraspNet for grasp detection using offline datasets and a real-world robotic grasping setup. In experiments, we show that GraspNet achieves superior grasp detection accuracy compared to state-of-the-art computation-efficient CNN models, with real-time inference speed on embedded GPU hardware (Nvidia Jetson TX1), making it suitable for low-powered devices.
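The DDF module is only summarized above. The sketch below is a speculative PyTorch rendering of its named ingredients (a squeeze layer, parallel dilated expansion, and a dense residual connection); the class name, channel counts, and wiring are assumptions, not the paper's exact module.

```python
# Speculative sketch of a squeeze + dilated-convolution block with a dense
# residual connection, in the spirit of the DDF module described above.
import torch
import torch.nn as nn

class DilatedDenseFire(nn.Module):
    def __init__(self, in_ch, squeeze_ch, expand_ch, dilation=2):
        super().__init__()
        # Squeeze: 1x1 convolution reduces the channel count.
        self.squeeze = nn.Sequential(
            nn.Conv2d(in_ch, squeeze_ch, kernel_size=1),
            nn.ReLU(inplace=True),
        )
        # Parallel 1x1 and dilated 3x3 expansions, as in a Fire module.
        self.expand1 = nn.Conv2d(squeeze_ch, expand_ch, kernel_size=1)
        self.expand3 = nn.Conv2d(squeeze_ch, expand_ch, kernel_size=3,
                                 padding=dilation, dilation=dilation)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        s = self.squeeze(x)
        out = self.relu(torch.cat([self.expand1(s), self.expand3(s)], dim=1))
        # Dense residual connection: concatenate the input with the output.
        return torch.cat([x, out], dim=1)

block = DilatedDenseFire(in_ch=64, squeeze_ch=16, expand_ch=32)
print(block(torch.randn(1, 64, 56, 56)).shape)  # (1, 128, 56, 56)
```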


Author(s):  
Hendrik Macedo ◽  
Thiago Almeida ◽  
Leonardo Matos ◽  
Bruno Prado

Research on Traffic Light Recognition (TLR) has grown in recent years, driven primarily by growing interest in autonomous vehicle development. Machine learning algorithms have been widely used for that purpose. Mainstream approaches, however, require large amounts of data to work properly and, as a consequence, a lot of computational resources. In this paper we propose the use of Expert Instruction (EI) as a mechanism to reduce the amount of data required to build accurate ML models for TLR. Given an image of the exterior scene taken from inside the vehicle, we hypothesize that a traffic light is more likely to appear in the central and upper regions of the image. Frequency maps of traffic light locations were constructed to confirm this hypothesis; they are the result of a manual effort by human experts to annotate each image with the coordinates of the region where the traffic light appears. Results show that EI increased the accuracy obtained by the classification algorithm on two different image datasets by at least 15%. Evaluation rates achieved with EI were also higher in further experiments, including traffic light detection followed by classification by the trained algorithm. The inclusion of EI in PCANet achieved a precision of 83% and recall of 73%, against 75.3% and 51.1%, respectively, for its counterpart. We finally present a prototype of a TLR device with the expert model embedded to assist drivers. The device uses a smartphone as camera and processing unit. To show the feasibility of the apparatus, a dataset was obtained in real-time usage and tested with an Adaptive Background Suppression Filter (AdaBSF) and a Support Vector Machine (SVM) algorithm to detect and recognize traffic lights. Results show a precision of 100% and recall of 65%.
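The frequency maps themselves are straightforward to compute. Below is a minimal NumPy sketch, assuming hypothetical bounding-box annotations of the form (x, y, w, h) in pixels; the grid size and normalization are illustrative choices, not the paper's.

```python
# Illustrative sketch: accumulate traffic-light locations into a grid
# of cell frequencies, normalized to a probability map.
import numpy as np

def build_frequency_map(annotations, img_h, img_w, grid=(8, 8)):
    """Count how often an annotated traffic light falls in each grid cell."""
    freq = np.zeros(grid, dtype=np.float64)
    cell_h, cell_w = img_h / grid[0], img_w / grid[1]
    for (x, y, w, h) in annotations:
        cx, cy = x + w / 2, y + h / 2            # box centre
        row = min(int(cy // cell_h), grid[0] - 1)
        col = min(int(cx // cell_w), grid[1] - 1)
        freq[row, col] += 1
    return freq / max(freq.sum(), 1)             # normalise to probabilities

# Toy usage: lights concentrated in the upper-central region of the frame.
boxes = [(300, 80, 20, 40), (320, 60, 18, 36), (310, 100, 22, 44)]
print(build_frequency_map(boxes, img_h=480, img_w=640))
```

Such a map can then bias a classifier or detector toward the high-probability regions, which is the essence of the expert-instruction idea.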


Author(s):  
Rui Zheng ◽  
Fei Jiang ◽  
Ruimin Shen

Students' gestures, such as hand-raising, standing up, and sleeping, indicate the engagement of students in classrooms and partially reflect teaching quality. Therefore, recognizing these gestures quickly and automatically is of great importance. Due to the limited computational resources in primary and secondary schools, we propose a real-time student behavior detector based on the lightweight MobileNetV2-SSD to reduce the dependency on GPUs. Firstly, we build a large-scale corpus from real schools to capture various behavior gestures, and on this corpus we cast gesture recognition as an object detection task. Secondly, we design a multi-dimensional attention-based detector, named GestureDet, for real-time and accurate gesture analysis. The multi-dimensional attention mechanisms consider all dimensions of the training set simultaneously, paying more attention to discriminative features and to the samples that matter most for final performance. Specifically, the spatial attention is constructed with stacked dilated convolution layers to generate a soft, learnable mask for re-weighting foreground and background features; the channel attention introduces context modeling and a squeeze-and-excitation module to focus on discriminative features; and the batch attention discriminates important samples with a newly designed re-weighting strategy. Experimental results demonstrate the effectiveness and versatility of GestureDet, which achieves 75.2% mAP on a real student behavior dataset and 74.5% on the public PASCAL VOC dataset at 20 fps on the embedded Nvidia Jetson TX2 device. Code will be made publicly available.
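Two of the named attention mechanisms follow well-known patterns and can be sketched generically. The PyTorch sketch below is a hedged illustration of a dilated-convolution spatial mask and a squeeze-and-excitation channel attention; GestureDet's exact designs (and its batch attention) are specified only in the paper.

```python
# Generic sketches of spatial attention (stacked dilated convolutions
# producing a soft mask) and squeeze-and-excitation channel attention.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.mask = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=2, dilation=2), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=4, dilation=4), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 1), nn.Sigmoid(),   # soft, learnable mask
        )

    def forward(self, x):
        return x * self.mask(x)                  # re-weight fg/bg features

class SqueezeExcitation(nn.Module):
    def __init__(self, ch, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(ch, ch // reduction), nn.ReLU(inplace=True),
            nn.Linear(ch // reduction, ch), nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))          # squeeze: global average pool
        return x * w[:, :, None, None]           # excite: per-channel re-weighting

feat = torch.randn(2, 64, 32, 32)
print(SpatialAttention(64)(feat).shape, SqueezeExcitation(64)(feat).shape)
```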


2015 ◽  
Vol 5 (4) ◽  
pp. 832-840
Author(s):  
L. Perneel ◽  
H. Fayyad-Kazan ◽  
L. Peng ◽  
F. Guan ◽  
M. Timmerman

System virtualization is one of the hottest trends in information technology today. It is not just another nice-to-have technology but has become fundamental across the business world, successfully used with many classes of business application, of which cloud computing is the most visible. Recently, it has also been used for soft Real-Time (RT) applications such as IP telephony, media servers, audio and video streaming servers, and automotive and communication systems in general. Running these applications on a traditional system (hardware + operating system) guarantees their Quality of Service (QoS); virtualizing them means inserting a new layer between the hardware and the (virtual) Operating System (OS), and thus adding extra overhead. Although these application areas do not always demand hard timing guarantees, they require that the underlying virtualization layer support low latency and provide adequate computational resources for completion within a reasonable or predictable timeframe. These aspects are intimately intertwined with the logic of the hypervisor scheduler. In this paper, a series of tests is conducted on three hypervisors (VMware ESXi, Hyper-V Server and Xen) to benchmark the latencies they add to the applications running on top of them. The tests cover different scenarios (use cases) so as to take into account the parameters and configurations of the hypervisors' schedulers. This benchmark can be used as a reference for choosing the best hypervisor-application combination.
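One common latency probe of the kind such benchmarks rely on is timer-wakeup jitter: sleep for a fixed period and record how late the wake-up arrives. The Python sketch below is illustrative only; the paper's measurements and tooling are its own, and production RT benchmarks typically use dedicated tools such as cyclictest.

```python
# Illustrative timer-wakeup jitter probe (not the paper's test harness).
import time
import statistics

def measure_wakeup_latency(period_s=0.001, iterations=1000):
    """Sleep for period_s repeatedly; return per-iteration lateness in us."""
    lat_us = []
    for _ in range(iterations):
        t0 = time.perf_counter()
        time.sleep(period_s)
        lat_us.append((time.perf_counter() - t0 - period_s) * 1e6)
    return lat_us

lat = measure_wakeup_latency()
print(f"avg {statistics.mean(lat):.1f} us, max {max(lat):.1f} us")
```

Run inside a guest OS on each hypervisor, the distribution of such lateness values (especially the maximum) reflects the overhead the virtualization layer adds.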

