Real-Time Instance Segmentation of Traffic Videos for Embedded Devices

Sensors ◽  
2021 ◽  
Vol 21 (1) ◽  
pp. 275
Author(s):  
Ruben Panero Martinez ◽  
Ionut Schiopu ◽  
Bruno Cornelis ◽  
Adrian Munteanu

The paper proposes a novel instance segmentation method for traffic videos devised for deployment on real-time embedded devices. A novel neural network architecture is proposed using a multi-resolution feature extraction backbone and improved network designs for the object detection and instance segmentation branches. A novel post-processing method is introduced to reduce the rate of false detections by evaluating the quality of the output masks. An improved network training procedure is proposed based on a novel label assignment algorithm. Guided by an ablation study of the speed-vs.-performance trade-off, the two branches are further modified and the conventional ResNet-based, performance-oriented backbone is replaced with a lightweight, speed-oriented design. The proposed architectural variations achieve real-time performance when deployed on embedded devices. The experimental results demonstrate that the proposed instance segmentation method for traffic videos outperforms the You Only Look At CoefficienTs (YOLACT) algorithm, the state-of-the-art real-time instance segmentation method. The proposed architecture achieves 31.57 average precision (AP) on the COCO dataset, while its speed-oriented variations reach speeds of up to 66.25 frames per second on the Jetson AGX Xavier module.
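
The mask-quality post-processing step can be illustrated with a short sketch: score each soft mask by how confident its foreground pixels are, and drop detections below a threshold. This is a minimal, hypothetical version; the quality measure (mean in-mask probability) and thresholds are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def filter_low_quality_masks(masks, scores, prob_threshold=0.5, quality_threshold=0.7):
    """Keep only detections whose soft masks are 'confident'.

    masks:  (N, H, W) float array of per-pixel instance probabilities.
    scores: (N,) float array of detection confidences.
    The quality score here is the mean probability inside the binarized
    mask -- an illustrative proxy for mask quality, not the paper's metric.
    """
    kept_masks, kept_scores = [], []
    for mask, score in zip(masks, scores):
        fg = mask > prob_threshold            # binarize the soft mask
        if fg.sum() == 0:                     # empty mask -> discard
            continue
        quality = mask[fg].mean()             # confidence inside the mask
        if quality >= quality_threshold:
            kept_masks.append(fg)
            kept_scores.append(score * quality)  # rescale score by quality
    return kept_masks, kept_scores
```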

Author(s):  
Mohammad Javad Shafiee ◽  
Brendan Chwyl ◽  
Francis Li ◽  
Alexander Wong

Object detection is considered one of the most challenging problems in the field of computer vision, as it involves the combination of object classification and object localization within a scene. Recently, deep neural networks (DNNs) have been demonstrated to achieve superior object detection performance compared to other approaches, with YOLOv2 (an improved You Only Look Once model) being one of the state-of-the-art DNN-based object detection methods in terms of both speed and accuracy. Although YOLOv2 can achieve real-time performance on a powerful GPU, it still remains very challenging to leverage this approach for real-time object detection in video on embedded computing devices with limited computational power and limited memory. In this paper, we propose a new framework called Fast YOLO, a fast You Only Look Once framework which accelerates YOLOv2 to be able to perform object detection in video on embedded devices in a real-time manner. First, we leverage the evolutionary deep intelligence framework to evolve the YOLOv2 network architecture and produce an optimized architecture (referred to as O-YOLOv2 here) that has 2.8X fewer parameters with just a 2% IOU drop. To further reduce power consumption on embedded devices while maintaining performance, a motion-adaptive inference method is introduced into the proposed Fast YOLO framework to reduce the frequency of deep inference with O-YOLOv2 based on temporal motion characteristics. Experimental results show that the proposed Fast YOLO framework can reduce the number of deep inferences by an average of 38.13%, with an average speedup of 3.3X for object detection in video compared to the original YOLOv2, leading Fast YOLO to run at an average of 18 FPS on an Nvidia Jetson TX1 embedded system.
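
The motion-adaptive inference idea can be sketched as follows: run the expensive detector only when inter-frame motion exceeds a threshold, otherwise reuse the last detections. The motion measure (mean absolute frame difference) and the threshold value are illustrative assumptions; the paper's motion characterization is more elaborate.

```python
import numpy as np

def motion_adaptive_detect(frames, detector, motion_threshold=8.0):
    """Run `detector` only on frames with significant motion.

    frames:   iterable of (H, W) grayscale uint8 frames.
    detector: callable mapping a frame to a list of detections.
    """
    prev_frame, last_detections = None, []
    for frame in frames:
        if prev_frame is None:
            last_detections = detector(frame)     # always infer on frame 0
        else:
            motion = np.abs(frame.astype(np.int16)
                            - prev_frame.astype(np.int16)).mean()
            if motion > motion_threshold:         # enough motion -> re-infer
                last_detections = detector(frame)
        prev_frame = frame
        yield last_detections                      # reused when motion is low
```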


Sensors ◽  
2021 ◽  
Vol 21 (17) ◽  
pp. 5918
Author(s):  
Marc-André Fiedler ◽  
Philipp Werner ◽  
Aly Khalifa ◽  
Ayoub Al-Hamadi

Face and person detection are important tasks in computer vision, as they represent the first component in many recognition systems, such as face recognition, facial expression analysis, body pose estimation, face attribute detection, or human action recognition. Their detection rate and runtime are therefore crucial for the performance of the overall system. In this paper, we combine face and person detection in one framework with the goal of reaching a detection performance that is competitive with the state of the art of lightweight object-specific networks while maintaining real-time processing speed for both detection tasks together. In order to combine face and person detection in one network, we applied multi-task learning. The difficulty lies in the fact that no datasets are available that contain both face and person annotations. Since we did not have the resources to manually annotate the datasets, which is very time-consuming, and automatic generation of ground truth results in annotations of poor quality, we solve this issue algorithmically by applying a special training procedure and network architecture without the need to create new labels. Our newly developed method, called Simultaneous Face and Person Detection (SFPD), is able to detect persons and faces at 40 frames per second. Because of this good trade-off between detection performance and inference time, SFPD represents a useful and valuable real-time framework for a multitude of real-world applications such as human–robot interaction.
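
One standard way to train a joint detector when no dataset carries both label types is to mask the per-task loss by annotation availability, which appears to be the spirit of the special training procedure described above. A minimal sketch under that assumption (the loss inputs and availability flags are illustrative, not SFPD's actual interface):

```python
import torch

def multi_task_loss(face_loss, person_loss, has_face_labels, has_person_labels):
    """Combine per-task losses, skipping tasks without annotations.

    face_loss / person_loss: scalar tensors from the two detection heads.
    has_face_labels / has_person_labels: whether the current batch's source
    dataset actually provides that annotation type.
    """
    total = torch.zeros((), dtype=face_loss.dtype, device=face_loss.device)
    if has_face_labels:
        total = total + face_loss      # only backprop the annotated task
    if has_person_labels:
        total = total + person_loss
    return total
```

Batches drawn alternately from a face-only and a person-only dataset would then each update only the head they supervise, while the shared backbone learns from both.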


Author(s):  
Thomas Kurmann ◽  
Pablo Márquez-Neila ◽  
Max Allan ◽  
Sebastian Wolf ◽  
Raphael Sznitman

Purpose: The detection and segmentation of surgical instruments has been a vital step for many applications in minimally invasive surgical robotics. Previously, the problem was tackled from a semantic segmentation perspective, yet these methods fail to provide good segmentation maps of instrument types and do not contain any information on the instance affiliation of each pixel. We propose to overcome this limitation by using a novel instance segmentation method which first masks instruments and then classifies them into their respective type.
Methods: We introduce a novel method for instance segmentation where a pixel-wise mask of each instance is found prior to classification. An encoder–decoder network is used to extract instrument instances, which are then separately classified using the features of the previous stages. Furthermore, we present a method to incorporate instrument priors from surgical robots.
Results: Experiments are performed on the robotic instrument segmentation dataset of the 2017 endoscopic vision challenge. We perform a fourfold cross-validation and show an improvement of over 18% over the previous state of the art. Furthermore, we perform an ablation study which highlights the importance of certain design choices and observe an increase of 10% over semantic segmentation methods.
Conclusions: We have presented a novel instance segmentation method for surgical instruments which outperforms previous semantic segmentation-based methods. Our method further provides a more informative output of instance-level information while retaining a precise segmentation mask. Finally, we have shown that robotic instrument priors can be used to further increase performance.
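
The mask-then-classify ordering can be sketched as a toy two-stage PyTorch model: an encoder proposes an instance mask, and the encoder features pooled under that mask are then classified. Every module shape below is an illustrative placeholder, not the paper's encoder–decoder architecture.

```python
import torch
import torch.nn as nn

class MaskThenClassify(nn.Module):
    """Toy two-stage model: segment the instance first, classify it after."""

    def __init__(self, num_classes=7, channels=16):
        super().__init__()
        self.encoder = nn.Sequential(                  # stand-in encoder
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU())
        self.mask_head = nn.Conv2d(channels, 1, 1)     # instance mask logits
        self.classifier = nn.Linear(channels, num_classes)

    def forward(self, image):
        feats = self.encoder(image)                    # (B, C, H, W)
        mask = torch.sigmoid(self.mask_head(feats))    # (B, 1, H, W)
        # Pool encoder features under the predicted mask, then classify:
        pooled = (feats * mask).sum(dim=(2, 3)) / mask.sum(dim=(2, 3)).clamp(min=1e-6)
        return mask, self.classifier(pooled)           # mask + instrument type
```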


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Bayu Adhi Nugroho

A common problem found in real-world medical image classification is the inherent imbalance of the positive and negative patterns in the dataset, where positive patterns are usually rare. Moreover, in the classification of multiple classes with a neural network, a training pattern is treated as a positive pattern in one output node and as negative in all the remaining output nodes. In this paper, the weights of a training pattern in the loss function are designed based not only on the number of training patterns in the class but also on the different nodes, where one of them treats this training pattern as positive and the others treat it as negative. We propose a combined approach of a weight calculation algorithm for deep network training and the training optimization from a state-of-the-art deep network architecture for the thorax disease classification problem. Experimental results on the Chest X-Ray image dataset demonstrate that this new weighting scheme improves classification performance, and the training optimization from EfficientNet improves the performance further. We compare the aggregate method with several results from previous studies of thorax disease classification to provide a fair comparison against the proposed method.
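
The weighting idea, treating a pattern differently at the node where it is positive versus the nodes where it is negative, can be approximated with a per-class positively weighted BCE loss. A sketch under that assumption; the negatives/positives ratio is a common heuristic, not the paper's exact weight calculation algorithm, and all counts below are made up.

```python
import torch
import torch.nn as nn

def make_weighted_loss(positive_counts, total_count):
    """Per-class BCE whose positive weight grows as positives get rarer.

    positive_counts: (C,) tensor of positive examples per class.
    total_count: total number of training patterns.
    """
    negatives = total_count - positive_counts
    pos_weight = negatives / positive_counts.clamp(min=1)  # rarer -> heavier
    return nn.BCEWithLogitsLoss(pos_weight=pos_weight)

# Usage: 14 thorax disease classes with (hypothetical) varying prevalence.
counts = torch.tensor([1200., 400., 9000., 50., 700., 300., 2500.,
                       150., 600., 1000., 80., 220., 3100., 450.])
loss_fn = make_weighted_loss(counts, total_count=100_000)
logits = torch.randn(8, 14)                        # batch of model outputs
labels = torch.randint(0, 2, (8, 14)).float()      # multi-label targets
loss = loss_fn(logits, labels)
```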


Author(s):  
Ashish Singh ◽  
Kakali Chatterjee ◽  
Suresh Chandra Satapathy

The Mobile Edge Computing (MEC) model attracts more users to its services due to its characteristics and rapid delivery approach. This network architecture enables users to access information from the edge of the network. However, the security of this edge network architecture is a big challenge. All MEC services are available in a shared manner and accessed by users via the Internet. Attacks such as user-to-root, remote login, Denial of Service (DoS), snooping, and port scanning are possible in this computing environment due to Internet-based remote service. Intrusion detection is an approach to protect the network by detecting attacks. Existing detection models can detect only known attacks, their efficiency in monitoring real-time network traffic is low, and they cannot identify new unknown attacks. Hence, there is a need for an Edge-based Hybrid Intrusion Detection Framework (EHIDF) that not only detects known attacks but is also capable of detecting unknown attacks in real time with a low False Alarm Rate (FAR). This paper proposes an EHIDF that mainly relies on the Machine Learning (ML) approach for detecting intrusive traffic in the MEC environment. The proposed framework consists of three intrusion detection modules with three different classifiers: the Signature Detection Module (SDM) uses a C4.5 classifier, the Anomaly Detection Module (ADM) uses a Naive Bayes classifier, and the Hybrid Detection Module (HDM) uses the Meta-AdaboostM1 algorithm. The developed EHIDF can solve the present detection problems by detecting new unknown attacks with a low FAR. The implementation results illustrate that EHIDF achieves 90.25% accuracy with a FAR of 1.1%. Compared with previous works, the accuracy is improved by up to 10.78% and the FAR is reduced by up to 93%. A game-theoretical approach is also discussed to analyze the security strength of the proposed framework.
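
The three-module design maps naturally onto off-the-shelf classifiers: a C4.5-style decision tree (entropy criterion) for signatures, Naive Bayes for anomalies, and AdaBoost for the hybrid stage. A minimal scikit-learn sketch under those assumptions; feature extraction, the modules' exact interplay, and the alarm-fusion rule are simplified for illustration.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import AdaBoostClassifier

# Signature Detection Module: C4.5-style tree (entropy = information gain).
sdm = DecisionTreeClassifier(criterion="entropy")
# Anomaly Detection Module: Naive Bayes over traffic features.
adm = GaussianNB()
# Hybrid Detection Module: boosted ensemble, akin to Meta-AdaboostM1.
hdm = AdaBoostClassifier(n_estimators=50)

def train_and_flag(X_train, y_train, X_live):
    """Fit all three modules and flag traffic that any module marks malicious."""
    for clf in (sdm, adm, hdm):
        clf.fit(X_train, y_train)            # y: 0 = benign, 1 = attack
    votes = sum(clf.predict(X_live) for clf in (sdm, adm, hdm))
    return votes >= 1                        # union of alarms (illustrative)
```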


2021 ◽  
Vol 11 (15) ◽  
pp. 7148
Author(s):  
Bedada Endale ◽  
Abera Tullu ◽  
Hayoung Shi ◽  
Beom-Soo Kang

Unmanned aerial vehicles (UAVs) are being widely utilized for various missions in both the civilian and military sectors. Many of these missions require UAVs to acquire artificial intelligence about the environments they are navigating. This perception can be realized by training a computing machine to classify objects in the environment. One of the well-known machine training approaches is supervised deep learning, which enables a machine to classify objects. However, supervised deep learning comes with huge sacrifices in terms of time and computational resources. Collecting large input datasets, pre-training processes such as labeling training data, and the need for a high-performance computer for training are some of the challenges that supervised deep learning poses. To address these setbacks, this study proposes mission-specific input data augmentation techniques and the design of a lightweight deep neural network architecture that is capable of real-time object classification. Semi-direct visual odometry (SVO) data of augmented images are used to train the network for object classification. Ten classes of 10,000 different images each were used as input data, where 80% were used for training the network and the remaining 20% for validation. For the optimization of the designed deep neural network, a sequential gradient descent algorithm was implemented. This algorithm has the advantage of handling redundancy in the data more efficiently than other algorithms.
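
The augment-then-optimize loop can be sketched in PyTorch; the tiny stand-in network, the flip/brightness augmentations, and the learning rate are illustrative assumptions, not the paper's mission-specific pipeline.

```python
import torch
import torch.nn as nn

# Lightweight stand-in classifier for 10 object classes.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(8, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # sequential GD step
loss_fn = nn.CrossEntropyLoss()

def augment(batch):
    """Illustrative augmentation: random horizontal flip + brightness jitter."""
    if torch.rand(()) < 0.5:
        batch = torch.flip(batch, dims=[3])         # flip along image width
    return (batch * (0.8 + 0.4 * torch.rand(()))).clamp(0, 1)

# One dummy batch stands in for the 80% training split.
for images, labels in [(torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,)))]:
    optimizer.zero_grad()
    loss = loss_fn(model(augment(images)), labels)
    loss.backward()                   # one sequential gradient-descent update
    optimizer.step()
```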


2021 ◽  
Vol 11 (3) ◽  
pp. 968
Author(s):  
Yingchun Sun ◽  
Wang Gao ◽  
Shuguo Pan ◽  
Tao Zhao ◽  
Yahui Peng

Recently, multi-level feature networks have been extensively used in instance segmentation. However, because not all features are beneficial to instance segmentation tasks, the performance of networks cannot be adequately improved by synthesizing multi-level convolutional features indiscriminately. In order to solve this problem, an attention-based feature pyramid module (AFPM) is proposed, which integrates the attention mechanism on the basis of a multi-level feature pyramid network to efficiently and pertinently extract the high-level semantic features and low-level spatial structure features for instance segmentation. Firstly, we adopt a convolutional block attention module (CBAM) in feature extraction and sequentially generate attention maps that focus on instance-related features along the channel and spatial dimensions. Secondly, we build inter-dimensional dependencies through a convolutional triplet attention module (CTAM) in lateral attention connections, which is used to propagate a helpful semantic feature map and filter out redundant features irrelevant to instance objects. Finally, we construct branches for feature enhancement to strengthen detailed information and boost the entire feature hierarchy of the network. Experimental results on the Cityscapes dataset show that the proposed module outperforms other excellent methods under different evaluation metrics and effectively improves the performance of the instance segmentation method.
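
The CBAM block referenced here is a published design, so its standard form can be sketched directly: channel attention from pooled channel descriptors, followed by spatial attention from channel-wise statistics. The reduction ratio and kernel size below are the usual defaults, chosen as illustrative assumptions.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Standard CBAM: channel attention, then spatial attention."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(                        # shared channel MLP
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))               # avg channel descriptor
        mx = self.mlp(x.amax(dim=(2, 3)))                # max channel descriptor
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)  # channel attention
        s = torch.cat([x.mean(dim=1, keepdim=True),       # channel-wise stats
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))         # spatial attention
```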


Sensors ◽  
2021 ◽  
Vol 21 (13) ◽  
pp. 4507
Author(s):  
Zhujun Xu ◽  
Damien Vivet

Existing methods for video instance segmentation (VIS) mostly rely on two strategies: (1) building sophisticated post-processing to associate frame-level segmentation results and (2) modeling a video clip as a 3D spatio-temporal volume, with limits on resolution and length due to memory constraints. In this work, we propose a frame-to-frame method built upon transformers. We use a set of queries, called instance sequence queries (ISQs), to drive the transformer decoder and produce results at each frame. Each query represents one instance in a video clip. By extending the bipartite matching loss to two frames, our training procedure enables the decoder to adjust the ISQs during inference. The consistency of instances is preserved by the corresponding order between query slots and network outputs. As a result, there is no need for complex data association. On a TITAN Xp GPU, our method achieves a competitive 34.4% mAP at 33.5 FPS with ResNet-50 and 35.5% mAP at 26.6 FPS with ResNet-101 on the YouTube-VIS dataset.
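
The two-frame bipartite matching can be sketched with SciPy's Hungarian solver: the cost of assigning a query slot to a ground-truth instance is summed over both frames before a single joint assignment is made, which is what ties a slot to the same instance across time. The L1 box distance below is a simplified stand-in for the full matching cost.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def two_frame_matching(pred_boxes, gt_boxes):
    """Match query slots to instances using costs summed over two frames.

    pred_boxes: (2, Q, 4) predicted boxes per frame, one row per query slot.
    gt_boxes:   (2, G, 4) ground-truth boxes; instance g keeps the same
                index in both frames, which is what preserves identity.
    """
    Q, G = pred_boxes.shape[1], gt_boxes.shape[1]
    cost = np.zeros((Q, G))
    for t in range(2):                        # accumulate cost over both frames
        diff = pred_boxes[t][:, None, :] - gt_boxes[t][None, :, :]
        cost += np.abs(diff).sum(axis=-1)     # L1 box distance (illustrative)
    rows, cols = linear_sum_assignment(cost)  # one joint Hungarian assignment
    return list(zip(rows, cols))              # query slot -> instance index
```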


Author(s):  
Zhiyong Gao ◽  
Jianhong Xiang

Background: When detecting objects directly from 3D point clouds, the natural 3D patterns and invariances of 3D data are often obscured.
Objective: In this work, we study 3D object detection from discrete, disordered, and sparse 3D point clouds.
Methods: The proposed convolutional neural network (CNN) is composed of a frustum sequence module, the 3D instance segmentation module S-NET, the 3D point cloud transformation module T-NET, and the 3D bounding box estimation module E-NET. The search space of the object is determined by the frustum sequence module. The instance segmentation of the point cloud is performed by the 3D instance segmentation module. The 3D coordinates of the object are confirmed by the transformation module and the 3D bounding box estimation module.
Results: Evaluated on the KITTI benchmark dataset, our method outperforms the state of the art by remarkable margins while having real-time capability.
Conclusion: We achieve real-time 3D object detection by proposing an improved CNN based on image-driven point clouds.
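
The frustum search-space reduction can be illustrated with a short geometric sketch: project the points into the image with the camera intrinsics and keep those falling inside a 2D detection box. The coordinate convention, intrinsics, and box format are illustrative assumptions.

```python
import numpy as np

def points_in_frustum(points, K, box2d):
    """Select 3D points whose image projection lies inside a 2D detection box.

    points: (N, 3) points in the camera frame (x right, y down, z forward).
    K:      (3, 3) camera intrinsic matrix.
    box2d:  (x1, y1, x2, y2) detection box in pixels.
    """
    in_front = points[:, 2] > 0                  # keep points ahead of camera
    pts = points[in_front]
    uvw = pts @ K.T                              # pinhole projection
    u, v = uvw[:, 0] / uvw[:, 2], uvw[:, 1] / uvw[:, 2]
    x1, y1, x2, y2 = box2d
    keep = (u >= x1) & (u <= x2) & (v >= y1) & (v <= y2)
    return pts[keep]                             # frustum = reduced search space
```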

