New End-to-End Strategy Based on DeepLabv3+ Semantic Segmentation for Human Head Detection

In computer vision, object detection consists of automatically locating objects in images and reporting their positions. The most common fields of application are safety systems (pedestrian detection, behavior identification) and control systems. Another important application is head/person detection, which underpins road safety, rescue, surveillance, and similar tasks. In this study, we developed a new approach based on two parallel DeepLabv3+ networks to improve the performance of the person detection system. To implement our semantic segmentation model, we established a working methodology with two types of ground truths extracted from the bounding boxes given by the original ground truths. The approach was applied to our two private datasets as well as to a public dataset. To show the performance of the proposed system, a comparative analysis was carried out against two state-of-the-art deep learning semantic segmentation models: SegNet and U-Net. With a global accuracy of 99.14%, the results demonstrate that the developed strategy could be an efficient way to build a deep neural network model for semantic segmentation. This strategy can be used not only for human head detection but also in several other semantic segmentation applications.
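As a rough illustration of deriving segmentation ground truth from bounding boxes and of scoring with global accuracy (the fraction of correctly labeled pixels), the following Python sketch may help. The abstract does not say what the two ground-truth types are, so the rectangular and elliptical masks below, the function names, and the (x_min, y_min, x_max, y_max) box convention with exclusive maxima are all assumptions, not the authors' method.

```python
def box_to_rect_mask(height, width, box):
    """Binary mask that is 1 for every pixel inside the box.

    box = (x_min, y_min, x_max, y_max), maxima exclusive (assumed convention).
    """
    x_min, y_min, x_max, y_max = box
    return [
        [1 if (x_min <= x < x_max and y_min <= y < y_max) else 0
         for x in range(width)]
        for y in range(height)
    ]


def box_to_ellipse_mask(height, width, box):
    """Binary mask that is 1 inside the ellipse inscribed in the box.

    This second mask type is hypothetical; it only illustrates that one
    box can yield more than one kind of segmentation ground truth.
    """
    x_min, y_min, x_max, y_max = box
    rx, ry = (x_max - x_min) / 2.0, (y_max - y_min) / 2.0
    if rx <= 0 or ry <= 0:
        return [[0] * width for _ in range(height)]
    cx, cy = x_min + rx, y_min + ry
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Test the pixel center against the ellipse equation.
            dx, dy = (x + 0.5 - cx) / rx, (y + 0.5 - cy) / ry
            row.append(1 if dx * dx + dy * dy <= 1.0 else 0)
        mask.append(row)
    return mask


def global_accuracy(pred, target):
    """Fraction of pixels whose predicted label equals the target label."""
    total = sum(len(row) for row in target)
    correct = sum(
        1
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
        if p == t
    )
    return correct / total


rect = box_to_rect_mask(4, 4, (0, 0, 4, 4))
ellipse = box_to_ellipse_mask(4, 4, (0, 0, 4, 4))
agreement = global_accuracy(rect, ellipse)
```

Here `agreement` measures how much the two mask types differ for the same box; the same function would compare a predicted mask with either ground truth.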

Head Detection Based on DR Feature Extraction Network and Mixed Dilated Convolution Module

Electronics ◽

10.3390/electronics10131565 ◽

2021 ◽

Vol 10 (13) ◽

pp. 1565

Author(s):

Junwen Liu ◽

Yongjun Zhang ◽

Jianbin Xie ◽

Yan Wei ◽

Zewei Wang ◽

...

Keyword(s):

Neural Network ◽

Feature Extraction ◽

Transmission Rate ◽

Pedestrian Detection ◽

Human Head ◽

Detection Rates ◽

Translational Invariance ◽

Dilated Convolution ◽

Head Detection ◽

Small Targets

Pedestrian detection in complex scenes suffers from occlusion, such as pedestrians occluding one another. It is well known that, compared with the variability of the human body, the shape of the human head and shoulders changes little and is highly stable. Head detection is therefore an important research area within pedestrian detection. The translational invariance of neural networks makes it possible to design a deep convolutional neural network that still recognizes the target effectively even when its appearance and location change. However, problems with scale invariance and high missed-detection rates for small targets remain. In this paper, a feature extraction network, DR-Net, based on Darknet-53 is proposed to improve the information transmission rate between convolutional layers and to extract more semantic information. In addition, an MDC (mixed dilated convolution) module, which combines dilated convolutions with different sampling rates, is embedded to improve the detection rate of small targets. We evaluated our method on three publicly available datasets and achieved excellent results. The AP (Average Precision) values on the Brainwash, HollywoodHeads, and SCUT-HEAD datasets reached 92.1%, 84.8%, and 90%, respectively.
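To make the idea of dilated convolution with several sampling rates more concrete, here is a minimal one-dimensional Python sketch. It is not the MDC module from the paper: the zero padding, the summation used to mix the branches, and all function names are assumptions chosen for illustration. It shows how a larger rate widens the receptive field of the same 3-tap kernel without adding weights.

```python
def dilated_conv1d(signal, kernel, rate):
    """Zero-padded, same-length 1D dilated convolution (cross-correlation).

    Kernel taps are spaced `rate` samples apart; an odd kernel length is
    assumed so that the kernel has a center tap.
    """
    center = len(kernel) // 2
    output = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i + (j - center) * rate
            if 0 <= idx < len(signal):  # positions outside count as zero
                acc += weight * signal[idx]
        output.append(acc)
    return output


def receptive_field(kernel_size, rate):
    """Number of input samples spanned by one dilated kernel."""
    return (kernel_size - 1) * rate + 1


def mixed_dilated_conv1d(signal, kernel, rates):
    """Sum the outputs of several dilated branches (assumed fusion rule)."""
    branches = [dilated_conv1d(signal, kernel, rate) for rate in rates]
    return [sum(values) for values in zip(*branches)]


impulse = [0, 0, 0, 1, 0, 0, 0]
kernel = [1, 1, 1]
rate_two = dilated_conv1d(impulse, kernel, rate=2)
mixed = mixed_dilated_conv1d(impulse, kernel, rates=(1, 2))
```

With rate 1 the kernel spans 3 samples and with rate 2 it spans 5, so the mixed output responds to both near and far context around the impulse.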

A unified framework for automated person re-identification

Transport and Communication Science Journal ◽

10.25073/tcsj.71.7.11 ◽

2020 ◽

Vol 71 (7) ◽

pp. 868-880

Author(s):

Nguyen Hong-Quan ◽

Nguyen Thuy-Binh ◽

Tran Duc-Long ◽

Le Thi-Lan

Keyword(s):

Deep Learning ◽

Video Analysis ◽

Camera Network ◽

Unified Framework ◽

Person Detection ◽

Practical Applications ◽

Detection And Tracking ◽

Analysis System ◽

Bounding Boxes

With the rapid development of camera networks, video analysis systems have become increasingly popular and have been applied in various practical applications. In this paper, we focus on the person re-identification (person ReID) task, which is a crucial step in video analysis systems. The purpose of person ReID is to associate multiple images of a given person moving through a non-overlapping camera network. Many efforts have been devoted to person ReID. However, most studies on person ReID deal only with well-aligned bounding boxes that are detected manually and treated as perfect inputs. In fact, when building a fully automated person ReID system, the quality of the two preceding steps, person detection and tracking, may have a strong effect on ReID performance. The contribution of this paper is two-fold. First, a unified framework for person ReID based on deep learning models is proposed. In this framework, a deep neural network for person detection is coupled with a deep-learning-based tracking method. In addition, features extracted from an improved ResNet architecture are proposed for person representation to achieve higher ReID accuracy. Second, our self-built dataset is introduced and used to evaluate all three steps of the fully automated person ReID framework.
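Associating images of the same person is often framed as ranking gallery images by how similar their feature vectors are to a query. The Python sketch below illustrates that matching step only. The abstract does not state which similarity measure is used, so cosine similarity, the toy two-dimensional features, and the names `cosine_similarity` and `rank_gallery` are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_gallery(query, gallery):
    """Return (image_id, score) pairs sorted from most to least similar.

    `gallery` maps a hypothetical image id to its feature vector.
    """
    scored = [(image_id, cosine_similarity(query, feature))
              for image_id, feature in gallery.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy features standing in for embeddings from a person-representation network.
query_feature = [1.0, 0.0]
gallery_features = {
    "cam2_track7": [0.9, 0.1],
    "cam3_track1": [0.0, 1.0],
    "cam3_track4": [-1.0, 0.0],
}
ranking = rank_gallery(query_feature, gallery_features)
```

In a fully automated pipeline the gallery features would come from detected and tracked bounding boxes, so detection and tracking errors propagate directly into this ranking.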

Edge computing-based person detection system for top view surveillance: Using CenterNet with transfer learning

Applied Soft Computing ◽

10.1016/j.asoc.2021.107489 ◽

2021 ◽

pp. 107489

Author(s):

Imran Ahmed ◽

Misbah Ahmad ◽

Joel J.P.C. Rodrigues ◽

Gwanggil Jeon

Keyword(s):

Transfer Learning ◽

Detection System ◽

Edge Computing ◽

Person Detection ◽

Top View

Manipulation Planning for Object Re-Orientation Based on Semantic Segmentation Keypoint Detection

Sensors ◽

10.3390/s21072280 ◽

2021 ◽

Vol 21 (7) ◽

pp. 2280

Author(s):

Ching-Chang Wong ◽

Li-Yu Yeh ◽

Chih-Cheng Liu ◽

Chi-Yi Tsai ◽

Hisasuki Aoyama

Keyword(s):

Neural Network ◽

Robot Manipulator ◽

Detection System ◽

Semantic Segmentation ◽

Target Object ◽

Planning System ◽

Normal Vector ◽

Manipulation Planning ◽

Keypoint Detection ◽

3D Keypoint Detection

In this paper, a manipulation planning method for object re-orientation based on semantic segmentation keypoint detection is proposed for a robot manipulator, enabling it to detect randomly placed objects and re-orient them to a specified position and pose. The method has two main parts: (1) a 3D keypoint detection system; and (2) a manipulation planning system for object re-orientation. In the 3D keypoint detection system, an RGB-D camera is used to obtain information about the environment and to generate 3D keypoints of the target object as inputs that represent its position and pose. This process simplifies the 3D model representation so that manipulation planning for object re-orientation can be executed in a category-level manner by adding varied training data of the object in the training phase. In addition, 3D suction points in both the object's current and expected poses are generated as inputs to the next operation stage. During the next stage, the Mask Region-Convolutional Neural Network (Mask R-CNN) algorithm is used for preliminary object detection and to obtain the object image. The image with the highest confidence index is selected as the input to the semantic segmentation system, which classifies each pixel in the picture into the corresponding pack unit of the object. After a convolutional neural network performs the semantic segmentation, the Conditional Random Fields (CRFs) method is applied for several iterations to obtain a more accurate object recognition result. Once the target object has been segmented into pack units by the image processing, the center position of each pack unit can be obtained. Then, a normal vector at each pack unit's center point is generated from the depth image information, and the pose of the object is obtained by connecting the center points of the pack units.
In the manipulation planning system for object re-orientation, the pose of the object and the normal vector of each pack unit are first converted into the working coordinate system of the robot manipulator. Then, according to the current and expected poses of the object, the spherical linear interpolation (Slerp) algorithm is used to generate a series of movements in the workspace for object re-orientation on the robot manipulator. In addition, the pose of the object is adjusted about the z-axis of the object's geodetic coordinate system based on the image features on the surface of the object, so that the pose of the placed object approaches the desired pose. Finally, a robot manipulator and a vacuum suction cup made by the laboratory are used to verify that the proposed system can indeed complete the planned object re-orientation task.
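The Slerp step mentioned above can be sketched in a few lines of Python. This is the textbook quaternion formulation, not the authors' implementation; the (w, x, y, z) component order, the 0.9995 threshold for falling back to linear interpolation, and the helper name `interpolate_poses` are assumptions.

```python
import math


def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    dot = sum(a * b for a, b in zip(q0, q1))
    # q and -q encode the same rotation; flip one to take the shorter arc.
    if dot < 0.0:
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:
        # Nearly parallel: interpolate linearly and renormalize to avoid
        # dividing by a very small sine.
        mixed = tuple(a + t * (b - a) for a, b in zip(q0, q1))
        norm = math.sqrt(sum(c * c for c in mixed))
        return tuple(c / norm for c in mixed)
    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    s0 = math.sin((1.0 - t) * theta) / sin_theta
    s1 = math.sin(t * theta) / sin_theta
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))


def interpolate_poses(q_start, q_goal, steps):
    """Evenly spaced orientations from start to goal, endpoints included."""
    return [slerp(q_start, q_goal, i / (steps - 1)) for i in range(steps)]


identity = (1.0, 0.0, 0.0, 0.0)
# 90 degree rotation about the z-axis.
quarter_turn_z = (0.5 ** 0.5, 0.0, 0.0, 0.5 ** 0.5)
waypoints = interpolate_poses(identity, quarter_turn_z, steps=3)
halfway = waypoints[1]
```

The middle waypoint is a 45 degree rotation about z, which is the constant-angular-velocity property that makes Slerp suitable for generating smooth re-orientation movements.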

Multimodal person detection system

Multimedia Tools and Applications ◽

10.1007/s11042-020-10307-8 ◽

2021 ◽

Author(s):

Philip Barello ◽

Md Shafaeat Hossain

Keyword(s):

Detection System ◽

Person Detection

Multi-Scale Feature Pyramid Network: A Heavily Occluded Pedestrian Detection Network Based on ResNet

Sensors ◽

10.3390/s21051820 ◽

2021 ◽

Vol 21 (5) ◽

pp. 1820

Author(s):

Xiaotao Shao ◽

Qing Wang ◽

Wei Yang ◽

Yun Chen ◽

Yi Xie ◽

...

Keyword(s):

Semantic Information ◽

Detection System ◽

Pedestrian Detection ◽

Detection Accuracy ◽

The Public ◽

Scale Feature ◽

Detection Algorithms ◽

Multi Scale ◽

Art Works ◽

Feature Pyramid

Existing pedestrian detection algorithms cannot effectively extract features of heavily occluded targets, which results in lower detection accuracy. To address heavy occlusion in crowds, we propose a multi-scale feature pyramid network based on ResNet (MFPN) to enhance the features of occluded targets and improve detection accuracy. MFPN includes two modules: a double feature pyramid network (FPN) integrated with ResNet (DFR) and repulsion loss of minimum (RLM). The proposed double FPN improves the architecture to further enhance the semantic information and contours of occluded pedestrians, and provides a new way to extract features of occluded targets. The features extracted by our network are clearer and more separable, especially for heavily occluded pedestrians. Repulsion loss is introduced to improve the loss function by keeping predicted boxes away from the ground truths of unrelated targets. In experiments on the public CrowdHuman dataset, we obtain 90.96% AP, the best performance, a gain of 5.16% AP over the FPN-ResNet50 baseline. Compared with state-of-the-art works, our method boosts the performance of the pedestrian detection system.
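The idea of penalizing a predicted box for overlapping the ground truth of an unrelated target can be sketched as follows. This follows the commonly cited repulsion-loss formulation (intersection over ground truth passed through a smoothed logarithm); it is not the RLM module from the paper, and the box format (x_min, y_min, x_max, y_max), the default sigma of 0.5, and the function names are assumptions.

```python
import math


def box_area(box):
    """Area of an axis-aligned box (x_min, y_min, x_max, y_max)."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def intersection_area(a, b):
    """Area of the overlap between two boxes (0.0 if they are disjoint)."""
    overlap = (max(a[0], b[0]), max(a[1], b[1]),
               min(a[2], b[2]), min(a[3], b[3]))
    return box_area(overlap)


def iog(pred, gt):
    """Intersection over ground truth: overlap divided by the gt box area."""
    gt_area = box_area(gt)
    return intersection_area(pred, gt) / gt_area if gt_area > 0 else 0.0


def smooth_ln(x, sigma=0.5):
    """Smoothed -ln(1 - x); linear above sigma so it stays finite at x = 1.

    sigma is assumed to lie in [0, 1).
    """
    if x <= sigma:
        return -math.log(1.0 - x)
    return (x - sigma) / (1.0 - sigma) - math.log(1.0 - sigma)


def repulsion_term(pred, unrelated_gt, sigma=0.5):
    """Penalty that grows as pred overlaps a ground truth it is not assigned to."""
    return smooth_ln(iog(pred, unrelated_gt), sigma)


neighbor_gt = (1.0, 1.0, 3.0, 3.0)
no_overlap = repulsion_term((5.0, 5.0, 6.0, 6.0), neighbor_gt)
partial = repulsion_term((0.0, 0.0, 2.0, 2.0), neighbor_gt)
covering = repulsion_term((0.0, 0.0, 4.0, 4.0), neighbor_gt)
```

The penalty is zero for a disjoint box and increases monotonically with overlap, which is what pushes predictions away from neighboring pedestrians in crowded scenes.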
