Fusion in Dissimilarity Space Between RGB-D and Skeleton for Person Re-Identification

Author(s):
Md Kamal Uddin, Amran Bhuiyan, Mahmudul Hasan, ...

Person re-identification (Re-id) is an important component of video surveillance systems; it aims to recognize an individual across the multiple disjoint sensors of a camera network. Despite recent advances in RGB camera-based person re-identification under normal lighting conditions, Re-id methods rarely take advantage of the additional information provided by modern RGB-D sensors (e.g., depth and skeleton information). When traditional RGB cameras fail to capture video under poor illumination, this RGB-D sensor-based information can help overcome the constraint. This work takes depth images and skeleton joint points as additional information alongside RGB appearance cues and proposes a person re-identification method. We combine 4-channel RGB-D image features with skeleton information using a score-level fusion strategy in dissimilarity space to increase re-identification accuracy. Moreover, our proposed method overcomes the illumination problem because the depth images and skeleton information it uses are illumination invariant. We carried out rigorous experiments on two publicly available RGBD-ID re-identification datasets and showed that combining the features of 4-channel RGB-D images with skeleton information boosts rank-1 recognition accuracy.
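As an illustration of the score-level fusion step, the sketch below combines precomputed RGB-D and skeleton dissimilarity vectors with a weighted sum after min-max normalization. The weight w, the normalization scheme, and the toy scores are assumptions for the example, not details taken from the paper.

```python
import numpy as np

def min_max_normalize(scores):
    """Scale a 1-D array of dissimilarity scores into [0, 1]."""
    s = np.asarray(scores, dtype=float)
    return (s - s.min()) / (s.max() - s.min() + 1e-12)

def fuse_scores(rgbd_dist, skel_dist, w=0.5):
    """Weighted score-level fusion in dissimilarity space.

    rgbd_dist, skel_dist: dissimilarities of one probe to each gallery
    identity; a lower fused score means a better match.
    """
    return w * min_max_normalize(rgbd_dist) + (1 - w) * min_max_normalize(skel_dist)

# Toy example with three gallery identities.
rgbd = [0.9, 0.2, 0.7]   # distances from 4-channel RGB-D features
skel = [0.8, 0.4, 0.1]   # distances from skeleton joint descriptors
fused = fuse_scores(rgbd, skel, w=0.6)
print("rank-1 match: identity", int(np.argmin(fused)))
```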

2021, pp. 1-21
Author(s):
S.S. Suni, K. Gopakumar

In this study, we propose a multimodal feature-based framework for recognising hand gestures from RGB and depth images. In addition to the features from the RGB image, depth image features are explored to construct discriminative feature labels for various gestures. Depth maps are a powerful source of information and improve performance on various computer vision problems. A newly refined Gradient-Local Binary Pattern (G-LBP) is applied to extract features from depth images, and histogram of oriented gradients (HOG) features are extracted from RGB images. The components from the RGB and depth channels are concatenated to form a multimodal feature vector. In the final step, classification is performed using K-Nearest Neighbour and multi-class Support Vector Machines. The designed system is invariant to scale, rotation and illumination, and the combined features achieve superior recognition rates.
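A minimal sketch of the feature pipeline: HOG from the RGB image and an LBP-style descriptor from the depth image, concatenated and fed to a k-NN classifier. The G-LBP is approximated here as LBP computed on the depth gradient magnitude, which is an assumption; the paper's exact G-LBP definition, image sizes and parameters are not reproduced.

```python
import numpy as np
from skimage.feature import hog, local_binary_pattern
from sklearn.neighbors import KNeighborsClassifier

def glbp_depth_features(depth, P=8, R=1):
    """Rough stand-in for G-LBP: an LBP histogram computed on the
    depth gradient magnitude."""
    gy, gx = np.gradient(depth.astype(float))
    lbp = local_binary_pattern(np.hypot(gx, gy), P, R, method="uniform")
    hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
    return hist

def multimodal_vector(rgb_gray, depth):
    """Concatenate HOG (RGB channel) and depth descriptors into one vector."""
    hog_feat = hog(rgb_gray, orientations=9, pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2))
    return np.concatenate([hog_feat, glbp_depth_features(depth)])

# Toy training run; an SVM (sklearn.svm.SVC) is a drop-in alternative to k-NN.
rng = np.random.default_rng(0)
X = np.stack([multimodal_vector(rng.random((64, 64)), rng.random((64, 64)))
              for _ in range(20)])
y = rng.integers(0, 3, size=20)          # three toy gesture classes
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(clf.predict(X[:2]))
```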


Sensors, 2021, Vol 21 (4), pp. 1166
Author(s):
Wei Zhang, Liang Gong, Suyue Chen, Wenjie Wang, Zhonghua Miao, ...

In the process of collaborative operation, unloading automation of the forage harvester is of great significance for improving harvesting efficiency and reducing labor intensity. However, non-standard transport trucks and unstructured field environments make it extremely difficult to identify and properly position loading containers. In this paper, a global model with three coordinate systems is established to describe the collaborative harvesting system. A method based on depth perception is then proposed to dynamically identify and position the truck container, comprising data preprocessing, point cloud pose transformation based on the singular value decomposition (SVD) algorithm, segmentation and projection of the upper edge, edge line extraction and corner point positioning based on the Random Sample Consensus (RANSAC) algorithm, and fusion and visualization of results on the depth image. Finally, the effectiveness of the proposed method was verified by field experiments with different trucks. The results demonstrate that the identification accuracy of the container region is about 90% and the absolute error of center point positioning is less than 100 mm. The proposed method is robust to containers with different appearances and provides a methodological reference for dynamic identification and positioning of containers in forage harvesting.
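Two of the named building blocks are standard enough to sketch: an SVD (Kabsch) rigid transform between corresponding point sets, and a RANSAC-style 2-D line fit such as might be used on the projected upper edge. Both are generic textbook versions under stated assumptions, not the authors' implementation; tolerances and iteration counts are illustrative.

```python
import numpy as np

def svd_rigid_transform(src, dst):
    """Kabsch/SVD estimate of R, t such that dst ≈ src @ R.T + t.

    src, dst: (N, 3) corresponding points, e.g. camera vs. reference frame.
    """
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst.mean(0) - R @ src.mean(0)
    return R, t

def ransac_line_2d(points, n_iter=200, tol=0.02, seed=0):
    """RANSAC-style 2-D line fit, e.g. for projected upper-edge points."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(points), dtype=bool)
    for _ in range(n_iter):
        p1, p2 = points[rng.choice(len(points), 2, replace=False)]
        d = p2 - p1
        n = np.array([-d[1], d[0]]) / (np.linalg.norm(d) + 1e-12)  # unit normal
        inliers = np.abs((points - p1) @ n) < tol
        if inliers.sum() > best.sum():
            best = inliers
    return best

# Toy check: recover a known rotation about z and a translation.
th = np.pi / 6
R_true = np.array([[np.cos(th), -np.sin(th), 0],
                   [np.sin(th),  np.cos(th), 0],
                   [0, 0, 1]])
src = np.random.default_rng(1).random((50, 3))
dst = src @ R_true.T + np.array([0.5, -0.2, 1.0])
R, t = svd_rigid_transform(src, dst)
print(np.allclose(R, R_true), np.allclose(t, [0.5, -0.2, 1.0]))
```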


Sensors, 2021, Vol 21 (4), pp. 1299
Author(s):
Honglin Yuan, Tim Hoogenkamp, Remco C. Veltkamp

Deep learning has achieved great success on robotic vision tasks. However, compared with other vision-based tasks, it is difficult to collect a representative and sufficiently large training set for six-dimensional (6D) object pose estimation, due to the inherent difficulty of data collection. In this paper, we propose the RobotP dataset, consisting of commonly used objects, for benchmarking 6D object pose estimation. To create the dataset, we apply a 3D reconstruction pipeline to produce high-quality depth images, ground truth poses, and 3D models for well-selected objects. Based on the generated data, we then produce object segmentation masks and two-dimensional (2D) bounding boxes automatically. To further enrich the data, we synthesize a large number of photo-realistic color-and-depth image pairs with ground truth 6D poses. Our dataset is freely distributed to research groups through the Shape Retrieval Challenge benchmark on 6D pose estimation. Based on our benchmark, different learning-based approaches are trained and tested on the unified dataset. The evaluation results indicate that there is considerable room for improvement in 6D object pose estimation, particularly for objects with dark colors, and that photo-realistic images help increase the performance of pose estimation algorithms.
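For context, the sketch below computes the ADD (Average Distance of Model Points) error, a metric widely used when comparing estimated 6D poses against ground truth; whether the RobotP benchmark uses exactly this metric is an assumption here, not a claim from the abstract.

```python
import numpy as np

def add_metric(model_pts, R_gt, t_gt, R_est, t_est):
    """Average Distance of Model Points (ADD) between two 6D poses.

    model_pts: (N, 3) vertices of the object's 3D model. A pose is often
    accepted if ADD is below 10% of the model diameter.
    """
    pts_gt = model_pts @ R_gt.T + t_gt
    pts_est = model_pts @ R_est.T + t_est
    return np.linalg.norm(pts_gt - pts_est, axis=1).mean()

# Sanity check: identical poses give zero error.
pts = np.random.default_rng(1).random((100, 3))
print(add_metric(pts, np.eye(3), np.zeros(3), np.eye(3), np.zeros(3)))  # 0.0
```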


2021, Vol 8 (1)
Author(s):
Yanping Zhang, Jing Peng, Xiaohui Yuan, Lisi Zhang, Dongzi Zhu, ...

Abstract: Recognizing plant cultivars reliably and efficiently can benefit plant breeders in terms of property rights protection and innovation of germplasm resources. Although leaf image-based methods have been widely adopted in plant species identification, they have seldom been applied to cultivar identification due to the high similarity of leaves among cultivars. Here, we propose an automatic leaf image-based cultivar identification pipeline called MFCIS (Multi-feature Combined Cultivar Identification System), which combines multiple leaf morphological features collected by persistent homology and a convolutional neural network (CNN). Persistent homology, a multiscale and robust method, was employed to extract the topological signatures of leaf shape, texture, and venation details. A CNN-based algorithm, the Xception network, was fine-tuned for extracting high-level leaf image features. For fruit species, we benchmarked the MFCIS pipeline on a sweet cherry (Prunus avium L.) leaf dataset with >5000 leaf images from 88 varieties or unreleased selections and achieved a mean accuracy of 83.52%. For annual crop species, we applied the MFCIS pipeline to a soybean (Glycine max L. Merr.) leaf dataset with 5000 leaf images of 100 cultivars or elite breeding lines collected at five growth periods. The identification models for each growth period were trained independently, and their results were combined using a score-level fusion strategy. The classification accuracy after score-level fusion was 91.4%, which is much higher than the accuracy when utilizing each growth period independently or mixing all growth periods. To facilitate the adoption of the proposed pipelines, we constructed a user-friendly web service, which is freely available at http://www.mfcis.online.
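The score-level fusion across growth periods can be pictured as averaging the per-cultivar score vectors produced by the independently trained period models before taking the argmax. The sketch below assumes uniform weights and softmax-style probability matrices; both are illustrative choices, not specifics from the paper.

```python
import numpy as np

def fuse_period_scores(score_list, weights=None):
    """Score-level fusion across growth-period models.

    score_list: one (n_samples, n_cultivars) probability matrix per
    growth-period model; returns fused cultivar predictions.
    """
    scores = np.stack(score_list)                      # (periods, n, classes)
    w = np.ones(len(score_list)) if weights is None else np.asarray(weights, float)
    fused = np.tensordot(w / w.sum(), scores, axes=1)  # weighted mean of scores
    return fused.argmax(axis=1)

# Toy example: three period models, four leaf samples, five cultivars.
rng = np.random.default_rng(2)
periods = [rng.dirichlet(np.ones(5), size=4) for _ in range(3)]
print(fuse_period_scores(periods))
```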


Sensors, 2021, Vol 21 (4), pp. 1356
Author(s):
Linda Christin Büker, Finnja Zuber, Andreas Hein, Sebastian Fudickar

With approaches such as HRNet and OpenPose available for detecting joint positions in color images, corresponding approaches for depth images have received limited consideration, even though depth images have several advantages over color images, such as robustness to lighting variation and invariance to color and texture. Correspondingly, we introduce High-Resolution Depth Net (HRDepthNet), a machine learning-driven approach to detect human joints (body, head, and upper and lower extremities) purely in depth images. HRDepthNet retrains the original HRNet for depth images. To this end, a dataset was created holding depth (and RGB) images recorded with subjects conducting the Timed Up and Go test, an established geriatric assessment. The images were manually annotated based on the RGB images, and training and evaluation were conducted with this dataset. For accuracy evaluation, the detection of body joints was evaluated via COCO's evaluation metrics, indicating that the resulting depth image-based model achieved better results than HRNet trained and applied on the corresponding RGB images. An additional evaluation of the position errors showed a median deviation of 1.619 cm (x-axis), 2.342 cm (y-axis) and 2.4 cm (z-axis).
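COCO's keypoint metrics are built on the Object Keypoint Similarity (OKS); a simplified version is sketched below to show what "evaluated via COCO's evaluation metrics" computes per detected person. The per-joint constants and visibility handling follow the COCO formulation in spirit, but this is a stripped-down illustration, not the pycocotools implementation.

```python
import numpy as np

def oks(pred, gt, visible, area, kappas):
    """Simplified COCO Object Keypoint Similarity for one person instance.

    pred, gt: (K, 2) predicted / ground-truth joint coordinates;
    visible: (K,) boolean flags; area: object segment area;
    kappas: (K,) per-joint falloff constants from the COCO spec.
    """
    d2 = np.sum((pred - gt) ** 2, axis=1)
    e = d2 / (2.0 * area * kappas ** 2 + 1e-12)
    return np.exp(-e)[visible].mean()

# Toy example: predictions about 1 px off each ground-truth joint.
gt = np.random.default_rng(0).random((5, 2)) * 100
k = np.full(5, 0.05)                               # illustrative constants
print(oks(gt + 1.0, gt, np.ones(5, dtype=bool), area=50 * 50, kappas=k))
```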


Mathematics, 2021, Vol 9 (21), pp. 2815
Author(s):
Shih-Hung Yang, Yao-Mao Cheng, Jyun-We Huang, Yon-Ping Chen

Automatic fingerspelling recognition tackles the communication barrier between deaf and hearing individuals. However, the accuracy of fingerspelling recognition is reduced by high intra-class variability and low inter-class variability. Existing methods learn features with regular convolutional kernels, which have limited receptive fields (RFs) and often cannot detect subtle discriminative details. In this study, we propose a receptive field-aware network with finger attention (RFaNet) that highlights the finger regions and builds inter-finger relations. To highlight the discriminative details of the fingers, RFaNet reweights the low-level features of the hand depth image with those of the non-forearm image and improves finger localization, even when the wrist is occluded. RFaNet captures neighboring and inter-region dependencies between fingers in high-level features: an atrous convolution procedure enlarges the RFs at multiple scales, and a non-local operation computes the interactions between multi-scale feature maps, thereby facilitating the building of inter-finger relations. Thus, the representation of a sign is invariant to viewpoint changes, which are primarily responsible for intra-class variability. On an American Sign Language fingerspelling dataset, RFaNet achieved 1.77% higher classification accuracy than state-of-the-art methods. RFaNet also achieved effective transfer learning when the number of labeled depth images was insufficient: the fingerspelling representation of a depth image can be transferred from large- to small-scale datasets by highlighting the finger regions and building inter-finger relations, thereby reducing the need for expensive fingerspelling annotations.
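To make "atrous convolutions at multiple scales plus a non-local operation" concrete, here is a small PyTorch module in that spirit. The channel widths, dilation rates (1, 2, 4), attention scaling and residual connection are assumptions for the sketch; this is not RFaNet's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleContext(nn.Module):
    """Atrous convolutions at several dilation rates to enlarge the RF,
    followed by a simple non-local (self-attention) interaction."""

    def __init__(self, c):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(c, c, 3, padding=d, dilation=d) for d in (1, 2, 4))
        self.theta = nn.Conv2d(3 * c, c, 1)
        self.phi = nn.Conv2d(3 * c, c, 1)
        self.g = nn.Conv2d(3 * c, c, 1)
        self.out = nn.Conv2d(c, c, 1)

    def forward(self, x):
        # Multi-scale features from the dilated branches.
        ms = torch.cat([F.relu(b(x)) for b in self.branches], dim=1)
        n, c, h, w = x.shape
        q = self.theta(ms).flatten(2).transpose(1, 2)     # (n, hw, c)
        k = self.phi(ms).flatten(2)                       # (n, c, hw)
        v = self.g(ms).flatten(2).transpose(1, 2)         # (n, hw, c)
        attn = torch.softmax(q @ k / c ** 0.5, dim=-1)    # pairwise relations
        y = (attn @ v).transpose(1, 2).reshape(n, c, h, w)
        return x + self.out(y)                            # residual connection

feat = torch.randn(1, 16, 14, 14)
print(MultiScaleContext(16)(feat).shape)  # torch.Size([1, 16, 14, 14])
```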


2020, Vol 2020, pp. 1-18
Author(s):
Chao Tang, Huosheng Hu, Wenjian Wang, Wei Li, Hua Peng, ...

The representation and selection of action features directly affect the performance of human action recognition methods, and a single feature is often affected by human appearance, the environment, camera settings, and other factors. Aiming at the problem that existing multimodal feature fusion methods cannot effectively measure the contribution of different features, this paper proposes a human action recognition method based on RGB-D image features, which makes full use of the multimodal information provided by RGB-D sensors to extract effective human action features. Three kinds of human action features with different modal information are proposed: an RGB-HOG feature based on RGB image information, which has good geometric scale invariance; a D-STIP feature based on the depth image, which maintains the dynamic characteristics of human motion and has local invariance; and an S-JRPF feature based on skeleton information, which describes the spatial structure of motion well. At the same time, multiple K-nearest neighbor classifiers with good generalization ability are combined for decision-level classification. The experimental results show that the algorithm achieves good recognition results on the public G3D and CAD60 datasets.
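A minimal sketch of the decision-level fusion: one k-NN classifier per modality (standing in for the RGB-HOG, D-STIP and S-JRPF descriptors) whose predictions are combined by majority vote. The feature dimensions, k, and the plain voting rule are illustrative assumptions; the paper's integration scheme may weight the classifiers differently.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def multimodal_knn_vote(views_train, y_train, views_test, k=5):
    """One k-NN per modality, fused by majority vote over predictions."""
    preds = []
    for Xtr, Xte in zip(views_train, views_test):
        clf = KNeighborsClassifier(n_neighbors=k).fit(Xtr, y_train)
        preds.append(clf.predict(Xte))
    preds = np.stack(preds)                       # (n_modalities, n_test)
    # Majority vote per test sample.
    return np.array([np.bincount(col).argmax() for col in preds.T])

rng = np.random.default_rng(3)
train = [rng.random((30, 16)) for _ in range(3)]  # three toy feature modalities
test = [rng.random((5, 16)) for _ in range(3)]
y = rng.integers(0, 4, size=30)                   # four toy action classes
print(multimodal_knn_vote(train, y, test))
```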


2017, Vol 2017, pp. 1-6
Author(s):
Shirui Huo, Tianrui Hu, Ce Li

Human action recognition is an important and challenging task. Projecting depth images onto three depth motion maps (DMMs) and extracting deep convolutional neural network (DCNN) features yields discriminative descriptors that characterize the spatiotemporal information of a specific action from a sequence of depth images. In this paper, a unified improved collaborative representation framework is proposed in which the probability that a test sample belongs to the collaborative subspace of all classes can be well defined and calculated. The improved collaborative representation classifier (ICRC), based on l2 regularization, is presented for human action recognition to maximize the likelihood that a test sample belongs to each class; a theoretical investigation shows that ICRC obtains the final classification by computing this likelihood for each class. Coupled with the DMM and DCNN features, experiments on depth image-based action recognition, on the MSRAction3D and MSRGesture3D datasets, demonstrate that the proposed approach, using a distance-based representation classifier, achieves superior performance over state-of-the-art methods, including SRC, CRC, and SVM.
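For intuition, the sketch below implements plain l2-regularized collaborative representation classification (the CRC baseline the paper improves upon): a ridge-regression coding of the test sample over all training samples, followed by class-wise reconstruction residuals. ICRC's probabilistic refinement is not reproduced; the regularization weight and toy data are assumptions.

```python
import numpy as np

def crc_classify(X, labels, y, lam=0.01):
    """l2-regularized collaborative representation classification.

    X: (d, n) dictionary of training descriptors (e.g. DMM + DCNN features),
    labels: (n,) class of each column, y: (d,) test descriptor.
    """
    # Ridge solution: alpha = (X^T X + lam I)^{-1} X^T y
    alpha = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    residuals = {}
    for c in np.unique(labels):
        mask = labels == c
        residuals[c] = np.linalg.norm(y - X[:, mask] @ alpha[mask])
    return min(residuals, key=residuals.get)      # class with smallest residual

rng = np.random.default_rng(4)
X = rng.random((32, 12))
labels = np.repeat([0, 1, 2], 4)                  # four samples per class
print(crc_classify(X, labels, X[:, 5]))           # column 5 belongs to class 1
```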


Sensors, 2020, Vol 20 (18), pp. 5318
Author(s):
Dongnian Li, Changming Li, Chengjun Chen, Zhengxu Zhao

Locating and identifying the components mounted on a printed circuit board (PCB) using machine vision is an important and challenging problem for automated PCB inspection and recycling. In this paper, we propose a depth image-based PCB semantic segmentation method that segments and recognizes components on the PCB through pixel classification. The image training set for the PCB was automatically synthesized with graphic rendering. Based on a series of concentric circles centered at a given depth pixel, we extracted depth difference features from the depth images in the training set to train a random forest pixel classifier, and then used the constructed classifier to perform semantic segmentation of the PCB. Experiments on both synthetic and real test sets were conducted to verify the effectiveness of the proposed method. The experimental results demonstrate that our method can segment and recognize most of the components from a real depth image of a PCB. The method is immune to illumination changes and can be implemented in parallel on a GPU.
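The per-pixel feature can be pictured as follows: sample depth values on concentric circles around the pixel, record their differences from the center depth, and feed the resulting vectors to a random forest. The radii, angular sampling and toy labels below are illustrative assumptions, not the paper's parameters.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def depth_difference_features(depth, u, v, radii=(2, 4, 8), n_angles=8):
    """Depth differences between a pixel and samples on concentric circles.

    An illustrative variant of the paper's feature: for each radius, sample
    points around (u, v) and record depth(sample) - depth(center).
    """
    h, w = depth.shape
    center = depth[v, u]
    feats = []
    for r in radii:
        for a in np.linspace(0, 2 * np.pi, n_angles, endpoint=False):
            uu = np.clip(int(u + r * np.cos(a)), 0, w - 1)
            vv = np.clip(int(v + r * np.sin(a)), 0, h - 1)
            feats.append(depth[vv, uu] - center)
    return np.array(feats)

# Train a random forest pixel classifier on toy data.
rng = np.random.default_rng(5)
depth_img = rng.random((64, 64))
pixels = rng.integers(8, 56, size=(200, 2))
X = np.stack([depth_difference_features(depth_img, u, v) for u, v in pixels])
y = rng.integers(0, 3, size=200)                  # toy component labels
clf = RandomForestClassifier(n_estimators=50).fit(X, y)
print(clf.predict(X[:3]))
```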

