Multimodal Art Pose Recognition and Interaction With Human Intelligence Enhancement

2021 ◽  
Vol 12 ◽  
Author(s):  
Chengming Ma ◽  
Qian Liu ◽  
Yaqi Dang

This paper provides an in-depth study and analysis of human artistic poses through intelligently enhanced multimodal artistic pose recognition. A complementary network model architecture for multimodal information based on motion energy is proposed. The network exploits both the rich appearance features provided by RGB data and the depth information provided by depth data, together with the latter's robustness to illumination and viewing angle, and accomplishes multimodal fusion through the complementary characteristics of the two modalities. Moreover, to better model long-range temporal structure while accounting for action classes that share sub-actions, an energy-guided video segmentation method is employed. In the feature fusion stage, a cross-modal cross-fusion approach is proposed, which allows the convolutional network to share local features of the two modalities in the shallow layers and also to fuse global features in the deep convolutional layers by connecting the feature maps of multiple convolutional layers. First, the Kinect camera is used to acquire the color image data, the depth image data, and the 3D coordinates of the skeletal points via the open-source OpenPose framework. Then, keyframes are automatically extracted based on the distance between the hand and the head; relative distance features are extracted from the keyframes to describe the action, local occupancy pattern features and HSV color space features are extracted to describe the object, and finally feature fusion is performed to complete the complex action recognition task. To solve the consistency problem of virtual-reality fusion, the mapping between hand joint coordinates and the virtual scene is determined in the augmented reality scene, and a coordinate consistency model between the natural hand and the virtual model is established. Real-time interaction between hand gestures and the virtual model is thereby realized, with an average gesture recognition accuracy of 99.04%, improving the robustness and real-time performance of hand gesture recognition.
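A minimal sketch (not the authors' implementation) of the keyframe-selection idea described in the abstract: the hand-head distance is computed per frame from 3D skeletal joints, and frames where that distance reaches a local extremum are kept as keyframes. The joint indices, the minimum gap, and the local-extremum heuristic are illustrative assumptions.

```python
import numpy as np

def hand_head_distance(skeleton_seq, hand_idx=7, head_idx=3):
    """skeleton_seq: (T, J, 3) array of 3D joint coordinates per frame.
    hand_idx/head_idx are hypothetical indices into the skeleton layout."""
    hand = skeleton_seq[:, hand_idx, :]
    head = skeleton_seq[:, head_idx, :]
    return np.linalg.norm(hand - head, axis=1)          # (T,) distance per frame

def select_keyframes(distances, min_gap=5):
    """Pick frames where the hand-head distance is a local extremum,
    keeping at least `min_gap` frames between selected keyframes."""
    keyframes = []
    for t in range(1, len(distances) - 1):
        is_peak = distances[t] >= distances[t - 1] and distances[t] >= distances[t + 1]
        is_valley = distances[t] <= distances[t - 1] and distances[t] <= distances[t + 1]
        if (is_peak or is_valley) and (not keyframes or t - keyframes[-1] >= min_gap):
            keyframes.append(t)
    return keyframes

# Example with synthetic data: 100 frames, 25 joints
seq = np.random.rand(100, 25, 3)
print(select_keyframes(hand_head_distance(seq)))
```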

2013 ◽  
Vol 765-767 ◽  
pp. 2826-2829 ◽  
Author(s):  
Song Lin ◽  
Rui Min Hu ◽  
Yu Lian Xiao ◽  
Li Yu Gong

In this paper, we propose a novel real-time 3D hand gesture recognition algorithm based on depth information. We segment the hand region out of the depth image and convert it to a point cloud. Then, 3D moment invariant features are computed on the point cloud. Finally, a support vector machine (SVM) is employed to classify the hand shape into different categories. We collect a benchmark dataset using Microsoft Kinect for Xbox and test the proposed algorithm on it. Experimental results demonstrate the robustness of the proposed algorithm.
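A simplified sketch of the pipeline described above: moment-based features computed from a hand point cloud are fed to an SVM. Scale-normalized second-order central moments are used here as a stand-in for the paper's 3D moment invariants; the feature choice and the synthetic training data are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def moment_features(points):
    """points: (N, 3) hand point cloud. Returns scale-normalized 2nd-order moments."""
    centered = points - points.mean(axis=0)              # translation invariance
    scale = np.sqrt((centered ** 2).sum(axis=1).mean())  # rough scale normalization
    centered /= (scale + 1e-8)
    cov = centered.T @ centered / len(points)            # 3x3 second-order moments
    return cov[np.triu_indices(3)]                       # 6 unique values

# Toy training data: two "hand shapes", one roughly spherical, one elongated
rng = np.random.default_rng(0)
X = np.stack([moment_features(rng.normal(size=(500, 3)) * np.array([1.0, 1.0, 1.0 + 2.0 * c]))
              for c in range(2) for _ in range(10)])
y = np.repeat([0, 1], 10)

clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict(X[:3]))
```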


2018 ◽  
Vol 7 (3.34) ◽  
pp. 86
Author(s):  
Eun Seo Song ◽  
Gi Tae Kim ◽  
Sung Dae Hong

Background/Objectives: The purpose of this study is to develop control technology that reflects the user's appearance and movement on the void display in real time. Methods/Statistical analysis: In this paper, we develop real-time shading-image data acquisition based on an RGB-D sensor and a real-time interactive image control structure for realizing 0-255 depth images on a physical void display. We also study an integrated interlocking control solution for the integrated interlocking of hardware and software. Findings: Conventional flip displays show data in a binary 0/1 image representation. In contrast, the void display we are studying acquires real-time data based on RGB-D and shows the data in a 0-255 depth image representation. Improvements/Applications: In the void display, the binary 0/1 image representation was extended to a 0-255 depth representation.
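A minimal sketch, not the study's solution, of mapping raw RGB-D depth values (e.g., in millimetres) to the 0-255 range that the void display renders. The working depth range, the inversion so nearer objects appear brighter, and the fake sensor frame are assumptions.

```python
import numpy as np

def depth_to_8bit(depth_mm, near_mm=500, far_mm=4500):
    """Clip depth to a working range and rescale it to 0-255 for display."""
    clipped = np.clip(depth_mm, near_mm, far_mm)
    scaled = (clipped - near_mm) / (far_mm - near_mm)    # 0.0 .. 1.0
    return (255 * (1.0 - scaled)).astype(np.uint8)       # nearer objects appear brighter

frame = np.random.randint(400, 5000, size=(480, 640))    # fake depth frame in mm
print(depth_to_8bit(frame).min(), depth_to_8bit(frame).max())
```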


Author(s):  
Yan Wu ◽  
Jiqian Li ◽  
Jing Bai

RGB-D-based object recognition has been investigated enthusiastically in the past few years. RGB and depth images provide useful and complementary information, and fusing RGB and depth features can significantly increase the accuracy of object recognition. However, previous works simply treat the depth image as a fourth channel of the RGB image and concatenate the RGB and depth features, ignoring the fact that RGB and depth information have different discriminative power for different objects. In this paper, a new method containing three different classifiers is proposed to fuse features extracted from the RGB image and the depth image for RGB-D-based object recognition. First, an RGB classifier and a depth classifier are trained by cross-validation to obtain the accuracy difference between RGB and depth features for each object class. Then a variant RGB-D classifier is trained with different initialization parameters for each class according to this accuracy difference, which yields a more robust classification performance. The proposed method is evaluated on two benchmark RGB-D datasets and achieves performance comparable to the state-of-the-art method.
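An illustrative sketch of the first step described above: cross-validated RGB and depth classifiers are used to measure, per class, how much more reliable one modality is than the other. The feature dimensions, the classifier type, and the toy data are assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def per_class_accuracy(y_true, y_pred, n_classes):
    return np.array([(y_pred[y_true == c] == c).mean() for c in range(n_classes)])

rng = np.random.default_rng(1)
n, n_classes = 300, 3
y = rng.integers(0, n_classes, size=n)
X_rgb   = rng.normal(size=(n, 64)) + y[:, None] * 0.5    # toy RGB features
X_depth = rng.normal(size=(n, 32)) + y[:, None] * 0.2    # toy depth features

pred_rgb   = cross_val_predict(LogisticRegression(max_iter=1000), X_rgb, y, cv=5)
pred_depth = cross_val_predict(LogisticRegression(max_iter=1000), X_depth, y, cv=5)

# Positive values: RGB is more reliable for that class; negative: depth is.
acc_diff = per_class_accuracy(y, pred_rgb, n_classes) - per_class_accuracy(y, pred_depth, n_classes)
print(acc_diff)
```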


2021 ◽  
Vol 18 (5) ◽  
pp. 172988142110396
Author(s):  
Tao Xu ◽  
Jiyong Zhou ◽  
Wentao Guo ◽  
Lei Cai ◽  
Yukun Ma

Complicated underwater environments, such as occlusion by foreign objects and dim light, cause serious loss of underwater target features. Furthermore, underwater ripples deform targets, which greatly increases the difficulty of feature extraction. Consequently, existing image reconstruction models cannot effectively reconstruct targets from insufficient underwater target features, and the reconstructed regions exhibit blurred texture. To solve these problems, a method for fine reconstruction of underwater images with missing target features from environmental features is proposed. First, the salient features of underwater images are obtained through positive and negative sample learning. Second, a layered environmental attention mechanism is proposed to retrieve the relevant local and global features in the context. Finally, a coarse-to-fine image reconstruction model with gradient penalty constraints is constructed to obtain fine restoration results. Comparative experiments against existing image reconstruction methods on a stereo quantitative underwater image dataset, a real-world underwater image enhancement dataset, and an underwater image dataset clearly show that the proposed method is more effective and superior.
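A sketch of a standard gradient penalty term (in the WGAN-GP style) of the kind the abstract describes as a constraint on the reconstruction model; the tiny discriminator, the image size, and the way the penalty would be weighted into the full loss are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

def gradient_penalty(discriminator, real, fake):
    """Penalize the discriminator's gradient norm on points interpolated
    between real and reconstructed images."""
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    mixed = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    scores = discriminator(mixed)
    grads, = torch.autograd.grad(outputs=scores.sum(), inputs=mixed, create_graph=True)
    grads = grads.view(grads.size(0), -1)
    return ((grads.norm(2, dim=1) - 1) ** 2).mean()

# Tiny example discriminator and a fake batch of 64x64 underwater images
disc = nn.Sequential(nn.Conv2d(3, 8, 4, 2, 1), nn.LeakyReLU(0.2),
                     nn.Flatten(), nn.Linear(8 * 32 * 32, 1))
real = torch.randn(4, 3, 64, 64)
fake = torch.randn(4, 3, 64, 64)
print(gradient_penalty(disc, real, fake).item())
```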


Sensors ◽  
2020 ◽  
Vol 20 (16) ◽  
pp. 4566
Author(s):  
Chanhwi Lee ◽  
Jaehan Kim ◽  
Seoungbae Cho ◽  
Jinwoong Kim ◽  
Jisang Yoo ◽  
...  

The use of hand gestures to interact with devices such as computers or smartphones has presented several problems. This form of interaction relies on gesture interaction technology such as Leap Motion from Leap Motion, Inc., which enables humans to use hand gestures to interact with a computer. The technology has excellent hand detection performance and even allows simple games to be played using gestures. Another example is the contactless use of a smartphone to take a photograph by simply folding and opening the palm. Research on interaction with other devices via hand gestures is in progress, and studies on creating hologram displays from real objects are also underway. We propose a hand gesture recognition system that can control a tabletop holographic display based on an actual object. The depth image obtained with the Time-of-Flight-based depth camera Azure Kinect is used to extract information about the hand and hand joints with the deep-learning model CrossInfoNet. Using this information, we developed a real-time system that defines and recognizes gestures for basic left, right, up, and down rotation, zoom in, zoom out, and continuous rotation to the left and right.
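A hypothetical sketch of the last step described above: turning per-frame hand-joint estimates (e.g., from CrossInfoNet on Azure Kinect depth frames) into discrete display commands. The joint indices, thresholds, and command set are illustrative assumptions rather than the system's actual rules.

```python
import numpy as np

PALM, THUMB_TIP, INDEX_TIP = 0, 4, 8          # hypothetical joint indices

def classify_gesture(joints_prev, joints_curr, move_thresh=30.0, pinch_thresh=25.0):
    """joints_*: (J, 3) hand joints in millimetres for two consecutive frames."""
    dx, dy, _ = joints_curr[PALM] - joints_prev[PALM]
    pinch_prev = np.linalg.norm(joints_prev[THUMB_TIP] - joints_prev[INDEX_TIP])
    pinch_curr = np.linalg.norm(joints_curr[THUMB_TIP] - joints_curr[INDEX_TIP])

    if pinch_curr < pinch_thresh <= pinch_prev:
        return "zoom_in"
    if pinch_prev < pinch_thresh <= pinch_curr:
        return "zoom_out"
    if abs(dx) > move_thresh:
        return "rotate_right" if dx > 0 else "rotate_left"
    if abs(dy) > move_thresh:
        return "rotate_up" if dy > 0 else "rotate_down"
    return "idle"

prev = np.random.rand(21, 3) * 100
curr = prev + np.array([40.0, 0.0, 0.0])       # palm moved 40 mm to the right
print(classify_gesture(prev, curr))
```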


Sensors ◽  
2020 ◽  
Vol 20 (24) ◽  
pp. 7089
Author(s):  
Bushi Liu ◽  
Yongbo Lv ◽  
Yang Gu ◽  
Wanjun Lv

Owing to deep learning’s accurate understanding of the street environment, convolutional neural networks have developed rapidly in street-scene applications. Given the needs of autonomous and assisted driving, computer vision is generally used to find obstacles and avoid collisions, which has made semantic segmentation a research priority in recent years. However, semantic segmentation has long faced new challenges: complex network depth, large datasets, and real-time requirements are typical problems that urgently need to be solved to realize autonomous driving. To address these problems, we propose an improved lightweight real-time semantic segmentation network based on the efficient image cascade network (ICNet) architecture, using multi-scale branches and a cascaded feature fusion unit to extract rich multi-level features. In this paper, a spatial information network is designed to transmit more prior knowledge of spatial location and edge information. During training, we also append an external loss function to enhance the learning of the network. This lightweight network can quickly perceive obstacles and detect roads in the drivable area from images, satisfying the requirements of autonomous driving. The proposed model shows substantial performance on the Cityscapes dataset. While ensuring real-time performance, several sets of experimental comparisons illustrate that SP-ICNet improves the accuracy of road obstacle detection and provides nearly ideal prediction outputs. Compared to current popular semantic segmentation networks, this study also demonstrates the effectiveness of our lightweight network for road obstacle detection in autonomous driving.
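A minimal sketch of the "external loss" idea described above: an auxiliary cross-entropy term on an intermediate branch is added to the main segmentation loss during training. The auxiliary branch, its weight, and the tensor shapes are illustrative assumptions, not SP-ICNet's exact configuration.

```python
import torch
import torch.nn.functional as F

def total_loss(main_logits, aux_logits, labels, aux_weight=0.4):
    """main_logits/aux_logits: (N, C, H, W) predictions; labels: (N, H, W) class ids.
    The auxiliary branch only guides training and is discarded at inference time."""
    main = F.cross_entropy(main_logits, labels)
    aux = F.cross_entropy(aux_logits, labels)
    return main + aux_weight * aux

# Toy example: 2 images, 19 classes (as in Cityscapes), 64x128 resolution
main_logits = torch.randn(2, 19, 64, 128, requires_grad=True)
aux_logits = torch.randn(2, 19, 64, 128, requires_grad=True)
labels = torch.randint(0, 19, (2, 64, 128))
loss = total_loss(main_logits, aux_logits, labels)
loss.backward()
print(loss.item())
```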


2018 ◽  
Vol 15 (02) ◽  
pp. 1750022 ◽  
Author(s):  
Jing Li ◽  
Jianxin Wang ◽  
Zhaojie Ju

Gesture recognition plays an important role in human–computer interaction. However, most existing methods are complex and time-consuming, which limits the use of gesture recognition in real-time environments. In this paper, we propose a static gesture recognition system that combines depth information and skeleton data to classify gestures. Through feature fusion, hand digit gestures 0–9 can be recognized accurately and efficiently. Experimental results show that the proposed gesture recognition system is effective and robust: it is invariant to complex backgrounds, illumination changes, reversal, structural distortion, rotation, etc. We tested the system both online and offline, which showed that it satisfies real-time requirements, and it can therefore be applied to gesture recognition in real-world human–computer interaction systems.
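An illustrative sketch of the feature-fusion step described above: a depth-image descriptor and a skeleton descriptor are concatenated and passed to a classifier for the 0-9 digit gestures. The placeholder feature extractors, the random-forest classifier, and the synthetic samples are assumptions, not the paper's actual design.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def depth_features(depth_patch):
    """Toy depth descriptor: a coarse 8x8 grid of mean depths over the hand region."""
    h, w = depth_patch.shape
    grid = depth_patch[:h - h % 8, :w - w % 8].reshape(8, h // 8, 8, w // 8)
    return grid.mean(axis=(1, 3)).ravel()                  # 64-dim

def skeleton_features(joints):
    """Toy skeleton descriptor: joint distances from the palm joint."""
    return np.linalg.norm(joints[1:] - joints[0], axis=1)  # (J-1)-dim

rng = np.random.default_rng(2)
X, y = [], []
for label in range(10):                                    # digits 0-9
    for _ in range(20):
        depth = rng.random((64, 64)) + 0.05 * label        # synthetic samples
        joints = rng.random((21, 3)) + 0.02 * label
        X.append(np.concatenate([depth_features(depth), skeleton_features(joints)]))
        y.append(label)

clf = RandomForestClassifier(n_estimators=50).fit(X, y)
print(clf.predict(X[:3]))
```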


2018 ◽  
Vol 8 (11) ◽  
pp. 2017 ◽  
Author(s):  
Gyu-cheol Lee ◽  
Sang-ha Lee ◽  
Jisang Yoo

People counting with surveillance cameras is a key technology for understanding population flow and generating heat maps. In recent years, people detection performance has improved greatly with the development of deep-learning object detection algorithms. However, in crowded places the detection rate drops because people are often occluded by other people. We propose a people-counting method that uses a stereo camera to resolve missed detections caused by occlusion. We apply stereo matching to extract a depth image and convert the camera view to a top view using the depth information. People are detected using a height map and an occupancy map, and then tracked and counted using a Kalman filter-based tracker. We ran the proposed method on an NVIDIA Jetson TX2 to check the possibility of real-time operation on an embedded board. Experimental results show that the proposed method is more accurate than existing methods and that real-time processing is possible.
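A simplified sketch of the depth-to-top-view step described above: depth pixels are back-projected to 3D with pinhole intrinsics and binned into a ground-plane grid to form height and occupancy maps. The intrinsics, grid size, "height above floor" convention, and synthetic depth frame are assumptions, not the paper's calibration.

```python
import numpy as np

def top_view_maps(depth_m, fx=525.0, fy=525.0, cx=320.0, cy=240.0,
                  grid=(200, 200), cell_m=0.05):
    """depth_m: (H, W) depth in metres from a downward-looking stereo/depth camera."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth_m / fx                      # lateral position (metres)
    y = (v - cy) * depth_m / fy                      # forward position (metres)

    gx = np.clip((x / cell_m + grid[1] // 2).astype(int), 0, grid[1] - 1)
    gy = np.clip((y / cell_m + grid[0] // 2).astype(int), 0, grid[0] - 1)

    height = np.zeros(grid)                          # tallest point seen in each cell
    occupancy = np.zeros(grid)                       # number of points per cell
    np.maximum.at(height, (gy, gx), depth_m.max() - depth_m)   # height above floor
    np.add.at(occupancy, (gy, gx), 1)
    return height, occupancy

depth = np.full((480, 640), 3.0); depth[200:280, 300:360] = 1.3   # a "person" blob
hmap, omap = top_view_maps(depth)
print(hmap.max(), omap.sum())
```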

