Detecting Hand Posture in Piano Playing Using Depth Data

2020 ◽  
Vol 43 (1) ◽  
pp. 59-78 ◽  
Author(s):  
David Johnson ◽  
Daniela Damian ◽  
George Tzanetakis

We present research for automatic assessment of pianist hand posture that is intended to help beginning piano students improve their piano-playing technique during practice sessions. To automatically assess a student's hand posture, we propose a system that is able to recognize three categories of postures from a single depth map containing a pianist's hands during performance. This is achieved through a computer vision pipeline that uses machine learning on the depth maps for both hand segmentation and detection of hand posture. First, we segment the left and right hands from the scene captured in the depth map using per-pixel classification. To train the hand-segmentation models, we experiment with two feature descriptors, depth image features and depth context features, that describe the context of individual pixels' neighborhoods. After the hands have been segmented from the depth map, a posture-detection model classifies each hand as one of three possible posture categories: correct posture, low wrists, or flat hands. Two methods are tested for extracting descriptors from the segmented hands, histograms of oriented gradients and histograms of normal vectors. To account for variation in hand size and practice space, detection models are individually built for each student using support vector machines with the extracted descriptors. We validate this approach using a data set that was collected by recording four beginning piano students while performing standard practice exercises. The results presented in this article show the effectiveness of this approach, with depth context features and histograms of normal vectors performing the best.
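As a concrete illustration of the posture-detection stage, the sketch below pairs a minimal histogram-of-oriented-gradients descriptor with a per-student SVM, as the abstract describes. The tiny HOG implementation, the toy depth maps with class-dependent gradient orientations, and all parameter values are assumptions for the example, not the authors' pipeline:

```python
import numpy as np
from sklearn.svm import SVC

def hog_descriptor(depth, n_bins=9, cell=16):
    """Minimal histogram of oriented gradients over a depth map:
    gradients are binned by orientation, weighted by magnitude,
    one L2-normalized histogram per cell, concatenated."""
    gy, gx = np.gradient(depth.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi            # unsigned orientation in [0, pi)
    h, w = depth.shape
    feats = []
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            b = (ang[i:i + cell, j:j + cell] / np.pi * n_bins).astype(int) % n_bins
            hist = np.bincount(b.ravel(),
                               weights=mag[i:i + cell, j:j + cell].ravel(),
                               minlength=n_bins)
            feats.append(hist / (np.linalg.norm(hist) + 1e-9))
    return np.concatenate(feats)

def toy_hand_depth(posture, rng):
    # Hypothetical stand-ins for the three posture classes: each class
    # gets a differently oriented depth gradient plus sensor-like noise.
    grid = np.mgrid[0:64, 0:64].astype(float)
    y, x = grid
    base = {0: x, 1: y, 2: x + y}[posture]
    return base + rng.normal(0, 0.5, (64, 64))

rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 10)   # correct posture / low wrists / flat hands
X = np.stack([hog_descriptor(toy_hand_depth(c, rng)) for c in labels])
per_student_model = SVC(kernel="rbf", C=10.0).fit(X, labels)
```

Building the model per student, as the paper does, simply means fitting this classifier on that student's own recordings.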

Author(s):  
Jing Qi ◽  
Kun Xu ◽  
Xilun Ding

Hand segmentation is the initial step in hand posture recognition. To reduce the effect of variable illumination during segmentation, a new CbCr-I component Gaussian mixture model (GMM) is proposed to detect the skin region. The hand region is selected as a region of interest from the image using a skin detection technique based on the presented CbCr-I component GMM and a new adaptive threshold. A new hand shape distribution feature, described in polar coordinates, is proposed to extract hand contour features; it addresses the false-recognition problem of some shape-based methods and effectively recognizes hand postures even when different postures have the same number of outstretched fingers. A multiclass support vector machine classifier is used to recognize the hand posture. Experiments carried out on our data set verify the feasibility of the proposed method, and the results show its effectiveness compared with other methods.
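The skin-detection step can be sketched as a Gaussian mixture model over the Cb/Cr chrominance channels. The BT.601 color conversion is standard, but the synthetic skin samples, the two-component mixture, and the fixed log-likelihood threshold (the paper derives its threshold adaptively and also uses an I component) are assumptions of this example:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def rgb_to_cbcr(img):
    """ITU-R BT.601 conversion; returns only the two chrominance channels."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([cb, cr], axis=-1)

# Fit the GMM on chrominance samples taken from known skin pixels
# (here: synthetic samples clustered around a typical skin Cb/Cr value).
rng = np.random.default_rng(1)
skin_samples = rng.normal([110.0, 150.0], [6.0, 6.0], size=(500, 2))
gmm = GaussianMixture(n_components=2, random_state=0).fit(skin_samples)

def skin_mask(img, thresh=-12.0):
    """Per-pixel log-likelihood under the skin GMM, thresholded.
    `thresh` is an assumed fixed cut-off standing in for the paper's
    adaptive threshold."""
    cbcr = rgb_to_cbcr(img.astype(float)).reshape(-1, 2)
    return (gmm.score_samples(cbcr) > thresh).reshape(img.shape[:2])
```

The returned boolean mask is then used to crop the hand region of interest before contour-feature extraction.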


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Seyed Muhammad Hossein Mousavi ◽  
S. Younes Mirinezhad

This study presents a new color-depth face database gathered from Iranian subjects of different genders and age ranges. Suitable databases make it possible to validate and assess available methods in different research fields; this database has applications in face recognition, age estimation, facial expression recognition, and facial micro-expression recognition. Color images usually consist of three channels: red, green, and blue. In the last decade, however, another image type has emerged, the depth image, which encodes the range and distance between objects and the sensor; depending on the depth sensor technology, range data can be acquired in different ways. The Kinect sensor version 2 can acquire color and depth data simultaneously. Facial expression recognition is an important field in image processing, with uses ranging from animation to psychology. Currently, only a few color-depth (RGB-D) facial micro-expression recognition databases exist, and adding depth data to color data increases the final recognition accuracy. Owing to this shortage, and to weaknesses in the available databases, a new RGB-D face database covering the Middle-Eastern face type is presented in this paper. In the validation section, the database is compared with several well-known benchmark face databases. For evaluation, histogram of oriented gradients (HOG) features are extracted, and classifiers including a support vector machine, a multi-layer neural network, and a deep learning method, the convolutional neural network, are employed. The results are promising.
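A minimal sketch of the kind of RGB-D evaluation the abstract describes, fusing a color descriptor and a depth descriptor before an SVM. Plain intensity histograms stand in for the HOG features, and the two toy "expression" classes (differing in brightness and camera distance) are invented for the example:

```python
import numpy as np
from sklearn.svm import SVC

def intensity_hist(channel, bins=16):
    """Normalized intensity histogram; a simple stand-in for the HOG
    descriptor used in the paper's evaluation."""
    h, _ = np.histogram(channel, bins=bins, range=(0.0, 256.0))
    return h / (h.sum() + 1e-9)

def rgbd_descriptor(color, depth):
    # Early fusion: concatenate a color-channel descriptor and a
    # depth-channel descriptor into one feature vector.
    return np.concatenate([intensity_hist(color.mean(axis=-1)),
                           intensity_hist(depth)])

rng = np.random.default_rng(0)

def toy_face(cls):
    # Hypothetical classes: class 0 is bright and near, class 1 dark and far.
    color = rng.normal(180 if cls == 0 else 80, 10, (16, 16, 3)).clip(0, 255)
    depth = rng.normal(60 if cls == 0 else 180, 10, (16, 16)).clip(0, 255)
    return rgbd_descriptor(color, depth)

y = np.repeat([0, 1], 10)
X = np.stack([toy_face(c) for c in y])
expr_clf = SVC(kernel="linear").fit(X, y)
```

The point of the fusion is visible in the vector itself: the depth half of the descriptor carries information the color half cannot.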


Entropy ◽  
2021 ◽  
Vol 23 (5) ◽  
pp. 546 ◽  
Author(s):  
Zhenni Li ◽  
Haoyi Sun ◽  
Yuliang Gao ◽  
Jiao Wang

Depth maps obtained from sensors are often unsatisfactory because of low resolution and noise. In this paper, we propose a real-time depth map enhancement system based on a residual network that uses dual channels to process depth maps and intensity maps respectively and eliminates the preprocessing stage; the proposed algorithm achieves real-time processing at more than 30 fps. Furthermore, an FPGA design and implementation for depth sensing is introduced, in which the intensity and depth images captured by a dual-camera synchronous acquisition system serve as the input to the neural network. Experiments on various depth map restoration tasks show that our algorithm outperforms the existing LRMC, DE-CNN, and DDTF algorithms on standard datasets and achieves better depth map super-resolution. In system tests, the FPGA keeps the data throughput of the acquisition system's USB 3.0 interface stable at 226 Mbps and supports both cameras at full speed, i.e., 54 fps @ (1280 × 960 + 328 × 248 × 3).
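The residual-learning idea behind such an enhancement network can be shown in a few lines of NumPy: the dual-channel block below filters the depth and intensity maps separately, merges them through a ReLU, and adds the result to the input depth, so the network only has to learn the correction. The weights are untrained placeholders and the single-layer structure is a heavy simplification of the paper's network:

```python
import numpy as np

def conv3x3(x, w):
    """'Same' 3x3 convolution with zero padding (one channel in/out)."""
    xp = np.pad(x.astype(float), 1)
    out = np.zeros(x.shape, dtype=float)
    for di in range(3):
        for dj in range(3):
            out += w[di, dj] * xp[di:di + x.shape[0], dj:dj + x.shape[1]]
    return out

def residual_enhance(depth, intensity, w_depth, w_intensity, w_out):
    """Dual-channel residual block: depth and intensity are filtered
    separately, merged through a ReLU, and the result is ADDED to the
    input depth (skip connection). Weights are placeholders for values
    a trained network would supply."""
    feat = np.maximum(conv3x3(depth, w_depth) + conv3x3(intensity, w_intensity),
                      0.0)
    return depth + conv3x3(feat, w_out)
```

A useful sanity property of the skip connection: with the output weights at zero, the block is an exact identity on the depth map, which is what makes residual corrections easy to learn.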


2021 ◽  
pp. 016173462199809
Author(s):  
Dhurgham Al-karawi ◽  
Hisham Al-Assam ◽  
Hongbo Du ◽  
Ahmad Sayasneh ◽  
Chiara Landolfo ◽  
...  

Significant successes of machine learning approaches to image analysis in various applications have energized strong interest in automated diagnostic support systems for medical images. An evolving, in-depth understanding of how carcinogenesis changes the texture of a mass/tumor's cellular networks has informed such diagnostic systems through more suitable image texture features and extraction methods. Several texture features have recently been applied to discriminate malignant from benign ovarian masses by analysing B-mode ultrasound images of the ovary, with varying levels of performance. However, a comparative performance evaluation of these reported features on a common set of clinically approved images has been lacking. This paper presents an empirical evaluation of seven commonly used texture features (histograms, moments of histogram, local binary patterns [256-bin and 59-bin], histograms of oriented gradients, fractal dimensions, and Gabor filter), using a collection of 242 ultrasound scan images of ovarian masses with various pathological characteristics. The evaluation examines not only the effectiveness of classification schemes based on the individual texture features but also the effectiveness of combinations of these schemes using simple majority-rule decision-level fusion. Support vector machine classifiers trained on the individual texture features, without any specific pre-processing, achieve accuracies between 75% and 85%, with the seven moments and the 256-bin LBP at the lower end and the Gabor filter at the upper end. Combining the classification results of the top k (k = 3, 5, 7) best-performing features further improves the overall accuracy to between 86% and 90%. These results demonstrate that each of the investigated image-based texture features provides informative support in distinguishing between benign and malignant ovarian masses.
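The majority-rule decision-level fusion used to combine the per-feature classifiers is straightforward to sketch:

```python
from collections import Counter

def majority_fusion(predictions):
    """Decision-level fusion by simple majority rule: `predictions` holds
    one row per classifier, one column per sample; the fused label for a
    sample is the most common vote. With an odd number of classifiers
    (k = 3, 5, 7 as in the paper) a benign/malignant vote cannot tie."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]
```

For example, fusing the outputs of three per-feature SVMs over four scans, with 1 standing for malignant: `majority_fusion([[1, 0, 1, 1], [1, 1, 0, 1], [0, 0, 1, 1]])` yields `[1, 0, 1, 1]`.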


2021 ◽  
Vol 2021 (1) ◽  
Author(s):  
Samy Bakheet ◽  
Ayoub Al-Hamadi

Robust vision-based hand pose estimation is highly sought after but remains challenging, partly because of self-occlusion among the fingers. In this paper, an innovative framework for real-time static hand gesture recognition is introduced, based on an optimized shape representation built from multiple shape cues. The framework incorporates a module for hand pose estimation from depth map data, in which the hand silhouette is first extracted from the highly detailed and accurate depth map captured by a time-of-flight (ToF) depth sensor. A hybrid multi-modal descriptor that integrates multiple affine-invariant boundary-based and region-based features is created from the hand silhouette to obtain a reliable and representative description of individual gestures. Finally, an ensemble of one-vs.-all support vector machines (SVMs) is trained independently on each of these learned feature representations to perform gesture classification. When evaluated on a publicly available dataset containing a relatively large and diverse collection of egocentric hand gestures, the approach yields encouraging results that compare very favorably with those reported in the literature, while maintaining real-time operation.
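The classification stage can be sketched as one one-vs.-all SVM ensemble per feature representation, fused at the score level. The toy two-cue data and the sum-of-decision-values fusion rule are assumptions of this example; the paper's boundary-based and region-based descriptors are replaced here by synthetic 2-D features:

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

def train_per_cue(feature_sets, y):
    """Train one one-vs.-all SVM ensemble per shape cue (e.g. one
    boundary-based and one region-based representation); each ensemble
    sees only its own feature matrix."""
    return [OneVsRestClassifier(SVC()).fit(X, y) for X in feature_sets]

def fuse_predict(models, feature_sets):
    # Score-level fusion: sum the per-class decision values across cues,
    # then pick the class with the highest combined score.
    scores = sum(m.decision_function(X) for m, X in zip(models, feature_sets))
    return np.asarray(scores).argmax(axis=1)

# Toy data: three gesture classes seen through two feature "cues".
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 10)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
cue_boundary = centers[y] + rng.normal(0, 0.4, (30, 2))
cue_region = centers[y][:, ::-1] + rng.normal(0, 0.4, (30, 2))
models = train_per_cue([cue_boundary, cue_region], y)
```

Each ensemble can fail on samples its own cue cannot separate; summing decision values lets the other cues outvote it.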


2018 ◽  
Vol 2018 ◽  
pp. 1-13 ◽  
Author(s):  
Hasan Mahmud ◽  
Md. Kamrul Hasan ◽  
Abdullah-Al-Tariq ◽  
Md. Hasanul Kabir ◽  
M. A. Mottalib

Symbolic gestures are hand postures with conventionalized meanings. They are static gestures that can be performed without voice in very complex environments containing variations in rotation and scale, and they may be produced under different illumination conditions or against occluding backgrounds. Any hand gesture recognition system must find sufficiently discriminative features, such as hand-finger contextual information. In existing approaches, however, the depth information of the fingers, which represents finger shape, is used only to a limited extent to extract discriminative finger features. If finger-bending information (i.e., a finger overlapping the palm), extracted from the depth map, is used as a local feature, static gestures that vary only slightly become distinguishable. Our work corroborates this idea: we generate depth silhouettes with varied contrast to obtain more discriminative keypoints, which improves recognition accuracy to 96.84%. We apply the Scale-Invariant Feature Transform (SIFT) algorithm, which takes the generated depth silhouettes as input and produces robust feature descriptors as output. These features, after conversion into unified-dimensional feature vectors, are fed into a multiclass Support Vector Machine (SVM) classifier to measure accuracy. We tested our approach on a standard dataset containing 10 symbolic gestures representing the 10 numeric symbols (0-9), then verified and compared our results across depth images, binary images, and images consisting of hand-finger edge information generated from the same dataset. Our results show higher accuracy when SIFT features are applied to depth images. Accurately recognizing numeric symbols performed through hand gestures has a large impact on Human-Computer Interaction (HCI) applications, including augmented and virtual reality.
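The depth-silhouette generation step might look like the sketch below, which maps hand-region depth values to an 8-bit image with a contrast-controlling gamma so that finger-over-palm depth differences yield more distinguishable keypoints. The exact mapping and the gamma value are assumptions, not the authors' implementation:

```python
import numpy as np

def depth_silhouette(depth, hand_mask, gamma=0.5):
    """Render the hand region as an 8-bit silhouette whose intensity
    encodes depth, so finger bending (a finger overlapping the palm)
    shows up as contrast; `gamma` < 1 stretches contrast in the near
    range before SIFT keypoint extraction."""
    d = depth.astype(float)
    vals = d[hand_mask]
    norm = (d - vals.min()) / (vals.max() - vals.min() + 1e-9)
    sil = np.where(hand_mask, (norm.clip(0.0, 1.0) ** gamma) * 255.0, 0.0)
    return sil.astype(np.uint8)

# Demo: a depth ramp with a square hand mask in the middle.
demo_depth = np.tile(np.arange(32, dtype=float), (32, 1))
demo_mask = np.zeros((32, 32), dtype=bool)
demo_mask[8:24, 8:24] = True
sil = depth_silhouette(demo_depth, demo_mask)
```

The resulting silhouette is what a SIFT detector would consume; everything outside the hand mask is zeroed so background texture contributes no keypoints.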


Author(s):  
Nishanth P

Falls have become a leading cause of death, and they are especially common among the elderly. According to the World Health Organization (WHO), 3 out of 10 elderly people aged 65 and over who live alone tend to fall, and this rate may rise in the coming years. In recent years, the safety of elderly residents living alone has received increased attention in a number of countries. Fall detection systems based on wearable sensors emerged in response to the need for early fall detection and the rise of IoT technology, but they have drawbacks, including high intrusiveness, low accuracy, and poor reliability. This work describes a fall detection approach that does not rely on wearable sensors and is instead based on machine learning and image analysis in Python. The camera's high-frequency pictures are fed to the network, which uses a convolutional neural network to identify the key points of the human body. A support vector machine then classifies the fall from the features extracted at those key points, and relatives are notified via mobile message. Rather than modelling individual activities, we use both motion and context information to recognize activities in a scene, based on the notion that actions that are spatially and temporally connected rarely occur alone and can serve as context for one another. We propose a hierarchical representation of action segments and activities using a two-layer random field model, which allows the simultaneous integration of motion and a variety of context features at multiple levels, as well as automatic learning of statistics that capture the patterns of the features.
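The keypoints-to-SVM stage can be sketched as follows. The two bounding-box features and the toy 15-joint skeleton data are illustrative assumptions, not the system's actual feature design:

```python
import numpy as np
from sklearn.svm import SVC

def posture_features(keypoints):
    """Hand-crafted features from 2-D body keypoints (one (x, y) row per
    joint): bounding-box aspect ratio and normalized height. A fallen
    body spans a wide, flat box; a standing one a tall, narrow box.
    The feature choice is illustrative only."""
    xs, ys = keypoints[:, 0], keypoints[:, 1]
    w, h = xs.max() - xs.min(), ys.max() - ys.min()
    return np.array([w / (h + 1e-9), h / (w + h + 1e-9)])

def toy_keypoints(fallen, rng):
    # Hypothetical 15-joint skeletons as detected by the CNN stage.
    xs = rng.uniform(0, 160 if fallen else 40, 15)
    ys = rng.uniform(0, 40 if fallen else 160, 15)
    return np.column_stack([xs, ys])

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 15)            # 0 = standing, 1 = fallen
X = np.stack([posture_features(toy_keypoints(c == 1, rng)) for c in y])
fall_clf = SVC(kernel="linear").fit(X, y)
```

In the full system the alert message would be sent whenever `fall_clf.predict` returns the fallen class for enough consecutive frames.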


2019 ◽  
Vol 11 (10) ◽  
pp. 204 ◽  
Author(s):  
Dogan ◽  
Haddad ◽  
Ekmekcioglu ◽  
Kondoz

When evaluating the perceptual quality of digital media for overall quality-of-experience assessment in immersive video applications, two main approaches stand out: subjective and objective quality evaluation. Subjective evaluation offers the best representation of video quality as perceived by real viewers, but it consumes a significant amount of time and effort because it involves real users in lengthy and laborious assessment procedures. It is therefore essential to develop an objective quality evaluation model. The speed advantage of an objective model that predicts the quality of rendered virtual views from the depth maps used in the rendering process allows faster quality assessment for immersive video applications. This is particularly important given the lack of a suitable reference or ground truth for comparing the available depth maps, especially when live content services are offered. This paper presents a no-reference depth map quality evaluation model based on a proposed depth map edge confidence measurement technique to assist in accurately estimating the quality of rendered (virtual) views in immersive multi-view video content. The model is applied to depth image-based rendering in multi-view video format, providing evaluation results comparable to those in the literature and often exceeding their performance.
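One plausible reading of an edge-confidence-style measure is sketched below: it scores how sharply depth discontinuities are localized, since smeared depth edges are what produce rendering artifacts in synthesized views. The gradient thresholds and the strong-versus-medium formulation are assumptions for illustration; the paper's actual measurement technique differs in detail:

```python
import numpy as np

def depth_edge_confidence(depth, t_lo=2.0, t_hi=8.0):
    """Illustrative no-reference score: a sharp depth edge concentrates
    its transition in a few high-gradient pixels, while a smeared or
    unreliable edge spreads it over many medium-gradient pixels.
    Score = strong edge pixels / (strong + medium edge pixels)."""
    gy, gx = np.gradient(depth.astype(float))
    mag = np.hypot(gx, gy)
    strong = int((mag > t_hi).sum())
    medium = int(((mag > t_lo) & (mag <= t_hi)).sum())
    return strong / (strong + medium + 1e-9)

# A crisp depth step vs. the same step smeared over 20 columns.
sharp = np.zeros((20, 40)); sharp[:, 20:] = 100.0
smeared = np.zeros((20, 40))
smeared[:, 10:31] = np.linspace(0.0, 100.0, 21)
smeared[:, 31:] = 100.0
```

A no-reference model of this kind needs no ground-truth depth map: the score is computed from the candidate depth map alone.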


2014 ◽  
Vol 1042 ◽  
pp. 89-93 ◽  
Author(s):  
H.Y. Ting ◽  
K.S. Sim ◽  
F.S. Abas

This paper presents a method to recognize badminton actions from depth map sequences acquired by a Microsoft Kinect sensor. Badminton is one of Malaysia's most popular sports, but there is still a lack of action-recognition research focusing on it. In this research, the bone orientations of badminton players are computed and extracted to form bag-of-quaternions feature vectors. After conversion to a log-covariance matrix, the system is trained and the badminton actions are classified by a support vector machine classifier. Our experimental dataset of depth map sequences comprises 300 samples of 10 badminton actions performed by six players, varying in body size, clothing, speed, and gender. Experimental results show that an average recognition accuracy (ARA) of nearly 92% was achieved in the inter-class leave-one-sample-out cross-validation test, and 86% ARA in the inter-class cross-subject validation test.
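The bag-of-quaternions-to-log-covariance mapping can be sketched as follows. The eigendecomposition-based matrix logarithm is a standard construction for symmetric positive-definite matrices, while the demo data here is synthetic rather than Kinect-derived:

```python
import numpy as np

def log_covariance_descriptor(quats):
    """Turn a bag of quaternion features (n_frames x 4) into a fixed-size
    vector: take the covariance of the bag, map it through the matrix
    logarithm (covariance matrices live on a curved manifold; the log
    map flattens it so a linear SVM can compare descriptors), and keep
    the upper triangle of the symmetric result."""
    c = np.cov(quats, rowvar=False) + 1e-6 * np.eye(quats.shape[1])
    w, v = np.linalg.eigh(c)           # SPD: real, positive eigenvalues
    log_c = (v * np.log(w)) @ v.T      # matrix log via eigendecomposition
    return log_c[np.triu_indices(quats.shape[1])]

# Demo: a near-identity covariance should map to a near-zero descriptor,
# since log(I) = 0.
rng = np.random.default_rng(0)
vec = log_covariance_descriptor(rng.normal(0.0, 1.0, (2000, 4)))
```

Each action sample's sequence of bone-orientation quaternions would be reduced to one such 10-dimensional vector before SVM training.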

