PARAMETRIC REPRESENTATION OF THE SPEAKER’S LIPS FOR MULTIMODAL SIGN LANGUAGE AND SPEECH RECOGNITION

Author(s):  
D. Ryumin ◽  
A. A. Karpov

In this article, we propose a new method for the parametric representation of the human lip region. The functional diagram of the method is described, and implementation details are given with an explanation of its key stages and features. The results of automatic detection of the regions of interest are illustrated. The processing speed of the method on several computers with different performance levels is reported. This universal method allows the parametric representation of the speaker’s lips to be applied to tasks in biometrics, computer vision, machine learning, and automatic recognition of faces, elements of sign languages, and audio-visual speech, including lip-reading.
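
As a rough illustration of the region-of-interest stage, the sketch below crops the lip area using dlib's standard 68-point facial landmark model (points 48-67 outline the mouth); the landmark model, margin, and helper name are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical helper: crop the lip region of interest from a frame using
# dlib's 68-point landmark model (an assumption, not the paper's exact
# detector). Landmarks 48-67 cover the outer and inner lip contours.
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def lip_roi(frame_bgr, margin=10):
    """Return a cropped lip region, or None if no face is found."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    pts = np.array([(shape.part(i).x, shape.part(i).y) for i in range(48, 68)],
                   dtype=np.int32)
    x, y, w, h = cv2.boundingRect(pts)
    return frame_bgr[max(y - margin, 0):y + h + margin,
                     max(x - margin, 0):x + w + margin]
```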

Electronics ◽  
2020 ◽  
Vol 9 (8) ◽  
pp. 1302
Author(s):  
Luis Naranjo-Zeledón ◽  
Mario Chacón-Rivas ◽  
Jesús Peral ◽  
Antonio Ferrández

The study of phonological proximity makes it possible to establish a basis for future decision-making in the treatment of sign languages. Knowing how close the signs in a set are makes it easier to study them by clustering, as well as to teach the language to third parties based on similarities. In addition, it lays the foundation for strengthening disambiguation modules in automatic recognition systems. To the best of our knowledge, this is the first study of its kind for Costa Rican Sign Language (LESCO, for its Spanish acronym), and it forms the basis for one of the modules of the already operational sign and speech editing system called the International Platform for Sign Language Edition (PIELS). A database of 2665 signs, grouped into eight contexts, is used, and similarity measures are compared using standard statistical formulas to measure their degree of correlation. This corpus will be especially useful in machine learning approaches. In this work, we have analyzed different similarity measures between signs in order to determine the phonological proximity between them. After analyzing the results obtained, we conclude that LESCO is a sign language with high levels of phonological proximity, particularly in the orientation and location components, while the levels are noticeably lower in the form component. An outstanding contribution of our research is the conclusion that automatic recognition systems can base their first prototypes on the contexts or sign domains that map to clusters with lower levels of similarity. As mentioned, the results obtained have multiple applications, for example in teaching or in Natural Language Processing for automatic recognition tasks.
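
A minimal sketch of the kind of comparison described, assuming an invented binary feature encoding for signs (not the PIELS component schema): it computes two pairwise similarity measures over a toy sign set and their Pearson correlation.

```python
# Toy comparison of similarity measures over binary sign feature vectors;
# the 32-feature encoding and sign count are invented for illustration.
import numpy as np
from scipy.spatial.distance import cosine, hamming
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
signs = rng.integers(0, 2, size=(100, 32))  # 100 toy signs, 32 binary features

def pairwise_similarity(distance):
    """Similarity (1 - distance) for every unordered pair of signs."""
    n = len(signs)
    return np.array([1 - distance(signs[i], signs[j])
                     for i in range(n) for j in range(i + 1, n)])

cos_sim = pairwise_similarity(cosine)    # cosine similarity per pair
ham_sim = pairwise_similarity(hamming)   # proportion of matching features
r, p = pearsonr(cos_sim, ham_sim)
print(f"Correlation between the two measures: r={r:.3f} (p={p:.2g})")
```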


There are many people with disabilities in our world; among them, people who are deaf and dumb cannot convey their messages to other people. Conversation becomes very difficult for these people. Deaf people cannot hear what others say; similarly, dumb people need to convey their messages using sign language, which others cannot understand unless they know the sign language themselves. This creates the need for an application that can enable conversation between deaf, dumb, and hearing people. Here we use hand gestures of Indian Sign Language (ISL), which contains gestures for all the alphabets and the digits 0-9. We created the dataset of alphabets and digits ourselves. After building the dataset, we extracted features using bag-of-words and image preprocessing. From the extracted features, histograms are generated that map alphabets to images. Finally, these features are fed to a supervised machine learning model to predict the gesture/sign. We also used a CNN model for training.
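
A hedged sketch of a bag-of-visual-words pipeline of the kind described, assuming grayscale gesture images; the ORB descriptors, vocabulary size, and SVM classifier are illustrative choices, not necessarily the authors' exact settings.

```python
# Sketch of a bag-of-visual-words pipeline: local descriptors -> KMeans
# vocabulary -> per-image visual-word histogram -> supervised classifier.
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

orb = cv2.ORB_create()

def descriptors(img_gray):
    _, des = orb.detectAndCompute(img_gray, None)
    return des if des is not None else np.empty((0, 32), np.uint8)

def build_vocab(images, k=200):
    all_des = np.vstack([descriptors(im) for im in images]).astype(np.float32)
    return KMeans(n_clusters=k, n_init=10).fit(all_des)

def histogram(img_gray, vocab):
    des = descriptors(img_gray).astype(np.float32)
    words = vocab.predict(des) if len(des) else []
    hist, _ = np.histogram(words, bins=np.arange(vocab.n_clusters + 1))
    return hist / max(hist.sum(), 1)  # normalized visual-word histogram

# Assumed usage with the collected ISL dataset (train_imgs, train_labels):
# vocab = build_vocab(train_imgs)
# X = np.array([histogram(im, vocab) for im in train_imgs])
# clf = SVC(kernel="linear").fit(X, train_labels)
```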


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Sapna Juneja ◽  
Abhinav Juneja ◽  
Gaurav Dhiman ◽  
Shashank Jain ◽  
Anu Dhankhar ◽  
...  

Hand gesture recognition is one of the most sought-after technologies in the fields of machine learning and computer vision. There has been unprecedented demand for applications that can detect the hand signs made by deaf people and other sign language users, predict or recommend the most appropriate next word, and thereby produce what the signer wants to say. This article presents an approach to developing such a system, by which we can determine the most appropriate character for the sign shown by the user to the system. Various machine learning techniques for pattern recognition were explored, and we adopted CNN networks as a reliable solution in our context. The system comprises several convolution layers through which features are captured layer by layer. The features gathered from the image are then used to train the model. The trained model efficiently predicts the most appropriate character in response to the sign shown to it. Thereafter, the predicted character is used to predict further words according to the recommendation system used in this case. The proposed system attains a prediction accuracy of 91.07%.
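
A minimal Keras sketch of the kind of CNN described (stacked convolution layers capturing features layer by layer, then a softmax over characters), plus a toy prefix-based word recommender; the input size, layer widths, and 26-class output are assumptions.

```python
# Illustrative Keras CNN for single-character sign classification, followed
# by a toy prefix-based recommender. Shapes and widths are placeholders.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(64, 64, 1)),         # grayscale hand-sign crop
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(26, activation="softmax"),  # one class per character
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

def recommend(prefix, vocabulary, k=3):
    """Toy recommender: rank dictionary words by prefix match."""
    return [w for w in vocabulary if w.startswith(prefix)][:k]

# recommend("he", ["hello", "help", "hand"])  ->  ["hello", "help"]
```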


Author(s):  
D. Ivanko ◽  
D. Ryumin ◽  
A. Karpov

Abstract. The inability to use speech interfaces greatly limits deaf and hearing-impaired people in the possibility of human-machine interaction. To solve this problem and to increase the accuracy and reliability of automatic Russian sign language recognition, it is proposed to use lip-reading in addition to hand gesture recognition. Deaf and hearing-impaired people use sign language as the main way of communication in everyday life. Sign language is a structured form of hand gestures and lip movements involving visual motions and signs, which is used as a communication system. Since sign language includes not only hand gestures but also lip movements that mimic vocalized pronunciation, it is of interest to investigate how accurately such visual speech can be recognized by a lip-reading system, especially considering that the visual speech of hearing-impaired people is often characterized by hyper-articulation, which should potentially facilitate its recognition. For this purpose, the thesaurus of Russian sign language (TheRusLan), collected at SPIIRAS in 2018–19, was used. The database consists of color optical FullHD video recordings of 13 native Russian sign language signers (11 females and 2 males) from the “Pavlovsk boarding school for the hearing impaired”. Each signer demonstrated 164 phrases 5 times. This work covers the initial stages of this research, including data collection, data labeling, region-of-interest detection, and methods for informative feature extraction. The results of this study can later be used to create assistive technologies for deaf or hearing-impaired people.
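
For the region-of-interest detection stage, a hedged sketch using MediaPipe Face Mesh to crop the mouth area from a video frame; this is one plausible approach, not necessarily the one used on TheRusLan.

```python
# Hypothetical mouth ROI detector built on MediaPipe Face Mesh;
# FACEMESH_LIPS supplies the landmark indices of the lip contours.
import cv2
import mediapipe as mp
import numpy as np

mp_face = mp.solutions.face_mesh
LIP_IDS = sorted({i for pair in mp_face.FACEMESH_LIPS for i in pair})

def mouth_roi(frame_bgr, mesh):
    """Crop the mouth area from one video frame, or return None."""
    h, w = frame_bgr.shape[:2]
    res = mesh.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not res.multi_face_landmarks:
        return None
    lm = res.multi_face_landmarks[0].landmark
    pts = np.array([(int(lm[i].x * w), int(lm[i].y * h)) for i in LIP_IDS],
                   dtype=np.int32)
    x, y, bw, bh = cv2.boundingRect(pts)
    return frame_bgr[y:y + bh, x:x + bw]

mesh = mp_face.FaceMesh(static_image_mode=False, max_num_faces=1)
# roi = mouth_roi(frame, mesh)  # frame: one FullHD frame from a recording
```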


Author(s):  
Mikhail G. Grif ◽  
◽  
R. Elakkiya ◽  
Alexey L. Prikhodko ◽  
Maxim А. Bakaev ◽  
...  

In this paper, we consider the recognition of sign languages (SL), with a particular focus on Russian and Indian SLs. The proposed recognition system includes five components: configuration, orientation, localization, movement, and non-manual markers. The analysis uses methods for recognizing individual gestures and continuous sign speech for Indian and Russian sign languages (RSL). To recognize individual gestures, the RSL Dataset was developed, which includes more than 35,000 files for over 1000 signs. Each sign was performed 5 times by at least 5 deaf native speakers of Russian Sign Language from Siberia. To isolate epenthesis in continuous RSL, 312 sentences with 5 repetitions were selected and recorded on video. Five types of movement were distinguished: "No gesture", "There is a gesture", "Initial movement", "Transitional movement", and "Final movement". The markup of sentences for highlighting epenthesis was carried out on the Supervisely platform. A recurrent network architecture (LSTM) was built and implemented using the TensorFlow Keras machine learning library. The accuracy of correct recognition of epenthesis was 95%. Work on a similar dataset for the recognition of both individual gestures and continuous Indian Sign Language (ISL) is continuing. To recognize hand gestures, the MediaPipe Holistic library module was used. It contains a group of trained neural network algorithms that obtain the coordinates of the key points of a person's body, palms, and face in an image. An accuracy of 85% was achieved on the verification data. In the future, it will be necessary to significantly increase the amount of labeled data. To recognize non-manual components, a number of rules have been developed for certain movements of the face. These rules cover positions of the eyes, eyelids, mouth, and tongue, and head tilt.
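
A minimal sketch of an LSTM classifier of the kind described, over per-frame keypoint vectors such as MediaPipe Holistic provides; the 258-dimensional feature size and layer widths are assumptions, while the five classes are the movement types listed above.

```python
# Sketch of an LSTM epenthesis classifier over per-frame keypoint vectors.
# The 258-dim feature (pose + two hands, as MediaPipe Holistic can provide)
# and the layer widths are assumptions, not the paper's exact setup.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 5  # "No gesture", "There is a gesture", "Initial movement",
                 # "Transitional movement", "Final movement"

model = models.Sequential([
    layers.Input(shape=(None, 258)),  # frames x keypoint features
    layers.Masking(mask_value=0.0),   # support variable-length sequences
    layers.LSTM(128, return_sequences=True),
    layers.LSTM(64),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```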


Author(s):  
Dmitry Ryumin ◽  
Ildar Kagirov ◽  
Alexander Axyonov ◽  
Alexey Karpov

Introduction: Currently, the recognition of gestures and sign languages is one of the most intensively developing areas in computer vision and applied linguistics. The results of current investigations are applied in a wide range of areas, from sign language translation to gesture-based interfaces. In that regard, various systems and methods for the analysis of gestural data are being developed. Purpose: A detailed review of methods and a comparative analysis of current approaches in automatic recognition of gestures and sign languages. Results: The main gesture recognition problems are the following: detection of articulators (mainly hands), pose estimation, and segmentation of gestures in the flow of speech. The authors conclude that the use of two-stream convolutional and recurrent neural network architectures is generally promising for efficient extraction and processing of spatial and temporal features, thus solving the problem of dynamic gestures and coarticulations. This solution, however, heavily depends on the quality and availability of datasets. Practical relevance: This review can be considered a contribution to the study of rapidly developing sign language recognition, irrespective of particular natural sign languages. The results of the work can be used in the development of software systems for automatic gesture and sign language recognition.
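
To make the two-stream conclusion concrete, a hedged Keras sketch of that architecture family: an appearance stream over RGB frames and a motion stream over optical-flow stacks, each pooled temporally by a recurrent layer and then fused; all shapes, widths, and the class count are placeholders.

```python
# Placeholder two-stream sketch: per-frame CNN features from RGB and from
# optical flow, each pooled over time by a GRU, then fused for gesture
# classification. The 100-class output is invented for illustration.
import tensorflow as tf
from tensorflow.keras import layers, models

def stream(name, channels):
    inp = layers.Input(shape=(None, 64, 64, channels), name=name)
    x = layers.TimeDistributed(layers.Conv2D(32, 3, activation="relu"))(inp)
    x = layers.TimeDistributed(layers.MaxPooling2D())(x)
    x = layers.TimeDistributed(layers.Flatten())(x)
    return inp, layers.GRU(64)(x)  # temporal pooling of per-frame features

rgb_in, rgb_feat = stream("rgb", 3)     # appearance (spatial) stream
flow_in, flow_feat = stream("flow", 2)  # optical-flow (temporal) stream
fused = layers.concatenate([rgb_feat, flow_feat])
out = layers.Dense(100, activation="softmax")(fused)
model = models.Model([rgb_in, flow_in], out)
```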


Algorithms ◽  
2020 ◽  
Vol 13 (12) ◽  
pp. 310
Author(s):  
Valentin Belissen ◽  
Annelies Braffort ◽  
Michèle Gouiffès

Sign Languages (SLs) are visual–gestural languages that have developed naturally in deaf communities. They are based on the use of lexical signs, that is, conventionalized units, as well as highly iconic structures, i.e., structures in which the form of an utterance and the meaning it carries are not independent. Although most research in automatic Sign Language Recognition (SLR) has focused on lexical signs, we wish to broaden this perspective and consider the recognition of non-conventionalized iconic and syntactic elements. We propose the use of corpora made by linguists, such as the finely and consistently annotated dialogue corpus Dicta-Sign-LSF-v2. We then redefine the problem of automatic SLR as the recognition of linguistic descriptors, with carefully thought-out performance metrics. Moreover, we develop a compact and generalizable representation of signers in videos by parallel processing of the hands, face, and upper body, and an adapted learning architecture based on a Recurrent Convolutional Neural Network (RCNN). Through a study focused on the recognition of four linguistic descriptors, we show the soundness of the proposed approach and pave the way for a wider understanding of Continuous Sign Language Recognition (CSLR).
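
A hedged sketch in the spirit of the described pipeline: a compact per-frame signer representation fed to a convolutional-recurrent stack, with one frame-wise sigmoid output per linguistic descriptor; the feature size, layer choices, and descriptor names are assumptions, not the paper's exact architecture.

```python
# Sketch: concatenated per-frame hand/face/upper-body features enter a
# convolutional-recurrent stack; each linguistic descriptor gets its own
# frame-wise sigmoid output. Sizes and descriptor names are placeholders.
import tensorflow as tf
from tensorflow.keras import layers, models

DESCRIPTORS = ["desc_a", "desc_b", "desc_c", "desc_d"]  # hypothetical names

inp = layers.Input(shape=(None, 200))  # frames x signer feature vector
x = layers.Conv1D(64, 5, padding="same", activation="relu")(inp)
x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
outputs = [layers.Dense(1, activation="sigmoid", name=d)(x)
           for d in DESCRIPTORS]
model = models.Model(inp, outputs)  # frame-wise presence of each descriptor
model.compile(optimizer="adam", loss="binary_crossentropy")
```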


Data in Brief ◽  
2021 ◽  
pp. 107021
Author(s):  
Ali Imran ◽  
Abdul Razzaq ◽  
Irfan Ahmad Baig ◽  
Aamir Hussain ◽  
Sharaiz Shahid ◽  
...  

2020 ◽  
Vol 37 (4) ◽  
pp. 571-608
Author(s):  
Diane Brentari ◽  
Laura Horton ◽  
Susan Goldin-Meadow

Abstract. Two differences between signed and spoken languages that have been widely discussed in the literature are: the degree to which morphology is expressed simultaneously (rather than sequentially), and the degree to which iconicity is used, particularly in predicates of motion and location, often referred to as classifier predicates. In this paper we analyze a set of properties marking agency and number in four sign languages for their crosslinguistic similarities and differences regarding simultaneity and iconicity. Data from American Sign Language (ASL), Italian Sign Language (LIS), British Sign Language (BSL), and Hong Kong Sign Language (HKSL) are analyzed. We find that iconic, cognitive, phonological, and morphological factors contribute to the distribution of these properties. We conduct two analyses: one of verbs and one of verb phrases. The analysis of classifier verbs shows that, as expected, all four languages exhibit many common formal and iconic properties in the expression of agency and number. The analysis of classifier verb phrases (VPs), particularly multiple-verb predicates, reveals (a) that it is grammatical in all four languages to express agency and number within a single verb, but also (b) that there is crosslinguistic variation in expressing agency and number across the four languages. We argue that this variation is motivated by how each language prioritizes, or ranks, several constraints. The rankings can be captured in Optimality Theory. Some constraints in this account, such as a constraint to be redundant, are found in all information systems and might be considered non-linguistic; however, the variation in constraint ranking in verb phrases reveals the grammatical and arbitrary nature of linguistic systems.
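
As a toy illustration of how ranked constraints pick a winner in Optimality Theory, the sketch below evaluates two candidate multiple-verb predicates under two opposite rankings of an economy constraint and a be-redundant constraint; the constraint definitions and candidate forms are invented for illustration.

```python
# Invented toy example of Optimality Theory evaluation: candidates are
# filtered constraint by constraint in ranked order; reversing the ranking
# of an economy constraint and a be-redundant constraint flips the winner.
def ot_winner(candidates, ranked_constraints):
    """Return the candidate surviving all constraints, highest rank first."""
    pool = list(candidates)
    for constraint in ranked_constraints:
        best = min(constraint(c) for c in pool)
        pool = [c for c in pool if constraint(c) == best]
        if len(pool) == 1:
            break
    return pool[0]

def marks(form):
    """Count how many verbs in the predicate carry an agency marker."""
    return sum("agent" in verb for verb in form)

economy = lambda f: max(marks(f) - 1, 0)           # penalize repeated marking
be_redundant = lambda f: 1 if marks(f) < 2 else 0  # penalize single marking

forms = [("verb1+agent",), ("verb1+agent", "verb2+agent")]
print(ot_winner(forms, [economy, be_redundant]))  # single marking wins
print(ot_winner(forms, [be_redundant, economy]))  # redundant marking wins
```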

