Continuous 3D Multi-Channel Sign Language Production via Progressive Transformers and Mixture Density Networks

Author(s):  
Ben Saunders ◽  
Necati Cihan Camgoz ◽  
Richard Bowden

Sign languages are multi-channel visual languages, where signers use a continuous 3D space to communicate. Sign language production (SLP), the automatic translation from spoken to sign languages, must embody both the continuous articulation and full morphology of sign to be truly understandable by the Deaf community. Previous deep learning-based SLP works have produced only a concatenation of isolated signs focusing primarily on the manual features, leading to a robotic and non-expressive production. In this work, we propose a novel Progressive Transformer architecture, the first SLP model to translate from spoken language sentences to continuous 3D multi-channel sign pose sequences in an end-to-end manner. Our transformer network architecture introduces a counter decoding that enables variable-length continuous sequence generation by tracking the production progress over time and predicting the end of sequence. We present extensive data augmentation techniques to reduce prediction drift, alongside an adversarial training regime and a mixture density network (MDN) formulation to produce realistic and expressive sign pose sequences. We propose a back-translation evaluation mechanism for SLP, presenting benchmark quantitative results on the challenging PHOENIX14T dataset and setting baselines for future research. We further provide a user evaluation of our SLP model, to understand the Deaf reception of our sign pose productions.
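As a rough illustration of the MDN formulation mentioned in the abstract, the sketch below shows a mixture-density output head for continuous pose regression; the hidden size, pose dimensionality, and component count are assumptions for illustration, not the authors' configuration.

```python
# Hypothetical sketch of a mixture density network (MDN) output head for
# continuous pose regression, in the spirit of the abstract's MDN
# formulation. All sizes below are illustrative assumptions.
import torch
import torch.nn as nn

class MDNHead(nn.Module):
    def __init__(self, hidden_dim=512, pose_dim=150, num_mixtures=5):
        super().__init__()
        self.num_mixtures = num_mixtures
        self.pose_dim = pose_dim
        # Mixture weights, per-component means, and per-component scales.
        self.pi = nn.Linear(hidden_dim, num_mixtures)
        self.mu = nn.Linear(hidden_dim, num_mixtures * pose_dim)
        self.sigma = nn.Linear(hidden_dim, num_mixtures * pose_dim)

    def forward(self, h):
        # h: (batch, hidden_dim) decoder state for one time step.
        log_pi = torch.log_softmax(self.pi(h), dim=-1)
        mu = self.mu(h).view(-1, self.num_mixtures, self.pose_dim)
        # Softplus keeps the scales positive; the epsilon aids stability.
        sigma = torch.nn.functional.softplus(self.sigma(h)).view(
            -1, self.num_mixtures, self.pose_dim) + 1e-4
        return log_pi, mu, sigma

def mdn_nll(log_pi, mu, sigma, target):
    # Negative log-likelihood of the target pose under the mixture,
    # assuming diagonal Gaussian components.
    dist = torch.distributions.Normal(mu, sigma)
    # Sum log-probs over pose dims, add mixture weight, logsumexp over components.
    log_prob = dist.log_prob(target.unsqueeze(1)).sum(dim=-1) + log_pi
    return -torch.logsumexp(log_prob, dim=-1).mean()
```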

2021 ◽  
Vol 6 ◽  
Author(s):  
Karen Emmorey

The first 40 years of research on the neurobiology of sign languages (1960–2000) established that the same key left hemisphere brain regions support both signed and spoken languages, based primarily on evidence from signers with brain injury and, at the end of the 20th century, on evidence from emerging functional neuroimaging technologies (positron emission tomography and fMRI). Building on this earlier work, this review focuses on what we have learned about the neurobiology of sign languages in the last 15–20 years, what controversies remain unresolved, and directions for future research. Production and comprehension processes are addressed separately in order to capture whether and how output and input differences between sign and speech impact the neural substrates supporting language. In addition, the review includes aspects of language that are unique to sign languages, such as pervasive lexical iconicity, fingerspelling, linguistic facial expressions, and depictive classifier constructions. Summary sketches of the neural networks supporting sign language production and comprehension are provided with the hope that these will inspire future research as we begin to develop a more complete neurobiological model of sign language processing.


1995 ◽  
Vol 53 ◽  
pp. 61-69
Author(s):  
Carola Rooijmans

Research has shown parallels in the development of linguistic aspects found in sign languages and spoken languages when acquired as a first language (Newport & Meier, 1985). Deaf children of deaf parents (DCDP) are exposed to sign language early and are able to acquire it effortlessly. However, only about 10% of deaf children have deaf parents. More commonly the deaf child is born into a hearing family. These hearing parents usually use a communication system in which spoken words are supported simultaneously with signs. Such a sign system differs considerably from a sign language as it is not a natural language. Deaf children of hearing parents (DCHP) come into contact with sign language when they go to a school for the deaf. Research indicates that DCHP do acquire sign language structures, but this acquisition is delayed (Knoors, 1992). In this study a description of the development of morpho-syntactic and lexical aspects of the Sign Language of the Netherlands is given. The sign language production of three DCDP is analysed every six months from 1;0 to 3;6. Furthermore, the sign language production of three DCHP at the age of 3;6 is compared with that of the DCDP at the same age. The study includes both general measures such as Mean Length of Utterance and Type/Token Ratio and aspects specific to sign languages such as the use of POINTS in two-sign combinations. Recommendations are made with respect to the improvement of observational research on language acquisition by DCDP and DCHP.
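As a side note on the general measures mentioned above, a minimal sketch of how Mean Length of Utterance and Type/Token Ratio can be computed over glossed utterances follows; whitespace tokenisation of glosses is a simplifying assumption, since real transcripts require morpheme-level coding decisions.

```python
# Illustrative sketch of the two general measures named above, computed
# over hypothetical glossed sign utterances.
def mean_length_of_utterance(utterances):
    """MLU: average number of sign tokens per utterance."""
    return sum(len(u.split()) for u in utterances) / len(utterances)

def type_token_ratio(utterances):
    """TTR: distinct sign types divided by total sign tokens."""
    tokens = [t for u in utterances for t in u.split()]
    return len(set(tokens)) / len(tokens)

# Hypothetical two- and three-sign utterances.
sample = ["POINT MAMA", "BALL WANT", "POINT BALL BIG"]
print(mean_length_of_utterance(sample))  # (2 + 2 + 3) / 3 = 2.33
print(type_token_ratio(sample))          # 5 types / 7 tokens = 0.71
```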


Author(s):  
Amanda Elizabeth Smith ◽  
Dai O'Brien

This chapter outlines the experiences of the authors when using video technologies in creating resources for teaching British Sign Language (BSL). The authors outline their own experiences of creating resources for teaching and how the increasing availability of video technology and video hosting websites has impacted on their teaching practice. The chapter outlines some practical stages in creating online video resources for the teaching of sign language, and also how to ensure that less computer literate students can engage with this new technology. The authors conclude with some suggestions about future research directions to measure the impact and effectiveness of such resources and technologies, and call on other teachers of sign languages to explore the potential of these approaches for themselves.


2014 ◽  
Vol 17 (1) ◽  
pp. 82-101 ◽  
Author(s):  
Jesse Stewart

In spoken languages, disfluent speech, narrative effects, discourse information, and phrase position may influence the lengthening of segments beyond their typical duration. In sign languages, however, the primary use of the visual-gestural modality results in articulatory differences not expressed in spoken languages. This paper looks at sign lengthening in American Sign Language (ASL). Comparing two retellings of the Pear Story narrative from five signers, three primary lengthening mechanisms were identified: elongation, repetition, and deceleration. These mechanisms allow signers to incorporate lengthening into signs which may benefit from decelerated language production due to high information load or complex articulatory processes. Using a mixed effects model, significant differences in duration were found between (i) non-conventionalized forms vs. lexical signs, (ii) signs produced during role shift vs. non-role shift, (iii) signs in phrase-final/initial vs. phrase-medial position, (iv) new vs. given information, and (v) (non-disordered) disfluent signing vs. non-disfluent signing. These results provide insights into duration effects caused by information load and articulatory processes in ASL.
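For illustration, the sketch below shows the kind of mixed effects model the abstract describes, with sign duration as the outcome, the five predictors as fixed effects, and a random intercept per signer. The data frame, column names, and coding are hypothetical assumptions; the original analysis may have used different tools.

```python
# Hypothetical sketch of a mixed effects duration model of the kind
# reported above, using statsmodels. All data below is synthetic.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "duration": rng.normal(400, 80, n),            # sign duration in ms
    "lexical": rng.integers(0, 2, n),              # lexical vs. non-conventionalized
    "role_shift": rng.integers(0, 2, n),           # role shift vs. not
    "phrase_final_initial": rng.integers(0, 2, n), # vs. phrase-medial
    "given": rng.integers(0, 2, n),                # given vs. new information
    "disfluent": rng.integers(0, 2, n),            # disfluent vs. fluent signing
    "signer": rng.integers(0, 5, n),               # five signers
})

# Fixed effects for the five predictors, random intercept per signer.
model = smf.mixedlm(
    "duration ~ lexical + role_shift + phrase_final_initial + given + disfluent",
    data=data, groups=data["signer"])
print(model.fit().summary())
```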


Electronics ◽  
2021 ◽  
Vol 10 (6) ◽  
pp. 670
Author(s):  
Jakob Abeßer ◽  
Meinard Müller

In this paper, we adapt a recently proposed U-net deep neural network architecture from melody to bass transcription. We investigate pitch shifting and random equalization as data augmentation techniques. In a parameter importance study, we examine the influence of the skip connection strategy between the encoder and decoder layers, the data augmentation strategy, and the overall model capacity on the system’s performance. Using a training set that covers various music genres and a validation set that includes jazz ensemble recordings, we obtain the best transcription performance for a downscaled version of the reference algorithm combined with skip connections that transfer intermediate activations between the encoder and decoder. The U-net based method outperforms previous knowledge-driven and data-driven bass transcription algorithms by around five percentage points in overall accuracy. In addition to the pitch estimation improvement, the voicing estimation performance is clearly enhanced.
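A minimal sketch of the skip-connection pattern described above follows, in PyTorch; the depth, channel counts, and salience-map output are illustrative assumptions rather than the paper's downscaled configuration.

```python
# Minimal U-net-style encoder-decoder sketch in which skip connections
# transfer intermediate encoder activations to the decoder, as described
# above. All sizes are illustrative assumptions.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )

class SmallUNet(nn.Module):
    def __init__(self, in_ch=1, base=16):
        super().__init__()
        self.enc1 = conv_block(in_ch, base)
        self.enc2 = conv_block(base, base * 2)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(base * 2, base * 4)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)    # *4: concat with enc2 skip
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)        # *2: concat with enc1 skip
        self.out = nn.Conv2d(base, 1, kernel_size=1)  # per-bin pitch salience

    def forward(self, x):
        # x: (batch, in_ch, freq, time), spatial dims divisible by 4.
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        # Skip connections: concatenate encoder activations channel-wise.
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return torch.sigmoid(self.out(d1))
```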


Author(s):  
Mikhail G. Grif ◽  
R. Elakkiya ◽  
Alexey L. Prikhodko ◽  
Maxim А. Bakaev ◽  
...  

In the paper, we consider recognition of sign languages (SL), with a particular focus on Russian and Indian SLs. The proposed recognition system includes five components: configuration, orientation, localization, movement, and non-manual markers. The analysis uses methods for recognizing both individual gestures and continuous sign speech for Indian and Russian sign languages (RSL). To recognize individual gestures, the RSL Dataset was developed, which includes more than 35,000 files for over 1000 signs. Each sign was performed with 5 repetitions by at least 5 deaf native speakers of Russian Sign Language from Siberia. To isolate epenthesis in continuous RSL, 312 sentences with 5 repetitions were selected and recorded on video. Five types of movement were distinguished, namely "No gesture", "There is a gesture", "Initial movement", "Transitional movement", and "Final movement". The markup of sentences for highlighting epenthesis was carried out on the Supervisely platform. A recurrent network architecture (LSTM) was built and implemented using the TensorFlow Keras machine learning library. The accuracy of correct recognition of epenthesis was 95%. Work on a similar dataset for the recognition of both individual gestures and continuous Indian Sign Language (ISL) is ongoing. To recognize hand gestures, the MediaPipe Holistic library module was used. It contains a group of trained neural network algorithms that obtain the coordinates of the key points of the body, palms, and face of a person in an image. An accuracy of 85% was achieved on the verification data. In the future, it will be necessary to significantly increase the amount of labeled data. To recognize non-manual components, a number of rules have been developed for certain facial movements. These rules cover the positions of the eyes, eyelids, mouth, and tongue, as well as head tilt.
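For illustration, the sketch below combines the two components the abstract names: MediaPipe Holistic keypoint extraction and a TensorFlow Keras LSTM over the five movement classes. The window length and layer sizes are assumptions, not the authors' configuration.

```python
# Hypothetical sketch: MediaPipe Holistic keypoints fed to a Keras LSTM
# classifier over the five movement classes listed above.
import mediapipe as mp
import numpy as np
import tensorflow as tf

MOVEMENT_CLASSES = ["No gesture", "There is a gesture", "Initial movement",
                    "Transitional movement", "Final movement"]

holistic = mp.solutions.holistic.Holistic(static_image_mode=False)

def extract_keypoints(frame_rgb):
    """Flatten pose and both-hand landmarks for one RGB video frame."""
    res = holistic.process(frame_rgb)
    def flat(lms, n_points):
        if lms is None:                      # landmark set not detected
            return np.zeros(n_points * 3)
        return np.array([[p.x, p.y, p.z] for p in lms.landmark]).flatten()
    return np.concatenate([flat(res.pose_landmarks, 33),
                           flat(res.left_hand_landmarks, 21),
                           flat(res.right_hand_landmarks, 21)])

# 75 landmarks x 3 coordinates per frame, 30-frame windows (assumption).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(30, 75 * 3)),
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(len(MOVEMENT_CLASSES), activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```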


Author(s):  
Muhammad Ezar Al Rivan ◽  
Mochammad Trinanda Noviardy

Sign languages have various types, one of which is American Sign Language (ASL). In this study, features were extracted from ASL handshape alphabet images using the Histogram of Oriented Gradients (HOG) and then used to train Artificial Neural Network (ANN) classifiers with various training functions, across 3 variations of a multi-layer network architecture, each consisting of one hidden layer. In the ANN training experiments, the trainbr training function achieved a higher success rate than the other training functions. The architecture with 15 neurons in the hidden layer achieved an accuracy of 99.29%, a precision of 91.84%, and a recall of 91.47%. The test results show that combining HOG features with ANN classification for ASL recognition gives a good level of accuracy, with overall accuracies of 95.38% (5 neurons), 96.64% (10 neurons), and 97.32% (15 neurons).

Keywords: Artificial Neural Network; American Sign Language; Histogram of Oriented Gradient; Training Function
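A minimal sketch of the HOG-plus-ANN pipeline follows, using scikit-image and scikit-learn as stand-ins for the original toolchain (the study's trainbr is a MATLAB Bayesian-regularization training function with no direct scikit-learn equivalent); the image size, HOG parameters, and synthetic data are assumptions.

```python
# Illustrative HOG-feature + one-hidden-layer ANN pipeline, with
# scikit-learn's MLPClassifier standing in for the MATLAB trainbr setup.
import numpy as np
from skimage.feature import hog
from sklearn.neural_network import MLPClassifier

def hog_features(gray_image):
    # gray_image: 2D array, e.g. a 64x64 grayscale handshape image.
    return hog(gray_image, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))

rng = np.random.default_rng(0)
images = rng.random((40, 64, 64))      # stand-ins for alphabet images
labels = rng.integers(0, 24, size=40)  # 24 static ASL letters (no J, Z)

X = np.stack([hog_features(img) for img in images])
clf = MLPClassifier(hidden_layer_sizes=(15,), max_iter=1000)  # one hidden layer
clf.fit(X, labels)
print(clf.score(X, labels))  # use a held-out test split in practice
```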


Author(s):  
Yidnekachew Kibru Afework ◽  
Taye Girma Debelee

Bacterial wilt disease is the most significant threat to the Enset crop, as it results in a serious reduction in the quality and quantity of the food the crop produces. Therefore, early detection of bacterial wilt disease is important for diagnosing and fighting the disease. To this end, a deep learning approach that can detect the disease from images of healthy and infected leaves of the crop is proposed. In particular, a convolutional neural network architecture is designed to classify images collected from different farms as diseased or healthy. A total of 4896 images, captured directly on farms with the help of experts in the field of agriculture, was used to train the proposed model, with data augmentation techniques applied to generate more images. Besides training the proposed model, a pre-trained model, VGG16, was also trained on our dataset. The proposed model achieved a mean accuracy of 98.5% and the VGG16 pre-trained model a mean accuracy of 96.6%, using a mini-batch size of 32 and a learning rate of 0.001. The preliminary results demonstrate the effectiveness of the proposed approach under challenging conditions such as illumination, complex backgrounds, different resolutions, variable scale, rotation, and orientation of the real scene images.
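For illustration, a minimal sketch of a binary healthy/diseased CNN with on-the-fly data augmentation follows, using the reported mini-batch size of 32 and learning rate of 0.001; the layer configuration and input size are assumptions, not the authors' architecture.

```python
# Minimal sketch of a binary leaf-image classifier with on-the-fly
# augmentation; the architecture below is an illustrative assumption.
import tensorflow as tf

augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
])

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    augment,                                        # augmentation at train time
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # healthy vs. diseased
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
# model.fit(train_ds, validation_data=val_ds, batch_size=32, epochs=...)
```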


2020 ◽  
Vol 10 ◽  
Author(s):  
Josée-Anna Tanner ◽  
Nina Doré

This article draws on translanguaging theory and research to consider a common pedagogical practice in the American Sign Language (ASL) as a second language (L2) classroom, the No Voice policy (i.e., spoken language use is forbidden). The No Voice policy serves important cultural and practical purposes, but by nature limits learners’ access to their entire linguistic repertoire, which raises questions about the overall impact of the policy on learners’ language development. Current literature about pedagogical translanguaging has not yet addressed practices that integrate (and, by extension, limit) selective modalities; we evaluate this gap and propose several directions for future research on the topic. Moreover, previous discussions of translanguaging practices involving recognized minority (e.g., Basque, Welsh, Irish) spoken languages are not wholly comparable to sign languages, which are not yet official or fully recognized languages in most countries and are therefore additionally vulnerable. We take into account the impact of ASL L2 learners on the language community, as many learners go on to become interpreters and allies to the deaf community.

Keywords: American Sign Language as a second language, hearing adult learners, selective modality, pedagogical translanguaging, minority language

