Automatic Thai Finger Spelling Transcription

Author(s):  
Pisit NAKJAI ◽  
Tatpong KATANYUKUL

This article explores the transcription of a video recording of Thai Finger Spelling (TFS), a specific signing mode used in Thai sign language, into a corresponding Thai word. TFS covers 42 Thai consonants and 20 vowels through multiple and complex signing schemes, which leads to many technical challenges uncommon in the spelling schemes of other sign languages. Our proposed system, Automatic Thai Finger Spelling Transcription (ATFS), processes a signing video in three stages: ALS marks video frames so that non-signing frames can be removed and frames associated with the same alphabet can be grouped; SR classifies a signing image frame to a sign label (or its equivalent); and SSR transcribes a series of signs into alphabets. ALS utilizes the TFS practice of signing different alphabets at different locations. SR and SSR employ well-adopted spatial and sequential models. Our ATFS achieves an Alphabet Error Rate (AER) of 0.256 (cf. 0.63 for the baseline method). Beyond ATFS itself, our findings disclose a benefit of coupling the image classification and sequence modeling stages by using a feature (penultimate) vector for label representation rather than a definitive label or one-hot coding. Our results also assert the necessity of a smoothing mechanism in ALS and reveal a benefit of our proposed WFS, which could lead to over a 15.88% improvement. For TFS transcription, our work emphasizes the utilization of signing location in the identification of different alphabets. This is contrary to the common belief in exploiting signing time duration, which our data show to be ineffective. HIGHLIGHTS
- Prototype of Thai finger spelling transcription (transcribing a signing video to alphabets)
- Utilization of signing location as a cue for the identification of different alphabets
- Disclosure of a benefit of coupling image classification and sequence modeling in signing transcription
- Examination of various frame smoothing techniques and their contributions to overall transcription performance
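
The coupling finding above, feeding the classifier's penultimate feature vector into the sequence model instead of a one-hot label, can be illustrated with a short sketch. The PyTorch code below is a minimal illustration under assumed module names, layer sizes, and class counts; it is not the authors' implementation.

```python
# Minimal sketch (PyTorch): coupling an image classifier with a sequence model
# by passing penultimate feature vectors instead of one-hot labels.
# All module names and dimensions here are illustrative, not from the paper.
import torch
import torch.nn as nn

class SignRecognizer(nn.Module):
    """CNN backbone; forward() returns the penultimate feature vector."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.head = nn.Linear(feat_dim, 42)   # 42 sign classes (illustrative)

    def forward(self, x):
        feats = self.backbone(x)               # penultimate vector
        return feats, self.head(feats)         # features + class logits

class SequenceTranscriber(nn.Module):
    """GRU over per-frame feature vectors -> per-step alphabet logits."""
    def __init__(self, feat_dim=128, n_alphabets=42):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, 64, batch_first=True)
        self.out = nn.Linear(64, n_alphabets)

    def forward(self, feat_seq):               # (batch, time, feat_dim)
        h, _ = self.rnn(feat_seq)
        return self.out(h)

# Usage: run the recognizer frame by frame, stack features, transcribe.
recognizer, transcriber = SignRecognizer(), SequenceTranscriber()
frames = torch.randn(1, 30, 3, 64, 64)         # 30 video frames (dummy)
feats, _ = recognizer(frames.flatten(0, 1))    # (30, 128)
logits = transcriber(feats.unsqueeze(0))       # (1, 30, 42)
```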


Biology Open ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. bio055962
Author(s):  
Maja Mielke ◽  
Peter Aerts ◽  
Chris Van Ginneken ◽  
Sam Van Wassenbergh ◽  
Falk Mielke

ABSTRACT Digitization of video recordings often requires the laborious procedure of manually clicking points of interest on individual video frames. Here, we present progressive tracking, a procedure that facilitates manual digitization of markerless videos. In contrast to existing software, it allows the user to follow points of interest with a cursor in the progressing video, without the need to click. To compare the performance of progressive tracking with conventional frame-wise tracking, we quantified the speed and accuracy of both methods, testing two different input devices (mouse and stylus pen). We show that progressive tracking can be twice as fast as frame-wise tracking while maintaining accuracy, given that playback speed is controlled. Using a stylus pen can increase frame-wise tracking speed. The complementary application of the progressive and frame-wise modes is exemplified on a realistic video recording. This study reveals that progressive tracking can vastly facilitate video analysis in experimental research.
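
As an illustration of the progressive-tracking idea, the sketch below (OpenCV, with a hypothetical input file name) records the cursor position on every displayed frame while the video plays, so no clicking is needed; the per-frame wait time stands in for playback-speed control. This is a minimal sketch, not the authors' software.

```python
# Minimal sketch (OpenCV): progressive-tracking-style digitization,
# recording the cursor position on every displayed frame, no clicking needed.
import cv2

positions = []            # (frame_index, x, y) per displayed frame
cursor = [None, None]

def on_mouse(event, x, y, flags, param):
    if event == cv2.EVENT_MOUSEMOVE:
        cursor[0], cursor[1] = x, y

cap = cv2.VideoCapture("recording.mp4")    # hypothetical input file
cv2.namedWindow("tracking")
cv2.setMouseCallback("tracking", on_mouse)

frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("tracking", frame)
    positions.append((frame_idx, cursor[0], cursor[1]))
    frame_idx += 1
    # ~15 fps playback; raise the wait time to slow playback and gain accuracy
    if cv2.waitKey(66) & 0xFF == 27:       # Esc aborts
        break

cap.release()
cv2.destroyAllWindows()
```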


2020 ◽  
Vol 13 (1) ◽  
Author(s):  
Muhammad Awais ◽  
Xi Long ◽  
Bin Yin ◽  
Chen Chen ◽  
Saeed Akbarzadeh ◽  
...  

Abstract Objective In this paper, we evaluate the use of pre-trained convolutional neural networks (CNNs) as feature extractors, followed by principal component analysis (PCA) to find the most discriminant features, to classify neonatal sleep and wake states from Fluke® facial video frames with a support vector machine (SVM). Using pre-trained CNNs as feature extractors greatly reduces the effort of collecting new neonatal data to train a neural network, which can be computationally expensive. Features are extracted after the fully connected layers (FCLs), and several pre-trained CNNs are compared: VGG16, VGG19, InceptionV3, GoogLeNet, ResNet, and AlexNet. Results From around 2 h of Fluke® video recordings of seven neonates, we achieved a modest classification performance, with an accuracy, sensitivity, and specificity of 65.3%, 69.8%, and 61.0%, respectively, with AlexNet on the Fluke® (RGB) video frames. This indicates that a pre-trained model used as a feature extractor does not fully suffice for highly reliable sleep and wake classification in neonates. Therefore, future work requires a dedicated neural network trained on neonatal data or a transfer learning approach.
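
The described pipeline (pre-trained CNN features, then PCA, then an SVM) can be sketched as follows. AlexNet from torchvision stands in for the authors' feature extractor, the data are stubbed with random arrays, and the PCA component count is an assumption, not the paper's setting.

```python
# Minimal sketch of the described pipeline: pre-trained CNN features ->
# PCA -> SVM. Data loading is stubbed with random arrays for illustration.
import numpy as np
import torch
from torchvision import models
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline

# Pre-trained AlexNet, final classification layer removed -> 4096-d features
alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
alexnet.classifier = alexnet.classifier[:-1]
alexnet.eval()

def extract_features(frames: np.ndarray) -> np.ndarray:
    """frames: (N, 3, 224, 224) float32 array of preprocessed video frames."""
    with torch.no_grad():
        return alexnet(torch.from_numpy(frames)).numpy()

# Dummy stand-ins for facial video frames and sleep/wake labels
X = extract_features(np.random.rand(40, 3, 224, 224).astype("float32"))
y = np.random.randint(0, 2, size=40)       # 0 = wake, 1 = sleep

clf = make_pipeline(PCA(n_components=20), SVC(kernel="rbf"))
clf.fit(X, y)
print(clf.score(X, y))                      # training accuracy (illustrative)
```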


2020 ◽  
Vol 7 (4) ◽  
pp. 269-281
Author(s):  
Lukáš Jurík ◽  
Natália Horňáková ◽  
Veronika Domčeková

The fluency of the production process is often interrupted by idle times, a significant proportion of which is caused by changeovers. Trends such as the individualization of requirements, the constant effort to meet customers' requirements on time, and the maintenance of production flow at low cost all aim at eliminating idle time. Given the contradictory goals of individualizing customer requirements, which is reflected in high production/product variability, and of minimizing production time while maintaining flow, increased attention must be paid to the changeover process. The problem can be addressed in two ways: by reducing the number of changeovers (reducing production variability, at the cost of leaving individual customer requirements unsatisfied) or by shortening the changeover time (while maintaining production variability and the ability to satisfy a wide range of individual customer requirements). The Single-Minute Exchange of Die (SMED) method is used to shorten the duration of the changeover process and eliminate waste within it. The aim of the paper is to apply the SMED method to the vibration welder changeover process in a selected industrial enterprise and thereby shorten the changeover process. The SMED method was applied in an enterprise belonging to the group of small and medium-sized enterprises. The research method was indirect observation via video recording and time snapshots. Various types of waste were identified in the analysis and subsequently eliminated through the proposed rationalization measures. Finally, the duration of the changeover process before the analysis and after the implementation of the rationalization measures was compared.


2018 ◽  
Vol 938 ◽  
pp. 119-123
Author(s):  
Y.N. Saraev ◽  
A.G. Lunev ◽  
A.S. Kiselev ◽  
Anton S. Gordynets ◽  
V.M. Semenchuk

The results of the development and manufacturing of a unique research complex for investigating fast-flowing processes of heat and mass transfer during arc welding with a melting electrode are presented. The advantages of the developed complex over traditionally used cinema and video cameras are demonstrated by shadow imaging of the metal's heat and mass transfer. Studies of fast processes using high-speed video recording require powerful backlighting, such as a scattered laser beam, which enhances visualization of the observed object, namely the melting and transfer of each electrode metal droplet under the intense light emission of the electric arc. The article contains explanatory diagrams, control algorithms, video frames of an individual welding microcycle, and examples of recorded oscillograms together with graphical representations of the changes in their quantitative values.


2021 ◽  
Vol 5 (3) ◽  
pp. 489-495
Author(s):  
Mohammad Farid Naufal ◽  
Sesilia Shania ◽  
Jessica Millenia ◽  
Stefan Axel ◽  
Juan Timothy Soebroto ◽  
...  

People with hearing loss (deafness) or speech impairment usually use sign language to communicate. One of the most basic and flexible forms is alphabet sign language, used to spell out the words one wants to pronounce. Sign language in general uses hand, finger, and facial movements to express the user's thoughts; alphabet sign language, however, does not use facial expressions, only gestures or symbols formed with the fingers and hands. In practice, many people still do not understand sign language, and image classification can help people learn and translate it more easily. Classification accuracy is the main problem in this case. This research compares two image classification algorithms, a Convolutional Neural Network (CNN) and a Multilayer Perceptron (MLP), for recognizing American Sign Language (ASL) letters, excluding "J" and "Z" because both require movement. The comparison examines the effect of the CNN's convolution and pooling stages on the resulting accuracy and F1 score on the ASL dataset. Based on the comparison, the CNN preceded by Gaussian low-pass filtering preprocessing achieved the best accuracy of 96.93% and an F1 score of 96.97%.
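
A minimal sketch of the best-performing configuration, Gaussian low-pass filtering followed by a CNN, is shown below. The kernel size, image resolution, and network architecture are assumptions for illustration, not the study's exact setup; the 24 output classes reflect the static ASL letters with "J" and "Z" excluded.

```python
# Minimal sketch: Gaussian low-pass filtering before a small CNN classifier
# for the 24 static ASL letters. Architecture and sizes are illustrative.
import cv2
import numpy as np
import torch
import torch.nn as nn

def preprocess(gray_image: np.ndarray) -> torch.Tensor:
    """Gaussian low-pass filter, then scale to [0, 1] as a (1, 28, 28) tensor."""
    blurred = cv2.GaussianBlur(gray_image, ksize=(5, 5), sigmaX=1.0)
    resized = cv2.resize(blurred, (28, 28)).astype("float32") / 255.0
    return torch.from_numpy(resized).unsqueeze(0)

cnn = nn.Sequential(                         # convolution + pooling stages
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(32 * 7 * 7, 24), # 24 static letters
)

img = (np.random.rand(64, 64) * 255).astype("uint8")  # dummy grayscale frame
logits = cnn(preprocess(img).unsqueeze(0))            # (1, 24)
print(logits.argmax(dim=1))
```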


2021 ◽  
Author(s):  
Matthew R Whiteway ◽  
Evan S Schaffer ◽  
Anqi Wu ◽  
E Kelly Buchanan ◽  
Omer F Onder ◽  
...  

A popular approach to quantifying animal behavior from video data is discrete behavioral segmentation, wherein video frames are labeled as containing one or more behavior classes such as walking or grooming. Sequence models learn to map behavioral features extracted from video frames to discrete behaviors, and both supervised and unsupervised methods are common. However, each approach has its drawbacks: supervised models require a time-consuming annotation step in which humans must hand-label the desired behaviors; unsupervised models may fail to accurately segment particular behaviors of interest. We introduce a semi-supervised approach that addresses these challenges by constructing a sequence model loss function with (1) a standard supervised loss that classifies a sparse set of hand labels; (2) a weakly supervised loss that classifies a set of easy-to-compute heuristic labels; and (3) a self-supervised loss that predicts the evolution of the behavioral features. With this approach, we show that a large number of unlabeled frames can improve supervised segmentation in the regime of sparse hand labels, and that a small number of hand-labeled frames can increase the precision of unsupervised segmentation.
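
A minimal sketch of such a three-term loss is given below in PyTorch. The masking convention (label -1 for unlabeled frames), the loss weights, and the mean-squared-error form of the self-supervised term are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch of the three-term semi-supervised loss described above:
# (1) supervised CE on sparse hand labels, (2) weak CE on heuristic labels,
# (3) self-supervised next-step prediction of the behavioral features.
import torch
import torch.nn.functional as F

def semi_supervised_loss(class_logits, pred_next_feats, feats,
                         hand_labels, heuristic_labels,
                         w_hand=1.0, w_weak=0.5, w_self=0.5):
    """class_logits: (T, C); pred_next_feats, feats: (T, D);
    labels: (T,) long tensors with -1 marking unlabeled frames."""
    losses = []
    hand_mask = hand_labels >= 0             # only frames a human labeled
    if hand_mask.any():
        losses.append(w_hand * F.cross_entropy(
            class_logits[hand_mask], hand_labels[hand_mask]))
    weak_mask = heuristic_labels >= 0        # frames with heuristic labels
    if weak_mask.any():
        losses.append(w_weak * F.cross_entropy(
            class_logits[weak_mask], heuristic_labels[weak_mask]))
    # predict feature evolution: model's step-t+1 estimate vs. actual features
    losses.append(w_self * F.mse_loss(pred_next_feats[:-1], feats[1:]))
    return sum(losses)

# Usage with dummy tensors: 100 frames, 4 behaviors, 10-d features
T, C, D = 100, 4, 10
logits = torch.randn(T, C, requires_grad=True)
pred = torch.randn(T, D, requires_grad=True)
feats = torch.randn(T, D)
hand = torch.full((T,), -1, dtype=torch.long)
hand[::25] = 2                               # sparse hand labels
heur = torch.randint(-1, C, (T,))            # heuristic labels, -1 = none
semi_supervised_loss(logits, pred, feats, hand, heur).backward()
```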


2018 ◽  
Vol 61 (9) ◽  
pp. 2196-2204
Author(s):  
S. Pravin Kumar ◽  
Jan G. Švec

Purpose Sound pressure level (SPL) and fundamental frequency (fo) are basic and important measures in the acoustic assessment of voice quality, and their variation also influences vocal fold vibration characteristics. The most sophisticated laryngeal videostroboscopic systems therefore measure and display SPL and fo values directly over the video frames by means of a rather expensive special hardware setup. A simple software-based alternative is presented here that provides these measures as video subtitles. Method The software extracts the acoustic data from the video recording, calculates the SPL and fo parameters, and saves their values in a separate subtitle file. To ensure correct SPL values, the microphone signal is calibrated beforehand with a sound level meter. Results The new approach was tested on videokymographic recordings obtained laryngoscopically. The SPL and fo values calculated from the videokymographic recordings, the creation of the subtitles, and their display are presented. Conclusions The method is useful for integrating acoustic measures with any kind of video recording containing audio data when built-in hardware means are not available. However, the calibration and other technical aspects of data acquisition and synchronization described in this article should be properly taken care of during recording.
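
The workflow can be sketched as follows: extract the audio track, compute SPL and fo per one-second window, and write them as an .srt subtitle file. The calibration offset below is a placeholder for the sound-level-meter calibration the authors describe, the file names are hypothetical, and librosa's YIN estimator stands in for an unspecified fo method.

```python
# Minimal sketch of the subtitle idea: per-second SPL and fo values from the
# recording's audio track, written as an .srt subtitle file.
import numpy as np
import librosa

CAL_OFFSET_DB = 94.0                         # placeholder calibration constant

y, sr = librosa.load("recording.wav", sr=None)   # hypothetical audio track

def srt_time(t: float) -> str:
    """Seconds -> SRT timestamp HH:MM:SS,mmm."""
    h, rem = divmod(int(t), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d},{int((t % 1) * 1000):03d}"

with open("levels.srt", "w") as srt:
    for i, start in enumerate(np.arange(0, len(y) / sr - 1, 1.0)):
        chunk = y[int(start * sr):int((start + 1) * sr)]
        rms = np.sqrt(np.mean(chunk ** 2))
        spl = 20 * np.log10(max(rms, 1e-9)) + CAL_OFFSET_DB
        f0 = librosa.yin(chunk, fmin=60, fmax=500, sr=sr)
        f0_med = np.nanmedian(f0)            # one representative fo per second
        srt.write(f"{i + 1}\n{srt_time(start)} --> {srt_time(start + 1)}\n"
                  f"SPL: {spl:.1f} dB   fo: {f0_med:.0f} Hz\n\n")
```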

