connectionist temporal classification
Recently Published Documents


TOTAL DOCUMENTS

51
(FIVE YEARS 29)

H-INDEX

7
(FIVE YEARS 3)

2021 ◽  
Vol 11 (19) ◽  
pp. 9106
Author(s):  
Zheying Huang ◽  
Pei Wang ◽  
Jian Wang ◽  
Haoran Miao ◽  
Ji Xu ◽  
...  

A recurrent neural network (RNN)-based attention model has been used in code-switching speech recognition (CSSR). However, because of the sequential computation constraint of RNNs, such models capture short-range dependencies more strongly than long-range ones, which makes it hard to switch languages immediately in CSSR. First, to deal with this problem, we introduce the CTC-Transformer, which relies entirely on a self-attention mechanism to draw global dependencies and adopts connectionist temporal classification (CTC) as an auxiliary task for better convergence. Second, we propose two multi-task learning recipes, in which a language identification (LID) auxiliary task is learned in addition to the CTC-Transformer automatic speech recognition (ASR) task. Third, we study a decoding strategy that incorporates LID into the ASR task. Experiments on the SEAME corpus demonstrate the effectiveness of the proposed methods, which achieve a mixed error rate (MER) of 30.95%. The proposed approach obtains up to a 19.35% relative MER reduction compared with the baseline RNN-based CTC-Attention system, and an 8.86% relative MER reduction compared with the baseline CTC-Transformer system.
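
Using CTC as an auxiliary task is commonly implemented as a weighted interpolation of the CTC loss and the attention-decoder loss. The sketch below assumes that standard joint CTC/attention formulation, with a hypothetical weight `ctc_weight` (λ) whose default is illustrative only. The abstract states neither the weighting nor how the LID loss is combined. The sketch also shows the usual definition of relative error-rate reduction behind figures such as the 19.35% quoted above.

```python
def joint_ctc_attention_loss(ctc_loss: float, att_loss: float, ctc_weight: float = 0.3) -> float:
    """Multi-task objective: lambda * L_ctc + (1 - lambda) * L_att.

    ctc_weight (lambda) is a hypothetical default, not a value from the paper.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * att_loss


def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Relative error-rate reduction: (baseline - new) / baseline."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Made-up values for illustration (not results from the paper).
print(joint_ctc_attention_loss(2.0, 1.0))  # 0.3 * 2.0 + 0.7 * 1.0, approximately 1.3
print(relative_reduction(40.0, 30.0))      # → 0.25
```

Setting `ctc_weight=1.0` reduces the objective to pure CTC training, and setting it to `0.0` reduces it to a pure attention model. Intermediate values let the monotonic CTC alignment regularize the attention decoder.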


2021 ◽  
Vol 36 (1) ◽  
pp. 650-656
Author(s):  
M. Pranathi Sai Prathyusha ◽  
Dr. K. Malathi

Aim: To recognize handwritten digits and determine which of two machine learning methods, Connectionist Temporal Classification (CTC) or a Convolutional Neural Network (CNN), achieves the best accuracy. Methods and Materials: Accuracy and loss were measured on the MNIST dataset from the Keras library. Two groups were compared: Connectionist Temporal Classification (N=20) and Convolutional Neural Network (N=20). Results: A CNN is used to recognize the handwritten digits. Accuracy, measured as the correctness of the recognized digits, is 92.67% for the CNN, whereas CTC reaches 89.07%. The difference between the CNN and CTC is statistically significant under an independent-samples t-test (significance value .001, p < 0.05) at a 95% confidence level. Conclusion: The CNN appears to recognize handwritten digits significantly better than CTC.
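
The comparison above relies on an independent-samples t-test. The following is a minimal sketch of Student's two-sample t statistic with pooled variance, which is the equal-variance form of the test. The abstract does not say whether the pooled or the Welch variant was used, so this choice is an assumption, and the inputs shown are toy numbers rather than the study's measurements.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def independent_t_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Student's two-sample t statistic (pooled variance) and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    # statistics.variance is the sample variance (divides by n - 1).
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("both groups have zero variance")
    return (mean(a) - mean(b)) / se, n1 + n2 - 2


# Toy values for illustration only (not the paper's data).
t, df = independent_t_test([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t, 3), df)  # → -3.674 4
```

The p-value is then obtained from the t distribution with `df` degrees of freedom. For example, `scipy.stats.ttest_ind` returns both the statistic and the p-value.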


2021 ◽  
Vol 11 (11) ◽  
pp. 4954
Author(s):  
Lin Wang ◽  
Xingfu Wang ◽  
Ammar Hawbani ◽  
Yan Xiong

We address the problem of densely aligning actions in videos at the frame level when only the order of the occurring actions is available, which saves the time-consuming effort of accurately annotating the temporal boundaries of each action. We propose three task-specific innovations for this scenario: (1) To encode fine-grained spatiotemporal local features and long-range temporal patterns simultaneously, we test three popular backbones and compare their accuracy and training times: (i) a recurrent LSTM; (ii) a fully convolutional model; and (iii) the recently proposed Transformer model. (2) To address the absence of ground-truth frame-by-frame labels during training, we apply connectionist temporal classification (CTC) on top of the temporal encoder to recursively collect all theoretically valid alignments, and we further weight these alignments with frame-wise visual similarities; this avoids a significant number of degenerate paths and improves both recognition accuracy and computational efficiency. (3) To quantitatively assess the quality of the learned alignment, we apply a comprehensive set of frame-level, segment-level, and video-level evaluation measures. Extensive evaluations verify the effectiveness of our proposal, with performance comparable to that of fully supervised approaches across four benchmarks of varying difficulty and data scale.
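
The recursive collection of valid alignments in (2) corresponds to the standard CTC forward recursion. The following is a minimal sketch, assuming per-frame probabilities that are already normalized and a blank symbol at index 0 (both assumptions). It computes p(labels | x) by summing over every frame-level path that collapses to the label sequence. It does not reproduce the paper's weighting by frame-wise visual similarities.

```python
from typing import List, Sequence


def ctc_forward(probs: Sequence[Sequence[float]], labels: Sequence[int], blank: int = 0) -> float:
    """Return p(labels | x), summed over all alignments that collapse to `labels`.

    probs[t][k] is the probability of symbol k at frame t.
    """
    # Extended label sequence with blanks interleaved: [b, l1, b, l2, ..., b].
    ext: List[int] = [blank]
    for c in labels:
        ext.extend([c, blank])
    S, T = len(ext), len(probs)
    if T == 0:
        return 1.0 if not labels else 0.0
    # alpha[s]: total probability of all path prefixes ending in ext[s] at frame t.
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s]  # stay on the same symbol
            if s >= 1:
                a += alpha[s - 1]  # advance by one position
            # Skip over a blank only when it separates two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]
            new[s] = a * probs[t][ext[s]]
        alpha = new
    # Valid paths end on the last label or on the trailing blank.
    return alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)


# Two frames, uniform over {blank, "a"}: the paths "aa", "a_", "_a" each have probability 0.25.
print(ctc_forward([[0.5, 0.5], [0.5, 0.5]], [1]))  # → 0.75
```

Practical implementations run the same recursion in log space to avoid underflow on long videos or utterances. The CTC training loss is the negative log of this quantity.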


2021 ◽  
Author(s):  
Rehaan Sajjad Arai ◽  
Skanda Shanubog A ◽  
Rithik Jain ◽  
Pushkar Kumar ◽  
Krupashankari Sandyal

Offline Handwritten Text Recognition (HTR) is currently one of the most interesting challenges in the field of image processing. This paper introduces a novel technique for recognizing handwritten text using a Convolutional Recurrent Neural Network (CRNN) combined with Connectionist Temporal Classification (CTC). The model uses the IAM dataset. Offline Signature Verification (SV) is another challenging task in the field of biometrics. This paper also presents a novel technique, based on a convolutional Siamese network, for verifying whether a signature is genuine or forged.
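
A CRNN trained with CTC emits, at every horizontal position of the text image, a distribution over characters plus a blank. A common way to turn that output into text is greedy (best-path) decoding: take the most likely symbol per frame, merge consecutive repeats, then drop the blanks. The sketch below assumes a hypothetical `alphabet` string whose index `blank` holds a placeholder character. The abstract does not say which decoder the authors use, and beam search is a frequent alternative.

```python
from itertools import groupby
from typing import List, Sequence


def ctc_greedy_decode(probs: Sequence[Sequence[float]], alphabet: str, blank: int = 0) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Index of the most likely symbol at each frame.
    best: List[int] = [max(range(len(frame)), key=frame.__getitem__) for frame in probs]
    # Merge runs of identical indices (e.g. l, l, l -> l).
    merged = [k for k, _ in groupby(best)]
    # Remove blanks; a blank between two identical letters keeps them distinct.
    return "".join(alphabet[k] for k in merged if k != blank)


# Toy example: "_" is the blank placeholder at index 0 of the hypothetical alphabet.
alphabet = "_helo"
frames = [[1.0 if c == ch else 0.0 for c in alphabet] for ch in "hh_el_llo"]
print(ctc_greedy_decode(frames, alphabet))  # → hello
```

The order of operations matters. Merging repeats before removing blanks is what allows "ll" in "hello" to survive, because the blank separates the two runs of "l".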


Information ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 62 ◽  
Author(s):  
Eshete Derb Emiru ◽  
Shengwu Xiong ◽  
Yaxing Li ◽  
Awet Fesseha ◽  
Moussa Diallo

Out-of-vocabulary (OOV) words are the most challenging problem in automatic speech recognition (ASR), especially for morphologically rich languages. Most end-to-end speech recognition systems operate at the word or character level of a language. Amharic is a poorly resourced but morphologically rich language. This paper proposes a hybrid connectionist temporal classification (CTC)/attention end-to-end architecture and a syllabification algorithm for an Amharic automatic speech recognition (AASR) system that uses phoneme-based subword units. The syllabification algorithm helps insert the epenthetic vowel እ[ɨ], which is not included in our grapheme-to-phoneme (G2P) conversion algorithm developed using consonant–vowel (CV) representations of Amharic graphemes. The proposed end-to-end model was trained on various Amharic subword units, namely characters, phonemes, character-based subwords, and phoneme-based subwords generated by the byte-pair-encoding (BPE) segmentation algorithm. Experimental results showed that context-dependent phoneme-based subwords tend to yield more accurate speech recognition systems than their character-based, phoneme-based, and character-based subword counterparts. Further improvement was obtained by combining the proposed phoneme-based subwords with the syllabification algorithm and the SpecAugment data augmentation technique. The word error rate (WER) was reduced by 18.38% compared with the baseline of character-based acoustic modeling with word-based recurrent neural network language modeling (RNNLM). These phoneme-based subword models are also useful for improving machine translation and speech translation tasks.
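
Byte-pair encoding builds subword units by repeatedly merging the most frequent adjacent pair of symbols. The following is a minimal sketch of learning the merge list from a word-frequency table; the toy input is hypothetical. Keys may be strings or tuples of phoneme symbols, so the same function can be applied to phoneme sequences, which is how the abstract describes the phoneme-based subwords. This sketch omits the end-of-word marker that some implementations add and breaks ties by first occurrence. Both are simplifying assumptions.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def learn_bpe(word_freqs: Dict[Sequence[str], int], num_merges: int) -> List[Tuple[str, str]]:
    """Learn up to num_merges BPE merge operations from a word-frequency table."""
    # Start with every word split into its individual symbols.
    vocab: Dict[Tuple[str, ...], int] = {}
    for word, freq in word_freqs.items():
        key = tuple(word)
        vocab[key] = vocab.get(key, 0) + freq
    merges: List[Tuple[str, str]] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.__getitem__)  # ties go to the first pair seen
        merges.append(best)
        # Replace every occurrence of the best pair with the merged symbol.
        new_vocab: Dict[Tuple[str, ...], int] = {}
        for symbols, freq in vocab.items():
            out: List[str] = []
            i = 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            key = tuple(out)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges


print(learn_bpe({"abab": 3, "ab": 2}, 3))  # → [('a', 'b'), ('ab', 'ab')]
```

The number of merges controls the trade-off. Few merges keep units close to single characters or phonemes, while many merges approach whole words and bring back the OOV problem that subwords are meant to reduce.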
