End-to-End Speech Recognition Model Based on Deep Learning for Albanian

Author(s):  
Amarildo Rista ◽  
Arbana Kadriu
Author(s):  
Ramy Mounir ◽  
Redwan Alqasemi ◽  
Rajiv Dubey

This work focuses on enabling individuals with speech impairments to use speech-to-text software to recognize and dictate their speech. Automatic speech recognition (ASR) is a challenging problem because of the wide range of speech variability, including differences in accent, pronunciation, speed, and volume. Training an end-to-end speech recognition model on data from speakers with speech impediments is very difficult, both because sufficiently large datasets are lacking and because a speech disorder pattern is hard to generalize across all users with speech impediments. This work highlights the different deep learning techniques used to achieve ASR and how they can be modified to recognize and dictate speech from individuals with speech impediments.
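The abstract does not say which end-to-end technique is used. One widely used option for end-to-end ASR is connectionist temporal classification (CTC), in which the network outputs, for every audio frame, a probability distribution over characters plus a special blank symbol. The sketch below assumes such a CTC-style output with the blank at index 0; the helper name `ctc_greedy_decode` is hypothetical. It shows greedy decoding: take the most likely symbol per frame, merge consecutive repeats, then drop blanks.

```python
from typing import List, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Greedy CTC decoding: best symbol per frame, collapse repeats, drop blanks.

    frame_probs[t][k] is a (hypothetical) model's probability of symbol k at
    frame t; symbol k > 0 maps to alphabet[k - 1].
    """
    # Most likely symbol index for each frame.
    best_path: List[int] = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded: List[str] = []
    prev = BLANK
    for k in best_path:
        # Emit only when the symbol changes from the previous frame and is not blank.
        if k != prev and k != BLANK:
            decoded.append(alphabet[k - 1])
        prev = k
    return "".join(decoded)
```

The blank is what lets CTC emit doubled letters: with alphabet `"lo"`, the best path `l, blank, l, o` decodes to `llo`, while `l, l, o` collapses to `lo`. Greedy decoding is the simplest choice; beam search, optionally with a language model, is a common refinement.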


2020 ◽  
Vol 2020 ◽  
pp. 1-8
Author(s):  
Lejun Gong ◽  
Zhifei Zhang ◽  
Shiqi Chen

Background. Clinical named entity recognition is the basic task in mining the text of electronic medical records. The text of Chinese electronic medical records poses several challenges: it contains many compound entities, sentence components are frequently missing, and entity boundaries are unclear. Moreover, corpora of Chinese electronic medical records are difficult to obtain. Methods. To address these characteristics, this study proposed a Chinese clinical entity recognition model based on deep learning pretraining. The model used word embeddings trained on a domain corpus and fine-tuned an entity recognition model pretrained on a related corpus. BiLSTM and Transformer were then each used as feature extractors to identify four types of clinical entities (diseases, symptoms, drugs, and operations) in the text of Chinese electronic medical records. Results. On the test dataset, the model achieved a macro-precision of 75.06%, a macro-recall of 76.40%, and a macro-F1 of 75.72%. Conclusions. These experiments show that the proposed Chinese clinical entity recognition model based on deep learning pretraining can effectively improve recognition performance.
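Macro-averaged scores weight each of the four entity types equally, regardless of how often each type occurs. The following is a minimal sketch of how such scores can be computed, assuming entities are represented as hypothetical `(start, end, type)` tuples and that a prediction counts only on an exact span-and-type match; the abstract does not state its matching criterion. Conventions for macro-F1 also differ, so the sketch averages the per-type F1 values and separately exposes `f1(p, r)`.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical entity representation: (start_offset, end_offset, entity_type).
Entity = Tuple[int, int, str]
# Hypothetical labels for the four entity types named in the abstract.
TYPES = ("disease", "symptom", "drug", "operation")


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def macro_prf(gold: Iterable[Entity], pred: Iterable[Entity],
              types: Tuple[str, ...] = TYPES) -> Dict[str, float]:
    """Exact-match precision/recall/F1 per type, then an unweighted mean over types."""
    gold_set: Set[Entity] = set(gold)
    pred_set: Set[Entity] = set(pred)
    ps, rs, fs = [], [], []
    for t in types:
        g = {e for e in gold_set if e[2] == t}
        p = {e for e in pred_set if e[2] == t}
        tp = len(g & p)  # predicted spans that exactly match a gold span of this type
        prec = tp / len(p) if p else 0.0
        rec = tp / len(g) if g else 0.0
        ps.append(prec)
        rs.append(rec)
        fs.append(f1(prec, rec))
    n = len(types)
    return {"macro_p": sum(ps) / n, "macro_r": sum(rs) / n, "macro_f1": sum(fs) / n}
```

As an arithmetic check, `f1(75.06, 76.40)` is about 75.72, so the reported macro-F1 equals the harmonic mean of the reported macro-precision and macro-recall.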


Author(s):  
Yun Jiang ◽  
Junyu Zhuo ◽  
Juan Zhang ◽  
Xiao Xiao

As deep learning attracts extensive attention and research, the convolutional restricted Boltzmann machine (CRBM), a model based on the restricted Boltzmann machine (RBM), is widely used in image recognition, speech recognition, and other tasks. However, time-consuming training remains a non-negligible issue. To address this, the paper optimizes a parallel CRBM on Spark: it proposes a Spark-based parallel contrastive divergence algorithm and uses it to train the CRBM model faster. Experiments show that the method is faster than the traditional sequential algorithm. The authors train the CRBM with this method and apply it to breast X-ray image classification, where it improves both precision and training speed compared with the traditional algorithm.
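The abstract names contrastive divergence (CD) as the training algorithm it parallelizes but gives no details. For orientation, the following NumPy sketch shows a single CD-1 step for an ordinary fully connected Bernoulli RBM. It is not the paper's convolutional or Spark-based variant; the function name `cd1_update`, the learning rate, and the use of reconstruction probabilities in the negative phase are illustrative assumptions.

```python
import numpy as np


def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cd1_update(v0, W, b, c, lr=0.1, rng=None):
    """One CD-1 step for a Bernoulli RBM on a mini-batch v0 (batch x visible).

    W: visible x hidden weights, b: visible biases, c: hidden biases.
    Returns the updated (W, b, c) and the mean squared reconstruction error.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Positive phase: hidden probabilities given the data, then a binary sample.
    ph0 = _sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(v0.dtype)

    # Negative phase: one Gibbs step down to the visible layer and back up.
    pv1 = _sigmoid(h0 @ W.T + b)
    ph1 = _sigmoid(pv1 @ W + c)

    # Gradient estimate: data statistics minus one-step reconstruction statistics.
    n = v0.shape[0]
    W = W + lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    b = b + lr * (v0 - pv1).mean(axis=0)
    c = c + lr * (ph0 - ph1).mean(axis=0)
    return W, b, c, float(((v0 - pv1) ** 2).mean())
```

Because each update averages statistics over the examples in a mini-batch, those sums can in principle be computed on separate data partitions and combined, which is the kind of structure a data-parallel implementation on a cluster can exploit; the abstract does not state how the paper distributes the work.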


Electronics ◽  
2019 ◽  
Vol 8 (12) ◽  
pp. 1461 ◽  
Author(s):  
Taeheum Cho ◽  
Unang Sunarya ◽  
Minsoo Yeo ◽  
Bosun Hwang ◽  
Yong Seo Koo ◽  
...  

Sleep scoring is the first step in diagnosing sleep disorders. A variety of chronic diseases related to sleep disorders could be identified using sleep-state estimation. This paper presents an end-to-end deep learning architecture for wrist actigraphy, called Deep-ACTINet, for automatic sleep-wake detection that uses only noise-canceled raw activity signals recorded during sleep, without any feature engineering. As a benchmark, the proposed Deep-ACTINet is compared with two conventional fixed-model-based sleep-wake scoring algorithms and four feature-engineering-based machine learning algorithms. The datasets were recorded from 10 subjects using three-axis accelerometer wristband sensors for eight hours in bed. The sleep recordings were analyzed using Deep-ACTINet and the conventional approaches, and the proposed end-to-end deep learning model achieved the highest average accuracy (89.65%), recall (92.99%), and precision (92.09%). These values were approximately 4.74% and 4.05% higher than those of the traditional model-based and feature-based machine learning algorithms, respectively. In addition, the neuron outputs of Deep-ACTINet contained the most significant information for separating the asleep and awake states, as demonstrated by their high correlations with conventional significant features. Deep-ACTINet was designed as a general model and thus has the potential to replace the actigraphy algorithms currently used in wristband wearable devices.
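The abstract reports accuracy, recall, and precision "on average" over 10 subjects but does not say which class is treated as positive or how the averaging is done. A minimal sketch, assuming epoch-by-epoch scoring, sleep as the positive class, and a plain mean of per-subject scores (all assumptions; the helper names are hypothetical):

```python
from statistics import mean
from typing import Dict, Sequence, Tuple

SLEEP, WAKE = 1, 0  # assumption: sleep epochs are the positive class


def epoch_metrics(truth: Sequence[int], pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, recall and precision for one recording, scored epoch by epoch."""
    if len(truth) != len(pred) or not truth:
        raise ValueError("truth and pred must be non-empty and equally long")
    pairs = list(zip(truth, pred))
    tp = sum(t == SLEEP and p == SLEEP for t, p in pairs)  # sleep scored as sleep
    fp = sum(t == WAKE and p == SLEEP for t, p in pairs)   # wake scored as sleep
    fn = sum(t == SLEEP and p == WAKE for t, p in pairs)   # sleep scored as wake
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


def average_over_subjects(
    recordings: Sequence[Tuple[Sequence[int], Sequence[int]]]
) -> Dict[str, float]:
    """Unweighted mean of each metric across per-subject recordings."""
    per_subject = [epoch_metrics(t, p) for t, p in recordings]
    return {k: mean(m[k] for m in per_subject) for k in per_subject[0]}
```

Averaging per subject weights every subject equally; pooling all epochs before scoring would instead weight subjects by recording length, and the two can give different numbers.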

