automatic speech recognition Latest Research Papers

Improving Deep Learning based Automatic Speech Recognition for Gujarati

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3483446 ◽

2022 ◽

Vol 21 (3) ◽

pp. 1-18

Author(s):

Deepang Raval ◽

Vyom Pathak ◽

Muktan Patel ◽

Brijesh Bhatt

Keyword(s):

Deep Learning ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Short Term Memory ◽

Language Model ◽

Recognition System ◽

Processing Technique ◽

Speech Corpus ◽

Novel Approach ◽

Asr System

We present a novel approach for improving the performance of an End-to-End speech recognition system for the Gujarati language. We follow a deep learning-based approach that includes Convolutional Neural Network, Bi-directional Long Short Term Memory layers, Dense layers, and Connectionist Temporal Classification as a loss function. To improve the performance of the system with the limited size of the dataset, we present a combined language model (Word-level language Model and Character-level language model)-based prefix decoding technique and Bidirectional Encoder Representations from Transformers-based post-processing technique. To gain key insights from our Automatic Speech Recognition (ASR) system, we used the inferences from the system and proposed different analysis methods. These insights help us in understanding and improving the ASR system as well as provide intuition into the language used for the ASR system. We have trained the model on the Microsoft Speech Corpus, and we observe a 5.87% decrease in Word Error Rate (WER) with respect to base-model WER.

Download Full-text

A Computational Look at Oral History Archives

Journal on Computing and Cultural Heritage ◽

10.1145/3477605 ◽

2022 ◽

Vol 15 (1) ◽

pp. 1-16

Author(s):

Francisca Pessanha ◽

Almila Akdag Salah

Keyword(s):

Signal Processing ◽

Natural Language Processing ◽

Speech Recognition ◽

Natural Language ◽

Oral History ◽

Language Processing ◽

Automatic Speech Recognition ◽

Visual Cues ◽

Social Signal Processing ◽

Social Signal

Computational technologies have revolutionized the archival sciences field, prompting new approaches to process the extensive data in these collections. Automatic speech recognition and natural language processing create unique possibilities for analysis of oral history (OH) interviews, where otherwise the transcription and analysis of the full recording would be too time consuming. However, many oral historians note the loss of aural information when converting the speech into text, pointing out the relevance of subjective cues for a full understanding of the interviewee narrative. In this article, we explore various computational technologies for social signal processing and their potential application space in OH archives, as well as neighboring domains where qualitative studies is a frequently used method. We also highlight the latest developments in key technologies for multimedia archiving practices such as natural language processing and automatic speech recognition. We discuss the analysis of both visual (body language and facial expressions), and non-visual cues (paralinguistics, breathing, and heart rate), stating the specific challenges introduced by the characteristics of OH collections. We argue that applying social signal processing to OH archives will have a wider influence than solely OH practices, bringing benefits for various fields from humanities to computer sciences, as well as to archival sciences. Looking at human emotions and somatic reactions on extensive interview collections would give scholars from multiple fields the opportunity to focus on feelings, mood, culture, and subjective experiences expressed in these interviews on a larger scale.

Download Full-text

Adapting Automatic Speech Recognition to the Radiology Domain for a Less-Resourced Language: The Case of Latvian

Intelligent Sustainable Systems - Lecture Notes in Networks and Systems ◽

10.1007/978-981-16-6309-3_27 ◽

2022 ◽

pp. 267-276

Author(s):

Normunds Gruzitis ◽

Roberts Dargis ◽

Viesturs Julijs Lasmanis ◽

Ginta Garkaje ◽

Didzis Gosko

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition

Download Full-text

Managing Transcription Data for Automatic Speech Recognition with Elpis

10.7551/mitpress/12200.003.0041 ◽

2022 ◽

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Transcription Data

Download Full-text

Voice Versus Keyboard and Mouse for Text Creation on Arabic User Interfaces

The International Arab Journal of Information Technology ◽

10.34028/iajit/19/1/15 ◽

2022 ◽

Author(s):

Khalid Majrashi

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

User Interfaces ◽

Voice Recognition ◽

User Interaction ◽

Learning Curves ◽

User Performance ◽

Voice User

Voice User Interfaces (VUIs) are increasingly popular owing to improvements in automatic speech recognition. However, the understanding of user interaction with VUIs, particularly Arabic VUIs, remains limited. Hence, this research compared user performance, learnability, and satisfaction when using voice and keyboard-and-mouse input modalities for text creation on Arabic user interfaces. A Voice-enabled Email Interface (VEI) and a Traditional Email Interface (TEI) were developed. Forty participants attempted pre-prepared and self-generated message creation tasks using voice on the VEI, and the keyboard-and-mouse modal on the TEI. The results showed that participants were faster (by 1.76 to 2.67 minutes) in pre-prepared message creation using voice than using the keyboard and mouse. Participants were also faster (by 1.72 to 2.49 minutes) in self-generated message creation using voice than using the keyboard and mouse. Although the learning curves were more efficient with the VEI, more participants were satisfied with the TEI. With the VEI, participants reported problems, such as misrecognitions and misspellings, but were satisfied about the visibility of possible executable commands and about the overall accuracy of voice recognition.

Download Full-text

Automation in the Intralingual Subtitling Process

Journal of Audiovisual Translation ◽

10.47476/jat.v4i3.2021.197 ◽

2021 ◽

Vol 4 (3) ◽

Author(s):

Kaisa Vitikainen ◽

Maarit Koponen

Keyword(s):

Speech Recognition ◽

Language Learning ◽

Automatic Speech Recognition ◽

Hard Of Hearing ◽

Target Audience ◽

Limited Time ◽

Video Content ◽

Video Clips ◽

Short Video ◽

Fully Automatic

The demand for intralingual subtitles for television and video content is increasing. In Finland, major broadcasting companies are required to provide intralingual subtitles for all or a portion of their programming in Finnish and Swedish, excluding certain live events. To meet this need, technology could offer solutions in the form of automatic speech recognition and subtitle generation. Although fully automatic subtitles may not be of sufficient quality to be accepted by the target audience, they can be a useful tool for the subtitler. This article presents research conducted as part of the MeMAD project, where automatically generated subtitles for Finnish were tested in professional workflows with four subtitlers. We discuss observations regarding the effect of automation on productivity based on experiments where participants subtitled short video clips from scratch, by respeaking and by post-editing automatically generated subtitles, as well as the subtitlers’ experience based on feedback collected with questionnaires and interviews. Lay summary This article discusses how technology can help create subtitles for television programmes and videos. Subtitles in the same language as the content help the Deaf and the hard-of-hearing to access television programmes and videos. They are also useful for example for language learning or watching videos in noisy places. Demand for subtitles is growing and many countries also have laws that demand same-language subtitles. For example, major broadcasters in Finland must offer same-language subtitles for some programmes in Finnish and Swedish. However, broadcasters usually have limited time and money for subtitling. One useful tool could be speech recognition technology, which automatically converts speech to text. Subtitles made with speech recognition alone are not good enough yet, and need to be edited. We used speech recognition to automatically produce same-language subtitles in Finnish. Four professional subtitlers edited them to create subtitles for short videos. We measured the time and the number of keystrokes they needed for this task and compared whether this made subtitling faster. We also asked how the participants felt about using automatic subtitles in their work. This study shows that speech recognition can be a useful tool for subtitlers, but the quality and usability of technology are important.

Download Full-text

Automatic Speech Recognition in Noise for Parkinson's Disease: A Pilot Study

Frontiers in Artificial Intelligence ◽

10.3389/frai.2021.809321 ◽

2021 ◽

Vol 4 ◽

Author(s):

Alireza Goudarzi ◽

Gemma Moya-Galé

Keyword(s):

Parkinson’S Disease ◽

Parkinson's Disease ◽

Speech Recognition ◽

Pilot Study ◽

Automatic Speech Recognition ◽

Speech Intelligibility ◽

Real Life ◽

The Past ◽

Noisy Signals ◽

Speech Recognition In Noise

The sophistication of artificial intelligence (AI) technologies has significantly advanced in the past decade. However, the observed unpredictability and variability of AI behavior in noisy signals is still underexplored and represents a challenge when trying to generalize AI behavior to real-life environments, especially for people with a speech disorder, who already experience reduced speech intelligibility. In the context of developing assistive technology for people with Parkinson's disease using automatic speech recognition (ASR), this pilot study reports on the performance of Google Cloud speech-to-text technology with dysarthric and healthy speech in the presence of multi-talker babble noise at different intensity levels. Despite sensitivities and shortcomings, it is possible to control the performance of these systems with current tools in order to measure speech intelligibility in real-life conditions.

Download Full-text

Revisiting the Boundary between ASR and NLU in the Age of Conversational Dialog Systems

Computational Linguistics ◽

10.1162/coli_a_00430 ◽

2021 ◽

pp. 1-12

Author(s):

Manaal Faruqui ◽

Dilek Hakkani-Tür

Keyword(s):

Speech Recognition ◽

Natural Language ◽

Automatic Speech Recognition ◽

Daily Life ◽

Natural Language Understanding ◽

Speech Understanding ◽

Language Understanding ◽

Dialog Systems ◽

Research Areas ◽

The World

Abstract As more users across the world are interacting with dialog agents in their daily life, there is a need for better speech understanding that calls for renewed attention to the dynamics between research in automatic speech recognition (ASR) and natural language understanding (NLU). We briefly review these research areas and lay out the current relationship between them. In light of the observations we make in this paper, we argue that (1) NLU should be cognizant of the presence of ASR models being used upstream in a dialog system’s pipeline, (2) ASR should be able to learn from errors found in NLU, (3) there is a need for end-to-end datasets that provide semantic annotations on spoken input, (4) there should be stronger collaboration between ASR and NLU research communities.

Download Full-text

QVR: Quranic Verses Recitation Recognition System using PocketSphinx

Journal of Quranic Sciences and Research ◽

10.30880/jqsr.2021.02.02.004 ◽

2021 ◽

Vol 02 (02) ◽

Author(s):

Hasan Ali Gamal Al-Kaf ◽

◽

Muhammad Suhaimi Sulong ◽

Ariffuddin Joret ◽

Nuramin Fitri Aminuddin ◽

...

Keyword(s):

Automatic Speech Recognition ◽

Graphical User Interface ◽

Visual Basic ◽

Recognition System ◽

Training Data ◽

Word Error Rate ◽

Application System ◽

Testing Data ◽

Engine System ◽

User Friendly

The recitation of Quran verses according to the actual tajweed is obligatory and it must be accurate and precise in pronunciation. Hence, it should always be reviewed by an expert on the recitation of the Quran. Through the latest technology, this recitation review can be implemented through an application system and it is most appropriate in this current Covid-19 pandemic situation where system application online is deemed to be developed. In this empirical study, a recognition system so-called the Quranic Verse Recitation Recognition (QVR) system using PocketSphinx to convert the Quranic verse from Arabic sound to Roman text and determine the accuracy of reciters, has been developed. The Graphical User Interface (GUI) of the system with a user-friendly environment was designed using Microsoft Visual Basic 6 in an Ubuntu platform. A verse of surah al-Ikhlas has been chosen in this study and the data were collected by recording 855 audios as training data recorded by professional reciters. Another 105 audios were collected as testing data, to test the accuracy of the system. The results indicate that the system obtained a 100% accuracy with a 0.00% of word error rate (WER) for both training and testing data of the said audios via Quran Roman text. The system with automatic speech recognition (ASR) engine system demonstrates that it has been successfully designed and developed, and is significant to be extended further. Added, it will be improved with the addition of other Quran surahs.

Download Full-text

REALIZATION OF ONLINE SYSTEMS FOR AUTOMATIC SPEECH RECOGNITION

NEWS OF THE NATIONAL ACADEMY OF SCIENCES OF THE REPUBLIC OF KAZAKHSTAN ◽

10.32014/2021.2518-1726.103 ◽

2021 ◽

Vol 6 (340) ◽

pp. 66-72

Author(s):

О.Zh. Mamyrbayev ◽

D.O. Oralbekova ◽

K. Alimhan ◽

M. Othman ◽

B. Zhumazhanov

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

Online Systems

Download Full-text

automatic speech recognition
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

Improving Deep Learning based Automatic Speech Recognition for Gujarati

A Computational Look at Oral History Archives

Adapting Automatic Speech Recognition to the Radiology Domain for a Less-Resourced Language: The Case of Latvian

Managing Transcription Data for Automatic Speech Recognition with Elpis

Voice Versus Keyboard and Mouse for Text Creation on Arabic User Interfaces

Automation in the Intralingual Subtitling Process

Automatic Speech Recognition in Noise for Parkinson's Disease: A Pilot Study

Revisiting the Boundary between ASR and NLU in the Age of Conversational Dialog Systems

QVR: Quranic Verses Recitation Recognition System using PocketSphinx

REALIZATION OF ONLINE SYSTEMS FOR AUTOMATIC SPEECH RECOGNITION

Export Citation Format

automatic speech recognitionRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

Improving Deep Learning based Automatic Speech Recognition for Gujarati

A Computational Look at Oral History Archives

Adapting Automatic Speech Recognition to the Radiology Domain for a Less-Resourced Language: The Case of Latvian

Managing Transcription Data for Automatic Speech Recognition with Elpis

Voice Versus Keyboard and Mouse for Text Creation on Arabic User Interfaces

Automation in the Intralingual Subtitling Process

Automatic Speech Recognition in Noise for Parkinson's Disease: A Pilot Study

Revisiting the Boundary between ASR and NLU in the Age of Conversational Dialog Systems

QVR: Quranic Verses Recitation Recognition System using PocketSphinx

REALIZATION OF ONLINE SYSTEMS FOR AUTOMATIC SPEECH RECOGNITION

automatic speech recognition
Recently Published Documents