Automatic speech recognition: can you understand me?

Author(s):  
Susana Pérez Castillejo

What is it? Automatic Speech Recognition (ASR) is a digital communication method that transforms spoken discourse into written text. This rapidly evolving technology is used in email, text messaging, or live video captioning. Current ASR systems operate in conjunction with Natural Language Processing (NLP) technology to transform speech into text that people – and machines – can read. NLP refers to the methodologies and computational tools that analyze data produced in a natural language, such as English.

Author(s):  
Gregor Donaj
Mirjam Sepesy Maučec

This article presents the challenges of natural language processing applications when they are used with inflectional languages. Two typical applications are presented: automatic speech recognition and machine translation. An overview of those applications and the properties of inflectional languages is given as well as examples from the highly inflectional Slovene language. Then, an error classification with examples is given, also with an emphasis on inflectional languages, as well as some directions for further research in this area.


2022, Vol 15 (1), pp. 1-16
Author(s):  
Francisca Pessanha
Almila Akdag Salah

Computational technologies have revolutionized the archival sciences field, prompting new approaches to processing the extensive data in these collections. Automatic speech recognition and natural language processing create unique possibilities for the analysis of oral history (OH) interviews, where the transcription and analysis of the full recording would otherwise be too time-consuming. However, many oral historians note the loss of aural information when speech is converted into text, pointing out the relevance of subjective cues for a full understanding of the interviewee's narrative. In this article, we explore various computational technologies for social signal processing and their potential application space in OH archives, as well as in neighboring domains where qualitative study is a frequently used method. We also highlight the latest developments in key technologies for multimedia archiving practices, such as natural language processing and automatic speech recognition. We discuss the analysis of both visual cues (body language and facial expressions) and non-visual cues (paralinguistics, breathing, and heart rate), stating the specific challenges introduced by the characteristics of OH collections. We argue that applying social signal processing to OH archives will have a wider influence than on OH practices alone, bringing benefits for various fields from the humanities to computer sciences, as well as to archival sciences. Looking at human emotions and somatic reactions in extensive interview collections would give scholars from multiple fields the opportunity to focus on feelings, mood, culture, and subjective experiences expressed in these interviews on a larger scale.


Author(s):  
Fredrik Johansson
Lisa Kaati
Magnus Sahlgren

The ability to disseminate information instantaneously over vast geographical regions makes the Internet a key facilitator in the radicalisation process and preparations for terrorist attacks. This can be both an asset and a challenge for security agencies. One of the main challenges for security agencies is the sheer amount of information available on the Internet. It is impossible for human analysts to read through everything that is written online. In this chapter we will discuss the possibility of detecting violent extremism by identifying signs of warning behaviours in written text – what we call linguistic markers – using computers, or more specifically, natural language processing.


2020, Vol 17 (1), pp. 488-491
Author(s):  
P. Lakshmi
S. Veena
D. K. Rahul
H. Lokesha

This paper focuses on the development of a speech interface for controlling a Micro Air Vehicle (MAV). A speech interface in such control applications has two distinct modules: the Automatic Speech Recognition (ASR) module and the Natural Language Processing (NLP) module. The ASR module is developed using models built with the CMU Sphinx toolkit. The NLP scheme is proposed and developed using the Natural Language Toolkit (NLTK). Understanding the speech is very important in this kind of control application. The NLP outcome is used to invoke the Ground Control Station (GCS) commands. The results are validated in the FlightGear simulator using the Mission Planner GCS configured for the MAV.
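The two-module design described in this abstract can be illustrated with a minimal sketch of the second stage, in which text already produced by an ASR module is mapped to a command name. This is not the authors' NLP scheme: the command vocabulary, the filler-word list, and the function name `parse_command` are all hypothetical, and plain Python string handling stands in for NLTK so that the example stays self-contained.

```python
from typing import Optional

# Hypothetical command vocabulary; the paper does not list its actual commands.
COMMANDS = {
    ("take", "off"): "TAKEOFF",
    ("land",): "LAND",
    ("return", "home"): "RETURN_TO_LAUNCH",
}

# Hypothetical filler words dropped before matching.
FILLER = {"please", "the", "now", "to", "vehicle"}


def parse_command(transcript: str) -> Optional[str]:
    """Map an ASR transcript to a command name, or None if nothing matches."""
    # Lowercase, split on whitespace, and drop filler words.
    tokens = [t for t in transcript.lower().split() if t not in FILLER]
    for keywords, command in COMMANDS.items():
        n = len(keywords)
        # Look for the keyword sequence as a contiguous run of tokens.
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == keywords:
                return command
    return None
```

In a full pipeline the returned name would be passed on to whatever invokes the ground control station command; that step is outside this sketch.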


Author(s):  
Oksana Chulanova

The article discusses the capabilities of artificial intelligence technologies, including natural language processing, intelligent decision support, computer vision, speech recognition and synthesis, and promising methods of artificial intelligence. The results of the author's study and analysis of artificial intelligence technologies and their capabilities for optimizing work with staff are presented. The study allowed the author to develop a concept for integrating artificial intelligence technologies into work with personnel in the digital paradigm.

