Automatic Call Sign Detection: Matching Air Surveillance Data with Air Traffic Spoken Communications

Proceedings ◽  
2020 ◽  
Vol 59 (1) ◽  
pp. 14
Author(s):  
Juan Zuluaga-Gomez ◽  
Karel Veselý ◽  
Alexander Blatt ◽  
Petr Motlicek ◽  
Dietrich Klakow ◽  
...  

Voice communication is the main channel for exchanging information between pilots and Air-Traffic Controllers (ATCos). Recently, several projects have explored the use of speech recognition technology to automatically extract spoken key information such as call signs, commands, and values, which can be used to reduce ATCos’ workload and increase performance and safety in Air-Traffic Control (ATC)-related activities. Nevertheless, collecting ATC speech data is demanding, expensive, and constrained by the intrinsic characteristics of the recorded speakers. As a solution, this paper presents ATCO2, a project that aims to develop a unique platform to collect, organize, and pre-process ATC data gathered from the airspace. Initially, the data are captured directly from publicly accessible radio frequency channels with VHF receivers and LiveATC, which can be considered an “unlimited” source of low-quality data. The ATCO2 project explores the use of context information such as radar and air surveillance data (collected with ADS-B and Mode S) from the OpenSky Network (OSN) to correlate call signs automatically extracted from voice communication with those available from ADS-B channels, thereby increasing the overall call sign detection rate. More specifically, the timestamp and location of the spoken command (issued by voice by the ATCo) are extracted, and a query is sent to the OSN server to retrieve the call sign tags, in ICAO format, of the airplanes in the given area. A word sequence produced by an automatic speech recognition system is then fed into a Natural Language Processing (NLP) based module together with the set of call signs available from the ADS-B channels, and the NLP module extracts the call sign, command, and command arguments from the spoken utterance.
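The correlation step can be pictured with a short sketch. The snippet below queries the OpenSky Network's public /states/all REST endpoint for call signs currently broadcast in a bounding box and ranks them against an ASR hypothesis; the endpoint and JSON layout are OpenSky's documented API, but the spoken-digit normalization, difflib scoring, and bounding-box coordinates are illustrative stand-ins for the ATCO2 pipeline, not the project's actual NLP module.

    # Hypothetical sketch: retrieve call signs active in an airspace sector from
    # the OpenSky Network REST API and pick the one closest to an ASR hypothesis.
    import difflib
    import requests

    SPOKEN_DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3",
                     "four": "4", "five": "5", "six": "6", "seven": "7",
                     "eight": "8", "nine": "9", "niner": "9"}

    def callsigns_in_area(lamin, lomin, lamax, lomax):
        """Return ICAO call signs currently broadcast via ADS-B in a bounding box."""
        resp = requests.get("https://opensky-network.org/api/states/all",
                            params={"lamin": lamin, "lomin": lomin,
                                    "lamax": lamax, "lomax": lomax},
                            timeout=10)
        resp.raise_for_status()
        states = resp.json().get("states") or []
        # Field 1 of each OpenSky state vector is the (space-padded) call sign.
        return {s[1].strip() for s in states if s[1] and s[1].strip()}

    def best_callsign(asr_text, candidates):
        """Rank ADS-B call signs by similarity to the start of the ASR hypothesis."""
        tokens = [SPOKEN_DIGITS.get(t, t) for t in asr_text.lower().split()]
        collapsed = "".join(tokens).upper()   # "swiss four seven two ..." -> "SWISS472..."
        def score(c):
            prefix = collapsed[:len(c) + 3]   # the call sign is spoken first
            return difflib.SequenceMatcher(None, c, prefix).ratio()
        return max(candidates, key=score, default=None)

    if __name__ == "__main__":
        # Rough bounding box around Zurich (invented coordinates for illustration).
        active = callsigns_in_area(47.2, 8.2, 47.7, 8.8)
        print(best_callsign("swiss four seven two descend flight level eight zero",
                            active))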

Author(s):  
Zhe Sun ◽  
Pingbo Tang

Losses of separation (LoS) are breaches of regulations that specify the minimum distance between aircraft in controlled airspace. Erroneous communications between air traffic controllers (ATCs) and pilots are leading contributors to LoS and result in an elevated risk of fatal accidents. An air traffic control system that could identify communication errors promptly would therefore be advantageous. Establishing such a system requires a systematic characterization of communication errors that reveals how various communication arrangements and errors influence the development of LoS. Such know-how could guide ATCs and pilots in identifying the parts of their communication processes and content that most influence the occurrence of LoS. Existing studies of LoS focus on simulating aircraft operation processes, with little quantitative analysis of how communication issues arise and lead to elevated LoS risk. This paper presents a method for supporting automatic communication error detection through the integrated use of speech recognition, text analysis, and formal modeling of airport operational processes. The proposed method focuses on identifying communication features to guide the detection of vulnerable communications; characterizing communication errors; and building Bayesian Network models that predict communication errors and LoS from features derived from ATC–pilot communications. Major findings show that incorrect read-backs by pilots are highly correlated with a majority of LoS. The results indicate that the proposed method could form a basis for automating communication error detection and preventing LoS. The integrated Automatic Speech Recognition and Natural Language Processing functions may be incorporated into existing aviation applications for real-time ATC–pilot communication monitoring and preventive LoS control.
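As a rough illustration of the Bayesian Network component, the sketch below wires two binary communication features into a LoS node and queries the posterior LoS risk given an incorrect read-back. The network structure and every probability are invented placeholders, not the paper's learned parameters, and pgmpy is one library choice among many.

    # Hypothetical sketch of a Bayesian Network predicting loss-of-separation
    # (LoS) risk from communication features. All numbers are illustrative.
    from pgmpy.models import BayesianNetwork
    from pgmpy.factors.discrete import TabularCPD
    from pgmpy.inference import VariableElimination

    model = BayesianNetwork([("ReadbackError", "LoS"), ("Congestion", "LoS")])

    cpd_rb = TabularCPD("ReadbackError", 2, [[0.9], [0.1]])   # P(read-back error) = 0.1
    cpd_cg = TabularCPD("Congestion", 2, [[0.7], [0.3]])      # P(busy sector) = 0.3
    cpd_los = TabularCPD(
        "LoS", 2,
        # Columns: (rb=0,cg=0), (rb=0,cg=1), (rb=1,cg=0), (rb=1,cg=1)
        [[0.995, 0.98, 0.90, 0.75],   # P(LoS = no | parents)
         [0.005, 0.02, 0.10, 0.25]],  # P(LoS = yes | parents)
        evidence=["ReadbackError", "Congestion"],
        evidence_card=[2, 2],
    )
    model.add_cpds(cpd_rb, cpd_cg, cpd_los)
    assert model.check_model()

    infer = VariableElimination(model)
    # Posterior LoS distribution after observing an incorrect read-back.
    print(infer.query(["LoS"], evidence={"ReadbackError": 1}))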


2021 ◽  
Vol 7 ◽  
pp. 205520762110021
Author(s):  
Catherine Diaz-Asper ◽  
Chelsea Chandler ◽  
R Scott Turner ◽  
Brigid Reynolds ◽  
Brita Elvevåg

Objective: There is a critical need to develop rapid, inexpensive, and easily accessible screening tools for mild cognitive impairment (MCI) and Alzheimer’s disease (AD). We report on the efficacy of collecting speech via the telephone and leveraging natural language processing methods to develop sensitive metrics that may serve as potential biomarkers. Methods: Ninety-one older individuals who were cognitively unimpaired or diagnosed with MCI or AD participated from home in an audio-recorded telephone interview, which included a standard cognitive screening tool and the collection of speech samples. In this paper we address six questions of interest: (1) Will elderly people agree to participate in a recorded telephone interview? (2) Will they complete it? (3) Will they judge it an acceptable approach? (4) Will the speech collected over the telephone be of good quality? (5) Will the speech be intelligible to human raters? (6) Will transcriptions produced by automated speech recognition accurately reflect the speech produced? Results: Participants readily agreed to participate in the telephone interview, completed it in its entirety, and rated the approach as acceptable. The recorded speech was of sufficient quality for further analyses, and almost all recorded words were intelligible for human transcription. Not surprisingly, human transcription outperformed off-the-shelf automated speech recognition software, but further investigation suggests that automated speech recognition holds promise for future use. Conclusion: Our findings demonstrate that collecting speech samples from elderly individuals via the telephone is well tolerated, practical, and inexpensive, and produces good-quality data for uses such as natural language processing.
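A word-level accuracy metric of the kind implied by question (6) takes only a few lines to compute. The sketch below scores an ASR hypothesis against a human reference transcript using word error rate (WER); the jiwer package is one convenient implementation, and the sample sentences are invented for illustration.

    # Minimal sketch: word error rate of an ASR hypothesis vs. a human transcript.
    from jiwer import wer

    reference = "she walked to the store and bought some bread"    # human transcript
    hypothesis = "she walked to the store and brought some bread"  # ASR output

    error_rate = wer(reference, hypothesis)
    print(f"WER: {error_rate:.2%}")  # one substitution over nine words, about 11%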


Author(s):  
Sandeep Badrinath ◽  
Hamsa Balakrishnan

A significant fraction of the communication between air traffic controllers and pilots takes place through speech, via radio channels. Automatic transcription of air traffic control (ATC) communications has the potential to improve system safety, operational performance, and conformance monitoring, and to enhance air traffic controller training. We present an automatic speech recognition model tailored to the ATC domain that can transcribe ATC voice to text. The transcribed text is used to extract operational information such as call signs and runway numbers. The models are based on recent improvements in machine learning techniques for speech recognition and natural language processing. We evaluate the performance of the model on diverse datasets.
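As a toy illustration of the extraction step (not the authors' learned models), the sketch below pulls a call sign and a runway number out of a transcript with hand-written patterns; the airline-word list and regular expressions are invented for the example.

    # Hypothetical sketch: rule-based extraction of a call sign and runway number
    # from an ATC transcript. Real systems learn this; these regexes are a toy.
    import re

    DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
              "five": "5", "six": "6", "seven": "7", "eight": "8",
              "nine": "9", "niner": "9"}

    def words_to_digits(text):
        """Replace spoken digit words with numerals."""
        return " ".join(DIGITS.get(w, w) for w in text.lower().split())

    def extract(transcript):
        t = words_to_digits(transcript)
        # Call sign: an airline telephony word followed by digits, e.g. "delta 4 6 2".
        cs = re.search(r"\b(delta|united|speedbird|lufthansa)\s((?:\d\s?)+)", t)
        # Runway: "runway" plus two digits and an optional left/right/center suffix.
        rw = re.search(r"\brunway\s(\d)\s?(\d)\s?(left|right|center)?", t)
        callsign = (cs.group(1) + cs.group(2).replace(" ", "")) if cs else None
        runway = (rw.group(1) + rw.group(2) +
                  {"left": "L", "right": "R", "center": "C"}.get(rw.group(3), "")
                  ) if rw else None
        return callsign, runway

    print(extract("delta four six two cleared to land runway two two left"))
    # -> ('delta462', '22L')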


2021 ◽  
Vol 11 (1) ◽  
pp. 428
Author(s):  
Donghoon Oh ◽  
Jeong-Sik Park ◽  
Ji-Hwan Kim ◽  
Gil-Jin Jang

Speech recognition consists of converting input sound into a sequence of phonemes and then finding text for the input using language models. Phoneme classification performance is therefore a critical factor for the successful implementation of a speech recognition system. However, correctly distinguishing phonemes with similar characteristics remains a challenging problem even for state-of-the-art classification methods, and classification errors are hard to recover from in the subsequent language-processing steps. This paper proposes a hierarchical phoneme clustering method that applies recognition models better suited to different phonemes. The phonemes of the TIMIT database are carefully analyzed using a confusion matrix from a baseline speech recognition model. Using the automatic phoneme clustering results, a set of phoneme classification models optimized for the generated phoneme groups is constructed and integrated into a hierarchical phoneme classification method. In a number of phoneme classification experiments, the proposed hierarchical phoneme group models improved performance over the baseline by 3%, 2.1%, 6.0%, and 2.2% for fricative, affricate, stop, and nasal sounds, respectively. The average accuracy was 69.5% for the baseline and 71.7% for the proposed hierarchical models, a 2.2% overall improvement.
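The clustering step can be sketched as follows: treat mutual confusion between phonemes as similarity and cluster agglomeratively. The 4-phoneme confusion matrix below is a toy example, not TIMIT statistics, and the SciPy calls stand in for whatever clustering procedure the authors actually used.

    # Sketch: derive phoneme groups from a baseline model's confusion matrix by
    # treating mutual confusion as similarity, then clustering agglomeratively.
    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage
    from scipy.spatial.distance import squareform

    phonemes = ["s", "z", "p", "b"]
    # confusion[i, j] = how often phoneme i was recognized as phoneme j (toy counts).
    confusion = np.array([[90,  8,  1,  1],
                          [10, 85,  2,  3],
                          [ 1,  2, 88,  9],
                          [ 2,  1, 12, 85]], dtype=float)

    rates = confusion / confusion.sum(axis=1, keepdims=True)  # row-normalize
    similarity = (rates + rates.T) / 2     # symmetric mutual-confusion rate
    distance = 1.0 - similarity
    np.fill_diagonal(distance, 0.0)

    # Agglomerative clustering; cut the dendrogram into two phoneme groups.
    groups = fcluster(linkage(squareform(distance), method="average"),
                      t=2, criterion="maxclust")
    for p, g in zip(phonemes, groups):
        print(p, "-> group", g)   # {s, z} and {p, b} end up in separate groups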


2016 ◽  
Vol 150 (4) ◽  
pp. S61
Author(s):  
Jennifer Nayor ◽  
Sergey Goryachev ◽  
Vivian S. Gainer ◽  
John R. Saltzman
