Session 11: Some Research Projects: Application of Automatic Speech Recognition to Parcel Sorting

Author(s):  
R. W. A. Scarr ◽  
W. Bezdel

The ‘state of the art’ in speech recognition is reviewed with particular reference to the kinds of problems likely to arise in a parcel sorting environment. Speech recognition equipment developed by the authors is described. To justify speech recognition equipment for parcel sorting, it must be shown to increase productivity. Simulations relevant to voice control of parcel sorting have been carried out to assess what this improvement might be, and the results are discussed.

Tradterm ◽  
2018 ◽  
Vol 32 ◽  
pp. 9-31
Author(s):  
Luis Eduardo Schild Ortiz ◽  
Patrizia Cavallo

In recent years, several studies have indicated that interpreters resist adopting new technologies. Yet such technologies have enabled the development of several tools to help these professionals. In this paper, using bibliographic and documentary research, we briefly analyse the tools cited by several authors to identify which remain up to date and available on the market. We then present concepts related to automation and examine the use of automatic speech recognition (ASR), analysing its potential benefits and the current maturity of the approach, especially with regard to Computer-Assisted Interpreting (CAI) tools. The goal of this paper is to offer the community of interpreters and researchers a view of the state of the art in technology for interpreting, as well as some future perspectives for the area.


Author(s):  
Danny Henry Galatang ◽  
Suyanto Suyanto
Syllable-based automatic speech recognition (ASR) systems commonly perform better than phoneme-based ones. This paper focuses on developing an Indonesian monosyllable-based ASR (MSASR) system using an ASR engine called SPRAAK and comparing it to a phoneme-based one. The Mozilla DeepSpeech-based end-to-end ASR (MDS-E2EASR), one of the state-of-the-art character-based models (similar to the phoneme-based model), is also investigated to confirm the result, and a novel Kaituoxu SpeechTransformer (KST) E2EASR is examined as well. Testing on an Indonesian speech corpus of 5,439 words shows that the proposed MSASR produces much higher word accuracy (76.57%) than the monophone-based one (63.36%). Its performance is comparable to the character-based MDS-E2EASR (76.90%) and the character-based KST-E2EASR (78.00%). In the future, this monosyllable-based ASR could be extended to a bisyllable-based one to achieve higher word accuracy; however, the resulting large set of bisyllable acoustic models would have to be handled with an advanced method.
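The word-accuracy figures quoted above are conventionally derived from the Levenshtein alignment between reference and hypothesis transcripts (word accuracy = 1 − word error rate). A minimal sketch of that computation, assuming whitespace-tokenized transcripts; this is illustrative only, not the authors' evaluation code:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[m][n]

def word_accuracy(ref_sentence, hyp_sentence):
    """1 - WER, with WER = edit distance / reference length."""
    ref = ref_sentence.split()
    hyp = hyp_sentence.split()
    return 1.0 - edit_distance(ref, hyp) / len(ref)
```

For example, `word_accuracy("a b c d", "a b x d")` gives 0.75: one substitution against a four-word reference.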


Author(s):  
Andrew Rosenberg ◽  
Mark Hasegawa-Johnson

Automatic prosody labelling is a useful front-end for automatic speech recognition, for automatic speech understanding, and for the development of corpora used to create speech synthesizers. Automatic labelling of prosody has also proven to be quite useful in the linguistic analysis of new speaking styles in a known language. This chapter provides a survey of the state-of-the-art best practices and open questions in the automatic labelling of prosodic information and its assessment. It describes the major prosodic inventories that are used in prosody labelling. It then discusses the relevance of acoustics and syntax in automatic labelling. A brief description of AuToBI, a tool that performs automatic ToBI labelling of US English, is provided. The chapter concludes by discussing methods of evaluating automatic prosody labelling.


Author(s):  
Alexandru-Lucian Georgescu ◽  
Alessandro Pappalardo ◽  
Horia Cucu ◽  
Michaela Blott

Abstract: The last decade brought significant advances in automatic speech recognition (ASR) thanks to the evolution of deep learning methods. ASR systems evolved from pipeline-based systems, which modelled hand-crafted speech features with probabilistic frameworks and generated phone posteriors, to end-to-end (E2E) systems, which translate the raw waveform directly into words using a single deep neural network (DNN). Transcription accuracy greatly increased, leading to ASR technology being integrated into many commercial applications. However, few of the existing ASR technologies are suitable for integration in embedded applications, due to their hard constraints on computing power and memory usage. This overview paper serves as a guided tour through the recent literature on speech recognition and compares the most popular ASR implementations. The comparison emphasizes the trade-off between ASR performance and hardware requirements, to help decision makers choose the system that best fits their embedded application. To the best of our knowledge, this is the first study to provide this kind of trade-off analysis for state-of-the-art ASR systems.


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Asmaa El Hannani ◽  
Rahhal Errattahi ◽  
Fatima Zahra Salmam ◽  
Thomas Hain ◽  
Hassan Ouahmane

Abstract: Speech-based human-machine interaction and natural language understanding applications have seen rapid development and wide adoption over the last few decades. This has led to a proliferation of studies investigating error detection and classification in Automatic Speech Recognition (ASR) systems. However, different data sets and evaluation protocols are used, making direct comparisons of the proposed approaches (e.g. features and models) difficult. In this paper we perform an extensive evaluation of the effectiveness and efficiency of state-of-the-art approaches in a unified framework for both error detection and error type classification. We make three primary contributions: (1) we compare our Variant Recurrent Neural Network (V-RNN) model with three other state-of-the-art neural models and show that the V-RNN model is the most effective classifier for ASR error detection in terms of accuracy and speed; (2) we compare four feature settings, corresponding to different categories of predictor features, and show that the generic features are particularly suitable for real-time ASR error detection applications; and (3) we examine the generalization ability of our error detection framework and perform a detailed post-detection analysis to identify the recognition errors that are difficult to detect.
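Ground-truth labels for ASR error detection are commonly obtained by aligning the hypothesis against the reference transcript and marking unmatched hypothesis words as errors. A sketch of that labelling step using the standard library's `difflib`; this is a generic illustration, not the paper's pipeline:

```python
import difflib

def label_errors(ref_words, hyp_words):
    """Return (word, is_error) for each hypothesis word, where a word is
    an error unless the alignment matches it to the reference exactly."""
    labels = [True] * len(hyp_words)  # assume erroneous until matched
    matcher = difflib.SequenceMatcher(a=ref_words, b=hyp_words)
    for block in matcher.get_matching_blocks():
        for j in range(block.b, block.b + block.size):
            labels[j] = False  # aligned to an identical reference word
    return list(zip(hyp_words, labels))
```

For instance, aligning the hypothesis "the bat sat" to the reference "the cat sat" flags only "bat" as an error; a classifier such as the V-RNN described above would then be trained to predict these labels from predictor features.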


Author(s):  
Jarne R. Verpoorten ◽  
Michèle Auglaire ◽  
Frank Bertels

During a hypothetical Severe Accident (SA), core damage is to be expected due to insufficient core cooling. If the lack of core cooling persists, degradation of the core can continue and could lead to the presence of corium in the lower plenum. There, the thermo-mechanical attack of the lower head by the corium could eventually lead to vessel failure and release of corium to the reactor cavity pit. This paper describes how international state-of-the-art knowledge has been applied, in combination with plant-specific data, to obtain a custom Severe Accident Management (SAM) approach and hardware adaptations for existing NPPs. The interest of Tractebel Engineering in future SA research projects related to this topic is also addressed, from the viewpoint of keeping the analysis up to date with state-of-the-art knowledge.


Author(s):  
Hongting Zhang ◽  
Pan Zhou ◽  
Qiben Yan ◽  
Xiao-Yang Liu

Audio adversarial examples, imperceptible to humans, have been constructed to attack automatic speech recognition (ASR) systems. However, the adversarial examples generated by existing approaches usually incorporate noticeable noise, especially during periods of silence and pauses. Moreover, the added noise often breaks the temporal dependency property of the original audio, which can easily be detected by state-of-the-art defense mechanisms. In this paper, we propose a new Iterative Proportional Clipping (IPC) algorithm that preserves temporal dependency in audio to generate more robust adversarial examples. We are motivated by the observation that temporal dependency in audio has a significant effect on human perception. Following this observation, we leverage a proportional clipping strategy to reduce noise during low-intensity periods. Experimental results and a user study both suggest that the generated adversarial examples significantly reduce human-perceptible noise and resist defenses based on temporal structure.
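The proportional-clipping idea described above can be sketched as bounding the adversarial perturbation at each sample in proportion to the original signal's magnitude, so low-intensity periods (silences, pauses) receive little or no added noise. This is an illustrative sketch only, not the authors' IPC implementation; `epsilon` is an assumed scaling hyperparameter:

```python
import numpy as np

def proportional_clip(perturbation, signal, epsilon=0.1):
    """Clip each perturbation sample to +/- epsilon * |corresponding
    signal sample|, so silent samples admit zero perturbation."""
    bound = epsilon * np.abs(signal)
    return np.clip(perturbation, -bound, bound)
```

Applied iteratively between gradient-based attack steps, such a clip keeps the perturbation's envelope proportional to the audio's own envelope, which is the property the temporal-dependency defenses test for.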


Author(s):  
Christian PADILLA-NAVARRO ◽  
Carlos ZARATE-TREJO ◽  
Georges KHALAF ◽  
Pascal FALLAVOLLITA

Alexithymia is a condition that partially or completely deprives an individual of the ability to identify and describe emotions and to show affective connotations in their actions. The problem has motivated various research projects that study its characteristics, forms of prevention, and implications, and that try to quantify an individual's experience of this construct as well as their responses to certain stimuli. Other studies reviewed here aimed to find a connection between the responses of subjects diagnosed with alexithymia when asked to recognize dynamic emotional facial expressions and their score on the Toronto Alexithymia Scale (TAS), a metric frequently used to evaluate the presence or absence of alexithymia. In this work, we present a review of the articles that study this connection, as well as articles describing the state of the art in artificial intelligence algorithms applied to the treatment or prevention of secondary alexithymia.

