Self-Supervised Speech Enhancement for Arabic Speech Recognition in Real-World Environments

Mobile speech recognition attracts much attention in ubiquitous computing; however, background noise, speech coding, and transmission errors tend to corrupt the incoming speech. Building a robust speech recognizer therefore requires a large number of real-world speech samples. Arabic, like many other languages, lacks such resources; to overcome this limitation, we propose a speech enhancement step before recognition begins. For speech enhancement, we suggest a deep autoencoder (DAE). The procedure has two steps: in the first, an overcomplete DAE is trained in an unsupervised way; in the second, a denoising DAE is trained in a supervised way, leveraging the clean speech produced in the previous step. Experiments on a real-life mobile database confirm the potential of the proposed approach and show a reduction in the word error rate (WER) of a ubiquitous Arabic speech recognizer. Further experiments also show improvements in the perceptual evaluation of speech quality (PESQ) and the short-time objective intelligibility (STOI).
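The abstract reports its main result as a reduction in WER. As a reminder of what that metric measures, here is a minimal sketch of the standard definition: the word-level edit distance between reference and hypothesis, divided by the number of reference words. The function name and the sample sentences are illustrative assumptions and are not taken from the paper, whose scoring tool is not specified.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: one substitution and one deletion over four words.
example_wer = word_error_rate("open the front door", "open a front")
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.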

Neuropsychology in the Real World

Zeitschrift für Neuropsychologie ◽ 10.1024/1016-264x/a000139 ◽ 2014 ◽ Vol 25 (4) ◽ pp. 233-238 ◽ Cited By ~ 2

Author(s): Martin Peper ◽ Simone N. Loeffler

Keyword(s): Neuropsychological Assessment ◽ Real World ◽ Ecological Validity ◽ Real Life ◽ Emotional States ◽ Context Sensitive ◽ Traditional Assessment ◽ Life Data ◽ Real Life Data ◽ Assessment And Treatment

Current ambulatory technologies are highly relevant for neuropsychological assessment and treatment as they provide a gateway to real life data. Ambulatory assessment of cognitive complaints, skills and emotional states in natural contexts provides information that has a greater ecological validity than traditional assessment approaches. This issue presents an overview of current technological and methodological innovations, opportunities, problems and limitations of these methods designed for the context-sensitive measurement of cognitive, emotional and behavioral function. The usefulness of selected ambulatory approaches is demonstrated and their relevance for an ecologically valid neuropsychology is highlighted.

Far-Field Speech Enhancement Using Heteroscedastic Autoencoder for Improved Speech Recognition

10.21437/interspeech.2019-2032 ◽ 2019

Author(s): Shashi Kumar ◽ Shakti P. Rath

Keyword(s): Speech Recognition ◽ Speech Enhancement ◽ Far Field

A Cross-Entropy-Guided (CEG) Measure for Speech Enhancement Front-End Assessing Performances of Back-End Automatic Speech Recognition

10.21437/interspeech.2019-2511 ◽ 2019

Author(s): Li Chai ◽ Jun Du ◽ Chin-Hui Lee

Keyword(s): Speech Recognition ◽ Speech Enhancement ◽ Automatic Speech Recognition ◽ Cross Entropy ◽ Front End

Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems

Machine Learning ◽ 10.1007/s10994-020-05939-8 ◽ 2021

Author(s): Amarildo Likmeta ◽ Alberto Maria Metelli ◽ Giorgia Ramponi ◽ Andrea Tirinzoni ◽ Matteo Giuliani ◽ ...

Keyword(s): Reinforcement Learning ◽ Real World ◽ Real Life ◽ User Preferences ◽ Inverse Reinforcement Learning ◽ Water Release ◽ Reward Function ◽ Model Free ◽ Conflicting Objectives ◽ Multiple Experts

In real-world applications, inferring the intentions of expert agents (e.g., human operators) can be fundamental to understanding how possibly conflicting objectives are managed, helping to interpret the demonstrated behavior. In this paper, we discuss how inverse reinforcement learning (IRL) can be employed to retrieve the reward function implicitly optimized by expert agents acting in real applications. Scaling IRL to real-world cases has proved challenging, as typically only a fixed dataset of demonstrations is available and further interactions with the environment are not allowed. For this reason, we resort to a class of truly batch model-free IRL algorithms and present three application scenarios: (1) the high-level decision-making problem in highway driving, (2) inferring user preferences in a social network (Twitter), and (3) managing the water release of Lake Como. For each of these scenarios, we provide a formalization, experiments, and a discussion to interpret the obtained results.

Dual Application of Speech Enhancement for Automatic Speech Recognition

2021 IEEE Spoken Language Technology Workshop (SLT) ◽ 10.1109/slt48900.2021.9383624 ◽ 2021

Author(s): Ashutosh Pandey ◽ Chunxi Liu ◽ Yun Wang ◽ Yatharth Saraf

Keyword(s): Speech Recognition ◽ Speech Enhancement ◽ Automatic Speech Recognition

Amide-Type Substrates in the Synthesis of N-Protected 1-Aminomethylphosphonium Salts

Catalysts ◽ 10.3390/catal11050552 ◽ 2021 ◽ Vol 11 (5) ◽ pp. 552

Author(s): Dominika Kozicka ◽ Paulina Zieleźny ◽ Karol Erfurt ◽ Jakub Adamek

Keyword(s): Reaction Times ◽ The Other ◽ Hydroxyl Group ◽ Other Hand ◽ Step Procedure ◽ Higher Temperature ◽ Short Time ◽ Work Up

Herein we describe the development and optimization of a two-step procedure for the synthesis of N-protected 1-aminomethylphosphonium salts from imides, amides, carbamates, or lactams. Our “step-by-step” methodology involves the transformation of amide-type substrates to the corresponding hydroxymethyl derivatives, followed by the substitution of the hydroxyl group with a phosphonium moiety. The first step of the described synthesis was conducted based on well-known protocols for hydroxymethylation with formaldehyde or paraformaldehyde. In turn, the second (substitution) stage required optimization studies. In general, reactions of amide, carbamate, and lactam derivatives occurred at a temperature of 70 °C in a relatively short time (1 h). On the other hand, N-hydroxymethylimides reacted with triarylphosphonium salts at a much higher temperature (135 °C) and over longer reaction times (as much as 30 h). However, the proposed strategy is very efficient, especially when NaBr is used as a catalyst. Moreover, a simple work-up procedure involving only crystallization afforded good to excellent yields (up to 99%).

Improving the performance of a radio-frequency localization system in adverse outdoor applications

EURASIP Journal on Wireless Communications and Networking ◽ 10.1186/s13638-021-02001-6 ◽ 2021 ◽ Vol 2021 (1)

Author(s): Marcelo N. de Sousa ◽ Ricardo Sant’Ana ◽ Rigel P. Fernandes ◽ Julio Cesar Duarte ◽ José A. Apolinário ◽ ...

Keyword(s): Random Forest ◽ Ray Tracing ◽ Real World ◽ Practical Implication ◽ Real Life ◽ Simulated Data ◽ Real Data ◽ Gradient Boosting ◽ Real World Data ◽ Localization Accuracy

In outdoor RF localization systems, particularly where line of sight cannot be guaranteed or where multipath effects are severe, information about the terrain may improve the accuracy of the position estimate. Given the difficulties in obtaining real data, a ray-tracing fingerprint is a viable option. Nevertheless, despite good simulation results, the performance of systems trained only with simulated features degrades when they process real-life data. This work intends to improve the localization accuracy when using ray-tracing fingerprints and a small amount of field data obtained from an adverse environment where a large number of measurements is not an option. We employ machine learning (ML) algorithms to exploit the multipath information. We selected the random forest and gradient boosting algorithms, both considered efficient tools in the literature. In a strict simulation scenario (simulated data for training, validating, and testing), we obtained the same good results found in the literature (error around 2 m). In a real-world system (simulated data for training, real data for validating and testing), both ML algorithms resulted in a mean positioning error around 100 m. We have also obtained experimental results for noisy (artificially added Gaussian noise) and mismatched (with a null subset of) features. From the simulations carried out in this work, our study revealed that enhancing the ML model with a small amount of real-world data improves the overall localization performance. Among the ML algorithms employed herein, we also observed that, under noisy conditions, the random forest algorithm achieved a slightly better result than the gradient boosting algorithm. However, they achieved similar results in a mismatch experiment. This work’s practical implication is that multipath information, once rejected in old localization techniques, now represents a significant source of information whenever we have prior knowledge to train the ML algorithm.

Robustness and Sensitivity Tuning of the Kalman Filter for Speech Enhancement

Signals ◽ 10.3390/signals2030027 ◽ 2021 ◽ Vol 2 (3) ◽ pp. 434-455

Author(s): Sujan Kumar Roy ◽ Kuldip K. Paliwal

Keyword(s): Kalman Filter ◽ Speech Enhancement ◽ Linear Prediction ◽ Real Life ◽ Model Parameters ◽ Noise Variance ◽ Noisy Speech ◽ Kalman Gain ◽ Whitening Filter ◽ Prediction Coefficient

Inaccurate estimates of the linear prediction coefficients (LPCs) and noise variance introduce bias in the Kalman filter (KF) gain and degrade speech enhancement performance. Existing methods propose a tuning of the biased Kalman gain, particularly in stationary noise conditions. This paper introduces a tuning of the KF gain for speech enhancement in real-life noise conditions. First, we estimate noise from each noisy speech frame using a speech presence probability (SPP) method to compute the noise variance. Then, we construct a whitening filter (with its coefficients computed from the estimated noise) to pre-whiten each noisy speech frame prior to computing the speech LPC parameters. We then construct the KF with the estimated parameters, where the robustness metric offsets the bias in KF gain during speech absence of noisy speech to that of the sensitivity metric during speech presence to achieve better noise reduction. The noise variance and the speech model parameters are adopted as a speech activity detector. The reduced-bias Kalman gain enables the KF to minimize the noise effect significantly, yielding the enhanced speech. Objective and subjective scores on the NOIZEUS corpus demonstrate that the enhanced speech produced by the proposed method exhibits higher quality and intelligibility than some benchmark methods.
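To illustrate why a wrong noise variance biases the Kalman gain, the following is a minimal sketch using a scalar first-order autoregressive signal observed in white noise. This is a deliberate simplification and not the authors' method: the paper uses higher-order LPC models and its own robustness and sensitivity metrics, none of which are reproduced here. The function name and all numeric values are assumptions chosen for illustration.

```python
def steady_state_kalman_gain(
    ar_coeff: float,
    process_var: float,
    noise_var: float,
    iterations: int = 200,
) -> float:
    """Iterate the scalar Kalman recursion until the gain settles.

    Assumed model: x[t] = ar_coeff * x[t-1] + w[t],  y[t] = x[t] + v[t],
    with Var(w) = process_var and Var(v) = noise_var.
    """
    error_var = process_var  # arbitrary starting point for the recursion
    gain = 0.0
    for _ in range(iterations):
        predicted_var = ar_coeff * ar_coeff * error_var + process_var
        gain = predicted_var / (predicted_var + noise_var)
        error_var = (1.0 - gain) * predicted_var
    return gain


# Gain computed with the true noise variance (hypothetical value 1.0).
reference_gain = steady_state_kalman_gain(0.9, 1.0, 1.0)
# Gain computed when the noise variance is underestimated as 0.25.
biased_gain = steady_state_kalman_gain(0.9, 1.0, 0.25)
```

In this toy setting, underestimating the noise variance pushes the gain toward 1, so the filter trusts the noisy observation more and removes less noise.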

Dual-Mic Speech Enhancement Based on TF-GSC with Leakage Suppression and Signal Recovery

Applied Sciences ◽ 10.3390/app11062816 ◽ 2021 ◽ Vol 11 (6) ◽ pp. 2816

Author(s): Hansol Kim ◽ Jong Won Shin

Keyword(s): Speech Enhancement ◽ Wiener Filter ◽ Signal Recovery ◽ Gain Function ◽ Microphone Signal ◽ Perceptual Evaluation ◽ Blocking Matrix ◽ Adaptive Noise ◽ Adaptive Noise Canceller ◽ Sidelobe Canceller

The transfer function-generalized sidelobe canceller (TF-GSC) is one of the most popular structures for the adaptive beamformer used in multi-channel speech enhancement. Although the TF-GSC has shown decent performance, a certain amount of steering error is inevitable, which causes leakage of speech components through the blocking matrix (BM) and distortion in the fixed beamformer (FBF) output. In this paper, we propose to suppress the leaked signal in the output of the BM and restore the desired signal in the FBF output of the TF-GSC. To reduce the risk of attenuating speech in the adaptive noise canceller (ANC), the speech component in the output of the BM is suppressed by applying a gain function similar to the square-root Wiener filter, assuming that a certain portion of the desired speech is leaked into the BM output. Additionally, we propose to restore the attenuated desired signal in the FBF output by adding some of the microphone signal components back, depending on how the microphone signals are related to the FBF and BM outputs. The experimental results showed that the proposed TF-GSC outperformed the conventional TF-GSC in terms of perceptual evaluation of speech quality (PESQ) scores under various noise conditions and directions of arrival of the desired and interfering sources.
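The abstract describes its leakage-suppression gain only as "similar to the square-root Wiener filter". For orientation, here is a sketch of the textbook Wiener gain and its square root, both expressed in terms of a wanted-to-unwanted power ratio. The authors' actual gain function, and how they define that ratio for the BM output, are not given in the abstract, so the function names and the ratio argument below are assumptions.

```python
import math


def wiener_gain(power_ratio: float) -> float:
    """Textbook Wiener gain for a linear (not dB) wanted/unwanted power ratio."""
    if power_ratio < 0.0:
        raise ValueError("power_ratio must be non-negative")
    return power_ratio / (1.0 + power_ratio)


def sqrt_wiener_gain(power_ratio: float) -> float:
    """Square-root Wiener gain: attenuates less than the plain Wiener gain."""
    return math.sqrt(wiener_gain(power_ratio))


# Hypothetical ratios from strongly unwanted (0.1) to strongly wanted (10.0).
gains = {ratio: sqrt_wiener_gain(ratio) for ratio in (0.1, 1.0, 10.0)}
```

Since the Wiener gain lies between 0 and 1, its square root is never smaller, which makes the square-root form the more conservative of the two.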

Inferring Long-Term Demand of Newly Established Stations for Expansion Areas in Bike Sharing System

Applied Sciences ◽ 10.3390/app11156748 ◽ 2021 ◽ Vol 11 (15) ◽ pp. 6748

Author(s): Hsun-Ping Hsieh ◽ Fandel Lin ◽ Jiawei Jiang ◽ Tzu-Ying Kuo ◽ Yu-En Chang

Keyword(s): New York ◽ Feature Extraction ◽ Real World ◽ Extraction Methods ◽ Real World Data ◽ Urban Dynamics ◽ Bike Sharing ◽ The Government ◽ Short Time

Flourishing public bike-sharing systems have been widely studied in recent years. Many existing works focus on accurately predicting demand at individual stations over a short time horizon. In contrast, this work aims to predict long-term bike rental/drop-off demands at given bike station locations in expansion areas. In the real world, bike stations in expansion areas are mainly built in batches. To address the problem, we propose LDA (Long-Term Demand Advisor), a framework to estimate the long-term characteristics of newly established stations. In LDA, several engineering strategies are proposed to extract discriminative and representative features for long-term demands. Moreover, for original and newly established stations, we propose several feature extraction methods and an algorithm to model the correlations between urban dynamics and long-term demands. Our work is the first to address the long-term demand of new stations, providing the government with a tool to pre-evaluate the bike flow of new stations before deployment; this can avoid wasting resources such as personnel expense or budget. We evaluate on real-world data from New York City’s bike-sharing system and show that our LDA framework outperforms baseline approaches.
