Estimation and tracking of fundamental, 2nd and 3rd harmonic frequencies for spectrogram normalization in speech recognition

A stable and accurate estimate of the fundamental frequency (pitch, F0) is an important requirement in speech and music signal analysis, in tasks such as automatic speech recognition and extraction of a target signal in a noisy environment. In this paper, we propose a pitch-related spectrogram normalization scheme to improve the speaker independence of standard speech features. This scheme requires a very accurate estimate of the fundamental frequency. Hence, we develop a non-parametric recursive method for estimating F0 and its 2nd and 3rd harmonic frequencies in noisy conditions. The proposed method differs from typical Kalman and particle filter methods in that no particular sum-of-sinusoids model is used. We also estimate F0 and its lower harmonics using a novel likelihood function. Experiments under various noise levels show that the proposed method is more accurate than other conventional methods. The spectrogram normalization scheme maps the real harmonic structure to a normalized structure. Results obtained for voiced phonemes show an increase in the stability of standard speech features: the average within-phoneme distance of the MFCC features for voiced phonemes can be decreased by several percent.
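
The abstract does not give the likelihood function, so the sketch below swaps in a generic harmonic-summation score and is not the authors' recursive estimator: each candidate F0 is scored by the DFT power of one frame at F0, 2F0 and 3F0, and the winning candidate also yields the 2nd and 3rd harmonic frequencies. `warp_spectrum` illustrates one possible normalization, a linear rescaling of the frequency axis by f0/f0_ref so that harmonics land on those of a hypothetical reference pitch `f0_ref`; the paper's actual mapping is not specified here. All function and parameter names are hypothetical.

```python
import cmath
import math


def dft_power(x, freq_hz, fs):
    """Power of the DFT of frame x evaluated at an arbitrary frequency (direct sum)."""
    acc = sum(v * cmath.exp(-2j * math.pi * freq_hz * n / fs) for n, v in enumerate(x))
    return abs(acc) ** 2


def harmonic_sum_f0(x, fs, f_min=80.0, f_max=400.0, step=1.0, n_harm=3):
    """Grid search for the F0 whose first n_harm harmonics carry the most power.

    Returns (f0, [2*f0, ..., n_harm*f0]).
    """
    n_steps = int(round((f_max - f_min) / step))
    best_f0, best_score = f_min, -1.0
    for i in range(n_steps + 1):
        f = f_min + i * step
        score = sum(dft_power(x, h * f, fs) for h in range(1, n_harm + 1))
        if score > best_score:
            best_f0, best_score = f, score
    return best_f0, [h * best_f0 for h in range(2, n_harm + 1)]


def warp_spectrum(mag, f0, f0_ref):
    """Hypothetical linear pitch normalization: out[k] = mag[k * f0 / f0_ref].

    Reads the input with linear interpolation; bins past the end are set to 0.
    """
    ratio = f0 / f0_ref
    out = []
    for k in range(len(mag)):
        pos = k * ratio
        i = int(pos)
        if i < len(mag) - 1:
            frac = pos - i
            out.append((1.0 - frac) * mag[i] + frac * mag[i + 1])
        elif pos == len(mag) - 1:
            out.append(mag[-1])
        else:
            out.append(0.0)
    return out
```

Frame-to-frame tracking could be approximated by narrowing `f_min`/`f_max` to a band around the previous estimate; the exhaustive grid costs candidates × harmonics × frame length operations, which is acceptable for a sketch but slow in practice.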

Performance vs. hardware requirements in state-of-the-art automatic speech recognition

EURASIP Journal on Audio Speech and Music Processing ◽

10.1186/s13636-021-00217-4 ◽

2021 ◽

Vol 2021 (1) ◽

Author(s):

Alexandru-Lucian Georgescu ◽

Alessandro Pappalardo ◽

Horia Cucu ◽

Michaela Blott

Keyword(s):

Speech Recognition ◽

Automatic Speech Recognition ◽

State Of The Art ◽

Decision Makers ◽

Computing Power ◽

Trade Off ◽

Speech Features ◽

Commercial Applications ◽

Guided Tour ◽

Embedded Applications

The last decade brought significant advances in automatic speech recognition (ASR) thanks to the evolution of deep learning methods. ASR systems evolved from pipeline-based systems, which modeled hand-crafted speech features with probabilistic frameworks and generated phone posteriors, to end-to-end (E2E) systems, which translate the raw waveform directly into words using a single deep neural network (DNN). Transcription accuracy increased greatly, leading to ASR technology being integrated into many commercial applications. However, few existing ASR technologies are suitable for integration into embedded applications, due to the hard constraints these applications impose on computing power and memory usage. This overview paper serves as a guided tour through the recent literature on speech recognition and compares the most popular ASR implementations. The comparison emphasizes the trade-off between ASR performance and hardware requirements, to help decision makers choose the system that best fits their embedded application. To the best of our knowledge, this is the first study to provide this kind of trade-off analysis for state-of-the-art ASR systems.
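
The paper's comparison data are not reproduced here. As a hypothetical illustration of the kind of trade-off analysis described, the sketch keeps only Pareto-optimal systems on two axes, word error rate and memory footprint (lower is better on both), and then picks the most accurate one that fits a memory budget. The `AsrSystem` fields and function names are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AsrSystem:
    name: str
    wer: float        # word error rate in %, lower is better
    memory_mb: float  # memory footprint in MB, lower is better


def pareto_front(systems):
    """Systems that no other system beats on one axis without losing on the other."""
    front = []
    for s in systems:
        dominated = any(
            o.wer <= s.wer and o.memory_mb <= s.memory_mb
            and (o.wer < s.wer or o.memory_mb < s.memory_mb)
            for o in systems
        )
        if not dominated:
            front.append(s)
    return sorted(front, key=lambda s: s.memory_mb)


def best_under_budget(systems, memory_budget_mb):
    """Lowest-WER Pareto-optimal system that fits the memory budget, or None."""
    fitting = [s for s in pareto_front(systems) if s.memory_mb <= memory_budget_mb]
    return min(fitting, key=lambda s: s.wer) if fitting else None
```

Filtering to the Pareto front first means a decision maker only ever compares systems where gaining accuracy genuinely costs memory.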

Research on a novel data-driven aging estimation method for battery systems in real-world electric vehicles

Advances in Mechanical Engineering ◽

10.1177/16878140211027735 ◽

2021 ◽

Vol 13 (7) ◽

pp. 168781402110277

Author(s):

Yankai Hou ◽

Zhaosheng Zhang ◽

Peng Liu ◽

Chunbao Song ◽

Zhenpo Wang

Keyword(s):

Electric Vehicles ◽

Real World ◽

Regression Models ◽

Estimation Method ◽

Recursive Least Squares ◽

Data Driven ◽

Accurate Estimation ◽

Support Vector ◽

Battery Degradation ◽

Operational Data

Accurate estimation of the degree of battery aging is essential to ensure safe operation of electric vehicles. In this paper, using real-world vehicles and their operational data, a battery aging estimation method is proposed based on a dual-polarization equivalent circuit (DPEC) model and multiple data-driven models. The DPEC model and the forgetting factor recursive least-squares method are used to determine the battery system’s ohmic internal resistance, with outliers being filtered using boxplots. Furthermore, eight common data-driven models are used to describe the relationship between battery degradation and the factors influencing this degradation, and these models are analyzed and compared in terms of both estimation accuracy and computational requirements. The results show that the gradient descent tree regression, XGBoost regression, and light GBM regression models are more accurate than the other methods, with root mean square errors of less than 6.9 mΩ. The AdaBoost and random forest regression models are regarded as alternative groups because of their relative instability. The linear regression, support vector machine regression, and k-nearest neighbor regression models are not recommended because of poor accuracy or excessively high computational requirements. This work can serve as a reference for subsequent battery degradation studies based on real-time operational data.
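
The abstract names two concrete processing steps: a boxplot (interquartile-range) outlier filter and forgetting-factor recursive least squares (FFRLS) for the ohmic resistance. The sketch below shows both, but as a simplification it fits only a one-resistor model U = U_ocv - R0·I with U_ocv assumed constant over the data window, not the full dual-polarization equivalent circuit. The forgetting factor `lam`, initial covariance scale `delta`, and function names are assumed values for illustration. `statistics.quantiles` requires Python 3.8 or later.

```python
import statistics


def iqr_filter(values, k=1.5):
    """Boxplot rule: keep points inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lo <= v <= hi]


def ffrls_resistance(currents, voltages, lam=0.98, delta=1e4):
    """FFRLS for U = U_ocv - R0 * I, with theta = [U_ocv, R0] and phi = [1, -I].

    Returns (final theta, history of R0 estimates).
    """
    theta = [0.0, 0.0]
    P = [[delta, 0.0], [0.0, delta]]  # covariance, kept symmetric
    r0_history = []
    for i_k, u_k in zip(currents, voltages):
        phi = [1.0, -i_k]
        p_phi = [P[0][0] * phi[0] + P[0][1] * phi[1],
                 P[1][0] * phi[0] + P[1][1] * phi[1]]
        denom = lam + phi[0] * p_phi[0] + phi[1] * p_phi[1]
        gain = [p_phi[0] / denom, p_phi[1] / denom]
        err = u_k - (phi[0] * theta[0] + phi[1] * theta[1])  # prediction error
        theta = [theta[0] + gain[0] * err, theta[1] + gain[1] * err]
        # P <- (P - K phi^T P) / lam; phi^T P equals p_phi^T because P is symmetric
        P = [[(P[r][c] - gain[r] * p_phi[c]) / lam for c in range(2)] for r in range(2)]
        r0_history.append(theta[1])
    return theta, r0_history
```

In a pipeline like the one described, the boxplot filter would be applied to the R0 history before it is used as a feature for the data-driven degradation models.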

Robust Feature Extraction for Continuous Speech Recognition Using the MVDR Spectrum Estimation Method

IEEE Transactions on Audio Speech and Language Processing ◽

10.1109/tasl.2006.876776 ◽

2007 ◽

Vol 15 (1) ◽

pp. 224-234 ◽

Cited By ~ 31

Author(s):

Satya Dharanipragada ◽

Umit H. Yapanel ◽

Bhaskar D. Rao

Keyword(s):

Feature Extraction ◽

Speech Recognition ◽

Estimation Method ◽

Spectrum Estimation ◽

Continuous Speech ◽

Continuous Speech Recognition ◽

Robust Feature Extraction

Robust Directional Angle Estimation of Underwater Acoustic Sources Using a Marine Vehicle

Sensors ◽

10.3390/s18093062 ◽

2018 ◽

Vol 18 (9) ◽

pp. 3062 ◽

Cited By ~ 3

Author(s):

Jinwoo Choi ◽

Jeonghong Park ◽

Yoongeon Lee ◽

Jongdae Jung ◽

Hyun-Taek Choi

Keyword(s):

Source Localization ◽

Angular Displacement ◽

Estimation Method ◽

Time Delay Estimation ◽

Accurate Estimation ◽

Acoustic Source ◽

Angle Estimation ◽

Underwater Acoustic ◽

Acoustic Sources ◽

Marine Vehicle

Acoustic source localization is used in many underwater applications. Acquiring an accurate directional angle for an acoustic source is crucial for source localization. To achieve this purpose, this paper presents a method for directional angle estimation of underwater acoustic sources using a marine vehicle. It is assumed that the vehicle is equipped with two hydrophones and that the acoustic source transmits a specific signal repeatedly. The proposed method provides a probabilistic model for time delay estimation. The probability is recursively updated by prediction and update steps. The prediction step performs a probability transition using the angular displacement of the marine vehicle. The predicted probability is updated using a generalized cross correlation function with a verification process using entropy measurement. The proposed method can provide a reliable and accurate estimation of the directional angles of underwater acoustic sources. Experimental results demonstrate good performance of the proposed probabilistic directional angle estimation method in both an inland water environment and a harbor environment.
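
The recursive prediction/update structure described above resembles a discrete (histogram) Bayes filter over bearing. The sketch below is a generic version under stated assumptions, not the authors' model: a 1-degree bearing grid, far-field two-hydrophone geometry with a nominal sound speed of 1500 m/s, a prediction that shifts the histogram by the change in relative bearing caused by the vehicle's turn and then blurs it slightly, and an update that is skipped when the likelihood derived from the generalized cross correlation is too flat (normalized entropy above a hypothetical threshold `max_entropy`). Computing the GCC itself is omitted; the likelihood is passed in as a list.

```python
import math

SOUND_SPEED_WATER = 1500.0  # m/s, nominal value assumed for illustration


def tdoa_to_angle_deg(tau_s, spacing_m, c=SOUND_SPEED_WATER):
    """Far-field bearing (degrees) from the time difference of arrival at two hydrophones."""
    s = max(-1.0, min(1.0, c * tau_s / spacing_m))
    return math.degrees(math.asin(s))


def predict(belief, shift_cells, blur=0.1):
    """Shift the histogram by the relative-bearing change (in grid cells), then blur it."""
    n = len(belief)
    shifted = [0.0] * n
    for i, p in enumerate(belief):
        j = i + shift_cells
        if 0 <= j < n:  # mass pushed off the grid is dropped
            shifted[j] += p
    out = []
    for i in range(n):
        left = shifted[i - 1] if i > 0 else 0.0
        right = shifted[i + 1] if i < n - 1 else 0.0
        out.append((1 - 2 * blur) * shifted[i] + blur * (left + right))
    total = sum(out)
    return [p / total for p in out] if total > 0 else [1.0 / n] * n


def normalized_entropy(weights):
    """Entropy of the normalized weights divided by its maximum, log(len(weights))."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probs) / math.log(len(weights))


def update(belief, likelihood, max_entropy=0.9):
    """Bayes update; the measurement is rejected if its likelihood is too flat."""
    if normalized_entropy(likelihood) > max_entropy:
        return belief
    post = [b * l for b, l in zip(belief, likelihood)]
    total = sum(post)
    return [p / total for p in post] if total > 0 else belief
```

The entropy gate is a simple stand-in for the verification step: a nearly uniform correlation-based likelihood carries little bearing information, so it is safer to keep the predicted belief than to sharpen it on noise.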

Noise masking method based on an effective ratio mask estimation in Gammatone channels

APSIPA Transactions on Signal and Information Processing ◽

10.1017/atsip.2018.7 ◽

2018 ◽

Vol 7 ◽

Cited By ~ 1

Author(s):

Feng Bao ◽

Waleed H. Abdulla

Keyword(s):

Signal To Noise Ratio ◽

Estimation Method ◽

Wiener Filter ◽

Power Spectra ◽

Auditory Scene Analysis ◽

Accurate Estimation ◽

Noise Power ◽

Noise Masking ◽

Time Frequency ◽

Mask Estimation

In computational auditory scene analysis, accurate estimation of the binary mask or ratio mask plays a key role in noise masking. An inaccurate estimate often leads to artifacts and temporal discontinuities in the synthesized speech. To overcome this problem, we propose a new ratio mask estimation method based on Wiener filtering in each Gammatone channel. In reconstructing the Wiener filter, we use the relationship between the speech and noise power spectra in each Gammatone channel to build the objective function for the convex optimization of speech power. To improve estimation accuracy, the estimated ratio mask is further modified based on its adjacent time–frequency units and then smoothed by interpolating with the estimated binary masks. Objective tests, covering signal-to-noise ratio improvement, spectral distortion, and intelligibility, together with a subjective listening test, demonstrate the superiority of the proposed method over the reference methods.
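
The abstract describes three ingredients: a Wiener-type ratio mask per Gammatone channel, a correction using adjacent time–frequency units, and smoothing with a binary mask. The sketch below is a simplified stand-in under assumptions: the speech and noise power per T-F unit are taken as given (the paper's convex optimization of speech power is not reproduced), the neighbourhood step is a plain 3×3 average, and the binary-mask interpolation is a fixed linear blend with a hypothetical weight `alpha`. The 0 dB local criterion `lc_db` is likewise an assumed default.

```python
def wiener_ratio_mask(speech_pow, noise_pow, eps=1e-12):
    """Wiener gain S / (S + N) per T-F unit; inputs are [channel][frame] power grids."""
    return [[s / (s + n + eps) for s, n in zip(srow, nrow)]
            for srow, nrow in zip(speech_pow, noise_pow)]


def binary_mask(speech_pow, noise_pow, lc_db=0.0):
    """1.0 where the local SNR exceeds the local criterion lc_db, else 0.0."""
    thr = 10 ** (lc_db / 10)
    return [[1.0 if s > thr * n else 0.0 for s, n in zip(srow, nrow)]
            for srow, nrow in zip(speech_pow, noise_pow)]


def smooth_and_blend(ratio, binary, alpha=0.7):
    """Average each unit with its in-bounds 3x3 neighbours, then blend with the binary mask."""
    rows, cols = len(ratio), len(ratio[0])
    out = []
    for c in range(rows):
        row = []
        for t in range(cols):
            vals = [ratio[cc][tt]
                    for cc in range(max(0, c - 1), min(rows, c + 2))
                    for tt in range(max(0, t - 1), min(cols, t + 2))]
            smoothed = sum(vals) / len(vals)
            row.append(alpha * smoothed + (1 - alpha) * binary[c][t])
        out.append(row)
    return out
```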

Noise robust speech recognition using Gaussian basis functions for non-linear likelihood function approximation

IEEE International Conference on Acoustics Speech and Signal Processing ◽

10.1109/icassp.2002.5743740 ◽

2002 ◽

Author(s):

Chris Pal ◽

Brendan Frey ◽

Trausti Kristjansson

Keyword(s):

Speech Recognition ◽

Function Approximation ◽

Likelihood Function ◽

Basis Functions ◽

Robust Speech Recognition ◽

Non Linear ◽

Noise Robust Speech Recognition ◽

Gaussian Basis Functions ◽

Noise Robust

A Study on Sustainable Consumption of Fuel—An Estimation Method of Aircraft

Energies ◽

10.3390/en14227559 ◽

2021 ◽

Vol 14 (22) ◽

pp. 7559

Author(s):

Lisha Li ◽

Shuming Yuan ◽

Yue Teng ◽

Jing Shao

Keyword(s):

Neural Network ◽

Fuel Consumption ◽

Influencing Factors ◽

Carbon Emission ◽

Estimation Method ◽

Civil Aviation ◽

Accurate Estimation ◽

Low Carbon ◽

Fuel Price ◽

Aircraft Fuel

Although the development of China’s civil aviation and the improvement of control capability have effectively strengthened safe operation and support capability, airlines are under pressure from operating costs because of the rising price of aircraft fuel. As optimized control methods in flight management systems have matured, it has become increasingly difficult to cut flight fuel consumption further by controlling the flight status of the aircraft. Airlines at home and abroad therefore rely mainly on accurate estimation of aircraft fuel to reduce fuel consumption and, in turn, carbon emissions. Airlines have to take various potential factors into consideration and load extra fuel to cope with possible adverse situations during the flight; this fuel for emergency use is called PBCF (Performance-Based Contingency Fuel). The existing PBCF forecasting method used by China Airlines is not accurate, because it fails to take various influencing factors into account. This paper aims to find a method that can predict PBCF more accurately than the existing methods for China Airlines. This paper takes China Eastern Airlines as an example. Experimental flight fuel data from China Eastern Airlines Co., Ltd. were collected to identify the relevant parameters affecting fuel consumption, and an LSTM neural network was then built from these parameters and the collected data. Finally, the established neural network model outputs the PBCF addition required by the airline under different influencing factors. The results show that all four models are suitable for accurate prediction of fuel consumption. The amount of data for the A319 is much larger than for the A320 and A330, which leads to higher accuracy for the model trained on A319 data.
The study contributes to the calculation methods used in fuel-saving projects and helps practitioners learn about a particular fuel calculation method. It offers insights that help practitioners achieve the goal of low carbon emissions and further contributes to their progress towards a circular economy.

A New High Accurate Estimation Method for Evaluating the Daily Solar Energy by Nested Percentiles Algorithm

Asian Journal of Scientific Research ◽

10.3923/ajsr.2019.480.487 ◽

2019 ◽

Vol 12 (4) ◽

pp. 480-487

Author(s):

Mohammed Mohammed E ◽

Doaa Abd El-Shafi Abd El-Rah

Keyword(s):

Solar Energy ◽

Estimation Method ◽

Accurate Estimation

Saliency of Vowel Features in Neural Responses of Cochlear Implant Users

Clinical EEG and Neuroscience ◽

10.1177/1550059418770051 ◽

2018 ◽

Vol 49 (6) ◽

pp. 388-397

Author(s):

François Prévost ◽

Alexandre Lehmann

Keyword(s):

Speech Recognition ◽

Cochlear Implant ◽

Fundamental Frequency ◽

Mismatch Negativity ◽

Free Field ◽

Event Related Potentials ◽

Oddball Paradigm ◽

Aural Rehabilitation ◽

Spectral Components ◽

Related Potentials

Cochlear implants restore hearing in deaf individuals, but speech perception remains challenging. Poor discrimination of spectral components is thought to account for the limitations of speech recognition in cochlear implant users. We investigated how combined variations of spectral components along two orthogonal dimensions can maximize neural discrimination between two vowels, as measured by mismatch negativity. Adult cochlear implant users and matched normal-hearing listeners underwent electroencephalographic recordings of event-related potentials in an optimum-1 oddball paradigm. A standard /a/ vowel was delivered in an acoustic free field along with deviant stimuli having a different fundamental frequency (+3 and +6 semitones), a different first formant making it an /i/ vowel, or a combined deviant fundamental frequency and first formant (+3 and +6 semitone /i/ vowels). Speech recognition was assessed with a word repetition task. An analysis of variance was performed on both the amplitude and the latency of the mismatch negativity elicited by each deviant vowel. The strength of the correlations between these mismatch negativity parameters and both speech recognition and participants’ age was assessed. The amplitude of the mismatch negativity was weaker in cochlear implant users but was maximized by variations of the vowels’ first formant. The latency of the mismatch negativity was longer in cochlear implant users and was particularly extended by variations of the fundamental frequency. Speech recognition correlated with the parameters of mismatch negativity elicited by the specific variation of the first formant. This nonlinear effect of acoustic parameters on neural discrimination of vowels has implications for implant processor programming and aural rehabilitation.
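
Two quantities in the abstract can be made concrete. A shift of k equal-tempered semitones multiplies frequency by 2^(k/12), so +3 and +6 semitones raise F0 by factors of about 1.19 and 1.41. Mismatch negativity is conventionally read from the deviant-minus-standard difference wave; the sketch below takes its most negative value inside a latency window. The 100 to 250 ms window and the function names are assumptions, not the study's analysis parameters.

```python
def semitone_shift(f_hz, semitones):
    """Frequency raised by the given number of equal-tempered semitones."""
    return f_hz * 2 ** (semitones / 12)


def mmn_peak(standard_uv, deviant_uv, times_ms, window_ms=(100.0, 250.0)):
    """Most negative point of the deviant-minus-standard difference wave in a window.

    Returns (amplitude_uV, latency_ms).
    """
    diff = [d - s for d, s in zip(deviant_uv, standard_uv)]
    candidates = [(a, t) for a, t in zip(diff, times_ms)
                  if window_ms[0] <= t <= window_ms[1]]
    if not candidates:
        raise ValueError("no samples inside the analysis window")
    return min(candidates, key=lambda at: at[0])
```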

A Study on Estimation Model of Incidence Factor of the Thermal Bridge Using In-Situ Measurement Infrared Thermography

ASME 2021 15th International Conference on Energy Sustainability ◽

10.1115/es2021-63750 ◽

2021 ◽

Author(s):

Eunho Kang ◽

Hyomoon Lee ◽

Dongsu Kim ◽

Jongho Yoon

Keyword(s):

Infrared Thermography ◽

Coefficient Of Variation ◽

Performance Indicator ◽

Likelihood Function ◽

Estimation Method ◽

Estimation Model ◽

Thermal Bridge ◽

Convergence Results ◽

Bridge Performance

Practical thermal bridge performance indicators (ITBs) of existing buildings may differ from the theoretically calculated thermal bridge performance due to actual construction conditions, such as the effects of irregular shapes and aging. To fill this gap in a practical manner, a more realistic on-site quantitative evaluation of thermal bridges is needed to identify thermal behavior throughout exterior walls and thus improve the overall insulation performance of buildings. In this paper, a model of a thermal bridge performance indicator is developed based on an in-situ infrared thermography method, and a case study is then carried out to evaluate the thermal performance of an existing exterior wall using the developed model. For the estimation, a likelihood function is used with the Bayesian method so that the estimate is continually updated with the measured data. Subsequently, the coefficient of variation is used to analyze the time required for convergence. Results from three days of measurement show that the measured thermal bridge has higher heat losses, about 1.14 times those of the non-thermal-bridge section. In addition, the results show that it takes about 40 hours for the coefficient of variation to reach 1%. Comparing the ITB estimated at a coefficient of variation of 1% (the 40-hour point) with the ITB estimated at the end of the experiment (the 72-hour point) gives a relative error of 0.9%.
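
The abstract combines a likelihood function with Bayesian updating and stops when the coefficient of variation reaches 1%. One simple way to realize that idea, assumed here rather than taken from the paper, is a conjugate normal update of the indicator's mean with a known measurement standard deviation `sigma`, checking the posterior coefficient of variation (posterior standard deviation divided by posterior mean) after each measurement. The prior values, `sigma`, and the function name are hypothetical, and each measurement is assumed to already be an ITB ratio (thermal-bridge heat loss divided by that of an undisturbed wall section).

```python
import math


def bayesian_itb(measurements, sigma, prior_mean=1.0, prior_sd=1.0, cv_target=0.01):
    """Normal-normal update of the ITB mean with a coefficient-of-variation stop check.

    Returns (posterior_mean, posterior_sd, first n with CV below cv_target, or None).
    """
    prec = 1.0 / prior_sd ** 2       # posterior precision (1 / variance)
    mean = prior_mean
    meas_prec = 1.0 / sigma ** 2     # precision of a single measurement
    converged_at = None
    for n, x in enumerate(measurements, start=1):
        new_prec = prec + meas_prec
        mean = (prec * mean + meas_prec * x) / new_prec
        prec = new_prec
        sd = math.sqrt(1.0 / prec)
        if converged_at is None and sd / abs(mean) < cv_target:
            converged_at = n
    return mean, math.sqrt(1.0 / prec), converged_at
```

With constant measurements of 1.14 and `sigma=0.1`, the check first passes at the 77th measurement; how many hours that corresponds to depends entirely on the sampling interval and noise level, so it says nothing about the 40-hour figure reported above.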
