speech modeling
Recently Published Documents

TOTAL DOCUMENTS: 60 (FIVE YEARS: 3)
H-INDEX: 12 (FIVE YEARS: 1)

Sensors ◽ 2021 ◽ Vol 21 (8) ◽ pp. 2720
Author(s): Abdelrahman Ahmed, Khaled Shaalan, Sergio Toral, Yasser Hifny

The paper proposes three modeling techniques to improve performance evaluation of call center agents. The first technique is speech processing supported by an attention layer over the agent's recorded calls; 65 acoustic features are extracted with the Open-Smile toolkit to determine the context of the call. The second technique replaces the Softmax function in the attention layer with a Max Weights Similarity (MWS) function to improve classification accuracy: MWS fine-tunes the output of the text attention layer by measuring the similarity, in terms of distance, between the attention layer's input weights and the weights of the maximum vectors. The third technique combines the agent's recorded call speech with the corresponding transcribed text for binary classification. Both the speech model and the text model are built from combinations of Convolutional Neural Networks (CNNs) and Bi-directional Long Short-Term Memory networks (BiLSTMs). The classification results for each single-modality model (text versus speech) are reported and compared with the results of the multimodal approach: the multimodal classifier improves accuracy by 0.22% over the acoustic model and by 1.7% over the text model.
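
The abstract does not give the exact MWS formula, so the sketch below is only one plausible, hypothetical reading: each attention score is rated by its closeness to the maximum score and the ratings are renormalized, instead of passing the scores through Softmax. The function names and the distance-based similarity used here are assumptions for illustration, not the paper's definition.

```python
import numpy as np

def softmax_weights(scores):
    """Standard Softmax attention weighting (the baseline the paper replaces)."""
    e = np.exp(scores - scores.max())
    return e / e.sum()

def mws_weights(scores):
    """Hypothetical Max Weights Similarity (MWS) weighting.

    Assumption: each score is rated by its similarity (inverse distance)
    to the maximum score, and the similarities are normalized to sum to 1.
    The paper's exact formulation may differ.
    """
    sim = 1.0 / (1.0 + np.abs(scores.max() - scores))
    return sim / sim.sum()

def attention_pool(features, scores, weight_fn):
    """Weighted sum of time-step features using the chosen weighting function."""
    w = weight_fn(scores)   # shape (T,)
    return w @ features     # (T, D) -> (D,)

# Toy example: 5 time steps, 4-dimensional features, arbitrary alignment scores.
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 4))
scores = rng.normal(size=5)
print(attention_pool(features, scores, softmax_weights))
print(attention_pool(features, scores, mws_weights))
```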


Mathematics ◽ 2019 ◽ Vol 7 (7) ◽ pp. 580
Author(s): Tomas Skovranek, Vladimir Despotovic

Fractional linear prediction (FLP), a generalization of conventional linear prediction (LP), has recently been applied successfully in different fields of research and engineering, such as biomedical signal processing, speech modeling, and image processing. The FLP model has a design similar to that of the conventional LP model, i.e., it uses a linear combination of “fractional terms” with different orders of fractional derivative. Assuming only one “fractional term” and using a limited number of previous samples for prediction, this paper presents an FLP model with “restricted memory” and derives closed-form expressions for the calculation of the FLP coefficients. This FLP model is directly comparable with the widely used low-order LP, as it uses the same number of previous samples but fewer predictor coefficients, making it more efficient. Two datasets, MIDI Aligned Piano Sounds (MAPS) and Orchset, were used for the experiments. Triads representing chords composed of three randomly chosen notes, as well as common Western musical chords (both from the MAPS dataset), served as the test signals, while the piano recordings from the MAPS dataset and the orchestra recordings from the Orchset dataset served as the musical signals. The results show that FLP improves on LP in terms of model complexity, while the prediction performance remains comparable.
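
The paper's closed-form coefficient expressions are not reproduced here; the sketch below only illustrates the general idea under stated assumptions: a single fractional term whose Grünwald-Letnikov binomial weights are truncated to the few most recent samples, with the fractional order chosen by a simple grid search. The predictor form, the truncation length, and the grid search are assumptions for illustration, not the paper's derivation.

```python
import numpy as np

def gl_weights(alpha, k):
    """Truncated Grünwald-Letnikov binomial weights (-1)^j * C(alpha, j), j = 0..k."""
    w = np.empty(k + 1)
    w[0] = 1.0
    for j in range(1, k + 1):
        w[j] = w[j - 1] * (j - 1 - alpha) / j
    return w

def flp_predict(x, alpha, memory=3):
    """One-step fractional linear prediction with 'restricted memory'.

    Assumed predictor: x_hat[n] = -sum_{j=1..memory} w_j(alpha) * x[n-j],
    which reduces to x_hat[n] = x[n-1] when alpha = 1 (ordinary first-order LP).
    """
    w = gl_weights(alpha, memory)
    x_hat = np.zeros_like(x)
    for n in range(memory, len(x)):
        past = x[n - memory:n][::-1]      # [x[n-1], x[n-2], ..., x[n-memory]]
        x_hat[n] = -np.dot(w[1:], past)
    return x_hat

def fit_alpha(x, memory=3, grid=np.linspace(0.1, 1.9, 181)):
    """Pick the fractional order that minimizes the mean squared prediction error."""
    errors = [np.mean((x[memory:] - flp_predict(x, a, memory)[memory:]) ** 2) for a in grid]
    return grid[int(np.argmin(errors))]

# Toy example on a synthetic damped sinusoid standing in for an audio frame.
n = np.arange(400)
signal = np.exp(-n / 300.0) * np.sin(2 * np.pi * 0.01 * n)
print("estimated fractional order:", round(fit_alpha(signal), 3))
```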


2016 ◽ Vol 13 (7) ◽ pp. 4588-4605
Author(s): Osama Abdo Mohamed

2016 ◽ Vol 28 (2) ◽ pp. 175-202
Author(s): William D. Raymond, Esther L. Brown, Alice F. Healy

Word production variability is widespread in speech, and rates of variant production correlate with many factors. Recent research suggests mental representation of both canonical word forms and distinct reduced variants, and that production and processing are sensitive to variant frequency. What factors lead to frequency-weighted variant representations? An experiment manipulated following context and word repetition for final t/d words in read, narrative English speech. Statistical modeling of the experimentally generated data showed higher final-segment deletion in tokens followed by consonant-initial words, but no evidence of increased deletion with repetition, regardless of context. Deletion rates were also higher the greater a word's cumulative exposure to consonant contexts (measured from distributional statistics), but there was no effect of word frequency. Token effects are interpreted in terms of articulation processes. The type-level context effect is interpreted within exemplar and usage-based models of language to suggest that experiences with word variants in contexts register as frequency-weighted representations.
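
The study's own statistical model (likely a mixed-effects regression over the experimental tokens) is not reproduced here. The snippet below is only a simplified fixed-effects sketch of the kind of analysis described, run on simulated data; the column names (`deleted`, `following_consonant`, `repetition`, `cumulative_consonant_exposure`, `log_word_frequency`) are hypothetical stand-ins for the study's actual variables.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 400

# Hypothetical token-level data: one row per produced t/d-final word token.
df = pd.DataFrame({
    "following_consonant": rng.integers(0, 2, n),            # next word starts with a consonant?
    "repetition": rng.integers(0, 3, n),                     # prior mentions in the discourse
    "cumulative_consonant_exposure": rng.uniform(0, 1, n),   # type-level rate of consonant contexts
    "log_word_frequency": rng.normal(3.0, 0.5, n),
})

# Simulated deletion outcome: following context and cumulative exposure raise deletion odds,
# repetition and word frequency do not (mirroring the reported pattern).
logit_p = -1.5 + 1.2 * df["following_consonant"] + 1.0 * df["cumulative_consonant_exposure"]
df["deleted"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

# Simplified fixed-effects logistic regression of deletion on the predictors discussed above.
model = smf.logit(
    "deleted ~ following_consonant + repetition + cumulative_consonant_exposure + log_word_frequency",
    data=df,
).fit(disp=False)
print(model.summary())
```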


2015 ◽ Vol 150 ◽ pp. 392-401
Author(s): K. López-de-Ipiña, J.B. Alonso-Hernández, J. Solé-Casals, C.M. Travieso-González, A. Ezeiza, ...

2014 ◽ Vol 22 (5) ◽ pp. 912-922
Author(s): Daniele Giacobello, Mads Grasboll Christensen, Tobias Lindstrom Jensen, Manohar N. Murthi, Soren Holdt Jensen, ...
