speech modeling
Recently Published Documents

TOTAL DOCUMENTS: 60 (FIVE YEARS: 3)
H-INDEX: 12 (FIVE YEARS: 1)

Sensors ◽ 2021 ◽ Vol 21 (8) ◽ pp. 2720
Author(s): Abdelrahman Ahmed, Khaled Shaalan, Sergio Toral, Yasser Hifny

The paper proposes three modeling techniques to improve performance evaluation of call center agents. The first technique is speech processing supported by an attention layer over the agent's recorded calls; 65 acoustic features are extracted with the Open-Smile toolkit to determine the context of the call. The second technique replaces the Softmax function in the attention layer with a Max Weights Similarity (MWS) function to improve classification accuracy: MWS fine-tunes the output of the text attention layer by measuring the similarity, in terms of distance, between the attention layer's input weights and the weights of the maximum vectors. The third technique combines the agent's recorded call speech with the corresponding transcribed text for binary classification. Both the speech model and the text model are built from combinations of Convolutional Neural Networks (CNNs) and Bi-directional Long Short-Term Memory networks (BiLSTMs). The classification results for each single-modality model (text versus speech) are reported and compared with the results of the multimodal approach: the multimodal classifier improves accuracy by 0.22% over the acoustic model and by 1.7% over the text model.
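
The abstract does not give the exact MWS formula, so the sketch below is only one plausible, hypothetical reading: each attention score is rated by its closeness to the maximum score and the ratings are renormalized, instead of passing the scores through Softmax. The function names and the distance-based similarity used here are assumptions for illustration, not the paper's definition.

```python
import numpy as np

def softmax_weights(scores):
    """Standard Softmax attention weighting (the baseline the paper replaces)."""
    e = np.exp(scores - scores.max())
    return e / e.sum()

def mws_weights(scores):
    """Hypothetical Max Weights Similarity (MWS) weighting.

    Assumption: each score is rated by its similarity (inverse distance)
    to the maximum score, and the similarities are normalized to sum to 1.
    The paper's exact formulation may differ.
    """
    sim = 1.0 / (1.0 + np.abs(scores.max() - scores))
    return sim / sim.sum()

def attention_pool(features, scores, weight_fn):
    """Weighted sum of time-step features using the chosen weighting function."""
    w = weight_fn(scores)   # shape (T,)
    return w @ features     # (T, D) -> (D,)

# Toy example: 5 time steps, 4-dimensional features, arbitrary alignment scores.
rng = np.random.default_rng(0)
features = rng.normal(size=(5, 4))
scores = rng.normal(size=5)
print(attention_pool(features, scores, softmax_weights))
print(attention_pool(features, scores, mws_weights))
```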


Mathematics ◽ 2019 ◽ Vol 7 (7) ◽ pp. 580
Author(s): Tomas Skovranek, Vladimir Despotovic

Fractional linear prediction (FLP), a generalization of conventional linear prediction (LP), has recently been applied successfully in different fields of research and engineering, such as biomedical signal processing, speech modeling, and image processing. The FLP model has a design similar to that of the conventional LP model, i.e., it uses a linear combination of “fractional terms” with different orders of fractional derivative. Assuming only one “fractional term” and using a limited number of previous samples for prediction, this paper presents an FLP model with “restricted memory” and derives closed-form expressions for the calculation of the FLP coefficients. This FLP model is directly comparable with the widely used low-order LP, as it uses the same number of previous samples but fewer predictor coefficients, making it more efficient. Two datasets, MIDI Aligned Piano Sounds (MAPS) and Orchset, were used for the experiments. Triads representing chords composed of three randomly chosen notes, as well as common Western musical chords (both from the MAPS dataset), served as the test signals, while the piano recordings from the MAPS dataset and the orchestra recordings from the Orchset dataset served as the musical signals. The results show that FLP improves on LP in terms of model complexity, while the prediction performance remains comparable.
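
The paper's closed-form coefficient expressions are not reproduced here; the sketch below only illustrates the general idea under stated assumptions: a single fractional term whose Grünwald-Letnikov binomial weights are truncated to the few most recent samples, with the fractional order chosen by a simple grid search. The predictor form, the truncation length, and the grid search are assumptions for illustration, not the paper's derivation.

```python
import numpy as np

def gl_weights(alpha, k):
    """Truncated Grünwald-Letnikov binomial weights (-1)^j * C(alpha, j), j = 0..k."""
    w = np.empty(k + 1)
    w[0] = 1.0
    for j in range(1, k + 1):
        w[j] = w[j - 1] * (j - 1 - alpha) / j
    return w

def flp_predict(x, alpha, memory=3):
    """One-step fractional linear prediction with 'restricted memory'.

    Assumed predictor: x_hat[n] = -sum_{j=1..memory} w_j(alpha) * x[n-j],
    which reduces to x_hat[n] = x[n-1] when alpha = 1 (ordinary first-order LP).
    """
    w = gl_weights(alpha, memory)
    x_hat = np.zeros_like(x)
    for n in range(memory, len(x)):
        past = x[n - memory:n][::-1]      # [x[n-1], x[n-2], ..., x[n-memory]]
        x_hat[n] = -np.dot(w[1:], past)
    return x_hat

def fit_alpha(x, memory=3, grid=np.linspace(0.1, 1.9, 181)):
    """Pick the fractional order that minimizes the mean squared prediction error."""
    errors = [np.mean((x[memory:] - flp_predict(x, a, memory)[memory:]) ** 2) for a in grid]
    return grid[int(np.argmin(errors))]

# Toy example on a synthetic damped sinusoid standing in for an audio frame.
n = np.arange(400)
signal = np.exp(-n / 300.0) * np.sin(2 * np.pi * 0.01 * n)
print("estimated fractional order:", round(fit_alpha(signal), 3))
```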


2016 ◽ Vol 13 (7) ◽ pp. 4588-4605
Author(s): Osama Abdo Mohamed

2016 ◽ Vol 28 (2) ◽ pp. 175-202
Author(s): William D. Raymond, Esther L. Brown, Alice F. Healy

Word production variability is widespread in speech, and rates of variant production correlate with many factors. Recent research suggests mental representation of both canonical word forms and distinct reduced variants, and that production and processing are sensitive to variant frequency. What factors lead to frequency-weighted variant representations? An experiment manipulated following context and word repetition for final t/d words in read, narrative English speech. Statistical modeling of the experimentally generated data showed higher final-segment deletion in tokens followed by consonant-initial words, but no evidence of increased deletion with repetition, regardless of context. Deletion rates were also higher the greater a word's cumulative exposure to consonant contexts (measured from distributional statistics), but there was no effect of word frequency. Token effects are interpreted in terms of articulation processes. The type-level context effect is interpreted within exemplar and usage-based models of language to suggest that experiences with word variants in contexts register as frequency-weighted representations.
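
The study's own statistical model (likely a mixed-effects regression over the experimental tokens) is not reproduced here. The snippet below is only a simplified fixed-effects sketch of the kind of analysis described, run on simulated data; the column names (`deleted`, `following_consonant`, `repetition`, `cumulative_consonant_exposure`, `log_word_frequency`) are hypothetical stand-ins for the study's actual variables.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 400

# Hypothetical token-level data: one row per produced t/d-final word token.
df = pd.DataFrame({
    "following_consonant": rng.integers(0, 2, n),            # next word starts with a consonant?
    "repetition": rng.integers(0, 3, n),                     # prior mentions in the discourse
    "cumulative_consonant_exposure": rng.uniform(0, 1, n),   # type-level rate of consonant contexts
    "log_word_frequency": rng.normal(3.0, 0.5, n),
})

# Simulated deletion outcome: following context and cumulative exposure raise deletion odds,
# repetition and word frequency do not (mirroring the reported pattern).
logit_p = -1.5 + 1.2 * df["following_consonant"] + 1.0 * df["cumulative_consonant_exposure"]
df["deleted"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

# Simplified fixed-effects logistic regression of deletion on the predictors discussed above.
model = smf.logit(
    "deleted ~ following_consonant + repetition + cumulative_consonant_exposure + log_word_frequency",
    data=df,
).fit(disp=False)
print(model.summary())
```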


2015 ◽ Vol 150 ◽ pp. 392-401
Author(s): K. López-de-Ipiña, J.B. Alonso-Hernández, J. Solé-Casals, C.M. Travieso-González, A. Ezeiza, ...

2014 ◽ Vol 22 (5) ◽ pp. 912-922
Author(s): Daniele Giacobello, Mads Grasboll Christensen, Tobias Lindstrom Jensen, Manohar N. Murthi, Soren Holdt Jensen, ...
