acoustic feature
Recently Published Documents


TOTAL DOCUMENTS

260
(FIVE YEARS 89)

H-INDEX

20
(FIVE YEARS 4)

2022 ◽  
Vol 3 (4) ◽  
pp. 295-307
Author(s):  
Subarna Shakya

Recent advances in digital signal processing technology have made personal computer-based data collection and analysis systems more resilient. Speaker recognition is a signal processing approach that uses the speaker-specific information contained in speech waves to identify the speaker automatically. This study examines systems that can recognize a wide range of emotional states in speech from a single source. Because it offers insight into human brain states, emotion recognition is an active topic in the development of human-computer interfaces for speech processing, where it is often necessary to recognize the emotional state of the user. This research attempts to discern emotional states such as anger, joy, neutrality, fear, and sadness using classification methods. An acoustic feature that measures unpredictability is used in conjunction with a non-linear signal quantification approach to identify emotions: the entropy measurements computed from each emotional signal are assembled into a feature vector. The acoustic features extracted from the speech signal are then used to train the proposed neural network, and the resulting representations are passed to a linear discriminant analysis (LDA) stage for final classification. The article also compares the proposed work with modern classifiers such as k-nearest neighbors, support vector machines, and linear discriminant analysis. A key advantage of the proposed algorithm is that it separates negative and positive emotional features, which yields good classification results. Cross-validation on the available Emotional Speech dataset shows that the single-source LDA classifier recognizes emotions in speech signals with above 90 percent accuracy across the emotional states considered.
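A minimal sketch of the pipeline this abstract outlines, assuming per-utterance speech samples are already available as NumPy arrays: a spectral-entropy-based feature vector is computed per utterance and a linear discriminant analysis classifier is cross-validated on it. The frame length, hop size, entropy summary statistics, and the synthetic data are illustrative assumptions, not the paper's settings, and the paper's neural-network stage is omitted.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def spectral_entropy(frame, n_fft=512):
    """Shannon entropy of the normalized power spectrum of one frame."""
    power = np.abs(np.fft.rfft(frame, n_fft)) ** 2
    p = power / (power.sum() + 1e-12)
    return -np.sum(p * np.log2(p + 1e-12))

def entropy_feature_vector(signal, frame_len=400, hop=160):
    """Frame the signal and summarize per-frame entropy (mean, std, min, max)."""
    frames = [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len, hop)]
    ent = np.array([spectral_entropy(f) for f in frames])
    return np.array([ent.mean(), ent.std(), ent.min(), ent.max()])

# Toy stand-in for the emotional speech corpus: 10 utterances per emotion class.
rng = np.random.default_rng(0)
X = np.vstack([entropy_feature_vector(rng.standard_normal(16000)) for _ in range(50)])
y = np.repeat(np.arange(5), 10)          # anger, joy, neutral, fear, sadness

lda = LinearDiscriminantAnalysis()
print("cross-validated accuracy:", cross_val_score(lda, X, y, cv=5).mean())
```

With real labeled utterances in place of the synthetic arrays, the same cross-validation call gives the per-emotion accuracy figure the abstract reports.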


PLoS ONE ◽  
2021 ◽  
Vol 16 (12) ◽  
pp. e0259140
Author(s):  
Cihun-Siyong Alex Gong ◽  
Chih-Hui Simon Su ◽  
Kuo-Wei Chao ◽  
Yi-Chu Chao ◽  
Chin-Kai Su ◽  
...  

The research describes the recognition and classification of the acoustic characteristics of amphibians using deep learning with deep neural network (DNN) and long short-term memory (LSTM) models for biological applications. First, original data are collected from 32 species of frogs and 3 species of toads commonly found in Taiwan. Second, two feature extraction algorithms, linear predictive coding (LPC) and Mel-frequency cepstral coefficients (MFCC), are used to collect amphibian bioacoustic features and construct the datasets. In addition, the principal component analysis (PCA) algorithm is applied to reduce the dimensionality of the training datasets. Next, the amphibian bioacoustic features are classified using the DNN and LSTM models. The PyTorch platform with a GPU (NVIDIA GeForce GTX 1050 Ti) is used to compute the acoustic feature classification results. Based on the two feature extraction algorithms, the sound feature datasets are classified and the results summarized in several tables and graphs. The classification results for the different bioacoustic features are verified and discussed in detail. This research seeks to identify the optimal combination of feature extraction and classification algorithms across all experimental processes.
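A hedged sketch of the feature pipeline described above, using librosa for MFCC extraction, scikit-learn for PCA, and a small PyTorch feed-forward network in place of the authors' DNN/LSTM models. Sampling rate, number of MFCCs, PCA dimensionality, and network sizes are placeholder assumptions, and the toy data merely stand in for the frog and toad recordings.

```python
import numpy as np
import librosa
from sklearn.decomposition import PCA
import torch
import torch.nn as nn

def mfcc_features(waveform, sr=44100, n_mfcc=20):
    """Mean MFCC vector over time for one recording."""
    mfcc = librosa.feature.mfcc(y=waveform, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

# Toy data standing in for the amphibian recordings (n_species classes).
rng = np.random.default_rng(1)
n_species, per_class = 5, 20
X = np.vstack([mfcc_features(rng.standard_normal(44100))
               for _ in range(n_species * per_class)])
y = np.repeat(np.arange(n_species), per_class)

X_red = PCA(n_components=10).fit_transform(X)         # dimensionality reduction

# Small feed-forward classifier trained full-batch for a few hundred steps.
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, n_species))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
xb, yb = torch.tensor(X_red, dtype=torch.float32), torch.tensor(y)
for _ in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(xb), yb)
    loss.backward()
    optimizer.step()
print("final training loss:", float(loss))
```

Swapping the feed-forward block for an LSTM over the frame-level MFCC sequence, rather than their time average, would follow the paper's second model variant.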


Processes ◽  
2021 ◽  
Vol 9 (12) ◽  
pp. 2286
Author(s):  
Ammar Amjad ◽  
Lal Khan ◽  
Hsien-Tsung Chang

Recently, identifying speech emotions in spontaneous databases has been a complex and demanding area of study. This research presents a new approach for recognizing semi-natural and spontaneous speech emotions with multiple feature fusion and deep neural networks (DNN). The proposed framework extracts the most discriminative features from hybrid acoustic feature sets. However, these feature sets may contain duplicate and irrelevant information, leading to inadequate emotion identification. Therefore, a support vector machine (SVM) algorithm is utilized to identify the most discriminative audio feature map after obtaining the relevant features learned by the fusion approach. We evaluated our approach on the eNTERFACE05 and BAUM-1s benchmark databases and observed identification accuracies of 76% and 59%, respectively, for speaker-independent experiments with the SVM. Furthermore, experiments on the eNTERFACE05 and BAUM-1s datasets indicate that the suggested framework outperforms current state-of-the-art techniques on semi-natural and spontaneous datasets.
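A minimal sketch of the final classification stage, assuming two already-extracted acoustic feature sets that are fused by simple concatenation before an SVM; a GroupShuffleSplit over synthetic speaker IDs is one way to approximate the speaker-independent evaluation mentioned above. Feature dimensions, kernel, and the fusion-by-concatenation choice are illustrative assumptions, not the authors' configuration, and the DNN fusion stage is not reproduced.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GroupShuffleSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n_clips = 300
feat_a = rng.standard_normal((n_clips, 40))           # e.g., spectral features
feat_b = rng.standard_normal((n_clips, 24))           # e.g., prosodic features
X = np.hstack([feat_a, feat_b])                       # fusion by concatenation
y = rng.integers(0, 6, size=n_clips)                  # six emotion classes
speakers = rng.integers(0, 20, size=n_clips)          # speaker id per clip

# Speaker-independent split: no speaker appears in both train and test sets.
train_idx, test_idx = next(GroupShuffleSplit(test_size=0.25, random_state=0)
                           .split(X, y, groups=speakers))

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X[train_idx], y[train_idx])
print("speaker-independent accuracy:", clf.score(X[test_idx], y[test_idx]))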


Author(s):  
A. V. Ivanov

The article is devoted to the study of the Russian linguistic term “udareniye” (“stress” / “accent”) from historico-linguistic, etymological, and lexicographic viewpoints. Currently, there are hardly any studies analyzing the term’s semantic development, or attempts to present its semantic structure in a dictionary based on textual and lexicographic sources ranging from the end of the 16th to the end of the 19th century, that would make it possible to trace the semantic changes of the term over time. The existing studies mainly address the issues of terminology developed in individual authors’ grammars, or describe the terminology of grammars related to a certain period of time or a specific epoch. The purpose of this article is to systematize etymological information about the term “udareniye” and to conduct a historico-linguistic analysis of its semantics, starting with the first Russian grammars and lexicons. The research uses comparative, definitional, etymological, semantic, and lexicographic methods. The semantic structure of the term “udareniye”, when taking into account its two main meanings (the emphasizing of a syllable in a word with the voice, and the reflection of such emphasis in writing by graphical means, e.g. diacritics), begins to take shape from its first written fixation in 1591 in Adelphotes and ends by 1830, when grammarians finally stopped associating the Russian accent, uniform by nature, with sound length and modulations of tone, and focused their attention on the study of intensity (pronouncing power), which can be regarded as the main acoustic feature of the stressed syllable in languages with dynamic stress. By this time, the Russian type of stress had ceased to be identified with the Greek and Church Slavonic patterns, and attempts to uncritically borrow the corresponding terminological nominations, with their forms and meanings, from Greek into Russian had come to an end. The term “udareniye” probably received its first lexicographic entry at the beginning of the 18th century, when Polikarpov-Orlov described its first meaning in his dictionary. In its second meaning, based on the Greek-Church Slavonic tradition, the term is found in almost all the text sources examined by the author, starting from Adelphotes, that is, from the end of the 16th century. Its first lexicographic fixation in that meaning took place in 1763 in Poletika’s dictionary.


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Chuan-Yu Chang ◽  
Sweta Bhattacharya ◽  
P. M. Durai Raj Vincent ◽  
Kuruva Lakshmanna ◽  
Kathiravan Srinivasan

A cry is a loud, high-pitched vocal communication of infants. A neonatal infant cry is characterized by a very high fundamental frequency and resonance frequency, with certain sudden variations. Furthermore, within a solitary utterance of very short duration, the cry signal possesses both voiced and unvoiced features. Infants mostly communicate with their caretakers through cries, and sometimes it becomes difficult for the caretakers to comprehend the reason behind a newborn infant's cry. As a result, this research proposes a novel method for classifying newborn infant cries into three groups: hunger, sleep, and discomfort. For each crying frame, twelve features are extracted through acoustic feature engineering, and variable selection using random forests is used to select the most discriminative of the twelve time- and frequency-domain features. Subsequently, an extreme gradient boosting-powered grouped-support-vector network is deployed for neonate cry classification. The empirical results show that the proposed method can effectively classify the neonate cries into the three groups. The best experimental results showed a mean accuracy of around 91% for most scenarios, which demonstrates the potential of the proposed extreme gradient boosting-powered grouped-support-vector network for neonate cry classification. The proposed method also achieves a fast recognition time of 27 seconds when identifying these emotional cries.
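A hedged sketch of the variable-selection step described above: a random forest ranks twelve synthetic per-frame features and only those above median importance are kept. The authors' extreme gradient boosting-powered grouped-support-vector network is not reproduced; scikit-learn's GradientBoostingClassifier stands in for the final stage purely to show how the selected features would be consumed.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n_frames = 600
X = rng.standard_normal((n_frames, 12))               # 12 time/frequency features per frame
y = rng.integers(0, 3, size=n_frames)                 # hunger / sleep / discomfort

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Random-forest importances decide which of the twelve features survive.
selector = SelectFromModel(RandomForestClassifier(n_estimators=200, random_state=0),
                           threshold="median")
selector.fit(X_tr, y_tr)

# Boosted classifier consumes only the selected features.
clf = GradientBoostingClassifier(random_state=0)
clf.fit(selector.transform(X_tr), y_tr)
print("accuracy:", clf.score(selector.transform(X_te), y_te))
```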


2021 ◽  
Vol 11 (21) ◽  
pp. 10475
Author(s):  
Xiao Zhou ◽  
Zhenhua Ling ◽  
Yajun Hu ◽  
Lirong Dai

An encoder–decoder with attention has become a popular method to achieve sequence-to-sequence (Seq2Seq) acoustic modeling for speech synthesis. To improve the robustness of the attention mechanism, methods utilizing the monotonic alignment between phone sequences and acoustic feature sequences have been proposed, such as stepwise monotonic attention (SMA). However, the phone sequences derived by grapheme-to-phoneme (G2P) conversion may not contain the pauses at the phrase boundaries in utterances, which challenges the assumption of strictly stepwise alignment in SMA. Therefore, this paper proposes inserting hidden states into phone sequences to handle the case in which pauses are not provided explicitly, and designs a semi-stepwise monotonic attention (SSMA) mechanism to model these inserted hidden states. In this method, the hidden states absorb the pause segments in utterances in an unsupervised way. Thus, the attention at each decoding frame has three options: moving forward to the next phone, staying at the same phone, or jumping to a hidden state. Experimental results show that SSMA achieves better naturalness of synthetic speech than SMA when phrase boundaries are not available. Moreover, the pause positions derived from the alignment paths of SSMA matched the manually labeled phrase boundaries quite well.
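A schematic, heavily simplified sketch of the alignment idea: a hidden pause state is interleaved after every phone, and a hard, sampled walk through the states may stay, move forward, or pass through (or skip) an inserted pause state. The transition probabilities and the sampling scheme are illustrative assumptions; the paper's SSMA is a soft, learned attention mechanism inside a Seq2Seq acoustic model, not a sampled walk.

```python
import numpy as np

def expand_with_pause_states(phones):
    """Interleave a hidden pause token after each phone."""
    expanded = []
    for p in phones:
        expanded.extend([p, "<pause>"])
    return expanded

def sample_alignment(states, n_frames, p_stay=0.5, p_skip_pause=0.8, seed=0):
    """Walk monotonically through the states; inserted pause states may be skipped."""
    rng = np.random.default_rng(seed)
    i, path = 0, []
    for _ in range(n_frames):
        path.append(states[i])
        if i + 1 < len(states) and rng.random() > p_stay:        # move forward
            i += 1
            # optionally skip the inserted pause state (pause not realized)
            if states[i] == "<pause>" and i + 1 < len(states) and rng.random() < p_skip_pause:
                i += 1
    return path

states = expand_with_pause_states(["d", "a", "g", "a"])
print(sample_alignment(states, n_frames=12))
```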


Author(s):  
Hannah P. Rowe ◽  
Kaila L. Stipancic ◽  
Adam C. Lammert ◽  
Jordan R. Green

Purpose This study investigated the criterion (analytical and clinical) and construct (divergent) validity of a novel, acoustic-based framework composed of five key components of motor control: Coordination, Consistency, Speed, Precision, and Rate. Method Acoustic and kinematic analyses were performed on audio recordings from 22 subjects with amyotrophic lateral sclerosis during a sequential motion rate task. Perceptual analyses were completed by two licensed speech-language pathologists, who rated each subject's speech on the five framework components and their overall severity. Analytical and clinical validity were assessed by comparing performance on the acoustic features to their kinematic correlates and to clinician ratings of the five components, respectively. Divergent validity of the acoustic-based framework was then assessed by comparing performance on each pair of acoustic features to determine whether the features represent distinct articulatory constructs. Bivariate correlations and partial correlations with severity as a covariate were conducted for each comparison. Results Results revealed moderate-to-strong analytical validity for every acoustic feature, both with and without controlling for severity, and moderate-to-strong clinical validity for all acoustic features except Coordination, without controlling for severity. When severity was included as a covariate, the strong associations for Speed and Precision became weak. Divergent validity was supported by weak-to-moderate pairwise associations between all acoustic features except Speed (second-formant [F2] slope of consonant transition) and Precision (between-consonant variability in F2 slope). Conclusions This study demonstrated that the acoustic-based framework has potential as an objective, valid, and clinically useful tool for profiling articulatory deficits in individuals with speech motor disorders. The findings also suggest that compared to clinician ratings, instrumental measures are more sensitive to subtle differences in articulatory function. With further research, this framework could provide more accurate and reliable characterizations of articulatory impairment, which may eventually increase clinical confidence in the diagnosis and treatment of patients with different articulatory phenotypes.
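A small sketch of the statistics named above: a bivariate Pearson correlation between an acoustic feature and a kinematic correlate, plus the partial correlation with severity regressed out of both variables. The synthetic data and variable names are placeholders, not the study's measurements.

```python
import numpy as np
from scipy import stats

def partial_corr(x, y, covariate):
    """Correlation between x and y after regressing the covariate out of both."""
    def residuals(v):
        slope, intercept, *_ = stats.linregress(covariate, v)
        return v - (slope * covariate + intercept)
    return stats.pearsonr(residuals(x), residuals(y))

rng = np.random.default_rng(4)
severity = rng.standard_normal(22)                           # clinician severity rating
acoustic = 0.8 * severity + 0.5 * rng.standard_normal(22)    # e.g., F2 slope measure
kinematic = 0.7 * severity + 0.5 * rng.standard_normal(22)   # e.g., articulator speed

print("bivariate:", stats.pearsonr(acoustic, kinematic))
print("partial (severity controlled):", partial_corr(acoustic, kinematic, severity))
```

Because both synthetic variables are driven by severity, the bivariate correlation is strong while the partial correlation shrinks, mirroring the pattern reported for Speed and Precision.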


2021 ◽  
Vol 11 ◽  
Author(s):  
Wu Zhou ◽  
Yong-Zhong Li ◽  
Li-Min Gao ◽  
Di-Ming Cai

Objective: Previous studies have mostly discussed the clinical manifestations and prognosis of mucinous breast carcinoma with a micropapillary pattern. The purposes of this study were to investigate the sonographic features of pure mucinous breast carcinoma with micropapillary pattern (MUMPC) and to identify the role of ultrasound in the differential diagnosis between MUMPC and conventional pure mucinous breast carcinoma (cPMBC).
Materials and Methods: We obtained written informed consent from all patients, and the Ethics Committee of West China Hospital approved this retrospective study. The study was conducted between May and August 2020. We enrolled 133 patients with 133 breast lesions confirmed histopathologically as mucinous breast carcinoma (MBC) between January 2014 and January 2020. We retrospectively assessed sonographic features (margin, shape, internal echogenicity, calcification, posterior acoustic feature, invasive growth, blood flow grade, and rate of missed diagnosis) and clinical characteristics (age, tumor size, tumor texture, initial symptom, and lymph node metastasis). Bivariable analyses were performed using SPSS version 19.0.
Results: The 133 lesions included 11 MUMPCs, 65 cPMBCs, and 57 mixed MBCs (MMBCs). There were significant differences in margin, shape, calcification, posterior acoustic feature, invasive growth, rate of missed diagnosis, average tumor size, and lymph node metastasis among the three groups (p < 0.05). Subsequent pairwise comparisons showed significant differences in lymph node metastasis, margin, and invasive growth between MUMPC and cPMBC (p < 0.05). In patients aged >45 years, there was a significant difference in tumor size among the three groups (p = 0.045), and paired comparison showed that the average tumor size in the cPMBC group was larger than that in the MMBC group (p = 0.014).
Conclusion: MUMPC showed a non-circumscribed margin and invasive growth more frequently than cPMBC did. Lymphatic metastasis was more likely to occur in MUMPC than in cPMBC. Ultrasound is helpful to distinguish MUMPC from cPMBC.


2021 ◽  
Author(s):  
Basil C Preisig ◽  
Lars Riecke ◽  
Alexis Hervais-Adelman

What processes lead to categorical perception of speech sounds? Investigation of this question is hampered by the fact that categorical speech perception is normally confounded by acoustic differences in the stimulus. By using ambiguous sounds, however, it is possible to dissociate acoustic from perceptual stimulus representations. We used a binaural integration task, where the inputs to the two ears were complementary so that phonemic identity emerged from their integration into a single percept. Twenty-seven normally hearing individuals took part in an fMRI study in which they were presented with an ambiguous syllable (intermediate between /da/ and /ga/) in one ear and with a meaning-differentiating acoustic feature (third formant) in the other ear. Multi-voxel pattern searchlight analysis was used to identify brain areas that consistently differentiated between response patterns associated with different syllable reports. By comparing responses to different stimuli with identical syllable reports and identical stimuli with different syllable reports, we disambiguated whether these regions primarily differentiated the acoustics of the stimuli or the syllable report. We found that BOLD activity patterns in the left anterior insula (AI), the left supplementary motor cortex, the left ventral motor cortex and the right motor and somatosensory cortex (M1/S1) represent listeners' syllable report irrespective of stimulus acoustics. The same areas have been previously implicated in decision-making (AI), response selection (SMA), and response initiation and feedback (M1/S1). Our results indicate that the emergence of categorical speech sounds implicates decision-making mechanisms and auditory-motor transformations acting on sensory inputs.
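A hand-rolled, hedged sketch of the searchlight logic: for each voxel of a toy volume, patterns from a small spherical neighborhood are used to decode the reported syllable with a cross-validated linear classifier, yielding an accuracy map. Volume size, radius, classifier, and the balanced synthetic labels are illustrative assumptions; the study's preprocessing and group-level statistics are not reproduced.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
shape = (8, 8, 8)                                      # toy brain volume
n_trials = 40
data = rng.standard_normal((n_trials,) + shape)        # trial-wise voxel patterns
reports = np.repeat([0, 1], 20)                        # /da/ vs. /ga/ report per trial

coords = np.array(np.meshgrid(*[np.arange(s) for s in shape], indexing="ij"))
coords = coords.reshape(3, -1).T                       # (n_voxels, 3) voxel coordinates
flat = data.reshape(n_trials, -1)                      # (n_trials, n_voxels)

def searchlight_accuracy(center, radius=2.0):
    """Cross-validated decoding accuracy within one spherical neighborhood."""
    mask = np.linalg.norm(coords - center, axis=1) <= radius
    return cross_val_score(LinearSVC(), flat[:, mask], reports, cv=5).mean()

# Accuracy map over all voxel centers (slow but explicit).
acc_map = np.array([searchlight_accuracy(c) for c in coords]).reshape(shape)
print("peak decoding accuracy:", acc_map.max())
```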

