Knowledge-Based Features for Place Classification of Unvoiced Stops

Abstract: The classification of unvoiced stops in consonant–vowel (CV) syllables, segmented from continuous speech, is investigated using features related to speech production. As burst and vocalic transitions contribute to identification of stops in the CV context, features are computed from both regions. Although formants are the truly discriminating articulatory features, their estimation from the speech signal is a challenge, especially in unvoiced regions like the release burst of stops. This may be compensated partially by sub-band energy-based features. In this work, formant features from the vocalic region are combined with features from the burst region comprising sub-band energies, as well as features from a formant tracking method developed for unvoiced regions. The overall combination of features at the classifier level obtains an accuracy of 84.4%, which is significantly better than that obtained with solely sub-band features on unvoiced stops in CV syllables of TIMIT.
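
To make the idea of sub-band energy features concrete, the following is a minimal sketch and not the paper's method. It assumes a short burst frame is already segmented, and the names `power_spectrum` and `subband_energies`, the band edges, and the lack of windowing are all illustrative choices. It computes a naive DFT power spectrum and returns the fraction of energy falling in each band.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0..N/2) of a real-valued frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def subband_energies(frame, sample_rate, band_edges_hz):
    """Relative energy per band; band i covers [edge[i], edge[i+1]) in Hz.

    The band edges are hypothetical and chosen by the caller. No window
    is applied, so off-bin components will leak into neighbouring bands.
    """
    spec = power_spectrum(frame)
    bin_hz = sample_rate / len(frame)
    energies = []
    for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        energies.append(sum(p for k, p in enumerate(spec)
                            if lo <= k * bin_hz < hi))
    total = sum(energies) or 1.0  # avoid division by zero on silence
    return [e / total for e in energies]


# Usage: a synthetic 3 kHz tone stands in for a burst frame.
frame = [math.sin(2 * math.pi * 3000 * i / 16000) for i in range(64)]
features = subband_energies(frame, 16000, [0, 2000, 4000, 8000])
```

Normalising by the total energy makes the features insensitive to overall burst loudness, which is one possible design choice among several.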

“I Can See What You’re Saying”: Clinical Utility of Spectral Moment Analysis

Perspectives on Speech Science and Orofacial Disorders ◽

10.1044/ssod21.2.44 ◽

2011 ◽

Vol 21 (2) ◽

pp. 44-54

Author(s):

Kerry Callahan Mandulak

Keyword(s):

Speech Production ◽

Speech Signal ◽

Clinical Utility ◽

Acoustic Analysis ◽

Moment Analysis ◽

Analysis Tool ◽

Spectral Moment ◽

Clinical Measure ◽

Perceptual Analysis ◽

Disordered Speech

Spectral moment analysis (SMA) is an acoustic analysis tool that shows promise for enhancing our understanding of normal and disordered speech production. It can augment auditory-perceptual analysis used to investigate differences across speakers and groups and can provide unique information regarding specific aspects of the speech signal. The purpose of this paper is to illustrate the utility of SMA as a clinical measure for both clinical speech production assessment and research applications documenting speech outcome measurements. Although acoustic analysis has become more readily available and accessible, clinicians need training with, and exposure to, acoustic analysis methods in order to integrate them into traditional methods used to assess speech production.
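
As an illustration of what spectral moments measure, here is a minimal sketch that treats a power spectrum as a probability distribution over frequency and computes its first four moments. The function name `spectral_moments` and the use of excess kurtosis (subtracting 3) are assumptions for this example. The abstract does not specify a particular formulation.

```python
import math


def spectral_moments(freqs_hz, power):
    """Return (mean, standard deviation, skewness, excess kurtosis).

    The power values are normalised to sum to 1 so that they act as
    weights over the frequency axis.
    """
    total = sum(power)
    if total <= 0:
        raise ValueError("power spectrum must have positive total energy")
    weights = [v / total for v in power]
    mean = sum(f * w for f, w in zip(freqs_hz, weights))
    var = sum((f - mean) ** 2 * w for f, w in zip(freqs_hz, weights))
    sd = math.sqrt(var)
    if sd == 0:
        # Degenerate case: all energy in one bin; higher moments undefined.
        return mean, 0.0, 0.0, 0.0
    skew = sum((f - mean) ** 3 * w for f, w in zip(freqs_hz, weights)) / sd ** 3
    kurt = sum((f - mean) ** 4 * w for f, w in zip(freqs_hz, weights)) / sd ** 4 - 3.0
    return mean, sd, skew, kurt


# Usage: a small symmetric toy spectrum centred on 2 kHz.
mean, sd, skew, kurt = spectral_moments([1000, 2000, 3000], [1.0, 2.0, 1.0])
```

The mean reflects where energy is concentrated, the standard deviation its spread, the skewness its tilt toward low or high frequencies, and the kurtosis its peakedness.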

NLOS Multipath Classification of GNSS Signal Correlation Output Using Machine Learning

Sensors ◽

10.3390/s21072503 ◽

2021 ◽

Vol 21 (7) ◽

pp. 2503

Author(s):

Taro Suzuki ◽

Yoshiharu Amano

Keyword(s):

Machine Learning ◽

Satellite System ◽

Training Data ◽

Support Vector ◽

Positioning Errors ◽

Automated Method ◽

Global Navigation Satellite ◽

Better Than ◽

Signal Correlation

This paper proposes a method for detecting non-line-of-sight (NLOS) multipath, which causes large positioning errors in a global navigation satellite system (GNSS). We use GNSS signal correlation output, which is the most primitive GNSS signal processing output, to detect NLOS multipath based on machine learning. The shape of the multi-correlator outputs is distorted due to the NLOS multipath. The features of the shape of the multi-correlator outputs are used to discriminate the NLOS multipath. We implement two supervised learning methods, a support vector machine (SVM) and a neural network (NN), and compare their performance. We also propose an automated method of collecting LOS and NLOS signal training data for machine learning. The evaluation of the proposed NLOS detection method in an urban environment confirmed that NN was better than SVM, and 97.7% of NLOS signals were correctly discriminated.

Deep Learning Methods for Classification of Certain Abnormalities in Echocardiography

Electronics ◽

10.3390/electronics10040495 ◽

2021 ◽

Vol 10 (4) ◽

pp. 495

Author(s):

Imayanmosha Wahlang ◽

Arnab Kumar Maji ◽

Goutam Saha ◽

Prasun Chakrabarti ◽

Michal Jasinski ◽

...

Keyword(s):

Deep Learning ◽

Short Term Memory ◽

Support Vector ◽

Variational Autoencoder ◽

Different Types ◽

Static Images ◽

Long Short Term Memory ◽

2D And 3D ◽

Better Than

This article experiments with deep learning methodologies in echocardiogram (echo), a promising and vigorously researched technique in the preponderance field. This paper involves two different kinds of classification in the echo. Firstly, classification into normal (absence of abnormalities) or abnormal (presence of abnormalities) has been done, using 2D echo images, 3D Doppler images, and videographic images. Secondly, videographic echo images are classified by type of regurgitation, namely, Mitral Regurgitation (MR), Aortic Regurgitation (AR), Tricuspid Regurgitation (TR), or a combination of the three. Two deep-learning methodologies are used for these purposes: a Recurrent Neural Network (RNN) based methodology (Long Short Term Memory (LSTM)) and an Autoencoder based methodology (Variational AutoEncoder (VAE)). The use of videographic images distinguishes this work from the existing work using SVM (Support Vector Machine), and the application of deep-learning methodologies is the first of many in this particular field. It was found that deep-learning methodologies perform better than the SVM methodology in normal or abnormal classification. Overall, VAE performs better on 2D and 3D Doppler images (static images), while LSTM performs better on videographic images.

Speech Emotional Features Extraction Based on Electroglottograph

Neural Computation ◽

10.1162/neco_a_00523 ◽

2013 ◽

Vol 25 (12) ◽

pp. 3294-3317 ◽

Cited By ~ 7

Author(s):

Lijiang Chen ◽

Xia Mao ◽

Pengfei Wei ◽

Angelo Compare

Keyword(s):

Emotion Recognition ◽

Speech Signal ◽

Vocal Tract ◽

Vocal Folds ◽

Distribution Coefficients ◽

Speech Emotion Recognition ◽

Support Vector ◽

Power Law Distribution ◽

Transform Coefficients ◽

Better Than

This study proposes two classes of speech emotional features extracted from electroglottography (EGG) and speech signal. The power-law distribution coefficients (PLDC) of voiced segments duration, pitch rise duration, and pitch down duration are obtained to reflect the information of vocal folds excitation. The real discrete cosine transform coefficients of the normalized spectrum of EGG and speech signal are calculated to reflect the information of vocal tract modulation. Two experiments are carried out. The first compares the proposed features and traditional features using sequential forward floating search and sequential backward floating search. The second is a comparative emotion recognition experiment based on a support vector machine. The results show that the proposed features are better than those commonly used in the case of speaker-independent and content-independent speech emotion recognition.
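
The discrete cosine transform step can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard unscaled DCT-II, normalises the magnitude spectrum by its peak, and keeps the first few coefficients. The abstract does not say which DCT variant, normalisation, or number of coefficients the authors used, and the names `dct_ii` and `normalized_spectrum_dct` are hypothetical.

```python
import math


def dct_ii(values):
    """Unscaled DCT-II: X[k] = sum_i x[i] * cos(pi/N * (i + 0.5) * k)."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi / n * (i + 0.5) * k)
            for i, v in enumerate(values))
        for k in range(n)
    ]


def normalized_spectrum_dct(magnitudes, num_coeffs):
    """Peak-normalise a magnitude spectrum, then keep leading DCT terms.

    Peak normalisation is an assumption made for this sketch.
    """
    peak = max(magnitudes)
    if peak <= 0:
        raise ValueError("spectrum must contain a positive magnitude")
    normalized = [m / peak for m in magnitudes]
    return dct_ii(normalized)[:num_coeffs]


# Usage: a toy 8-bin magnitude spectrum reduced to 4 coefficients.
coeffs = normalized_spectrum_dct([0.2, 0.9, 1.5, 0.7, 0.4, 0.3, 0.2, 0.1], 4)
```

Low-order coefficients capture the broad spectral envelope, so truncating the transform gives a compact description of the spectrum's overall shape.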

A Method for Classification of Transient Events in EEG Recordings: Application to Epilepsy Diagnosis

Methods of Information in Medicine ◽

10.1055/s-0038-1634122 ◽

2006 ◽

Vol 45 (06) ◽

pp. 610-621 ◽

Cited By ~ 24

Author(s):

A. T. Tzallas ◽

P. S. Karvelis ◽

C. D. Katsis ◽

S. Giannopoulos ◽

S. Konitsiotis ◽

...

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Epileptic Activity ◽

Knowledge Based System ◽

Knowledge Based ◽

Epilepsy Diagnosis ◽

Transient Events ◽

Artificial Neural ◽

Eeg Recordings

Summary. Objectives: The aim of the paper is to analyze transient events in inter-ictal EEG recordings, and classify epileptic activity into focal or generalized epilepsy using an automated method. Methods: A two-stage approach is proposed. In the first stage the observed transient events of a single channel are classified into four categories: epileptic spike (ES), muscle activity (EMG), eye blinking activity (EOG), and sharp alpha activity (SAA). The process is based on an artificial neural network. Different artificial neural network architectures have been tried and the network having the lowest error has been selected using the hold-out approach. In the second stage a knowledge-based system is used to produce diagnosis for focal or generalized epileptic activity. Results: The classification of transient events reported high overall accuracy (84.48%), while the knowledge-based system for epilepsy diagnosis correctly classified nine out of ten cases. Conclusions: The proposed method is advantageous since it effectively detects and classifies the undesirable activity into appropriate categories and produces a final outcome related to the existence of epilepsy.

Knowledge Based Classification of Circulation Patterns for Stochastic Precipitation Modeling

Stochastic and Statistical Methods in Hydrology and Environmental Engineering - Water Science and Technology Library ◽

10.1007/978-94-017-3083-9_2 ◽

1994 ◽

pp. 19-32 ◽

Cited By ~ 3

Author(s):

A. Bárdossy ◽

H. Muster ◽

L. Duckstein ◽

I. Bogardi

Keyword(s):

Knowledge Based ◽

Circulation Patterns ◽

Precipitation Modeling

Classification of CV transitions in continuous speech using neural network models

Proceedings of ICSIPNN '94. International Conference on Speech, Image Processing and Neural Networks ◽

10.1109/sipnn.1994.344956 ◽

2002 ◽

Cited By ~ 2

Author(s):

C.C. Sekhar ◽

B. Yegnanarayana

Keyword(s):

Neural Network ◽

Network Models ◽

Continuous Speech ◽

Neural Network Models

Automated speech signal analysis based on feature extraction and classification of spasmodic dysphonia: a performance comparison of different classifiers

International Journal of Speech Technology ◽

10.1007/s10772-017-9471-8 ◽

2017 ◽

Vol 21 (1) ◽

pp. 9-18 ◽

Cited By ~ 2

Author(s):

Snekhalatha Umapathy ◽

Shamila Rachel ◽

Rajalakshmi Thulasi

Keyword(s):

Feature Extraction ◽

Speech Signal ◽

Signal Analysis ◽

Performance Comparison ◽

Spasmodic Dysphonia ◽

A Performance

Distinctive Features in Speech Pathology: Phonology or Phonemics?

Journal of Speech and Hearing Disorders ◽

10.1044/jshd.4101.23 ◽

1976 ◽

Vol 41 (1) ◽

pp. 23-39 ◽

Cited By ~ 11

Author(s):

Frank Parker

Keyword(s):

Speech Production ◽

Speech Signal ◽

Speech Pathology ◽

Direct Relation ◽

Crucial Point ◽

Distinctive Features ◽

Phonological Structure ◽

Generative Theory ◽

Speech Science ◽

The One

Distinctive feature is not a unique concept within linguistic theory. It has two distinct theoretical bases: phonemic theory and generative theory. Phonemic theory assumes a direct correspondence between distinctive features (the elements of phonemes) and the speech signal. Although this assumption can be shown to be incorrect, it seems to be the one most widely held in speech science. Generative theory, on the other hand, assumes no such direct relation and consequently can account for certain linguistic phenomena that phonemic theory cannot. This theory then seems to be preferable to phonemic theory for a featural analysis of misarticulation. However, there is a problem. Chomsky and Halle’s system (generative theory) as it stands does not deal with the link between what it conceives to be the lowest level of linguistic structure (the phonetic matrix) and speech production. Therefore, Chomsky and Halle’s distinctive features cannot be applied fruitfully to all instances of misarticulation. The discrepancy that exists between phonological structure and the speech signal must be accounted for in a theory of speech production. This can be accomplished by recognizing a production matrix below the phonetic matrix, where segments are described in terms of production features. The crucial point is that no one-to-one relationship necessarily exists between distinctive features and production features.

Classification of coarse phonetic categories in continuous speech: statistical classifiers vs. temporal flow connectionist network

10.1109/icassp.1990.115544 ◽

2002 ◽

Cited By ~ 1

Author(s):

A. Aktas ◽

O. Schmidbauer ◽

K.H. Maier ◽

W.H. Feix

Keyword(s):

Continuous Speech ◽

Connectionist Network ◽

Statistical Classifiers ◽

Phonetic Categories ◽

Temporal Flow
