Co-occurrence Based Approach for Differentiation of Speech and Song

Proceedings of Intelligent Computing and Technologies Conference ◽

10.21467/proceedings.115.17 ◽

2021 ◽

Author(s):

Arijit Ghosal ◽

Ranjit Ghoshal

Keyword(s):

Speech Signal ◽

Audio Signal ◽

Auditory Signal ◽

Acoustic Feature ◽

Supervised Classifiers ◽

Genre Identification ◽

Song Discrimination ◽

Occurrence Matrix ◽

Short Time ◽

Short Time Energy

Discriminating speech from song in an audio signal is an active research topic. Earlier work mainly addressed discrimination of speech and non-speech; comparatively few efforts have targeted speech versus song. Speech and song discrimination is an important part of automatic audio classification because it is considered a fundamental step in hierarchical approaches to genre identification and audio archive generation. Previous efforts to discriminate speech and song have relied on frequency-domain and perceptual-domain features. This work proposes an acoustic feature that is low-dimensional and easy to compute. It is observed that the energy levels of speech and song signals differ markedly because speech lacks an instrumental background. The authors identify Short Time Energy (STE) as the acoustic feature best suited to reflect this difference. To study the energy variation precisely, a co-occurrence matrix of STE is generated and statistical features are extracted from it. For classification, several well-known supervised classifiers are used. The performance of the proposed feature set is compared with that of other approaches to demonstrate its advantage.
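
The feature pipeline in this abstract (frame-wise STE, a co-occurrence matrix over STE values, then statistics of that matrix) could be sketched roughly as follows. The abstract does not give the frame length, the number of quantization levels, or which statistics are used, so `frame_len`, `hop`, `levels`, and the four Haralick-style statistics below are assumptions chosen for illustration, not the authors' settings.

```python
import numpy as np

def short_time_energy(signal, frame_len=256, hop=128):
    """Sum of squared samples in each frame (frame_len and hop are assumed values)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.array([np.sum(signal[k * hop : k * hop + frame_len] ** 2)
                     for k in range(n_frames)])

def ste_cooccurrence(ste, levels=8):
    """Normalized counts of quantized STE level i being followed by level j."""
    # Quantize STE into `levels` equal-width bins over its own range (an assumption).
    edges = np.linspace(ste.min(), ste.max(), levels + 1)
    q = np.clip(np.digitize(ste, edges[1:-1]), 0, levels - 1)
    matrix = np.zeros((levels, levels))
    np.add.at(matrix, (q[:-1], q[1:]), 1)  # count each consecutive-frame pair
    return matrix / max(matrix.sum(), 1)

def cooccurrence_stats(p):
    """Haralick-style statistics; the paper's exact feature list is not stated."""
    i, j = np.indices(p.shape)
    nonzero = p[p > 0]
    return {
        "energy": float(np.sum(p ** 2)),
        "contrast": float(np.sum(p * (i - j) ** 2)),
        "homogeneity": float(np.sum(p / (1 + np.abs(i - j)))),
        "entropy": float(-np.sum(nonzero * np.log2(nonzero))),
    }

# Usage on a synthetic signal (stand-in for real audio).
rng = np.random.default_rng(0)
features = cooccurrence_stats(ste_cooccurrence(short_time_energy(rng.standard_normal(4096))))
print(features)
```

The resulting dictionary is a four-value feature vector per clip, which is the kind of low-dimensional input a supervised classifier could consume.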

Automatic syllabification of speech signal using short time energy and vowel onset points

International Journal of Speech Technology ◽

10.1007/s10772-018-9517-6 ◽

2018 ◽

Vol 21 (3) ◽

pp. 571-579 ◽

Cited By ~ 2

Author(s):

Leena Mary ◽

Anil P. Antony ◽

Ben P. Babu ◽

S. R. Mahadeva Prasanna

Keyword(s):

Speech Signal ◽

Vowel Onset Points ◽

Short Time ◽

Automatic Syllabification ◽

Short Time Energy

Speech file compression by eliminating unvoiced/silence components

Sustainable Engineering and Innovation, ISSN 2712-0562 ◽

10.37868/sei.v3i1.119 ◽

2021 ◽

Vol 3 (1) ◽

pp. 11-14

Author(s):

Arda Şahin ◽

Mehmet Zübeyir Ünlü

Keyword(s):

User Interface ◽

Speech Signal ◽

Noise Component ◽

Zero Crossing ◽

Zero Crossing Rate ◽

Short Time ◽

Short Time Energy ◽

Voice Recording

The main objective of this study is to eliminate the noise component of a speech signal and to compress the signal by storing the locations and durations of silence regions. Voiced, unvoiced, and silence regions are separated using Short Time Energy (STE) and Zero Crossing Rate (ZCR). All operations in this study were performed through a User Interface (UI) developed in MATLAB®. These operations include recording voice, playing the recording, eliminating the unwanted regions, playing the modified recording, saving the original and compressed files, and loading a compressed recording.
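
As a rough illustration of the STE/ZCR separation described here, the sketch below labels fixed-length frames and records silence regions as (start frame, length) pairs. It is written in Python rather than the MATLAB used in the study, and the decision rule and thresholds (`ste_thresh`, `zcr_thresh`) are assumptions: low energy is treated as silence, high energy with a low crossing rate as voiced, and everything else as unvoiced.

```python
import numpy as np

def frame_features(signal, frame_len=200, hop=200):
    """Per-frame Short Time Energy and Zero Crossing Rate."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    ste = np.empty(n_frames)
    zcr = np.empty(n_frames)
    for k in range(n_frames):
        frame = signal[k * hop : k * hop + frame_len]
        ste[k] = np.sum(frame ** 2)
        # Fraction of adjacent sample pairs whose signs differ.
        zcr[k] = np.mean(np.signbit(frame[:-1]) != np.signbit(frame[1:]))
    return ste, zcr

def label_frames(ste, zcr, ste_thresh, zcr_thresh):
    """Assumed rule: silence = low STE; voiced = high STE and low ZCR."""
    labels = np.full(len(ste), "unvoiced", dtype=object)
    labels[ste < ste_thresh] = "silence"
    labels[(ste >= ste_thresh) & (zcr < zcr_thresh)] = "voiced"
    return labels

def silence_runs(labels):
    """Return (start_frame, n_frames) for each run of silence frames."""
    runs, start = [], None
    for k, lab in enumerate(labels):
        if lab == "silence" and start is None:
            start = k
        elif lab != "silence" and start is not None:
            runs.append((start, k - start))
            start = None
    if start is not None:
        runs.append((start, len(labels) - start))
    return runs
```

Storing only the pairs returned by `silence_runs`, together with the non-silent samples, is one way to realize the "locations and durations" idea; the file format used by the authors is not described in the abstract.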

Modified GSC Method to Reduce the Distortion of the Enhanced Speech Signal Using Cross-Correlation and Sidelobe Neutralization

Applied Sciences ◽

10.3390/app11146288 ◽

2021 ◽

Vol 11 (14) ◽

pp. 6288

Author(s):

Hang Su ◽

Chang-Myung Lee

Keyword(s):

Speech Signal ◽

Output Signal ◽

Cross Correlation ◽

Acoustic Noise ◽

Audio Signal ◽

Noise Signal ◽

Lms Algorithm ◽

Least Mean Square ◽

Experiment Data ◽

Noise Component

The generalized sidelobe canceller (GSC) method is a common algorithm for enhancing audio signals using a microphone array. Distortion of the enhanced audio signal consists of two parts: the residual acoustic noise and the distortion of the desired audio signal, which means that the desired audio signal is damaged. This paper proposes a modified GSC method to reduce both kinds of distortion when the desired audio signal is a non-stationary speech signal. First, the cross-correlation coefficient between the canceling signal and the error signal of the least mean square (LMS) algorithm was added to the adaptive process of the GSC method to reduce the distortion of the enhanced signal when the energy of the desired signal frame increased suddenly. The sidelobe pattern of beamforming was then used to estimate the noise signal in the beamforming output signal of the GSC method. The noise component of the beamforming output signal was decreased by subtracting the estimated noise signal to improve the denoising performance of the GSC method. Finally, the GSC-SN-MCC method was proposed by merging the above two methods. The experiment was performed in an anechoic chamber to validate the proposed method under various SNR conditions. Furthermore, a simulation with inaccurate noise directions was conducted on the experiment data to inspect the robustness of the proposed method to errors in the estimated noise direction. The experiment data and calculation results indicated that the proposed method could reduce the distortion effectively under various SNR conditions and would not cause more distortion even if the estimated noise direction is far from the actual noise direction.
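
For orientation, the two quantities this abstract combines can be sketched in isolation: a textbook LMS noise canceller, which produces a canceling signal and an error signal, and the correlation coefficient between two signals. This is not the GSC-SN-MCC method; how the coefficient enters the adaptation is not described in the abstract, and the tap count `n_taps` and step size `mu` are assumed values.

```python
import numpy as np

def lms_cancel(primary, reference, n_taps=8, mu=0.01):
    """Standard LMS canceller: subtract a filtered copy of `reference` from `primary`."""
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    weights = np.zeros(n_taps)
    cancelling = np.zeros_like(primary)
    error = primary.copy()  # the first n_taps - 1 samples pass through unchanged
    for n in range(n_taps - 1, len(primary)):
        x = reference[n - n_taps + 1 : n + 1][::-1]  # most recent sample first
        cancelling[n] = weights @ x
        error[n] = primary[n] - cancelling[n]
        weights += mu * error[n] * x  # LMS weight update
    return error, cancelling, weights

def correlation_coefficient(a, b):
    """Pearson correlation coefficient; returns 0.0 when either signal is constant."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
    return 0.0 if denom == 0 else float(np.sum(a * b) / denom)
```

In a plain LMS canceller the error signal is also the enhanced output, so a high correlation between `cancelling` and `error` within a frame suggests that desired signal is leaking into the cancellation path. That is one plausible reading of why the authors monitor this coefficient.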

Über Energien von Drahtexplosionsstoßwellen / Energies of Shock Waves Produced by Wire Explosions

Zeitschrift für Naturforschung A ◽

10.1515/zna-1973-0118 ◽

1973 ◽

Vol 28 (1) ◽

pp. 105-109 ◽

Cited By ~ 1

Author(s):

H. Jäger ◽

R. Schöfer

Keyword(s):

Shock Wave ◽

Shock Waves ◽

Energy Input ◽

Discharge Circuit ◽

Expansion Velocity ◽

Input Condition ◽

Wire Material ◽

Short Time ◽

The Waves ◽

Short Time Energy

For shock waves produced by special wire explosions, the short time energy input condition of the theories of Lin, Sakurai and Vlases-Jones is fairly well fulfilled. In these cases the shock wave energies can be easily determined from the expansion velocity of the waves. Variation of the parameters of the discharge circuit shows how these parameters should be chosen in order to get a maximum transfer of energy either to the shock waves or to the wire material.

Voice activity detection based on short-time energy and noise spectrum adaptation

6th International Conference on Signal Processing, 2002. ◽

10.1109/icosp.2002.1181092 ◽

2003 ◽

Cited By ~ 9

Author(s):

Dong Enqing ◽

Liu Guizhong ◽

Zhou Yatong ◽

Cai Yu

Keyword(s):

Noise Spectrum ◽

Voice Activity Detection ◽

Activity Detection ◽

Short Time ◽

Spectrum Adaptation ◽

Voice Activity ◽

Short Time Energy

Packet Loss Concealment Using Dual-Side Waveform Similarity Overlap-and-Add

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.284-287.2867 ◽

2013 ◽

Vol 284-287 ◽

pp. 2867-2871 ◽

Cited By ~ 1

Author(s):

Jui Feng Yeh ◽

Min Da Kuo ◽

Zhong Hua Hsu

Keyword(s):

Packet Loss ◽

Speech Signal ◽

Side Information ◽

Audio Signal ◽

Data Communication ◽

Speech Communication ◽

Waveform Similarity ◽

Voice Data ◽

Traditional Approaches

Packet loss is one of the most important problems in speech communication. In voice over IP it causes information loss and discomfort for listeners. This investigation proposes an approach based on a waveform similarity measure using the overlap-and-add algorithm. The waveform similarity overlap-and-add (WSOLA) technique is an effective algorithm for packet loss concealment (PLC). For real-time communication, the WSOLA algorithm is widely used for length adaptation and packet loss concealment of speech signals. Time-scale modification of audio signals is one of the most important research topics in data communication, especially in voice over IP (VoIP). Herein, we propose the dual-side WSOLA, which is derived from standard WSOLA. Instead of exploiting speech data in only one direction, the proposed method reconstructs the lost voice data from both the preceding and the following voice. The dual-side WSOLA can use both past and future speech waveforms to reconstruct the voice waveform of a lost packet. The evaluations show that the quality of the speech signal reconstructed by the dual-side WSOLA is higher than that of standard WSOLA and GWSOLA across different packet loss rates and lengths, as measured by PESQ and MOS. The significant improvement is obtained from the dual-side information in the proposed method. The proposed dual-side waveform similarity overlap-and-add (DSWSOLA) outperforms the traditional approaches.

Pitch-Synchronous Linear Prediction Analysis of High-Pitched Speech Using Weighted Short-Time Energy Function

Journal of Signal Processing ◽

10.2299/jsp.19.55 ◽

2015 ◽

Vol 19 (2) ◽

pp. 55-66 ◽

Cited By ~ 3

Author(s):

Liqing Liu ◽

Tetsuya Shimamura

Keyword(s):

Energy Function ◽

Linear Prediction ◽

Prediction Analysis ◽

Short Time ◽

Short Time Energy ◽

Linear Prediction Analysis

The Immediate Processing of Sentences

Quarterly Journal of Experimental Psychology ◽

10.1080/00335557743000099 ◽

1977 ◽

Vol 29 (1) ◽

pp. 135-146 ◽

Cited By ~ 12

Author(s):

D. W. Green

Keyword(s):

Reaction Time ◽

Speech Signal ◽

Semantic Processing ◽

Voluntary Control ◽

Auditory Signal ◽

Speech Comprehension ◽

Integrative Process

Two independent groups of subjects, under instruction orienting them towards understanding or towards memorizing sentences, were timed to respond to a brief auditory signal which occurred at some point during the course of a sentence. Latency appeared to be primarily a function of the task, such that the deeper the semantic processing of the sentence the longer the reaction time. Together with other aspects of the data, it is argued that such tasks affect the extent to which a subject retrieves the meanings of the words in a sentence and integrates them at the end of it. Concrete and abstract sentences were processed in fundamentally the same way. The conclusion drawn is that speech comprehension is an integrative process, under voluntary control, which collates different aspects of the speech signal.

Modular Toroidal Copper Coil for the Investigation of Inductive Pulsed Power Generators in the MJ-Range

10.36227/techrxiv.10732424 ◽

2019 ◽

Author(s):

Oliver Liebfried ◽

Volker Brommer ◽

Harald Scharf ◽

Matthias Schacherer ◽

Paul Frings

Keyword(s):

Energy Storage ◽

Stress Test ◽

Electrical Characterization ◽

Pulsed Power ◽

Special Issue ◽

Power Generators ◽

Short Time ◽

Copper Coil ◽

Poster Contribution ◽

Short Time Energy

Poster contribution to the 26th International Conference on Magnet Technology (MT26) in Vancouver, Canada, September 22-27, 2019. The paper was submitted to the MT26 special issue of the IEEE Transactions on Applied Superconductivity.

Abstract: Inductive pulsed power generators use coils as powerful short-time energy storage, which is a common means of delivering high-power pulses to loads such as electromagnetic accelerators. This article deals with the design, simulation, construction, electrical characterization and a pulsed stress test of a modular toroidal coil. The coil was made from 180 D-shaped copper discs and has an approximate inductance of 1 mH (f > 50 Hz) and a frequency-dependent resistance of 3.88 mΩ × sqrt(f) + 5 mΩ. Its height, diameter and weight are 0.4 m, 1 m and 1 ton, respectively. It is designed to store more than 1 MJ of energy.
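
To put the quoted figures in context, the short calculation below evaluates the stated resistance fit and the current implied by the standard inductor energy relation E = (1/2) L I^2. The function names are illustrative, the inductance is taken as exactly 1 mH, and the energy relation assumes an ideal, current-independent inductance, which the abstract does not claim.

```python
import math

INDUCTANCE_H = 1e-3  # approximate inductance from the abstract, taken as exact here

def coil_resistance_ohm(frequency_hz):
    """Resistance fit quoted in the abstract: 3.88 mOhm * sqrt(f) + 5 mOhm."""
    return 3.88e-3 * math.sqrt(frequency_hz) + 5e-3

def current_for_energy_a(energy_j, inductance_h=INDUCTANCE_H):
    """Current needed to store energy_j in an ideal inductor, from E = L * I**2 / 2."""
    return math.sqrt(2 * energy_j / inductance_h)

print(coil_resistance_ohm(100.0))   # resistance at 100 Hz, in ohms
print(current_for_energy_a(1e6))    # current for 1 MJ, in amperes
```

Under these assumptions, storing 1 MJ requires a current on the order of 45 kA, which gives a sense of the scale of the pulsed stress test.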

Detection of Activities During Newborn Resuscitation Based on Short-Time Energy of Acceleration Signal

Lecture Notes in Computer Science - Image and Signal Processing ◽

10.1007/978-3-319-33618-3_27 ◽

2016 ◽

pp. 262-270 ◽

Cited By ~ 1

Author(s):

Huyen Vu ◽

Trygve Eftestøl ◽

Kjersti Engan ◽

Joar Eilevstjønn ◽

Ladislaus Blacy Yarrot ◽

...

Keyword(s):

Newborn Resuscitation ◽

Acceleration Signal ◽

Short Time ◽

Short Time Energy
