Recovery of missing speech packets using the short-time energy and zero-crossing measurements

Abstract. Voice recognition technology is currently growing, particularly in speech processing. Speech processing is a way to extract the desired information from a voice signal. This study discusses a system that classifies human voices as male or female. Extracting the characteristics of the voice signal from each frame in the time domain and the frequency domain helps simplify and speed up the calculations. Features for voice or other audio include Short Time Energy, Zero Crossing Rate, Spectral Centroid, and others. Test results show that classifying the human voice with a backpropagation neural network, using the Levenberg-Marquardt algorithm to update the weight matrix, works very well and is fast because the computational complexity is not too high. The voice database contains 40 samples, with 5 voices used as test data. The output of the system is the classification result, identified by a similarity value >= 0.5 for male and < 0.5 for female. Testing with the artificial neural network produced an average success rate in voice classification of 91%.
Keywords: Feature Extraction, Classification, Backpropagation, Levenberg-Marquardt Algorithm, Human Voice
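
As a rough illustration of one frequency-domain feature named in the abstract, the sketch below computes the spectral centroid of a single frame and applies the 0.5 decision threshold. It is a minimal sketch, not the authors' implementation: the function names, the naive DFT, and the assumption that the network emits a single score are illustrative choices.

```python
import cmath
import math


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of one frame, via a naive DFT."""
    n = len(frame)
    half = n // 2 + 1  # keep only the non-negative frequency bins
    mags = []
    for k in range(half):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        mags.append(abs(acc))
    total = sum(mags)
    if total == 0.0:
        return 0.0  # silent frame: the centroid is undefined, report 0 Hz
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


def label_from_score(score, threshold=0.5):
    """Decision rule from the abstract: >= 0.5 is male, < 0.5 is female."""
    return "male" if score >= threshold else "female"
```

A production system would normally use an FFT routine instead of the quadratic-time loop; the loop is kept here only so the sketch has no dependencies.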

The combination of spectral entropy, zero crossing rate, short time energy and linear prediction error for voice activity detection

2017 20th International Conference of Computer and Information Technology (ICCIT) ◽

10.1109/iccitechn.2017.8281794 ◽

2017 ◽

Cited By ~ 3

Author(s):

Thein Htay Zaw ◽

Nu War

Keyword(s):

Prediction Error ◽

Linear Prediction ◽

Activity Detection ◽

Spectral Entropy ◽

Zero Crossing ◽

Zero Crossing Rate ◽

Entropy Zero ◽

Short Time ◽

Voice Activity ◽

Short Time Energy

Optimization of K Parameters on KNN in Gamelan Jegog Title Classification Using Time Domain Features

JELIKU (Jurnal Elektronik Ilmu Komputer Udayana) ◽

10.24843/jlk.2021.v10.i01.p12 ◽

2021 ◽

Vol 10 (1) ◽

pp. 91

Author(s):

Adis Luh Sankhya Artayani ◽

Luh Arida Ayu Rahning Putri

Keyword(s):

Time Domain ◽

Nearest Neighbor ◽

Classification Method ◽

K Nearest Neighbor ◽

Zero Crossing ◽

Nearest Neighbor Classifier ◽

Zero Crossing Rate ◽

Short Time ◽

Neighbor Classifier ◽

Short Time Energy

Bali is one of the provinces of Indonesia with a rich culture and many arts, one of which is Gamelan Jegog. Current technology makes it easier to find the title of a previously unknown song, and the same approach can be applied to Gamelan Jegog pieces whose titles are unknown. The features used in this system are Short Time Energy and Zero Crossing Rate. These features are extracted from Gamelan Jegog recordings and then used to find the best k parameter for the K-Nearest Neighbor classifier. The results showed that the highest accuracy was 45%, obtained when k is 9. The amount of data and the classification method used affect the accuracy of this system when compared to similar studies.
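
The search for the best k can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: each recording is assumed to be reduced to one (Short Time Energy, Zero Crossing Rate) pair, accuracy is estimated by leave-one-out evaluation (the abstract does not say how it was measured), and all function names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out accuracy of k-NN on a list of (features, label) pairs."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # hold out sample i
        hits += knn_predict(rest, features, k) == label
    return hits / len(data)


def best_k(data, candidates):
    """Candidate k with the highest leave-one-out accuracy (first wins ties)."""
    return max(candidates, key=lambda k: loo_accuracy(data, k))
```

In practice the two features would usually be scaled to comparable ranges first, since Euclidean distance is otherwise dominated by whichever feature has the larger numeric spread.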

Voiced and unvoiced separation in Malay speech using zero crossing rate and energy

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v16.i2.pp775-780 ◽

2019 ◽

Vol 16 (2) ◽

pp. 775

Author(s):

Rafizah Mohd Hanifa ◽

Khalid Isa ◽

Shamsul Mohamad ◽

Shaharil Mohd Shah ◽

Shelena Soosay Nathan ◽

...

Keyword(s):

Real Time ◽

English Language ◽

Voice Recognition ◽

Time Data ◽

Zero Crossing ◽

Real Time Data ◽

Zero Crossing Rate ◽

Basic Characteristics ◽

Short Time ◽

Short Time Energy

This paper contributes to the literature on voice recognition in the context of non-English languages. Specifically, it aims to validate the techniques used to present the basic characteristics of speech, viz. voiced and unvoiced, that need to be evaluated when analysing speech signals. Zero Crossing Rate (ZCR) and Short Time Energy (STE) are used in this paper to perform signal pre-processing of continuous Malay speech to separate the voiced and unvoiced parts. The study is based on non-real-time data developed from a collection of audio speeches. The signal is assessed using ZCR and STE for comparison purposes. The results revealed that ZCR is low for voiced parts and high for unvoiced parts, whereas STE is high for voiced parts and low for unvoiced parts. Thus, these two techniques can be used effectively to separate voiced and unvoiced parts of continuous Malay speech.
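
The two measurements and the resulting decision can be sketched as below. This is a minimal sketch rather than the paper's code: the frame size, hop, and both thresholds are hypothetical parameters that would need tuning per recording, and STE is taken here as the mean of squared samples (an unnormalised or windowed sum is also common).

```python
def frames(signal, size, hop):
    """Split a sample list into full frames of `size`, advancing by `hop`."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def is_voiced(frame, energy_min, zcr_max):
    """Voiced frames show high energy and a low zero crossing rate."""
    return (short_time_energy(frame) >= energy_min
            and zero_crossing_rate(frame) <= zcr_max)
```

Requiring both conditions follows the pattern the abstract reports: voiced segments pair high STE with low ZCR, while unvoiced segments show the opposite.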

Short-time energy, magnitude, zero crossing rate and autocorrelation measurement for discriminating voiced and unvoiced segments of speech signals

2013 The International Conference on Technological Advances in Electrical, Electronics and Computer Engineering (TAEECE) ◽

10.1109/taeece.2013.6557272 ◽

2013 ◽

Cited By ~ 46

Author(s):

Madiha Jalil ◽

Faran Awais Butt ◽

Ahmed Malik

Keyword(s):

Speech Signals ◽

Zero Crossing ◽

Zero Crossing Rate ◽

Short Time ◽

Short Time Energy ◽

Autocorrelation Measurement

Speech file compression by eliminating unvoiced/silence components

Sustainable Engineering and Innovation, ISSN 2712-0562 ◽

10.37868/sei.v3i1.119 ◽

2021 ◽

Vol 3 (1) ◽

pp. 11-14

Author(s):

Arda Şahin ◽

Mehmet Zübeyir Ünlü

Keyword(s):

User Interface ◽

Speech Signal ◽

Noise Component ◽

Zero Crossing ◽

Zero Crossing Rate ◽

Short Time ◽

Short Time Energy ◽

Voice Recording

The main objective of this study is to eliminate the noise component of a speech signal and to compress the signal by storing the locations and durations of its silence regions. The separation between voiced, unvoiced, and silence regions is done using the Short Time Energy (STE) and Zero Crossing Rate (ZCR) methodologies. All operations in this study were performed using a User Interface (UI) developed in MATLAB®. These operations include recording a voice, playing the recording, eliminating the unwanted regions, playing the modified recording, saving the original and compressed files, and loading the compressed recording.
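
The idea of storing only the locations and durations of removed regions can be sketched as follows. This is a simplified illustration, not the authors' MATLAB implementation: it detects silence with an energy threshold alone (no ZCR test), the threshold and frame size are hypothetical parameters, and removed regions are rebuilt as zeros.

```python
def compress(signal, frame_size, energy_min):
    """Drop low-energy frames; record (start_sample, length) per dropped run."""
    kept, gaps = [], []
    for start in range(0, len(signal), frame_size):
        frame = signal[start:start + frame_size]
        energy = sum(x * x for x in frame) / len(frame)
        if energy >= energy_min:
            kept.extend(frame)
        elif gaps and gaps[-1][0] + gaps[-1][1] == start:
            # Extend the previous gap when this frame is adjacent to it.
            gaps[-1] = (gaps[-1][0], gaps[-1][1] + len(frame))
        else:
            gaps.append((start, len(frame)))
    return kept, gaps


def expand(kept, gaps):
    """Rebuild a signal of the original length, with zeros in dropped runs."""
    out, pos = [], 0
    for start, length in gaps:
        take = start - len(out)  # kept samples that precede this gap
        out.extend(kept[pos:pos + take])
        pos += take
        out.extend([0.0] * length)
    out.extend(kept[pos:])
    return out
```

The scheme is lossy in general: whatever low-level content sat in a dropped region is replaced by exact silence on playback.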

Über Energien von Drahtexplosionsstoßwellen / Energies of Shock Waves Produced by Wire Explosions

Zeitschrift für Naturforschung A ◽

10.1515/zna-1973-0118 ◽

1973 ◽

Vol 28 (1) ◽

pp. 105-109 ◽

Cited By ~ 1

Author(s):

H. Jäger ◽

R. Schöfer

Keyword(s):

Shock Wave ◽

Shock Waves ◽

Energy Input ◽

Discharge Circuit ◽

Expansion Velocity ◽

Input Condition ◽

Wire Material ◽

Short Time ◽

The Waves ◽

Short Time Energy

For shock waves produced by special wire explosions, the short time energy input condition of the theories of Lin, Sakurai and Vlases-Jones is fairly well fulfilled. In these cases the shock wave energies can be easily determined from the expansion velocity of the waves. Variations of the parameters of the discharge circuit show how these parameters should be chosen in order to get a maximum transfer of energy either to the shock waves or to the wire material.

Estimation of Most Probable Maximum From Short-Duration or Undersampled Time-Series Data

Volume 3: Structures, Safety and Reliability ◽

10.1115/omae2015-41701 ◽

2015 ◽

Author(s):

Puneet Agarwal ◽

William Walker ◽

Kenneth Bhalla

Keyword(s):

Time Series ◽

Gaussian Process ◽

Short Duration ◽

Time Series Data ◽

Extreme Value ◽

Series Data ◽

Sampled Data ◽

Zero Crossing ◽

Short Time ◽

Undersampled Data

The most probable maximum (MPM) is the extreme value statistic commonly used in the offshore industry. The extreme values of vessel motions, structural response, and environment are often expressed using the MPM. For a Gaussian process, the MPM is a function of the root-mean-square and the zero-crossing rate of the process. Accurate estimates of the MPM may be obtained in the frequency domain from spectral moments of the known power spectral density. If the MPM is to be estimated from the time series of a random process, either from measurements or from simulations, the time series data should be of long enough duration, sampled at an adequate rate, and have an ensemble of multiple realizations. This is not the case when measured data is recorded for an insufficient duration, or when one wants to make decisions (requiring an estimate of the MPM) in real time based on observing the data only for a short duration. Sometimes, the instrumentation system may not be properly designed to measure the dynamic vessel motions with a fine sampling rate, or it may be a legacy instrumentation system. The question then becomes whether the short-duration and/or undersampled data is useful at all, whether some useful information (i.e., an estimate of the MPM) can be extracted, and, if so, what the accuracy and uncertainty of such estimates are. In this paper, a procedure for estimation of the MPM from the short-time maxima, i.e., the maximum value from a time series of short duration (say, 10 or 30 minutes), is presented. For this purpose, pitch data is simulated from the vessel RAOs (response amplitude operators). Factors to convert the short-time maxima to the MPM are computed for various non-exceedance levels. It is shown that the factors estimated from simulation can also be obtained from the theory of extremes of a Gaussian process.
Afterwards, estimation of the MPM from the short-time maxima is explored for an undersampled process; however, undersampled data must not be used and only the adequately sampled data should be utilized. It is found that the undersampled data can be somewhat useful and factors to convert the short-time maxima to the MPM can be derived for an associated non-exceedance level. However, compared to the adequately sampled data, the factors for the undersampled data are less useful since they depend on more variables and have more uncertainty. While the vessel pitch data was the focus of this paper, the results and conclusions are valid for any adequately sampled narrow-banded Gaussian process.
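
The dependence of the MPM on the root-mean-square and the zero-crossing rate can be illustrated with the standard narrow-band Gaussian result, MPM = sigma * sqrt(2 ln(nu * T)), where sigma is the RMS, nu the zero-upcrossing rate, and T the duration. The sketch below is not taken from the paper: the function names are hypothetical, and the spectral moments are assumed to be defined over angular frequency in rad/s.

```python
import math


def mpm_gaussian(rms, zero_upcrossing_rate_hz, duration_s):
    """Most probable maximum of a narrow-band Gaussian process over a duration."""
    cycles = zero_upcrossing_rate_hz * duration_s
    if cycles <= 1.0:
        raise ValueError("duration must span more than one cycle")
    return rms * math.sqrt(2.0 * math.log(cycles))


def mpm_from_moments(m0, m2, duration_s):
    """Same estimate from spectral moments m0 and m2 (omega in rad/s)."""
    rms = math.sqrt(m0)
    rate_hz = math.sqrt(m2 / m0) / (2.0 * math.pi)  # zero-upcrossing rate
    return mpm_gaussian(rms, rate_hz, duration_s)
```

Because the duration enters only through a logarithm, the estimate grows slowly with exposure time, which is why conversion factors between short-time maxima and a longer-duration MPM stay modest.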
