Speech file compression by eliminating unvoiced/silence components

The main objective of this study is to eliminate the noise component of a speech signal and to compress the signal by storing the locations and durations of its silence regions. Voiced, unvoiced, and silence regions are separated using the Short Time Energy (STE) and Zero Crossing Rate (ZCR) methods. All operations in this study were performed through a User Interface (UI) developed in MATLAB®. These operations include recording speech, playing the recording, eliminating the unwanted regions, playing the modified recording, saving the original and compressed files, and loading a compressed recording.
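The idea of compressing by storing silence locations and durations can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation: the function names, the frame length, the energy threshold, and the choice to restore silence as zeros are all assumptions made for illustration, and only STE is used to keep the example short.

```python
def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)


def compress(samples, frame_len=4, energy_threshold=0.01):
    """Drop low-energy frames and record each silent run as (start, duration).

    frame_len and energy_threshold are illustrative values, not from the study.
    """
    kept, silences = [], []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        if short_time_energy(frame) < energy_threshold:
            if silences and silences[-1][0] + silences[-1][1] == start:
                # This frame continues the previous silent run: extend it.
                silences[-1] = (silences[-1][0], silences[-1][1] + len(frame))
            else:
                silences.append((start, len(frame)))
        else:
            kept.extend(frame)
    return kept, silences


def decompress(kept, silences):
    """Rebuild a full-length signal, writing zeros where silence was removed."""
    out, pos = [], 0  # pos indexes into the kept samples
    for start, duration in silences:
        take = start - len(out)  # kept samples that precede this silent run
        out.extend(kept[pos:pos + take])
        pos += take
        out.extend([0.0] * duration)
    out.extend(kept[pos:])
    return out
```

Only the kept samples and the short list of `(start, duration)` pairs need to be stored, which is where the size reduction comes from. The low-level noise inside the removed regions is not recovered.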


An effective age detection method based on short time energy and zero crossing rate

2014 2nd International Conference on Business and Information Management (ICBIM) ◽

10.1109/icbim.2014.6970942 ◽

2014 ◽

Cited By ~ 4

Author(s):

Dipen Nath ◽

Sanjib Kr. Kalita

Keyword(s):

Detection Method ◽

Zero Crossing ◽

Effective Age ◽

Zero Crossing Rate ◽

Short Time ◽

Short Time Energy


Musical Instrument Recognition using Zero Crossing Rate and Short-time Energy

International Journal of Applied Information Systems ◽

10.5120/ijais12-450131 ◽

2012 ◽

Vol 1 (3) ◽

pp. 16-19 ◽

Cited By ~ 4

Author(s):

Sumit Kumar Banchhor ◽

Arif Khan

Keyword(s):

Musical Instrument ◽

Zero Crossing ◽

Instrument Recognition ◽

Zero Crossing Rate ◽

Short Time ◽

Short Time Energy


Penerapan Algoritma Levenberg-Marquadt dan Backpropagation Neural Network Untuk Klasifikasi Suara Manusia

Jurnal Buana Informatika ◽

10.24002/jbi.v4i1.327 ◽

2012 ◽

Vol 4 (1) ◽

Author(s):

David David

Keyword(s):

Neural Network ◽

Speech Processing ◽

Backpropagation Neural Network ◽

Zero Crossing ◽

Spectral Centroid ◽

Human Voice ◽

Voice Signal ◽

Zero Crossing Rate ◽

Short Time ◽

Short Time Energy

Abstract. Voice recognition technology is currently growing, especially in the area of speech processing. Speech processing is a way to extract the desired information from a voice signal. This study discusses a system that classifies human voices as male or female. Extracting the characteristics of the voice signal for each frame, in both the time domain and the frequency domain, helps to simplify and speed up the calculations. The features used for voice or other audio include Short Time Energy, Zero Crossing Rate, Spectral Centroid, and others. Test results show that classifying the human voice with a backpropagation neural network, using the Levenberg-Marquardt algorithm to update the weight matrix, is very good and fast because the computational complexity is not too high. The voice database contains 40 samples, with 5 voices used as test data. The output of the system is the classification result, identified by a similarity value >= 0.5 for male and < 0.5 for female. Testing with the artificial neural network produced an average success rate in voice classification of 91%.

Keywords: Feature Extraction, Classification, Backpropagation, Levenberg-Marquardt Algorithm, Human Voice


The combination of spectral entropy, zero crossing rate, short time energy and linear prediction error for voice activity detection

2017 20th International Conference of Computer and Information Technology (ICCIT) ◽

10.1109/iccitechn.2017.8281794 ◽

2017 ◽

Cited By ~ 3

Author(s):

Thein Htay Zaw ◽

Nu War

Keyword(s):

Prediction Error ◽

Linear Prediction ◽

Activity Detection ◽

Spectral Entropy ◽

Zero Crossing ◽

Zero Crossing Rate ◽

Entropy Zero ◽

Short Time ◽

Voice Activity ◽

Short Time Energy


Optimization of K Parameters on KNN in Gamelan Jegog Title Classification Using Time Domain Features

JELIKU (Jurnal Elektronik Ilmu Komputer Udayana) ◽

10.24843/jlk.2021.v10.i01.p12 ◽

2021 ◽

Vol 10 (1) ◽

pp. 91

Author(s):

Adis Luh Sankhya Artayani ◽

Luh Arida Ayu Rahning Putri

Keyword(s):

Time Domain ◽

Nearest Neighbor ◽

Classification Method ◽

K Nearest Neighbor ◽

Zero Crossing ◽

Nearest Neighbor Classifier ◽

Zero Crossing Rate ◽

Short Time ◽

Neighbor Classifier ◽

Short Time Energy

Bali is one of the provinces of Indonesia with a rich culture and many arts, one of which is Gamelan Jegog Bali. Current technology makes it easier to find the title of a previously unknown song, and it can be applied to Gamelan Jegog pieces whose titles are unknown. The features used in this system are Short Time Energy and Zero Crossing Rate. The features are extracted from Gamelan Jegog and then used to find the best k parameter for the K-Nearest Neighbor classifier. The results showed that the highest accuracy was 45%, obtained when k is 9. The amount of data and the classification method both affect the accuracy of this system when compared to similar studies.
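Searching for the best k can be sketched in a few lines of Python. The abstract does not say how accuracy was measured, so leave-one-out evaluation, squared Euclidean distance, and all function names below are assumptions. Each item is a hypothetical `((ste, zcr), label)` pair.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training items nearest to query."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def leave_one_out_accuracy(data, k):
    """Classify each item using all the others and return the hit rate."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, features, k) == label
    return hits / len(data)


def best_k(data, candidates):
    """Return the candidate k with the highest leave-one-out accuracy."""
    return max(candidates, key=lambda k: leave_one_out_accuracy(data, k))
```

With only two features and a small data set, accuracy can vary strongly with k, which is consistent with the abstract's remark that data volume and classifier choice affect the result.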


Voiced and unvoiced separation in malay speech using zero crossing rate and energy

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v16.i2.pp775-780 ◽

2019 ◽

Vol 16 (2) ◽

pp. 775

Author(s):

Rafizah Mohd Hanifa ◽

Khalid Isa ◽

Shamsul Mohamad ◽

Shaharil Mohd Shah ◽

Shelena Soosay Nathan ◽

...

Keyword(s):

Real Time ◽

English Language ◽

Voice Recognition ◽

Time Data ◽

Zero Crossing ◽

Real Time Data ◽

Zero Crossing Rate ◽

Basic Characteristics ◽

Short Time ◽

Short Time Energy

This paper contributes to the literature on voice recognition in the context of non-English languages. Specifically, it aims to validate the techniques used to present the basic characteristics of speech, viz. voiced and unvoiced, that need to be evaluated when analysing speech signals. Zero Crossing Rate (ZCR) and Short Time Energy (STE) are used in this paper to pre-process continuous Malay speech and separate its voiced and unvoiced parts. The study is based on non-real-time data developed from a collection of audio speeches. The signal is assessed using ZCR and STE for comparison purposes. The results revealed that ZCR is low for voiced parts and high for unvoiced parts, whereas STE is high for voiced parts and low for unvoiced parts. Thus, these two techniques can be used effectively to separate voiced and unvoiced parts of continuous Malay speech.
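The reported relationship (low ZCR and high STE for voiced speech, the reverse for unvoiced speech) can be turned into a simple per-frame rule. The Python sketch below is illustrative only: the thresholds, the "uncertain" label for mixed cases, and the function names are assumptions, not values from the paper.

```python
def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def label_frame(frame, energy_threshold=0.1, zcr_threshold=0.3):
    """Label a frame from its STE and ZCR; thresholds are illustrative."""
    ste = short_time_energy(frame)
    zcr = zero_crossing_rate(frame)
    if ste >= energy_threshold and zcr <= zcr_threshold:
        return "voiced"
    if ste < energy_threshold and zcr > zcr_threshold:
        return "unvoiced"
    return "uncertain"  # e.g. silence or a transition between regions
```

In practice the thresholds would have to be tuned to the recording level and sampling rate of the speech data.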


Short-time energy, magnitude, zero crossing rate and autocorrelation measurement for discriminating voiced and unvoiced segments of speech signals

2013 The International Conference on Technological Advances in Electrical, Electronics and Computer Engineering (TAEECE) ◽

10.1109/taeece.2013.6557272 ◽

2013 ◽

Cited By ~ 46

Author(s):

Madiha Jalil ◽

Faran Awais Butt ◽

Ahmed Malik

Keyword(s):

Speech Signals ◽

Zero Crossing ◽

Zero Crossing Rate ◽

Short Time ◽

Short Time Energy ◽

Autocorrelation Measurement


Speaker‐independent word recognition method and system based upon zero‐crossing rate and energy measurement of analog speech signal

The Journal of the Acoustical Society of America ◽

10.1121/1.399796 ◽

1990 ◽

Vol 88 (2) ◽

pp. 1196-1196

Author(s):

Periagaram K. Rajasekaran

Keyword(s):

Word Recognition ◽

Speech Signal ◽

Energy Measurement ◽

Recognition Method ◽

Zero Crossing ◽

Speaker Independent ◽

Zero Crossing Rate


Stop gap removal using spectral parameters for stuttered speech signal

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2021/521032021 ◽

2021 ◽

Vol 10 (3) ◽

pp. 1862-1866

Keyword(s):

Speech Signal ◽

Spectral Energy ◽

Spectral Parameters ◽

Zero Crossing ◽

Syllable Repetition ◽

Zero Crossing Rate ◽

Automatic Removal

Stuttering is an involuntary disturbance in the fluent flow of speech, characterized by disfluencies such as stop gaps and sound or syllable repetition or prolongation. Stop gaps make up a high proportion of stuttering events. This work presents the automatic removal of stop gaps using a combination of spectral parameters such as spectral energy, centroid, entropy, and zero crossing rate. A threshold-based method for detecting and removing stop gaps is discussed in this paper.


Research on Speech Recognition Method in Multi Layer Perceptual Network Environment

International Journal of Circuits, Systems and Signal Processing ◽

10.46300/9106.2021.15.107 ◽

2021 ◽

Vol 15 ◽

pp. 996-1004

Author(s):

Kai Zhao ◽

Dan Wang

Keyword(s):

Speech Recognition ◽

Speech Signal ◽

Recognition Performance ◽

Recognition Rate ◽

Average Energy ◽

Signal Recognition ◽

Network Environment ◽

Recognition Method ◽

Zero Crossing ◽

Zero Crossing Rate

To address the low recognition rate of existing speech recognition methods, a speech recognition method for a multi-layer perceptual network environment is proposed. In this environment, the speech signal is first filtered using the transfer function of the filter. The signal is then windowed and divided into frames so that its silence segments can be removed. At the same time, the average energy and the zero crossing rate of the speech signal are calculated to extract its characteristics. By analyzing the principle of speech signal recognition, the speech recognition process is designed, and speech recognition in the multi-layer perceptual network environment is realized. The experimental results show that the method designed in this paper has good speech recognition performance.
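The filtering and windowing steps can be sketched in Python as follows. The abstract does not name the filter or the window, so the first-order pre-emphasis filter (transfer function H(z) = 1 - alpha * z^-1, with alpha = 0.97) and the Hamming window used here are assumptions chosen because they are common in speech front ends. The function names are hypothetical.

```python
import math


def pre_emphasis(samples, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1]; the first sample is passed through."""
    return samples[:1] + [
        samples[n] - alpha * samples[n - 1] for n in range(1, len(samples))
    ]


def hamming(n_points):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
        for n in range(n_points)
    ]


def windowed_frames(samples, frame_len, hop):
    """Split samples into overlapping frames and apply the window to each."""
    window = hamming(frame_len)
    return [
        [s * w for s, w in zip(samples[start:start + frame_len], window)]
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]
```

The average energy and zero crossing rate would then be computed on each frame returned by `windowed_frames`, and low-energy frames could be dropped as silence.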
