Speech file compression by eliminating unvoiced/silence components

2021 ◽  
Vol 3 (1) ◽  
pp. 11-14
Author(s):  
Arda Şahin ◽  
Mehmet Zübeyir Ünlü

The main objective of this study is to eliminate the noise component of a speech signal and to compress the signal by storing the locations and durations of its silence regions. Voiced, unvoiced, and silence regions are separated using the Short Time Energy (STE) and Zero Crossing Rate (ZCR) methodologies. All operations in this study were performed through a User Interface (UI) developed in MATLAB®. These operations include recording voice, playing the recording, eliminating the unwanted regions, playing the modified recording, saving the original and compressed files, and loading the compressed recording.
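The idea of compressing by storing silence locations and durations can be sketched roughly as follows. This is a minimal illustration in Python, not the authors' MATLAB implementation: the function names, the frame length, and the energy threshold are all hypothetical, and only STE is used to mark a frame as silent.

```python
def frame_energy(frame):
    """Short-time energy: sum of squared samples in one frame."""
    return sum(s * s for s in frame)


def compress(signal, frame_len=4, threshold=0.01):
    """Drop silent frames; record (start_sample, n_samples) per silent run.

    frame_len and threshold are illustrative values, not taken from the paper.
    """
    kept, silences = [], []
    for start in range(0, len(signal), frame_len):
        frame = signal[start:start + frame_len]
        if frame_energy(frame) < threshold:
            # Extend the previous run if this frame is contiguous with it.
            if silences and silences[-1][0] + silences[-1][1] == start:
                silences[-1] = (silences[-1][0], silences[-1][1] + len(frame))
            else:
                silences.append((start, len(frame)))
        else:
            kept.extend(frame)
    return kept, silences


def decompress(kept, silences):
    """Rebuild a full-length signal, filling recorded silent runs with zeros."""
    out, pos = [], 0
    for start, length in silences:
        take = start - len(out)  # non-silent samples that precede this run
        out.extend(kept[pos:pos + take])
        pos += take
        out.extend([0.0] * length)
    out.extend(kept[pos:])
    return out
```

Storing each run as a `(start, duration)` pair keeps the timing of the original recording recoverable, at the cost of replacing low-level background noise with exact zeros.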

2012 ◽  
Vol 4 (1) ◽  
Author(s):  
David David

Abstract. Voice recognition technology is currently developing rapidly, especially in speech processing. Speech processing is a way to extract the desired information from a voice signal. This study discusses a system for classifying human voices as male or female. Extracting the characteristics of the voice signal for each frame, in the time domain and the frequency domain, helps to simplify and speed up the calculations. The features for voice or other audio include Short Time Energy, Zero Crossing Rate, Spectral Centroid, and others. Test results show that classifying the human voice using a backpropagation neural network, with the Levenberg-Marquardt algorithm to update the weight matrix, performs very well and quickly because the computational complexity is not too high. The voice database contains 40 samples, with 5 voices used as test data. The output of the system is the classification result, identified by a similarity value >= 0.5 for male and < 0.5 for female. Testing with the artificial neural network produced an average success rate in voice classification of 91%.
Keywords: Feature Extraction, Classification, Backpropagation, Levenberg-Marquardt Algorithm, Human Voice
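Of the features listed, the spectral centroid is the one frequency-domain example, so a small sketch may help. The code below is an assumption-laden illustration: it uses a naive DFT and the common magnitude-weighted definition of the centroid, neither of which is confirmed by the abstract, and the function names are hypothetical. The final helper only restates the 0.5 decision rule from the abstract; the neural network itself is not reproduced.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2, computed with a naive O(N^2) DFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(
            frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        mags.append(abs(acc))
    return mags


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of the frame, in Hz."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0:
        return 0.0
    bin_hz = sample_rate / len(frame)
    return sum(k * bin_hz * m for k, m in enumerate(mags)) / total


def label_from_score(score):
    """Decision rule from the abstract: >= 0.5 is male, otherwise female."""
    return "male" if score >= 0.5 else "female"
```

For a pure tone the centroid sits at the tone's frequency, which is why it is often read as a rough measure of where the spectral "mass" of a frame lies.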


2021 ◽  
Vol 10 (1) ◽  
pp. 91
Author(s):  
Adis Luh Sankhya Artayani ◽  
Luh Arida Ayu Rahning Putri

Bali is one of the provinces of Indonesia with a rich culture and many arts, one of which is Gamelan Jegog Bali. Current technology makes it easier to find the title of a previously unknown song, and it can be applied to Gamelan Jegog pieces whose titles are unknown. The features used in this system are Short Time Energy and Zero Crossing Rate. The features are extracted from Gamelan Jegog and then used to find the best k parameter for the K-Nearest Neighbor classifier. The results showed that the highest accuracy was 45%, obtained when k is 9. The amount of data and the classification method used both affect the accuracy of this system when compared to similar studies.
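Searching for the best k might look like the sketch below. The abstract does not say how accuracy was estimated, so leave-one-out evaluation is an assumption here, as are the function names and the use of Euclidean distance over (STE, ZCR) feature pairs.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out accuracy: classify each sample using all the others."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, features, k) == label
    return hits / len(data)


def best_k(data, candidates):
    """Return the candidate k with the highest leave-one-out accuracy."""
    return max(candidates, key=lambda k: loo_accuracy(data, k))
```

Each item in `data` is a `(features, label)` pair. With a small dataset, a large k can pull in more neighbours from other classes than from the correct one, which is one way the amount of data affects accuracy.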


Author(s):  
Rafizah Mohd Hanifa ◽  
Khalid Isa ◽  
Shamsul Mohamad ◽  
Shaharil Mohd Shah ◽  
Shelena Soosay Nathan ◽  
...  

This paper contributes to the literature on voice recognition in the context of a non-English language. Specifically, it aims to validate the techniques used to present the basic characteristics of speech, viz. voiced and unvoiced, that need to be evaluated when analysing speech signals. Zero Crossing Rate (ZCR) and Short Time Energy (STE) are used in this paper to pre-process continuous Malay speech and separate the voiced and unvoiced parts. The study is based on non-real-time data developed from a collection of audio speeches. The signal is assessed using ZCR and STE for comparison purposes. The results revealed that ZCR is low for voiced parts and high for unvoiced parts, whereas STE is high for voiced parts and low for unvoiced parts. Thus, these two techniques can be used effectively to separate voiced and unvoiced parts of continuous Malay speech.
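The reported relationship (low ZCR and high STE for voiced parts, the reverse for unvoiced parts) can be illustrated with a short sketch. The two threshold values and the function names below are hypothetical; the abstract does not state how the decision was made.

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(s * s for s in frame) / len(frame)


def classify_frame(frame, zcr_threshold=0.3, ste_threshold=0.05):
    """Label a frame 'voiced' when ZCR is low and STE is high.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    zcr = zero_crossing_rate(frame)
    ste = short_time_energy(frame)
    if zcr < zcr_threshold and ste > ste_threshold:
        return "voiced"
    return "unvoiced"
```

A slow, strong sinusoid stands in for a voiced frame, and a weak, rapidly alternating sequence stands in for an unvoiced one; real speech would need thresholds tuned to the recording conditions.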


Stuttering is an involuntary disturbance in the fluent flow of speech, characterized by disfluencies such as stop gaps and sound or syllable repetition or prolongation. Stop gaps make up a high proportion of stuttering disfluencies. This work presents automatic removal of stop gaps using a combination of spectral parameters such as spectral energy, spectral centroid, entropy, and zero crossing rate. A threshold-based method for detecting and removing stop gaps is discussed in this paper.
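The abstract names entropy among the thresholded parameters but gives no formula. One common definition, assumed here, is the Shannon entropy of the normalised power spectrum; the function names and the scaling to the range 0 to 1 are illustrative choices, and how the value is thresholded is left open because the source does not specify it.

```python
import cmath
import math


def power_spectrum(frame):
    """Power of DFT bins 0..N/2, computed with a naive O(N^2) DFT."""
    n = len(frame)
    return [
        abs(sum(
            frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )) ** 2
        for k in range(n // 2 + 1)
    ]


def spectral_entropy(frame):
    """Shannon entropy of the normalised power spectrum, scaled to [0, 1]."""
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0:
        return 0.0  # an all-zero frame has no spectrum to measure
    probs = [p / total for p in power if p > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(power))
```

A frame dominated by one frequency scores near 0, while a frame whose power is spread evenly across all bins scores 1, so the value separates tonal frames from noise-like ones.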


Author(s):  
Kai Zhao ◽  
Dan Wang

To address the problem of low recognition rates in speech recognition methods, a speech recognition method in a multi-layer perceptual network environment is proposed. In this environment, the speech signal is first filtered using the filter's transfer function. The signal is then windowed and divided into frames, and its silence segments are removed. At the same time, the average energy and the zero crossing rate of the speech signal are calculated to extract its characteristics. By analyzing the principle of speech signal recognition, the speech recognition process is designed and speech recognition in the multi-layer perceptual network environment is realized. The experimental results show that the speech recognition method designed in this paper has good speech recognition performance.
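The filtering and framing steps could be sketched as below. The abstract does not identify the filter, so a first-order pre-emphasis filter with transfer function H(z) = 1 - αz⁻¹ is assumed purely for illustration, as are the Hamming window, the value of α, and the function names.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1], i.e. H(z) = 1 - alpha * z^-1.

    The filter type and alpha are assumptions; the paper does not name them.
    """
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def hamming(n):
    """Hamming window of length n (n must be at least 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def windowed_frames(signal, frame_len, hop):
    """Split into overlapping frames and multiply each by a Hamming window."""
    window = hamming(frame_len)
    return [
        [s * w for s, w in zip(signal[start:start + frame_len], window)]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]
```

Each windowed frame can then be passed to per-frame measures such as average energy and zero crossing rate; trailing samples that do not fill a whole frame are dropped in this sketch.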

