Analisis Forman Frekuensi pada Suara Manusia dengan Menggunakan Linear Prediction (Formant Frequency Analysis of the Human Voice Using Linear Prediction)

MIND Journal ◽  
2021 ◽  
Vol 5 (1) ◽  
pp. 39-53
Author(s):  
IRMA AMELIA DEWI ◽  
MUHAMMAD ICHWAN ◽  
SALMA SILFIANA

Abstract: The human voice is a natural-language modality for interacting with computers. Voices vary from person to person, as reflected in their formants, pitch, and volume. Good voice-command input to a computer requires assessing voice quality from its formant frequencies. In this study, processing begins with pre-processing (pre-emphasis, frame blocking, and windowing) and continues with formant estimation using Linear Prediction. The formant values obtained are matched against the formant values of the training data stored in a database. There are 2700 test voice samples, each recorded for 1 second. The tests yielded formant values in the range 0-423 for F0, 572-1678 for F1, 1536-2583 for F2, 2676-3384 for F3, and 3519-4947 for F4. Keywords: formant, frequency, pitch, Linear Prediction
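The pipeline the abstract describes, pre-emphasis and windowing followed by Linear Prediction, can be sketched as below. This is a minimal illustration rather than the authors' implementation: the LPC order, the 0.97 pre-emphasis coefficient, the 90 Hz cutoff, and the single-resonator demo signal are all assumptions for the example.

```python
import numpy as np

def lpc(frame, order):
    """LPC coefficients via the autocorrelation method and Levinson-Durbin."""
    n = len(frame)
    r = np.array([frame[: n - k] @ frame[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + a[1:i] @ r[i - 1:0:-1]) / err   # reflection coefficient
        a[1:i + 1] = np.concatenate([a[1:i] + k * a[i - 1:0:-1], [k]])
        err *= 1.0 - k * k
    return a

def formants(frame, fs, order=8):
    """Estimate formants as the angles of the LPC polynomial's complex roots."""
    emphasized = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])  # pre-emphasis
    windowed = emphasized * np.hamming(len(emphasized))              # windowing
    roots = np.roots(lpc(windowed, order))
    roots = roots[np.imag(roots) > 0]                # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    return sorted(f for f in freqs if f > 90.0)      # discard near-DC roots

# Demo: impulse response of a single two-pole resonator at 1000 Hz (fs = 8 kHz)
fs = 8000.0
r_pole, theta = 0.95, 2 * np.pi * 1000.0 / fs
x = np.zeros(400)
x[0] = 1.0
x[1] = 2 * r_pole * np.cos(theta)
for n in range(2, 400):
    x[n] = 2 * r_pole * np.cos(theta) * x[n - 1] - r_pole ** 2 * x[n - 2]
est = formants(x, fs, order=8)   # one estimate lands near 1000 Hz
```

In a full system the same `formants` call would run per frame after frame blocking, and the resulting values would be compared against the stored training-data formants.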

1983 ◽  
Vol 26 (1) ◽  
pp. 89-97 ◽  
Author(s):  
Randall B. Monsen ◽  
A. Maynard Engebretson

The accuracy of spectrographic techniques and of linear prediction analysis in measuring formant frequencies is compared. The first three formant frequencies of 90 synthetic speech tokens were measured by three experienced spectrographic readers and by linear prediction analysis. For fundamental frequencies between 100 and 300 Hz, both methods are accurate to within approximately ±60 Hz for the first and second formants. The third formant can be measured with the same degree of accuracy by linear prediction, but only to within ±110 Hz by spectrographic means. The accuracy of both methods decreases greatly when the fundamental frequency is 350 Hz or greater. These limits of measurement appear to be within the range of the difference limens for formant frequencies.


ALQALAM ◽  
2015 ◽  
Vol 32 (2) ◽  
pp. 284
Author(s):  
Muhammad Subali ◽  
Miftah Andriansyah ◽  
Christanto Sinambela

This article examines the similarities and differences in fundamental frequency and formant frequencies, obtained with the autocorrelation function and the LPC function in a MATLAB 2012b GUI, for the sounds of hijaiyah letters produced by adult male beginner and expert speakers of makhraj pronunciation; the two speakers' sounds are also compared via the matching distance computed on the cepstrum with the DTW method. The beginner-pronunciation subjects were college students of Universitas Gunadarma and SITC, aged 22 years; their speech was recorded using a MATLAB algorithm in the GUI. The expert-pronunciation subjects were taken from previous research and were 20-30 years old at the time of data collection. Each sound is processed to extract its fundamental frequency and formant frequencies; the similarities and differences between beginner and expert speech are then analysed, and the matching distance between the two is computed. The results show that all beginner and expert makhraj pronunciations have different fundamental-frequency and formant-frequency values, and the DTW matching distances between beginner and expert pronunciations fall in the range 28.9746 to 136.4. Keywords: fundamental frequency, formant frequency, hijaiyah letters, makhraj
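The matching-distance step can be illustrated with a plain dynamic time warping (DTW) implementation over per-frame feature vectors (cepstra in the article). A minimal sketch, not the authors' MATLAB code; the Euclidean local distance is an assumption.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping distance between two feature sequences."""
    a = np.asarray(seq_a, dtype=float)
    b = np.asarray(seq_b, dtype=float)
    if a.ndim == 1:                       # allow sequences of scalars
        a, b = a[:, None], b[:, None]
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])   # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]
```

A lower `dtw_distance` between a beginner's and an expert's cepstral sequences means the beginner's pronunciation tracks the expert's more closely, even when the two utterances differ in speaking rate.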


2013 ◽  
Vol 134 (2) ◽  
pp. 1295-1313 ◽  
Author(s):  
Paavo Alku ◽  
Jouni Pohjalainen ◽  
Martti Vainio ◽  
Anne-Maria Laukkanen ◽  
Brad H. Story

2022 ◽  
pp. 1-13
Author(s):  
Denis Paperno

Abstract Can recurrent neural nets, inspired by human sequential data processing, learn to understand language? We construct simplified datasets reflecting core properties of natural language as modeled in formal syntax and semantics: recursive syntactic structure and compositionality. We find that LSTM and GRU networks generalise well to compositional interpretation, but only in the most favorable learning settings: a well-paced curriculum, extensive training data, and left-to-right (but not right-to-left) composition.


2020 ◽  
Vol 34 (05) ◽  
pp. 8504-8511
Author(s):  
Arindam Mitra ◽  
Ishan Shrivastava ◽  
Chitta Baral

Natural Language Inference (NLI) plays an important role in many natural language processing tasks such as question answering. However, existing NLI modules trained on existing NLI datasets have several drawbacks. For example, they do not capture the notions of entity and role well and often end up making mistakes such as inferring "Peter signed a deal" from "John signed a deal". As part of this work, we have developed two datasets that help mitigate such issues and make systems better at understanding the notions of "entities" and "roles". After training the existing models on the new datasets, we observe that they still do not perform well on one of the new benchmarks. We then propose a modification to the "word-to-word" attention function that has been uniformly reused across several popular NLI architectures. The resulting models perform as well as their unmodified counterparts on the existing benchmarks and perform significantly better on the new benchmarks that emphasize "roles" and "entities".
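The "word-to-word" attention the abstract refers to is, in its standard form, a soft alignment between premise and hypothesis token vectors. The sketch below shows that generic baseline only; the paper's specific modification is not reproduced, and the tiny embeddings are made up for illustration.

```python
import numpy as np

def word_to_word_attention(premise, hypothesis):
    """Soft-align each hypothesis token to the premise tokens (dot-product attention)."""
    scores = hypothesis @ premise.T                    # (n_hyp, n_prem) similarities
    scores -= scores.max(axis=1, keepdims=True)        # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)      # each row sums to 1
    return weights @ premise                           # premise summary per hypothesis token

# Toy embeddings: 3 premise tokens, 2 hypothesis tokens, dimension 2
premise = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
hypothesis = np.array([[1.0, 0.0], [0.0, 2.0]])
aligned = word_to_word_attention(premise, hypothesis)  # shape (2, 2)
```

Because a plain dot product scores "Peter" and "John" as near-identical in most embedding spaces, one can see why an unmodified alignment of this kind struggles to distinguish entities and roles.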


2021 ◽  
Vol 8 (4) ◽  
pp. 787
Author(s):  
Moechammad Sarosa ◽  
Nailul Muna

<p class="Abstract"><em>Natural disasters are events that can cause damage and create havoc. Collapsing buildings can injure and kill victims, and humans cannot predict the location or timing of a natural disaster, so casualties can be numerous. After a natural disaster, the first task is to find and rescue trapped victims; the SAR team must carry out a rapid evacuation to help victims and reduce losses. In reality, however, the SAR team faces many obstacles while evacuating victims, from terrain that is difficult to reach to shortages of needed equipment. In this study, a detection system for victims of natural disasters was implemented, based on image processing, which aims to help extend the SAR team's equipment for finding victims. The algorithm used to detect whether victims are present in an image is You Only Look Once (YOLO). Two YOLO variants were implemented and compared: YOLOv3 and YOLOv3 Tiny. From the tests performed, the F1 score reaches 95.3% when using YOLOv3 with 100 training data and 100 test data.</em></p>
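The F1 score used to evaluate the detector is the harmonic mean of precision and recall over detected victims. A quick sketch; the counts in the usage line are illustrative, not the study's.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of detection precision and recall."""
    precision = true_pos / (true_pos + false_pos)   # fraction of detections that are real victims
    recall = true_pos / (true_pos + false_neg)      # fraction of real victims that were detected
    return 2 * precision * recall / (precision + recall)

example = f1_score(8, 2, 2)   # precision 0.8, recall 0.8 -> F1 = 0.8
```

A single number like the reported 95.3% therefore summarises both missed victims (false negatives) and spurious detections (false positives).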


2021 ◽  
Author(s):  
Wilson Wongso ◽  
Henry Lucky ◽  
Derwin Suhartono

Abstract The Sundanese language has over 32 million speakers worldwide, but the language has reaped little to no benefit from the recent advances in natural language understanding. Like other low-resource languages, the only alternative is to fine-tune existing multilingual models. In this paper, we pre-trained three monolingual Transformer-based language models on Sundanese data. When evaluated on a downstream text classification task, we found that most of our monolingual models outperformed larger multilingual models despite the smaller overall pre-training data. In the subsequent analyses, our models benefited strongly from the size of the Sundanese pre-training corpus and did not exhibit socially biased behavior. We released our models for other researchers and practitioners to use.


2012 ◽  
Vol 23 (08) ◽  
pp. 606-615 ◽  
Author(s):  
HaiHong Liu ◽  
Hua Zhang ◽  
Ruth A. Bentler ◽  
Demin Han ◽  
Luo Zhang

Background: Transient noise can be disruptive for people wearing hearing aids. Ideally, transient noise should be detected and controlled by the signal processor without disrupting speech and other intended input signals. A technology for detecting and controlling transient noises in hearing aids was evaluated in this study. Purpose: The purpose of this study was to evaluate the effectiveness of a transient noise reduction strategy on various transient noises and to determine whether the strategy has a negative impact on the sound quality of intended speech inputs. Research Design: This was a quasi-experimental study involving 24 hearing aid users. Each participant was asked to rate speech clarity, transient noise loudness, and overall impression for speech stimuli under the algorithm-on and algorithm-off conditions. During the evaluation, three types of stimuli were used: transient noises, speech, and background noises. The transient noises included "knife on a ceramic board," "mug on a tabletop," "office door slamming," "car door slamming," and "pen tapping on a countertop." The speech sentences used for the test were presented by a male speaker in Mandarin. The background noises included "party noise" and "traffic noise." These sounds were combined into five listening situations: (1) speech only, (2) transient noise only, (3) speech and transient noise, (4) background noise and transient noise, and (5) speech, background noise, and transient noise. Results: There was no significant difference in the ratings of speech clarity between algorithm-on and algorithm-off (t-test, p = 0.103). Further analysis revealed that speech clarity was significantly better at 70 dB SPL than at 55 dB SPL (p < 0.001). For transient noise loudness: under the algorithm-off condition, the percentages of subjects rating the transient noise as somewhat soft, appropriate, somewhat loud, and too loud were 0.2, 47.1, 29.6, and 23.1%, respectively.
The corresponding percentages under the algorithm-on condition were 3.0, 72.6, 22.9, and 1.4%, respectively. A significant difference in the ratings of transient noise loudness was found between algorithm-on and algorithm-off (t-test, p < 0.001). For overall impression of speech stimuli: under the algorithm-off condition, the percentages of subjects rating the algorithm as not helpful at all, somewhat helpful, helpful, and very helpful were 36.5, 20.8, 33.9, and 8.9%, respectively. Under the algorithm-on condition, the corresponding percentages were 35.0, 19.3, 30.7, and 15.0%, respectively. Statistical analysis revealed a significant difference in the ratings of overall impression of speech stimuli: ratings under the algorithm-on condition indicated significantly more help with speech understanding than those under algorithm-off (t-test, p < 0.001). Conclusions: The transient noise reduction strategy appropriately controlled the loudness of most transient noises without affecting sound quality, which could benefit hearing aid wearers.
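The study's algorithm-on versus algorithm-off comparisons rest on paired t-tests over per-subject rating differences. A sketch of the paired-samples t statistic only; the ratings in the demo are invented, and the study's actual analysis software is not reproduced here.

```python
import numpy as np

def paired_t_statistic(on_ratings, off_ratings):
    """Paired-samples t statistic: mean per-subject difference over its standard error."""
    diffs = np.asarray(on_ratings, dtype=float) - np.asarray(off_ratings, dtype=float)
    n = len(diffs)
    return diffs.mean() / (diffs.std(ddof=1) / np.sqrt(n))

# Hypothetical ratings for three subjects under the two conditions
t = paired_t_statistic([2, 4, 6], [1, 2, 3])
```

The pairing matters because each subject rates both conditions: differencing within subjects removes between-subject variation before the significance of the condition effect is assessed.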

