QVR: Quranic Verses Recitation Recognition System using PocketSphinx

2021 ◽  
Vol 02 (02) ◽  
Author(s):  
Hasan Ali Gamal Al-Kaf ◽  
◽  
Muhammad Suhaimi Sulong ◽  
Ariffuddin Joret ◽  
Nuramin Fitri Aminuddin ◽  
...  

The recitation of Quran verses according to the actual tajweed is obligatory and it must be accurate and precise in pronunciation. Hence, it should always be reviewed by an expert on the recitation of the Quran. Through the latest technology, this recitation review can be implemented through an application system and it is most appropriate in this current Covid-19 pandemic situation where system application online is deemed to be developed. In this empirical study, a recognition system so-called the Quranic Verse Recitation Recognition (QVR) system using PocketSphinx to convert the Quranic verse from Arabic sound to Roman text and determine the accuracy of reciters, has been developed. The Graphical User Interface (GUI) of the system with a user-friendly environment was designed using Microsoft Visual Basic 6 in an Ubuntu platform. A verse of surah al-Ikhlas has been chosen in this study and the data were collected by recording 855 audios as training data recorded by professional reciters. Another 105 audios were collected as testing data, to test the accuracy of the system. The results indicate that the system obtained a 100% accuracy with a 0.00% of word error rate (WER) for both training and testing data of the said audios via Quran Roman text. The system with automatic speech recognition (ASR) engine system demonstrates that it has been successfully designed and developed, and is significant to be extended further. Added, it will be improved with the addition of other Quran surahs.

2019 ◽  
Vol 8 (2) ◽  
pp. 1822-1827 ◽  

This paper presents a computer vision based emotion recognition system for the identification of six basic emotions among Filipino Gamers using deep learning techniques. In particular, the proposed system utilized deep learning through the Inception Network and Long-Short Term Memory (LSTM). The researchers gathered a database for Filipino Facial Expressions consisting of 74 gamers for the training data and 4 gamer subjects for the testing data. The system was able to produce a maximum categorical validation accuracy of .9983 and a test accuracy of .9940 for the six basic emotions using the Filipino database. The cross-database analysis results using the well-known Cohn -Kanade+ database showed that the proposed Inception-LSTM system has accuracy on a par with the current existing systems. The results demonstrated the feasibility of the proposed system and showed sample computations of empathy and engagement based on the six basic emotions as a proof of concept


JURTEKSI ◽  
2021 ◽  
Vol 7 (2) ◽  
pp. 195-202
Author(s):  
Sri Ayu Rizky ◽  
Rolly Yesputra ◽  
Santoso Santoso

Abstract: In this research, a prediction system has been successfully developed to predict whether or not a prospective money borrower will run smoothly. Prospective borrowers who will borrow, some of the data that meet the criteria will be inputted by the office clerk into a prediction application system interface to be processed using the Data Mining method, namely the K-Nearest Neighbor Algorithm with the Codeigniter programming language 3. The results of the Euclidean calculation process are based on predetermined criteria Between training data (training) to testing data (test) will be displayed with a table that has been sorted from smallest to largest containing 9 closest neighbors according to the K value that has been determined, namely 9. The nine neighbors will be taken the dominant category. This dominant category can be used as a guideline that makes it easier for the leader to make a decision on the next borrower.            Keywords: Data Mining; Euclidean; K-Nearest Neighbor; Prospective Borrowers;  Abstrak: Dalam penelitian ini telah berhasil dibuat sebuah sistem prediksi untuk memprediksi lancar atau tidak lancarnya seorang calon peminjam uang. Calon peminjam uang yang akan meminjam, sebagian datanya yang memenuhi kriteria akan diinputkan petugas kantor ke dalam sebuah interface sistem aplikasi prediksi untuk diolah menggunakan metode Data Mining yaitu Algoritma K-Nearest Neighbor dengan bahasa pemrograman Codeigniter 3. Hasil proses perhitungan Euclidean berdasarkan kriteria yang sudah ditentukan antara data training (latih) ke data testing (uji) tersebut akan ditampilkan dengan sebuah tabel yang sudah diurutkan dari yang terkecil ke terbesar berisi 9 tetangga terdekat sesuai dengan nilai K yang sudah ditentukan yaitu 9.  Sembilan tetangga tersebut akan diambil kategori yang dominan. Kategori yang dominan tersebut bisa dijadikan suatu pedoman yang memudahkan pimpinan dalam mengambil sebuah keputusan kepada calon peminjam selanjutnya. Kata kunci: Debitur; Data Mining; Euclidean; K-Nearest Neighbor


2019 ◽  
Vol 2 (2) ◽  
pp. 149-153
Author(s):  
Zulkarnaen Hatala

Dipaparkan prosedur untuk mengembangkan Sistem Pengenalan Suara otomatis, Automatic Speech Recognition System (ASR) untuk kasus online recognition. Prosedur ini  secara cepat dan efisien membangun ASR menggunakan Hidden Markov Toolkit (HTK). Langkah-langkah praktis ini dipaparkan secara jelas untuk mengimplementasikan ASR dengan daftar kata sedikit (Small Vocabulary) dalam contoh kasus pengenalan digit Bahasa Indonesia. Dijelaskan beberapa teknik meningkatkan performansi seperti cara mengatasi noise, pengejaan ganda dan penerapan Principle Component Analysis. Hasil akhir berupa Word Error Rate


2019 ◽  
Vol 8 (2S11) ◽  
pp. 2350-2352

the dissimilarity in recognizing the word sequence and their ground truth in different channels can be absorbed by implementing Automatic Speech Recognition which is the standard evaluation metric and is encountered with the phenomena of Word Error Rate for various measures. In the model of 1ch, the track is trained without any preprocessing and study on multichannel end-to-end Automatic Speech Recognition envisaged that the function can be integrated into (Deep Neural network) – based system and lead to multiple experimental results. More so, when the Word Error Rate (WER) is not directly differentiable, it is pertinent to adopt Encoder – Decoder gradient objective function which has been clear in CHiME-4 system. In this study, we examine that the sequence level evaluation metric is a fair choice for optimizing Encoder – Decoder model for which many training algorithms is designed to reduce sequence level error. The study incorporates the scoring of multiple hypotheses in decoding stage for improving the decoding result to optimum. By this, the mismatch between the objectives is resulted in a feasible form to the maxim. Hence, the study finds the result of voice recognition which is most effective for adaptation.


2019 ◽  
Vol 63 (5) ◽  
pp. 50402-1-50402-9 ◽  
Author(s):  
Ing-Jr Ding ◽  
Chong-Min Ruan

Abstract The acoustic-based automatic speech recognition (ASR) technique has been a matured technique and widely seen to be used in numerous applications. However, acoustic-based ASR will not maintain a standard performance for the disabled group with an abnormal face, that is atypical eye or mouth geometrical characteristics. For governing this problem, this article develops a three-dimensional (3D) sensor lip image based pronunciation recognition system where the 3D sensor is efficiently used to acquire the action variations of the lip shapes of the pronunciation action from a speaker. In this work, two different types of 3D lip features for pronunciation recognition are presented, 3D-(x, y, z) coordinate lip feature and 3D geometry lip feature parameters. For the 3D-(x, y, z) coordinate lip feature design, 18 location points, each of which has 3D-sized coordinates, around the outer and inner lips are properly defined. In the design of 3D geometry lip features, eight types of features considering the geometrical space characteristics of the inner lip are developed. In addition, feature fusion to combine both 3D-(x, y, z) coordinate and 3D geometry lip features is further considered. The presented 3D sensor lip image based feature evaluated the performance and effectiveness using the principal component analysis based classification calculation approach. Experimental results on pronunciation recognition of two different datasets, Mandarin syllables and Mandarin phrases, demonstrate the competitive performance of the presented 3D sensor lip image based pronunciation recognition system.


Author(s):  
Jianfeng Jiang

Objective: In order to diagnose the analog circuit fault correctly, an analog circuit fault diagnosis approach on basis of wavelet-based fractal analysis and multiple kernel support vector machine (MKSVM) is presented in the paper. Methods: Time responses of the circuit under different faults are measured, and then wavelet-based fractal analysis is used to process the collected time responses for the purpose of generating features for the signals. Kernel principal component analysis (KPCA) is applied to reduce the features’ dimensionality. Afterwards, features are divided into training data and testing data. MKSVM with its multiple parameters optimized by chaos particle swarm optimization (CPSO) algorithm is utilized to construct an analog circuit fault diagnosis model based on the testing data. Results: The proposed analog diagnosis approach is revealed by a four opamp biquad high-pass filter fault diagnosis simulation. Conclusion: The approach outperforms other commonly used methods in the comparisons.


Author(s):  
Xinyi Li ◽  
Liqiong Chang ◽  
Fangfang Song ◽  
Ju Wang ◽  
Xiaojiang Chen ◽  
...  

This paper focuses on a fundamental question in Wi-Fi-based gesture recognition: "Can we use the knowledge learned from some users to perform gesture recognition for others?". This problem is also known as cross-target recognition. It arises in many practical deployments of Wi-Fi-based gesture recognition where it is prohibitively expensive to collect training data from every single user. We present CrossGR, a low-cost cross-target gesture recognition system. As a departure from existing approaches, CrossGR does not require prior knowledge (such as who is currently performing a gesture) of the target user. Instead, CrossGR employs a deep neural network to extract user-agnostic but gesture-related Wi-Fi signal characteristics to perform gesture recognition. To provide sufficient training data to build an effective deep learning model, CrossGR employs a generative adversarial network to automatically generate many synthetic training data from a small set of real-world examples collected from a small number of users. Such a strategy allows CrossGR to minimize the user involvement and the associated cost in collecting training examples for building an accurate gesture recognition system. We evaluate CrossGR by applying it to perform gesture recognition across 10 users and 15 gestures. Experimental results show that CrossGR achieves an accuracy of over 82.6% (up to 99.75%). We demonstrate that CrossGR delivers comparable recognition accuracy, but uses an order of magnitude less training samples collected from the end-users when compared to state-of-the-art recognition systems.


Sign in / Sign up

Export Citation Format

Share Document