Dataset Suara dan Teks Berbahasa Indonesia Pada Rekaman Podcast dan Talk show (Indonesian-Language Speech and Text Dataset from Podcast and Talk Show Recordings)

2021 ◽  
Vol 11 (2) ◽  
pp. 61-66
Author(s):  
Martin Novela ◽  
T. Basaruddin

One factor in the success of a learning model in machine learning or deep learning is the dataset used. This paper presents a speech dataset taken from podcast and talk show recordings, together with Indonesian-language transcriptions. The dataset is offered because no publicly accessible Indonesian-language dataset has been available for training Text-to-Speech or Audio Speech Recognition models. It consists of 3,270 recordings that were processed to obtain transcriptions in the form of Indonesian text or sentences. Building the dataset involved several stages: pre-processing, translation, a first validation stage, and a second validation stage. The dataset follows the format of the LJSpeech dataset to simplify processing when it is used as input to a model. It is expected to help improve training quality for Indonesian Text-to-Speech processing, for example with the Tacotron 2 model, as well as for Indonesian Audio Speech Recognition.
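For readers unfamiliar with the LJSpeech convention the dataset follows, a minimal Python sketch of loading such a corpus is shown below. The directory name podcast_talkshow_id is a placeholder; metadata.csv and the wavs/ folder with pipe-delimited "id|transcription" lines follow the LJSpeech layout, while the actual archive layout of this dataset is not specified in the abstract.

```python
import csv
from pathlib import Path

# Placeholder directory name; metadata.csv and wavs/ follow the LJSpeech layout.
DATASET_DIR = Path("podcast_talkshow_id")
METADATA = DATASET_DIR / "metadata.csv"

def load_metadata(path):
    """Read LJSpeech-style metadata: one 'wav_id|transcription' entry per line."""
    entries = []
    with open(path, encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE):
            wav_id, text = row[0], row[1]
            entries.append((DATASET_DIR / "wavs" / f"{wav_id}.wav", text))
    return entries

if __name__ == "__main__":
    pairs = load_metadata(METADATA)
    print(f"{len(pairs)} utterance/transcription pairs loaded")
```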

2021 ◽  
pp. 477-485
Author(s):  
Vu Thanh Nguyen ◽  
Mai Viet Tiep ◽  
Phu Phuoc Huy ◽  
Nguyen Thai Nho ◽  
Luong The Dung ◽  
...  

2020 ◽  
Vol 10 (19) ◽  
pp. 6882
Author(s):  
Kostadin Mishev ◽  
Aleksandra Karovska Ristovska ◽  
Dimitar Trajanov ◽  
Tome Eftimov ◽  
Monika Simjanoska

This paper presents MAKEDONKA, the first open-source Macedonian-language speech synthesizer based on a Deep Learning approach. The paper reviews the numerous earlier attempts to achieve human-like, reproducible speech, which have unfortunately proven unsuccessful because the work remained largely invisible and lacked examples of integration with real software tools. Recent advances in Machine Learning, in particular Deep Learning-based methodologies, provide novel feature-engineering methods that allow for smooth transitions in the synthesized speech, making it sound natural and human-like. This paper presents a methodology for end-to-end speech synthesis based on a fully-convolutional sequence-to-sequence acoustic model with a position-augmented attention mechanism, Deep Voice 3. Our model synthesizes Macedonian speech directly from characters. We created a dataset that contains approximately 20 hours of speech from a native Macedonian female speaker and used it to train the text-to-speech (TTS) model. The achieved MOS score of 3.93 makes our model suitable for any kind of software that needs a text-to-speech service in the Macedonian language. Our TTS platform is publicly available for use and ready for integration.
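As a rough illustration of the position-augmented attention idea that Deep Voice 3 introduces, the following NumPy sketch adds sinusoidal position encodings to the attention queries and keys. It is a simplified conceptual example, not the MAKEDONKA implementation or the full Deep Voice 3 architecture; the position_rate parameter stands in for the per-speaker scaling described in the Deep Voice 3 paper.

```python
import numpy as np

def positional_encoding(length, dim, position_rate=1.0):
    """Sinusoidal position encodings; Deep Voice 3 scales key positions by a
    'position rate' to keep the attention alignment roughly monotonic."""
    pos = np.arange(length)[:, None] * position_rate
    i = np.arange(dim)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    enc = np.zeros((length, dim))
    enc[:, 0::2] = np.sin(angles[:, 0::2])
    enc[:, 1::2] = np.cos(angles[:, 1::2])
    return enc

def position_augmented_attention(queries, keys, values, position_rate=1.0):
    """Scaled dot-product attention with position encodings added to the
    queries (decoder time axis) and keys (encoder time axis)."""
    q = queries + positional_encoding(len(queries), queries.shape[1])
    k = keys + positional_encoding(len(keys), keys.shape[1], position_rate)
    scores = q @ k.T / np.sqrt(keys.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values, weights
```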


Over the past few decades, designers have considered applications ranging from mobile communications to automatic machine learning. Speech is used less often in the electronics and computing fields because of the complexity and variety of signals and sounds. Using modern algorithms and methods, speech signals can be processed to recognize text. In this project, we build an online speech-to-text engine. The program receives speech through the microphone at run time and uses sample speech to recognize the text, which can then be saved to a file. The system is developed on the Java platform using the Eclipse workbench. Our speech-to-text program directly captures speech and converts it into text, and it can support other applications by giving users an alternative form of data input. A text-to-speech system can also improve accessibility by providing data-access options for users who are blind, deaf, or otherwise disabled. The Voice SMS application lets the user record spoken messages and convert them into text messages, which can then be sent to an entered phone number. Speech recognition is performed over the Internet by connecting to Google's server, and the application assumes input messages in English. The recognition uses a technique based on hidden Markov models (HMMs), currently the most effective and flexible approach to speech recognition.
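Although the system described here is built in Java and uses Google's web speech service, the same workflow (capture speech from the microphone, send it to Google's recognizer, save the text) can be sketched in a few lines of Python with the SpeechRecognition package. This is an illustrative analogue under those assumptions, not the project's code, and microphone access additionally requires PyAudio.

```python
import speech_recognition as sr  # pip install SpeechRecognition (plus PyAudio for the microphone)

recognizer = sr.Recognizer()

def transcribe_from_microphone(output_path="transcript.txt"):
    """Record one utterance, send it to Google's free web speech API,
    and append the recognized English text to a file."""
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)
    try:
        text = recognizer.recognize_google(audio, language="en-US")
    except sr.UnknownValueError:
        return None  # the speech was unintelligible
    except sr.RequestError as err:
        raise RuntimeError(f"Google API unreachable: {err}")
    with open(output_path, "a", encoding="utf-8") as f:
        f.write(text + "\n")
    return text

if __name__ == "__main__":
    print(transcribe_from_microphone())
```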


Author(s):  
Apurv Singh Yadav

Over the past few decades, speech recognition has been researched and developed tremendously. In recent years, however, use of the Internet of Things has increased significantly, and with it efficient speech recognition has become more beneficial than ever. With significant improvements in Machine Learning and Deep Learning, speech recognition has become more efficient and more widely applicable. This paper focuses on developing an efficient speech recognition system using Deep Learning.


2020 ◽  
Vol 3 (1) ◽  
pp. 1-24
Author(s):  
Hammam Riza ◽  
Anto Satriyo Nugroho ◽  
Gunarso

BPPT began research and development in artificial intelligence in 1987, through its involvement in a multilingual machine translation project sponsored by the Japanese government. Research on machine translation has continued through BPPT's participation in several subsequent projects, including the UNL, PAN Localization, ASEAN-MT, and U-STAR projects. Several methods have been used to build machine translation systems, from rule-based Interlingua and statistical approaches to sequence-to-sequence methods based on deep learning. In other areas of natural language processing, BPPT has also conducted research on speech recognition, or ASR (Automatic Speech Recognition), which produced the commercial product Perisalah, a tool that records every form of discussion in a meeting and quickly produces minutes. In speech synthesis, or TTS (Text-to-Speech), BPPT began its research in 2001, when it still used the diphone concatenation method; today it uses end-to-end methods. Beyond natural language processing technology, BPPT has also researched applications of artificial intelligence in image processing, including the development of a malaria diagnosis system and individual identification using fingerprints, iris, and face. This biometric research goes hand in hand with BPPT's task of assisting the Ministry of Home Affairs in implementing the electronic ID card (KTP). In addition, BPPT provides electronic-KTP testing services for domestic industry, covering both smart-card and biometric technology. BPPT has also helped prepare national biometric standards (SNI) for data exchange, for example the storage format for fingerprint data on the electronic KTP chip. In 2019 BPPT established a Biometrics Center of Excellence in Science and Technology, which coordinates research, development, and engineering activities as well as technology services in biometrics for national self-reliance.


Author(s):  
Nazik O’mar Balula ◽  
Mohsen Rashwan ◽  
Shrief Abdou

This paper provides a literature survey of Automatic Speech Recognition (ASR) systems for learning the Arabic language and Al-Quran recitation. Growth in communication technologies and AI (especially machine learning and deep learning) has led researchers in the ASR field to develop systems that mimic humans in their understanding and recognition of natural speech. One of the most important application areas for ASR is natural language processing (NLP), and Arabic is one of the languages addressed. ASR systems developed for Arabic help Arabs and non-Arabs learn the language and thereby recite and memorize Al-Quran properly, according to the recitation rules (Tajweed). This paper concentrates on ASR systems in general, their challenges, pros, and cons; on Arabic-language ASR systems and the challenges they face; and finally on Al-Quran recitation verification systems.


2020 ◽  
pp. 1-12
Author(s):  
Li Dongmei

English text-to-speech conversion is a key topic in modern computer technology research. Its difficulty lies in the large errors that arise during text-to-speech feature recognition, which make it hard to apply English text-to-speech conversion algorithms in practical systems. To improve the efficiency of English text-to-speech conversion, this article builds on machine learning algorithms: after the original voice waveform is labeled with pitch marks, the rhythm is modified through PSOLA, and the C4.5 algorithm is used to train a decision tree for judging the pronunciation of polyphones. To evaluate the performance of part-of-speech-rule-based pronunciation discrimination and HMM-based prosody hierarchy prediction in speech synthesis systems, this study constructed a system model. In addition, waveform concatenation and PSOLA are used to synthesize the sound. For words whose main stress cannot be determined from morphological structure, labels can be learned with machine learning methods. Finally, the study evaluates and analyzes the performance of the algorithm through controlled experiments. The results show that the proposed algorithm performs well and has practical value.
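A decision tree for polyphone (homograph) pronunciation, as described above, can be approximated with an off-the-shelf classifier. The sketch below uses scikit-learn, whose tree is CART with an entropy criterion rather than true C4.5, and the features and training examples are invented purely for illustration.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_extraction import DictVectorizer

# Toy, hand-made examples for the English homograph "read": the label is the
# pronunciation, the features are simple part-of-speech / context cues.
# Both the feature set and the examples are illustrative assumptions.
samples = [
    ({"pos": "VB",  "prev_pos": "TO",  "tense": "present"}, "riyd"),
    ({"pos": "VBD", "prev_pos": "PRP", "tense": "past"},    "rehd"),
    ({"pos": "VBN", "prev_pos": "VBZ", "tense": "past"},    "rehd"),
    ({"pos": "VBP", "prev_pos": "PRP", "tense": "present"}, "riyd"),
]

vectorizer = DictVectorizer(sparse=False)
X = vectorizer.fit_transform([features for features, _ in samples])
y = [label for _, label in samples]

# scikit-learn's tree is CART; criterion="entropy" gives C4.5-style
# information-gain splits, the closest off-the-shelf analogue.
tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)

query = vectorizer.transform([{"pos": "VBD", "prev_pos": "PRP", "tense": "past"}])
print(tree.predict(query))  # expected: ['rehd']
```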


Author(s):  
Sumit Kaur

Abstract- Deep learning is an emerging research area in the machine learning and pattern recognition fields, introduced with the goal of moving machine learning closer to one of its original objectives, artificial intelligence. It tries to mimic the human brain, which is capable of processing and learning from complex input data and of solving many kinds of complicated tasks well. Deep learning (DL) is based on a set of supervised and unsupervised algorithms that attempt to model higher-level abstractions in data and to learn hierarchical representations for classification. In recent years it has attracted much attention because of its state-of-the-art performance in diverse areas such as object perception, speech recognition, computer vision, collaborative filtering, and natural language processing. This paper presents a survey of different deep learning techniques for remote sensing image classification.
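As a minimal example of the hierarchical representation learning discussed above, the following Keras sketch stacks convolution and pooling layers into a small image classifier. The input shape and number of classes are placeholders, and the model is a generic illustration rather than any specific technique reviewed in the survey.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(input_shape=(64, 64, 3), num_classes=10):
    """Stacked convolution + pooling blocks learn increasingly abstract
    features; the dense head maps the final representation to class scores."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu", padding="same"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(num_classes, activation="softmax"),
    ])

model = build_cnn()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```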


2020 ◽  
Author(s):  
Saeed Nosratabadi ◽  
Amir Mosavi ◽  
Puhong Duan ◽  
Pedram Ghamisi ◽  
Ferdinand Filip ◽  
...  

This paper provides a state-of-the-art investigation of advances in data science in emerging economic applications. The analysis covers novel data science methods in four classes: deep learning models, hybrid deep learning models, hybrid machine learning models, and ensemble models. Application domains include a wide and diverse range of economics research, from the stock market, marketing, and e-commerce to corporate banking and cryptocurrency. The PRISMA method, a systematic literature review methodology, was used to ensure the quality of the survey. The findings reveal that the trends follow the advancement of hybrid models, which, based on the accuracy metric, outperform other learning algorithms. It is further expected that the trends will converge toward sophisticated hybrid deep learning models.


Author(s):  
Lery Sakti Ramba

The purpose of this research is to design a home automation system that can be controlled using voice commands. The research was conducted by studying related work, consulting with competent parties, designing the system, testing it, and analyzing the test results. The voice recognition system was designed using a Deep Learning Convolutional Neural Network (DL-CNN), and the CNN model was trained to recognize several kinds of voice commands. The result is a speech recognition system that can be used to control several electronic devices connected to the system. The system achieved a 100% success rate in a room with a background noise intensity of 24 dB (silent), 67.67% with a background noise intensity of 42 dB, and only 51.67% with a background noise intensity of 52 dB (noisy). The success rate is therefore strongly influenced by the background noise intensity of the room, so for optimal results the speech recognition system in this research is better suited to rooms with low background noise.
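The paper reports recognition accuracy at ambient-noise levels of 24, 42, and 52 dB. A common way to reproduce such a comparison in software is to mix recorded noise into the test clips at controlled signal-to-noise ratios, as in the sketch below; the SNR framing is an approximation of the paper's dB conditions, and predict_fn stands in for whatever trained CNN is being evaluated.

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Mix background noise into a speech clip at a target signal-to-noise
    ratio so a trained command recognizer can be scored per condition."""
    noise = np.resize(noise, speech.shape)
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

def accuracy_by_condition(predict_fn, clips, labels, noise, snr_levels=(30, 10, 0)):
    """Score a command recognizer (any callable clip -> label) as the
    background noise gets louder, mirroring the silent/moderate/noisy rooms."""
    results = {}
    for snr in snr_levels:
        noisy = [add_noise_at_snr(c, noise, snr) for c in clips]
        hits = [predict_fn(c) == y for c, y in zip(noisy, labels)]
        results[snr] = float(np.mean(hits))
    return results
```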

