Dataset Suara dan Teks Berbahasa Indonesia Pada Rekaman Podcast dan Talk show (Indonesian-Language Speech and Text Dataset from Podcast and Talk Show Recordings)

2021 ◽  
Vol 11 (2) ◽  
pp. 61-66
Author(s):  
Martin Novela ◽  
T. Basaruddin

One factor in the success of a learning model in machine learning or deep learning is the dataset used. This paper presents a speech dataset taken from podcast and talk show recordings, together with Indonesian-language transcriptions. The dataset is offered because no publicly accessible Indonesian-language dataset has been available for training Text-to-Speech or Audio Speech Recognition models. It consists of 3,270 recordings that were processed to obtain transcriptions in the form of Indonesian text or sentences. Building the dataset involved several stages: pre-processing, translation, a first validation stage, and a second validation stage. The dataset follows the format of the LJSpeech dataset to simplify processing when it is used as input to a model. It is expected to help improve training quality for Indonesian Text-to-Speech processing, for example with the Tacotron 2 model, as well as for Indonesian Audio Speech Recognition.
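For readers unfamiliar with the LJSpeech convention the dataset follows, a minimal Python sketch of loading such a corpus is shown below. The directory name podcast_talkshow_id is a placeholder; metadata.csv and the wavs/ folder with pipe-delimited "id|transcription" lines follow the LJSpeech layout, while the actual archive layout of this dataset is not specified in the abstract.

```python
import csv
from pathlib import Path

# Placeholder directory name; metadata.csv and wavs/ follow the LJSpeech layout.
DATASET_DIR = Path("podcast_talkshow_id")
METADATA = DATASET_DIR / "metadata.csv"

def load_metadata(path):
    """Read LJSpeech-style metadata: one 'wav_id|transcription' entry per line."""
    entries = []
    with open(path, encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE):
            wav_id, text = row[0], row[1]
            entries.append((DATASET_DIR / "wavs" / f"{wav_id}.wav", text))
    return entries

if __name__ == "__main__":
    pairs = load_metadata(METADATA)
    print(f"{len(pairs)} utterance/transcription pairs loaded")
```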

2021 ◽  
pp. 477-485
Author(s):  
Vu Thanh Nguyen ◽  
Mai Viet Tiep ◽  
Phu Phuoc Huy ◽  
Nguyen Thai Nho ◽  
Luong The Dung ◽  
...  

2020 ◽  
Vol 10 (19) ◽  
pp. 6882
Author(s):  
Kostadin Mishev ◽  
Aleksandra Karovska Ristovska ◽  
Dimitar Trajanov ◽  
Tome Eftimov ◽  
Monika Simjanoska

This paper presents MAKEDONKA, the first open-source Macedonian-language speech synthesizer based on a Deep Learning approach. The paper reviews the numerous earlier attempts to achieve human-like, reproducible speech, which have unfortunately proven unsuccessful because the work remained largely invisible and lacked examples of integration with real software tools. Recent advances in Machine Learning, in particular Deep Learning-based methodologies, provide novel feature-engineering methods that allow for smooth transitions in the synthesized speech, making it sound natural and human-like. This paper presents a methodology for end-to-end speech synthesis based on a fully-convolutional sequence-to-sequence acoustic model with a position-augmented attention mechanism, Deep Voice 3. Our model synthesizes Macedonian speech directly from characters. We created a dataset that contains approximately 20 hours of speech from a native Macedonian female speaker and used it to train the text-to-speech (TTS) model. The achieved MOS score of 3.93 makes our model suitable for any kind of software that needs a text-to-speech service in the Macedonian language. Our TTS platform is publicly available for use and ready for integration.
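As a rough illustration of the position-augmented attention idea that Deep Voice 3 introduces, the following NumPy sketch adds sinusoidal position encodings to the attention queries and keys. It is a simplified conceptual example, not the MAKEDONKA implementation or the full Deep Voice 3 architecture; the position_rate parameter stands in for the per-speaker scaling described in the Deep Voice 3 paper.

```python
import numpy as np

def positional_encoding(length, dim, position_rate=1.0):
    """Sinusoidal position encodings; Deep Voice 3 scales key positions by a
    'position rate' to keep the attention alignment roughly monotonic."""
    pos = np.arange(length)[:, None] * position_rate
    i = np.arange(dim)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    enc = np.zeros((length, dim))
    enc[:, 0::2] = np.sin(angles[:, 0::2])
    enc[:, 1::2] = np.cos(angles[:, 1::2])
    return enc

def position_augmented_attention(queries, keys, values, position_rate=1.0):
    """Scaled dot-product attention with position encodings added to the
    queries (decoder time axis) and keys (encoder time axis)."""
    q = queries + positional_encoding(len(queries), queries.shape[1])
    k = keys + positional_encoding(len(keys), keys.shape[1], position_rate)
    scores = q @ k.T / np.sqrt(keys.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values, weights
```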


Over the past few decades, designers have considered applications ranging from mobile communications to automatic machine learning. Speech is used less often in the electronics and computing fields because of the complexity and variety of signals and sounds. Using modern algorithms and methods, speech signals can be processed to recognize text. In this project, we build an online speech-to-text engine. The program receives speech through the microphone at run time and uses sample speech to recognize the text, which can then be saved to a file. The system is developed on the Java platform using the Eclipse workbench. Our speech-to-text program directly captures speech and converts it into text, and it can support other applications by giving users an alternative form of data input. A text-to-speech system can also improve accessibility by providing data-access options for users who are blind, deaf, or otherwise disabled. The Voice SMS application lets the user record spoken messages and convert them into text messages, which can then be sent to an entered phone number. Speech recognition is performed over the Internet by connecting to Google's server, and the application assumes input messages in English. The recognition uses a technique based on hidden Markov models (HMMs), currently the most effective and flexible approach to speech recognition.
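Although the system described here is built in Java and uses Google's web speech service, the same workflow (capture speech from the microphone, send it to Google's recognizer, save the text) can be sketched in a few lines of Python with the SpeechRecognition package. This is an illustrative analogue under those assumptions, not the project's code, and microphone access additionally requires PyAudio.

```python
import speech_recognition as sr  # pip install SpeechRecognition (plus PyAudio for the microphone)

recognizer = sr.Recognizer()

def transcribe_from_microphone(output_path="transcript.txt"):
    """Record one utterance, send it to Google's free web speech API,
    and append the recognized English text to a file."""
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)
    try:
        text = recognizer.recognize_google(audio, language="en-US")
    except sr.UnknownValueError:
        return None  # the speech was unintelligible
    except sr.RequestError as err:
        raise RuntimeError(f"Google API unreachable: {err}")
    with open(output_path, "a", encoding="utf-8") as f:
        f.write(text + "\n")
    return text

if __name__ == "__main__":
    print(transcribe_from_microphone())
```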


Author(s):  
Apurv Singh Yadav

Over the past few decades, speech recognition has been researched and developed tremendously. In recent years, however, use of the Internet of Things has increased significantly, and with it efficient speech recognition has become more beneficial than ever. With significant improvements in Machine Learning and Deep Learning, speech recognition has become more efficient and more widely applicable. This paper focuses on developing an efficient speech recognition system using Deep Learning.


2020 ◽  
Vol 3 (1) ◽  
pp. 1-24
Author(s):  
Hammam Riza ◽  
Anto Satriyo Nugroho ◽  
Gunarso

BPPT began research and development in artificial intelligence in 1987, through its involvement in a multilingual machine translation project sponsored by the Japanese government. Research on machine translation has continued through BPPT's participation in several subsequent projects, including the UNL, PAN Localization, ASEAN-MT, and U-STAR projects. Several methods have been used to build machine translation systems, from rule-based Interlingua and statistical approaches to sequence-to-sequence methods based on deep learning. In other areas of natural language processing, BPPT has also conducted research on speech recognition, or ASR (Automatic Speech Recognition), which produced the commercial product Perisalah, a tool that records every form of discussion in a meeting and quickly produces minutes. In speech synthesis, or TTS (Text-to-Speech), BPPT began its research in 2001, when it still used the diphone concatenation method; today it uses end-to-end methods. Beyond natural language processing technology, BPPT has also researched applications of artificial intelligence in image processing, including the development of a malaria diagnosis system and individual identification using fingerprints, iris, and face. This biometric research goes hand in hand with BPPT's task of assisting the Ministry of Home Affairs in implementing the electronic ID card (KTP). In addition, BPPT provides electronic-KTP testing services for domestic industry, covering both smart-card and biometric technology. BPPT has also helped prepare national biometric standards (SNI) for data exchange, for example the storage format for fingerprint data on the electronic KTP chip. In 2019 BPPT established a Biometrics Center of Excellence in Science and Technology, which coordinates research, development, and engineering activities as well as technology services in biometrics for national self-reliance.


Author(s):  
Nazik O’mar Balula ◽  
Mohsen Rashwan ◽  
Shrief Abdou

This paper provides a literature survey of Automatic Speech Recognition (ASR) systems for learning the Arabic language and Al-Quran recitation. Growth in communication technologies and AI (especially machine learning and deep learning) has led researchers in the ASR field to develop systems that mimic humans in their understanding and recognition of natural speech. One of the most important application areas for ASR is natural language processing (NLP), and Arabic is one of the languages addressed. ASR systems developed for Arabic help Arabs and non-Arabs learn the language and thereby recite and memorize Al-Quran properly, according to the recitation rules (Tajweed). This paper concentrates on ASR systems in general, their challenges, pros, and cons; on Arabic-language ASR systems and the challenges they face; and finally on Al-Quran recitation verification systems.


2020 ◽  
pp. 1-12
Author(s):  
Li Dongmei

English text-to-speech conversion is a key topic in modern computer technology research. Its difficulty lies in the large errors that arise during text-to-speech feature recognition, which make it hard to apply English text-to-speech conversion algorithms in practical systems. To improve the efficiency of English text-to-speech conversion, this article builds on machine learning algorithms: after the original voice waveform is labeled with pitch marks, the rhythm is modified through PSOLA, and the C4.5 algorithm is used to train a decision tree for judging the pronunciation of polyphones. To evaluate the performance of part-of-speech-rule-based pronunciation discrimination and HMM-based prosody hierarchy prediction in speech synthesis systems, this study constructed a system model. In addition, waveform concatenation and PSOLA are used to synthesize the sound. For words whose main stress cannot be determined from morphological structure, labels can be learned with machine learning methods. Finally, the study evaluates and analyzes the performance of the algorithm through controlled experiments. The results show that the proposed algorithm performs well and has practical value.
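A decision tree for polyphone (homograph) pronunciation, as described above, can be approximated with an off-the-shelf classifier. The sketch below uses scikit-learn, whose tree is CART with an entropy criterion rather than true C4.5, and the features and training examples are invented purely for illustration.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.feature_extraction import DictVectorizer

# Toy, hand-made examples for the English homograph "read": the label is the
# pronunciation, the features are simple part-of-speech / context cues.
# Both the feature set and the examples are illustrative assumptions.
samples = [
    ({"pos": "VB",  "prev_pos": "TO",  "tense": "present"}, "riyd"),
    ({"pos": "VBD", "prev_pos": "PRP", "tense": "past"},    "rehd"),
    ({"pos": "VBN", "prev_pos": "VBZ", "tense": "past"},    "rehd"),
    ({"pos": "VBP", "prev_pos": "PRP", "tense": "present"}, "riyd"),
]

vectorizer = DictVectorizer(sparse=False)
X = vectorizer.fit_transform([features for features, _ in samples])
y = [label for _, label in samples]

# scikit-learn's tree is CART; criterion="entropy" gives C4.5-style
# information-gain splits, the closest off-the-shelf analogue.
tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)

query = vectorizer.transform([{"pos": "VBD", "prev_pos": "PRP", "tense": "past"}])
print(tree.predict(query))  # expected: ['rehd']
```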


Author(s):  
Sumit Kaur

Abstract- Deep learning is an emerging research area in the machine learning and pattern recognition fields, introduced with the goal of moving machine learning closer to one of its original objectives, artificial intelligence. It tries to mimic the human brain, which is capable of processing and learning from complex input data and of solving many kinds of complicated tasks well. Deep learning (DL) is based on a set of supervised and unsupervised algorithms that attempt to model higher-level abstractions in data and to learn hierarchical representations for classification. In recent years it has attracted much attention because of its state-of-the-art performance in diverse areas such as object perception, speech recognition, computer vision, collaborative filtering, and natural language processing. This paper presents a survey of different deep learning techniques for remote sensing image classification.
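As a minimal example of the hierarchical representation learning discussed above, the following Keras sketch stacks convolution and pooling layers into a small image classifier. The input shape and number of classes are placeholders, and the model is a generic illustration rather than any specific technique reviewed in the survey.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(input_shape=(64, 64, 3), num_classes=10):
    """Stacked convolution + pooling blocks learn increasingly abstract
    features; the dense head maps the final representation to class scores."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu", padding="same"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(num_classes, activation="softmax"),
    ])

model = build_cnn()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```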


2020 ◽  
Author(s):  
Saeed Nosratabadi ◽  
Amir Mosavi ◽  
Puhong Duan ◽  
Pedram Ghamisi ◽  
Ferdinand Filip ◽  
...  

This paper provides a state-of-the-art investigation of advances in data science in emerging economic applications. The analysis covers novel data science methods in four classes: deep learning models, hybrid deep learning models, hybrid machine learning models, and ensemble models. Application domains include a wide and diverse range of economics research, from the stock market, marketing, and e-commerce to corporate banking and cryptocurrency. The PRISMA method, a systematic literature review methodology, was used to ensure the quality of the survey. The findings reveal that the trends follow the advancement of hybrid models, which, based on the accuracy metric, outperform other learning algorithms. It is further expected that the trends will converge toward sophisticated hybrid deep learning models.


Author(s):  
Lery Sakti Ramba

The purpose of this research is to design a home automation system that can be controlled using voice commands. The research was conducted by studying related work, consulting with competent parties, designing the system, testing it, and analyzing the test results. The voice recognition system was designed using a Deep Learning Convolutional Neural Network (DL-CNN), and the CNN model was trained to recognize several kinds of voice commands. The result is a speech recognition system that can be used to control several electronic devices connected to the system. The system achieved a 100% success rate in a room with a background noise intensity of 24 dB (silent), 67.67% with a background noise intensity of 42 dB, and only 51.67% with a background noise intensity of 52 dB (noisy). The success rate is therefore strongly influenced by the background noise intensity of the room, so for optimal results the speech recognition system in this research is better suited to rooms with low background noise.
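The paper reports recognition accuracy at ambient-noise levels of 24, 42, and 52 dB. A common way to reproduce such a comparison in software is to mix recorded noise into the test clips at controlled signal-to-noise ratios, as in the sketch below; the SNR framing is an approximation of the paper's dB conditions, and predict_fn stands in for whatever trained CNN is being evaluated.

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Mix background noise into a speech clip at a target signal-to-noise
    ratio so a trained command recognizer can be scored per condition."""
    noise = np.resize(noise, speech.shape)
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

def accuracy_by_condition(predict_fn, clips, labels, noise, snr_levels=(30, 10, 0)):
    """Score a command recognizer (any callable clip -> label) as the
    background noise gets louder, mirroring the silent/moderate/noisy rooms."""
    results = {}
    for snr in snr_levels:
        noisy = [add_noise_at_snr(c, noise, snr) for c in clips]
        hits = [predict_fn(c) == y for c, y in zip(noisy, labels)]
        results[snr] = float(np.mean(hits))
    return results
```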

