Requirements for Distributed Control of Automatic Speech Recognition (ASR), Speaker Identification/Speaker Verification (SI/SV), and Text-to-Speech (TTS) Resources

2005
Author(s):  
D. Oran
Author(s):  
Keshav Sinha
Rasha Subhi Hameed
Partha Paul
Karan Pratap Singh

In recent years, advances in voice-based authentication have led to numerous forensic voice authentication technologies. For verification, the speech reference model is collected from various open-source clusters. This chapter focuses on an automatic speech recognition (ASR) technique that stores, retrieves, and processes data in a scalable manner. Conventional speech recognition techniques such as BWT, SVD, and MFCC exist, but their efficiency degrades for automatic speech recognition. To overcome this problem, the authors propose a speech recognition system using E-SVD, D3-MFCC, and dynamic time warping (DTW). D3-MFCC captures the important qualities of the speech signal while discarding unimportant and distracting features.
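As a rough illustration of the dynamic time warping step named in this abstract, the sketch below aligns two sequences of different lengths and returns the accumulated alignment cost. It is the textbook DTW recurrence, not the authors' E-SVD or D3-MFCC pipeline. The function name `dtw_distance` is hypothetical, and scalar samples with an absolute-difference cost are an assumption standing in for MFCC feature vectors.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook dynamic time warping cost between two 1-D sequences.

    Assumption: samples are plain numbers and the local cost is |x - y|.
    A real recognizer would compare feature vectors (for example MFCC frames).
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its sample
                cost[i][j - 1],      # b advances, a repeats its sample
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

A sequence and a time-stretched copy of it, such as `[1, 2, 3]` and `[1, 2, 2, 3]`, align at zero cost, which is why DTW suits utterances spoken at different speeds.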


Author(s):  
Ms Pratheeksha
Pratheeksha Rai
Ms Vijetha

A language-to-language translation system takes phrases spoken in one language and immediately speaks them in another language. It is a three-step software process comprising automatic speech recognition, machine translation, and voice synthesis. The discussion covers major speech translation projects that use different approaches to speech recognition, translation, and text-to-speech synthesis, highlighting the main pros and cons of each approach. Language translation takes a conversational phrase in one language as input and produces the translated spoken phrase in another language as output. The three components are connected sequentially: automatic speech recognition (ASR) converts the spoken phrases of the source language into text in the same language, machine translation translates the source-language text into target-language text, and the speech synthesizer converts the target-language text to speech.
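The sequential connection of the three components can be sketched as simple function composition. This is a minimal sketch under stated assumptions: `make_speech_translator` and the three stub stages are hypothetical placeholders, not any real ASR, translation, or TTS API, and audio is represented as raw bytes purely for illustration.

```python
from typing import Callable


def make_speech_translator(
    recognize: Callable[[bytes], str],
    translate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> Callable[[bytes], bytes]:
    """Chain ASR, machine translation, and speech synthesis in that order."""

    def run(source_audio: bytes) -> bytes:
        source_text = recognize(source_audio)  # speech -> source-language text
        target_text = translate(source_text)   # source text -> target text
        return synthesize(target_text)         # target text -> speech

    return run


# Hypothetical stand-ins so the sketch runs without any speech libraries.
def stub_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretends the audio bytes are the transcript


def stub_translate(text: str) -> str:
    return {"hello": "hola"}.get(text, text)  # toy one-word dictionary


def stub_synthesize(text: str) -> bytes:
    return text.encode("utf-8")  # pretends the text bytes are the audio


translator = make_speech_translator(stub_recognize, stub_translate, stub_synthesize)
```

Because each stage only sees the previous stage's output, any one component can be swapped for a different approach without touching the other two.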


2020
Vol 3 (1)
pp. 1-24
Author(s):  
Hammam Riza
Anto Satriyo Nugroho
Gunarso

BPPT began research and development in artificial intelligence in 1987 through its involvement in a multilingual machine translation system project sponsored by the Japanese government. Research on machine translation continued with BPPT's involvement in several later projects, including UNL, PAN Localization, ASEAN-MT, and U-STAR. Several methods have been used to build machine translation systems, from the rule-based Interlingua method and statistical methods to sequence-to-sequence methods that use deep learning. In other areas of natural language processing, BPPT also conducts research on automatic speech recognition (ASR), which has produced the commercial product Perisalah, a system that records all forms of speech in meetings and quickly produces minutes. In speech synthesis, or text-to-speech (TTS), BPPT began its research in 2001 using diphone concatenation and now uses an end-to-end method. Besides research in natural language processing technology, BPPT also studies applications of artificial intelligence in image processing, including the development of a malaria diagnosis system and the identification of individuals by fingerprint, iris, or face. This biometrics research goes hand in hand with BPPT's task of assisting the Ministry of Home Affairs in implementing the electronic identity card (KTP elektronik). In addition, BPPT provides electronic identity card testing services for domestic industry, covering both smart card and biometric technology. BPPT also takes part in preparing the design of a national biometric standard (SNI) for data exchange, for example the storage format for fingerprint data on the electronic identity card chip.
As of 2019, BPPT has a Center of Excellence for Biometric Science and Technology (Pusat Unggulan Iptek Biometrik), which coordinates research, development, and engineering activities as well as technology services in biometrics in support of national self-reliance.


Author(s):  
Askars Salimbajevs

Automatic Speech Recognition (ASR) requires huge amounts of real user speech data to reach state-of-the-art performance. However, speech data conveys sensitive speaker attributes like identity that can be inferred and exploited for malicious purposes. Therefore, there is an interest in the collection of anonymized speech data that is processed by some voice conversion method. In this paper, we evaluate one of the voice conversion methods on Latvian speech data and also investigate if privacy-transformed data can be used to improve ASR acoustic models. Results show the effectiveness of voice conversion against state-of-the-art speaker verification models on Latvian speech and the effectiveness of using privacy-transformed data in ASR training.

