Speech recognition engine: Recently Published Documents

TOTAL DOCUMENTS: 31 (five years: 11)
H-INDEX: 4 (five years: 1)

Author(s): Tristan J. Mahr, Visar Berisha, Kan Kawabata, Julie Liss, Katherine C. Hustad

Purpose. Acoustic measurement of speech sounds requires first segmenting the speech signal into relevant units (words, phones, etc.). Manual segmentation is cumbersome and time-consuming. Forced-alignment algorithms automate this process by aligning a transcript and a speech sample. We compared the phoneme-level alignment performance of five available forced-alignment algorithms on a corpus of child speech. Our goal was to document aligner performance for child speech researchers. Method. The child speech sample included 42 children between 3 and 6 years of age. The corpus was force-aligned using the Montreal Forced Aligner with and without speaker adaptive training, triphone alignment from the Kaldi speech recognition engine, the Prosodylab-Aligner, and the Penn Phonetics Lab Forced Aligner. The sample was also manually aligned to create gold-standard alignments. We evaluated alignment algorithms in terms of accuracy (whether the interval covers the midpoint of the manual alignment) and difference in phone-onset times between the automatic and manual intervals. Results. The Montreal Forced Aligner with speaker adaptive training showed the highest accuracy and smallest timing differences. Vowels were consistently the most accurately aligned class of sounds across all aligners, and for fricatives, alignment accuracy increased with age across all aligners. Conclusion. The best-performing aligner fell just short of human-level reliability for forced alignment. Researchers can use forced alignment with child speech for certain classes of sounds (vowels, fricatives for older children), especially as part of a semi-automated workflow where alignments are later inspected for gross errors. Supplemental Material: https://doi.org/10.23641/asha.14167058
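As a rough illustration of the two evaluation criteria described above (midpoint-coverage accuracy and phone-onset timing differences), the following sketch assumes phone intervals are available as (label, onset, offset) tuples in corresponding order; the function names, data layout, and example times are hypothetical and not taken from the paper.

```python
# Minimal sketch of the two alignment metrics described in the abstract.
# Assumes the aligner output and the manual gold standard are lists of
# (phone_label, onset_sec, offset_sec) tuples for the same phone sequence.

def midpoint_accuracy(auto_phones, manual_phones):
    """Proportion of automatic intervals that cover the midpoint
    of the corresponding manually aligned interval."""
    hits = 0
    for (_, a_on, a_off), (_, m_on, m_off) in zip(auto_phones, manual_phones):
        midpoint = (m_on + m_off) / 2
        if a_on <= midpoint <= a_off:
            hits += 1
    return hits / len(manual_phones)

def onset_differences(auto_phones, manual_phones):
    """Absolute differences (seconds) between automatic and manual phone onsets."""
    return [round(abs(a_on - m_on), 3)
            for (_, a_on, _), (_, m_on, _) in zip(auto_phones, manual_phones)]

# Toy usage with two phones and made-up times.
auto = [("AH", 0.10, 0.24), ("T", 0.24, 0.31)]
gold = [("AH", 0.12, 0.25), ("T", 0.25, 0.33)]
print(midpoint_accuracy(auto, gold))   # 1.0
print(onset_differences(auto, gold))   # [0.02, 0.01]
```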


2021, Vol. 343, pp. 04003
Author(s): Valentin-Cătălin Govoreanu, Adrian-Nicolae Ţocu, Alin-Marius Cruceat, Dragoş Circa

This paper presents a speech recognition service used for commanding and guiding the activities around an industrial training station. The entire concept is built on a decentralized microservice architecture, and one of its many hardware and software components is the speech recognition engine. This engine lets users interact seamlessly with the other components in order to ensure a gradual and productive learning process. By working with different APIs for both English and Romanian, the presented approach achieves good recognition of the task-defining phrases that support the training procedure and reduces the required recognition time by almost 50%.
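As a minimal sketch of how such a recognition microservice might expose English and Romanian transcription, the snippet below uses the open-source `speech_recognition` Python package with Google's free web recognizer behind a small Flask endpoint; the paper does not name the specific APIs or service layout it uses, so the library choice, route name, and language codes here are assumptions for illustration only.

```python
# Hypothetical sketch of a tiny recognition microservice (not the authors' implementation).
# Assumes the `speech_recognition` and `flask` packages are installed.
import io
import time

import speech_recognition as sr
from flask import Flask, request, jsonify

app = Flask(__name__)
recognizer = sr.Recognizer()

@app.route("/recognize", methods=["POST"])
def recognize():
    # The client uploads a WAV file and selects "en-US" or "ro-RO".
    language = request.args.get("lang", "en-US")
    wav_bytes = io.BytesIO(request.files["audio"].read())
    with sr.AudioFile(wav_bytes) as source:
        audio = recognizer.record(source)
    start = time.perf_counter()
    try:
        text = recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        text = ""
    elapsed = time.perf_counter() - start
    return jsonify({"text": text, "seconds": round(elapsed, 2)})

if __name__ == "__main__":
    app.run(port=5005)
```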


2020
Author(s): Tristan Mahr, Visar Berisha, Kan Kawabata, Julie Liss, Katherine Hustad

Aim. We compared the performance of five forced-alignment algorithms on a corpus of child speech. Method. The child speech sample included 42 children between 3 and 6 years of age. The corpus was force-aligned using the Montreal Forced Aligner with and without speaker adaptive training, triphone alignment from the Kaldi speech recognition engine, the Prosodylab-Aligner, and the Penn Phonetics Lab Forced Aligner. The sample was also manually aligned to create gold-standard alignments. We evaluated alignment algorithms in terms of accuracy (whether the interval covers the midpoint of the manual alignment) and difference in phone-onset times between the automatic and manual intervals. Results. The Montreal Forced Aligner with speaker adaptive training showed the highest accuracy and smallest timing differences. Vowels were consistently the most accurately aligned class of sounds across all aligners, and for fricatives, alignment accuracy increased with age across all aligners. Interpretation. The best-performing aligner fell just short of human-level reliability for forced alignment. Researchers can use forced alignment with child speech for certain classes of sounds (vowels, fricatives for older children), especially as part of a semi-automated workflow where alignments are later inspected for gross errors.


Author(s): Sriraksha Nayak, Chandrakala C B

According to World Health Organization estimates, about 285 million people worldwide have some visual impairment, of whom 39 million are blind. The inability to use features such as sending and reading email, schedule management, pathfinding or outdoor navigation, and reading SMS is a disadvantage for blind people in many professional and educational situations. Speech and text analysis can help improve support for visually impaired people. Users can speak a command to perform a task; the spoken command is interpreted by the Speech Recognition Engine (SRE) and either converted into text or mapped to a suitable action. In this paper, an application that allows schedule management, emailing, and SMS reading entirely by voice command is proposed, implemented, and validated. The system lets blind users simply speak the desired functionality and be guided by the system's audio instructions. The proposed app supports three languages: English, Hindi, and Kannada.
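A minimal sketch of the command-dispatch idea described above follows; the abstract does not specify how recognized text is mapped to actions, so the keyword list and handler names below are hypothetical illustrations only.

```python
# Hypothetical sketch: route recognized command text to an action handler.
# Handlers and keywords are illustrative, not taken from the paper.

def send_email():
    print("Opening the email composer...")

def read_sms():
    print("Reading the latest SMS aloud...")

def manage_schedule():
    print("Reading today's schedule aloud...")

# Keyword-based dispatch table; a real system would use intent classification
# and would also support Hindi and Kannada command phrases.
COMMANDS = {
    "email": send_email,
    "sms": read_sms,
    "schedule": manage_schedule,
}

def dispatch(recognized_text: str) -> None:
    text = recognized_text.lower()
    for keyword, handler in COMMANDS.items():
        if keyword in text:
            handler()
            return
    print("Sorry, I did not understand that command.")

dispatch("please read my SMS")   # -> Reading the latest SMS aloud...
```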


Author(s): Tianyun Li, Bicheng Fan

This study sets out to describe simultaneous interpreters' attention-sharing initiatives when exposed to input from both a videotaped speech recording and real-time transcriptions. Dividing mental energy across visual inputs accords with the human brain's statistical optimization principle, whereby the same property of an object is presented in diverse ways. To examine professional interpreters' initiatives, the authors invited five professional English-Chinese conference interpreters to simultaneously interpret a videotaped speech with real-time captions generated by a speech recognition engine while their eye movements were monitored. The results indicate that the professional interpreters preferred to refer to the visually presented captions along with the speaker's facial expressions, with low-frequency words, proper names, and numbers receiving greater attention than higher-frequency words. This phenomenon might be explained by working memory theory, in which the central executive enables redundancy gains from dual-channel information.


Cooperative manipulators have been a subject of interest in the scientific community for the last few years. Here, an overview of the design and control of such cooperative manipulators using speech commands in English, Hindi, and Tamil is discussed. Two identical robot arms from Lynxmotion are used, and both manipulators move in conjunction with one another to carry a greater payload while grasping or handling an object with the end effector. The end user controls both identical manipulators simultaneously by pronouncing simple speech commands into a smartphone; the commands are converted into text by a speech recognition engine, and this text is fed to a servo controller that actuates the joints of both robot arms. Cooperative manipulators are used for handling radioactive elements and, in medicine, as rehabilitation aids and in surgery. An Android app built specifically for this purpose communicates over Bluetooth and gives the end user a simple interface for controlling both robot arms simultaneously.
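A rough sketch of the text-to-servo step is given below; it assumes the servo controllers accept Lynxmotion SSC-32-style serial commands ("#<channel>P<pulse-width>") over Bluetooth serial ports, and the port names, command vocabulary, channel numbers, and pulse widths are hypothetical rather than taken from the paper.

```python
# Hypothetical sketch: turn a recognized speech command into servo motions
# for two identical arms over a Bluetooth serial link (SSC-32-style syntax).
# Port names, channels, and pulse widths are illustrative only.
import serial

# Bluetooth serial ports for the two arms' servo controllers (assumed names).
ARM_PORTS = ["/dev/rfcomm0", "/dev/rfcomm1"]

# Map a recognized command phrase to (servo channel, pulse width in microseconds).
COMMAND_TABLE = {
    "open gripper":  (4, 1800),
    "close gripper": (4, 1200),
    "raise elbow":   (2, 1700),
    "lower elbow":   (2, 1300),
}

def execute(recognized_text: str) -> None:
    """Send the same servo command to both arms so they move in conjunction."""
    action = COMMAND_TABLE.get(recognized_text.lower().strip())
    if action is None:
        print("Unknown command:", recognized_text)
        return
    channel, pulse = action
    frame = f"#{channel}P{pulse}T1000\r".encode()   # move over 1000 ms
    for port in ARM_PORTS:
        with serial.Serial(port, baudrate=9600, timeout=1) as link:
            link.write(frame)

execute("close gripper")   # both grippers close together
```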


Author(s): Basanta Kumar Swain, Sanghamitra Mohanty, Chiranji Lal Chowdhary

In this research paper, we developed a spoken dialogue system using the Odia phone set. We also added a security feature to the spoken dialogue system by integrating a speaker verification module, which grants services only to genuine users. The spoken dialogue system offers a bouquet of services for opening frequently used applications, files, and folders that are installed or stored on the user's computer. The system also responds to users with synthesized speech related to the requested service, and it can be used to keep the computer desktop free of clutter. We used an HMM-based Odia isolated-word speech recognition engine and a fuzzy c-means-based speaker verification module in developing the spoken dialogue system. The accuracy of the Odia speech recognition engine is 78.22% for seen users and 62.31% for unseen users, and the average accuracy of the speaker verification module is 66.2%.
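The abstract does not detail the verification procedure, so the following is only a rough sketch of one way fuzzy c-means could be used for it: cluster an enrolled speaker's MFCC frames into fuzzy centres, then accept a test utterance if its frames sit close enough, on average, to those centres. The feature dimensions, threshold, and all parameter values are assumptions, not the authors' settings.

```python
# Hypothetical sketch of fuzzy c-means (FCM) based speaker verification.
# Enrollment: cluster the claimed speaker's MFCC frames into fuzzy centres.
# Test: accept if the test frames' average distance to their nearest centre
# is below a tuned threshold. All numbers here are illustrative.
import numpy as np

def fuzzy_c_means(X, c=4, m=2.0, iters=50, seed=0):
    """Minimal FCM: X is (n_frames, n_features); returns (c, n_features) centres."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)          # fuzzy memberships sum to 1 per frame
    for _ in range(iters):
        W = U ** m
        centres = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2) + 1e-9
        U = 1.0 / (d ** (2 / (m - 1)))
        U /= U.sum(axis=1, keepdims=True)
    return centres

def verify(test_mfcc, centres, threshold=25.0):
    """Accept the claimed identity if test frames lie close to the enrolled centres."""
    d = np.linalg.norm(test_mfcc[:, None, :] - centres[None, :, :], axis=2)
    return d.min(axis=1).mean() < threshold

# Toy usage with random stand-ins for 13-dimensional MFCC frames.
enroll = np.random.randn(200, 13)
centres = fuzzy_c_means(enroll)
print(verify(np.random.randn(50, 13), centres))
```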


This paper discusses the challenges of, and proposes recommendations for, using a standard speech recognition engine in a small-vocabulary Air Traffic Controller-Pilot communication domain. Given the difficulty of transcribing air traffic communication, due to inherent radio issues in the cockpit and the controller room, gathering a corpus for training the speech recognition model is another important problem. Taking advantage of the maturity of today's speech recognition systems for the standard English words used in the communication, this paper focuses on the challenges of decoding the domain-specific named-entity words used in the communication.
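As one hedged illustration of the named-entity problem, callsigns in controller-pilot phraseology are typically spelled out with the ICAO phonetic alphabet and digit words; a simple post-processing pass over a recognizer's standard-English output could rebuild them, as sketched below. The phrase format, mapping coverage, and function name are assumptions for illustration, not the paper's method.

```python
# Hypothetical post-processing sketch: rebuild a spelled-out callsign from
# standard-English recognizer output ("delta lima four two" -> "DL42").
# Only a fragment of the ICAO alphabet and digit words is mapped here.
ICAO = {
    "alpha": "A", "bravo": "B", "charlie": "C", "delta": "D",
    "echo": "E", "lima": "L", "mike": "M", "tango": "T",
}
DIGITS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "niner": "9",
}

def rebuild_callsign(recognized_text: str) -> str:
    """Collapse phonetic-alphabet and digit words into a compact entity string."""
    parts = []
    for word in recognized_text.lower().split():
        if word in ICAO:
            parts.append(ICAO[word])
        elif word in DIGITS:
            parts.append(DIGITS[word])
    return "".join(parts)

print(rebuild_callsign("delta lima four two climb flight level three four zero"))
# -> "DL42340" (a real system would also segment the callsign from the level)
```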

