Visual speech recognition for small scale dataset using VGG16 convolution neural network

Convolution Neural Network Based Visual Speech Recognition System for Syllable Identification

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813999200917142628 ◽

2020 ◽

Vol 13 ◽

Author(s):

Hunny Pahuja ◽

Priya Ranjan ◽

Amit Ujlayan ◽

Ayush Goyal

Keyword(s):

Neural Network ◽

Speech Recognition ◽

Real Time ◽

Acoustic Property ◽

Recognition System ◽

Convolution Neural Network ◽

Visual Speech ◽

Large Set ◽

Speech Recognition System ◽

Visual Speech Recognition

Introduction: This paper introduces novel and reliable approach for speech impaired people to assist them to communicate effectively in real time. A deep learning technique named as convolution neural network is used as its classifier. With the help of this algorithm, words are recognized from an input which is a visual speech, disregards with its audible or acoustic property. Methods: This network extracts the features from mouth stances and different images respectively. With the help of a source, non-audible mouth stances are taken as an input and then segregated as subsets to get desired output. The Complete Datum is then arranged to recognize the word as an affricate. Results: Convolution neural network is one of the most effective algorithms that extracts features, performs classification and provides the desired output from the input images for speech recognition system. Conclusion: Recognizing the syllables at real time from visual mouth stances input is the main objective of the proposed method. When tested, datum accuracy and quantity of training sets is giving satisfactory output. A small set of datum is taken as first step of learning. In future, large set of datum can be considered for analyzing the data. Discussion: On the basis of type of Datum, network proposed in this paper is tested to obtain its precision level. A network is maintained to identify the syllables but it fails when syllables are of same set. Requirement of Higher end graphics pro-cessing units is there to bring down the time consumption and increases the efficiency of network.

Download Full-text

End-to-end visual speech recognition for small-scale datasets

Pattern Recognition Letters ◽

10.1016/j.patrec.2020.01.022 ◽

2020 ◽

Vol 131 ◽

pp. 421-427 ◽

Cited By ~ 1

Author(s):

Stavros Petridis ◽

Yujiang Wang ◽

Pingchuan Ma ◽

Zuwei Li ◽

Maja Pantic

Keyword(s):

Speech Recognition ◽

Visual Speech ◽

Small Scale ◽

Visual Speech Recognition ◽

End To End

Download Full-text

Audio Visual Speech Recognition using Feed Forward Neural Network Architecture

2020 IEEE International Conference for Innovation in Technology (INOCON) ◽

10.1109/inocon50539.2020.9298429 ◽

2020 ◽

Author(s):

Shashidhar R ◽

S Patilkulkarni ◽

Puneeth S B

Keyword(s):

Neural Network ◽

Speech Recognition ◽

Network Architecture ◽

Visual Speech ◽

Neural Network Architecture ◽

Feed Forward Neural Network ◽

Feed Forward ◽

Visual Speech Recognition

Download Full-text

Visual Speech Recognition of Korean Words Using Convolutional Neural Network

International Journal of Fuzzy Logic and Intelligent Systems ◽

10.5391/ijfis.2019.19.1.1 ◽

2019 ◽

Vol 19 (1) ◽

pp. 1-9

Author(s):

Sung-Won Lee ◽

Je-Hun Yu ◽

Seung Min Park ◽

Kwee-Bo Sim

Keyword(s):

Neural Network ◽

Speech Recognition ◽

Convolutional Neural Network ◽

Visual Speech ◽

Visual Speech Recognition

Download Full-text

Integration of Audio video Speech Recognition using LSTM and Feed Forward Convolutional Neural Network

10.21203/rs.3.rs-173380/v1 ◽

2021 ◽

Author(s):

Shashidhar R ◽

Sudarshan Patil Kulkarni

Keyword(s):

Neural Network ◽

Speech Recognition ◽

Good Accuracy ◽

Visual Speech ◽

Feed Forward Neural Network ◽

Final Model ◽

Feed Forward ◽

Speech Detection ◽

Visual Speech Recognition ◽

Lip Reading

Abstract In the current scenario, audio visual speech recognition is one of the emerging fields of research, but there is still deficiency of appropriate visual features for recognition of visual speech. Human lip-readers are increasingly being presented as useful in the gathering of forensic evidence but, like all human, suffer from unreliability in analyzing the lip movement. Here we used a custom dataset and design the system in such a way that it predicts the output for the lip reading. The problem of speaker independent lip-reading is very demanding due to unpredictable variations between people. Also due to recent developments and advances in the fields of signal processing and computer vision. The task of automating the lip reading is becoming a field of great interest. Here we use MFCC techniques for audio processing and LSTM method for visual speech recognition and finally integrate the audio and video using feed forward neural network (FFNN) and also got good accuracy. That is why the AVSR technique attract a great attention as a reliable solution for the speech detection problem. The final model was capable of taking more appropriate decision while predicting the spoken word. We were able to get a good accuracy of about 92.38% for the final model.

Download Full-text

Liveness Verification Using Deep Neural Network Based Visual Speech Recognition

International Journal of Multimedia and Image Processing ◽

10.20533/ijmip.2042.4647.2018.0047 ◽

2018 ◽

Vol 8 (1) ◽

pp. 380-388

Author(s):

Philip McShane ◽

Darryl Stewart

Keyword(s):

Neural Network ◽

Speech Recognition ◽

Deep Neural Network ◽

Visual Speech ◽

Visual Speech Recognition

Download Full-text

Neural network acoustic and visual speech recognition system

The Journal of the Acoustical Society of America ◽

10.1121/1.420021 ◽

1997 ◽

Vol 102 (3) ◽

pp. 1282

Author(s):

David G. Stork

Keyword(s):

Neural Network ◽

Speech Recognition ◽

Recognition System ◽

Visual Speech ◽

Speech Recognition System ◽

Visual Speech Recognition

Download Full-text

Audio-Visual Speech Recognition System Using Recurrent Neural Network

2019 4th International Conference on Information Technology (InCIT) ◽

10.1109/incit.2019.8912049 ◽

2019 ◽

Author(s):

Yeh-Huann Goh ◽

Kai-Xian Lau ◽

Yoon-Ket Lee

Keyword(s):

Neural Network ◽

Speech Recognition ◽

Recurrent Neural Network ◽

Recognition System ◽

Visual Speech ◽

Speech Recognition System ◽

Visual Speech Recognition

Download Full-text

Visual Speech Recognition using Convolutional Neural Network

IOP Conference Series Materials Science and Engineering ◽

10.1088/1757-899x/1084/1/012020 ◽

2021 ◽

Vol 1084 (1) ◽

pp. 012020

Author(s):

B Soundarya ◽

R Krishnaraj ◽

S Mythili

Keyword(s):

Neural Network ◽

Speech Recognition ◽

Convolutional Neural Network ◽

Visual Speech ◽

Visual Speech Recognition

Download Full-text

An Critical Analysis of Speech Recognition of Tamil and Malay Language Through Artificial Neural Network

10.20944/preprints202102.0156.v1 ◽

2021 ◽

Author(s):

Kingston Pal Thamburaj ◽

Kartheges Ponniah ◽

Ilangkumaran Sivanathan ◽

Muniisvaran Kumar

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Speech Recognition ◽

Data Entry ◽

Recognition System ◽

Visual Speech ◽

Document Preparation ◽

Multiple Sensors ◽

Visual Speech Recognition ◽

Artificial Neural

Human and Computer interaction has been a part of our day-to-day life. Speech is one of the essential and comfortable ways of interacting through devices as well as a human being. The device, particularly smartphones have multiple sensors in camera and microphone, etc. speech recognition is the process of converting the acoustic signal to a smartphone as a set of words. The efficient performance of the speech recognition system highly enhances the interaction between humans and machines by making the latter more receptive to user needs. The recognized words can be applied for many applications such as Commands & Control, Data entry, and Document preparation. This research paper highlights speech recognition through ANN (Artificial Neural Network). Also, a hybrid model is proposed for audio-visual speech recognition of the Tamil and Malay language through SOM (Self-organizing map0 and MLP (Multilayer Perceptron). The Effectiveness of the different models of NN (Neural Network) utilized in speech recognition will be examined.

Download Full-text