Real-time video based emotion recognition using convolutional neural network and transfer learning

2020, Vol 13 (31), pp. 3222-3229
Author(s): J Sujanaa

Author(s): Saurabh Takle, Shubham Desai, Sahil Mirgal, Ichhanshu Jaiswal

<p>Accidents are mainly caused by manual, visual, or cognitive distraction. Of these three, manual distractions concern activities in which the “driver’s hands are off the wheel”. Such distractions include talking or texting on mobile phones, eating and drinking, talking to passengers in the vehicle, adjusting the radio, applying makeup, etc. To address manual distraction, a ResNet-50 Convolutional Neural Network (CNN) model with 23,587,712 parameters was used with transfer learning. The dataset is the State Farm Distracted Driver Detection dataset. The training accuracy is 97.27% and the validation accuracy is 55%. The model also detects distractions in real time on a video feed; for this purpose the system uses OpenCV, and the model is integrated with the frontend using Flask.</p>
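The two accuracies quoted above invite a quick arithmetic check. The sketch below computes the gap between them; `generalization_gap` is a hypothetical helper name, and reading a large gap as a sign of overfitting is a common rule of thumb, not a conclusion stated in the abstract.

```python
def generalization_gap(train_acc: float, val_acc: float) -> float:
    """Return the train-minus-validation accuracy gap in percentage points."""
    return train_acc - val_acc


# Accuracies quoted in the abstract (percent).
TRAIN_ACC = 97.27
VAL_ACC = 55.0

gap = generalization_gap(TRAIN_ACC, VAL_ACC)
print(f"gap = {gap:.2f} percentage points")
```

A gap of roughly 42 percentage points would usually prompt a look at regularization, data augmentation, or how the training and validation images were split.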


Recognizing facial emotion has been a challenging task for many years. This work applies machine learning algorithms to facial emotion recognition for both real-time images and stored database images. Deep learning has become important for human-computer interaction (HCI) applications. The proposed system has two parts: a real-time facial emotion recognition system and an image-based facial emotion recognition system. A Convolutional Neural Network (CNN) model is trained and tested on different facial emotion images. The work was implemented in Python 3.7.6. The input face image is taken from a webcam video stream or from a standard database available for research. The five facial emotions considered are happy, surprise, angry, sad, and neutral. The best recognition accuracy of the proposed system is 91.2% for the webcam video stream and 90.08% for the input database images.
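The final classification step over the five emotions can be sketched in plain Python. This assumes the CNN ends in one raw score per class that is turned into probabilities with a softmax, which is common for classifiers but is not stated in the abstract. The label order, the function names, and the example scores are all hypothetical.

```python
import math

# The five emotion classes named in the abstract; the order is an assumption.
EMOTIONS = ["happy", "surprise", "angry", "sad", "neutral"]


def softmax(logits):
    """Convert raw scores to probabilities, shifting by the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_emotion(logits):
    """Return the most probable label and its probability."""
    if len(logits) != len(EMOTIONS):
        raise ValueError("expected one score per emotion class")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs[best]


# Hypothetical scores for one face; a real system would get these from the CNN.
label, confidence = predict_emotion([2.0, 0.5, -1.0, 0.1, 1.2])
print(label, round(confidence, 3))
```

In a webcam setting the same function would be called once per detected face per frame, with the scores coming from the trained network.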


2019, Vol 8 (4), pp. 12940-12944

Human life is a complex social structure. People cannot navigate it without reading other people, which they do by observing faces. How to respond can be decided from the other person’s mood, and a person’s mood can be inferred from their emotion (facial gesture). The aim of the project is to construct a real-time “Facial Emotion Recognition” model using a DCNN (deep convolutional neural network). A DCNN is used because DCNNs have been reported to achieve greater accuracy than CNNs (convolutional neural networks). Human facial expressions are highly dynamic and change within a split second among Happy, Sad, Angry, Fear, Surprise, Disgust, Neutral, etc. This project predicts a person’s emotion in real time. Our brains have neural networks that are responsible for all kinds of thinking (decision making, understanding). This model tries to develop these decision-making and classification skills by training the machine. It can classify multiple faces and predict their different emotions at the same time. To obtain higher accuracy, we use models trained on thousands of datasets.


2020, Vol 12 (1), pp. 1
Author(s): Vivian Alfionita Sutama, Suryo Adhi Wibowo, Rissa Rahmania

Nowadays, Artificial Intelligence is one of the fastest-developing technologies, especially in Augmented Reality (AR). AR is a technology that connects the real world and the virtual world in real time, allowing the user to interact directly and displaying the result in 3D. AR technology has two methods: marker-based AR and markerless AR. Marker-based AR needs a high-performance object detection system as an interaction tool between the user and the device. Single shot multibox detector (SSD) is an object detection algorithm that has fast learning computation and good performance. This method is affected by parameters such as the number of epochs, learning rate, batch size, number of training steps, etc. However, building a good system takes a long process: collecting the dataset, labelling, and then training and testing models to obtain the best performance. In this experiment, we analyze the SSD method in AR technology using the Inception architecture as a pre-trained convolutional neural network (CNN), and then apply transfer learning to reduce training time. The configuration varied is the number of training steps. The best accuracy obtained in this experiment is 70.17%. The best-performing model is then used as the object detection model for markers in AR technology.

Abstract: Currently, artificial intelligence is a rapidly developing technology. One example is Augmented Reality (AR) technology. AR is a technology that combines the real world with the virtual in real time, with direct user interaction, and displays it in 3D. AR technology has two methods, marker-based and markerless. In its development, marker-based AR requires a high-performance object detection system as an interaction tool between the user and the device. Single shot multibox detector (SSD) is an object detection algorithm that has good learning computation and performance. This method is affected by several parameters such as the number of convolution layers, epochs, learning rate, batch size, training steps, etc. However, implementing it requires a fairly long process, such as collecting the dataset, labelling, training with the SSD method, and testing several models to find the best performance. In this experiment, we analyze the SSD method in AR technology using the Inception architecture as a pre-trained convolutional neural network (CNN), then apply transfer learning to reduce the number of training data classes and the training time. The configuration used is the number of training steps. The results of this study show a best accuracy of 70.17%. The best-performing model is then used as the object detection model for markers in AR technology.
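SSD-style detectors rely on the overlap between predicted and ground-truth boxes, usually measured as intersection over union (IoU). The abstract does not say how the 70.17% accuracy was computed, so the sketch below only illustrates the overlap measure and is not the authors' evaluation. The box format `(x_min, y_min, x_max, y_max)` and the example coordinates are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical predicted and ground-truth marker boxes in pixel coordinates.
predicted = (10, 10, 50, 50)
ground_truth = (30, 30, 70, 70)
print(iou(predicted, ground_truth))
```

A detection is typically counted as correct when its IoU with a ground-truth box exceeds a chosen threshold such as 0.5; the threshold used in this study is not given.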

