Visual Database
Recently Published Documents

TOTAL DOCUMENTS: 81 (FIVE YEARS: 13)
H-INDEX: 10 (FIVE YEARS: 2)

2021 ◽ Vol 13 (7) ◽ pp. 182
Author(s): Shinnosuke Isobe ◽ Satoshi Tamura ◽ Satoru Hayamizu ◽ Yuuto Gotoh ◽ Masaki Nose

Recently, automatic speech recognition (ASR) and visual speech recognition (VSR) have been widely researched owing to developments in deep learning. Most VSR research focuses only on frontal face images. In real scenes, however, a VSR system should correctly recognize spoken content not only from frontal faces but also from diagonal or profile faces. In this paper, we propose a novel VSR method that is applicable to faces captured at any angle. First, view classification is carried out to estimate face angles. Based on the results, feature extraction is conducted using the best combination of pre-trained feature extraction models. Lipreading is then carried out using these features. We also developed audio-visual speech recognition (AVSR) that combines the VSR with conventional ASR: audio results are obtained from ASR, and the audio and visual results are then incorporated in a decision-fusion manner. We evaluated our methods on OuluVS2, a multi-angle audio-visual database, and confirmed that our approach achieves the best performance among conventional VSR schemes in a phrase classification task. In addition, our AVSR results are better than both the ASR and VSR results.
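The decision-fusion step is the part of this pipeline most easily shown in code. Below is a minimal Python sketch of decision-level fusion, assuming each recognizer emits per-class posteriors; the log-space weighting scheme, the `alpha` value, and all names are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def decision_fusion(asr_scores: np.ndarray, vsr_scores: np.ndarray,
                    alpha: float = 0.7) -> int:
    """Fuse per-class scores from ASR and VSR at the decision level.

    asr_scores, vsr_scores: arrays of shape (num_classes,) holding class
    posteriors from each recognizer. alpha is the audio weight (a tunable
    assumption, not a value from the paper).
    """
    # Work in log space so the weighted combination behaves like a
    # product-of-experts over the two modality posteriors.
    eps = 1e-12
    fused = alpha * np.log(asr_scores + eps) + (1.0 - alpha) * np.log(vsr_scores + eps)
    return int(np.argmax(fused))

# Example: 10 phrase classes, dummy posteriors from each recognizer.
rng = np.random.default_rng(0)
asr = rng.dirichlet(np.ones(10))
vsr = rng.dirichlet(np.ones(10))
print("fused phrase id:", decision_fusion(asr, vsr))
```

In practice `alpha` would be tuned on held-out data, since the best audio/visual balance depends on how noisy each modality is.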


2021 ◽ Author(s): Mahsa Hedayatipour ◽ Yasser Shekofteh ◽ Mohsen Ebrahimi Moghaddam

2021 ◽ Vol 17 (1) ◽ pp. 109-127
Author(s): Sayed El-konaly ◽ Sabry Saraya ◽ Ali El-Desouky ◽ Yehia Enab ◽ Aida Osman ◽ ...

2021 ◽ Vol 2 (1) ◽ pp. 106-121
Author(s): Dorina Szente

In the first half of the twentieth century, photography allowed families and groups to capture important moments. In the 1920s and 1930s, cheaper and simpler cameras appeared on the market and became available to many people: the Kodak revolution. Intimate family spaces opened up, and the everyday life of schools became visible. The Fortepan visual database is a collection of such photographs taken between 1900 and 1990. As a cultural imprint of its time, the photograph has become a new source for researchers to observe a symbolic world we know little about. The oldest communication medium is the human body, so its movement in space can take cultural-anthropological and pedagogical-anthropological research to a whole new level. Rituals punctuate everyday life, forming a transition between past, present, and future; they create community and order. School celebrations are a good way to see hidden content that reflects social conditions. This research examines how school dances appeared in the 1920s and 1930s, how they changed under different social influences, and what ritual elements appear in them.


2021 ◽ Vol 1 (9 (109)) ◽ pp. 58-65
Author(s): Israa Mohammed Khudher

Steganography is the science of hiding secret data inside another data type, such as an image or text. This carrier data lets people communicate secretly. This paper aims to design a Steganography Biometric Imaging System (SBIS). The system is constructed as a hybrid of image processing, steganography, and artificial intelligence techniques. Using image processing techniques, the system receives RGB foot-tip images and preprocesses them to obtain foot-template images. Personal information is then encoded as a chain code and embedded within the foot-template image by the Least Significant Bit (LSB) method. Accurate recognition is performed by artificial bee colony optimization (ABC). The automated system was tested on about ninety live-captured RGB foot-tip images, known as cover images, clustered into nine clusters that form the authorized visual database. The Least Significant Bit method transforms each foot template into a stego image, which is stored in a stego visual database for further use. A features database was constructed for each stego footprint template; this step converts the image into quantitative data stored in an Excel feature-database file. The quantitative data is used at the recognition stage to produce either a rejection or an acceptance notification. On acceptance, the corresponding stego foot-tip template is retrieved, its associated individual data is extracted, and its cluster position in the stego-template visual database is located; finally, the foot-tip template is displayed. The outcome of the suggested work benefits from optimal feature selection via artificial bee colony optimization and clustering, which reduced the complexity and consequently raised the recognition rate to 93.65%. This rate makes the technique competitive with other techniques in the field of biometric recognition.
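As a concrete illustration of the embedding step, here is a minimal Python sketch of LSB embedding and extraction with NumPy, assuming an 8-bit grayscale cover image; the payload layout and function names are hypothetical, and the paper's exact chain-code encoding is not reproduced.

```python
import numpy as np

def lsb_embed(cover: np.ndarray, payload: bytes) -> np.ndarray:
    """Embed payload bits into the least significant bit of each pixel.

    cover: uint8 image array (e.g. a foot-template image). Illustrative
    only; the paper's exact embedding layout is not given.
    """
    bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
    flat = cover.flatten()
    if bits.size > flat.size:
        raise ValueError("payload too large for cover image")
    stego = flat.copy()
    stego[:bits.size] = (stego[:bits.size] & 0xFE) | bits  # replace LSBs
    return stego.reshape(cover.shape)

def lsb_extract(stego: np.ndarray, n_bytes: int) -> bytes:
    """Recover n_bytes of payload from the stego image's LSBs."""
    bits = stego.flatten()[:n_bytes * 8] & 1
    return np.packbits(bits).tobytes()

cover = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
stego = lsb_embed(cover, b"ID:0372")  # hypothetical personal-data record
print(lsb_extract(stego, 7))          # b'ID:0372'
```

Because only the lowest bit of each pixel changes, the stego image is visually indistinguishable from the cover, which is what makes LSB a common baseline embedding method.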


Imagine how tiresome it is for scorers to update the scoreboard after each ball delivery during a cricket match. They need to stay alert throughout the match, watch every single ball, record ball-by-ball events, modify the score, and coordinate with the umpire the entire time. A system that updates the scoreboard automatically after every ball would halve their effort; the time taken for each update and the chance of errors would also be reduced. This work proposes a novel method for umpire pose detection to update the cricket scoreboard during real-time cricket matches. The proposed system identifies the events happening on the pitch by recognizing the gestures of the umpire and updates the scoreboard accordingly. Transfer learning is used to accelerate the training of the feature-extraction neural network. The Inception V3 network, pretrained on the visual database ImageNet, is chosen as the primary candidate for feature extraction: initializing the model with pretrained weights instead of random weights reduces the training time and is therefore more efficient. The proposed system combines two SVM classifiers. The leadoff classifier separates images that contain an umpire from non-umpire images. The 'umpire' images are carried forward to the event-detection classifier, while the 'non-umpire' images are discarded. The second classifier identifies four gestures from the images – 'Six', 'Wide', 'No ball', and 'Out' – after which the scoreboard is updated. In addition to these four classes, one more label, the 'No Action' class, groups those umpire frames in which the umpire shows no signal. The input cricket video is first split into a number of shots, and each frame is treated as a test image for the combined classifier system. A majority voter confirms the final classification result, which decreases the chance of misclassification. The preliminary results suggest that the intended system is effective for automating scoreboard updates during real-time cricket matches.
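A minimal sketch of the described two-stage pipeline, assuming TensorFlow/Keras and scikit-learn: a frozen ImageNet-pretrained Inception V3 trunk extracts 2048-dimensional features, and two SVMs handle the umpire/non-umpire and gesture decisions. The dummy training data, label encodings, and function names are assumptions for illustration only.

```python
import numpy as np
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
from sklearn.svm import SVC

# Frozen Inception V3 trunk pretrained on ImageNet; global average
# pooling turns each 299x299 RGB frame into a 2048-dim feature vector.
extractor = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

def extract_features(frames: np.ndarray) -> np.ndarray:
    """frames: (n, 299, 299, 3) RGB batch -> (n, 2048) feature matrix."""
    return extractor.predict(preprocess_input(frames.astype("float32")), verbose=0)

# Stage 1 separates umpire from non-umpire frames; stage 2 names the gesture.
umpire_clf = SVC(kernel="linear")
gesture_clf = SVC(kernel="linear")

# Dummy fit so the sketch runs end to end; replace with real labeled features.
X_dummy = np.random.rand(10, 2048)
umpire_clf.fit(X_dummy, [0, 1] * 5)  # 0 = non-umpire, 1 = umpire (assumed labels)
gesture_clf.fit(X_dummy, ["Six", "Wide", "No ball", "Out", "No Action"] * 2)

def classify_frame(frame: np.ndarray) -> str:
    feat = extract_features(frame[None])
    if umpire_clf.predict(feat)[0] == 0:
        return "non-umpire"              # discarded before event detection
    return gesture_clf.predict(feat)[0]

# Majority voting over the frames of one shot confirms the final event.
def vote(frame_labels: list[str]) -> str:
    return max(set(frame_labels), key=frame_labels.count)
```

Per-frame voting is what makes the shot-level decision robust: a handful of misclassified frames cannot flip the event label as long as most frames agree.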


Author(s): Israa Mohammed Alhamdani ◽ Yahya Ismail Ibrahim

Over the last decade, the importance of biometrics has become clear, owing to its role in daily life, from civil and security applications to, more recently, counter-terrorism. Footprint recognition is an effective form of personal identification based on biometric measures. The aim of this research is to design a proper and reliable left-footprint biometric system (LFBS) and, in addition, to create a human footprint database, which is very helpful for numerous uses such as authentication; existing footprint databases are very rare and limited. This paper presents a sturdy combined technique that merges image processing with an artificial intelligence technique, the Bird Swarm Optimization Algorithm (BSA), to recognize the human footprint. The use of BSA enhances the performance and the quality of the results in the biometric system through feature selection; the selected features are treated as the optimal feature set in terms of feature-set size. The visual database was constructed by capturing live RGB footprint images from nine persons, with ten images per person. The visual dataset images were pre-processed by successive operations: a chain code is computed from the binary footprint image, and statistical features representing the footprint are then extracted from each image and stored in an Excel file to be fed into the Bird Swarm Algorithm. The experimental results show that the proposed algorithm achieves excellent results with a smaller feature set in comparison with other algorithms, reaching about 100% accuracy relative to other papers in the same field.
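The chain-code step can be shown concretely. Below is a minimal Python sketch of a Freeman 8-direction chain code over an ordered footprint boundary, plus one simple statistical feature (a direction histogram); the paper's exact chain-code variant and feature set are not specified, so everything here is an assumption.

```python
# Freeman 8-connectivity: direction index -> (row, col) offset.
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
           (0, -1), (1, -1), (1, 0), (1, 1)]

def chain_code(boundary):
    """Freeman chain code for an ordered list of (row, col) boundary pixels,
    e.g. from a border-following pass over the binary footprint image."""
    return [OFFSETS.index((r1 - r0, c1 - c0))
            for (r0, c0), (r1, c1) in zip(boundary, boundary[1:])]

def direction_histogram(codes):
    """Normalized direction histogram: one simple statistical feature vector."""
    return [codes.count(d) / len(codes) for d in range(8)]

# Tiny example: a unit square traversed counter-clockwise.
square = [(1, 1), (1, 2), (0, 2), (0, 1), (1, 1)]
codes = chain_code(square)
print(codes)                       # [0, 2, 4, 6]
print(direction_histogram(codes))  # [0.25, 0.0, 0.25, 0.0, 0.25, 0.0, 0.25, 0.0]
```

Feature vectors like this histogram are what a wrapper-style optimizer such as BSA would then prune to the smallest subset that preserves recognition accuracy.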


IEEE Access ◽ 2020 ◽ Vol 8 ◽ pp. 56641-56649
Author(s): Helard B. Martinez ◽ Andrew Hines ◽ Mylene C. Q. Farias

Sensors ◽ 2019 ◽ Vol 20 (1) ◽ pp. 183
Author(s): Mustaqeem ◽ Soonil Kwon

Speech is the most significant mode of communication among human beings and a potential method for human-computer interaction (HCI) using a microphone sensor. Quantifiable emotion recognition from speech signals captured by these sensors is an emerging area of research in HCI, with applications such as human-robot interaction, virtual reality, behavior assessment, healthcare, and emergency call centers, where the speaker's emotional state must be determined from speech alone. In this paper, we present two major contributions: (i) increasing the accuracy of speech emotion recognition (SER) compared to the state of the art, and (ii) reducing the computational complexity of the presented SER model. We propose an artificial intelligence-assisted deep stride convolutional neural network (DSCNN) architecture that uses the plain-nets strategy to learn salient and discriminative features from spectrograms of speech signals, which are enhanced in prior steps for better performance. Local hidden patterns are learned in convolutional layers that use special strides, rather than pooling layers, to down-sample the feature maps, and global discriminative features are learned in fully connected layers. A SoftMax classifier is used to classify the emotions in speech. The proposed technique is evaluated on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) and Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) datasets, improving accuracy by 7.85% and 4.5%, respectively, while reducing the model size by 34.5 MB. This proves the effectiveness and significance of the proposed SER technique and demonstrates its applicability in real-world applications.
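A minimal Keras sketch of the plain-nets idea, with strided convolutions (rather than pooling layers) down-sampling the spectrogram feature maps and fully connected layers feeding a SoftMax output; the layer sizes, input shape, and class count are assumptions, not the paper's actual DSCNN configuration.

```python
from tensorflow.keras import layers, models

NUM_EMOTIONS = 8  # e.g. the RAVDESS emotion classes (an assumption)

def build_dscnn(input_shape=(128, 128, 1)) -> models.Model:
    """Plain-net style CNN: strides, not pooling, shrink the feature maps."""
    m = models.Sequential([
        layers.Input(shape=input_shape),  # log-spectrogram patch
        layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"),
        layers.Conv2D(64, 3, strides=2, padding="same", activation="relu"),
        layers.Conv2D(128, 3, strides=2, padding="same", activation="relu"),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),  # global discriminative features
        layers.Dense(NUM_EMOTIONS, activation="softmax"),
    ])
    m.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
    return m

build_dscnn().summary()
```

Replacing each pooling layer with a stride-2 convolution removes a separate down-sampling stage while letting the network learn how to reduce resolution, which is one way such a model can shrink without losing accuracy.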

