Comparative Study of Movie Shot Classification Based on Semantic Segmentation

2020 ◽  
Vol 10 (10) ◽  
pp. 3390
Author(s):  
Hui-Yong Bak ◽  
Seung-Bo Park

The shot-type decision is an important pre-task in movie analysis, because the chosen shot type conveys a wealth of information, such as the characters' emotions and psychology and the spatial context. In order to analyze a variety of movies, a technique that automatically classifies shot types is required. Previous studies have classified shot types either by the proportion of the screen occupied by a face or with a convolutional neural network (CNN). Methods based on the face proportion cannot classify a shot when no person is on the screen. A CNN can classify shots even in the absence of a person, but certain shots still cannot be classified, because the method relies only on the characteristics and patterns of the image rather than analyzing it semantically. Additional information is therefore needed to approach the image semantically, which can be obtained through semantic segmentation. Consequently, in the present study, the performance of shot-type classification was improved by applying semantic segmentation as a preprocessing step to the frames extracted from the movie. Semantic segmentation interprets an image semantically and distinguishes the boundary relationships among objects; representative technologies include Mask R-CNN and Yolact. A study was conducted to compare and evaluate performance using these as preprocessing for shot-type classification. As a result, the average accuracy of shot-type classification using frames preprocessed with semantic segmentation increased by 1.9 percentage points, from 93% to 94.9%, compared with classification using the raw frames. In particular, when using ResNet-50 and Yolact, shot-type classification showed a 3-percentage-point improvement (from 93% to 96% accuracy).
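The face-proportion baseline that the abstract describes can be sketched as a simple thresholding rule. The thresholds below are illustrative assumptions, not values from the paper; the sketch also makes the baseline's limitation explicit: with no face on screen, no shot type can be assigned.

```python
def classify_shot(face_area: float, frame_area: float) -> str:
    """Classify a shot by the proportion of the frame occupied by a face.

    Thresholds are illustrative assumptions, not taken from the paper.
    Returns "none" when no face was detected, the case that
    face-proportion methods cannot handle.
    """
    if frame_area <= 0:
        raise ValueError("frame_area must be positive")
    ratio = face_area / frame_area
    if ratio == 0:
        return "none"        # no face on screen: the baseline fails here
    if ratio > 0.15:
        return "close-up"
    if ratio > 0.03:
        return "medium"
    return "long"
```

For example, a face covering 20% of the frame would be labeled a close-up, while a frame with no detected face returns "none", motivating the CNN and segmentation-based approaches.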

2019 ◽  
Author(s):  
Marion Poupard ◽  
Paul Best ◽  
Jan Schlüter ◽  
Helena Symonds ◽  
Paul Spong ◽  
...  

Killer whales (Orcinus orca) produce three types of signals: clicks, whistles and vocalizations. This study focuses on Orca vocalizations from northern Vancouver Island (Hanson Island), where the NGO Orcalab developed a multi-hydrophone recording station to study Orcas. The acoustic station is composed of five hydrophones and extends over 50 km² of ocean. Since 2015, we have been continuously streaming the hydrophone signals to our laboratory in Toulon, France, yielding nearly 50 TB of synchronous multichannel recordings. In previous work, we trained a convolutional neural network (CNN) to detect Orca vocalizations, using transfer learning from a bird activity dataset. Here, for each detected vocalization, we estimate the pitch contour (fundamental frequency). Finally, we cluster vocalizations by features describing the pitch contour. While preliminary, our results demonstrate a possible route towards automatic Orca call-type classification. Furthermore, the call types can be linked to the presence of particular Orca pods in the area. Large-scale call-type classification would allow new insights into the phonotactics and ethoacoustics of endangered Orca populations in the face of increasing anthropic pressure.
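The pitch-estimation step can be illustrated with a toy autocorrelation-based fundamental-frequency estimator. This is a minimal sketch, not the authors' pipeline; a real system would run it frame by frame over each detected vocalization to trace the full contour.

```python
import numpy as np

def pitch_autocorr(sig, sr, fmin=100.0, fmax=2000.0):
    """Estimate the fundamental frequency (Hz) of one frame.

    Finds the autocorrelation peak among lags corresponding to
    plausible pitches between fmin and fmax (assumed bounds).
    """
    sig = sig - sig.mean()
    corr = np.correlate(sig, sig, mode="full")[len(sig) - 1:]
    lag_min = int(sr / fmax)
    lag_max = int(sr / fmin)
    lag = lag_min + int(np.argmax(corr[lag_min:lag_max]))
    return sr / lag

# Synthetic 100 ms frame: a 400 Hz tone stands in for one call frame.
sr = 8000
t = np.arange(sr // 10) / sr
f0 = pitch_autocorr(np.sin(2 * np.pi * 400 * t), sr)
```

Clustering the resulting per-frame pitch values (the contour) by shape features is then a standard unsupervised-learning step.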


2021 ◽  
Vol 10 (2) ◽  
pp. 182-188
Author(s):  
Ajeng Restu Kusumastuti ◽  
Yosi Kristian ◽  
Endang Setyati

Abstract: The Covid-19 pandemic has transformed the offline education system into an online one. To keep the learning process effective, teachers, including kindergarten teachers, were forced to adapt by making presentations that attract students' attention. This is a major problem, considering that the attention span of children at an early age is very diverse and their communication skills are limited. Thus, there is a need to identify and classify students' learning interest through facial expressions and gestures during the online session. In this research, students' learning interest was classified into three classes validated by the teacher: Interested, Moderately Interested, and Not Interested. The classification was obtained by training and testing on cropped areas of the center of the face (eyes, mouth, whole face) for facial expression recognition, supported by a gesture area for gesture recognition. The experiments covered scenarios with four crop areas and with two crop areas, applied to the interest classes using the weights of transfer-learning architectures such as VGG16, ResNet50, and Xception. The learning-interest classification tests reached a minimum validation percentage of 70%: with three interest classes, the four-crop-area scenario using VGG16 achieved 75%, while the two-crop-area scenario using ResNet50 achieved 71%. These results show that the methods of this research can be used to determine the duration and theme of online kindergarten classes.
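The crop-area preprocessing can be sketched as deriving sub-regions from a detected face bounding box. The region proportions below are assumptions for illustration, not the paper's exact layout; each crop would then be fed to the transfer-learning backbone.

```python
def crop_regions(face_box, frame_w, frame_h):
    """Derive illustrative crop areas (eyes, mouth, face, gesture) from a
    detected face bounding box (x, y, w, h). Proportions are assumptions,
    not the paper's exact geometry."""
    x, y, w, h = face_box

    def clamp(box):
        bx, by, bw, bh = box
        bx, by = max(0, bx), max(0, by)
        return (bx, by, min(bw, frame_w - bx), min(bh, frame_h - by))

    return {
        "face": clamp((x, y, w, h)),
        "eyes": clamp((x, y + h // 5, w, h // 3)),
        "mouth": clamp((x + w // 4, y + 2 * h // 3, w // 2, h // 3)),
        # area below the face, where hands/gestures tend to appear
        "gesture": clamp((x - w, y + h, 3 * w, frame_h - (y + h))),
    }

regions = crop_regions((40, 30, 20, 20), 160, 120)
```

The four-crop scenario would use all four regions; the two-crop scenario a subset (e.g. face plus gesture), which is an assumption here.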


PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0254181
Author(s):  
Kamila Lis ◽  
Mateusz Koryciński ◽  
Konrad A. Ciecierski

Data classification is one of the most common applications of machine learning. There are many well-developed algorithms, working in various environments and for different data distributions, that perform this task with excellence. Classification algorithms, just like other machine learning algorithms, have one thing in common: in order to operate on data, they must see the data. In the present world, where concerns about privacy, the GDPR (General Data Protection Regulation), business confidentiality and security keep growing, this requirement to work directly on the original data can, in some situations, become a burden. In this paper, an approach to the classification of images that cannot be directly accessed during training is presented. It is shown that one can train a deep neural network to create a representation of the original data such that i) without additional information, the original data cannot be restored, and ii) this representation, called a masked form, can still be used for classification purposes. Moreover, it is shown that classification of the masked data can be done using both classical and neural-network-based classifiers.
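A minimal sketch of the masked-form idea, substituting a fixed random dimension-reducing projection for the learned masking network (an assumption for illustration): without the secret projection matrix the original image cannot be recovered, yet a classical classifier (here, nearest centroid) still works on the masked vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Secret projection: stands in for the learned masking network.
# It is dimension-reducing (64 -> 16), so the original 8x8 image
# cannot be reconstructed from the masked form without it.
mask = rng.normal(size=(64, 16))

def to_masked(img_flat):
    return img_flat @ mask

def sample(klass, n=20):
    """Toy images: class 0 is bright on the left half, class 1 on the right."""
    imgs = rng.normal(0.0, 0.1, size=(n, 64))
    left = np.arange(64) % 8 < 4
    imgs[:, left if klass == 0 else ~left] += 1.0
    return imgs

# Train a classical nearest-centroid classifier purely in masked space.
c0 = to_masked(sample(0)).mean(0)
c1 = to_masked(sample(1)).mean(0)

def classify(img_flat):
    m = to_masked(img_flat)
    return 0 if np.linalg.norm(m - c0) < np.linalg.norm(m - c1) else 1
```

The paper's masking network is trained rather than random, but the division of labor is the same: the data holder applies the mask, and the classifier only ever sees masked vectors.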


2022 ◽  
Vol 4 (4) ◽  
pp. 1-22
Author(s):  
Valentina Candiani ◽  
◽  
Matteo Santacesaria ◽  

We consider the problem of detecting brain hemorrhages from three-dimensional (3D) electrical impedance tomography (EIT) measurements. A brain hemorrhage is a condition requiring urgent treatment, for which EIT might provide a portable and quick diagnosis. We employ two neural network architectures, a fully connected and a convolutional one, for the classification of hemorrhagic and ischemic strokes. The networks are trained on a dataset of 40,000 samples of synthetic electrode measurements generated with the complete electrode model on realistic heads with a three-layer structure, covering variations in head anatomy and layers, electrode position, measurement noise and conductivity values. We then test the networks on several datasets of unseen EIT data, with more complex stroke modeling (different shapes and volumes), higher levels of noise and different amounts of electrode misplacement. On most test datasets we achieve ≥ 90% average accuracy with the fully connected neural networks, while the convolutional ones display an average accuracy ≥ 80%. Despite the use of simple neural network architectures, the results obtained are very promising and motivate the application of EIT-based classification methods to real phantoms and ultimately to human patients.
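The fully connected classifier can be caricatured with a single-layer stand-in trained on synthetic two-class "measurement" vectors. Everything below is a toy: the data is random with a class-dependent shift, not complete-electrode-model output, and a real network would have hidden layers.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for electrode measurement vectors: two stroke
# classes whose mean patterns differ (an assumption for illustration).
n, d = 200, 32
X = rng.normal(size=(n, d))
y = (rng.random(n) < 0.5).astype(float)
X[y == 1] += 0.8

# Logistic regression = a fully connected net with no hidden layer,
# trained by gradient descent on the cross-entropy loss.
w, b = np.zeros(d), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    g = p - y
    w -= 0.5 * (X.T @ g) / n
    b -= 0.5 * g.mean()

pred = (1 / (1 + np.exp(-(X @ w + b)))) > 0.5
acc = (pred == (y == 1)).mean()
```

The paper's point about robustness would correspond to evaluating `pred` on held-out data generated with perturbed electrode positions and noise levels.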


Author(s):  
L. Xue ◽  
C. Liu ◽  
Y. Wu ◽  
H. Li

Semantic segmentation is a fundamental problem in remote sensing image processing. Because of the complex maritime environment, classifying roads, vegetation, buildings and water in remote sensing imagery is a challenging task. Although neural networks have achieved excellent performance in semantic segmentation in recent years, only a few works use CNNs for ground-object segmentation, and the results could be further improved. This paper uses a convolutional neural network named U-Net, whose structure combines a contracting path and an expansive path to produce high-resolution output. We added batch normalization (BN) layers to the network, which makes the backward pass better conditioned, and after the upsampling convolutions we added dropout layers to prevent overfitting; together these changes yield more precise segmentation results. To verify this network architecture, we used a Kaggle dataset. Experimental results show that U-Net achieved good performance compared with other architectures, especially on high-resolution remote sensing imagery.
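The contracting/expansive structure with skip connections can be sketched without any learned layers. This toy shows only the data flow of one U level (pool, upsample, concatenate the skip feature map); it omits the convolutions, BN and dropout that do the actual work in the paper.

```python
import numpy as np

def down(x):
    """Contracting step: 2x2 max pooling halves each spatial dimension."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def up(x):
    """Expansive step: nearest-neighbour upsampling doubles each dimension."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

# One level of the U on a single-channel 4x4 feature map: the decoder
# output is concatenated with the encoder's skip feature map along a
# channel axis, as U-Net does before the next convolution block.
x = np.arange(16.0).reshape(4, 4)
skip = x
decoded = up(down(x))
merged = np.stack([decoded, skip])   # channels-first, shape (2, 4, 4)
```

The skip connection is what lets the expansive path recover the fine spatial detail that pooling discards, which is why U-Net suits high-resolution imagery.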


Author(s):  
Hatem Keshk ◽  
Xu-Cheng Yin

Background: Deep Learning (DL) neural network methods have become a hot research subject in the remote sensing field. Classification of aerial satellite images depends on spectral content, which is a challenging topic in remote sensing. Objective: With the aim of achieving high performance and accuracy in Egyptsat-1 satellite image classification, this paper adopts the Convolutional Neural Network (CNN), a leading deep learning method. The CNN is developed to classify aerial photographs into land cover classes such as urban, vegetation, desert, water bodies, soil and roads. In our work, a comparison is conducted between the Maximum Likelihood (ML) method, representing traditional supervised classification, and the CNN. Conclusion: This research finds that the CNN outperforms ML by 9%, with a better classification result that reached an average accuracy of 92.25%. The experiments also showed that the convolutional neural network is the most satisfactory and effective classification method for Egyptsat-1 satellite images.
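The Maximum Likelihood baseline fits a Gaussian per land-cover class and assigns each pixel to the class under which its spectral vector is most likely. A minimal sketch on toy 3-band "spectra" (the class means and spreads are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy per-pixel spectral samples for two land-cover classes.
water = rng.normal([0.2, 0.1, 0.4], 0.05, size=(100, 3))
desert = rng.normal([0.6, 0.5, 0.3], 0.05, size=(100, 3))

def gaussian_ml(train_sets):
    """Fit a maximum-likelihood Gaussian (mean, covariance) per class
    and return a per-pixel classifier maximizing the log-likelihood."""
    stats = []
    for X in train_sets:
        mu = X.mean(0)
        cov = np.cov(X.T) + 1e-6 * np.eye(X.shape[1])
        stats.append((mu, np.linalg.inv(cov), np.log(np.linalg.det(cov))))

    def predict(x):
        scores = [-(x - mu) @ icov @ (x - mu) - logdet
                  for mu, icov, logdet in stats]
        return int(np.argmax(scores))

    return predict

predict = gaussian_ml([water, desert])
```

A CNN, by contrast, classifies from spatial patches rather than single-pixel spectra, which is one reason it can outperform this per-pixel baseline.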


2021 ◽  
Vol 10 (1) ◽  
pp. 44
Author(s):  
Bhargavi Mahesh ◽  
Teresa Scholz ◽  
Jana Streit ◽  
Thorsten Graunke ◽  
Sebastian Hettenkofer

Metal oxide (MOX) sensors offer a low-cost solution for detecting volatile organic compound (VOC) mixtures. However, their operation involves time-consuming heating cycles, leading to a slow data collection and classification process. This work introduces a few-shot learning approach that enables rapid classification. In this approach, a model trained on several base classes is fine-tuned to recognize a novel class using a small number (n = 5, 25, 50 and 75) of randomly selected novel-class measurements (shots). The dataset comprises MOX sensor measurements of four different juices (apple, orange, currant and multivitamin) and air, collected over 10-minute phases using a pulsed heater signal. While a high average accuracy of 82.46% is obtained for five-class classification using 75 shots, the model's performance depends on the juice type. One-shot validation showed that not all measurements within a phase are representative, so shots must be selected carefully to achieve high classification accuracy. Error analysis revealed that some measurements are contaminated by the previously measured juice, a characteristic of MOX sensor data that is often overlooked and is equivalent to mislabeling. Three strategies are adopted to overcome this: fine-tuning after dropping the initial and final measurements of each phase (E1), fine-tuning after dropping the first half of each phase (E2), and pretraining with data from the second half of each phase (E3). Results show that each strategy performs best for a specific number of shots: E3 gives the highest performance for five-shot learning (accuracy 63.69%), E2 yields the best results for 25- and 50-shot learning (accuracies 79% and 87.1%), and E1 predicts best for 75-shot learning (accuracy 88.6%). Error analysis also showed that, for all strategies, more than 50% of air misclassifications resulted from contamination, with E1 affected the least. This work demonstrates how strongly data quality can affect prediction performance, especially for few-shot classification methods, and that a data-centric approach can improve the results.
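The three data-selection strategies amount to different slicings of each measurement phase. A sketch of the selection step (the exact number of measurements dropped by E1 is an assumption; E2 and E3 select the same second-half data but use it for fine-tuning and pretraining respectively):

```python
def select_shots(phase, strategy):
    """Return the measurements of one phase kept by each strategy.

    E1: drop the initial and final measurement of the phase
        (drop counts are an assumption for illustration).
    E2/E3: keep only the second half of the phase, where the
        contamination from the previous sample has decayed.
    """
    n = len(phase)
    if strategy == "E1":
        return phase[1:n - 1]
    if strategy in ("E2", "E3"):
        return phase[n // 2:]
    raise ValueError(f"unknown strategy: {strategy}")

phase = list(range(10))   # stand-in for 10 measurements in one phase
```

Shots for fine-tuning would then be drawn from the returned subset instead of the full phase, which is exactly how contamination-as-mislabeling is avoided.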



2014 ◽  
Vol 26 (02) ◽  
pp. 1450029 ◽  
Author(s):  
Chuang-Chien Chiu ◽  
Bui Huy Hai ◽  
Shoou-Jeng Yeh

Recognition of sleep stages is an important task in assessing the quality of sleep. Several biomedical signals, such as EEG, ECG, EMG and EOG, are used extensively to classify sleep stages, which is very important for the diagnosis of sleep disorders. Many sleep studies have focused on the automatic classification of sleep stages. In this research, a new classification method is presented that uses an Elman neural network combined with fuzzy rules, with sleep features extracted from wavelet decompositions. The nine subjects who participated in this study were recruited from Cheng-Ching General Hospital in Taichung, Taiwan. The sampling frequency was 250 Hz, and a single-channel (C3-A1) EEG signal was acquired for each subject. The combined neural network and fuzzy system recognized sleep stages on an epoch basis (10-second segments of data). Drawing on the strengths of this combination, the classifier achieved an average specificity of approximately 96% and an average accuracy of approximately 94%.
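The feature-extraction step rests on wavelet decomposition of each EEG epoch. A minimal multilevel Haar decomposition (the simplest wavelet; the paper does not specify which wavelet family was used, so this is an illustrative stand-in):

```python
import numpy as np

def haar_step(x):
    """One level of Haar decomposition: approximation and detail
    coefficients, with orthonormal scaling by sqrt(2)."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def haar_decompose(x, levels):
    """Multilevel decomposition: repeatedly split the approximation.
    Band energies of the detail coefficients could then serve as
    per-epoch features for the classifier."""
    details = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details

sig = np.array([4.0, 4.0, 2.0, 2.0, 6.0, 6.0, 0.0, 0.0])
approx, details = haar_decompose(sig, 2)
```

Because the transform is orthonormal, the total energy of `approx` plus all `details` equals the energy of the input epoch, so band energies partition the signal's power across frequency scales.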


2021 ◽  
Vol 2 (1) ◽  
Author(s):  
Putri Marhida Badarudin ◽  
◽  
Rozaida Ghazali ◽  
Abdullah Alahdal ◽  
N.A.M. Alduais ◽  
...  

This work develops an Artificial Neural Network (ANN) model for Breast Cancer (BC) classification. The design of the model considers different ANN architectures from the literature and chooses the one with the best performance. The model aims to classify BC cases more systematically and more quickly, giving the field of medicine a facility for detecting breast cancer among women. The ANN classification model achieves an average accuracy of 98.88% with an average run time of 0.182 seconds. Using this model, BC classification can be carried out much faster than manual diagnosis and with good accuracy.
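A generic one-hidden-layer ANN of the kind compared in such studies can be sketched in a few lines. The data below is a synthetic two-feature stand-in, not the BC dataset, and the architecture (8 sigmoid hidden units, gradient descent on cross-entropy) is an assumption, not the paper's selected model.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic two-feature stand-in for tabular diagnostic data.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# One hidden layer of 8 sigmoid units, one sigmoid output unit.
W1 = rng.normal(0, 0.5, size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, size=(8, 1)); b2 = np.zeros(1)

for _ in range(2000):                       # plain batch gradient descent
    h = sigmoid(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2).ravel()
    gp = (p - y)[:, None]                   # cross-entropy gradient at output
    gW2 = h.T @ gp / len(X); gb2 = gp.mean(0)
    gh = gp @ W2.T * h * (1 - h)            # backpropagate through hidden layer
    gW1 = X.T @ gh / len(X); gb1 = gh.mean(0)
    W2 -= 1.0 * gW2; b2 -= 1.0 * gb2
    W1 -= 1.0 * gW1; b1 -= 1.0 * gb1

pred = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).ravel() > 0.5
acc = (pred == (y == 1)).mean()
```

On real diagnostic data the same loop applies once the features are standardized; architecture search over layer sizes, as the paper describes, would wrap this training in a loop over candidate configurations.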

