Automatic Topic Segmentation for Video Lectures Using Low and High-Level Audio Features

2018 ◽  
Author(s):  
Eduardo R. Soares ◽  
Eduardo Barrére

Nowadays, video lectures are a very popular way to transmit knowledge, and because of that, many web repositories hold large catalogs of these videos. Despite all the benefits this high availability of video lectures brings, some problems also emerge from this scenario. One of them is that it is very difficult to find the relevant content associated with those videos. Students often must watch an entire video lecture to find the point of interest, and sometimes these points are never found. For that reason, this master's project proposes to investigate and develop a novel framework for automatic topic segmentation in video lectures, based on early fusion of low- and high-level audio features enriched with external knowledge from open databases. We have performed preliminary experiments on two sets of video lectures using the current state of our work. The obtained results were very satisfactory, which evidences the potential of our proposal.
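The abstract leaves the fusion step unspecified; a typical early-fusion pipeline concatenates per-window feature vectors from both levels before detecting boundaries. The sketch below is only an illustration of that idea: it assumes MFCCs as the low-level features, TF-IDF vectors over per-window ASR transcripts as the high-level features, and a cosine-similarity threshold for boundary detection. The names `window_transcripts` and `threshold` are hypothetical, not the authors'.

```python
# Illustrative early-fusion sketch; feature choices are assumptions, not the
# paper's exact pipeline.
import numpy as np
import librosa
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def segment_lecture(audio_path, window_transcripts, threshold=0.3):
    """Return indices of candidate topic boundaries between equal-size windows."""
    n_windows = len(window_transcripts)
    y, sr = librosa.load(audio_path, sr=16000)
    win = len(y) // n_windows

    # Low-level: mean MFCC vector per audio window.
    low = np.stack([
        librosa.feature.mfcc(y=y[i * win:(i + 1) * win], sr=sr, n_mfcc=13).mean(axis=1)
        for i in range(n_windows)
    ])
    low = (low - low.mean(0)) / (low.std(0) + 1e-8)  # z-normalize the modality

    # High-level: TF-IDF over each window's transcript (e.g., from ASR).
    high = TfidfVectorizer().fit_transform(window_transcripts).toarray()

    # Early fusion: concatenate both modalities into one vector per window.
    fused = np.hstack([low, high])

    # Hypothesize a topic boundary wherever adjacent windows are dissimilar.
    sims = [cosine_similarity(fused[i:i + 1], fused[i + 1:i + 2])[0, 0]
            for i in range(n_windows - 1)]
    return [i + 1 for i, s in enumerate(sims) if s < threshold]
```

In a complete system, the transcript features would additionally be enriched with the external open-database knowledge the abstract mentions (e.g., linking window keywords to concepts) before fusion.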


2007 ◽  
Vol 01 (03) ◽  
pp. 377-402 ◽  
Author(s):  
SHU-CHING CHEN ◽  
NA ZHAO ◽  
MEI-LING SHYU

In this paper, a user-centered framework is proposed for video database modeling and retrieval to provide appealing multimedia experiences for content-based video queries. By incorporating the Hierarchical Markov Model Mediator (HMMM) mechanism, the source videos, segmented video shots, visual/audio features, semantic events, and high-level user perceptions are seamlessly integrated in a video database. With the hierarchical and stochastic design for video databases and semantic concept modeling, the proposed framework supports retrieval not only of single events but also of temporal sequences with multiple events. Additionally, an innovative method is proposed to capture an individual user's preferences by considering both the low-level features and the semantic concepts. The retrieval and ranking of video events and temporal patterns can be updated dynamically online to satisfy each user's interests and information requirements. Moreover, users' feedback is efficiently accumulated for the offline system training process so that the overall retrieval performance can be enhanced periodically and continuously. To evaluate the proposed approach, a soccer video retrieval system is developed, presented, and tested to demonstrate the overall retrieval performance improvement achieved by modeling and capturing the user preferences.
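At the core of a Markov Model Mediator is an affinity matrix whose entries encode how strongly items relate, and which user feedback gradually reshapes. The toy sketch below illustrates only that feedback-driven ranking idea; the HMMM's actual hierarchical states and training procedure are considerably richer, and the names `affinity` and `accessed_pairs` are assumptions for this illustration.

```python
# Toy illustration of feedback-driven affinity ranking; not the HMMM itself.
import numpy as np

def update_affinity(affinity, accessed_pairs, lr=0.1):
    """Strengthen affinity between events a user accessed in the same query."""
    for i, j in accessed_pairs:
        affinity[i, j] += lr
        affinity[j, i] += lr
    # Row-normalize so each row stays a probability-like distribution.
    return affinity / affinity.sum(axis=1, keepdims=True)

def rank_events(affinity, query_event):
    """Rank all events by their affinity with the query event."""
    return np.argsort(-affinity[query_event])

# Usage: four events; feedback says events 0 and 2 co-occurred in relevant results.
A = np.full((4, 4), 0.25)
A = update_affinity(A, [(0, 2)])
print(rank_events(A, 0))  # event 2 now ranks first for queries about event 0
```

Accumulating many such updates offline corresponds loosely to the periodic system-training step the abstract describes.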


2019 ◽  
Vol 66 (8) ◽  
pp. 2319-2330 ◽  
Author(s):  
Jesus Monge-Alvarez ◽  
Carlos Hoyos-Barcelo ◽  
Luis Miguel San-Jose-Revuelta ◽  
Pablo Casaseca-de-la-Higuera

2019 ◽  
Vol 8 (1) ◽  
pp. 322-330
Author(s):  
Lyudmila Gienovna Yun

The paper deals with the problem of optimizing the language training process for Chinese engineering students in the framework of joint Sino-Russian programs. The existing curricula on Russian as a foreign language (RFL) used to train future engineers have several disadvantages and therefore do not ensure a high level of subject and language competences. This is primarily due to weak contacts between teachers of special disciplines and teachers of the Russian language, and to the lack of coordination between the teaching methods used by Russian and Chinese teachers of Russian. The features of training in linguistic and non-linguistic environments make it expedient to teach all types of speech activity in Russian in an interconnected way. In this case, the system of exercises should take into account the ethno-psychological characteristics of Chinese students and the learning strategies they use when acquiring a foreign language. Particular attention should be paid to teaching the language of the specialty using authentic video lectures given by subject teachers in Russian. The author concludes that it is necessary to develop nationally oriented teaching aids for Chinese non-philology students who study in joint engineering training programs under the 2+2 and 3+1 schemes.


2020 ◽  
Vol 2020 ◽  
pp. 1-15
Author(s):  
Gwenaelle Cunha Sergio ◽  
Minho Lee

Generating music with emotion similar to that of an input video is a very relevant issue nowadays. Video content creators and automatic movie directors benefit from keeping their viewers engaged, which can be facilitated by producing novel material that elicits stronger emotions in them. Moreover, there is currently a demand for more empathetic computers to aid humans in applications such as augmenting the perception ability of visually- and/or hearing-impaired people. Current approaches overlook the video's emotional characteristics in the music generation step, consider only static images instead of videos, are unable to generate novel music, and require a high level of human effort and skill. In this study, we propose a novel hybrid deep neural network that uses an Adaptive Neuro-Fuzzy Inference System (ANFIS) to predict a video's emotion from its visual features and a deep Long Short-Term Memory (LSTM) Recurrent Neural Network to generate corresponding audio signals with a similar emotional character. The former is able to model emotions appropriately due to its fuzzy properties, and the latter is able to model data with dynamic time properties well due to the availability of the previous hidden state information. The novelty of our proposed method lies in the extraction of visual emotional features in order to transform them into audio signals with corresponding emotional aspects for users. Quantitative experiments show low mean absolute errors of 0.217 and 0.255 on the Lindsey and DEAP datasets, respectively, and similar global features in the spectrograms. This indicates that our model is able to appropriately perform domain transformation between visual and audio features. Based on the experimental results, our model can effectively generate audio that matches the scene and elicits a similar emotion from the viewer in both datasets, and music generated by our model is also chosen more often (code available online at https://github.com/gcunhase/Emotional-Video-to-Audio-with-ANFIS-DeepRNN).
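The abstract names the two stages but not their interface; a common way to condition a sequence generator on an emotion estimate is to append the emotion value to each input frame. The PyTorch sketch below shows only that second, LSTM-based stage under this assumption; the feature dimension, hidden size, and the `EmotionConditionedLSTM` name are illustrative, not the authors' configuration.

```python
# Minimal emotion-conditioned LSTM sketch; dimensions are assumptions.
import torch
import torch.nn as nn

class EmotionConditionedLSTM(nn.Module):
    def __init__(self, feat_dim=64, emo_dim=1, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim + emo_dim, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, feat_dim)  # predict the next audio frame

    def forward(self, frames, emotion):
        # frames: (batch, time, feat_dim); emotion: (batch, emo_dim), e.g. an ANFIS output
        emo = emotion.unsqueeze(1).expand(-1, frames.size(1), -1)
        out, _ = self.lstm(torch.cat([frames, emo], dim=-1))
        return self.proj(out)

# Usage: next-frame predictions for a high- and a low-valence clip.
model = EmotionConditionedLSTM()
frames = torch.randn(2, 100, 64)         # spectrogram-like audio feature frames
emotion = torch.tensor([[0.9], [0.1]])   # hypothetical per-clip emotion scores
pred = model(frames, emotion)            # shape (2, 100, 64)
```

At generation time such a model would be run autoregressively, feeding each predicted frame back in, before inverting the features to a waveform.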


2015 ◽  
Vol 19 (3) ◽  
Author(s):  
Cheryl A. Murphy ◽  
John C. Stewart

Blended learning options vary, and universities are exploring an assortment of instructional combinations, some involving video lectures as a replacement for face-to-face (f2f) lectures. This methodological study investigates the impact of providing a lecture choice (online or f2f) on overall student achievement and course engagement. The research uses a within-group design to obtain baseline data on a single set of physics students (n=168) and investigates the impact of providing a mid-semester lecture viewing choice (online, f2f) on student achievement (tests, homework, and standardized conceptual evaluation scores) and course engagement (student lecture viewing, homework submissions, bonus project submissions, and note-taking behaviors). The study reveals that the type of lecture does not significantly affect overall student achievement or engagement. However, although recorded and f2f lectures demonstrate an overall educationally equivalent impact, students who elected a high level of recorded lecture use were significantly lower performing and less engaged before the viewing option was introduced, and largely remained so afterward; there was, though, evidence that differences in achievement and engagement narrowed after the option was introduced. Results therefore suggest that weaker-performing students self-select higher levels of recorded lecture use, and that these video lectures may help this specific group of students close the gap with students who were initially higher performing and more engaged.


Author(s):  
Anuranjan Pandey

Abstract: In the tropical jungle, hearing a species is considerably easier than seeing it. In the woods we may hear the sounds of many birds and frogs without ever seeing them, and in these circumstances it is difficult even for an expert to identify the many types of insects and harmful species found in the wild. An audio-input model has been developed in this study: intelligent signal processing is used to extract patterns and characteristics from the audio signal, and the output is used to identify the species. The sounds of birds and frogs in the tropical environment vary by species. In this research we have developed a deep learning model that recognizes bird and frog species from audio features. The model achieved a high level of accuracy in recognizing bird and frog species. The ResNet model, built from blocks of convolutional neural network layers with skip connections, is effective in recognizing bird and frog species from the animals' sounds. Above 90 percent accuracy is achieved on this classification task. Keywords: Bird Frog Detection, Neural Network, Resnet, CNN.
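The abstract does not give the input representation, but audio classifiers of this kind typically feed mel-spectrogram patches to an image CNN. The sketch below assumes that setup with a torchvision `resnet18` backbone adapted to one input channel; the backbone choice, input size, and species count are illustrative, not the paper's configuration.

```python
# Hedged sketch: ResNet over mel-spectrogram patches for species classification.
import torch
import torch.nn as nn
import torchvision.models as models

def build_audio_resnet(num_species):
    net = models.resnet18(weights=None)  # residual conv blocks with skip connections
    # Spectrograms have one channel, not three RGB channels.
    net.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
    net.fc = nn.Linear(net.fc.in_features, num_species)
    return net

# Usage: classify a batch of 1x128x256 mel-spectrogram patches into 24 species.
model = build_audio_resnet(num_species=24)
spectrograms = torch.randn(8, 1, 128, 256)
logits = model(spectrograms)  # shape (8, 24)
```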

