Automatic Topic Segmentation for Video Lectures Using Low and High-Level Audio Features

2018 ◽  
Author(s):  
Eduardo R. Soares ◽  
Eduardo Barrére

Nowadays, video lectures are a very popular way to transmit knowledge, and because of that, many web repositories hold large catalogs of these videos. Despite all the benefits this high availability of video lectures brings, some problems also emerge from this scenario. One of them is that it is very difficult to find the relevant content associated with those videos. Students often must watch an entire video lecture to find the point of interest, and sometimes these points are never found. For that reason, this master's project proposes to investigate and develop a novel framework for automatic topic segmentation in video lectures, based on early fusion of low- and high-level audio features enriched with external knowledge from open databases. We have performed preliminary experiments on two sets of video lectures using the current state of our work. The obtained results were very satisfactory, which evidences the potential of our proposal.
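The abstract leaves the fusion step unspecified; a typical early-fusion pipeline concatenates per-window feature vectors from both levels before detecting boundaries. The sketch below is only an illustration of that idea: it assumes MFCCs as the low-level features, TF-IDF vectors over per-window ASR transcripts as the high-level features, and a cosine-similarity threshold for boundary detection. The names `window_transcripts` and `threshold` are hypothetical, not the authors'.

```python
# Illustrative early-fusion sketch; feature choices are assumptions, not the
# paper's exact pipeline.
import numpy as np
import librosa
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def segment_lecture(audio_path, window_transcripts, threshold=0.3):
    """Return indices of candidate topic boundaries between equal-size windows."""
    n_windows = len(window_transcripts)
    y, sr = librosa.load(audio_path, sr=16000)
    win = len(y) // n_windows

    # Low-level: mean MFCC vector per audio window.
    low = np.stack([
        librosa.feature.mfcc(y=y[i * win:(i + 1) * win], sr=sr, n_mfcc=13).mean(axis=1)
        for i in range(n_windows)
    ])
    low = (low - low.mean(0)) / (low.std(0) + 1e-8)  # z-normalize the modality

    # High-level: TF-IDF over each window's transcript (e.g., from ASR).
    high = TfidfVectorizer().fit_transform(window_transcripts).toarray()

    # Early fusion: concatenate both modalities into one vector per window.
    fused = np.hstack([low, high])

    # Hypothesize a topic boundary wherever adjacent windows are dissimilar.
    sims = [cosine_similarity(fused[i:i + 1], fused[i + 1:i + 2])[0, 0]
            for i in range(n_windows - 1)]
    return [i + 1 for i, s in enumerate(sims) if s < threshold]
```

In a complete system, the transcript features would additionally be enriched with the external open-database knowledge the abstract mentions (e.g., linking window keywords to concepts) before fusion.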


2007 ◽  
Vol 01 (03) ◽  
pp. 377-402 ◽  
Author(s):  
SHU-CHING CHEN ◽  
NA ZHAO ◽  
MEI-LING SHYU

In this paper, a user-centered framework is proposed for video database modeling and retrieval to provide appealing multimedia experiences for content-based video queries. By incorporating the Hierarchical Markov Model Mediator (HMMM) mechanism, the source videos, segmented video shots, visual/audio features, semantic events, and high-level user perceptions are seamlessly integrated in a video database. With the hierarchical and stochastic design for video databases and semantic concept modeling, the proposed framework supports retrieval not only of single events but also of temporal sequences with multiple events. Additionally, an innovative method is proposed to capture an individual user's preferences by considering both the low-level features and the semantic concepts. The retrieval and ranking of video events and temporal patterns can be updated dynamically online to satisfy each user's interests and information requirements. Moreover, users' feedback is efficiently accumulated for the offline system training process so that the overall retrieval performance can be enhanced periodically and continuously. To evaluate the proposed approach, a soccer video retrieval system is developed, presented, and tested to demonstrate the overall retrieval performance improvement achieved by modeling and capturing the user preferences.
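At the core of a Markov Model Mediator is an affinity matrix whose entries encode how strongly items relate, and which user feedback gradually reshapes. The toy sketch below illustrates only that feedback-driven ranking idea; the HMMM's actual hierarchical states and training procedure are considerably richer, and the names `affinity` and `accessed_pairs` are assumptions for this illustration.

```python
# Toy illustration of feedback-driven affinity ranking; not the HMMM itself.
import numpy as np

def update_affinity(affinity, accessed_pairs, lr=0.1):
    """Strengthen affinity between events a user accessed in the same query."""
    for i, j in accessed_pairs:
        affinity[i, j] += lr
        affinity[j, i] += lr
    # Row-normalize so each row stays a probability-like distribution.
    return affinity / affinity.sum(axis=1, keepdims=True)

def rank_events(affinity, query_event):
    """Rank all events by their affinity with the query event."""
    return np.argsort(-affinity[query_event])

# Usage: four events; feedback says events 0 and 2 co-occurred in relevant results.
A = np.full((4, 4), 0.25)
A = update_affinity(A, [(0, 2)])
print(rank_events(A, 0))  # event 2 now ranks first for queries about event 0
```

Accumulating many such updates offline corresponds loosely to the periodic system-training step the abstract describes.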


2019 ◽  
Vol 66 (8) ◽  
pp. 2319-2330 ◽  
Author(s):  
Jesus Monge-Alvarez ◽  
Carlos Hoyos-Barcelo ◽  
Luis Miguel San-Jose-Revuelta ◽  
Pablo Casaseca-de-la-Higuera

2019 ◽  
Vol 8 (1) ◽  
pp. 322-330
Author(s):  
Lyudmila Gienovna Yun

The paper deals with the problem of optimizing the language training process for Chinese engineering students in the framework of joint Sino-Russian programs. The existing curricula on Russian as a foreign language (RFL) used to train future engineers have several disadvantages and therefore do not ensure a high level of subject and language competences. This is primarily due to weak contacts between teachers of special disciplines and teachers of the Russian language, and to the lack of coordination between the teaching methods used by Russian and Chinese teachers of Russian. The features of training in linguistic and non-linguistic environments make it expedient to teach all types of speech activity in Russian in an interconnected way. In this case, the system of exercises should take into account the ethno-psychological characteristics of Chinese students and the learning strategies they use when acquiring a foreign language. Particular attention should be paid to teaching the language of the specialty using authentic video lectures given by subject teachers in Russian. The author concludes that it is necessary to develop nationally oriented teaching aids for Chinese non-philology students who study in joint engineering training programs under the 2+2 and 3+1 schemes.


2020 ◽  
Vol 2020 ◽  
pp. 1-15
Author(s):  
Gwenaelle Cunha Sergio ◽  
Minho Lee

Generating music with emotion similar to that of an input video is a very relevant issue nowadays. Video content creators and automatic movie directors benefit from keeping their viewers engaged, which can be facilitated by producing novel material that elicits stronger emotions in them. Moreover, there is currently a demand for more empathetic computers to aid humans in applications such as augmenting the perception ability of visually- and/or hearing-impaired people. Current approaches overlook the video's emotional characteristics in the music generation step, consider only static images instead of videos, are unable to generate novel music, and require a high level of human effort and skill. In this study, we propose a novel hybrid deep neural network that uses an Adaptive Neuro-Fuzzy Inference System (ANFIS) to predict a video's emotion from its visual features and a deep Long Short-Term Memory (LSTM) Recurrent Neural Network to generate corresponding audio signals with a similar emotional character. The former is able to model emotions appropriately due to its fuzzy properties, and the latter is able to model data with dynamic time properties well due to the availability of the previous hidden state information. The novelty of our proposed method lies in the extraction of visual emotional features in order to transform them into audio signals with corresponding emotional aspects for users. Quantitative experiments show low mean absolute errors of 0.217 and 0.255 on the Lindsey and DEAP datasets, respectively, and similar global features in the spectrograms. This indicates that our model is able to appropriately perform domain transformation between visual and audio features. Based on the experimental results, our model can effectively generate audio that matches the scene and elicits a similar emotion from the viewer in both datasets, and music generated by our model is also chosen more often (code available online at https://github.com/gcunhase/Emotional-Video-to-Audio-with-ANFIS-DeepRNN).
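The abstract names the two stages but not their interface; a common way to condition a sequence generator on an emotion estimate is to append the emotion value to each input frame. The PyTorch sketch below shows only that second, LSTM-based stage under this assumption; the feature dimension, hidden size, and the `EmotionConditionedLSTM` name are illustrative, not the authors' configuration.

```python
# Minimal emotion-conditioned LSTM sketch; dimensions are assumptions.
import torch
import torch.nn as nn

class EmotionConditionedLSTM(nn.Module):
    def __init__(self, feat_dim=64, emo_dim=1, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim + emo_dim, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, feat_dim)  # predict the next audio frame

    def forward(self, frames, emotion):
        # frames: (batch, time, feat_dim); emotion: (batch, emo_dim), e.g. an ANFIS output
        emo = emotion.unsqueeze(1).expand(-1, frames.size(1), -1)
        out, _ = self.lstm(torch.cat([frames, emo], dim=-1))
        return self.proj(out)

# Usage: next-frame predictions for a high- and a low-valence clip.
model = EmotionConditionedLSTM()
frames = torch.randn(2, 100, 64)         # spectrogram-like audio feature frames
emotion = torch.tensor([[0.9], [0.1]])   # hypothetical per-clip emotion scores
pred = model(frames, emotion)            # shape (2, 100, 64)
```

At generation time such a model would be run autoregressively, feeding each predicted frame back in, before inverting the features to a waveform.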


2015 ◽  
Vol 19 (3) ◽  
Author(s):  
Cheryl A. Murphy ◽  
John C. Stewart

Blended learning options vary, and universities are exploring an assortment of instructional combinations, some involving video lectures as a replacement for face-to-face (f2f) lectures. This methodological study investigates the impact of providing a lecture choice (online or f2f) on overall student achievement and course engagement. The research uses a within-group design to obtain baseline data on a single set of physics students (n=168) and investigates the impact of providing a mid-semester lecture viewing choice (online, f2f) on student achievement (tests, homework, and standardized conceptual evaluation scores) and course engagement (student lecture viewing, homework submissions, bonus project submissions, and note-taking behaviors). The study reveals that the type of lecture does not significantly affect overall student achievement or engagement. However, although recorded and f2f lectures demonstrate an overall educationally equivalent impact, students who elected a high level of recorded lecture use were significantly lower performing and less engaged before the viewing option was introduced, and largely remained so afterward; there was, though, evidence that differences in achievement and engagement narrowed after the option was introduced. Results therefore suggest that weaker-performing students self-select higher levels of recorded lecture use, and that these video lectures may help this specific group of students close the gap with students who were initially higher performing and more engaged.


Author(s):  
Anuranjan Pandey

Abstract: In the tropical jungle, hearing a species is considerably easier than seeing it. In the woods we may hear the sounds of many birds and frogs without ever seeing them, and in these circumstances it is difficult even for an expert to identify the many types of insects and harmful species found in the wild. An audio-input model has been developed in this study: intelligent signal processing is used to extract patterns and characteristics from the audio signal, and the output is used to identify the species. The sounds of birds and frogs in the tropical environment vary by species. In this research we have developed a deep learning model that recognizes bird and frog species from audio features. The model achieved a high level of accuracy in recognizing bird and frog species. The ResNet model, built from blocks of convolutional neural network layers with skip connections, is effective in recognizing bird and frog species from the animals' sounds. Above 90 percent accuracy is achieved on this classification task. Keywords: Bird Frog Detection, Neural Network, Resnet, CNN.
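The abstract does not give the input representation, but audio classifiers of this kind typically feed mel-spectrogram patches to an image CNN. The sketch below assumes that setup with a torchvision `resnet18` backbone adapted to one input channel; the backbone choice, input size, and species count are illustrative, not the paper's configuration.

```python
# Hedged sketch: ResNet over mel-spectrogram patches for species classification.
import torch
import torch.nn as nn
import torchvision.models as models

def build_audio_resnet(num_species):
    net = models.resnet18(weights=None)  # residual conv blocks with skip connections
    # Spectrograms have one channel, not three RGB channels.
    net.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
    net.fc = nn.Linear(net.fc.in_features, num_species)
    return net

# Usage: classify a batch of 1x128x256 mel-spectrogram patches into 24 species.
model = build_audio_resnet(num_species=24)
spectrograms = torch.randn(8, 1, 128, 256)
logits = model(spectrograms)  # shape (8, 24)
```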

