Hierarchical Temporal Multi-Instance Learning for Video-based Student Learning Engagement Assessment

Author(s):  
Jiayao Ma ◽  
Xinbo Jiang ◽  
Songhua Xu ◽  
Xueying Qin

Video-based automatic assessment of a student's learning engagement on the fly can provide immense value for delivering personalized instructional services, a vehicle particularly important for massive online education. To train such an assessor, a major challenge lies in collecting sufficient labels at the appropriate temporal granularity, since a learner's engagement status may change continuously throughout a study session. Supplying labels at either the frame or clip level incurs a high annotation cost. To overcome this challenge, this paper proposes a novel hierarchical multiple instance learning (MIL) solution, which requires only labels anchored on full-length videos to learn to assess student engagement at an arbitrary temporal granularity and for an arbitrary duration in a study session. The hierarchical model mainly comprises a bottom module and a top module, dedicated respectively to learning the latent relationship between a clip and its constituent frames and that between a video and its constituent clips, with the constraint in the training stage that the average engagement of the local clips equals the video label. To verify the effectiveness of our method, we compare the performance of the proposed approach with that of several state-of-the-art peer solutions through extensive experiments.
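The training constraint described in this abstract (the average engagement of a video's clips should match the video-level label) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names are hypothetical, mean pooling stands in for the learned bottom and top modules, and squared error stands in for whatever loss the paper actually uses.

```python
from statistics import fmean


def clip_scores_from_frames(frame_scores_per_clip):
    """Pool per-frame engagement scores into one score per clip.

    Assumption: plain mean pooling replaces the learned bottom module.
    """
    return [fmean(frames) for frames in frame_scores_per_clip]


def video_level_loss(clip_scores, video_label):
    """Squared error between the mean clip engagement and the video label.

    Assumption: squared error is a stand-in for the paper's training loss.
    """
    if not clip_scores:
        raise ValueError("a video needs at least one clip")
    video_prediction = fmean(clip_scores)
    return (video_prediction - video_label) ** 2


# Usage: two clips with three frames each, one label for the whole video.
clips = clip_scores_from_frames([[0.2, 0.4, 0.3], [0.8, 0.9, 0.7]])
loss = video_level_loss(clips, video_label=0.6)
```

Only the video-level label enters the loss, which is why clip-level and frame-level annotations are not needed in this kind of setup.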

2021 ◽  
pp. 026553222110361
Author(s):  
Chao Han

Over the past decade, testing and assessing spoken-language interpreting has garnered an increasing amount of attention from stakeholders in interpreter education, professional certification, and interpreting research. This is because in these fields assessment results provide a critical evidential basis for high-stakes decisions, such as the selection of prospective students, the certification of interpreters, and the confirmation/refutation of research hypotheses. However, few reviews exist providing a comprehensive mapping of relevant practice and research. The present article therefore aims to offer a state-of-the-art review, summarizing the existing literature and discovering potential lacunae. In particular, the article first provides an overview of interpreting ability/competence and relevant research, followed by main testing and assessment practice (e.g., assessment tasks, assessment criteria, scoring methods, specificities of scoring operationalization), with a focus on operational diversity and psychometric properties. Second, the review describes a limited yet steadily growing body of empirical research that examines rater-mediated interpreting assessment, and casts light on automatic assessment as an emerging research topic. Third, the review discusses epistemological, psychometric, and practical challenges facing interpreting testers. Finally, it identifies future directions that could address the challenges arising from fast-changing pedagogical, educational, and professional landscapes.


Author(s):  
Yibin Yu ◽  
Min Yang ◽  
Yulan Zhang ◽  
Shifang Yuan

Although traditional dictionary learning (DL) methods have achieved great success in pattern recognition and machine learning, they are extremely time-consuming, especially in the training stage. Projective dictionary pair learning (DPL) learns the synthesis dictionary and the analysis dictionary jointly to achieve a fast and accurate classifier. However, because the dictionary pair is initialized as random matrices without using any information from the data samples, it requires many iterations to ensure convergence. In this paper, we propose a novel compact DPL and refining method based on the observation that the eigenvalue curve of the sample data covariance matrix usually decreases very fast, which means we can compact the synthesis dictionary and the analysis dictionary. In the first stage, for each class of the data samples, we use principal component analysis (PCA) to retain globally important information and compact the row space of the synthesis dictionary and the column space of the analysis dictionary. We further refine the learned dictionary pair to achieve a more accurate classifier during compact dictionary pair refining, which combines the orthogonality of PCA with the redundancy of DL. We solve this refining problem entirely in closed form, which significantly reduces the computational complexity. Experimental results on the Extended YaleB database and the AR database show that the proposed method achieves competitive accuracy and low computational complexity compared with other state-of-the-art methods.


2020 ◽  
Vol 34 (04) ◽  
pp. 5742-5749
Author(s):  
Xiaoshuang Shi ◽  
Fuyong Xing ◽  
Yuanpu Xie ◽  
Zizhao Zhang ◽  
Lei Cui ◽  
...  

Although attention mechanisms have been widely used in deep learning for many tasks, they are rarely utilized to solve multiple instance learning (MIL) problems, where only a general category label is given for multiple instances contained in one bag. Additionally, previous deep MIL methods first utilize the attention mechanism to learn instance weights and then employ a fully connected layer to predict the bag label, so that the bag prediction is largely determined by the effectiveness of the learned instance weights. To alleviate this issue, in this paper, we propose a novel loss-based attention mechanism, which simultaneously learns instance weights, instance predictions, and bag predictions for deep multiple instance learning. Specifically, it calculates instance weights based on the loss function, e.g. softmax+cross-entropy, and shares the parameters with the fully connected layer, which is used to produce instance and bag predictions. Additionally, a regularization term consisting of learned weights and cross-entropy functions is utilized to boost the recall of instances, and a consistency cost is used to smooth the training process of neural networks for boosting the model generalization performance. Extensive experiments on multiple types of benchmark databases demonstrate that the proposed attention mechanism is a general, effective and efficient framework, which can achieve superior bag and image classification performance over other state-of-the-art MIL methods, while obtaining higher instance precision and recall than previous attention mechanisms. Source code is available at https://github.com/xsshi2015/Loss-Attention.


2017 ◽  
Vol 26 (1) ◽  
pp. 185-195 ◽  
Author(s):  
Jie Wang ◽  
Liangjian Cai ◽  
Xin Zhao

As we are usually confronted with a large instance space for real-world data sets, it is important to develop a useful and efficient multiple-instance learning (MIL) algorithm. MIL, where training data are prepared in the form of labeled bags rather than labeled instances, is a variant of supervised learning. This paper presents a novel MIL algorithm for an extreme learning machine called MI-ELM. A radial basis kernel extreme learning machine is adapted to approach the MIL problem using the Hausdorff distance to measure the distance between bags. The clusters in the hidden layer are composed of bags that are randomly generated. Because we do not need to tune the parameters for the hidden layer, MI-ELM can learn very fast. The experimental results on classification and multiple-instance regression data sets demonstrate that MI-ELM is useful and efficient compared with state-of-the-art algorithms.
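The idea of measuring the distance between bags with a Hausdorff distance and feeding it into a radial basis kernel can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not say which Hausdorff variant MI-ELM uses, so the classical maximal form is assumed here, and the function names and the `gamma` parameter are hypothetical.

```python
import math


def directed_hausdorff(bag_a, bag_b):
    """Largest distance from any instance in bag_a to its nearest instance in bag_b."""
    return max(min(math.dist(a, b) for b in bag_b) for a in bag_a)


def hausdorff(bag_a, bag_b):
    """Symmetric (maximal) Hausdorff distance between two bags of instances."""
    return max(directed_hausdorff(bag_a, bag_b), directed_hausdorff(bag_b, bag_a))


def rbf_bag_kernel(bag_a, bag_b, gamma=1.0):
    """Radial basis kernel on bags, using the Hausdorff distance (gamma is assumed)."""
    return math.exp(-gamma * hausdorff(bag_a, bag_b) ** 2)


# Usage: each bag is a list of equal-length feature tuples.
bag_1 = [(0.0, 0.0), (1.0, 0.0)]
bag_2 = [(0.0, 0.0), (4.0, 0.0)]
similarity = rbf_bag_kernel(bag_1, bag_2, gamma=0.1)
```

Because the distance is defined on whole bags, a kernel of this kind can compare bags directly without requiring instance-level labels.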


Author(s):  
Chirag S Indi ◽  
Varun Pritham ◽  
Vasundhara Acharya ◽  
Krishna Prakasha

Examination malpractice is a deliberate wrongdoing contrary to official examination rules, designed to place a candidate at an unfair advantage or disadvantage. The proposed system depicts a new use of technology to identify malpractice in e-exams, which is essential due to the growth of online education. The current solutions for such a problem either require complete manual labor or have various vulnerabilities that can be exploited by an examinee. The proposed application encompasses an end-to-end system that, with the help of visual aids, assists an examiner/evaluator in deciding whether a student passes an online exam without any probable attempts at malpractice or cheating. The system works by categorizing the student’s VFOA (visual focus of attention) data by capturing head pose estimates and eye gaze estimates using state-of-the-art machine learning techniques. The system only requires the student (test-taker) to have a functioning internet connection along with a webcam to transmit the feed. The examiner is alerted when the student’s VFOA wavers from the screen more than X times, where X is a predefined threshold. If this threshold X is crossed, the application will save the data of the person when his VFOA is off the screen and send it to the examiner to be manually checked and marked as either an attempt at malpractice or just a momentary lapse in concentration. The system uses a hybrid classifier approach with two different classifiers: one is used when gaze values are being read successfully; when gaze reading fails (for reasons such as transmission quality or glare from the student’s spectacles), the model falls back to the default classifier, which reads only the head pose values to classify the attention metric. This metric is used to map the student’s VFOA to check the likelihood of malpractice. The model has achieved an accuracy of 96.04 percent in classifying the attention metric.
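The alerting logic in this abstract (count how often the VFOA leaves the screen, fall back to head pose alone when gaze is unavailable, and forward off-screen data once the count exceeds X) can be sketched as below. This is only an illustration under stated assumptions: the `Frame` fields, the fixed angle and offset limits, and the function names are hypothetical, and simple rules stand in for the paper's trained classifiers.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Frame:
    head_yaw_deg: float             # head pose estimate (hypothetical field)
    gaze_offset: Optional[float]    # None when gaze estimation failed


def is_on_screen(frame: Frame, max_yaw_deg: float = 25.0,
                 max_gaze_offset: float = 0.5) -> bool:
    """Rule-based stand-in for the hybrid classifier (limits are assumed)."""
    head_ok = abs(frame.head_yaw_deg) <= max_yaw_deg
    if frame.gaze_offset is None:
        # Fallback path: gaze could not be read, so use head pose only.
        return head_ok
    return head_ok and abs(frame.gaze_offset) <= max_gaze_offset


def frames_to_review(frames: Sequence[Frame], threshold_x: int) -> List[int]:
    """Return indices of off-screen frames if the VFOA left the screen more than X times."""
    off_screen: List[int] = []
    departures = 0
    previously_on = True  # assumption: the student starts out looking at the screen
    for index, frame in enumerate(frames):
        on = is_on_screen(frame)
        if not on:
            off_screen.append(index)
            if previously_on:
                departures += 1  # a new departure from the screen
        previously_on = on
    return off_screen if departures > threshold_x else []
```

In this sketch the returned indices represent the saved data that would be sent to the examiner for manual review; an empty list means the threshold was not crossed.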


2020 ◽  
Vol 34 (07) ◽  
pp. 11320-11327 ◽  
Author(s):  
Pilhyeon Lee ◽  
Youngjung Uh ◽  
Hyeran Byun

Weakly-supervised temporal action localization is a very challenging problem because frame-wise labels are not given in the training stage while the only hint is video-level labels: whether each video contains action frames of interest. Previous methods aggregate frame-level class scores to produce video-level prediction and learn from video-level action labels. This formulation does not fully model the problem in that background frames are forced to be misclassified as action classes to predict video-level labels accurately. In this paper, we design Background Suppression Network (BaS-Net) which introduces an auxiliary class for background and has a two-branch weight-sharing architecture with an asymmetrical training strategy. This enables BaS-Net to suppress activations from background frames to improve localization performance. Extensive experiments demonstrate the effectiveness of BaS-Net and its superiority over the state-of-the-art methods on the most popular benchmarks – THUMOS'14 and ActivityNet. Our code and the trained model are available at https://github.com/Pilhyeon/BaSNet-pytorch.


2020 ◽  
Vol 32 (2) ◽  
pp. 485-514
Author(s):  
Xiaofeng Xu ◽  
Ivor W. Tsang ◽  
Chuancai Liu

Zero-shot learning (ZSL) aims to recognize unseen objects (test classes) given some other seen objects (training classes) by sharing information of attributes between different objects. Attributes are artificially annotated for objects and treated equally in recent ZSL tasks. However, some inferior attributes with poor predictability or poor discriminability may have negative impacts on the ZSL system performance. This letter first derives a generalization error bound for ZSL tasks. Our theoretical analysis verifies that selecting the subset of key attributes can improve the generalization performance of the original ZSL model, which uses all the attributes. Unfortunately, previous attribute selection methods have been conducted based on the seen data, and their selected attributes have poor generalization capability to the unseen data, which is unavailable in the training stage of ZSL tasks. Inspired by learning from pseudo-relevance feedback, this letter introduces out-of-the-box data—pseudo-data generated by an attribute-guided generative model—to mimic the unseen data. We then present an iterative attribute selection (IAS) strategy that iteratively selects key attributes based on the out-of-the-box data. Since the distribution of the generated out-of-the-box data is similar to that of the test data, the key attributes selected by IAS can be effectively generalized to test data. Extensive experiments demonstrate that IAS can significantly improve existing attribute-based ZSL methods and achieve state-of-the-art performance.


2020 ◽  
Author(s):  
Luoyang Xue ◽  
Ang Xu ◽  
Qirong Mao ◽  
Lijian Gao ◽  
Jie Chen

Local information contributes significantly to visual sentiment analysis (VSA). Recent studies on local region discovery require manual annotation of region locations. Affective local information learning and automatic discovery of sentiment-specific regions are still challenges in VSA. In this paper, we propose an end-to-end VSA method for weakly supervised sentiment-specific region discovery. Our method contains two branches: an automatic sentiment-specific region discovery branch and a sentiment analysis branch. In the sentiment-specific region discovery branch, a region proposal network with multiple convolution kernels is proposed to generate candidate affective regions. Then, we design the multiple instance learning (MIL) loss to remove redundant and noisy candidate regions. Finally, the sentiment analysis branch integrates both holistic and localized information obtained in the first branch by feature map coupling for final sentiment classification. Our method automatically discovers sentiment-specific regions under the constraint of the MIL loss function without object-level labels. Quantitative and qualitative evaluations on four benchmark affective datasets demonstrate that our proposed method outperforms the state-of-the-art methods.


2017 ◽  
Vol 2017 ◽  
pp. 1-7 ◽  
Author(s):  
Zhenjie Wang ◽  
Lijia Wang ◽  
Hua Zhang

To deal with the problems of illumination changes, pose variations, and serious partial occlusion, a patch-based multiple instance learning (P-MIL) algorithm is proposed. The algorithm divides an object into many blocks. Then, the online MIL algorithm is applied to each block to obtain a strong classifier. The algorithm takes into account both the average classification score and the classification scores of all the blocks when detecting the object. In particular, compared with the whole-object-based MIL algorithm, the P-MIL algorithm detects the object according to the unoccluded patches when partial occlusion occurs. After detecting the object, the learning rates for updating the weak classifiers’ parameters are adaptively tuned. The classifier updating strategy avoids overupdating and underupdating the parameters. Finally, the proposed method is compared with other state-of-the-art algorithms on several classical videos. The experimental results illustrate that the proposed method performs well, especially in cases of illumination changes, pose variations, and partial occlusion. Moreover, the algorithm achieves real-time object tracking.


Author(s):  
Cintia COLIBABA ◽  
Anca Cristina COLIBAB ◽  
Irina GHEORGHIU ◽  
Stefan COLIBABA ◽  
Ovidiu URSA ◽  
...  

The article is based on the ZOE project, Zoonoses Online Education, (2016-1-RO01-KA203-024732), funded by Erasmus+. The project aims to create open digital educational resources in the field of veterinary medicine based on developing innovative guidelines on zoonotic diseases. The article looks at the main findings of the project’s state-of-the-art research in Romania on the 5-10 most encountered and identified zoonoses in the last 10 years at national level. The project research has evaluated the medical literature on zoonotic diseases in the veterinary field and highlighted a variety of diseases that cover the spectrum of infectious agents and display a variety of transmission patterns in different environments, identifying ways of intervention in the local context. This article looks at the second most widespread disease present in the state-of-the-art research, Leishmaniasis. The analysis will be incorporated into a guide and an open online course guide on main infectious diseases transmitted from non-human animals to humans, including videos capturing zoonoses bio-manipulation in simulation centres. It will also be part of a guide and an open online course on medical communication, including the zoonoses videos processed from a linguistic, cultural and communication point of view, available in six languages.

