COWO: towards real-time spatiotemporal action localization in videos

2022 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Yang Yi ◽  
Yang Sun ◽  
Saimei Yuan ◽  
Yiji Zhu ◽  
Mengyi Zhang ◽  
...  

Purpose The purpose of this paper is to provide a fast and accurate network for spatiotemporal action localization in videos. It detects human actions in both time and space simultaneously, in real time, which is applicable in real-world scenarios such as safety monitoring and collaborative assembly.

Design/methodology/approach This paper designs an end-to-end deep learning network called collaborator only watch once (COWO). COWO recognizes ongoing human activities in real time with enhanced accuracy. COWO inherits the architecture of you only watch once (YOWO), known to be the best-performing network for online action localization to date, but with three major structural modifications. COWO enhances intraclass compactness and enlarges interclass separability at the feature level: a new correlation channel fusion and attention mechanism is designed based on the Pearson correlation coefficient, and a correction loss function is designed accordingly, which minimizes within-class distance and enhances intraclass compactness. A probabilistic K-means clustering technique is used to select the initial seed points, the idea being that the initial distance between cluster centers should be as large as possible. Finally, the CIoU regression loss function is applied instead of the Smooth L1 loss function to help the model converge stably.

Findings COWO outperforms the original YOWO with frame mAP improvements of 3% and 2.1% at a speed of 35.12 fps. Compared with the two-stream network, T-CNN and C3D, the improvement is about 5% and 14.5% on the J-HMDB-21, UCF101-24 and AGOT data sets.

Originality/value COWO extends more flexibility to assembly scenarios as it perceives spatiotemporal human actions in real time. It contributes to many real-world scenarios such as safety monitoring and collaborative assembly.
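The probabilistic seed selection described in the approach, where initial cluster centres should be as far apart as possible, can be sketched as k-means++-style sampling; this is an illustrative reading of the abstract, not the authors' exact implementation:

```python
import numpy as np

def kmeanspp_seeds(points, k, rng=None):
    """Pick k initial cluster centres so that early centres end up far apart.

    Each new seed is drawn with probability proportional to its squared
    distance to the nearest centre chosen so far (k-means++ style sketch).
    """
    rng = np.random.default_rng(rng)
    points = np.asarray(points, dtype=float)
    seeds = [points[rng.integers(len(points))]]  # first seed: uniform at random
    for _ in range(k - 1):
        # squared distance of every point to its nearest existing seed
        d2 = np.min([((points - s) ** 2).sum(axis=1) for s in seeds], axis=0)
        probs = d2 / d2.sum()                    # distant points more likely
        seeds.append(points[rng.choice(len(points), p=probs)])
    return np.array(seeds)
```

With two well-separated groups of points, the second seed lands in the group the first seed missed, since points coincident with an existing seed get zero probability.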

2016 ◽  
Vol 12 (2) ◽  
pp. 126-149 ◽  
Author(s):  
Masoud Mansoury ◽  
Mehdi Shajari

Purpose This paper aims to improve recommendation performance for cold-start users and controversial items. Collaborative filtering (CF) generates recommendations on the basis of similarity between users: it uses the opinions of similar users to generate recommendations for an active user. Because the similarity model, or neighbor selection function, is the key element in the effectiveness of CF, many variations of CF have been proposed. However, these methods are not very effective, especially for users who provide few ratings (i.e. cold-start users).

Design/methodology/approach A new user similarity model is proposed that focuses on improving recommendation performance for cold-start users and controversial items. To show the validity of their similarity model, the authors conducted experiments demonstrating its effectiveness in calculating similarity values between users even when only a few ratings are available. In addition, the authors applied their user similarity model to a recommender system and analyzed its results.

Findings Experiments on two real-world data sets were conducted and compared with other CF techniques. The results show that the authors' approach outperforms previous CF techniques on the coverage metric while preserving accuracy for cold-start users and controversial items.

Originality/value The proposed approach addresses the conditions in which CF is unable to generate accurate recommendations. These conditions affect CF performance adversely, especially for cold-start users. The authors show that their similarity model effectively overcomes these weaknesses of CF and improves its performance even in the cold-start condition.
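The abstract does not give the similarity formula itself. One common way to make user similarity robust when few co-ratings exist is significance-weighted Pearson correlation, sketched here purely as an illustration of the cold-start problem, not as the authors' model:

```python
import math

def weighted_similarity(ratings_u, ratings_v, min_overlap=5):
    """Pearson similarity between two users, damped when they co-rate few items.

    ratings_u / ratings_v: dicts mapping item id -> rating. The damping factor
    min(|overlap|, min_overlap) / min_overlap shrinks similarities computed
    from very few co-rated items, which is exactly the cold-start situation.
    (Illustrative significance weighting, not the paper's similarity model.)
    """
    common = set(ratings_u) & set(ratings_v)
    if len(common) < 2:
        return 0.0
    mu = sum(ratings_u[i] for i in common) / len(common)
    mv = sum(ratings_v[i] for i in common) / len(common)
    num = sum((ratings_u[i] - mu) * (ratings_v[i] - mv) for i in common)
    den = math.sqrt(sum((ratings_u[i] - mu) ** 2 for i in common)
                    * sum((ratings_v[i] - mv) ** 2 for i in common))
    if den == 0:
        return 0.0
    damping = min(len(common), min_overlap) / min_overlap
    return damping * (num / den)
```

Two users with only three co-rated items get their correlation scaled by 3/5, so sparse evidence cannot dominate neighbor selection.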


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Jiawei Lian ◽  
Junhong He ◽  
Yun Niu ◽  
Tianze Wang

Purpose The current popular image processing technologies based on convolutional neural networks involve heavy computation, high storage cost and low accuracy for tiny defect detection, which conflicts with the high real-time performance, high accuracy and limited computing and storage resources required by industrial applications. Therefore, an improved YOLOv4 named YOLOv4-Defect is proposed to solve these problems.

Design/methodology/approach On the one hand, this study performs multi-dimensional compression of the feature extraction network of YOLOv4 to simplify the model, and improves the feature extraction ability of the model through knowledge distillation. On the other hand, a prediction scale with a finer receptive field is added to optimize the model structure, which improves the detection performance for tiny defects.

Findings The effectiveness of the method is verified on the public data sets NEU-CLS and DAGM 2007, and on a steel ingot data set collected in an actual industrial setting. The experimental results demonstrate that the proposed YOLOv4-Defect method greatly improves recognition efficiency and accuracy and reduces the size and computational cost of the model.

Originality/value This paper proposes an improved YOLOv4 named YOLOv4-Defect for surface defect detection, which is conducive to application in industrial scenarios with limited storage and computing resources, and meets the requirements of high real-time performance and precision.
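The knowledge distillation step mentioned in the approach can be illustrated with a standard soft-target loss, a generic Hinton-style sketch rather than the paper's exact formulation:

```python
import numpy as np

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Soft-target loss for knowledge distillation (generic sketch).

    The compressed student is trained to match the larger teacher's
    temperature-softened class distribution; T > 1 spreads probability
    mass over non-target classes, exposing the teacher's 'dark knowledge'.
    """
    def softmax(z):
        z = z - z.max()          # subtract max for numerical stability
        e = np.exp(z)
        return e / e.sum()
    p_t = softmax(np.asarray(teacher_logits, dtype=float) / T)
    p_s = softmax(np.asarray(student_logits, dtype=float) / T)
    # KL(p_t || p_s), scaled by T^2 as in the original formulation
    return T * T * float(np.sum(p_t * (np.log(p_t) - np.log(p_s))))
```

In practice this term is mixed with the ordinary detection loss on ground-truth labels; the mixing weight and temperature are tuning choices.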


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Yongxiang Wu ◽  
Yili Fu ◽  
Shuguo Wang

Purpose This paper aims to use a fully convolutional network (FCN) to predict pixel-wise antipodal grasp affordances for unknown objects and to improve grasp detection performance through multi-scale feature fusion.

Design/methodology/approach A modified FCN is used as the backbone to extract pixel-wise features from the input image, which are fused with multi-scale context information gathered by a three-level pyramid pooling module to make more robust predictions. Based on the proposed unified feature embedding framework, two head networks are designed to implement different grasp rotation prediction strategies (regression and classification), and their performances are evaluated and compared with a defined point metric. The regression network is further extended to predict grasp rectangles for comparison with previous methods and for real-world robotic grasping of unknown objects.

Findings The ablation study of the pyramid pooling module shows that multi-scale information fusion significantly improves model performance. The regression approach outperforms the classification approach built on the same feature embedding framework on two data sets. The regression network achieves state-of-the-art accuracy (up to 98.9%) and speed (4 ms per image) and a high success rate (97% for household objects, 94.4% for adversarial objects and 95.3% for objects in clutter) in the unknown object grasping experiment.

Originality/value A novel pixel-wise grasp affordance prediction network based on multi-scale feature fusion is proposed to improve grasp detection performance. Two prediction approaches are formulated and compared within the proposed framework. The method achieves excellent performance on three benchmark data sets and in a real-world robotic grasping experiment.
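A three-level pyramid pooling module of the kind described above can be sketched in NumPy. The bin counts and fusion-by-concatenation here are assumptions for illustration; the paper's module may differ in detail:

```python
import numpy as np

def pyramid_pooling(feat, levels=(1, 2, 4)):
    """Pyramid pooling sketch on a (C, H, W) feature map.

    Each level average-pools the map into level x level bins, upsamples the
    result back to H x W by repetition, and all levels are concatenated with
    the input along the channel axis, so each pixel sees context at several
    scales. Assumes H and W are divisible by every level.
    """
    C, H, W = feat.shape
    outs = [feat]
    for n in levels:
        hs, ws = H // n, W // n
        pooled = np.zeros((C, n, n))
        for i in range(n):
            for j in range(n):
                pooled[:, i, j] = feat[:, i*hs:(i+1)*hs, j*ws:(j+1)*ws].mean(axis=(1, 2))
        # nearest-neighbour upsample back to the input resolution
        outs.append(pooled.repeat(hs, axis=1).repeat(ws, axis=2))
    return np.concatenate(outs, axis=0)
```

A 3-channel map with levels (1, 2, 4) yields 3 + 3x3 = 12 output channels at the original resolution, ready for the prediction heads.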


2009 ◽  
Vol 103 (1) ◽  
pp. 62-68
Author(s):  
Kathleen Cage Mittag ◽  
Sharon Taylor

Using activities to create and collect data is not a new idea. Teachers have been incorporating real-world data into their classes since at least the advent of the graphing calculator. Plenty of data collection activities and data sets exist, and the graphing calculator has made modeling data much easier. However, we were in search of a better physical model for a quadratic: we wanted students to see an actual parabola take shape in real time and then explore its characteristics, but we could not find such a hands-on model.
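The modeling step the authors describe, fitting a quadratic to collected data, is what a graphing calculator automates. The same fit in Python, with synthetic ball-toss heights standing in for classroom measurements:

```python
import numpy as np

# Heights of a tossed ball sampled at equal time steps -- synthetic data
# generated from h = 2 + 9t - 4.9t^2, standing in for student measurements.
t = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
h = 2.0 + 9.0 * t - 4.9 * t ** 2

# Least-squares quadratic fit; polyfit returns coefficients highest degree
# first, so this recovers a = -4.9, b = 9.0, c = 2.0 (to rounding).
a, b, c = np.polyfit(t, h, 2)

# Vertex of the parabola: the time of maximum height, t* = -b / (2a).
t_peak = -b / (2 * a)
```

From here students can explore the characteristics the article mentions: the vertex, the intercepts and the symmetry of the curve.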


Sensor Review ◽  
2016 ◽  
Vol 36 (3) ◽  
pp. 277-286 ◽  
Author(s):  
Wenhao Zhang ◽  
Melvyn Lionel Smith ◽  
Lyndon Neal Smith ◽  
Abdul Rehman Farooq

Purpose This paper aims to introduce an unsupervised modular approach for eye centre localisation in images and videos, following a coarse-to-fine, global-to-regional scheme. The algorithm is designed for excellent accuracy, robustness and real-time performance in real-world applications.

Design/methodology/approach A modular approach has been designed that uses isophote and gradient features to estimate eye centre locations. It embraces two main modalities that progressively reduce global facial features to local levels for more precise inspection. A novel selective oriented gradient (SOG) filter has been specifically designed to remove strong gradients from eyebrows, eye corners and self-shadows, which sabotage most eye centre localisation methods. Tested on the BioID database, the proposed algorithm shows superior accuracy.

Findings The eye centre localisation algorithm has been compared with 11 other methods on the BioID database and six other methods on the GI4E database. The proposed algorithm outperforms all the compared algorithms in localisation accuracy while exhibiting excellent real-time performance. The method is also inherently robust against head poses, partial eye occlusions and shadows.

Originality/value The eye centre localisation method combines two mutually complementary modalities in a novel, fast, accurate and robust approach. Beyond eye centre localisation, the SOG filter can address general tasks involving the detection of curved shapes. From an applied point of view, the proposed method has great potential to benefit a wide range of real-world human-computer interaction (HCI) applications.
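The SOG filter's details are not given in the abstract. A minimal sketch of the general idea, keeping only gradients whose orientation falls in a chosen band and suppressing the rest, is shown below; this is an assumption-laden illustration, not the paper's filter:

```python
import numpy as np

def selective_oriented_gradient(img, keep=(-np.pi / 4, np.pi / 4)):
    """Illustrative SOG-style filter: retain gradient magnitude only where
    the gradient orientation lies inside the 'keep' band, zeroing the rest
    (e.g. to suppress the mostly vertical-orientation gradients produced by
    horizontal eyebrow edges). Not the paper's exact filter.
    """
    gy, gx = np.gradient(img.astype(float))   # np.gradient returns per-axis derivatives
    mag = np.hypot(gx, gy)                    # gradient magnitude
    ang = np.arctan2(gy, gx)                  # gradient orientation in radians
    mask = (ang >= keep[0]) & (ang <= keep[1])
    return np.where(mask, mag, 0.0)
```

A vertical step edge (gradient pointing along x, orientation 0) survives the default band, while a horizontal step edge (orientation pi/2) is removed entirely.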


2018 ◽  
Vol 42 (7) ◽  
pp. 1010-1023 ◽  
Author(s):  
Jungwon Yeo ◽  
Louise Comfort ◽  
Kyujin Jung

Purpose The purpose of this paper is to elaborate the pros and cons of two coding methods: rapid network assessment (RNA) and manual content analysis (MCA). In particular, it focuses on the applicability of a new rapid data extraction and utilization method, which can contribute to the timely coordination of disaster and emergency response operations.

Design/methodology/approach Utilizing a data set of textual information on the Superstorm Sandy response in 2012, retrieved from the LexisNexis Academic news archive, the two coding methods, MCA and RNA, are subjected to social network analysis.

Findings The analysis results indicate a significant level of similarity between the data collected using the two methods. The findings indicate that the RNA method could be effectively used to extract megabytes of electronic data, characterize the emerging disaster response network, and suggest timely policy implications for managers and practitioners during actual emergency response operations and coordination processes.

Originality/value Considering the growing need for timely assessment of real-time disaster response systems and the emerging doubts regarding the effectiveness of the RNA method, this study contributes to uncovering the potential of the RNA method to extract relevant data from the megabytes of digitally available information. This research also illustrates the applicability of MCA for assessing real-time disaster response networks by comparing network analysis results from data sets built by both the RNA and the MCA.
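One simple way to quantify the similarity between response networks coded by the two methods is the Jaccard overlap of their edge sets; this is an illustrative metric, far simpler than the paper's full network analysis, and the organisation names below are hypothetical:

```python
def edge_jaccard(edges_a, edges_b):
    """Jaccard similarity of two undirected edge sets -- a rough measure of
    how closely networks coded by two methods (e.g. RNA vs MCA) agree.
    Edges are (actor, actor) pairs; order within a pair is ignored.
    """
    norm = lambda edges: {frozenset(e) for e in edges}
    a, b = norm(edges_a), norm(edges_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical coded ties from the two methods for the same news corpus.
rna_edges = [("FEMA", "RedCross"), ("FEMA", "NYC")]
mca_edges = [("RedCross", "FEMA"), ("FEMA", "NJ")]
similarity = edge_jaccard(rna_edges, mca_edges)
```

A value near 1 would indicate the faster RNA coding recovers essentially the same network structure as the manual coding.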


2016 ◽  
Vol 17 (2) ◽  
pp. 203-210 ◽  
Author(s):  
Margie Jantti ◽  
Jennifer Heath

Purpose – The purpose of this paper is to provide an overview of the development of an institution-wide approach to learning analytics at the University of Wollongong (UOW) and the inclusion of library data drawn from the Library Cube.

Design/methodology/approach – The Student Support and Education Analytics team at UOW is tasked with creating policy, frameworks and infrastructure for the systematic capture, mapping and analysis of data from across the university. The initial data set includes log file data from Moodle sites, the Library Cube, student administration data, and tutorial and student support service usage data. Using the learning analytics data warehouse, UOW is developing new models for analysis and visualisation, with a focus on providing near real-time data to academic staff and students to optimise learning opportunities.

Findings – The distinct advantage of the learning analytics model is that the selected data sets are updated weekly, enabling near real-time monitoring and intervention where required. Inclusion of library data with the other, often disparate, data sets from across the university has enabled the development of a comprehensive platform for learning analytics. Future work will include the development of predictive models using the rapidly growing learning analytics data warehouse.

Practical implications – Data warehousing infrastructure and the systematic capture and exporting of relevant library data sets are requisite for the consideration of library data in learning analytics.

Originality/value – What was not anticipated five years ago, when the Value Cube was first realised, was the development of learning analytics services at UOW. The Cube afforded University of Wollongong Library considerable advantage: the framework for data harvesting and analysis was established, ready for inclusion within learning analytics data sets and subsequent reporting to faculty.


Author(s):  
Palash Goyal ◽  
Divya Choudhary ◽  
Shalini Ghosh

Classification algorithms in machine learning often assume a flat label space. However, most real-world data have dependencies between the labels, which can often be captured by a hierarchy. Utilizing this relation can help develop a model that satisfies the dependencies, improving accuracy and interpretability. Further, as different levels in the hierarchy correspond to different granularities, penalizing each label equally can be detrimental to model learning. In this paper, we propose a loss function, hierarchical curriculum loss, with two properties: (i) it satisfies hierarchical constraints present in the label space, and (ii) it provides non-uniform weights to labels based on their levels in the hierarchy, learned implicitly by the training paradigm. We theoretically show that the proposed hierarchical class-based curriculum loss is a tight bound on the 0-1 loss among all losses satisfying the hierarchical constraints. We test our loss function on real-world image data sets and show that it significantly outperforms state-of-the-art baselines.
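A fixed level-weighted variant of this idea can be sketched as follows. Note the paper learns its label weights implicitly via the training paradigm, whereas this illustration fixes them by depth in the hierarchy:

```python
import math

def hierarchical_weighted_loss(probs, targets, parent, alpha=0.5):
    """Sketch of a level-weighted multi-label loss.

    Each label's binary log-loss is scaled by alpha**depth, so mistakes on
    coarse (shallow) labels cost more than mistakes on fine (deep) labels.
    'parent' maps each label to its parent label (roots map to None).
    Illustrative only -- not the paper's hierarchical curriculum loss.
    """
    def depth(lbl):
        d = 0
        while parent[lbl] is not None:
            lbl = parent[lbl]
            d += 1
        return d

    loss = 0.0
    for lbl, p in probs.items():
        y = 1.0 if lbl in targets else 0.0
        p = min(max(p, 1e-7), 1 - 1e-7)   # clamp to avoid log(0)
        bce = -(y * math.log(p) + (1 - y) * math.log(1 - p))
        loss += (alpha ** depth(lbl)) * bce
    return loss
```

Under this weighting, being wrong about the root-level label is penalized more heavily than being equally wrong about its child, matching the coarse-to-fine intuition.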


2019 ◽  
Vol 30 (1) ◽  
pp. 329-355 ◽  
Author(s):  
Dawn M. Russell ◽  
David Swanson

Purpose The purpose of this paper is to investigate the mediators that occupy the gap between information processing theory and supply chain agility. In today's Mach-speed business environment, managers often install new technology and expect an agile supply chain when they press <Enter>. This study reveals the naivety of such an approach, which has allowed new technology to be governed by old processes.

Design/methodology/approach This work takes a qualitative approach to the dynamic conditions surrounding information processing and its connection to supply chain agility through the assessment of 60 exemplar cases. The situational conditions that have created the divide between information processing and supply chain agility are studied.

Findings The agility adaptation typology (AAT), defining three types of adaptations and their mediating constructs, is presented. Type 1, information processing, is generally an exercise in synchronization that can be used to support assimilation. Type 2, demand sensing, is where companies are able to incorporate real-time data into everyday processes to better understand demand and move toward a real-time environment. Type 3, supply chain agility, requires fundamentally new thinking in the areas of transformation, mindset and culture.

Originality/value This work describes the reality of today's struggle to achieve supply chain agility, providing guidelines and testable propositions while avoiding "ivory tower prescriptions" that exclude real-world details from the research process (Meredith, 1993). By including the messy real-world details, difficult as they are to understand and explain, the authors make strides toward an AAT theory that explains and guides the manager's everyday reality.


2015 ◽  
Vol 27 (3) ◽  
pp. 417-433 ◽  
Author(s):  
Yuko Mesuda ◽  
Shigeru Inui ◽  
Yosuke Horiba

Purpose – Draping is one method used in clothing design. It is important to virtualize draping in real time, and virtual cloth handling is a key technology for this purpose. A mouse is often used for real-time cloth handling in many studies; however, gesture manipulation is more realistic than mouse movements. The purpose of this paper is to demonstrate virtual cloth manipulation using hand gestures in the real world.

Design/methodology/approach – In this study, the authors demonstrate three types of manipulation: moving, cutting and attaching. The user's hand coordinates are obtained with a Kinect and used to manipulate the cloth model. The cloth model is moved based on the position of the hand coordinates and cut along a cut line calculated from them. For attaching, the cloth model is mapped to a dummy model; part of the cloth model is then fixed and another part is released.

Findings – The method can move the cloth model according to the motion of the hands. The authors have succeeded in cutting the cloth model based on the hand trajectory. The cloth model can be attached to the dummy model, and its form changes to follow the dummy model's shape.

Originality/value – Cloth handling in many studies is based on indirect manipulation using a mouse. In this study, the cloth model is manipulated according to hand motion in the real world, in real time.
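At its core, the moving manipulation reduces to translating the grabbed vertices by the hand's frame-to-frame displacement; a minimal sketch follows (a real system, as in the paper, would feed Kinect hand coordinates each frame and let the cloth simulation relax the rest of the mesh):

```python
import numpy as np

def move_cloth(vertices, grabbed_idx, hand_prev, hand_now):
    """Minimal sketch of gesture-driven cloth moving.

    vertices: (N, 3) array of cloth mesh vertex positions.
    grabbed_idx: indices of the vertices held by the hand.
    hand_prev / hand_now: 3D hand positions in consecutive frames.
    The grabbed vertices follow the hand's displacement; the remaining
    vertices are left for the cloth simulation to update.
    """
    verts = np.asarray(vertices, dtype=float).copy()
    delta = np.asarray(hand_now, dtype=float) - np.asarray(hand_prev, dtype=float)
    verts[list(grabbed_idx)] += delta
    return verts
```

Cutting and attaching would follow the same pattern: per-frame hand coordinates drive either edge removal along the computed cut line or the fixing and releasing of vertex constraints on the dummy model.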

