Automated Event Detection and Classification in Soccer: The Potential of Using Multiple Modalities

Detecting events in videos is a complex task, and many different approaches, aimed at a large variety of use-cases, have been proposed in the literature. Most approaches, however, are unimodal and only consider the visual information in the videos. This paper presents and evaluates different approaches based on neural networks where we combine visual features with audio features to detect (spot) and classify events in soccer videos. We employ model fusion to combine different modalities such as video and audio, and test these combinations against different state-of-the-art models on the SoccerNet dataset. The results show that a multimodal approach is beneficial. We also analyze how the tolerance for delays in classification and spotting time, and the tolerance for prediction accuracy, influence the results. Our experiments show that using multiple modalities improves event detection performance for certain types of events.
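A minimal sketch of two ideas from this abstract. It assumes late fusion by a weighted average of per-class confidences; the weight `w_video` is a hypothetical parameter, and the paper's fusion scheme may differ. It also includes a simplified spotting check: a prediction counts as a hit only if it can be matched one-to-one to an unused ground-truth event of the same class within ±`tol` seconds. This is not the official SoccerNet average-mAP metric.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, float]  # (class label, time in seconds)


def late_fusion(video: Dict[str, float], audio: Dict[str, float],
                w_video: float = 0.5) -> Dict[str, float]:
    """Weighted average of per-class confidences from a video model and an audio model.

    w_video is a hypothetical fusion weight, not a value taken from the paper.
    """
    return {c: w_video * video[c] + (1.0 - w_video) * audio.get(c, 0.0)
            for c in video}


def spotting_hits(preds: List[Event], truth: List[Event], tol: float) -> int:
    """Count predictions matched one-to-one to a same-class event within +/- tol seconds.

    Predictions are processed in time order. Each one greedily takes the nearest
    unmatched ground-truth event that qualifies (a simplification, not an optimal matching).
    """
    unmatched = list(truth)
    hits = 0
    for label, t in sorted(preds, key=lambda p: p[1]):
        candidates = [i for i, (g_label, g_t) in enumerate(unmatched)
                      if g_label == label and abs(g_t - t) <= tol]
        if candidates:
            best = min(candidates, key=lambda i: abs(unmatched[i][1] - t))
            del unmatched[best]
            hits += 1
    return hits


fused = late_fusion({"goal": 0.6, "card": 0.2}, {"goal": 0.9, "card": 0.1})
print(max(fused, key=fused.get))  # goal
```

Sweeping `tol` over several values is one way to study how the tolerance for spotting delay affects the scores, as the abstract describes.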

Visual Weather Property Prediction by Multi-Task Learning and Two-Dimensional RNNs

Atmosphere, Vol 12 (5), pp. 584, 2021. DOI: 10.3390/atmos12050584

Author(s): Wei-Ta Chu, Yu-Hsuan Liang, Kai-Chia Ho

Keyword(s): Neural Networks, Visual Information, Temporal Evolution, State Of The Art, Image Data, Visual Features, Two Dimensional, Estimation Model, Property Estimation, Task Learning

We employed convolutional neural networks to extract visual features and developed recurrent neural networks to estimate weather properties using only image data. Four common weather properties are estimated: temperature, humidity, visibility, and wind speed. Building on the success of previous work on temperature prediction, we extend it in two respects. First, given the effectiveness of deep multi-task learning, we jointly estimate the four weather properties from the same visual information. Second, we propose that weather property estimation that considers temporal evolution can be conducted from two perspectives, i.e., day-wise or hour-wise, and we propose a two-dimensional recurrent neural network to unify the two perspectives. In the evaluation, we show that the proposed approach achieves better prediction accuracy than state-of-the-art models. We believe it is the first visual weather property estimation model trained with multi-task learning.
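The two-dimensional recurrence can be sketched as follows. The sketch assumes scalar hidden states, a plain tanh cell, and hypothetical weights `w_x`, `w_day`, and `w_hour`; the abstract does not specify the actual cell type or dimensions. Each state `h[d][t]` receives the state of the previous day at the same hour and the state of the previous hour on the same day, which is one way to combine the day-wise and hour-wise views. The multi-task loss is an assumed (though common) weighted sum of squared errors over the properties.

```python
import math
from typing import Dict, List


def rnn_2d(x: List[List[float]], w_x: float, w_day: float, w_hour: float,
           b: float = 0.0) -> List[List[float]]:
    """h[d][t] = tanh(w_x*x[d][t] + w_day*h[d-1][t] + w_hour*h[d][t-1] + b).

    x is indexed by day d and hour t. Missing neighbours at the boundary are treated as 0.
    All weights are hypothetical scalars used only for illustration.
    """
    days, hours = len(x), len(x[0])
    h = [[0.0] * hours for _ in range(days)]
    for d in range(days):
        for t in range(hours):
            prev_day = h[d - 1][t] if d > 0 else 0.0   # same hour, previous day
            prev_hour = h[d][t - 1] if t > 0 else 0.0  # same day, previous hour
            h[d][t] = math.tanh(w_x * x[d][t] + w_day * prev_day + w_hour * prev_hour + b)
    return h


def multitask_loss(pred: Dict[str, float], target: Dict[str, float],
                   weights: Dict[str, float]) -> float:
    """Weighted sum of per-task squared errors (e.g. temperature, humidity, visibility, wind speed)."""
    return sum(weights[k] * (pred[k] - target[k]) ** 2 for k in target)
```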

Boosted Transformer for Image Captioning

Applied Sciences, Vol 9 (16), pp. 3260, 2019. DOI: 10.3390/app9163260. Cited by ~1

Author(s): Jiangyun Li, Peng Yao, Longteng Guo, Weicun Zhang

Keyword(s): Visual Information, State Of The Art, The Self, Visual Features, Image Captioning, Decoder Architecture, Semantic Concepts, Transformer Model, Internal Relationships, Auxiliary Module

Image captioning aims to generate a description of a given image. The usual approach uses a Convolutional Neural Network as the encoder to extract visual features and a sequence model as the decoder to generate the description; among sequence models, the self-attention mechanism has recently made notable progress. However, this predominant encoder-decoder architecture still has unresolved problems. On the encoder side, without semantic concepts, the extracted visual features do not make full use of the image information. On the decoder side, sequence self-attention relies only on word representations, so it lacks the guidance of visual information and is easily influenced by the language prior. In this paper, we propose a novel boosted transformer model with two attention modules that address these problems: “Concept-Guided Attention” (CGA) and “Vision-Guided Attention” (VGA). Our model uses CGA in the encoder to obtain boosted visual features by integrating instance-level concepts into the visual features. In the decoder, we stack VGA, which uses the visual information as a bridge to model internal relationships among the sequences and can serve as an auxiliary module to sequence self-attention. Quantitative and qualitative results on the Microsoft COCO dataset show that our model outperforms state-of-the-art approaches.
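As a rough illustration of the idea behind vision-guided attention, the sketch below implements plain scaled dot-product cross-attention, in which word queries attend over image-region keys and values. It omits learned query/key/value projections, multiple heads, and the way VGA is stacked with sequence self-attention. It does not reproduce the paper's module, and all inputs are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector],
                    values: List[Vector]) -> List[Vector]:
    """For each query q: softmax(q . k / sqrt(d)) over keys, then a weighted sum of values."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


words = [[1.0, 0.0], [0.0, 1.0]]    # hypothetical word embeddings (queries)
regions = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical region features (used as keys and values)
print(cross_attention(words, regions, regions))
```

Each word representation becomes a mixture of region features weighted by similarity. In this sense, visual information can guide the decoder rather than the decoder relying on word representations alone.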

Cooperative Multimodal Approach to Depression Detection in Twitter

Proceedings of the AAAI Conference on Artificial Intelligence, Vol 33, pp. 110-117, 2019. DOI: 10.1609/aaai.v33i01.3301110. Cited by ~1

Author(s): Tao Gui, Liang Zhu, Qi Zhang, Minlong Peng, Xu Zhou, ...

Keyword(s): Visual Information, State Of The Art, Error Reduction, Multimodal Approach, Robust Performance, Agent Model, Depression Detection, Multi Agent, Detection Of Depression, Do So

The advent of social media has presented a promising new opportunity for the early detection of depression. To do so effectively, two challenges must be overcome. The first is that textual and visual information must be considered jointly to make accurate inferences about depression. The second is that, because users post a wide variety of content types, it is difficult to extract many of the relevant indicator texts and images. In this work, we propose a novel cooperative multi-agent model to address these challenges. From users' historical posts, the proposed method can automatically select related indicator texts and images. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods by a large margin (over 30% error reduction). In several experiments and examples, we also verify that the selected posts can successfully indicate user depression and that our model achieves robust performance in realistic scenarios.

Multi-Evidence and Multi-Modal Fusion Network for Ground-Based Cloud Recognition

Remote Sensing, Vol 12 (3), pp. 464, 2020. DOI: 10.3390/rs12030464

Author(s): Shuang Liu, Mei Li, Zhong Zhang, Baihua Xiao, Tariq S. Durrani

Keyword(s): Classification Accuracy, Visual Information, Deep Neural Networks, State Of The Art, Visual Features, Unified Framework, Global Features, Heterogeneous Features, Novel Method, Global And Local

In recent years, deep neural networks have drawn much attention in ground-based cloud recognition. However, such approaches focus only on learning global features from visual information, which yields incomplete representations of ground-based clouds. In this paper, we propose a novel method named multi-evidence and multi-modal fusion network (MMFN) for ground-based cloud recognition, which learns extended cloud information by fusing heterogeneous features in a unified framework. Specifically, MMFN exploits multiple pieces of evidence, i.e., global and local visual features, from ground-based cloud images using a main network and an attentive network. In the attentive network, local visual features are extracted from attentive maps, which are obtained by refining salient patterns from convolutional activation maps. Meanwhile, the multi-modal network in MMFN learns multi-modal features for ground-based clouds. To fully fuse the multi-modal and multi-evidence visual features, we design two fusion layers in MMFN that incorporate the multi-modal features with the global and local visual features, respectively. Furthermore, we release the first multi-modal ground-based cloud dataset, named MGCD, which contains not only ground-based cloud images but also the multi-modal information corresponding to each image. Evaluated on MGCD, MMFN achieves a classification accuracy of 88.63%, which compares favorably with state-of-the-art methods and validates its effectiveness for ground-based cloud recognition.

Utterance Level Feature Aggregation with Deep Metric Learning for Speech Emotion Recognition

Sensors, Vol 21 (12), pp. 4233, 2021. DOI: 10.3390/s21124233

Author(s): Bogdan Mocanu, Ruxandra Tapu, Titus Zaharia

Keyword(s): Emotion Recognition, Loss Function, State Of The Art, Disease Diagnosis, Data Representation, Speech Emotion Recognition, Audio Features, Global Accuracy, Space Data, Art Techniques

Emotion is a form of high-level paralinguistic information that is intrinsically conveyed by human speech. Automatic speech emotion recognition is an essential challenge for various applications, including mental disease diagnosis, audio surveillance, human behavior understanding, e-learning, and human–machine/robot interaction. In this paper, we introduce a novel speech emotion recognition method based on the Squeeze and Excitation ResNet (SE-ResNet) model with spectrogram inputs. To overcome a limitation of state-of-the-art techniques, which fail to provide a robust feature representation at the utterance level, the CNN architecture is extended with a trainable discriminative GhostVLAD clustering layer that aggregates the audio features into a compact, single-utterance vector representation. In addition, an end-to-end neural embedding approach is introduced, based on an emotionally constrained triplet loss function. The loss function integrates the relations between the various emotional patterns and thus improves the representation of the data in the latent space. The proposed methodology achieves global accuracy rates of 83.35% and 64.92% on the publicly available RAVDESS and CREMA-D datasets, respectively. Compared with the results of human observers, the gains in global accuracy exceed 24%. Finally, an objective comparative evaluation against state-of-the-art techniques shows accuracy gains of more than 3%.
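The standard triplet loss that such embedding approaches build on is sketched below. The sketch assumes Euclidean distance and a hypothetical margin of 0.2. The paper's emotionally constrained variant adds relations between emotion classes that the abstract does not detail, so that variant is not reproduced here.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.2) -> float:
    """max(0, d(a, p) - d(a, n) + margin).

    Pulls embeddings of same-emotion utterances together and pushes others apart.
    The default margin is a hypothetical value, not one reported in the paper.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical utterance embeddings: anchor and positive share an emotion label.
print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))  # 0.0
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so training focuses on triplets that still violate that ordering.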

Efficient Rank-Based Diffusion Process with Assured Convergence

Journal of Imaging, Vol 7 (3), pp. 49, 2021. DOI: 10.3390/jimaging7030049

Author(s): Daniel Carlos Guimarães Pedronette, Lucas Pascotti Valem, Longin Jan Latecki

Keyword(s): Diffusion Process, Learning Strategies, State Of The Art, Representation Learning, Theoretical Background, High Dimensional, Visual Features, Learning Approaches, Previous Decade, Asymptotic Complexity

Visual features and representation learning strategies advanced enormously in the previous decade, mainly driven by deep learning approaches. However, retrieval tasks are still performed mainly with traditional pairwise dissimilarity measures, while the learned representations lie on high-dimensional manifolds. To go beyond pairwise analysis, post-processing methods have been proposed that replace pairwise measures with globally defined measures capable of analyzing collections in terms of the underlying data manifold. The most representative approaches are diffusion and rank-based methods. While diffusion approaches can be computationally expensive, rank-based methods lack a theoretical background. In this paper, we propose an efficient Rank-based Diffusion Process that combines both approaches and avoids the drawbacks of each. The resulting method efficiently approximates a diffusion process by exploiting rank-based information, while assuring its convergence. The algorithm has very low asymptotic complexity and can be computed regionally, making it suitable for out-of-dataset queries. An experimental evaluation on image retrieval and person re-ID tasks across diverse datasets demonstrates the effectiveness of the proposed approach, with results comparable to the state of the art.
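To make the combination of rank information and diffusion concrete, here is a generic sketch, not the paper's Rank-based Diffusion Process. It builds a k-nearest-neighbour affinity from ranked lists, using a hypothetical weighting of `1 - pos / k`, and then runs a standard random-walk diffusion `f <- alpha * P f + (1 - alpha) * y`, which converges for `0 <= alpha < 1`. This naive version costs O(n^2) per iteration, so it does not reflect the low asymptotic complexity claimed in the abstract.

```python
from typing import List

Matrix = List[List[float]]


def rank_affinity(rank_lists: List[List[int]], k: int) -> Matrix:
    """w[i][j] = 1 - pos / k if j is at position pos in the top-k ranked list of i, else 0.

    The linear rank weighting is a hypothetical choice for illustration.
    """
    n = len(rank_lists)
    w = [[0.0] * n for _ in range(n)]
    for i, ranks in enumerate(rank_lists):
        for pos, j in enumerate(ranks[:k]):
            w[i][j] = 1.0 - pos / k
    return w


def diffuse(w: Matrix, query: int, alpha: float = 0.85, iters: int = 100) -> List[float]:
    """Iterate f <- alpha * P f + (1 - alpha) * y, with P the row-normalised affinity.

    Each row of P sums to at most 1, so the update is a contraction for 0 <= alpha < 1
    and the iteration converges to a unique fixed point.
    """
    n = len(w)
    p = []
    for row in w:
        s = sum(row)
        p.append([x / s for x in row] if s > 0 else [0.0] * n)
    y = [1.0 if i == query else 0.0 for i in range(n)]
    f = y[:]
    for _ in range(iters):
        f = [alpha * sum(p[i][j] * f[j] for j in range(n)) + (1.0 - alpha) * y[i]
             for i in range(n)]
    return f


# Hypothetical ranked lists for three images (each list starts with the image itself).
ranks = [[0, 1, 2], [1, 0, 2], [2, 1, 0]]
scores = diffuse(rank_affinity(ranks, k=2), query=0)
print(sorted(range(3), key=lambda i: -scores[i]))  # [0, 1, 2]
```

Image 2 never appears in the query's own top-2 list. It still receives a positive score through its neighbour, image 1, which illustrates how diffusion goes beyond pairwise comparison.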

Stand-Alone Containment Analysis of the PHÉBUS FPT-1 Test With the ASTEC and the MELCOR Codes

Volume 4: Computational Fluid Dynamics (CFD) and Coupled Codes; Decontamination and Decommissioning, Radiation Protection, Shielding, and Waste Management; Workforce Development, Nuclear Education and Public Acceptance; Mitigation Strategies for Beyond Design Basis Events; Risk Management, 2016. DOI: 10.1115/icone24-60184

Author(s): Bruno Gonfiotti, Sandro Paci

Keyword(s): State Of The Art, External Environment, Fission Products, Severe Accident, Sensitivity Analyses, Complex Task, Primary Coolant, Control Volumes, Coolant System

Estimating the release of Fission Products (FPs) from the containment system of a nuclear plant to the external environment during a Severe Accident (SA) is a complex task. Over the last 30–40 years, considerable effort has gone into understanding and investigating the phenomena that occur in the primary coolant system and in the containment during such accidents. This research has followed two tracks: understanding the phenomenology involved through experiments, and creating numerical codes capable of simulating these phenomena. These codes are continuously developed to reflect the current SA state of the art, but it must be continuously checked that modifications and improvements actually increase the quality of the results. For this purpose, continuous verification and validation work should be carried out. The aim of the present work is therefore to re-analyze the Phébus FPT-1 test using the ASTEC (France) and MELCOR (USA) codes. The analysis focuses on the stand-alone containment aspects of the test. Three different models of the containment vessel were developed, showing that at least 15 to 20 Control Volumes (CVs) are needed in the spatial nodalization to correctly predict the thermal-hydraulics and the aerosol behavior. Furthermore, the paper summarizes the main thermal-hydraulic results and presents sensitivity analyses of the aerosol and FP behavior.

SHEDR: An End-to-End Deep Neural Event Detection and Recommendation Framework for Hyperlocal News Using Social Media

INFORMS Journal on Computing, 2021. DOI: 10.1287/ijoc.2021.1112

Author(s): Yuheng Hu, Yili Hong

Keyword(s): Neural Network, Social Media, Deep Learning, Event Detection, Large Scale, Short Term Memory, State Of The Art, Neural Network Models, Neural Event, End To End

Residents often rely on newspapers and television to gather hyperlocal news for community awareness and engagement. More recently, social media have emerged as an increasingly important source of hyperlocal news. Thus far, the literature on using social media to create desirable societal benefits, such as civic awareness and engagement, is still in its infancy. One key challenge in this research stream is to distill information from noisy social media data streams and deliver it to community members in a timely and accurate manner. In this work, we develop SHEDR (social media–based hyperlocal event detection and recommendation), an end-to-end neural event detection and recommendation framework, with a particular use case on Twitter, that helps residents seek information about hyperlocal events. The key model innovations in SHEDR lie in the design of the hyperlocal event detector and the event recommender. First, we harness two popular deep neural network models, the convolutional neural network (CNN) and long short-term memory (LSTM), in a novel joint CNN-LSTM model that characterizes spatiotemporal dependencies to capture unusualness in a region of interest, which is then classified as a hyperlocal event. Next, we develop a neural pairwise ranking algorithm that recommends detected hyperlocal events to residents based on their interests. To alleviate the sparsity issue and improve personalization, our algorithm incorporates several types of contextual information covering topical, social, and geographical proximity. We perform comprehensive evaluations on two large-scale data sets of geotagged tweets covering Seattle and Chicago, comparing our framework with several state-of-the-art approaches. Our hyperlocal event detection and recommendation models consistently and significantly outperform the other approaches in terms of precision, recall, and F1 score.
Summary of Contribution: In this paper, we focus on a novel and important, yet largely underexplored, application of computing: how to improve civic engagement in local neighborhoods through local news sharing and consumption based on social media feeds. To address this question, we propose two new computational, data-driven methods: (1) a deep learning–based hyperlocal event detection algorithm that scans spatially and temporally to detect hyperlocal events from geotagged Twitter feeds; and (2) a personalized deep learning–based hyperlocal event recommender system that systematically integrates several contextual cues, such as topical, geographical, and social proximity, to recommend the detected hyperlocal events to potential users. We conduct a series of experiments to examine the proposed models. The outcomes demonstrate that our algorithms are significantly better than state-of-the-art models and can provide users with more relevant information about the neighborhoods they live in, which in turn may boost their community engagement.
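A pairwise ranking objective of the kind mentioned here can be sketched with a BPR-style loss, -log(sigmoid(s_pos - s_neg)). The loss is small when an event a user engaged with scores higher than one they did not. The linear `contextual_score` that combines topical, geographical, and social proximity is a hypothetical stand-in for SHEDR's neural scorer, and the weights are made up for illustration.

```python
import math
from typing import Sequence


def contextual_score(proximities: Sequence[float], weights: Sequence[float]) -> float:
    """Hypothetical linear scorer over (topical, geographical, social) proximity features."""
    return sum(p * w for p, w in zip(proximities, weights))


def pairwise_ranking_loss(score_pos: float, score_neg: float) -> float:
    """-log(sigmoid(score_pos - score_neg)), computed as a numerically stable softplus."""
    x = score_pos - score_neg
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


weights = [0.5, 0.3, 0.2]                           # hypothetical feature weights
liked = contextual_score([0.9, 0.8, 0.4], weights)  # event the user engaged with
other = contextual_score([0.1, 0.2, 0.0], weights)  # event the user ignored
print(round(pairwise_ranking_loss(liked, other), 3))
```

Minimizing this loss over many (engaged, ignored) pairs pushes the scorer to rank relevant events above irrelevant ones. It does not require calibrated absolute scores, which helps when explicit feedback is sparse.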

Organization of the Drosophila larval visual circuit

eLife, Vol 6, 2017. DOI: 10.7554/elife.28387. Cited by ~37

Author(s): Ivan Larderet, Pauline MJ Fritsch, Nanae Gendre, G Larisa Neagu-Maier, Richard D Fetter, ...

Keyword(s): Visual Processing, Visual Information, Neural Circuit, Visual Features, Environmental Cues, Wiring Diagram, Visual Systems, Drosophila Larvae, Organizational Principles, Numerical Complexity

Visual systems transduce, process and transmit light-dependent environmental cues. Computation of visual features depends on the types of photoreceptor neurons (PRs) present, the organization of the eye, and the wiring of the underlying neural circuit. Here, we describe the circuit architecture of the visual system of Drosophila larvae by mapping the synaptic wiring diagram and neurotransmitters. By contacting different targets, the two larval PR subtypes create two converging pathways that potentially underlie the computation of ambient light intensity and temporal light changes already within this first visual processing center. Locally processed visual information then signals via dedicated projection interneurons to higher brain areas, including the lateral horn and mushroom body. The stratified structure of the larval optic neuropil (LON) suggests organizational principles shared with the adult fly and vertebrate visual systems. The complete synaptic wiring diagram of the LON paves the way to understanding how circuits with reduced numerical complexity control wide ranges of behaviors.

Data science in economics: comprehensive review of advanced machine learning and deep learning methods

2020. DOI: 10.21203/rs.3.rs-91905/v1

Author(s): Saeed Nosratabadi, Amir Mosavi, Puhong Duan, Pedram Ghamisi, Filip Ferdinand, ...

Keyword(s): Machine Learning, Deep Learning, Prediction Accuracy, Data Science, State Of The Art, Hybrid Models, The Other, Learning Models, Comprehensive Review

This paper reviews the state of the art of data science in economics. Advances in data science are investigated through a novel taxonomy of applications and methods, in three classes: deep learning models, ensemble models, and hybrid models. Application domains include the stock market, marketing, e-commerce, corporate banking, and cryptocurrency. The PRISMA method, a systematic literature review methodology, is used to ensure the quality of the survey. The findings reveal a trend toward hybrid models, as more than 51% of the reviewed articles applied a hybrid model. In addition, based on the RMSE accuracy metric, hybrid models had higher prediction accuracy than the other algorithms. Nevertheless, the trend is expected to move toward the advancement of deep learning models.
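For reference, RMSE, the metric used to compare the model families, is the square root of the mean squared prediction error, and lower values indicate more accurate predictions. The minimal sketch below uses made-up numbers; none of the values come from the reviewed studies.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error; lower values mean more accurate predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


actual = [101.0, 102.5, 101.8]   # hypothetical observed prices
model_a = [100.5, 102.0, 102.3]  # hypothetical forecasts from one model
model_b = [99.0, 104.0, 100.0]   # hypothetical forecasts from another model
print(rmse(actual, model_a) < rmse(actual, model_b))  # True
```

Because errors are squared before averaging, RMSE penalizes occasional large misses more heavily than mean absolute error does.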
