MOTChallenge: A Benchmark for Single-Camera Multiple Target Tracking

Author(s):  
Patrick Dendorfer ◽  
Aljos̆a Os̆ep ◽  
Anton Milan ◽  
Konrad Schindler ◽  
Daniel Cremers ◽  
...  

Standardized benchmarks have been crucial in pushing the performance of computer vision algorithms, especially since the advent of deep learning. Although leaderboard rankings should not be over-interpreted, they often provide the most objective measure of performance and are therefore important guides for research. We present MOTChallenge, a benchmark for single-camera Multiple Object Tracking (MOT) launched in late 2014, to collect existing and new data and to create a framework for the standardized evaluation of multiple object tracking methods. The benchmark focuses on multiple people tracking, since pedestrians are by far the most studied object class in the tracking community, with applications ranging from robot navigation to self-driving cars. This paper collects the first three releases of the benchmark: (i) MOT15, along with the numerous state-of-the-art results submitted over the last years; (ii) MOT16, which contains new challenging videos; and (iii) MOT17, which extends the MOT16 sequences with more precise labels and evaluates tracking performance on three different object detectors. The second and third releases not only offer a significant increase in the number of labeled boxes, but also provide labels for multiple object classes besides pedestrians, as well as the level of visibility for every object of interest. Finally, we provide a categorization of state-of-the-art trackers and a broad error analysis. This will help newcomers understand the related work and research trends in the MOT community, and hopefully shed some light on potential future research directions.
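The standardized evaluation described above is built around the CLEAR-MOT metrics, of which MOTA is the headline score on the MOTChallenge leaderboards. The sketch below shows the MOTA formula as a small function; the function name and the toy counts are illustrative assumptions, and the official MOTChallenge devkit should be used for actual submissions.

```python
# Minimal sketch of MOTA (Multiple Object Tracking Accuracy), the headline
# CLEAR-MOT metric reported on MOTChallenge leaderboards.
# The counts below are illustrative only; real submissions should be scored
# with the official evaluation kit.

def mota(false_negatives: int, false_positives: int, id_switches: int,
         total_gt_boxes: int) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / total ground-truth boxes."""
    if total_gt_boxes == 0:
        raise ValueError("Ground truth must contain at least one box.")
    return 1.0 - (false_negatives + false_positives + id_switches) / total_gt_boxes

# Toy example: 500 annotated boxes, 60 misses, 40 false alarms, 5 ID switches.
print(f"MOTA = {mota(60, 40, 5, 500):.3f}")  # -> 0.790
```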

2020 ◽  
Vol 123 (5) ◽  
pp. 1630-1644
Author(s):  
Nicholas S. Bland ◽  
Jason B. Mattingley ◽  
Martin V. Sale

Using a multiple object tracking paradigm, we were able to manipulate the need for interhemispheric integration on a per-trial basis, while also having an objective measure of integration efficacy (i.e., tracking performance). We show that tracking performance reflects a cost of integration, which correlates with individual differences in interhemispheric EEG coherence. Gamma coherence appears to uniquely benefit between-hemifield tracking, predicting performance both across participants and across trials.
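The interhemispheric coherence measure described above can be illustrated with standard signal-processing tools. The following sketch computes magnitude-squared coherence between two synthetic channels with SciPy; the sampling rate, channel construction, and the 30-80 Hz gamma band are assumptions for the example, not parameters taken from the study.

```python
# Illustrative computation of interhemispheric coherence in the gamma band.
# Sampling rate, synthetic signals, and band limits are assumptions for this sketch.
import numpy as np
from scipy.signal import coherence

fs = 1000                      # sampling rate in Hz (assumed)
t = np.arange(0, 2.0, 1 / fs)
rng = np.random.default_rng(0)

# Two synthetic "left/right hemisphere" channels sharing a 40 Hz component.
shared = np.sin(2 * np.pi * 40 * t)
left = shared + 0.5 * rng.standard_normal(t.size)
right = shared + 0.5 * rng.standard_normal(t.size)

# Magnitude-squared coherence as a function of frequency (Welch's method).
f, cxy = coherence(left, right, fs=fs, nperseg=512)

# Average coherence inside an assumed gamma band (30-80 Hz).
gamma = (f >= 30) & (f <= 80)
print(f"Mean gamma coherence: {cxy[gamma].mean():.2f}")
```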


2021 ◽  
Author(s):  
Jonathan Michael Paul Wilbiks

The capacities of unimodal processes such as visual and auditory working memory, multiple object tracking, and attention have been heavily researched in the psychological science literature. In recent years there has been an increase in research into multimodal processes such as the integration of auditory and visual stimuli, but to my knowledge only a single published article to date has investigated the capacity of audiovisual integration, and it concluded that this capacity is limited to a single item. The purpose of this dissertation is to elucidate some of the factors that contribute to the capacity of audiovisual integration, and to illustrate that the interaction of these factors makes the capacity a fluid, dynamic property. Chapter 1 reviews the literature on multimodal integration, as well as the unimodal topics pertinent to the factors manipulated in the dissertation: working memory, multiple object tracking, and attention. Chapter 2 examines the paradigm used in that single study of audiovisual integration capacity and separates the component factors of proactive interference and temporal predictability, which contribute to the environmental complexity of the scenario, providing a first illustration of the flexibility of audiovisual integration capacity. Chapter 3 explores stimulus factors, considering the effects of crossmodal congruency and perceptual chunking on audiovisual integration capacity. Chapter 4 examines the variability of audiovisual integration capacity within an individual over time by means of a training study. Chapter 5 summarizes the findings, discusses overarching themes regarding audiovisual integration capacity, including how information is processed through integration and how these findings could be applied to real-life scenarios, suggests avenues for future research such as further manipulations of modality and SOA, and draws conclusions in answer to the research questions. This research extends what is known about audiovisual integration capacity, both in terms of its numerical value and the factors that determine it. It also demonstrates that there is no single overarching limit on the capacity of audiovisual integration, as the initial paper on this topic suggested, but rather that capacity is subject to multiple factors and can change depending on the situation in which integration occurs.


Author(s):  
Shinfeng D. Lin ◽  
Tingyu Chang ◽  
Wensheng Chen

In computer vision, multiple object tracking (MOT) plays a crucial role in many important applications. A common approach to MOT is tracking by detection, which must handle occlusion, motion prediction, and object re-identification. A set of detections is extracted from the video frames to guide the tracking process, and these detections are associated so that bounding boxes belonging to the same target receive the same identity. In this article, MOT using a YOLO-based detector is proposed. The method includes object detection, bounding box regression, and bounding box association. First, YOLOv3 is employed as the object detector; bounding box regression and association are then used to estimate each object's position. To evaluate the method, two open object tracking benchmarks, 2D MOT2015 and MOT16, were used. Experimental results demonstrate that the method is comparable to several state-of-the-art trackers, particularly in terms of MOT accuracy and correctly identified detections.
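The abstract does not spell out the association step, but a common tracking-by-detection baseline matches existing tracks to new detections by intersection-over-union (IoU) using the Hungarian algorithm. The sketch below illustrates that generic pattern under those assumptions; it is not the authors' exact method.

```python
# Generic IoU-based bounding-box association (not the authors' exact method):
# tracks and detections are matched with the Hungarian algorithm on an IoU cost.
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(tracks, detections, iou_threshold=0.3):
    """Return (track_idx, det_idx) pairs whose IoU exceeds the threshold."""
    cost = np.array([[1.0 - iou(t, d) for d in detections] for t in tracks])
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= iou_threshold]

tracks = [(10, 10, 50, 80)]
detections = [(12, 11, 52, 79), (200, 40, 240, 120)]
print(associate(tracks, detections))  # -> [(0, 0)]
```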


2020 ◽  
Author(s):  
Jonathan Wilbiks ◽  
Annika Beatteay

There has been a recent increase in individual differences research within the field of audiovisual perception (Spence & Squire, 2003), and furthering the understanding of audiovisual integration capacity with an individual differences approach is an important facet of this line of research. Across four experiments, participants completed an audiovisual integration capacity task (cf. Van der Burg et al., 2013; Wilbiks & Dyson, 2016; 2018), along with differing combinations of additional perceptual tasks. Experiment 1 employed a multiple object tracking task and a visual working memory task. Experiment 2 compared performance on the capacity task with that on the attention network test. Experiment 3 examined participants' focus in space, through a Navon task, and their vigilance through time. Having completed this exploratory work, in Experiment 4 we collected data again from the tasks that correlated significantly with capacity across the first three experiments and entered them into a regression model to predict capacity. The current research provides a preliminary explanation of the vast individual differences seen in audiovisual integration capacity in previous research, showing that by considering an individual's multiple object tracking span, focus in space, and attentional factors, we can account for up to 34.3% of the observed variation in capacity. Future research should seek to examine higher-level differences between individuals that may contribute to audiovisual integration capacity, including neurodevelopmental and mental health differences.
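The reported 34.3% of explained variation corresponds to a multiple-regression R² of roughly .34. The sketch below illustrates that kind of model on simulated data; the predictor variables and their values are hypothetical and are not the authors' dataset.

```python
# Hypothetical illustration of regressing audiovisual integration capacity
# on several perceptual predictors; the data are simulated, not the study's.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 100
mot_span = rng.normal(3.5, 1.0, n)      # multiple object tracking span (simulated)
navon_focus = rng.normal(0.0, 1.0, n)   # focus-in-space score (simulated)
attention = rng.normal(0.0, 1.0, n)     # attentional measure (simulated)

# Simulated capacity with noise, so the model explains only part of the variance.
capacity = 0.4 * mot_span + 0.3 * navon_focus + 0.2 * attention + rng.normal(0, 1.0, n)

X = np.column_stack([mot_span, navon_focus, attention])
model = LinearRegression().fit(X, capacity)
print(f"R^2 = {model.score(X, capacity):.3f}")  # proportion of variance explained
```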


Author(s):  
K. Botterill ◽  
R. Allen ◽  
P. McGeorge

The multiple-object tracking paradigm has most commonly been used to investigate how subsets of targets can be tracked among a set of identical objects. Recently, this research has been extended to examine the role of featural information when the tracked objects can be individuated. We report on a study whose findings suggest that, while participants can hold featural information for only roughly two targets, this does not detrimentally affect tracking performance, pointing to a discontinuity between the cognitive processes that subserve spatial location and featural information.


2010 ◽  
Author(s):  
Todd S. Horowitz ◽  
Michael A. Cohen ◽  
Yair Pinto ◽  
Piers D. L. Howe
