Visual Tracking with Discriminative Correlation Filters

D3S - A Discriminative Single Shot Segmentation Tracker


Template-based discriminative trackers are currently the dominant tracking paradigm due to their robustness, but they are restricted to bounding-box tracking and a limited range of transformation models, which reduces their localization accuracy. We propose D3S, a discriminative single-shot segmentation tracker that narrows the gap between visual object tracking and video object segmentation. A single-shot network applies two target models with complementary geometric properties: one is invariant to a broad range of transformations, including non-rigid deformations, while the other assumes a rigid object. Together they achieve high robustness and online target segmentation simultaneously. Without per-dataset finetuning and trained only for segmentation as the primary output, D3S outperforms all trackers on the VOT2016, VOT2018 and GOT-10k benchmarks and performs close to the state-of-the-art trackers on TrackingNet. D3S outperforms the leading segmentation tracker SiamMask on a video object segmentation benchmark and performs on par with top video object segmentation algorithms, while running an order of magnitude faster, close to real-time.

Paper: CVPR2020
Source code: Github

Alan Lukežič, Jiří Matas and Matej Kristan.
D3S - A Discriminative Single Shot Segmentation Tracker.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.

Demo video on YouTube:

FuCoLoT - A Fully-Correlational Long-Term Tracker

The Fully Correlational Long-term Tracker (FuCoLoT) exploits a novel DCF constrained filter learning method to design a detector that can efficiently re-detect the target in the whole image. FuCoLoT maintains several correlation filters trained on different time scales that act as the detector components. A novel mechanism based on the correlation response is used for tracking failure estimation. FuCoLoT achieves state-of-the-art results on standard short-term benchmarks and outperforms the current best-performing tracker on the long-term UAV20L benchmark by over 19%. It has an order of magnitude smaller memory footprint than its best-performing competitors and runs at 15 fps in a single CPU thread.
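To give a feel for failure estimation from a correlation response, here is a minimal sketch. It is not the FuCoLoT mechanism itself: it substitutes the peak-to-sidelobe ratio (PSR), a commonly used response-quality measure, and flags uncertainty when the current PSR falls well below its recent average. The function names, the exclusion window and the ratio threshold are all illustrative assumptions.

```python
import numpy as np

def peak_to_sidelobe_ratio(response, exclude=5):
    """PSR = (peak - sidelobe mean) / sidelobe std.

    The sidelobe is everything outside a (2*exclude+1)^2 window around the
    peak. Assumes the response map is larger than that window.
    """
    py, px = np.unravel_index(np.argmax(response), response.shape)
    sidelobe = np.ones(response.shape, dtype=bool)
    sidelobe[max(py - exclude, 0):py + exclude + 1,
             max(px - exclude, 0):px + exclude + 1] = False
    values = response[sidelobe]
    return float((response[py, px] - values.mean()) / (values.std() + 1e-12))

def is_uncertain(psr_now, psr_history, ratio=2.0):
    """Hypothetical rule: uncertain if the recent mean PSR exceeds the
    current PSR by more than `ratio` (an illustrative threshold)."""
    if len(psr_history) == 0:
        return False
    return bool(np.mean(psr_history) / max(psr_now, 1e-12) > ratio)
```

A sharp, isolated peak yields a high PSR, while a flat or noisy response yields a low one. A long-term tracker could use such a signal to decide when to switch from local tracking to whole-image re-detection.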

Paper: ACCV18
Source code: Github

Alan Lukežič, Luka Čehovin Zajc, Tomáš Vojíř, Jiří Matas and Matej Kristan.
FuCoLoT - A Fully-Correlational Long-Term Tracker.
Asian Conference on Computer Vision (ACCV), 2018.

Fast Spatially Regularized Correlation Filter Tracker


Discriminative correlation filters (DCF) have attracted significant attention from the tracking community. The standard formulation of the DCF affords a closed-form solution, but it is not robust and is constrained to learning and detection within a relatively small search region. Spatial regularization was proposed to address learning from larger regions, but it prohibits a closed-form solution and leads to an iterative optimization with a significant computational load, resulting in slow model learning and tracking. We propose to reformulate the spatially regularized filter cost function so that it admits an efficient optimization. This speeds up the tracker significantly (approximately 14 times) and results in real-time tracking at the same or better accuracy.
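For context, the closed-form solution of the standard (unregularized in space) DCF can be sketched in a few lines. This is the textbook single-channel ridge regression in the Fourier domain, not the spatially regularized method proposed here; the function names and the values of `sigma` and `lam` are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_label(shape, sigma=2.0):
    """Desired response: a Gaussian whose peak sits at index (0, 0)."""
    h, w = shape
    ys = np.arange(h) - h // 2
    xs = np.arange(w) - w // 2
    g = np.exp(-(ys[:, None] ** 2 + xs[None, :] ** 2) / (2.0 * sigma ** 2))
    return np.fft.ifftshift(g)  # move the peak from the centre to (0, 0)

def train_filter(patch, label, lam=1e-2):
    """Closed-form filter: H = G * conj(F) / (|F|^2 + lam), element-wise."""
    F = np.fft.fft2(patch)
    G = np.fft.fft2(label)
    return (G * np.conj(F)) / (F * np.conj(F) + lam)

def detect(filter_hat, patch):
    """Correlation response on a new patch; the argmax is the circular shift."""
    response = np.real(np.fft.ifft2(filter_hat * np.fft.fft2(patch)))
    dy, dx = np.unravel_index(np.argmax(response), response.shape)
    return response, (int(dy), int(dx))
```

Training on a patch and then calling `detect` on a circularly shifted copy, for example `np.roll(patch, (3, 5), axis=(0, 1))`, should place the response peak at that shift. The circular-shift assumption built into the FFT is also what limits the usable search region, which is the restriction that spatial regularization targets.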

Paper: ERK18

Alan Lukežič, Luka Čehovin Zajc and Matej Kristan.
Fast Spatially Regularized Correlation Filter Tracker.
27th International Electrotechnical and Computer Science Conference (ERK), 2018.

Discriminative Correlation Filter with Channel and Spatial Reliability

Channel-Spatial-Reliability CF

Short-term tracking is an open and challenging problem for which discriminative correlation filters (DCF) have shown excellent performance. We introduce the channel and spatial reliability concepts to DCF tracking and provide a novel learning algorithm for their efficient and seamless integration in the filter update and the tracking process. The spatial reliability map adjusts the filter support to the part of the object suitable for tracking. This both allows the search region to be enlarged and improves tracking of non-rectangular objects. Reliability scores reflect the channel-wise quality of the learned filters and are used as feature weighting coefficients in localization. Experimentally, with only two simple standard features, HoGs and Colornames, the novel CSR-DCF method (DCF with Channel and Spatial Reliability) achieves state-of-the-art results on VOT 2016, VOT 2015 and OTB100. The CSR-DCF runs in real-time on a CPU.
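The two reliability ideas can be illustrated with a small sketch. It is a simplification under stated assumptions, not the CSR-DCF learning algorithm: the spatial map is applied as a plain element-wise mask on a spatial-domain filter, and each channel's reliability is assumed to be its peak response, normalized so the weights sum to one. All function names are hypothetical.

```python
import numpy as np

def restrict_support(filter_spatial, reliability_mask):
    """Zero the spatial-domain filter outside the binary reliability mask."""
    return filter_spatial * reliability_mask

def channel_weights(responses):
    """Assumed reliability: peak response per channel, normalised to sum to 1.

    `responses` has shape (channels, height, width).
    """
    peaks = np.maximum(responses.reshape(responses.shape[0], -1).max(axis=1), 0.0)
    total = peaks.sum()
    if total <= 0:
        return np.full(len(peaks), 1.0 / len(peaks))  # fall back to uniform
    return peaks / total

def localize(responses, weights):
    """Weighted sum of per-channel responses and the location of its maximum."""
    combined = np.tensordot(weights, responses, axes=1)  # shape (height, width)
    dy, dx = np.unravel_index(np.argmax(combined), combined.shape)
    return combined, (int(dy), int(dx))
```

With this weighting, a channel that produces a strong, confident peak dominates the combined response, while a channel with a weak response contributes little to the final location estimate.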

Papers: CVPR17, IJCV
Source code: Github
C++ version of the tracker is available in OpenCV contrib repository (tracking module) Github

Alan Lukežič, Tomáš Vojíř, Luka Čehovin, Jiří Matas and Matej Kristan.
Discriminative Correlation Filter Tracker with Channel and Spatial Reliability.
International Journal of Computer Vision, 2018.

Alan Lukežič, Tomáš Vojíř, Luka Čehovin, Jiří Matas and Matej Kristan.
Discriminative Correlation Filter with Channel and Spatial Reliability.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

Demo video on YouTube:

Deformable Parts Correlation Filters for Robust Visual Tracking

Deformable-parts CF

Deformable parts models show great potential in tracking by principally addressing non-rigid object deformations and self-occlusions, but according to recent benchmarks, they often lag behind holistic approaches. The reason is that a potentially large number of degrees of freedom have to be estimated for object localization, and simplifications of the constellation topology are often assumed to make the inference tractable. We present a new formulation of the constellation model with correlation filters that treats the geometric and visual constraints within a single convex cost function, and we derive a highly efficient optimization for MAP inference of a fully-connected constellation. We propose a tracker that models the object at two levels of detail. The coarse level corresponds to a root correlation filter and a novel color model for approximate object localization, while the mid-level representation is composed of the new deformable constellation of correlation filters that refine the object location. The resulting tracker is rigorously analyzed on the highly challenging OTB, VOT2014 and VOT2015 benchmarks, exhibits state-of-the-art performance and runs in real-time.

Paper: TCyB
Source: Link