facebookresearch / co-tracker

CoTracker is a model for tracking any point (pixel) on a video.

Home Page: https://co-tracker.github.io/

License: Other

co-tracker's Introduction

CoTracker: It is Better to Track Together

Meta AI Research, GenAI; University of Oxford, VGG

Nikita Karaev, Ignacio Rocco, Benjamin Graham, Natalia Neverova, Andrea Vedaldi, Christian Rupprecht

CoTracker is a fast transformer-based model that can track any point in a video. It brings to tracking some of the benefits of Optical Flow.

CoTracker can track:

  • Any pixel in a video
  • A quasi-dense set of pixels together
  • Points that are manually selected or sampled on a grid in any video frame

Try these tracking modes for yourself with our Colab demo or in the Hugging Face Space 🤗.
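To picture the grid-sampling mode listed above, here is a minimal sketch of a regular grid of query points on one frame. The helper `grid_points` and the 640 x 480 frame size are hypothetical and only for illustration; CoTracker's own grid sampling may place points with different margins.

```python
def grid_points(grid_size, height, width):
    """Return grid_size * grid_size (x, y) points spread evenly over a frame.

    Hypothetical helper: each point sits at the centre of an equal-sized cell,
    which may differ from the placement CoTracker uses internally.
    """
    step_x = width / grid_size
    step_y = height / grid_size
    return [
        ((col + 0.5) * step_x, (row + 0.5) * step_y)
        for row in range(grid_size)
        for col in range(grid_size)
    ]


points = grid_points(grid_size=10, height=480, width=640)
print(len(points))  # 100 points for a 10 x 10 grid
print(points[0])    # (32.0, 24.0), the centre of the top-left cell
```

A grid of size 10 therefore yields 100 tracked points, which is why the number of tracks grows quadratically with `grid_size`.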

Updates:

  • [December 27, 2023] 📣 CoTracker2 is now available! It can now track many more (up to 265*265!) points jointly and it has a cleaner and more memory-efficient implementation. It also supports online processing. See the updated paper for more details. The old version remains available here.

  • [September 5, 2023] 📣 You can now run our Gradio demo locally!

Quick start

The easiest way to use CoTracker is to load a pretrained model from torch.hub:

Offline mode:

pip install imageio[ffmpeg], then:

import torch
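# Load the demo video from its raw URL (the /blob/ page URL returns HTML, not the video file)
url = 'https://github.com/facebookresearch/co-tracker/raw/main/assets/apple.mp4'

import imageio.v3 as iio
frames = iio.imread(url, plugin="FFMPEG")  # or plugin="pyav"; frames has shape T H W C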

device = 'cuda'
grid_size = 10
video = torch.tensor(frames).permute(0, 3, 1, 2)[None].float().to(device)  # B T C H W

# Run Offline CoTracker:
cotracker = torch.hub.load("facebookresearch/co-tracker", "cotracker2").to(device)
pred_tracks, pred_visibility = cotracker(video, grid_size=grid_size) # B T N 2,  B T N 1

Online mode:

cotracker = torch.hub.load("facebookresearch/co-tracker", "cotracker2_online").to(device)

# Run Online CoTracker, the same model with a different API:
# Initialize online processing
cotracker(video_chunk=video, is_first_step=True, grid_size=grid_size)  

# Process the video
for ind in range(0, video.shape[1] - cotracker.step, cotracker.step):
    pred_tracks, pred_visibility = cotracker(
        video_chunk=video[:, ind : ind + cotracker.step * 2]
    )  # B T N 2,  B T N 1

Online processing is more memory-efficient and allows for the processing of longer videos. However, in the example provided above, the video length is known! See the online demo for an example of tracking from an online stream with an unknown video length.
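To make the overlapping windows in the online loop concrete, the following sketch lists the frame ranges that each call receives. It is pure Python and does not load the model; `chunk_ranges` is a hypothetical helper, and the step of 4 is an assumed value for illustration (the real one comes from `cotracker.step`).

```python
def chunk_ranges(num_frames, step):
    """Frame ranges [start, end) passed to the model by the online loop."""
    ranges = []
    for ind in range(0, num_frames - step, step):
        # Tensor slicing clips at the end of the video, so the last chunk
        # can be shorter than 2 * step frames.
        end = min(ind + step * 2, num_frames)
        ranges.append((ind, end))
    return ranges


print(chunk_ranges(num_frames=10, step=4))  # [(0, 8), (4, 10)]
print(chunk_ranges(num_frames=16, step=4))  # [(0, 8), (4, 12), (8, 16)]
```

Each chunk spans two steps and advances by one, so consecutive chunks share `step` frames.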

Visualize predicted tracks:

pip install matplotlib, then:

from cotracker.utils.visualizer import Visualizer

vis = Visualizer(save_dir="./saved_videos", pad_value=120, linewidth=3)
vis.visualize(video, pred_tracks, pred_visibility)

We offer a number of other ways to interact with CoTracker:

  1. Interactive Gradio demo
  2. Jupyter notebook
  3. You can install CoTracker locally and then:
    • Run an offline demo with 10 ⨉ 10 points sampled on a grid on the first frame of a video (results will be saved to ./saved_videos/demo.mp4):

      python demo.py --grid_size 10
    • Run an online demo:

      python online_demo.py

A GPU is strongly recommended for using CoTracker locally.

Installation Instructions

You can use a pretrained model via PyTorch Hub, as described above, or install CoTracker from this GitHub repo. Installing from the repo is the better option if you need to run our local demo or evaluate/train CoTracker.

Ensure you have both PyTorch and TorchVision installed on your system, following the official PyTorch installation instructions. We strongly recommend installing both with CUDA support, although for small tasks CoTracker can be run on CPU.
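A quick way to check these prerequisites before installing is sketched below. The helpers `check_packages` and `pick_device` are hypothetical and not part of CoTracker; they rely only on the standard library plus `torch.cuda.is_available()` when PyTorch is present.

```python
import importlib.util


def check_packages(names=("torch", "torchvision")):
    """Map each package name to True if it can be imported, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def pick_device():
    """Return 'cuda' when PyTorch reports a usable GPU, otherwise 'cpu'."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is missing, so there is no GPU backend to query
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


print(check_packages())
print(pick_device())
```

The value returned by `pick_device()` can replace the hard-coded `device = 'cuda'` in the quick-start snippets when no GPU is available.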

Install a Development Version

git clone https://github.com/facebookresearch/co-tracker
cd co-tracker
pip install -e .
pip install matplotlib flow_vis tqdm tensorboard imageio[ffmpeg]

You can manually download the CoTracker2 checkpoint and place it in the checkpoints folder as follows:

mkdir -p checkpoints
cd checkpoints
wget https://huggingface.co/facebook/cotracker/resolve/main/cotracker2.pth
cd ..

For old checkpoints, see the "Previous version" section below.

After installation, this is how you could run the model on ./assets/apple.mp4 (results will be saved to ./saved_videos/apple.mp4):

python demo.py --checkpoint checkpoints/cotracker2.pth

Evaluation

To reproduce the results presented in the paper, download the following datasets:

And install the necessary dependencies:

pip install hydra-core==1.1.0 mediapy

Then, execute the following command to evaluate on TAP-Vid DAVIS:

python ./cotracker/evaluation/evaluate.py --config-name eval_tapvid_davis_first exp_dir=./eval_outputs dataset_root=your/tapvid/path

By default, evaluation will be slow since it is done for one target point at a time, which ensures robustness and fairness, as described in the paper.

We have fixed some bugs and retrained the model after updating the paper. These are the numbers that you should be able to reproduce using the released checkpoint and the current version of the codebase:

| Model | DAVIS First, AJ | DAVIS First, $\delta_\text{avg}^\text{vis}$ | DAVIS First, OA | DAVIS Strided, AJ | DAVIS Strided, $\delta_\text{avg}^\text{vis}$ | DAVIS Strided, OA | DR, $\delta_\text{avg}$ | DR, $\delta_\text{avg}^\text{vis}$ | DR, $\delta_\text{avg}^\text{occ}$ |
|---|---|---|---|---|---|---|---|---|---|
| CoTracker2, 27.12.23 | 60.9 | 75.4 | 88.4 | 65.1 | 79.0 | 89.4 | 61.4 | 68.4 | 38.2 |
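As a reading aid for the $\delta_\text{avg}^\text{vis}$ columns, here is a simplified sketch of that kind of metric: the percentage of visible points whose predicted position falls within a pixel threshold of the ground truth, averaged over several thresholds. The thresholds of 1, 2, 4, 8 and 16 pixels follow the TAP-Vid benchmark definition as we understand it; the function `delta_avg` is hypothetical and is not the repository's evaluation code, which also handles resizing and per-video averaging.

```python
import math

THRESHOLDS = (1, 2, 4, 8, 16)  # pixel thresholds, assumed from TAP-Vid


def delta_avg(pred, target, visible):
    """Average percentage of visible points within each pixel threshold.

    pred, target: lists of (x, y) positions; visible: list of bool flags.
    """
    errors = [math.dist(p, t) for p, t, v in zip(pred, target, visible) if v]
    if not errors:
        return float("nan")  # no visible points to score
    fractions = [sum(e < thr for e in errors) / len(errors) for thr in THRESHOLDS]
    return 100.0 * sum(fractions) / len(fractions)


pred = [(0, 0), (3, 0), (20, 0), (5, 5)]
target = [(0, 0), (0, 0), (0, 0), (0, 0)]
visible = [True, True, True, False]
print(round(delta_avg(pred, target, visible), 2))  # 53.33
```

In this toy example the three visible points have errors of 0, 3 and 20 pixels, so one of three passes the two tightest thresholds and two of three pass the rest.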

Training

To train CoTracker as described in our paper, you first need to generate annotations for the Google Kubric MOVI-f dataset. Instructions for annotation generation can be found here. You can also find a discussion on dataset generation in this issue.

Once you have the annotated dataset, make sure you have followed the evaluation setup steps above, then install the training dependencies:

pip install pytorch_lightning==1.6.0 tensorboard

Now you can launch training on Kubric. Our model was trained for 50000 iterations on 32 GPUs (4 nodes with 8 GPUs). Modify dataset_root and ckpt_path accordingly before running this command. For training on 4 nodes, add --num_nodes 4.

python train.py --batch_size 1 \
--num_steps 50000 --ckpt_path ./ --dataset_root ./datasets --model_name cotracker \
--save_freq 200 --sequence_len 24 --eval_datasets dynamic_replica tapvid_davis_first \
--traj_per_sample 768 --sliding_window_len 8 \
--num_virtual_tracks 64 --model_stride 4
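If you launch training from a script, the command above can be assembled programmatically so that the paths and node count are set in one place. This is a minimal sketch: `build_train_command` is a hypothetical helper, the example paths are placeholders, and only flags that appear in the command above are used.

```python
import shlex


def build_train_command(dataset_root, ckpt_path, num_nodes=1):
    """Return the training command from this section as an argument list."""
    args = [
        "python", "train.py",
        "--batch_size", "1",
        "--num_steps", "50000",
        "--ckpt_path", ckpt_path,
        "--dataset_root", dataset_root,
        "--model_name", "cotracker",
        "--save_freq", "200",
        "--sequence_len", "24",
        "--eval_datasets", "dynamic_replica", "tapvid_davis_first",
        "--traj_per_sample", "768",
        "--sliding_window_len", "8",
        "--num_virtual_tracks", "64",
        "--model_stride", "4",
    ]
    if num_nodes > 1:
        # Multi-node runs add --num_nodes, as described in the text above
        args += ["--num_nodes", str(num_nodes)]
    return args


# Placeholder paths; replace them with your own dataset and checkpoint folders
cmd = build_train_command("./datasets/kubric", "./checkpoints/run1", num_nodes=4)
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run` directly, which avoids shell quoting issues with paths that contain spaces.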

Development

Building the documentation

To build CoTracker documentation, first install the dependencies:

pip install sphinx
pip install sphinxcontrib-bibtex

Then you can use this command to generate the documentation in the docs/_build/html folder:

make -C docs html

Previous version

You can use CoTracker v1 directly via PyTorch Hub:

import torch
# v1 needs these packages installed; importing them up front fails fast if one is missing
import einops
import timm
import tqdm

cotracker = torch.hub.load("facebookresearch/co-tracker:v1.0", "cotracker_w8")

The old version of the code is available here. You can also download the corresponding checkpoints:

wget https://dl.fbaipublicfiles.com/cotracker/cotracker_stride_4_wind_8.pth
wget https://dl.fbaipublicfiles.com/cotracker/cotracker_stride_4_wind_12.pth
wget https://dl.fbaipublicfiles.com/cotracker/cotracker_stride_8_wind_16.pth

License

The majority of CoTracker is licensed under CC-BY-NC. However, portions of the project are available under separate license terms: Particle Video Revisited is licensed under the MIT license, and TAP-Vid is licensed under the Apache 2.0 license.

Acknowledgments

We would like to thank PIPs and TAP-Vid for publicly releasing their code and data. We also want to thank Luke Melas-Kyriazi for proofreading the paper, and Jianyuan Wang, Roman Shapovalov and Adam W. Harley for the insightful discussions.

Citing CoTracker

If you find our repository useful, please consider giving it a star ⭐ and citing our paper in your work:

@article{karaev2023cotracker,
  title={CoTracker: It is Better to Track Together},
  author={Nikita Karaev and Ignacio Rocco and Benjamin Graham and Natalia Neverova and Andrea Vedaldi and Christian Rupprecht},
  journal={arXiv:2307.07635},
  year={2023}
}
