facebookresearch / co-tracker

CoTracker is a model for tracking any point (pixel) on a video.

Home Page: https://co-tracker.github.io/

License: Other

Python 1.17% Jupyter Notebook 98.83% Shell 0.01%
optical-flow point-tracking track-anything

co-tracker's Introduction

CoTracker: It is Better to Track Together

Meta AI Research, GenAI; University of Oxford, VGG

Nikita Karaev, Ignacio Rocco, Benjamin Graham, Natalia Neverova, Andrea Vedaldi, Christian Rupprecht

Open In Colab Spaces

CoTracker is a fast transformer-based model that can track any point in a video. It brings to tracking some of the benefits of Optical Flow.

CoTracker can track:

  • Any pixel in a video
  • A quasi-dense set of pixels together
  • Points selected manually or sampled on a grid in any video frame

Try these tracking modes for yourself with our Colab demo or in the Hugging Face Space 🤗.

Updates:

  • [December 27, 2023] 📣 CoTracker2 is now available! It can now track many more (up to 265*265!) points jointly and it has a cleaner and more memory-efficient implementation. It also supports online processing. See the updated paper for more details. The old version remains available here.

  • [September 5, 2023] 📣 You can now run our Gradio demo locally!

Quick start

The easiest way to use CoTracker is to load a pretrained model from torch.hub:

Offline mode:

pip install imageio[ffmpeg], then:

import torch
# Download the video
url = 'https://github.com/facebookresearch/co-tracker/blob/main/assets/apple.mp4'

import imageio.v3 as iio
frames = iio.imread(url, plugin="FFMPEG")  # plugin="pyav"

device = 'cuda'
grid_size = 10
video = torch.tensor(frames).permute(0, 3, 1, 2)[None].float().to(device)  # B T C H W

# Run Offline CoTracker:
cotracker = torch.hub.load("facebookresearch/co-tracker", "cotracker2").to(device)
pred_tracks, pred_visibility = cotracker(video, grid_size=grid_size) # B T N 2,  B T N 1
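
Instead of sampling points on a regular grid, you can also pass your own query points as a B N 3 tensor of (frame_index, x, y) coordinates. The snippet below is a minimal sketch that assumes the queries keyword argument used in the notebook and Colab demos:

# Track manually selected points instead of a grid.
# Each query is (frame_index, x, y) in pixel coordinates of the input video.
queries = torch.tensor([
    [0., 400., 350.],   # a point queried on the first frame
    [10., 600., 500.],  # a point queried on frame 10
])[None].to(device)     # B N 3

pred_tracks, pred_visibility = cotracker(video, queries=queries)  # B T N 2,  B T N 1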

Online mode:

cotracker = torch.hub.load("facebookresearch/co-tracker", "cotracker2_online").to(device)

# Run Online CoTracker, the same model with a different API:
# Initialize online processing
cotracker(video_chunk=video, is_first_step=True, grid_size=grid_size)  

# Process the video
for ind in range(0, video.shape[1] - cotracker.step, cotracker.step):
    pred_tracks, pred_visibility = cotracker(
        video_chunk=video[:, ind : ind + cotracker.step * 2]
    )  # B T N 2,  B T N 1

Online processing is more memory-efficient and makes it possible to process longer videos. Note, however, that in the example above the whole video is already in memory, so its length is known; see the online demo for an example of tracking an online stream with an unknown video length.
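
For a stream whose total length is not known in advance, you can buffer incoming frames and call the online model every time cotracker.step new frames have arrived, mirroring the pattern used in online_demo.py. The following is a rough sketch rather than the exact demo code; read_frames() is a hypothetical generator standing in for your own capture loop:

import numpy as np

window_frames = []      # frames seen so far, each an H x W x 3 uint8 array
is_first_step = True
for i, frame in enumerate(read_frames()):  # read_frames() is a placeholder
    window_frames.append(frame)
    if i % cotracker.step == 0 and i != 0:
        video_chunk = (
            torch.tensor(np.stack(window_frames[-cotracker.step * 2:]))
            .permute(0, 3, 1, 2)[None].float().to(device)
        )  # B T C H W
        pred_tracks, pred_visibility = cotracker(
            video_chunk, is_first_step=is_first_step, grid_size=grid_size
        )
        is_first_step = False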

Visualize predicted tracks:

pip install matplotlib, then:

from cotracker.utils.visualizer import Visualizer

vis = Visualizer(save_dir="./saved_videos", pad_value=120, linewidth=3)
vis.visualize(video, pred_tracks, pred_visibility)
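
The Visualizer accepts a few more options that are useful in practice. The sketch below assumes the fps and tracks_leave_trace keyword arguments; check cotracker/utils/visualizer.py for the exact signature:

vis = Visualizer(
    save_dir="./saved_videos",
    pad_value=120,
    linewidth=3,
    fps=10,                 # frame rate of the exported video
    tracks_leave_trace=-1,  # -1 keeps the whole trajectory visible instead of a short trail
)
vis.visualize(video, pred_tracks, pred_visibility, filename="tracks")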

We offer a number of other ways to interact with CoTracker:

  1. The interactive Gradio demo
  2. The Jupyter notebook demo
  3. You can install CoTracker locally and then:
    • Run an offline demo with 10 ⨉ 10 points sampled on a grid on the first frame of a video (results will be saved to ./saved_videos/demo.mp4):

      python demo.py --grid_size 10
    • Run an online demo:

      python online_demo.py

A GPU is strongly recommended for using CoTracker locally.

Installation Instructions

You can use a pretrained model via PyTorch Hub, as described above, or install CoTracker from this GitHub repo. The latter is the best option if you need to run our local demo or evaluate/train CoTracker.

Ensure you have both PyTorch and TorchVision installed on your system. Follow the instructions here for the installation. We strongly recommend installing both PyTorch and TorchVision with CUDA support, although for small tasks CoTracker can be run on CPU.

Install a Development Version

git clone https://github.com/facebookresearch/co-tracker
cd co-tracker
pip install -e .
pip install matplotlib flow_vis tqdm tensorboard

You can manually download the CoTracker2 checkpoint from the link below and place it in the checkpoints folder as follows:

mkdir -p checkpoints
cd checkpoints
wget https://huggingface.co/facebook/cotracker/resolve/main/cotracker2.pth
cd ..
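
With the checkpoint downloaded, you can instantiate the predictor locally instead of going through torch.hub. This is a minimal sketch, assuming the CoTrackerPredictor class from cotracker/predictor.py used by demo.py:

import torch
from cotracker.predictor import CoTrackerPredictor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CoTrackerPredictor(checkpoint="./checkpoints/cotracker2.pth").to(device)

# video: a B T C H W float tensor, prepared as in the Quick start section
# pred_tracks, pred_visibility = model(video, grid_size=10)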

For old checkpoints, see this section.

Evaluation

To reproduce the results presented in the paper, download the evaluation datasets (TAP-Vid and Dynamic Replica) and install the necessary dependencies:

pip install hydra-core==1.1.0 mediapy

Then, execute the following command to evaluate on TAP-Vid DAVIS:

python ./cotracker/evaluation/evaluate.py --config-name eval_tapvid_davis_first exp_dir=./eval_outputs dataset_root=your/tapvid/path

By default, evaluation will be slow since it is done for one target point at a time, which ensures robustness and fairness, as described in the paper.

We have fixed some bugs and retrained the model after updating the paper. These are the numbers that you should be able to reproduce using the released checkpoint and the current version of the codebase:

| | DAVIS First, AJ | DAVIS First, $\delta_\text{avg}^\text{vis}$ | DAVIS First, OA | DAVIS Strided, AJ | DAVIS Strided, $\delta_\text{avg}^\text{vis}$ | DAVIS Strided, OA | DR, $\delta_\text{avg}$ | DR, $\delta_\text{avg}^\text{vis}$ | DR, $\delta_\text{avg}^\text{occ}$ |
|---|---|---|---|---|---|---|---|---|---|
| CoTracker2, 27.12.23 | 60.9 | 75.4 | 88.4 | 65.1 | 79.0 | 89.4 | 61.4 | 68.4 | 38.2 |

Training

To train CoTracker as described in our paper, you first need to generate annotations for the Google Kubric MOVI-f dataset. Instructions for annotation generation can be found here. You can also find a discussion of dataset generation in this issue.

Once you have the annotated dataset, make sure you have followed the evaluation setup steps and install the training dependencies:

pip install pytorch_lightning==1.6.0 tensorboard

Now you can launch training on Kubric. Our model was trained for 50000 iterations on 32 GPUs (4 nodes with 8 GPUs each). Modify dataset_root and ckpt_path accordingly before running this command. For training on 4 nodes, add --num_nodes 4.

python train.py --batch_size 1 \
--num_steps 50000 --ckpt_path ./ --dataset_root ./datasets --model_name cotracker \
--save_freq 200 --sequence_len 24 --eval_datasets dynamic_replica tapvid_davis_first \
--traj_per_sample 768 --sliding_window_len 8 \
--num_virtual_tracks 64 --model_stride 4

Development

Building the documentation

To build CoTracker documentation, first install the dependencies:

pip install sphinx
pip install sphinxcontrib-bibtex

Then you can use this command to generate the documentation in the docs/_build/html folder:

make -C docs html

Previous version

You can use CoTracker v1 directly via PyTorch Hub:

import torch
import einops
import timm
import tqdm

cotracker = torch.hub.load("facebookresearch/co-tracker:v1.0", "cotracker_w8")

The old version of the code is available here. You can also download the corresponding checkpoints:

wget https://dl.fbaipublicfiles.com/cotracker/cotracker_stride_4_wind_8.pth
wget https://dl.fbaipublicfiles.com/cotracker/cotracker_stride_4_wind_12.pth
wget https://dl.fbaipublicfiles.com/cotracker/cotracker_stride_8_wind_16.pth

License

The majority of CoTracker is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: Particle Video Revisited is licensed under the MIT license, and TAP-Vid under the Apache 2.0 license.

Acknowledgments

We would like to thank PIPs and TAP-Vid for publicly releasing their code and data. We also want to thank Luke Melas-Kyriazi for proofreading the paper, Jianyuan Wang, Roman Shapovalov and Adam W. Harley for the insightful discussions.

Citing CoTracker

If you find our repository useful, please consider giving it a star ⭐ and citing our paper in your work:

@article{karaev2023cotracker,
  title={CoTracker: It is Better to Track Together},
  author={Nikita Karaev and Ignacio Rocco and Benjamin Graham and Natalia Neverova and Andrea Vedaldi and Christian Rupprecht},
  journal={arXiv:2307.07635},
  year={2023}
}

co-tracker's People

Contributors

bharat787, ernestchu, fcole, junkybyte, nikitakaraevv, patripfr

co-tracker's Issues

evaluation results on BADJA don't match the paper.

Hi,
When trying to evaluate the model on BADJA, I am getting different results than those reported in the paper.
The results are as follows (I added the avg. results at the end of the dictionary):

{
    "bear": 88.57142639160156,
    "bear_accuracy": 20.357141494750977,
    "camel": 90.35369873046875,
    "camel_accuracy": 22.186494827270508,
    "cows": 86.89839935302734,
    "cows_accuracy": 31.283422470092773,
    "dog": 54.59769821166992,
    "dog-agility": 6.896551609039307,
    "dog-agility_accuracy": 0.0,
    "dog_accuracy": 4.597701072692871,
    "horsejump-high": 62.25165557861328,
    "horsejump-high_accuracy": 17.218544006347656,
    "horsejump-low": 62.30366897583008,
    "horsejump-low_accuracy": 27.74869155883789,
    "avg": 64.55329983575004,
    "avg acc 3px": 17.627427918570383,
    "time": 576.1103093624115
}

I was able to evaluate the model on TAP-Vid DAVIS properly.
I ran the following code.

python ./cotracker/evaluation/evaluate.py \
--config-name eval_badja \
exp_dir=./eval_outputs_badja \
dataset_root=<path_to_BADJA_dir>

Could you please assist me in the matter?
In addition, I see that the "extra_videos" referred to in BADJA are not being evaluated and are explicitly ignored during dataset creation. Could you please explain why they are not being evaluated?

Thank you for your help!
Assaf

How to download FastCapture?

Thank you for your amazing work! But when trying to reproduce it on my own computer, I realized I can't find the download link for FastCapture. Could you please provide more details on how to download FastCapture?

Input shapes in forward_batch()

Hi,
Thank you for your great work.
I am a little bit confused about the indexing operations in the forward_batch() function in co-tracker/train.py.

I think the vis_g variable has the shape (B, T, N), according to the CoTrackerData class.

So, with this operation, you find first_positive_inds of shape (B, N):
__, first_positive_inds = torch.max(vis_g, dim=1)

Then, this one follows:

# inds of visible points in the 1st frame
nonzero_inds = [torch.nonzero(vis_g[0, :, i]) for i in range(N)]

Doesn't vis_g[0, :, i] correspond to the visibility in the first batch item, rather than the first frame of different batch items?

After that step, rand_vis_inds is computed and has the shape (1, N).
Isn't this a problem when concatenating [rand_vis_inds[:, :N_rand], first_positive_inds[:, N_rand:]], given the different dimension 0 (1 vs N)?
What am I missing about the shapes?

Thank you

question about v2

Hi, I see that you mention the model has been re-trained. I wonder if the old checkpoint is still compatible with the v2 implementation? Thanks!

How to prepare new dataset?

Great job, I have benefited a lot. I would like to train on videos I have collected; what should I do to train with my own video dataset? Thank you.

how to draw trajectory of predicted points

Hi, @nikitakaraevv, thanks for your excellent work! I would like to know how to draw the trajectories of tracked points as shown in the README. I've tried setting tracks_leave_trace=-1 in the Visualizer, but the result looks a little different from the one in the README. I would greatly appreciate it if you could share some commands or any other help!

Error when loading model using torch.hub

Every time I try to load the model via torch.hub following the instructions in the README, I get the following error:

torch/hub/facebookresearch_co-tracker_master/cotracker/models/core/cotracker/blocks.py", line 237, in <lambda>
    approx_gelu = lambda: nn.GELU(approximate="tanh")
TypeError: __init__() got an unexpected keyword argument 'approximate'

Specular reflective surfaces

Performance on the well-known ambiguous specular reflective surfaces is really not good, and the same holds for light-reflective surfaces. Often the reflected content is tracked instead, and the occlusion logic seems to be off.
Do you plan to improve this with synthetic data like PointOdyssey?

CoTracker fails to track points on small number of frames

Hi, thanks for the interesting architecture! I have found an interesting behaviour of the model: when I try to track a point over a small number of frames (3 frames), the model always predicts zero coordinates (0, 0). But when I track the same point on the same video over a larger number of frames (15, for example), it tracks really well.

Is it a bug, or am I using the model incorrectly?

My usage is quite standard:

input_video = torch.from_numpy(np.array(rgb_images)).permute(0, 3, 1, 2)[None].float()
input_video = input_video.to(self.device)
query = torch.tensor([[0, point_x, point_y]]).float()
query = query.to(self.device)
pred_tracks, pred_visibility = self.model(input_video, queries=query[None])
pred_tracks = pred_tracks.squeeze().cpu()[1:]

I guess CoTracker was trained on a dataset that does not contain sequences as short as 3 frames, so the model fails on this type of data. Am I right?

possible applications

This result is impressive. Are there any examples of possible downstream applications where this tech could be used?

Convert to ONNX

Hi,
I want to convert the model to ONNX format, but I get an error; can anyone help me solve the problem?
Note: I am using Colab for loading and converting the model.

Conversion code:

dummy_input = torch.randn(1, 48, 3, 719, 1282, device="cuda")

input_names = ["input"]
output_names = ["output_tracker", "output_visib"]

dynamic_axes_dict = {
    'input': {
        0: 'bs'
     },
    'output_tracker': {
        0: 'bs'
     },
     'output_visib': {
        0: 'bs'
     }
} 
 
torch.onnx.export(model,
                  dummy_input,
                  "cotracker.onnx",
                  verbose=False,
                  input_names=input_names,
                  output_names=output_names,
                  dynamic_axes=dynamic_axes_dict,
                  export_params=True,
                  )

Error:

WARNING:py.warnings:/content/co-tracker/cotracker/predictor.py:47: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if queries is None and grid_size == 0:

  0%|          | 0/1764 [00:00<?, ?it/s]WARNING:py.warnings:/content/co-tracker/cotracker/predictor.py:106: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert B == 1

WARNING:py.warnings:/content/co-tracker/cotracker/predictor.py:115: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert D == 3

WARNING:py.warnings:/content/co-tracker/cotracker/models/core/cotracker/cotracker.py:224: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert B == 1

WARNING:py.warnings:/content/co-tracker/cotracker/models/core/cotracker/cotracker.py:264: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  while ind < T - self.S // 2:

WARNING:py.warnings:/content/co-tracker/cotracker/predictor.py:144: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if backward_tracking:

100%|██████████| 1764/1764 [01:33<00:00, 18.82it/s]
WARNING:py.warnings:/usr/local/lib/python3.10/dist-packages/torch/onnx/utils.py:619: UserWarning: ONNX Preprocess - Removing mutation from node aten::fill_ on block input: 'grid_query_frame'. This changes graph semantics. (Triggered internally at ../torch/csrc/jit/passes/onnx/remove_inplace_ops_for_onnx.cpp:350.)
  _C._jit_pass_onnx_remove_inplace_ops_for_onnx(graph, module)

============= Diagnostic Run torch.onnx.export version 2.0.1+cu118 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
[<ipython-input-11-d51d0e1e59b2>](https://localhost:8080/#) in <cell line: 13>()
     11 } 
     12 
---> 13 torch.onnx.export(model,
     14                   dummy_input,
     15                   "cotracker.onnx",

16 frames
[/usr/local/lib/python3.10/dist-packages/torch/onnx/_internal/jit_utils.py](https://localhost:8080/#) in _add_attribute(node, key, value, aten)
    353             else:
    354                 kind = "i"
--> 355     return getattr(node, f"{kind}_")(name, value)
    356 
    357 

TypeError: z_(): incompatible function arguments. The following argument types are supported:
    1. (self: torch._C.Node, arg0: str, arg1: torch.Tensor) -> torch._C.Node

Invoked with: %367 : Tensor = onnx::Constant(), scope: cotracker.predictor.CoTrackerPredictor::
, 'value', 0 
(Occurred when translating repeat_interleave).

How to run the online tracker for every frame?

In the online demo, the tracker runs on every 4th step because of this if-clause in online_demo.py:

if i % model.step == 0 and i != 0:
       _process_step(...)

Is it possible to run the tracker on every frame?
I tried adjusting model.window_len, which ultimately determines model.step, but could not figure out a working solution.

Training with different batch size

I noticed that training with a --batch_size other than 1 does not work, among other things due to the assert B == 1 in cotracker/models/core/cotracker/cotracker.py.

Why is that so?
Can't I train with, say, batch_size 16? And how would I do that?

The result of tracking point from the middle of video is not precise

Hi, thanks for your great work. When I tried your notebook demo, there were some ambiguities when tracking manually selected points.

queries = torch.tensor([
    [0., 400., 350.],  # point tracked from the first frame
    [10., 600., 500.], # frame number 10
    [20., 750., 600.], # ...
    [30., 900., 200.]
])

Let's say we are interested in queries[1], which indexes a point queried at frame 10, so the model should output a trajectory of all (0, 0) and a visibility of False for timestamps 0 to 9. However, when inspecting pred_visibility, the expected behavior only shows up at the first four timestamps (the same problem also happens with pred_tracks).

pred_visibility[:, :, 1]

tensor([[False, False, False, False,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
         False, False, False,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True,  True,  True,  True,  True,  True,  True,  True,  True,
          True,  True]], device='cuda:0')

Why is that? Thanks!

Tracking from webcam

Hello,

Instead of using recorded videos, is it possible to track items from a camera feed in real time?

Thanks

moviepy saving video error

Dear all,

Thanks for sharing the awesome work!

I tried the notebook tutorial on a local desktop but got the following error on this line:

vis.visualize(video=video, tracks=pred_tracks, visibility=pred_visibility, filename='teaser')
Moviepy - Building video notebooks/videos/teaser_pred_track.mp4.
Moviepy - Writing video notebooks/videos/teaser_pred_track.mp4

Traceback (most recent call last):
  File "/home/jma/Documents/co-tracker/notebooks/tutorial.py", line 90, in <module>
    vis.visualize(video=video, tracks=pred_tracks, visibility=pred_visibility, filename='teaser')
  File "/home/jma/Documents/co-tracker/cotracker/utils/visualizer.py", line 104, in visualize
    self.save_video(res_video, filename=filename, writer=writer, step=step)
  File "/home/jma/Documents/co-tracker/cotracker/utils/visualizer.py", line 123, in save_video
    clip.write_videofile(save_path, codec="libx264", fps=self.fps, logger=None)
  File "/home/jma/anaconda3/envs/track/lib/python3.10/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/home/jma/anaconda3/envs/track/lib/python3.10/site-packages/moviepy/decorators.py", line 54, in requires_duration
    return f(clip, *a, **k)
  File "/home/jma/anaconda3/envs/track/lib/python3.10/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/home/jma/anaconda3/envs/track/lib/python3.10/site-packages/moviepy/decorators.py", line 135, in use_clip_fps_by_default
    return f(clip, *new_a, **new_kw)
  File "/home/jma/anaconda3/envs/track/lib/python3.10/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/home/jma/anaconda3/envs/track/lib/python3.10/site-packages/moviepy/decorators.py", line 22, in convert_masks_to_RGB
    return f(clip, *a, **k)
  File "/home/jma/anaconda3/envs/track/lib/python3.10/site-packages/moviepy/video/VideoClip.py", line 300, in write_videofile
    ffmpeg_write_video(self, filename, fps, codec,
  File "/home/jma/anaconda3/envs/track/lib/python3.10/site-packages/moviepy/video/io/ffmpeg_writer.py", line 213, in ffmpeg_write_video
    with FFMPEG_VideoWriter(filename, clip.size, fps, codec = codec,
  File "/home/jma/anaconda3/envs/track/lib/python3.10/site-packages/moviepy/video/io/ffmpeg_writer.py", line 88, in __init__
    '-r', '%.02f' % fps,
TypeError: must be real number, not NoneType

Any suggestions are highly appreciated.

Full video in memory

Is there a way to avoid loading the full video into memory?
Especially because currently the full video is loaded into memory at full resolution before you rescale it internally with
video = F.interpolate(video, tuple(self.interp_shape), mode="bilinear")

Use cotracker_stride_8_wind_16.pth And 6 frames get bad result

Model: cotracker_stride_8_wind_16.pth
I modified the read_video_from_path function in demo.py to append just 6 frames to the video.
Then I ran demo.py and printed pred_tracks, and got the result below:
tensor([[[[465., 180.]],

         [[  0.,   0.]],

         [[  0.,   0.]],

         [[  0.,   0.]],

         [[  0.,   0.]],

         [[  0.,   0.]]]], device='cuda:0')

It seems there is no track result.

But when I append 10 frames to the video, the result is OK:
video = read_video_from_path(args.video_path)
points = torch.tensor([[0.,465.,180.]])

if torch.cuda.is_available():
    points = points.cuda()
video = torch.from_numpy(video).permute(0, 3, 1, 2)[None].float()


model = CoTrackerPredictor(checkpoint=args.checkpoint)
model = model.to(DEFAULT_DEVICE)
video = video.to(DEFAULT_DEVICE)

pred_tracks, pred_visibility = model(
    video,
    queries=points[None]
)
print(pred_tracks)
print("computed")`

The training dataset prepareation

Hi, I am following your work to prepare the training data for MOVI-f.
Could you please give more details about the instructions for annotation generation?

I generally follow the create_kubric_eval_train_dataset function in this link and set train_size=(512,512) and tracks_to_sample=2000 in create_point_tracking_dataset.
Besides, I also modified 'movi_e/256x256' in this link in order to generate MOVI-f.

Is that right?

Inaccurate tracking in case of occlusions

@nikitakaraevv

I've noticed that when there's an obstruction in between frames, the model struggles to track points effectively. Is there a possibility of implementing a mechanism to temporarily remove tracking points when an obstruction is detected in the frame, and then resume tracking when the target object reappears in subsequent frames?

I've conducted several tests, and unfortunately, the results haven't been as promising as described in the documentation.

If there's a potential solution or workaround for this issue, I'd greatly appreciate any insights or guidance.

Thank you in advance for your assistance!
This is the video I obtained while testing it.
https://github.com/facebookresearch/co-tracker/assets/55429956/c93433fe-adbe-450f-899b-519237eade74

the problem of evaluation

Hi, thanks for your excellent work!
In your paper, I found that the results of PIPs in Table 2 are much higher (64.8% with DAVIS first) than my reproduced results (55.8%). I also noticed that another paper [1] gives re-implemented results for PIPs (around 55.1% in Table 1), which are quite close to mine. Besides, for PIPs, the "strided" result (59.4%) in TAPIR [2] is even worse than your results with the "first" version.
I am wondering what makes the difference. Did you re-train PIPs with more GPUs or make other improvements (e.g., improving the chaining algorithm in evaluation, or using a higher resolution for inference)?

[1] Context-TAP: Tracking Any Point Demands Spatial Context Features. Arxiv
[2] TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement. Arxiv

Can we init track point at middle frame?

It seems the track is only shown after the init frame. For example, if I specify [10, x, y], it will start tracking after frame 10. Can we init the point at frame 10 but track it across the whole sequence?

Running on CPU

Hello,

How can I run this code on a Mac M1 or CPU? It seems CUDA is hardcoded somewhere.

Thanks

GPU out of memory when trying to evaluate model on kinetics_first

Hi,
When trying to evaluate the model on TAP-Vid Kinetics with 'first' sampling, my GPU is reaching a memory limit and crashes.
The error occurs when trying to aggregate the TAP-Vid Kinetics pickle files during the instantiation of the TapVidDataset object.
I was able to evaluate the model on TAP-Vid DAVIS properly.
I am running the following code on an NVIDIA-A100 GPU (the GPU with the most memory that I have access to).

python ./cotracker/evaluation/evaluate.py \
--config-name eval_tapvid_kinetics_first \
exp_dir=./eval_outputs_kinetics_first \
dataset_root=<path_to_tapvid_kinetics_dir> \

Could you please assist me with this matter? Are you able to provide the code you used to evaluate the model on Kinetics? What GPU were you using?
In the TAP-Vid repo they provide create_kinetics_dataset, which returns an iterable that yields one video example at a time, but I couldn't adjust the code properly to use this iterable instead.

Thank you for your help!
Assaf

GPU out of memory

Thanks for sharing this amazing work!
On both Colab and my computer (Nvidia RTX 3080), when I try to run the segmentation part or the dense track, I get:
OutOfMemoryError: CUDA out of memory. Tried to allocate 2.81 GiB (GPU 0; 15.74 GiB total capacity; 13.82 GiB already allocated; 605.50 MiB free; 13.93 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Is there a way to optimize that?
Thanks for your help

Host model checkpoints on HF

Hi there! Omar from Hugging Face here 👋

Super nice project and demo 🔥 It would be nice to also host the models on HF (e.g. https://hf.co/facebook). There is a library (huggingface_hub) that allows programmatically downloading and caching the models. It also has the nice benefit that the model weights become more discoverable: people will find the checkpoint together with a model card.

How to get weights for CPU support? (M1 Mac support)

Hi,

Thanks for the great work! I was trying to implement the SAM-PT project with co-tracker and downloaded the co-tracker model weights. But I keep running into the error AssertionError: Torch not compiled with CUDA enabled on my M1 MacBook machine.

Is there a way to get the weights running with CPU-only support? The device seems to be hardcoded. (I did go through issue #9, but the suggestion there seems to apply to running the co-tracker demo locally, whereas here, for SAM-PT, I'm just trying to use the weights.)

Thanks for the help!

Inference time Multi GPUs for long video out of memory issue.

Hello, thanks a lot for sharing your work!

I used co-tracker on videos of up to 3 min to track a segmentation mask, and the GPU from Colab ran out of memory. I understand I can reduce the grid size to work around the memory issue, but I wanted to know if it's possible to run the inference on multiple GPUs in parallel?

I'm not sure it would work as I would cut the video into 3 segments of 1min each and give the 3 segments to 3 GPUs for example. But I guess the tracking algorithm needs the whole video to work?

I saw in the Colab notebook that you can track manually selected points. For my problem, can I apply co-tracker to the first minute of the video, save the tracked points of the last frame, and then start tracking those points from minute 1 to minute 2?

CoTracker application on live videos

Hi, is there any possibility of porting the current CoTracker capabilities to live video (a camera stream, for example)? According to the paper, the tracking is computed iteratively by splitting the input video into sub-windows; I'm wondering if an application to live video has been considered in this scheme.

Thanks for your work

Occluded points at t = `query_frame`

Hi, thanks for your wonderful work. However, there seems to be a fundamental issue in your method.

The problem is that you do not overwrite the estimated tracks $\hat{P}_0$ at the query_frame with the start location $P$, nor do you set $\hat{v}_0$ to 1 throughout the refinement.

This may result in, at the query_frame,

  • occlusions (e.g. at image borders and in complex fur texture on the bear) and
  • a drifted query position (in the bottom-left corner, a point even translates over one grid point)

The expected behavior is that if we query specific points at the query_frame, these points need to be visible, by definition, and should lie precisely at the coordinates we give. I doubt that directly overwriting at each refinement stage is the right way to fix it, but the problem needs to be addressed in order to make this work useful for downstream tasks (especially extremely low-level vision ones).

Sorry for jumping in with such a detailed question, but I am dealing with some downstream applications that require a very strict (pixel-level) problem definition of point tracking. Here's the full video result.

dense_pred_track.mp4

How to better use CoTracker in our projects (traffic tracking)

Hello, when I’m using a specific number of points to track in a video for autonomous driving inference, I’ve noticed that once the target object is occluded, the tracked points start following other objects, resulting in poor tracking performance. Are there any good methods to address this issue of poor performance after occlusion? Which approach, using points or a grid, yields better results in tracking when facing occlusion?

Regarding the Application of the generic features

Hello,

Thank you for your work.

I would like to ask if there are any plans to release a DINOv2 version of the model.

Additionally, have you tried using a larger DINOv2 backbone, such as ViT-B or ViT-L,
to see if it could help improve performance?

Solve camera poses?

Hi, and thank you for making this available!
Can this be used to calculate camera poses, e.g. offline camera tracking?

RuntimeError: Invalid buffer size

For relatively simple and short videos (in my case, for example w: 1944, h: 1944, frame number: 1680) I am getting errors like RuntimeError: Invalid buffer size: 70.96 GB. I presume this is because the whole video is loaded at once.
Is there an obvious solution to loading and processing larger video files?

I did stumble upon @whymatter's main...whymatter:co-tracker:streaming branch, which might be a solution? However, without diving into that code: when loading chunks, how are the transitions between frame buffers handled?

cannot backward tracking in online model with specific queries in middle frame

Hi,

I am using your online model for point tracking, and I want to process a video. I want to set specific queries in one of the middle frames and try to use backward tracking. Here is my code:

video = torch.tensor(frames).permute(0, 3, 1, 2)[None].float().to(device)
cotracker = torch.hub.load("facebookresearch/co-tracker", "cotracker2_online").to(device)
cotracker(video_chunk=video, is_first_step=True, queries=maxima_coords[None], grid_query_frame=grid_query_frame, backward_tracking=True)
vis = Visualizer(save_dir="./co-tracker/result", pad_value=120, linewidth=2)
for ind in range(0, video.shape[1] - cotracker.step, cotracker.step):
    pred_tracks, pred_visibility = cotracker(video_chunk=video[:, ind : ind + cotracker.step * 2], queries=maxima_coords[None], grid_query_frame=grid_query_frame, backward_tracking=True)

vis.visualize(video, pred_tracks, pred_visibility)

And here is the error I got:
---> 60 pred_tracks, pred_visibility = cotracker(video_chunk=video[:, ind : ind + cotracker.step * 2], queries=maxima_coords[None], grid_query_frame=grid_query_frame, backward_tracking=True)
61
62 vis.visualize(video, pred_tracks, pred_visibility)

~/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1129 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130 return forward_call(*input, **kwargs)
1131 # Do not call functions when jit is used
1132 full_backward_hooks, non_full_backward_hooks = [], []

~/anaconda3/lib/python3.9/site-packages/torch/autograd/grad_mode.py in decorate_context(*args, **kwargs)
25 def decorate_context(*args, **kwargs):
26 with self.clone():
---> 27 return func(*args, **kwargs)
28 return cast(F, decorate_context)
29

TypeError: forward() got an unexpected keyword argument 'backward_tracking'

Can it be used commercially?

The key points in human posture estimation may be occluded; we do technical analysis, and perhaps CoTracker can solve this problem.

Huggingface Spaces Demo

I have created a demo of this repository using Hugging Face Spaces. The demo can be accessed here with the source code. If someone is interested, they can try it out or deploy it on their own machine.

missing re-trace capability

Great job making this possible! One of the biggest functions I am missing is the possibility to re-trace, i.e. move back to the frame where it loses its trace and then adjust the positioning of the tracker.

I am trying this out on sports applications like movement analysis. I have modified the code to change the colors according to the magnitude of translation, which looks pretty cool; in addition, I am combining sensor data such as EMG or accelerometer readings to color the traces.
