
bopang1996 / tubetk


Official implementation of paper: TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training Model (CVPR 2020 oral)

License: MIT License

Python 95.82% Shell 0.22% C++ 1.15% Cuda 2.81%

tubetk's People

Contributors

bopang1996


tubetk's Issues

Something about demo

Hi, may I ask about the input to the demo: should it be an mp4 video or just jpg images?

When I run launch.py, the error is: "cannot import name 'nms_cuda' from 'nms'"

My English is not very good, so I am posting in Chinese.
Importing nms_cuda and nms_cpu in tube_nms.py fails with the error:
cannot import name 'nms_cuda' from 'nms' (/TubeTK/post_processing/nms/__init__.py)
I tried running the setup.py in that directory, but I still get this error. Could it be because nms_cuda.cpp is a .cpp file? How can this problem be solved? I want to reproduce the code from this paper and use it on my own dataset, but I am currently stuck on this tricky error. I hope the author can reply, thank you!
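For reference, a minimal sketch of how the extension could be built in place so that the compiled module lands next to the package (this assumes the setup.py in TubeTK/post_processing/nms/ defines the nms_cuda extension; it is only an illustration, not the repository's documented install procedure):

# Hypothetical sanity check, run from inside TubeTK/post_processing/nms/.
# 'build_ext --inplace' puts the compiled nms_cuda*.so next to __init__.py,
# so that 'from nms import nms_cuda' can find it afterwards.
import subprocess
import sys

subprocess.check_call([sys.executable, "setup.py", "build_ext", "--inplace"])

import nms_cuda  # should resolve once the extension has been built in place
print(nms_cuda.__file__)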

Paper's experimental details

I would like to draw your attention to the following points:

  1. The MOT17 dataset in the training code has a length of 632040.

  2. I am using this code to train on MOT17 on a cluster of 8 V100 GPUs with NVIDIA Apex, and I cannot fit a batch size of more than 2 on one GPU. Thus, effectively 16 data points are processed by the node per iteration.

  3. As implied by point 1, it will take 632040/16 iterations to finish one epoch, i.e. ~39,500 steps per epoch.

  4. With this node configuration, I am getting a training speed of about 4.44 seconds per step, so one epoch would take ~2 days to finish (see the sketch after this list).
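A quick back-of-the-envelope sketch of the arithmetic above (my own numbers, not figures from the authors):

# Points 1-4 as arithmetic.
dataset_len = 632040                         # MOT17 training samples reported by the code
batch_per_gpu = 2
num_gpus = 8
effective_batch = batch_per_gpu * num_gpus   # 16 data points per iteration

steps_per_epoch = dataset_len / effective_batch            # ~39,500 steps
seconds_per_step = 4.44                                    # measured on the node
epoch_hours = steps_per_epoch * seconds_per_step / 3600    # ~48.7 h, i.e. ~2 days

print(f"{steps_per_epoch:.0f} steps/epoch, ~{epoch_hours:.1f} h per epoch")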

My questions are:

a) The authors mention using a batch size of 32. How many GPUs or nodes were used to accommodate this batch size?

b) How long did it take to train the model on MOT17 and JTA?

c) Are my estimates close to what the authors experienced?

python ./pre_processing/get_tubes_MOT17.py

When I run python ./pre_processing/get_tubes_MOT17.py, I run into the following problem.

Traceback (most recent call last):
File "./pre_processing/get_tubes_MOT17.py", line 4, in
from network.utils import bbox_iou
ModuleNotFoundError: No module named 'network'

What is the problem?
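In case it helps: this error usually just means the TubeTK repo root is not on the import path when the script is launched, since 'network' is a top-level package in the repository. A minimal workaround sketch (the relative path is an assumption about the layout, not a fix shipped with the repo):

# Hypothetical workaround at the top of pre_processing/get_tubes_MOT17.py:
# make the TubeTK repo root importable before importing the 'network' package.
import os
import sys

REPO_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
if REPO_ROOT not in sys.path:
    sys.path.insert(0, REPO_ROOT)

from network.utils import bbox_iou  # should now resolve

Running the script from the repo root with PYTHONPATH=. set should have the same effect.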

Tube NMS Implementation with Two Thresholds?

Hello,

First of all, thanks for publishing your code! I am curious about your tube NMS, so I tried finding the implementation as described in the paper with the two thresholds (gamma_1 for the middle frame and gamma_2 for the outside frames) in the repository. In the configuration as well as the multiclass_nms function, I only see one threshold ("test_nms_iou_thre") though.

Could you tell me where I can find the implementation with the two thresholds, or is that not included in the current code?
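For discussion, here is a rough sketch of how I read the two-threshold tube NMS from the paper. The tube layout (start/middle/end boxes), the helper names, and the rule "suppress only if both IoUs exceed their thresholds" are my assumptions, not the repository's implementation:

import numpy as np

def box_iou(box, boxes):
    # IoU between one (x1, y1, x2, y2) box and an array of boxes.
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def tube_nms_two_thresholds(tubes, scores, gamma_1=0.5, gamma_2=0.5):
    # Greedy NMS over tubes; 'tubes' is assumed to have shape (N, 3, 4) with
    # boxes at the start, middle and end frames. A candidate is suppressed only
    # when BOTH its middle-frame IoU exceeds gamma_1 AND its outer-frame IoU
    # exceeds gamma_2 with respect to a kept tube.
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        if rest.size == 0:
            break
        mid_iou = box_iou(tubes[i, 1], tubes[rest, 1])
        edge_iou = np.minimum(box_iou(tubes[i, 0], tubes[rest, 0]),
                              box_iou(tubes[i, 2], tubes[rest, 2]))
        suppressed = (mid_iou > gamma_1) & (edge_iou > gamma_2)
        order = rest[~suppressed]
    return keep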

Thanks!

KeyError: None

Hello, I encountered the following error when running the code. After debugging, I found it is caused by "file_num" being None in "tube_iou_matching.py". I would like to ask how to solve it: is there a wrong configuration in the code, or is a configuration file missing? I hope this question can be answered, thank you!

10:28:28.442851 Frame: 423.0 Tubes: 10 Cur tracks:15 Arch tracks:110
0%| | 0/1 [00:17<?
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/root/miniconda3/envs/py38/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "demo.py", line 242, in match_video
matching(tubes, save_path=os.path.join(output_dir, video_name + '.txt'), verbose=True, arg=model_arg)
File "/root/TubeTK/post_processing/tube_iou_matching.py", line 434, in matching
final_processing(tracks, save_path, mid_only)
File "/root/TubeTK/post_processing/tube_iou_matching.py", line 336, in final_processing
filt_bbox(save_path)
File "/root/TubeTK/post_processing/tube_iou_matching.py", line 298, in filt_bbox
res, _ = track_complete(tracks.get_group(tid).values, params[file_num][2])
KeyError: None
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "demo.py", line 379, in
main(test_arg, model_arg)
File "demo.py", line 351, in main
evaluate(model, loader, test_arg, model_arg, output_dir=test_arg.output_dir)
File "demo.py", line 282, in evaluate
p.get()
File "/root/miniconda3/envs/py38/lib/python3.8/multiprocessing/pool.py", line 771, in get
raise self._value
KeyError: None
Traceback (most recent call last):
File "/root/TubeTK/launch.py", line 123, in
main()
File "/root/TubeTK/launch.py", line 118, in main
raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/root/miniconda3/envs/py38/bin/python', '-u', 'demo.py', '--local_rank=0', '--batch_size=3', '--config', 'configs/TubeTK_resnet_50_FPN_8frame_1stride.yaml', '--video_url', './TubeTK_video', '--output_dir', './vis_video']' returned non-zero exit status 1.

Building environment

Hello,
I've been trying to reproduce the code without much success.

Rank: 0 Start!
Building TubeTK Model
reading video...
100%|████████████████████████████████████████████████████████████████████████████▉| 2077/2080 [00:37<00:00, 57.74it/s]finish_reading
100%|█████████████████████████████████████████████████████████████████████████████| 2080/2080 [00:37<00:00, 54.89it/s]
==> Validation data : 2073
Loading Model
Finish Loading
0%| | 0/691 [00:00
Traceback (most recent call last):
File "launch.py", line 95, in <module>
main()
File "launch.py", line 91, in main
cmd=process.args)
subprocess.CalledProcessError: Command '['/home/jugonzal/anaconda3/envs/tubetk/bin/python', '-u', 'demo.py', '--local_rank=0', '--batch_size=3', '--config', 'configs/TubeTK_resnet_50_FPN_8frame_1stride.yaml', '--video_url', './videos/', '--output_dir', './vis_video']' died with <Signals.SIGSEGV: 11>.

I think it is probably due to the cross compilation of the libraries.
Could you be so kind as to share the build environment?
I'm using CUDA 9.2 and cudnn/7.1-cuda-9.2. I installed Apex because I do not have a GPU with more than 16 GB, and I correctly installed the requirements.txt dependencies, so I think I may be missing something.
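To make the environments easier to compare, a small report script (it only prints what the local installation reports; nothing here asserts which versions TubeTK actually requires):

# Print the local build environment to compare against the authors' setup.
import sys
import torch
import torchvision

print("python      :", sys.version.split()[0])
print("torch       :", torch.__version__)
print("torchvision :", torchvision.__version__)
print("CUDA (torch):", torch.version.cuda)
print("cuDNN       :", torch.backends.cudnn.version())
print("GPU         :", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "none")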

Thanks a lot,

Ukhu

About fine tuning pre-trained model

Hi
I tried resuming from the pre-trained model and training on MOT17 for only 20 steps.
The predictions on the MOT17 test video became totally wrong;
it looks like random boxes are predicted.

What I did is

python ./pre_processing/get_tubes_MOT17.py

modify tube_limit to 200

python launch.py --nproc_per 1 --training_script main.py --batch_size 1 --config ./configs/TubeTK_resnet_50_FPN_8frame_1stride.yaml --resume --apex

I save the model at step 0, 10, 20.

Did I do anything wrong?
Or is this result expected? How should I do it correctly?

Other classes?

Hello,

Does this algorithm work for other classes such as cars?

The lowest memory

What is the lowest GPU memory requirement when using NVIDIA Apex and reducing tube_limit?

RuntimeError when training

Hi,
I can run the demo successfully, but I got this error when I tried to train TubeTK on MOT17.

TubeTK$ python launch.py --nproc_per 1 --training_script main.py --batch_size 1 --config ./configs/TubeTK_resnet_50_FPN_8frame_1stride.yaml --apex
Rank: 0 Start!
Building TubeTK Model
MOT17 data
==> Training data : 632040
==> Validation data : 5870
Selected optimization level O1:  Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Training
  0%| | 0/632040 [00
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
Loss_cls: 4.1068,	Lo
Traceback (most recent call last):
  File "main.py", line 270, in <module>
    main(train_arg, model_arg)
  File "main.py", line 230, in main
    train(model, optimizer, data_loader, sched, tensorboard_writer, max_acc=max_acc, step_start=step)
  File "main.py", line 113, in train
    losses = run_one_iter(model, optimizer, data, scheduler, False)
  File "main.py", line 88, in run_one_iter
    scaled_loss.backward()
  File "/home/roylu/anaconda3/envs/TubeTK_rily/lib/python3.6/site-packages/torch/tensor.py", line 195, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/roylu/anaconda3/envs/TubeTK_rily/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: grad.type() == bucket_view.type() INTERNAL ASSERT FAILED at /opt/conda/conda-bld/pytorch_1579027003190/work/torch/csrc/distributed/c10d/reducer.cpp:200, please report a bug to PyTorch. Expected torch.cuda.FloatTensor, got torch.cuda.HalfTensor
Traceback (most recent call last):
  File "launch.py", line 95, in <module>
    main()
  File "launch.py", line 91, in main
    cmd=process.args)
subprocess.CalledProcessError: Command '['/home/roylu/anaconda3/envs/TubeTK_rily/bin/python', '-u', 'main.py', '--local_rank=0', '--batch_size', '1', '--config', './configs/TubeTK_resnet_50_FPN_8frame_1stride.yaml', '--apex']' returned non-zero exit status 1.

The error seems related to Apex? But I don't have a GPU with more than 16 GB, so I cannot run it without Apex.
Also, what versions of Python, torchvision, and CUDA did you use?
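For reference, the initialization order that the Apex documentation recommends for mixed precision with distributed training. The model and optimizer below are placeholders, not the actual objects built in main.py, so this is only a sketch of where I would look, not a confirmed fix:

import torch
import torch.nn as nn
from apex import amp
from apex.parallel import DistributedDataParallel as ApexDDP

# Placeholder model standing in for the TubeTK network; assumes the process
# group has already been initialized (as launch.py does).
model = nn.Conv3d(3, 8, kernel_size=3).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# 1. Let amp patch the model and optimizer first...
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
# 2. ...then wrap with Apex's DistributedDataParallel (rather than
#    torch.nn.parallel.DistributedDataParallel), which tolerates the mixed
#    fp16/fp32 gradients produced under O1.
model = ApexDDP(model, delay_allreduce=True)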
Thank you!

demo

When I use the demo, it works at first, but at the end of the video there is an issue.

04:20:59.660330 Frame: 150.0 Tubes: 90 Cur tracks:76 Arch tracks:164
04:20:59.777748 Frame: 151.0 Tubes: 131 Cur tracks:90 Arch tracks:164
04:20:59.927364 Frame: 152.0 Tubes: 147 Cur tracks:97 Arch tracks:164
04:21:00.061989 Frame: 153.0 Tubes: 128 Cur tracks:103 Arch tracks:164
04:21:00.183487 Frame: 154.0 Tubes: 112 Cur tracks:105 Arch tracks:164
04:21:00.294012 Frame: 155.0 Tubes: 68 Cur tracks:107 Arch tracks:164
04:21:00.373073 Frame: 156.0 Tubes: 68 Cur tracks:107 Arch tracks:164
0%| | 0/1 [00:32<?
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/lsw/anaconda3/envs/TubeTK/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "demo.py", line 239, in match_video
matching(tubes, save_path=os.path.join(output_dir, video_name + '.txt'), verbose=True, arg=model_arg)
File "/home/lsw/TubeTK/post_processing/tube_iou_matching.py", line 434, in matching
final_processing(tracks, save_path, mid_only)
File "/home/lsw/TubeTK/post_processing/tube_iou_matching.py", line 336, in final_processing
filt_bbox(save_path)
File "/home/lsw/TubeTK/post_processing/tube_iou_matching.py", line 298, in filt_bbox
res, _ = track_complete(tracks.get_group(tid).values, params[file_num][2])
KeyError: None
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "demo.py", line 370, in
main(test_arg, model_arg)
File "demo.py", line 347, in main
evaluate(model, loader, test_arg, model_arg, output_dir=test_arg.output_dir)
File "demo.py", line 279, in evaluate
p.get()
File "/home/lsw/anaconda3/envs/TubeTK/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
KeyError: None
Traceback (most recent call last):
File "launch.py", line 95, in
main()
File "launch.py", line 91, in main
cmd=process.args)
subprocess.CalledProcessError: Command '['/home/lsw/anaconda3/envs/TubeTK/bin/python', '-u', 'demo.py', '--local_rank=0', '--batch_size=3', '--config', 'configs/TubeTK_resnet_50_FPN_8frame_1stride.yaml', '--video_url', './video', '--output_dir', './vis_video']' returned non-zero exit status 1.

evaluation on mot16

It seems there is no script for evaluation on MOT16, not even in the dataloader part.

How can I do evaluation on MOT16? Thanks.

Error when changing "forward_frames" parameter from 4 to 2

Hi,

I just tried changing the forward_frames parameter from 4 to 2. This should, if I understood it correctly, change the maximum number of frames in a tube from 8 to 4. I then ran the preprocessing with this parameter change to get the ground truth Btubes. When I run the training with this parameter change, I get the following error:

Rank: 0 Start!
Building TubeTK Model
MOT17 data
==> Training data : 635400
==> Validation data : 5898
Training

  0%| | 0/635400 [00
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=383 error=11 : invalid argument

  0%| | 0/635400 [00
Traceback (most recent call last):
  File "main.py", line 270, in <module>
    main(train_arg, model_arg)
  File "main.py", line 230, in main
    train(model, optimizer, data_loader, sched, tensorboard_writer, max_acc=max_acc, step_start=step)
  File "main.py", line 113, in train
    losses = run_one_iter(model, optimizer, data, scheduler, False)
  File "main.py", line 66, in run_one_iter
    losses = model(imgs, img_metas, return_loss=True, gt_tubes=gt_tubes, gt_labels=gt_labels)
  File "ANACONDA_ENV_PATH/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "ANACONDA_ENV_PATH/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 376, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "ANACONDA_ENV_PATH/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "USER_PATH/TubeTK/network/tubetk.py", line 86, in forward
    return self.forward_train(img, img_meta, **kwargs)
  File "USER_PATH/TubeTK/network/tubetk.py", line 71, in forward_train
    x = self.extract_feat(img)
  File "USER_PATH/TubeTK/network/tubetk.py", line 63, in extract_feat
    x = self.neck(x)
  File "ANACONDA_ENV_PATH/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "USER_PATH/TubeTK/network/fpn.py", line 56, in forward
    laterals[i], scale_factor=2, mode='nearest')
RuntimeError: output with shape [1, 256, 1, 56, 72] doesn't match the broadcast shape [1, 256, 2, 56, 72]
Traceback (most recent call last):
  File "launch.py", line 95, in <module>
    main()
  File "launch.py", line 91, in main
    cmd=process.args)
subprocess.CalledProcessError: Command '['ANACONDA_ENV_PATH/bin/python', '-u', 'main.py', '--local_rank=0', '--batch_size', '1', '--config', './configs/TubeTK_resnet_50_FPN_4frame_1stride.yaml', '--logName', './logs/example_4frame_1stride', '--model_name', 'TubeTK_example_4frame_1stride']' returned non-zero exit status 1.

The line numbers might have changed very slightly since the original implementation because I added print statements in some parts. I did not change anything else in the code though.

How do I get the code to work with a different forward_frames value? Do I have to change other hyperparameters (e.g. model_stride)?
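For what it's worth, a minimal illustration of what seems to happen, based on my reading of the traceback; the feature shapes are made up for the example and the upsampling call just mirrors the one shown in the error:

import torch
import torch.nn.functional as F

# With forward_frames=2 the deepest FPN level already has a temporal size of 1,
# and nearest upsampling with scale_factor=2 on a 5-D tensor scales T, H and W,
# so the upsampled map gets a temporal size of 2 that no longer matches the
# lateral feature it is added to.
lateral = torch.randn(1, 256, 1, 56, 72)   # shallower level, temporal size 1
deeper = torch.randn(1, 256, 1, 28, 36)    # deepest level, temporal size 1

up = F.interpolate(deeper, scale_factor=2, mode='nearest')
print(up.shape)  # torch.Size([1, 256, 2, 56, 72])
try:
    lateral += up
except RuntimeError as e:
    print(e)  # same "doesn't match the broadcast shape" error as above

That is also why I am asking about model_stride: it looks like the temporal strides would need to shrink along with forward_frames so that the deepest level keeps a temporal size greater than 1.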

Thanks!

evaluating model after training

When I try to evaluate the original TubeTK model, it works and I get results. However, when I try to evaluate the same model after I have trained it (whether for 1 epoch or more), I get the error shown in the following picture.
I run the evaluation using the command stated on your repo:
python launch.py --nproc_per 1 --training_script evaluate.py --batch_size 3 --config configs/TubeTK_resnet_50_FPN_8frame_1stride.yaml --trainOrTest train

(screenshot: error when evaluating trained model_2)

Any idea/solution, please?
