
bopang1996 / tubetk


Official implementation of paper: TubeTK: Adopting Tubes to Track Multi-Object in a One-Step Training Model (CVPR 2020 oral)

License: MIT License

Python 95.82% Shell 0.22% C++ 1.15% Cuda 2.81%

tubetk's People

Contributors

bopang1996


tubetk's Issues

Something about demo

Hi, may I ask about the input to the demo: should it be an mp4 video or just jpg images?

When I run launch.py, the error is: "cannot import name 'nms_cuda' from 'nms'"

My English is not very good, so I am posting in Chinese.
Importing nms_cuda and nms_cpu in tube_nms.py fails with the error:
cannot import name 'nms_cuda' from 'nms' (/TubeTK/post_processing/nms/__init__.py)
I tried running the setup.py in that directory, but I still get this error. Could it be because nms_cuda.cpp is a .cpp file? How can this problem be solved? I want to reproduce the code from this paper and use it on my own dataset, but I am currently stuck on this tricky error. I hope the author can reply, thank you!
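For reference, a minimal sketch of how the extension could be built in place so that the compiled module lands next to the package (this assumes the setup.py in TubeTK/post_processing/nms/ defines the nms_cuda extension; it is only an illustration, not the repository's documented install procedure):

# Hypothetical sanity check, run from inside TubeTK/post_processing/nms/.
# 'build_ext --inplace' puts the compiled nms_cuda*.so next to __init__.py,
# so that 'from nms import nms_cuda' can find it afterwards.
import subprocess
import sys

subprocess.check_call([sys.executable, "setup.py", "build_ext", "--inplace"])

import nms_cuda  # should resolve once the extension has been built in place
print(nms_cuda.__file__)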

Paper's experimental details

I would like to draw your attention to the following points:

  1. The MOT17 dataset in the training code has a length of 632040.

  2. I am using this code to train on MOT17 on a cluster of 8 V100 GPUs with NVIDIA Apex, and I cannot fit a batch size of more than 2 on one GPU. Thus, effectively 16 data points are processed by the node per iteration.

  3. As implied by point 1, it will take 632040/16 iterations to finish one epoch, i.e. ~39,500 steps per epoch.

  4. With this node configuration, I am getting a training speed of about 4.44 seconds per step, so one epoch would take ~2 days to finish (see the sketch after this list).
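A quick back-of-the-envelope sketch of the arithmetic above (my own numbers, not figures from the authors):

# Points 1-4 as arithmetic.
dataset_len = 632040                         # MOT17 training samples reported by the code
batch_per_gpu = 2
num_gpus = 8
effective_batch = batch_per_gpu * num_gpus   # 16 data points per iteration

steps_per_epoch = dataset_len / effective_batch            # ~39,500 steps
seconds_per_step = 4.44                                    # measured on the node
epoch_hours = steps_per_epoch * seconds_per_step / 3600    # ~48.7 h, i.e. ~2 days

print(f"{steps_per_epoch:.0f} steps/epoch, ~{epoch_hours:.1f} h per epoch")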

My questions are:

a) The authors mention using a batch size of 32. How many GPUs or nodes were used to accommodate this batch size?

b) How long did it take to train the model on MOT17 and JTA?

c) Are my estimates close to what the authors experienced?

python ./pre_processing/get_tubes_MOT17.py

When I run python ./pre_processing/get_tubes_MOT17.py, I run into the following problem.

Traceback (most recent call last):
File "./pre_processing/get_tubes_MOT17.py", line 4, in
from network.utils import bbox_iou
ModuleNotFoundError: No module named 'network'

What is the problem?
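In case it helps: this error usually just means the TubeTK repo root is not on the import path when the script is launched, since 'network' is a top-level package in the repository. A minimal workaround sketch (the relative path is an assumption about the layout, not a fix shipped with the repo):

# Hypothetical workaround at the top of pre_processing/get_tubes_MOT17.py:
# make the TubeTK repo root importable before importing the 'network' package.
import os
import sys

REPO_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
if REPO_ROOT not in sys.path:
    sys.path.insert(0, REPO_ROOT)

from network.utils import bbox_iou  # should now resolve

Running the script from the repo root with PYTHONPATH=. set should have the same effect.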

Tube NMS Implementation with Two Thresholds?

Hello,

First of all, thanks for publishing your code! I am curious about your tube NMS, so I tried finding the implementation as described in the paper with the two thresholds (gamma_1 for the middle frame and gamma_2 for the outside frames) in the repository. In the configuration as well as the multiclass_nms function, I only see one threshold ("test_nms_iou_thre") though.

Could you tell me where I can find the implementation with the two thresholds, or is that not included in the current code?
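For discussion, here is a rough sketch of how I read the two-threshold tube NMS from the paper. The tube layout (start/middle/end boxes), the helper names, and the rule "suppress only if both IoUs exceed their thresholds" are my assumptions, not the repository's implementation:

import numpy as np

def box_iou(box, boxes):
    # IoU between one (x1, y1, x2, y2) box and an array of boxes.
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def tube_nms_two_thresholds(tubes, scores, gamma_1=0.5, gamma_2=0.5):
    # Greedy NMS over tubes; 'tubes' is assumed to have shape (N, 3, 4) with
    # boxes at the start, middle and end frames. A candidate is suppressed only
    # when BOTH its middle-frame IoU exceeds gamma_1 AND its outer-frame IoU
    # exceeds gamma_2 with respect to a kept tube.
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        if rest.size == 0:
            break
        mid_iou = box_iou(tubes[i, 1], tubes[rest, 1])
        edge_iou = np.minimum(box_iou(tubes[i, 0], tubes[rest, 0]),
                              box_iou(tubes[i, 2], tubes[rest, 2]))
        suppressed = (mid_iou > gamma_1) & (edge_iou > gamma_2)
        order = rest[~suppressed]
    return keep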

Thanks!

KeyError: None

Hello, I encountered the following error when running the code. After debugging, I found it is caused by "file_num" being None in "tube_iou_matching.py". I would like to ask how to solve it: is there a wrong configuration in the code, or is a configuration file missing? I hope this question can be answered, thank you!

10:28:28.442851 Frame: 423.0 Tubes: 10 Cur tracks:15 Arch tracks:110
0%| | 0/1 [00:17<?
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/root/miniconda3/envs/py38/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "demo.py", line 242, in match_video
matching(tubes, save_path=os.path.join(output_dir, video_name + '.txt'), verbose=True, arg=model_arg)
File "/root/TubeTK/post_processing/tube_iou_matching.py", line 434, in matching
final_processing(tracks, save_path, mid_only)
File "/root/TubeTK/post_processing/tube_iou_matching.py", line 336, in final_processing
filt_bbox(save_path)
File "/root/TubeTK/post_processing/tube_iou_matching.py", line 298, in filt_bbox
res, _ = track_complete(tracks.get_group(tid).values, params[file_num][2])
KeyError: None
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "demo.py", line 379, in
main(test_arg, model_arg)
File "demo.py", line 351, in main
evaluate(model, loader, test_arg, model_arg, output_dir=test_arg.output_dir)
File "demo.py", line 282, in evaluate
p.get()
File "/root/miniconda3/envs/py38/lib/python3.8/multiprocessing/pool.py", line 771, in get
raise self._value
KeyError: None
Traceback (most recent call last):
File "/root/TubeTK/launch.py", line 123, in
main()
File "/root/TubeTK/launch.py", line 118, in main
raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/root/miniconda3/envs/py38/bin/python', '-u', 'demo.py', '--local_rank=0', '--batch_size=3', '--config', 'configs/TubeTK_resnet_50_FPN_8frame_1stride.yaml', '--video_url', './TubeTK_video', '--output_dir', './vis_video']' returned non-zero exit status 1.

Building environment

Hello,
I've been trying to reproduce the code without much success.

Rank: 0 Start!
Building TubeTK Model
reading video...
100%|████████████████████████████████████████████████████████████████████████████▉| 2077/2080 [00:37<00:00, 57.74it/s]finish_reading
100%|█████████████████████████████████████████████████████████████████████████████| 2080/2080 [00:37<00:00, 54.89it/s]
==> Validation data : 2073
Loading Model
Finish Loading
0%| | 0/691 [00:00
Traceback (most recent call last):
File "launch.py", line 95, in <module>
main()
File "launch.py", line 91, in main
cmd=process.args)
subprocess.CalledProcessError: Command '['/home/jugonzal/anaconda3/envs/tubetk/bin/python', '-u', 'demo.py', '--local_rank=0', '--batch_size=3', '--config', 'configs/TubeTK_resnet_50_FPN_8frame_1stride.yaml', '--video_url', './videos/', '--output_dir', './vis_video']' died with <Signals.SIGSEGV: 11>.

I think it is probably due to the cross compilation of the libraries.
Could you be so kind as to share the build environment?
I'm using CUDA 9.2 and cudnn/7.1-cuda-9.2. I installed Apex because I do not have a GPU with more than 16 GB, and I correctly installed the requirements.txt dependencies, so I think I may be missing something.
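To make the environments easier to compare, a small report script (it only prints what the local installation reports; nothing here asserts which versions TubeTK actually requires):

# Print the local build environment to compare against the authors' setup.
import sys
import torch
import torchvision

print("python      :", sys.version.split()[0])
print("torch       :", torch.__version__)
print("torchvision :", torchvision.__version__)
print("CUDA (torch):", torch.version.cuda)
print("cuDNN       :", torch.backends.cudnn.version())
print("GPU         :", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "none")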

Thanks a lot,

Ukhu

About fine tuning pre-trained model

Hi
I tried resuming from the pre-trained model and training on MOT17 for only 20 steps.
The predictions on the MOT17 test video became totally wrong;
it looks like random boxes are predicted.

What I did is

python ./pre_processing/get_tubes_MOT17.py

modify tube_limit to 200

python launch.py --nproc_per 1 --training_script main.py --batch_size 1 --config ./configs/TubeTK_resnet_50_FPN_8frame_1stride.yaml --resume --apex

I save the model at step 0, 10, 20.

Did I do anything wrong?
Or is this result expected? How should I do it correctly?

Other classes?

Hello,

Does this algorithm work for other classes such as cars?

The lowest memory

What is the lowest GPU memory requirement when using NVIDIA Apex and reducing tube_limit?

RuntimeError when training

Hi,
I can run the demo successfully, but I got this error when I tried to train TubeTK on MOT17.

TubeTK$ python launch.py --nproc_per 1 --training_script main.py --batch_size 1 --config ./configs/TubeTK_resnet_50_FPN_8frame_1stride.yaml --apex
Rank: 0 Start!
Building TubeTK Model
MOT17 data
==> Training data : 632040
==> Validation data : 5870
Selected optimization level O1:  Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O1
cast_model_type        : None
patch_torch_functions  : True
keep_batchnorm_fp32    : None
master_weights         : None
loss_scale             : dynamic
Training
  0%| | 0/632040 [00
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
Loss_cls: 4.1068,	Lo
Traceback (most recent call last):
  File "main.py", line 270, in <module>
    main(train_arg, model_arg)
  File "main.py", line 230, in main
    train(model, optimizer, data_loader, sched, tensorboard_writer, max_acc=max_acc, step_start=step)
  File "main.py", line 113, in train
    losses = run_one_iter(model, optimizer, data, scheduler, False)
  File "main.py", line 88, in run_one_iter
    scaled_loss.backward()
  File "/home/roylu/anaconda3/envs/TubeTK_rily/lib/python3.6/site-packages/torch/tensor.py", line 195, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/roylu/anaconda3/envs/TubeTK_rily/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: grad.type() == bucket_view.type() INTERNAL ASSERT FAILED at /opt/conda/conda-bld/pytorch_1579027003190/work/torch/csrc/distributed/c10d/reducer.cpp:200, please report a bug to PyTorch. Expected torch.cuda.FloatTensor, got torch.cuda.HalfTensor
Traceback (most recent call last):
  File "launch.py", line 95, in <module>
    main()
  File "launch.py", line 91, in main
    cmd=process.args)
subprocess.CalledProcessError: Command '['/home/roylu/anaconda3/envs/TubeTK_rily/bin/python', '-u', 'main.py', '--local_rank=0', '--batch_size', '1', '--config', './configs/TubeTK_resnet_50_FPN_8frame_1stride.yaml', '--apex']' returned non-zero exit status 1.

The error seems related to Apex? But I don't have a GPU with more than 16 GB, so I cannot run it without Apex.
Also, what versions of Python, torchvision, and CUDA did you use?
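For reference, the initialization order that the Apex documentation recommends for mixed precision with distributed training. The model and optimizer below are placeholders, not the actual objects built in main.py, so this is only a sketch of where I would look, not a confirmed fix:

import torch
import torch.nn as nn
from apex import amp
from apex.parallel import DistributedDataParallel as ApexDDP

# Placeholder model standing in for the TubeTK network; assumes the process
# group has already been initialized (as launch.py does).
model = nn.Conv3d(3, 8, kernel_size=3).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# 1. Let amp patch the model and optimizer first...
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
# 2. ...then wrap with Apex's DistributedDataParallel (rather than
#    torch.nn.parallel.DistributedDataParallel), which tolerates the mixed
#    fp16/fp32 gradients produced under O1.
model = ApexDDP(model, delay_allreduce=True)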
Thank you!

demo

When I use the demo, it works at first, but at the end of the video there is an issue.

04:20:59.660330 Frame: 150.0 Tubes: 90 Cur tracks:76 Arch tracks:164
04:20:59.777748 Frame: 151.0 Tubes: 131 Cur tracks:90 Arch tracks:164
04:20:59.927364 Frame: 152.0 Tubes: 147 Cur tracks:97 Arch tracks:164
04:21:00.061989 Frame: 153.0 Tubes: 128 Cur tracks:103 Arch tracks:164
04:21:00.183487 Frame: 154.0 Tubes: 112 Cur tracks:105 Arch tracks:164
04:21:00.294012 Frame: 155.0 Tubes: 68 Cur tracks:107 Arch tracks:164
04:21:00.373073 Frame: 156.0 Tubes: 68 Cur tracks:107 Arch tracks:164
0%| | 0/1 [00:32<?
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/lsw/anaconda3/envs/TubeTK/lib/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "demo.py", line 239, in match_video
matching(tubes, save_path=os.path.join(output_dir, video_name + '.txt'), verbose=True, arg=model_arg)
File "/home/lsw/TubeTK/post_processing/tube_iou_matching.py", line 434, in matching
final_processing(tracks, save_path, mid_only)
File "/home/lsw/TubeTK/post_processing/tube_iou_matching.py", line 336, in final_processing
filt_bbox(save_path)
File "/home/lsw/TubeTK/post_processing/tube_iou_matching.py", line 298, in filt_bbox
res, _ = track_complete(tracks.get_group(tid).values, params[file_num][2])
KeyError: None
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "demo.py", line 370, in
main(test_arg, model_arg)
File "demo.py", line 347, in main
evaluate(model, loader, test_arg, model_arg, output_dir=test_arg.output_dir)
File "demo.py", line 279, in evaluate
p.get()
File "/home/lsw/anaconda3/envs/TubeTK/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
KeyError: None
Traceback (most recent call last):
File "launch.py", line 95, in
main()
File "launch.py", line 91, in main
cmd=process.args)
subprocess.CalledProcessError: Command '['/home/lsw/anaconda3/envs/TubeTK/bin/python', '-u', 'demo.py', '--local_rank=0', '--batch_size=3', '--config', 'configs/TubeTK_resnet_50_FPN_8frame_1stride.yaml', '--video_url', './video', '--output_dir', './vis_video']' returned non-zero exit status 1.

evaluation on mot16

It seems there is no script for evaluation on MOT16, not even in the dataloader part.

How can I do evaluation on MOT16? Thanks.

Error when changing "forward_frames" parameter from 4 to 2

Hi,

I just tried changing the forward_frames parameter from 4 to 2. This should, if I understood it correctly, change the maximum number of frames in a tube from 8 to 4. I then ran the preprocessing with this parameter change to get the ground truth Btubes. When I run the training with this parameter change, I get the following error:

Rank: 0 Start!
Building TubeTK Model
MOT17 data
==> Training data : 635400
==> Validation data : 5898
Training

  0%| | 0/635400 [00
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=383 error=11 : invalid argument

  0%| | 0/635400 [00
Traceback (most recent call last):
  File "main.py", line 270, in <module>
    main(train_arg, model_arg)
  File "main.py", line 230, in main
    train(model, optimizer, data_loader, sched, tensorboard_writer, max_acc=max_acc, step_start=step)
  File "main.py", line 113, in train
    losses = run_one_iter(model, optimizer, data, scheduler, False)
  File "main.py", line 66, in run_one_iter
    losses = model(imgs, img_metas, return_loss=True, gt_tubes=gt_tubes, gt_labels=gt_labels)
  File "ANACONDA_ENV_PATH/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "ANACONDA_ENV_PATH/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 376, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "ANACONDA_ENV_PATH/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "USER_PATH/TubeTK/network/tubetk.py", line 86, in forward
    return self.forward_train(img, img_meta, **kwargs)
  File "USER_PATH/TubeTK/network/tubetk.py", line 71, in forward_train
    x = self.extract_feat(img)
  File "USER_PATH/TubeTK/network/tubetk.py", line 63, in extract_feat
    x = self.neck(x)
  File "ANACONDA_ENV_PATH/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "USER_PATH/TubeTK/network/fpn.py", line 56, in forward
    laterals[i], scale_factor=2, mode='nearest')
RuntimeError: output with shape [1, 256, 1, 56, 72] doesn't match the broadcast shape [1, 256, 2, 56, 72]
Traceback (most recent call last):
  File "launch.py", line 95, in <module>
    main()
  File "launch.py", line 91, in main
    cmd=process.args)
subprocess.CalledProcessError: Command '['ANACONDA_ENV_PATH/bin/python', '-u', 'main.py', '--local_rank=0', '--batch_size', '1', '--config', './configs/TubeTK_resnet_50_FPN_4frame_1stride.yaml', '--logName', './logs/example_4frame_1stride', '--model_name', 'TubeTK_example_4frame_1stride']' returned non-zero exit status 1.

The line numbers might have changed very slightly since the original implementation because I added print statements in some parts. I did not change anything else in the code though.

How do I get the code to work with a different forward_frames value? Do I have to change other hyperparameters (e.g. model_stride)?
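For what it's worth, a minimal illustration of what seems to happen, based on my reading of the traceback; the feature shapes are made up for the example and the upsampling call just mirrors the one shown in the error:

import torch
import torch.nn.functional as F

# With forward_frames=2 the deepest FPN level already has a temporal size of 1,
# and nearest upsampling with scale_factor=2 on a 5-D tensor scales T, H and W,
# so the upsampled map gets a temporal size of 2 that no longer matches the
# lateral feature it is added to.
lateral = torch.randn(1, 256, 1, 56, 72)   # shallower level, temporal size 1
deeper = torch.randn(1, 256, 1, 28, 36)    # deepest level, temporal size 1

up = F.interpolate(deeper, scale_factor=2, mode='nearest')
print(up.shape)  # torch.Size([1, 256, 2, 56, 72])
try:
    lateral += up
except RuntimeError as e:
    print(e)  # same "doesn't match the broadcast shape" error as above

That is also why I am asking about model_stride: it looks like the temporal strides would need to shrink along with forward_frames so that the deepest level keeps a temporal size greater than 1.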

Thanks!

evaluating model after training

When I try to evaluate the original TubeTK model, it works and I get results. However, when I try to evaluate the same model after I have trained it (whether for 1 epoch or more), I get the error shown in the following picture.
I run the evaluation using the command stated on your repo:
python launch.py --nproc_per 1 --training_script evaluate.py --batch_size 3 --config configs/TubeTK_resnet_50_FPN_8frame_1stride.yaml --trainOrTest train

(screenshot: error when evaluating trained model_2)

Any idea/solution, please?
