captaineven / mcmot

Real time one-stage multi-class & multi-object tracking based on anchor-free detection and ReID

License: MIT License

Shell 0.32% Python 99.68%
tracking detection multi-object multi-class one-shot anchor-free real-time

mcmot's Introduction

MCMOT: One-shot multi-class multi-object tracking

Single-stage real-time multi-class multi-object tracking.
This is an extension of FairMOT that expands one-class multi-object tracking to multi-class multi-object tracking.
You can refer to the original fork, FairMOT.

Tracking demo of C5 (car, bicycle, person, cyclist, tricycle)

(demo frames)

Tracking demo of the VisDrone dataset

(demo frames)

VisDrone dataset training with 10 object classes

VisDrone link
VisDrone is a public dataset covering four CV challenges: object detection, crowd counting, single-class multi-object tracking, and multi-class multi-object tracking.

  • Download the multi-class multi-object tracking part of the VisDrone dataset.
  • Use the gen_dataset_visdrone.py script to generate labels.
  • Call the gen_dot_train_file function in gen_dataset_visdrone.py to generate the dot train file for the VisDrone MCMOT training task.
  • Uncomment cls2id and id2cls in multitracker.py to use the correct class name and class ID mapping:
from gen_dataset_visdrone import cls2id, id2cls  # visdrone
# from gen_labels_detrac_mcmot import cls2id, id2cls  # mcmot_c5
  • Set the class IDs for VisDrone training in opts.py, i.e. (a sketch of the resulting mapping follows the snippet below):
Object classes 1~10 are what we need:
        non-interest-zone (0)
        pedestrian        (1)  --> 0       
        people            (2)  --> 1       
        bicycle           (3)  --> 2       
        car               (4)  --> 3       
        van               (5)  --> 4       
        truck             (6)  --> 5        
        tricycle          (7)  --> 6        
        awning-tricycle   (8)  --> 7        
        bus               (9)  --> 8        
        motor             (10) --> 9        
        others            (11)
        self.parser.add_argument('--reid_cls_ids',
                                 default='0,1,2,3,4,5,6,7,8,9',  # '0,1,2,3,4' or '0,1,2,3,4,5,6,7,8,9'
                                 help='')  # the object classes that need ReID
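
For reference, a minimal sketch of what the cls2id / id2cls dictionaries look like after the remapping above (the exact dictionaries live in gen_dataset_visdrone.py; the names below simply follow the table):

    # Sketch only: VisDrone ids 1~10 remapped to 0~9; 0 (non-interest-zone)
    # and 11 (others) are dropped.
    id2cls = {
        0: 'pedestrian', 1: 'people', 2: 'bicycle', 3: 'car', 4: 'van',
        5: 'truck', 6: 'tricycle', 7: 'awning-tricycle', 8: 'bus', 9: 'motor',
    }
    cls2id = {name: cls_id for cls_id, name in id2cls.items()}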

Tracking or detection mode setting

Set id_weight to 1 for tracking and 0 for detection mode.

        self.parser.add_argument('--id_weight',
                                 type=float,
                                 default=1,  # 0 for detection only and 1 for detection and ReID
                                 help='loss weight for id')  # ReID feature extraction or not
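
For example, the flag can also be overridden on the command line instead of editing opts.py (an illustrative invocation; the video path is a placeholder and the checkpoint is one of the released models):

    # detection-only run of the demo (sketch)
    python demo.py --arch resdcn_18 --load_model ../models/mcmot_last_track_resdcn_18.pth --input-video ../videos/test.mp4 --id_weight 0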

Pretrained models for C5 and VisDrone detection and tracking

HRNet18 backbone with bilinear upsampling replaced by de-convolution.
The pre-trained model is for 5-class (C5) detection & tracking: car, bicycle, person, cyclist, tricycle, and can be used for road-traffic video surveillance and analysis.

Baidu drive link (extraction code: ej4p)
OneDrive link

ResNet18 backbone for C5, which is much smaller than HRNet18

ResNet18 OneDrive link

ResNet18 backbone for VisDrone MCMOT

ResNet18 OneDrive link

Using YOLOV4 as detector

You can also refer to the repo MCMOT_YOLOV4.
This is MCMOT with the CenterNet detection framework replaced by an anchor-based detection framework.

Using ByteTrack

You can also refer to the repo MCMOT-ByteTrack, which uses YOLOX as the front-end detector and ByteTrack as the back-end tracker.

mcmot's People

Contributors

captaineven, cclauss


mcmot's Issues

Class ID numbering convention

@CaptainEven
In practice the class labels may not form a contiguous 0, 1, 2, 3, ... sequence (for example, with 30 classes, a few of them, say classes 17 and 19, may have no data in the training set for now). That triggers the error below during training. Do the existing classes have to be re-indexed into a contiguous 0, 1, 2, 3, ... sequence?
if int(reid_id) > opt.num_classes - 1:
print('[Err]: configuration conflict of reid_cls_ids and num_classes!')
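
One common workaround (a sketch, not code from this repo): remap the sparse original class ids onto a contiguous 0..N-1 range when generating the label files, and keep the inverse map for reporting:

    # Hypothetical helper: make class ids contiguous so they satisfy
    # num_classes / reid_cls_ids (here classes 17 and 19 have no data).
    present_ids = [c for c in range(30) if c not in (17, 19)]
    old2new = {old: new for new, old in enumerate(present_ids)}
    new2old = {new: old for old, new in old2new.items()}
    # When writing labels_with_ids, store old2new[orig_cls_id] instead of orig_cls_id.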

Label format in the dataset

@CaptainEven
What are the data types of the fields in the dataset labels {class, track id, coordinates}? Are the class and track id both int, or str, or is the whole record a dict?

how to train

Thank you for your wonderful work!!!
I have two questions about training on a self-annotated dataset:
1. Could you show a screenshot of your own annotated dataset format? Should all target categories appear in the annotation .txt file of each frame? For example, if I want to train on bike, car and person but the first frame only contains a car, is that OK for training? And if I want to train on classes other than the 5-class detection & tracking set you list (car, bicycle, person, cyclist, tricycle), could you tell me where I should change the code?
It would be better if you could share the dataset you annotated.
2. I'm not sure whether DCNv2 can be used for training in this repository; if not, could you tell me whether that is feasible?
Thank you again for this MCMOT contribution. Looking forward to your reply! Thanks a lot.

test_det.py and test_emb.py

Hi, thanks for your nice work! Here is my question.
Could you tell me how to change the code of test_det.py and test_emb.py so that they work with MCMOT?
Thanks in advance.

Train on custom dataset

I followed the FairMOT instructions to create the datasets for training. However, the "gen_labels_detrac_mcmot.py" file contains several commented-out calls; which one should I use to prepare my dataset for training?

if __name__ == '__main__':
    # preprocess(src_root='/mnt/diskb/even/Insight-MVT_Annotation_Train',
    #            dst_root='/mnt/diskb/even/dataset/DETRAC')
    #
    # gen_labels(xml_root='/mnt/diskb/even/DETRAC-Train-Annotations-XML',
    #            img_root='/mnt/diskb/even/dataset/DETRAC/images/train',
    #            label_root='/mnt/diskb/even/dataset/DETRAC/labels_with_ids/train',
    #            viz_root='/mnt/diskb/even/viz_result')

    # gen_dot_train_file(data_root='/mnt/diskb/even/dataset',
    #                    rel_path='/MCMOT/images',
    #                    out_root='/mnt/diskb/even/MCMOT/src/data',
    #                    f_name='mcmot.train')

    # cvt2voc_format(data_root='/mnt/diskb/even/dataset/MCMOT_DET')
    # pick_as_val(dot_train_f='/mnt/diskb/even/dataset/MCMOT/train_mcmot.txt')
    # pick_as_val(dot_train_f='/mnt/diskb/even/MCMOT/src/data/mcmot_det.train')

    # add_suffix_for_files(data_root='/mnt/diskb/even/MCMOT/results',
    #                      suffix='old',
    #                      mode='visdrone')

    # add_new_train_data(part_train_f_path='/mnt/diskb/maqiao/multiClass/c5_pc_20200714/train.txt',
    #                    data_root='/mnt/diskb/even/dataset/MCMOT_DET',
    #                    dot_train_f_path='/mnt/diskb/even/MCMOT/src/data/mcmot_det_bk.train',
    #                    dataset_prefix='/mnt/diskb/even/dataset/')

    # gen_dot_train_file_from_txt(txt_f_path='/users/duanyou/c5/all_pretrain/trainall.txt',
    #                             dst_root='/mnt/diskb/even/dataset/MCMOT_DET',
    #                             dot_train_f_path='/mnt/diskb/even/MCMOT/src/data/mcmot_det.train',
    #                             dataset_prefix='/mnt/diskb/even/dataset/')

    # clean_train_set(img_root='/mnt/diskb/even/dataset/DETRAC/images/train',
    #                 label_root='/mnt/diskb/even/dataset/DETRAC/labels_with_ids/train')
    #
    # count_files(img_root='/mnt/diskb/even/dataset/DETRAC/images/train',
    #             label_root='/mnt/diskb/even/dataset/DETRAC/labels_with_ids')

    print('Done')

About loss

Hi, I see you used Arc margin, Focal loss and Circle loss to optimize the ReID branch. Have you run experiments on which one works best?

Detection only for some classes

@CaptainEven
I don't quite understand how to do detection only for certain classes. If I have already trained a 30-class model and only want to track classes 0, 4 and 6 while doing detection for the rest, is it enough to set reid_cls_ids to 0,4,6? I set it that way and got the following error:
Fix size testing.
training chunk_sizes: [2, 3, 3, 2]
The output will be saved to /home/sunyue/MCMOT-master/src/lib/../../exp/mot/default13
Net input image size: 1088×608
[Err]: configuration conflict of reid_cls_ids and num_classes!
Traceback (most recent call last):
File "demo.py", line 219, in
run_demo(opt)
File "demo.py", line 41, in run_demo
result_root = opt.output_root if opt.output_root != '' else '.'
AttributeError: 'NoneType' object has no attribute 'output_root'

Or should this be set during training? Setting it during training also raises an error.

What is the difference between master and visdrone?

What is the difference between the master and visdrone branches? For training on my own dataset and for training on the VisDrone dataset, can both branches be used?

Pytorch XLA/TPU Support

Questions and Help

Hey @CaptainEven,

Once again, thank you for your great repo. I was just wondering whether you have TPU support for this accelerated training scheme, or whether it is on the roadmap for the future.
After changing quite a lot in several functions (mainly in train.py, mot.py and base_trainer.py) I am not able to get past an apparent .backward() hang, so I think I am going to default to GPUs for now. I am trying with XLA and pytorch-lightning.

Multi-GPU training problem

Hi! I am training on the VisDrone-MOT dataset with 10 object classes. With multi-GPU training I hit the following problem (losses.py, line 426). Is there any solution? Thanks! Single-GPU training works fine.
(error screenshot)

request for some info about FairMOT

Hi, thanks for extending this project further. I am currently debugging the source codes. Could you please tell me
which folders/files under FairMOT/src/ perform Re-ID and detection, and which files contain the heatmap, center-offset and box-size branches?

Evaluate Multi-Class Data Using Motmetrics

When the tracking results are written to the *.txt file, the predictions for each frame do not include object class information. How do you calculate the MOT metrics for multi-class data?
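
A common workaround (a sketch using the motmetrics package, assuming the results and ground truth have already been split per class, e.g. by adding a class column to the results file): evaluate each class as its own single-class MOT problem and then aggregate.

    import motmetrics as mm
    import numpy as np

    def evaluate_one_class(gt_per_frame, hyp_per_frame):
        """gt_per_frame / hyp_per_frame: dicts frame -> (ids, boxes[x, y, w, h]) for ONE class."""
        acc = mm.MOTAccumulator(auto_id=True)
        for frame in sorted(gt_per_frame):
            gt_ids, gt_boxes = gt_per_frame[frame]
            hyp_ids, hyp_boxes = hyp_per_frame.get(frame, ([], np.empty((0, 4))))
            dists = mm.distances.iou_matrix(gt_boxes, hyp_boxes, max_iou=0.5)
            acc.update(gt_ids, hyp_ids, dists)
        mh = mm.metrics.create()
        return mh.compute(acc, metrics=['mota', 'motp', 'idf1'], name='per_class')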

2 out of 8 backbones throw size errors

Hi, apologies for creating multiple issues, but really liking this model!

At a default resolution of 1280x736 I'm able to train the following backbones without error:

  • resdcn_18
  • resdcn_34
  • resfpndcn_34
  • dla_34
  • hrnet_18
  • cspdarknet_53

However resdcn_50 and hrnet_32 both throw sizing errors:

resdcn_50:

RuntimeError: The size of tensor a (216) must match the size of tensor b (215) at non-singleton dimension 3

hrnet_32:

RuntimeError: Given transposed=1, weight of size 36 36 2 2, expected input[4, 64, 68, 120] to have 36 channels, but got 64 channels instead

Do you happen to recognize what the problem might be for either of these? Not sure if it's simply an unallowable input size or if network modifications are required.


Also I'm curious if you have a backbone recommendation for best detecting very small objects (areas of ~50 pixels) which also have high counts and are very clustered together (tinyperson crowd counting)?
And is there value in upsampling my 1280x720 resolution videos to something like 1920x1088 to assist small obj detection?

Really appreciate any advice you have time to offer!

C5

Which dataset is C5? Could you share a link to it?

Problem when running the demo

Hi, I found that the results are wrong when using demo.py on the MCMOT_Visdrone branch.
Command line:
python demo.py --input-video xxx.mp4 --arch hrnet_18 --load_model ../mcmot_last_track_hrnet_18_deconv.pth

Result on the MCMOT_DET branch:
(screenshot)
Result on the MCMOT_Visdrone branch:
(screenshot)

Please provide installation steps

Hello @CaptainEven ,

Thank you for providing this code. I have followed the exact installation steps from https://github.com/ifzhang/FairMOT.

conda create -n MCMOT
conda activate MCMOT
conda install pytorch==1.2.0 torchvision==0.4.0 cudatoolkit=10.0 -c pytorch
cd ${MCMOT_ROOT}
pip install -r requirements.txt
We use DCNv2 in our backbone network and more details can be found in their repo.
git clone https://github.com/CharlesShang/DCNv2
cd DCNv2
./make.sh

Can you please confirm if you used the same steps?

I was able to successfully run the code from the FairMOT repository, but I am facing issues while running the code in this repository.

Thank you

Can't find clear instructions on how to run the demo.

Hi, I am trying to run the demo on a traffic video, but all my tests have returned videos with no detections, or videos with random detections. The environment I am working in works perfectly with FairMOT, but doesn't work with MCMOT.

The command I am currently using is:

python demo.py --load_model ../models/mcmot_last_track_hrnet_18_deconv.pth --conf_thres 0.4 --input-video /path/to/video.mp4

The video processes with no errors and it also loads the checkpoint just fine, but it does not find any cars on several videos I've tried.

Could you post step by step instruction on how to run the demo on a video?

Thanks!

Wrong detections on the VisDrone validation dataset

Hi,
I am trying to validate on the VisDrone validation dataset, but the results are not good at all.

python demo.py

test_single(img_path='/home/tiru/Desktop/mcmot/MCMOT/data/VisDrone2019-MOT-val/sequences/uav0000339_00001_v/0000003.jpg', dev=torch.device('cpu'))

pre-trained model - model_path = './models/mcmot_last_track_resdcn_18_visdrone.pth'

(output screenshot)

[Question] DistributedDataParallel compatibility

Hi, I am considering trying to refactor this repo to support distributed training with DistributedDataParallel (or maybe Horovod). Do you happen to foresee any major issues with that working?

Running demo.py on VisDrone

Hello, thank you very much for your work!
I ran into some problems when running your demo.py on VisDrone:

  1. The VisDrone dataset only provides frames. I merged the frames into a video at 24 fps in avi format; can this be used in demo.py's video mode?

  2. I ran the demo with the video generated in step 1 and the mcmot_last_track_resdcn_18_visdrone.pth model you provided. My opt.py is as follows:
    self.parser.add_argument('--task', default='mot', help='mot')
    self.parser.add_argument('--dataset', default='visdrone', help='jde')
    self.parser.add_argument('--exp_id', default='default')
    self.parser.add_argument('--test', action='store_true')
    self.parser.add_argument('--load_model',
    default='/home/suyx/MCMOT/models/mcmot_last_track_resdcn_18_visdrone.pth',
    help='path to pretrained model')
    self.parser.add_argument('--resume',
    action='store_true',
    help='resume an experiment. '
    'Reloaded the optimizer parameter and '
    'set load_model to model_last.pth '
    'in the exp dir if load_model is empty.')

     # system
     self.parser.add_argument('--gpus',
                              default='0',  # 0, 5, 6
                              help='-1 for CPU, use comma for multiple gpus')
     self.parser.add_argument('--num_workers',
                              type=int,
                              default=8,  # 8, 6, 4
                              help='dataloader threads. 0 for single-thread.')
     self.parser.add_argument('--not_cuda_benchmark', action='store_true',
                              help='disable when the input size is not fixed.')
     self.parser.add_argument('--seed', type=int, default=317,
                              help='random seed')  # from CornerNet
     self.parser.add_argument('--gen-scale',
                              type=bool,
                              default=True,
                              help='Whether to generate multi-scales')
     self.parser.add_argument('--is_debug',
                              type=bool,
                              default=False,  # whether to use multi-threaded data loading, default: False
                              help='whether in debug mode or not')  # in debug mode only a single process can be used
    
     # log
     self.parser.add_argument('--print_iter', type=int, default=0,
                              help='disable progress bar and print to screen.')
     self.parser.add_argument('--hide_data_time', action='store_true',
                              help='not display time during training.')
     self.parser.add_argument('--save_all', action='store_true',
                              help='save model to disk every 5 epochs.')
     self.parser.add_argument('--metric', default='loss',
                              help='main metric to save best model')
     self.parser.add_argument('--vis_thresh', type=float, default=0.5,
                              help='visualization threshold.')
     # model: backbone and so on...
     self.parser.add_argument('--arch',
                              default='resdcn_18',
                              help='model architecture. Currently tested'
                                   'resdcn_18 |resdcn_34 | resdcn_50 | resfpndcn_34 |'
                                   'dla_34 | hrnet_32 | hrnet_18 | cspdarknet_53')
     self.parser.add_argument('--head_conv',
                              type=int,
                              default=-1,
                              help='conv layer channels for output head'
                                   '0 for no conv layer'
                                   '-1 for default setting: '
                                   '256 for resnets and 256 for dla.')
     self.parser.add_argument('--down_ratio',
                              type=int,
                              default=4,  # down-sampling ratio of the output feature map: H=H_image/4 and W=W_image/4
                              help='output stride. Currently only supports 4.')
    
     # input
     self.parser.add_argument('--input_res',
                              type=int,
                              default=-1,
                              help='input height and width. -1 for default from '
                                   'dataset. Will be overriden by input_h | input_w')
     self.parser.add_argument('--input_h',
                              type=int,
                              default=-1,
                              help='input height. -1 for default from dataset.')
     self.parser.add_argument('--input_w',
                              type=int,
                              default=-1,
                              help='input width. -1 for default from dataset.')
    
     # train
     self.parser.add_argument('--lr',
                              type=float,
                              default=7e-5,  # 1e-4, 7e-5, 5e-5, 3e-5
                              help='learning rate for batch size 32.')
     self.parser.add_argument('--lr_step',
                              type=str,
                              default='10,20',  # 20,27
                              help='drop learning rate by 10.')
     self.parser.add_argument('--num_epochs',
                              type=int,
                              default=30,  # 30, 10, 3, 1
                              help='total training epochs.')
     self.parser.add_argument('--batch-size',
                              type=int,
                              default=10,  # 18, 16, 14, 12, 10, 8, 4
                              help='batch size')
     self.parser.add_argument('--master_batch_size', type=int, default=-1,
                              help='batch size on the master gpu.')
     self.parser.add_argument('--num_iters', type=int, default=-1,
                              help='default: #samples / batch_size.')
     self.parser.add_argument('--val_intervals', type=int, default=10,
                              help='number of epochs to run validation.')
     self.parser.add_argument('--trainval',
                              action='store_true',
                              help='include validation in training and '
                                   'test on test set')
    
     # test
     self.parser.add_argument('--K',
                              type=int,
                              default=200,  # 128
                              help='max number of output objects.')  # maximum number of detected objects per image
     self.parser.add_argument('--not_prefetch_test',
                              action='store_true',
                              help='not use parallal data pre-processing.')
     self.parser.add_argument('--fix_res',
                              action='store_true',
                              help='fix testing resolution or keep '
                                   'the original resolution')
     self.parser.add_argument('--keep_res',
                              action='store_true',
                              help='keep the original resolution'
                                   ' during validation.')
     # tracking
     self.parser.add_argument(
         '--test_mot16', default=False, help='test mot16')
     self.parser.add_argument(
         '--val_mot15', default=False, help='val mot15')
     self.parser.add_argument(
         '--test_mot15', default=False, help='test mot15')
     self.parser.add_argument(
         '--val_mot16', default=False, help='val mot16 or mot15')
     self.parser.add_argument(
         '--test_mot17', default=False, help='test mot17')
     self.parser.add_argument(
         '--val_mot17', default=False, help='val mot17')
     self.parser.add_argument(
         '--val_mot20', default=False, help='val mot20')
     self.parser.add_argument(
         '--test_mot20', default=False, help='test mot20')
     self.parser.add_argument(
         '--conf_thres',
         type=float,
         default=0.4,  # 0.6, 0.4
          help='confidence thresh for tracking')  # heat-map confidence threshold
     self.parser.add_argument('--det_thres',
                              type=float,
                              default=0.3,
                              help='confidence thresh for detection')
     self.parser.add_argument('--nms_thres',
                              type=float,
                              default=0.4,
                              help='iou thresh for nms')
     self.parser.add_argument('--track_buffer',
                              type=int,
                              default=30,  # 30
                              help='tracking buffer')
     self.parser.add_argument('--min-box-area',
                              type=float,
                              default=200,
                              help='filter out tiny boxes')
    
     # input data mode at test time: video or image dir
     self.parser.add_argument('--input-mode',
                              type=str,
                              default='video',  # video or image_dir or img_path_list_txt
                              help='input data type(video or image dir)')
    
     # path of the input video file
     self.parser.add_argument('--input-video',
                              type=str,
                              default='/home/suyx/MCMOT/dataset/val/VisDrone-val/videos/uav0000086_00000_v.avi',
                              help='path to the input video')
    
     # input image directory
     self.parser.add_argument('--input-img',
                              type=str,
                              default='/users/duanyou/c5/all_pretrain/test.txt',  # ../images/
                              help='path to the input image directory or image file list(.txt)')
    
     self.parser.add_argument('--output-format',
                              type=str,
                              default='video',
                              help='video or text')
     self.parser.add_argument('--output-root',
                              type=str,
                              default='../results',
                              help='expected output root path')
     # mot: choose the dataset config file
     self.parser.add_argument('--data_cfg', type=str,
                              default='../src/lib/cfg/visdrone.json',  # 'mot15.json', 'visdrone.json'
                              help='load data from cfg')
     # self.parser.add_argument('--data_cfg', type=str,
     #                          default='../src/lib/cfg/mcmot_det.json',  # mcmot.json, mcmot_det.json,
     #                          help='load data from cfg')
     self.parser.add_argument('--data_dir',
                              type=str,
                              default='/home/suyx/MCMOT/dataset')
    
     # loss
     self.parser.add_argument('--mse_loss',  # default: false
                              action='store_true',
                              help='use mse loss or focal loss to train '
                                   'keypoint heatmaps.')
     self.parser.add_argument('--reg_loss',
                              default='l1',
                              help='regression loss: sl1 | l1 | l2')  # sl1: smooth L1 loss
     self.parser.add_argument('--hm_weight',
                              type=float,
                              default=1,
                              help='loss weight for keypoint heatmaps.')
     self.parser.add_argument('--off_weight',
                              type=float,
                              default=1,
                              help='loss weight for keypoint local offsets.')
     self.parser.add_argument('--wh_weight',
                              type=float,
                              default=0.1,
                              help='loss weight for bounding box size.')
     self.parser.add_argument('--id_loss',
                              default='ce',
                              help='reid loss: ce | triplet')
     self.parser.add_argument('--id_weight',
                              type=float,
                              default=1,  # 0for detection only and 1 for detection and re-ida
                              help='loss weight for id')  # ReID feature extraction or not
     self.parser.add_argument('--reid_dim',
                              type=int,
                              default=128,  # 128, 256, 512
                              help='feature dim for reid')
     self.parser.add_argument('--input-wh',
                              type=tuple,
                              default=(1088, 608),  # (768, 448) or (1088, 608)
                              help='net input resolution')
     self.parser.add_argument('--multi-scale',
                              type=bool,
                              default=True,
                              help='Whether to use multi-scale training or not')
     # ----------------------1~10 object classes are what we need
     # pedestrian      (1),  --> 0
     # people          (2),  --> 1
     # bicycle         (3),  --> 2
     # car             (4),  --> 3
     # van             (5),  --> 4
     # truck           (6),  --> 5
     # tricycle        (7),  --> 6
     # awning-tricycle (8),  --> 7
     # bus             (9),  --> 8
     # motor           (10), --> 9
     # ----------------------
    
     # others          (11)
     self.parser.add_argument('--reid_cls_ids',
                              default='0,1,2,3,4,5,6,7,8,9',  # '0,1,2,3,4' or '0,1,2,3,4,5,6,7,8,9'
                              help='')  # the object classes need to do reid
    
     self.parser.add_argument('--norm_wh', action='store_true',
                              help='L1(\hat(y) / y, 1) or L1(\hat(y), y)')
     self.parser.add_argument('--dense_wh', action='store_true',
                              help='apply weighted regression near center or '
                                   'just apply regression on center point.')
     self.parser.add_argument('--cat_spec_wh',
                              action='store_true',
                              help='category specific bounding box size.')
     self.parser.add_argument('--not_reg_offset',
                              action='store_true',
                              help='not regress local offset.')
    

Judging from the runtime log, everything looks fine:
Fix size testing.
training chunk_sizes: [10]
The output will be saved to /home/suyx/MCMOT/src/lib/../../exp/mot/default
Net input image size: 1088×608
heads: {'hm': 10, 'wh': 2, 'id': 128, 'reg': 2}
2020-10-21 11:17:03 [INFO]: Starting tracking...
2020-10-21 11:17:03 [INFO]: Starting tracking...
Lenth of the video: 464 frames
Creating model...
loaded /home/suyx/MCMOT/models/mcmot_last_track_resdcn_18_visdrone.pth, epoch 29
2020-10-21 11:17:11 [INFO]: Processing frame 0 (100000.00 fps)
2020-10-21 11:17:11 [INFO]: Processing frame 0 (100000.00 fps)
2020-10-21 11:17:14 [INFO]: Processing frame 20 (10.23 fps)
2020-10-21 11:17:14 [INFO]: Processing frame 20 (10.23 fps)
2020-10-21 11:17:17 [INFO]: Processing frame 40 (13.34 fps)
2020-10-21 11:17:17 [INFO]: Processing frame 40 (13.34 fps)
2020-10-21 11:17:20 [INFO]: Processing frame 60 (14.66 fps)
2020-10-21 11:17:20 [INFO]: Processing frame 60 (14.66 fps)
2020-10-21 11:17:23 [INFO]: Processing frame 80 (15.30 fps)
2020-10-21 11:17:23 [INFO]: Processing frame 80 (15.30 fps)
2020-10-21 11:17:25 [INFO]: Processing frame 100 (16.10 fps)
2020-10-21 11:17:25 [INFO]: Processing frame 100 (16.10 fps)
2020-10-21 11:17:28 [INFO]: Processing frame 120 (16.51 fps)
2020-10-21 11:17:28 [INFO]: Processing frame 120 (16.51 fps)
2020-10-21 11:17:31 [INFO]: Processing frame 140 (16.74 fps)
2020-10-21 11:17:31 [INFO]: Processing frame 140 (16.74 fps)
2020-10-21 11:17:34 [INFO]: Processing frame 160 (16.91 fps)
2020-10-21 11:17:34 [INFO]: Processing frame 160 (16.91 fps)
2020-10-21 11:17:36 [INFO]: Processing frame 180 (17.06 fps)
2020-10-21 11:17:36 [INFO]: Processing frame 180 (17.06 fps)
2020-10-21 11:17:39 [INFO]: Processing frame 200 (17.27 fps)
2020-10-21 11:17:39 [INFO]: Processing frame 200 (17.27 fps)
2020-10-21 11:17:42 [INFO]: Processing frame 220 (17.27 fps)
2020-10-21 11:17:42 [INFO]: Processing frame 220 (17.27 fps)
2020-10-21 11:17:44 [INFO]: Processing frame 240 (17.35 fps)
2020-10-21 11:17:44 [INFO]: Processing frame 240 (17.35 fps)
2020-10-21 11:17:47 [INFO]: Processing frame 260 (17.52 fps)
2020-10-21 11:17:47 [INFO]: Processing frame 260 (17.52 fps)
2020-10-21 11:17:49 [INFO]: Processing frame 280 (17.59 fps)
2020-10-21 11:17:49 [INFO]: Processing frame 280 (17.59 fps)
2020-10-21 11:17:51 [INFO]: Processing frame 300 (17.69 fps)
2020-10-21 11:17:51 [INFO]: Processing frame 300 (17.69 fps)
2020-10-21 11:17:54 [INFO]: Processing frame 320 (17.81 fps)
2020-10-21 11:17:54 [INFO]: Processing frame 320 (17.81 fps)
2020-10-21 11:17:56 [INFO]: Processing frame 340 (17.85 fps)
2020-10-21 11:17:56 [INFO]: Processing frame 340 (17.85 fps)
2020-10-21 11:17:59 [INFO]: Processing frame 360 (17.92 fps)
2020-10-21 11:17:59 [INFO]: Processing frame 360 (17.92 fps)
2020-10-21 11:18:03 [INFO]: Processing frame 380 (17.75 fps)
2020-10-21 11:18:03 [INFO]: Processing frame 380 (17.75 fps)
2020-10-21 11:18:06 [INFO]: Processing frame 400 (17.79 fps)
2020-10-21 11:18:06 [INFO]: Processing frame 400 (17.79 fps)
2020-10-21 11:18:09 [INFO]: Processing frame 420 (17.82 fps)
2020-10-21 11:18:09 [INFO]: Processing frame 420 (17.82 fps)
2020-10-21 11:18:12 [INFO]: Processing frame 440 (17.83 fps)
2020-10-21 11:18:12 [INFO]: Processing frame 440 (17.83 fps)
2020-10-21 11:18:14 [INFO]: Processing frame 460 (17.85 fps)
2020-10-21 11:18:14 [INFO]: Processing frame 460 (17.85 fps)
2020-10-21 11:18:15 [INFO]: save results to ../results/results.txt
2020-10-21 11:18:15 [INFO]: save results to ../results/results.txt
ffmpeg version 4.3 Copyright (c) 2000-2020 the FFmpeg developers
built with gcc 7.3.0 (crosstool-NG 1.23.0.449-a04d0)
configuration: --prefix=/home/suyx/miniconda3/envs/FairMOT --cc=/opt/conda/conda-bld/ffmpeg_1597178665428/_build_env/bin/x86_64-conda_cos6-linux-gnu-cc --disable-doc --disable-openssl --enable-avresample --enable-gnutls --enable-hardcoded-tables --enable-libfreetype --enable-libopenh264 --enable-pic --enable-pthreads --enable-shared --disable-static --enable-version3 --enable-zlib --enable-libmp3lame
libavutil 56. 51.100 / 56. 51.100
libavcodec 58. 91.100 / 58. 91.100
libavformat 58. 45.100 / 58. 45.100
libavdevice 58. 10.100 / 58. 10.100
libavfilter 7. 85.100 / 7. 85.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 7.100 / 5. 7.100
libswresample 3. 7.100 / 3. 7.100
Input #0, image2, from '../results/frame/%05d.jpg':
Duration: 00:00:18.52, start: 0.000000, bitrate: N/A
Stream #0:0: Video: mjpeg (Baseline), yuvj420p(pc, bt470bg/unknown/unknown), 1920x1080 [SAR 1:1 DAR 16:9], 25 fps, 25 tbr, 25 tbn, 25 tbc
Please use -b:a or -b:v, -b is ambiguous
Stream mapping:
Stream #0:0 -> #0:0 (mjpeg (native) -> mpeg4 (native))
Press [q] to stop, [?] for help
[swscaler @ 0x55b903555480] deprecated pixel format used, make sure you did set range correctly
Output #0, mp4, to '../results/uav0000086_00000_v_track.mp4':
Metadata:
encoder : Lavf58.45.100
Stream #0:0: Video: mpeg4 (mp4v / 0x7634706D), yuv420p, 1920x1080 [SAR 1:1 DAR 16:9], q=2-31, 5000 kb/s, 25 fps, 12800 tbn, 25 tbc
Metadata:
encoder : Lavc58.91.100 mpeg4
Side data:
cpb: bitrate max/min/avg: 0/0/5000000 buffer size: 0 vbv_delay: N/A
frame= 463 fps= 38 q=9.5 Lsize= 11457kB time=00:00:18.48 bitrate=5078.8kbits/s speed= 1.5x
video:11454kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.026063%

However, the final tracking video contains no detection boxes, and some frames show a few black regions.
(screenshot of uav0000086_00000_v_track.mp4)

Could you describe how you ran tracking on VisDrone? Thanks a lot!

[Question] Det vs Track confidence

I noticed that self.det_thresh = opt.conf_thres. I've seen some models use a lower threshold for continuing an existing track than for initializing a new one; have you looked at how the two compare?

only one id

If some class has only one ID, the following raises an error:

print(cls_id, nID)

self.emb_scale_dict[cls_id] = math.sqrt(2) * math.log(nID - 1)
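
With nID = 1 this evaluates math.log(0), which raises a ValueError. One possible guard (a sketch, not a fix from the repo) is to clamp the argument so single-identity classes get a scale of 0:

    import math

    def emb_scale(n_ids: int) -> float:
        # Hypothetical guard: avoid math.log(0) when a class has only one track id.
        return math.sqrt(2) * math.log(max(n_ids - 1, 1))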

where() missing 2 required positional argument: "input", "other"

Hello, I hit this error when running train.py:
File "D:\code\pytorchCode\MCMOT\src\lib\trains\mot.py", line 198, in forward
inds = torch.where(cls_id_map == cls_id),
TypeError: where() missing 2 required positional argument: "input", "other"
In the ReID loss computation, inds = torch.where(cls_id_map == cls_id); how should I change it? Which two arguments should I pass to where()? Thanks.
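
The error means the installed PyTorch only supports the three-argument form of torch.where; the single-argument form needs a newer release. A backward-compatible replacement (a sketch, assuming the loss code expects the tuple-of-index-tensors behaviour of torch.where(condition)):

    import torch

    cls_id_map = torch.randint(0, 5, (1, 152, 272))        # stand-in for the real class-id map
    cls_id = 3
    cond = (cls_id_map == cls_id)
    idx = torch.nonzero(cond)                              # shape: (num_matches, ndim)
    inds = tuple(idx[:, i] for i in range(idx.shape[1]))   # mimics torch.where(cond)

Upgrading PyTorch is the simpler fix if that is an option.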

../results/frame/%05d.jpg: No such file or directory

The following error appeared when running demo.py:

ffmpeg version 4.2.4-1ubuntu0.1 Copyright (c) 2000-2020 the FFmpeg developers
built with gcc 9 (Ubuntu 9.3.0-10ubuntu2)
configuration: --prefix=/usr --extra-version=1ubuntu0.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-avresample --disable-filter=resample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librsvg --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-nvenc --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
libavutil 56. 31.100 / 56. 31.100
libavcodec 58. 54.100 / 58. 54.100
libavformat 58. 29.100 / 58. 29.100
libavdevice 58. 8.100 / 58. 8.100
libavfilter 7. 57.100 / 7. 57.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 5.100 / 5. 5.100
libswresample 3. 5.100 / 3. 5.100
libpostproc 55. 5.100 / 55. 5.100
[image2 @ 0x55b3739d8700] Could find no file with path '../results/frame/%05d.jpg' and index in the range 0-4
../results/frame/%05d.jpg: No such file or directory

No detection or tracking bbox in the output video

Dear @CaptainEven ,

python train.py --task mot --exp_id cater --gpus 1 --batch_size 8 --num_epochs 60 --lr_step '50' --data_cfg '../src/lib/cfg/cater.json' --load_model '../models/dla34-ba72cf86.pth' --output-root '../results' --arch dla_34

I used this command to train MCMOT for a task. I was able to successfully train the model for 60 epochs, but the output video doesn't show any annotation or tracking. It is the same as the input video.

This is the annotation for 1 frame of a video in the dataset.
2 1 0.262500 0.346388 0.114882 0.153176
3 1 0.318750 0.602854 0.217099 0.289465
4 1 0.465625 0.385947 0.089549 0.159198
1 1 0.581250 0.562053 0.081316 0.108421
4 2 0.275000 0.419104 0.143403 0.254940
2 2 0.725000 0.438366 0.060644 0.080858
2 3 0.434375 0.300891 0.051663 0.068884
4 3 0.628125 0.290727 0.122046 0.216971
4 4 0.803125 0.455257 0.145570 0.258791
4 5 0.509375 0.268900 0.036590 0.065049

Can you please help me figure out the reason why nothing is detected?

Thank you

result.txt

Hi, does the result.txt generated by tracking contain the information of every object detected in the video, such as class_id, box, etc.?
My result.txt looks strange: it only contains information for two objects, but the video clearly has more than two. What should I do?

Division operator behaviour changed between versions

Error when running the demo:

2020-09-10 17:39:54 [INFO]: Integer division of tensors using div or / is no longer supported, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
2020-09-10 17:39:54 [INFO]: Integer division of tensors using div or / is no longer supported, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.

Since the message does not say which file it comes from, I cannot locate it. Where in the source do you use the problematic "/"? I am using Python 3 here, so this error appears and the code cannot run. I would be very grateful for your help.
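
For reference, the warning comes from dividing integer tensors inside PyTorch. A sketch of the kind of change usually needed (assuming the offending line divides integer tensors, e.g. when mapping coordinates to feature-map cells):

    import torch

    xs = torch.arange(10)            # integer tensor
    ws = torch.full_like(xs, 4)

    # old behaviour: xs / ws silently performed integer division on int tensors
    cells = xs // ws                 # explicit floor division
    # or, as the warning suggests: cells = torch.floor_divide(xs, ws)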

Implementing on 50 salads dataset

Hi! I like your approach, and thanks for extending FairMOT to multi-class tracking. I would like to apply this approach to the 50 Salads dataset and am experimenting by annotating a few videos for both detection and tracking. Please let me know the steps I can take to run your code on my dataset, and also the format of the annotation files for detection and of the ground truth for tracking.

a problem

I used your code for training, but nothing was detected during testing. I only have two categories, and opt has been modified accordingly.
0 1 0.7802083333333333 0.5222222222222223 0.030208333333333334 0.16666666666666666
1 1 0.5213541666666667 0.4935185185185185 0.09895833333333333 0.13518518518518519
1 2 0.4075520833333333 0.46435185185185185 0.07760416666666667 0.12314814814814815
1 3 0.32265625 0.4634259259259259 0.0734375 0.09351851851851851
1 4 0.53359375 0.39166666666666666 0.0328125 0.05
1 5 0.2513020833333333 0.38657407407407407 0.07760416666666667 0.1824074074074074
1 6 0.4549479166666667 0.3837962962962963 0.09322916666666667 0.16944444444444445

This is my label file.

Error when running the demo: name 'width' is not defined

While reproducing the code:
root@c630585089e3:/home/envs/MCMOT/src# /home/envs/MCMOT/videos/MOT16-03.mp4
bash: /home/envs/MCMOT/videos/MOT16-03.mp4: Permission denied
root@c630585089e3:/home/envs/MCMOT/src# python3 demo.py
Fix size testing.
training chunk_sizes: [10]
The output will be saved to /home/envs/MCMOT/src/lib/../../exp/mot/default
Net input image size: 1088×608
heads: {'hm': 5, 'wh': 2, 'id': 128, 'reg': 2}
2020-09-12 08:38:02 [INFO]: Starting tracking...
2020-09-12 08:38:02 [INFO]: Starting tracking...
Lenth of the video: 1500 frames
Creating model...
loaded ../models/mcmot_last_track_resdcn_18.pth, epoch 9
2020-09-12 08:38:04 [INFO]: Processing frame 0 (100000.00 fps)
2020-09-12 08:38:04 [INFO]: Processing frame 0 (100000.00 fps)
2020-09-12 08:38:04 [INFO]: name 'width' is not defined
2020-09-12 08:38:04 [INFO]: name 'width' is not defined

gen_labels

I am a beginner. My dataset is in MOT format, with images and gt.txt files, and I need to train multi-class tracking. When generating labels, what changes do I need to make to gen_labels_mot16_car.py? Thanks in advance for your help!

Possible bug in Flip augmentation

When the image is flipped along the x-axis, the labels are modified by:

labels[:, 2] = 1 - labels[:, 2]

Since at this point the labels are in xywh format, this maintains the width which is correct, but I believe the new x1 value should be (1-x-w) instead of (1-x), or:

labels[:, 2] = 1 - (labels[:, 2] + labels[:, 4])

Or am I missing something?
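
For context, which formula is correct depends on what the x column stores at that point (a sketch; whether the labels hold a center-x or a corner-x there determines which line applies):

    import numpy as np

    def hflip_labels(labels: np.ndarray, x_is_center: bool) -> np.ndarray:
        """labels[:, 2:6] = normalized x, y, w, h; flip the boxes along the x-axis."""
        flipped = labels.copy()
        if x_is_center:
            flipped[:, 2] = 1.0 - labels[:, 2]                    # center-x: 1 - x
        else:
            flipped[:, 2] = 1.0 - (labels[:, 2] + labels[:, 4])   # corner-x: 1 - x - w
        return flipped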

webcam

Great work! Looking forward to webcam support being added.

There seems to be a small problem

When I run track.py, the code throws the following error. Could you help me take a look?
Traceback (most recent call last):
File "track.py", line 363, in
save_videos=True)
File "track.py", line 232, in main
save_dir=output_dir, show_image=show_image, frame_rate=frame_rate)
File "track.py", line 132, in eval_seq
online_targets_dict = tracker.update(blob, img_0)
File "/media/goo/use2download/MCMOT/src/lib/tracker/multitracker.py", line 436, in update
cls_id_feature = cls_id_feats[cls_id][remain_inds]
IndexError: boolean index did not match indexed array along dimension 0; dimension is 270 but corresponding boolean dimension is 122

pretrained model

Is there a possibility to download the pretrained model, without the need of a Chinese phone number?
Thanks in advance!

Is the ID number accumulated?

Thank you for sharing the code. It's great work :D

I used mcmot_last_track_resdcn_18.pth to predict on a custom video
and found the same ID within the same class assigned to different objects in different frames.
(screenshot)
(screenshot)

So my question: is the ID number accumulated, or is some other method used to assign IDs?

Thank you :)

Can the tracking IDs in the dataset start from 0?

@CaptainEven
Do the track IDs in labels_with_ids have to start from 1? Can they start from 0 instead? If they are labeled starting from 0, does the code below need to be modified?
if self.opt.id_weight > 0:
# @even: take the object class at each (y, x) of the output feature map
cls_id_map[0][ct_int[1], ct_int[0]] = cls_id  # 1×H×W
# @even: record the track ids for this class
cls_tr_ids[cls_id][ct_int[1]][ct_int[0]] = label[1] - 1  # track ids start from 1; convert to 0-based
ids[k] = label[1] - 1  # classification index: track id - 1
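
For reference, a sketch of the two consistent options implied by that snippet (an assumption; nothing else in the pipeline is checked here): either keep the label files 1-based, or make them 0-based and drop the offsets.

    # Option A (matches the code above): write 1-based track ids when generating labels_with_ids.
    label_line = f"{cls_id} {track_id + 1} {x} {y} {w} {h}\n"  # track_id counted from 0 internally
    # Option B: keep 0-based ids in the label files and change the two lines above to
    #   cls_tr_ids[cls_id][ct_int[1]][ct_int[0]] = label[1]
    #   ids[k] = label[1]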

[Feature Request] TensorBoard Images

Hi, I think it would help many people in debugging if some training image results (or maybe even periodic validation results) could be written to the tensorboard events files :)

Request for details

Hello, I saw that you upgraded FairMOT to multi-class and I admire your work. I'd like to ask about a few details:
1. Is the dataset for the multi-class model your own, or a combination of public datasets, e.g. the original FairMOT data plus the UA-DETRAC vehicle data?
2. What were the training details behind the model shown in the demo images?
If it is convenient for you, I'd appreciate answers to these questions. Thanks.
