
cltr's Introduction

CLTR (Crowd Localization TRansformer)

[Project page] [paper]

An official implementation of "An end-to-end transformer model for crowd localization" (accepted at ECCV 2022).

  • Note: the code in this version is not yet well organized and may contain some obscure comments.

Environment

python == 3.6
pytorch == 1.8.0
opencv-python
scipy
h5py
pillow
imageio
nni
mmcv
tensorboard
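
A quick sanity check that the installed environment matches the versions above (a minimal sketch, assuming the packages listed are already installed):

import torch, cv2, scipy, h5py, PIL, imageio

# Expect Python 3.6 and PyTorch 1.8.x for this repo.
print('torch', torch.__version__, '| cuda available:', torch.cuda.is_available())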

Datasets

  • Download the JHU-CROWD++ dataset from here
  • Download the NWPU-Crowd dataset (resized) from Baidu (password: 04i4) or OneDrive

Prepare data

Generate point map

cd CLTR/data
For the JHU-CROWD++ dataset: python prepare_jhu.py --data_path /xxx/xxx/jhu_crowd_v2.0
For the NWPU-Crowd dataset: python prepare_nwpu.py --data_path /xxx/xxx/NWPU_CLTR

Generate image list

cd CLTR
python make_npydata.py --jhu_path /xxx/xxx/jhu_crowd_v2.0 --nwpu_path /xxx/xxx/NWPU_CLTR
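
After generating the lists, a quick way to sanity-check them (a sketch; it assumes make_npydata.py writes lists of image paths under ./npydata/, as the file names quoted in an issue below suggest):

import numpy as np

# Load the generated image lists and report their sizes.
train_list = np.load('./npydata/nwpu_train.npy', allow_pickle=True).tolist()
val_list = np.load('./npydata/nwpu_val.npy', allow_pickle=True).tolist()
print(len(train_list), 'training images;', len(val_list), 'val images')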

Training

Example (some hyper-parameters may be different from the original paper):
cd CLTR
sh experiments/jhu.sh
or
sh experiments/nwpu.sh
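
If you only have a single GPU, a user report in the issues below suggests the launch line inside jhu.sh looks roughly like the following (an assumption based on that report; adjust --nproc_per_node and --gpu_id to your setup):

python -m torch.distributed.launch --nproc_per_node=1 --master_port 5228 \
    train_distributed.py --gpu_id '0'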

  • Please change nproc_per_node and gpu_id in jhu.sh/nwpu.sh if you do not have enough GPUs.
  • We have fixed all random seeds, i.e., different runs will report the same results under the same setting (a typical seed-fixing pattern is sketched after this list).
  • The model will be saved in CLTR/save_file/log_file.
  • Note that using an FPN would improve the performance, but we do not include it in this version.
  • Tuning some hyper-parameters will also bring improvements (e.g., the image size, crop size, and number of queries).
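
For reference, fixing all random seeds typically looks like the following (a sketch of the usual pattern; the repo's exact implementation may differ):

import random
import numpy as np
import torch

def set_seed(seed=0):
    # Fix every source of randomness so repeated runs are identical.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Deterministic cuDNN kernels trade some speed for reproducibility.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False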

A comparison with the original paper:

NWPU-Crowd (val set)       MAE    MSE
Original paper             61.9   246.3
This repo (training log)   51.3   116.7

Testing

Example:
python test.py --dataset jhu --pre model.pth --gpu_id 2,3
or
python test.py --dataset nwpu --pre model.pth --gpu_id 0,1

  • The model.pth can be obtained from the training phase.

Video Demo

Example:
python video_demo.py --video_path ./video_demo/demo.mp4 --num_queries 700 --pre video_model.pth

  • The "video_model.pth" (trained from NWPU-Crowd training set) can be downloaded from Baidu disk, password: rw6b or google drive.
  • The generated video will be named "out_video.avi"


Visit Bilibili or YouTube to watch the video demo.

Acknowledgement

Thanks to the following great works:

@inproceedings{carion2020end,
  title={End-to-end object detection with transformers},
  author={Carion, Nicolas and Massa, Francisco and Synnaeve, Gabriel and Usunier, Nicolas and Kirillov, Alexander and Zagoruyko, Sergey},
  booktitle={European conference on computer vision},
  pages={213--229},
  year={2020},
  organization={Springer}
}
@inproceedings{meng2021conditional,
  title={Conditional detr for fast training convergence},
  author={Meng, Depu and Chen, Xiaokang and Fan, Zejia and Zeng, Gang and Li, Houqiang and Yuan, Yuhui and Sun, Lei and Wang, Jingdong},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={3651--3660},
  year={2021}
}

Reference

If you find this project useful, please cite:

@inproceedings{liang2022end,
  title={An end-to-end transformer model for crowd localization},
  author={Liang, Dingkang and Xu, Wei and Bai, Xiang},
  booktitle={European Conference on Computer Vision},
  year={2022}
}


cltr's Issues

training loss

When we reproduce your code, we find that the training loss does not decrease as fast as in your reported results. Our result on SHA is MAE 70.385 / MSE 114.081, which is a large gap compared to your reported results:

2022-09-17 01:55:57,093 - CLTR - INFO - begin test
2022-09-17 01:56:37,525 - CLTR - INFO - Testing Epoch:[1240/1500] mae=73.967 mse=118.639 best_mae=71.473
2022-09-17 01:56:44,394 - CLTR - INFO - Training Epoch:[1241/1500] loss=3.62841 lr=0.000100 epoch_time=6.869
[... epochs 1242-1259 omitted; training loss fluctuates between ~3.40 and ~3.68 at lr=0.000100 ...]
2022-09-17 01:58:55,734 - CLTR - INFO - begin test
2022-09-17 01:59:26,869 - CLTR - INFO - Testing Epoch:[1260/1500] mae=70.385 mse=114.081 best_mae=70.385
[... epochs 1261-1279 omitted; training loss fluctuates between ~3.41 and ~3.69 ...]
2022-09-17 02:01:44,787 - CLTR - INFO - begin test
2022-09-17 02:02:12,237 - CLTR - INFO - Testing Epoch:[1280/1500] mae=75.275 mse=123.192 best_mae=70.385
[... epochs 1281-1300 omitted; training loss fluctuates between ~3.28 and ~3.62 ...]
2022-09-17 02:04:30,577 - CLTR - INFO - begin test

Testing help

Hello, I followed your documentation to train the model and reached the same training results. I see you provide test code (test.py) for the NWPU val set, but I would like to test on my own dataset. Do I also need to generate a point map for my own dataset first? (Actually, I am also confused about generating the point map: the paper says this method differs from the previous density-map methods, so what is the purpose of generating a point map here?)

I am a coding beginner, so my questions may be basic... I would really appreciate an answer, and it would be even better if you could provide test/inference code for other datasets.

Where is the npy file

elif args['dataset'] == 'nwpu':
    train_file = './npydata/nwpu_train.npy'
    test_file = './npydata/nwpu_val.npy'
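
These files are presumably produced by make_npydata.py (see "Generate image list" above); running it with --nwpu_path pointing at the prepared dataset should create ./npydata/nwpu_train.npy and ./npydata/nwpu_val.npy.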

Reported JHU-CROWD++ results

Are the reported results on JHU-CROWD++ based on the test set or the val set?
Assuming they are based on the test set, do you have the overall result on the val set too?

Why are train_loader and test_loader 0?

Not using distributed mode
model params: 43.446471
2023-03-20 08:19:37,507 - CLTR - INFO - model params: = 43.446
best result: 100000.0
2023-03-20 08:19:37,512 - CLTR - INFO - best result = 100000.000
0
mae 0.0 mse 0.0
0
mae 0.0 mse 0.0
0
mae 0.0 mse 0.0
0
mae 0.0 mse 0.0

Training help

Following your README, I processed the JHU crowd dataset and generated the npy data. I found that the model's learning rate, MSE, and MAE did not change after iterating several epochs. May I ask why? Thank you.

How to understand Object Queries in Crowd Counting task?

I would like to ask the author for an explanation of object queries.

From DETR, we know that the object queries fed to the transformer decoder generate N predictions at once, where N is a preset integer at least as large as the number of objects in the image.

However, in the crowd counting task there are often thousands or tens of thousands of people in an image, yet the author sets the number of object queries to 700 or 500. How should I understand this?
If object queries work as defined in DETR, shouldn't the value be several thousand? Can the author share his understanding of object queries? I would appreciate it.

Test the model without ground truth

Hi,
Thanks for sharing your code. I was wondering if you have test code that works on images without ground truth? In your test.py it seems we need the shape of kpoint to compute topk_values, but what if we do not have the ground truth? (A possible workaround is sketched below.)
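
A possible workaround, assuming the output format described in the next issue (a hypothetical sketch, not the authors' code): replace the ground-truth-derived top-k selection with a confidence threshold.

import torch

# outputs = model(img)  # forward pass, as in test.py
# Keep queries whose person-class probability passes a threshold (0.5 is an
# arbitrary choice, and treating class index 1 as "person" is an assumption),
# instead of taking top-k where k comes from the GT count.
scores = outputs['pred_logits'].softmax(-1)[..., 1]   # (batch, num_queries)
keep = scores > 0.5
points = outputs['pred_points'][keep]                 # predicted head points
count = int(keep.sum())                               # estimated crowd count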

I want to ask about the output of your CLTR model

I see that outputs['pred_logits'] has shape (batch_size, num_queries, num_classes), but outputs['pred_points'] has shape (batch_size, num_queries, 3). What does the 3 stand for? And is num_classes = 2, i.e., person head and background?

Error when running video_demo.py

I am running on Google Colab with GPU enabled. I downloaded the weights and have a video ready. When I run the specified command, I get this error:

"RuntimeError: Attempting to deserialize object on CUDA device 1 but torch.cuda.device_count() is 1. Please use torch.load with map_location to map your storages to an existing device."

Can somebody help me? I do have a GPU available; I do not know why it doesn't run. (A likely fix is sketched below.)
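
The error indicates the checkpoint was saved from GPU device 1, which does not exist on a single-GPU Colab instance. Remapping the storages when loading, as the error message suggests, usually fixes this:

import torch

# Map all storages onto the available device (or the CPU) when loading.
checkpoint = torch.load('video_model.pth', map_location='cuda:0')  # or map_location='cpu'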

KMO based Matcher

After reading your article and code, I am interested in the Matcher part, but I cannot find the KMO-based Matcher in the code. Is it in a module or somewhere else?

Question

Hello, could you also release the other two datasets?

Can the authors provide an explanation on why the KMO matcher isn't reflected in the code?

The matcher.py file in the repository implements the basic L1-based Hungarian matcher and does not reflect the KMO-based Hungarian bipartite matching claimed in the paper.

Looking through the closed issues, it seems others have already pointed this out, but the authors closed those issues without providing an explanation.

Given that the KMO matcher is really the crux of the paper (without it, the model becomes merely a clone of conditional DETR adapted to output points), this issue will inevitably be raised again in the future.

To avoid this issue being repeatedly raised in the repository, can the authors provide an explanation? Is it due to IP concerns? Is it still being implemented?

NWPU dataset

The download link for the NWPU dataset does not work, and I cannot create a Baidu account because I am located in Germany. Could you provide the dataset on a different platform?
I tried to download the original one, but I was not able to recreate your results, and inference was really bad.

training issue

Thank you very much for your wonderful work! When we run sh jhu.sh with your released code, it gets stuck. We have only one GPU and set jhu.sh to "--nproc_per_node=1 --master_port 5228 train_distributed.py --gpu_id '0'". The log shows:

2022-09-16 12:59:31,334 - CLTR - INFO - => no checkpoint found at 'None'
best result: 100000.0
2022-09-16 12:59:31,335 - CLTR - INFO - best result = 100000.000
2022-09-16 12:59:31,364 - CLTR - INFO - best result=100000.000 start epoch=0.000
2022-09-16 12:59:31,364 - CLTR - INFO - start training!

and then it hangs.

Reproduce NWPU testset issue

Hi, I would like to know how to reproduce the performance on the NWPU test set on the ranking website https://www.crowdbenchmark.com/index.html
I have two questions.

  1. How should the images be preprocessed?
    There are no processed test images in the files from your download link. By comparing the image sizes in the images and images_2048 directories, I guess the resize rule is to cap the image width at 2048 and otherwise scale the height and width proportionally (sketched after this list). But I got an O_MAE above 200 when using the provided weights or a model I trained myself (following the instructions in the README).

  2. For test images, should I use the same code except for the section in dataset.py that handles ground truth?
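
A sketch of the resize rule guessed in point 1 (an assumption inferred from the directory names, not the authors' confirmed preprocessing):

from PIL import Image

def resize_max_width(img, max_w=2048):
    # Cap the width at max_w; scale the height proportionally.
    w, h = img.size
    if w > max_w:
        img = img.resize((max_w, int(h * max_w / w)), Image.BILINEAR)
    return img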

Pretrained model

Hi, thanks for the interesting work.

Can you upload the model to Google Drive or OneDrive?

I can't download the model from Baidu.

KMO based Matcher is not reflected in the code

Thank you for releasing the code. After reading it carefully, I found that it does not actually implement the KMO-based matching proposed in the paper; instead, it directly reuses DETR's L1-distance matching.
Do the authors plan to release this part of the code? Thank you very much!

Training with NWPU

Hello,
I wanted to train the model on the NWPU dataset, and since I am located in Germany I cannot download the resized NWPU dataset from Baidu, so I took the preprocessing from the JHU dataset and applied it to the original NWPU dataset. I started training with the parameters given in your provided log file, but the model did not learn anything (MAE and MSE were constant). I tried a bigger learning rate, but that gave me an error about the cost matrix ("matrix contains invalid numeric entries") when computing the linear sum assignment (matcher.py line 80, scipy.optimize.linear_sum_assignment); a debugging sketch follows below.
I wonder whether the resized NWPU dataset you provided on Baidu has other characteristics beyond applying the JHU preprocessing to the original NWPU. What else could cause such behavior?
Thanks.
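
A hypothetical debugging aid for the invalid-cost-matrix error above (the function name and placement are assumptions; NaN losses from a too-large learning rate are a common cause of such entries):

import torch

def assert_finite(cost, name='cost matrix'):
    # Locate NaN/Inf entries before scipy.optimize.linear_sum_assignment,
    # which rejects matrices with invalid numeric entries.
    bad = torch.isnan(cost) | torch.isinf(cost)
    if bad.any():
        raise ValueError(f'{int(bad.sum())} invalid entries in {name}')
    return cost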
