gaopengcuhk / stable-pix2seq
A full-fledged version of Pix2Seq
License: Apache License 2.0
What is the purpose of weight[1998] = 0.01 and weight[2000] = 0.01?
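For context, here is a minimal sketch of how such per-class weights behave in PyTorch's cross-entropy loss. The mapping of indices 1998 and 2000 to particular special tokens (e.g. end-of-sequence or noise tokens in a ~2003-token vocabulary) is an assumption, not confirmed from the repo:

import torch
import torch.nn.functional as F

vocab_size = 2003                # assumed vocabulary size (matches the MLP output dim discussed below)
weight = torch.ones(vocab_size)
weight[1998] = 0.01              # down-weights whatever class index 1998 encodes
weight[2000] = 0.01              # likewise for index 2000

logits = torch.randn(8, vocab_size)            # (num_tokens_in_batch, vocab_size)
targets = torch.randint(0, vocab_size, (8,))   # ground-truth token indices

# F.cross_entropy scales each target's loss term by weight[target],
# so predictions for classes 1998/2000 contribute 100x less to the gradient.
loss = F.cross_entropy(logits, targets, weight=weight)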
I want to extend the code to panoptic segmentation and am wondering how to modify it.
Can you share the pretrained weights for evaluation?
In the original paper, the learning rate was set to 3e-3 and the weight decay to 5e-2; why do you use a learning rate of 1e-5 and weight decay of 1e-4 in the code?
Also, could you give the NLL loss at the point where the model converges, just for reference? Thanks!
I'm trying to understand the following code:
Stable-Pix2Seq/datasets/coco.py
Lines 23 to 31 in 1258730
This part also differs from the code in DETR. I'm wondering what the design principle is behind transforming two samples, since as far as I can see the collate function just concatenates them together.
Lines 268 to 271 in 1258730
Why do the return values here contain two images and two targets? Shouldn't it simply be:
if self._transforms is not None:
    img, target = self._transforms(img, target)
return img, target
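For comparison, a minimal sketch of the pattern the questioned code seems to implement — returning two independently augmented views of each sample and concatenating them in the collate function. This illustrates the idea under that assumption; it is not the repo's actual code:

import torch

class TwoViewDataset(torch.utils.data.Dataset):
    """Hypothetical sketch: __getitem__ yields two augmented views per sample."""

    def __init__(self, base, transforms):
        self.base = base
        self.transforms = transforms

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        img, target = self.base[idx]
        img1, target1 = self.transforms(img, target)  # first random augmentation
        img2, target2 = self.transforms(img, target)  # second, independent draw
        return (img1, img2), (target1, target2)

def collate_fn(batch):
    # Flatten the paired views, so a batch of B samples yields 2*B training items.
    images, targets = [], []
    for (img1, img2), (tgt1, tgt2) in batch:
        images += [img1, img2]
        targets += [tgt1, tgt2]
    return images, targets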
Thank you for your implementation!!!
Can you share the performance difference compared with the original paper, along with a checkpoint?
Line 64 in 1258730
Why is the MLP/FFN used during training only?
Hi,
Thank you for making this implementation available so quickly! How do the results of this implementation compare with the numbers reported in the paper?
Hello, may I ask whether this project is still being maintained?
I'm using V100s for my experiments but still run out of memory in the middle of the training process. Not sure what the reason could be at this moment.
Namespace(aux_loss=True, backbone='resnet50', batch_size=4, bbox_loss_coef=5, clip_max_norm=0.1, coco_panoptic_path=None, coco_path='./coco2017/', dataset_file='coco', dec_layers=6, device='cuda', dice_loss_coef=1, dilation=False, dim_feedforward=1024, dist_backend='nccl', dist_url='env://', distributed=True, dropout=0.1, enc_layers=6, eos_coef=0.1, epochs=300, eval=False, frozen_weights=None, giou_loss_coef=2, gpu=0, hidden_dim=256, lr=0.0005, lr_backbone=1e-05, lr_drop=200, mask_loss_coef=1, masks=False, nheads=8, num_queries=100, num_workers=2, output_dir='./output', position_embedding='sine', pre_norm=False, rank=0, remove_difficult=False, resume='', seed=42, set_cost_bbox=5, set_cost_class=1, set_cost_giou=2, start_epoch=0, weight_decay=0.0001, world_size=8)
Downloading: "https://download.pytorch.org/models/resnet50-19c8e357.pth" to /home/tiger/.cache/torch/hub/checkpoints/resnet50-19c8e357.pth
100%|██████████| 97.8M/97.8M [00:09<00:00, 10.3MB/s]
number of params: 36104659
loading annotations into memory...
Done (t=13.57s)
creating index...
index created!
loading annotations into memory...
Done (t=0.44s)
creating index...
index created!
Start training
Epoch: [0] [ 0/3696] eta: 2:32:25 lr: 0.000100 loss: 7.6000 (7.6000) at: 7.6000 (7.6000) at_unscaled: 7.6000 (7.6000) time: 2.4743 data: 0.5030 max mem: 14737
Epoch: [0] [ 10/3696] eta: 0:59:14 lr: 0.000100 loss: 7.5261 (7.5307) at: 7.5261 (7.5307) at_unscaled: 7.5261 (7.5307) time: 0.9643 data: 0.0806 max mem: 25656
Epoch: [0] [ 20/3696] eta: 0:56:49 lr: 0.000100 loss: 7.4746 (7.4774) at: 7.4746 (7.4774) at_unscaled: 7.4746 (7.4774) time: 0.8501 data: 0.0390 max mem: 25656
Epoch: [0] [ 30/3696] eta: 0:54:22 lr: 0.000100 loss: 7.3449 (7.4215) at: 7.3449 (7.4215) at_unscaled: 7.3449 (7.4215) time: 0.8489 data: 0.0374 max mem: 25656
Epoch: [0] [ 40/3696] eta: 0:54:59 lr: 0.000100 loss: 7.2054 (7.3429) at: 7.2054 (7.3429) at_unscaled: 7.2054 (7.3429) time: 0.8761 data: 0.0356 max mem: 25656
Epoch: [0] [ 50/3696] eta: 0:53:30 lr: 0.000100 loss: 7.0288 (7.2657) at: 7.0288 (7.2657) at_unscaled: 7.0288 (7.2657) time: 0.8662 data: 0.0362 max mem: 25656
Epoch: [0] [ 60/3696] eta: 0:53:44 lr: 0.000100 loss: 6.8423 (7.1774) at: 6.8423 (7.1774) at_unscaled: 6.8423 (7.1774) time: 0.8553 data: 0.0368 max mem: 26623
Epoch: [0] [ 70/3696] eta: 0:53:36 lr: 0.000100 loss: 6.6867 (7.0967) at: 6.6867 (7.0967) at_unscaled: 6.6867 (7.0967) time: 0.9036 data: 0.0359 max mem: 26623
Epoch: [0] [ 80/3696] eta: 0:52:42 lr: 0.000100 loss: 6.5043 (7.0184) at: 6.5043 (7.0184) at_unscaled: 6.5043 (7.0184) time: 0.8368 data: 0.0351 max mem: 26623
Epoch: [0] [ 90/3696] eta: 0:52:17 lr: 0.000100 loss: 6.4531 (6.9577) at: 6.4531 (6.9577) at_unscaled: 6.4531 (6.9577) time: 0.8094 data: 0.0362 max mem: 26623
Epoch: [0] [ 100/3696] eta: 0:51:33 lr: 0.000100 loss: 6.4151 (6.8982) at: 6.4151 (6.8982) at_unscaled: 6.4151 (6.8982) time: 0.8019 data: 0.0386 max mem: 26623
Epoch: [0] [ 110/3696] eta: 0:51:10 lr: 0.000100 loss: 6.3319 (6.8437) at: 6.3319 (6.8437) at_unscaled: 6.3319 (6.8437) time: 0.7937 data: 0.0392 max mem: 26623
Epoch: [0] [ 120/3696] eta: 0:50:56 lr: 0.000100 loss: 6.2714 (6.7969) at: 6.2714 (6.7969) at_unscaled: 6.2714 (6.7969) time: 0.8268 data: 0.0377 max mem: 26623
Epoch: [0] [ 130/3696] eta: 0:50:36 lr: 0.000100 loss: 6.2584 (6.7519) at: 6.2584 (6.7519) at_unscaled: 6.2584 (6.7519) time: 0.8254 data: 0.0372 max mem: 26623
Epoch: [0] [ 140/3696] eta: 0:50:25 lr: 0.000100 loss: 6.2035 (6.7111) at: 6.2035 (6.7111) at_unscaled: 6.2035 (6.7111) time: 0.8266 data: 0.0372 max mem: 29528
[... 118 similar log lines truncated: iterations 150–1320 continue in the same pattern, with the averaged loss falling from ~6.67 to ~5.48 and max mem steady at 29528 ...]
Epoch: [0] [1330/3696] eta: 0:31:38 lr: 0.000100 loss: 5.0905 (5.4805) at: 5.0905 (5.4805) at_unscaled: 5.0905 (5.4805) time: 0.8149 data: 0.0343 max mem: 29528
Traceback (most recent call last):
  File "main.py", line 257, in <module>
    main(args)
  File "main.py", line 207, in main
    args.clip_max_norm, learning_rate_schedule)
  File "/opt/tiger/intro/Stable-Pix2Seq/engine.py", line 98, in train_one_epoch
    losses.backward()
  File "/home/tiger/.local/lib/python3.7/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/tiger/.local/lib/python3.7/site-packages/torch/autograd/__init__.py", line 147, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 216.00 MiB (GPU 7; 31.75 GiB total capacity; 29.63 GiB already allocated; 213.75 MiB free; 29.95 GiB reserved in total by PyTorch)
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/tiger/.local/lib/python3.7/site-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/home/tiger/.local/lib/python3.7/site-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None)  # not coming back
  File "/home/tiger/.local/lib/python3.7/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-u', 'main.py', '--coco_path', './coco2017/', '--batch_size', '4', '--lr', '0.0005', '--output_dir', './output']' returned non-zero exit status 1.
Killing subprocess 5627
Killing subprocess 5628
Killing subprocess 5629
Killing subprocess 5630
Killing subprocess 5631
Killing subprocess 5632
Killing subprocess 5633
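As a generic suggestion (not a fix verified on this repo), one common way to get past this kind of mid-training OOM is to halve the per-step batch size and accumulate gradients, keeping the effective batch size unchanged. The names model, criterion, optimizer, and data_loader below mirror a DETR-style training loop but are assumptions:

import torch

accum_steps = 2
optimizer.zero_grad()
for step, (samples, targets) in enumerate(data_loader):
    outputs = model(samples, targets)
    losses = criterion(outputs, targets) / accum_steps  # scale so gradients match the larger batch
    losses.backward()
    if (step + 1) % accum_steps == 0:
        # 0.1 matches clip_max_norm from the Namespace dump above
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.1)
        optimizer.step()
        optimizer.zero_grad()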
Is there any trained model for testing?
Thank you!
How to do visualization analysis? Could you share some scripts?
Thanks
Thank you for your work; I have a question about the sequence embedding. The screenshot is from transformer.py.
When you build the sequence embedding, the position embedding has already been added to it, as follows:
Why do you then feed the same position embedding into the decoder layer? After this operation, the position embedding is added to the sequence embedding twice.
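To make the question concrete, here is a schematic of the double addition being described, assuming DETR-style decoder layers that re-add a query_pos argument to the queries/keys inside attention (names are illustrative, not the repo's exact identifiers):

# Before the decoder: position embedding added once to the sequence embedding.
tgt = token_embedding(input_seq) + pos_embed            # first addition

# Inside a DETR-style decoder layer, query_pos is added again when forming q/k:
#   q = k = tgt + query_pos                             # second addition
out = decoder_layer(tgt, memory, query_pos=pos_embed)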
I was training Stable-Pix2Seq, and everything went fine until the 3rd training epoch. I wonder if there is some accumulating operation, or some tensors or variables that should have been deleted.
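One classic cause of memory that only blows up after a few epochs, offered here as a guess rather than a diagnosis of this repo, is accumulating the loss tensor itself for logging, which keeps every step's autograd graph alive:

# Keeps the computation graph of every step alive (memory grows over time):
running_loss += losses

# Releases the graph immediately (memory stays flat):
running_loss += losses.item()    # or losses.detach()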
Can you share the command line for training panoptic segmentation? I am using the command:
python -m torch.distributed.launch --nproc_per_node=1 --use_env main.py --coco_path ... --coco_panoptic_path ... --masks
but there are some errors.
I also tried to use this code for instance segmentation with the command: python -m torch.distributed.launch --nproc_per_node=1 --use_env main.py --coco_path ... --masks
But again I encountered some errors. Does this code still not support segmentation tasks?
best regards,
Zhao
I want to extract the token embeddings, as shown in Figure 11 of the paper.
However, looking at the code, I see that the tokens are predicted by feeding the output feature map to an MLP whose last layer has dimension 2003 (presumably the number of tokens). Hence the model does not actually learn an output token embedding, and we cannot extract one.
Am I missing something?
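If the goal is only a Figure 11-style similarity plot, one quantity that does exist is the decoder's input embedding table, i.e. the nn.Embedding that maps previously generated tokens back into the model. A sketch, where the attribute path is hypothetical and would need to be located in this codebase:

import torch.nn.functional as F

# Hypothetical path; find the nn.Embedding feeding decoder inputs in this repo.
emb = model.transformer.embedding.weight.detach()  # e.g. (2003, 256)

# Cosine similarity between all token embeddings, as in the paper's Figure 11.
emb_n = F.normalize(emb, dim=-1)
similarity = emb_n @ emb_n.t()                     # (2003, 2003)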