gaopengcuhk / stable-pix2seq Goto Github PK

View Code? Open in Web Editor NEW

235.0 235.0 20.0 132 KB

A full-fledged version of Pix2Seq

License: Apache License 2.0

Python 100.00%

stable-pix2seq's People

Contributors

Stargazers

Watchers

stable-pix2seq's Issues

About the details of the code

What's the usage of weight[1998]=0.01 and weight[2000]=0.01？
I want to extend the code to the Panoptic Segmentation and wonder how to modify it

Pretrained models

Can you share the pretrained weights for evaluation?

about settings of learning rate

In the original paper, learning rate was set to 3e-3 and weight decay was set to 5e-2, why do u use the learning rate 1e-5 and weight decay 1e-4 in the code?
BTW, can u give the NLL_Loss when the model convergences, just for reference. Thanks!

Why do we want to create two samples in `get_item`

I'm trying to understand the following code:

Stable-Pix2Seq/datasets/coco.py

Lines 23 to 31 in 1258730

    
           def __getitem__(self, idx): 
        
               img, target = super(CocoDetection, self).__getitem__(idx) 
        
               image_id = self.ids[idx] 
        
               target = {'image_id': image_id, 'annotations': target} 
        
               img, target = self.prepare(img, target) 
        
               if self._transforms is not None: 
        
                   img1, target1 = self._transforms(img, target) 
        
                   img2, target2 = self._transforms(img, target) 
        
               return img1, img2, target1, target2

This part is also different from the code in DETR.
I'm wondering what's the design principle of transforming two samples.

As I can see that the collate function actually just concatenates them together

Stable-Pix2Seq/util/misc.py

Lines 268 to 271 in 1258730

    
           def collate_fn(batch): 
        
               batch = list(zip(*batch)) 
        
               batch[0] = nested_tensor_from_tensor_list(batch[0] + batch[1]) 
        
               return tuple([batch[0], batch[2] + batch[3]])

Return values in the coco.py __get_item__ method

Why the return values here contain two images and targets? Should not it be simply.
if self._transforms is not None: img, target = self._transforms(img, target) return img, target

hello, Can you explain what is the causal mask in the paper?I am quite confused,thanks.

Performance Difference Compared With Original Paper？

Thank you for your implementation!!!
Can you share Performance Difference Compared With Original Paper and checkpoint?

Why use MLP/FFN during training only ?

Stable-Pix2Seq/models/detr.py

Line 64 in 1258730

if self.training:

Why use MLP/FFN during training only ?

Performance compared to paper

Hi ,

Thank you for making this implementation so fast ! How is the difference of this implementation with the reported numbers in the paper ?

您好，还在维护这个项目吗

您好，请问还在维护这个项目吗

CUDA Out-of-memory using V100

I'm using V100 for experiments, but still out of memory in the middle of the training process. Not sure what would be the reason at this momnet

Namespace(aux_loss=True, backbone='resnet50', batch_size=4, bbox_loss_coef=5, clip_max_norm=0.1, coco_panoptic_path=None, coco_path='./coco2017/', dataset_file='coco', dec_layers=6, device='cuda', dice_loss_coef=1, dilation=False, dim_feedforward=1024, dist_backend='nccl', dist_url='env://', distributed=True, dropout=0.1, enc_layers=6, eos_coef=0.1, epochs=300, eval=False, frozen_weights=None, giou_loss_coef=2, gpu=0, hidden_dim=256, lr=0.0005, lr_backbone=1e-05, lr_drop=200, mask_loss_coef=1, masks=False, nheads=8, num_queries=100, num_workers=2, output_dir='./output', position_embedding='sine', pre_norm=False, rank=0, remove_difficult=False, resume='', seed=42, set_cost_bbox=5, set_cost_class=1, set_cost_giou=2, start_epoch=0, weight_decay=0.0001, world_size=8)
Downloading: "https://download.pytorch.org/models/resnet50-19c8e357.pth" to /home/tiger/.cache/torch/hub/checkpoints/resnet50-19c8e357.pth
100%|██████████| 97.8M/97.8M [00:09<00:00, 10.3MB/s]
number of params: 36104659
loading annotations into memory...
Done (t=13.57s)
creating index...
index created!
loading annotations into memory...
Done (t=0.44s)
creating index...
index created!
Start training
Epoch: [0]  [   0/3696]  eta: 2:32:25  lr: 0.000100  loss: 7.6000 (7.6000)  at: 7.6000 (7.6000)  at_unscaled: 7.6000 (7.6000)  time: 2.4743  data: 0.5030  max mem: 14737
Epoch: [0]  [  10/3696]  eta: 0:59:14  lr: 0.000100  loss: 7.5261 (7.5307)  at: 7.5261 (7.5307)  at_unscaled: 7.5261 (7.5307)  time: 0.9643  data: 0.0806  max mem: 25656
Epoch: [0]  [  20/3696]  eta: 0:56:49  lr: 0.000100  loss: 7.4746 (7.4774)  at: 7.4746 (7.4774)  at_unscaled: 7.4746 (7.4774)  time: 0.8501  data: 0.0390  max mem: 25656
Epoch: [0]  [  30/3696]  eta: 0:54:22  lr: 0.000100  loss: 7.3449 (7.4215)  at: 7.3449 (7.4215)  at_unscaled: 7.3449 (7.4215)  time: 0.8489  data: 0.0374  max mem: 25656
Epoch: [0]  [  40/3696]  eta: 0:54:59  lr: 0.000100  loss: 7.2054 (7.3429)  at: 7.2054 (7.3429)  at_unscaled: 7.2054 (7.3429)  time: 0.8761  data: 0.0356  max mem: 25656
Epoch: [0]  [  50/3696]  eta: 0:53:30  lr: 0.000100  loss: 7.0288 (7.2657)  at: 7.0288 (7.2657)  at_unscaled: 7.0288 (7.2657)  time: 0.8662  data: 0.0362  max mem: 25656
Epoch: [0]  [  60/3696]  eta: 0:53:44  lr: 0.000100  loss: 6.8423 (7.1774)  at: 6.8423 (7.1774)  at_unscaled: 6.8423 (7.1774)  time: 0.8553  data: 0.0368  max mem: 26623
Epoch: [0]  [  70/3696]  eta: 0:53:36  lr: 0.000100  loss: 6.6867 (7.0967)  at: 6.6867 (7.0967)  at_unscaled: 6.6867 (7.0967)  time: 0.9036  data: 0.0359  max mem: 26623
Epoch: [0]  [  80/3696]  eta: 0:52:42  lr: 0.000100  loss: 6.5043 (7.0184)  at: 6.5043 (7.0184)  at_unscaled: 6.5043 (7.0184)  time: 0.8368  data: 0.0351  max mem: 26623
Epoch: [0]  [  90/3696]  eta: 0:52:17  lr: 0.000100  loss: 6.4531 (6.9577)  at: 6.4531 (6.9577)  at_unscaled: 6.4531 (6.9577)  time: 0.8094  data: 0.0362  max mem: 26623
Epoch: [0]  [ 100/3696]  eta: 0:51:33  lr: 0.000100  loss: 6.4151 (6.8982)  at: 6.4151 (6.8982)  at_unscaled: 6.4151 (6.8982)  time: 0.8019  data: 0.0386  max mem: 26623
Epoch: [0]  [ 110/3696]  eta: 0:51:10  lr: 0.000100  loss: 6.3319 (6.8437)  at: 6.3319 (6.8437)  at_unscaled: 6.3319 (6.8437)  time: 0.7937  data: 0.0392  max mem: 26623
Epoch: [0]  [ 120/3696]  eta: 0:50:56  lr: 0.000100  loss: 6.2714 (6.7969)  at: 6.2714 (6.7969)  at_unscaled: 6.2714 (6.7969)  time: 0.8268  data: 0.0377  max mem: 26623
Epoch: [0]  [ 130/3696]  eta: 0:50:36  lr: 0.000100  loss: 6.2584 (6.7519)  at: 6.2584 (6.7519)  at_unscaled: 6.2584 (6.7519)  time: 0.8254  data: 0.0372  max mem: 26623
Epoch: [0]  [ 140/3696]  eta: 0:50:25  lr: 0.000100  loss: 6.2035 (6.7111)  at: 6.2035 (6.7111)  at_unscaled: 6.2035 (6.7111)  time: 0.8266  data: 0.0372  max mem: 29528
Epoch: [0]  [ 150/3696]  eta: 0:49:55  lr: 0.000100  loss: 6.1476 (6.6716)  at: 6.1476 (6.6716)  at_unscaled: 6.1476 (6.6716)  time: 0.8011  data: 0.0375  max mem: 29528
Epoch: [0]  [ 160/3696]  eta: 0:49:27  lr: 0.000100  loss: 6.0711 (6.6330)  at: 6.0711 (6.6330)  at_unscaled: 6.0711 (6.6330)  time: 0.7585  data: 0.0372  max mem: 29528
Epoch: [0]  [ 170/3696]  eta: 0:49:10  lr: 0.000100  loss: 6.0247 (6.5969)  at: 6.0247 (6.5969)  at_unscaled: 6.0247 (6.5969)  time: 0.7769  data: 0.0358  max mem: 29528
Epoch: [0]  [ 180/3696]  eta: 0:49:27  lr: 0.000100  loss: 5.9822 (6.5631)  at: 5.9822 (6.5631)  at_unscaled: 5.9822 (6.5631)  time: 0.8812  data: 0.0361  max mem: 29528
Epoch: [0]  [ 190/3696]  eta: 0:49:06  lr: 0.000100  loss: 5.9351 (6.5278)  at: 5.9351 (6.5278)  at_unscaled: 5.9351 (6.5278)  time: 0.8712  data: 0.0371  max mem: 29528
Epoch: [0]  [ 200/3696]  eta: 0:48:45  lr: 0.000100  loss: 5.8904 (6.4953)  at: 5.8904 (6.4953)  at_unscaled: 5.8904 (6.4953)  time: 0.7744  data: 0.0355  max mem: 29528
Epoch: [0]  [ 210/3696]  eta: 0:48:35  lr: 0.000100  loss: 5.8645 (6.4635)  at: 5.8645 (6.4635)  at_unscaled: 5.8645 (6.4635)  time: 0.7968  data: 0.0348  max mem: 29528
Epoch: [0]  [ 220/3696]  eta: 0:48:17  lr: 0.000100  loss: 5.8032 (6.4343)  at: 5.8032 (6.4343)  at_unscaled: 5.8032 (6.4343)  time: 0.7998  data: 0.0354  max mem: 29528
Epoch: [0]  [ 230/3696]  eta: 0:47:58  lr: 0.000100  loss: 5.7949 (6.4067)  at: 5.7949 (6.4067)  at_unscaled: 5.7949 (6.4067)  time: 0.7687  data: 0.0362  max mem: 29528
Epoch: [0]  [ 240/3696]  eta: 0:47:45  lr: 0.000100  loss: 5.7568 (6.3776)  at: 5.7568 (6.3776)  at_unscaled: 5.7568 (6.3776)  time: 0.7808  data: 0.0371  max mem: 29528
Epoch: [0]  [ 250/3696]  eta: 0:47:30  lr: 0.000100  loss: 5.7063 (6.3502)  at: 5.7063 (6.3502)  at_unscaled: 5.7063 (6.3502)  time: 0.7889  data: 0.0366  max mem: 29528
Epoch: [0]  [ 260/3696]  eta: 0:47:11  lr: 0.000100  loss: 5.6821 (6.3225)  at: 5.6821 (6.3225)  at_unscaled: 5.6821 (6.3225)  time: 0.7617  data: 0.0362  max mem: 29528
Epoch: [0]  [ 270/3696]  eta: 0:47:00  lr: 0.000100  loss: 5.6091 (6.2965)  at: 5.6091 (6.2965)  at_unscaled: 5.6091 (6.2965)  time: 0.7725  data: 0.0366  max mem: 29528
Epoch: [0]  [ 280/3696]  eta: 0:46:48  lr: 0.000100  loss: 5.6024 (6.2713)  at: 5.6024 (6.2713)  at_unscaled: 5.6024 (6.2713)  time: 0.7982  data: 0.0366  max mem: 29528
Epoch: [0]  [ 290/3696]  eta: 0:46:48  lr: 0.000100  loss: 5.5578 (6.2455)  at: 5.5578 (6.2455)  at_unscaled: 5.5578 (6.2455)  time: 0.8433  data: 0.0370  max mem: 29528
Epoch: [0]  [ 300/3696]  eta: 0:46:36  lr: 0.000100  loss: 5.5396 (6.2221)  at: 5.5396 (6.2221)  at_unscaled: 5.5396 (6.2221)  time: 0.8398  data: 0.0373  max mem: 29528
Epoch: [0]  [ 310/3696]  eta: 0:46:23  lr: 0.000100  loss: 5.5059 (6.1994)  at: 5.5059 (6.1994)  at_unscaled: 5.5059 (6.1994)  time: 0.7842  data: 0.0374  max mem: 29528
Epoch: [0]  [ 320/3696]  eta: 0:46:12  lr: 0.000100  loss: 5.4888 (6.1767)  at: 5.4888 (6.1767)  at_unscaled: 5.4888 (6.1767)  time: 0.7882  data: 0.0370  max mem: 29528
Epoch: [0]  [ 330/3696]  eta: 0:45:58  lr: 0.000100  loss: 5.4756 (6.1560)  at: 5.4756 (6.1560)  at_unscaled: 5.4756 (6.1560)  time: 0.7820  data: 0.0365  max mem: 29528
Epoch: [0]  [ 340/3696]  eta: 0:45:49  lr: 0.000100  loss: 5.4458 (6.1354)  at: 5.4458 (6.1354)  at_unscaled: 5.4458 (6.1354)  time: 0.7886  data: 0.0363  max mem: 29528
Epoch: [0]  [ 350/3696]  eta: 0:45:42  lr: 0.000100  loss: 5.4504 (6.1157)  at: 5.4504 (6.1157)  at_unscaled: 5.4504 (6.1157)  time: 0.8230  data: 0.0364  max mem: 29528
Epoch: [0]  [ 360/3696]  eta: 0:45:34  lr: 0.000100  loss: 5.4683 (6.0973)  at: 5.4683 (6.0973)  at_unscaled: 5.4683 (6.0973)  time: 0.8292  data: 0.0370  max mem: 29528
Epoch: [0]  [ 370/3696]  eta: 0:45:30  lr: 0.000100  loss: 5.4665 (6.0802)  at: 5.4665 (6.0802)  at_unscaled: 5.4665 (6.0802)  time: 0.8410  data: 0.0357  max mem: 29528
Epoch: [0]  [ 380/3696]  eta: 0:45:22  lr: 0.000100  loss: 5.4943 (6.0647)  at: 5.4943 (6.0647)  at_unscaled: 5.4943 (6.0647)  time: 0.8443  data: 0.0360  max mem: 29528
Epoch: [0]  [ 390/3696]  eta: 0:45:13  lr: 0.000100  loss: 5.4801 (6.0489)  at: 5.4801 (6.0489)  at_unscaled: 5.4801 (6.0489)  time: 0.8209  data: 0.0371  max mem: 29528
Epoch: [0]  [ 400/3696]  eta: 0:45:14  lr: 0.000100  loss: 5.4442 (6.0338)  at: 5.4442 (6.0338)  at_unscaled: 5.4442 (6.0338)  time: 0.8706  data: 0.0372  max mem: 29528
Epoch: [0]  [ 410/3696]  eta: 0:45:03  lr: 0.000100  loss: 5.4351 (6.0182)  at: 5.4351 (6.0182)  at_unscaled: 5.4351 (6.0182)  time: 0.8613  data: 0.0376  max mem: 29528
Epoch: [0]  [ 420/3696]  eta: 0:44:50  lr: 0.000100  loss: 5.3845 (6.0028)  at: 5.3845 (6.0028)  at_unscaled: 5.3845 (6.0028)  time: 0.7759  data: 0.0373  max mem: 29528
Epoch: [0]  [ 430/3696]  eta: 0:45:03  lr: 0.000100  loss: 5.3922 (5.9884)  at: 5.3922 (5.9884)  at_unscaled: 5.3922 (5.9884)  time: 0.9318  data: 0.0361  max mem: 29528
Epoch: [0]  [ 440/3696]  eta: 0:44:50  lr: 0.000100  loss: 5.4115 (5.9759)  at: 5.4115 (5.9759)  at_unscaled: 5.4115 (5.9759)  time: 0.9331  data: 0.0361  max mem: 29528
Epoch: [0]  [ 450/3696]  eta: 0:44:43  lr: 0.000100  loss: 5.4180 (5.9631)  at: 5.4180 (5.9631)  at_unscaled: 5.4180 (5.9631)  time: 0.8017  data: 0.0359  max mem: 29528
Epoch: [0]  [ 460/3696]  eta: 0:44:29  lr: 0.000100  loss: 5.3881 (5.9501)  at: 5.3881 (5.9501)  at_unscaled: 5.3881 (5.9501)  time: 0.7948  data: 0.0355  max mem: 29528
Epoch: [0]  [ 470/3696]  eta: 0:44:18  lr: 0.000100  loss: 5.3906 (5.9391)  at: 5.3906 (5.9391)  at_unscaled: 5.3906 (5.9391)  time: 0.7668  data: 0.0371  max mem: 29528
Epoch: [0]  [ 480/3696]  eta: 0:44:10  lr: 0.000100  loss: 5.3906 (5.9277)  at: 5.3906 (5.9277)  at_unscaled: 5.3906 (5.9277)  time: 0.8013  data: 0.0390  max mem: 29528
Epoch: [0]  [ 490/3696]  eta: 0:44:03  lr: 0.000100  loss: 5.4143 (5.9179)  at: 5.4143 (5.9179)  at_unscaled: 5.4143 (5.9179)  time: 0.8300  data: 0.0391  max mem: 29528
Epoch: [0]  [ 500/3696]  eta: 0:43:54  lr: 0.000100  loss: 5.4093 (5.9075)  at: 5.4093 (5.9075)  at_unscaled: 5.4093 (5.9075)  time: 0.8303  data: 0.0378  max mem: 29528
Epoch: [0]  [ 510/3696]  eta: 0:43:43  lr: 0.000100  loss: 5.3890 (5.8972)  at: 5.3890 (5.8972)  at_unscaled: 5.3890 (5.8972)  time: 0.7958  data: 0.0367  max mem: 29528
Epoch: [0]  [ 520/3696]  eta: 0:43:31  lr: 0.000100  loss: 5.3959 (5.8872)  at: 5.3959 (5.8872)  at_unscaled: 5.3959 (5.8872)  time: 0.7730  data: 0.0355  max mem: 29528
Epoch: [0]  [ 530/3696]  eta: 0:43:22  lr: 0.000100  loss: 5.3743 (5.8775)  at: 5.3743 (5.8775)  at_unscaled: 5.3743 (5.8775)  time: 0.7915  data: 0.0358  max mem: 29528
Epoch: [0]  [ 540/3696]  eta: 0:43:12  lr: 0.000100  loss: 5.3725 (5.8675)  at: 5.3725 (5.8675)  at_unscaled: 5.3725 (5.8675)  time: 0.8013  data: 0.0355  max mem: 29528
Epoch: [0]  [ 550/3696]  eta: 0:43:02  lr: 0.000100  loss: 5.3403 (5.8580)  at: 5.3403 (5.8580)  at_unscaled: 5.3403 (5.8580)  time: 0.7922  data: 0.0349  max mem: 29528
Epoch: [0]  [ 560/3696]  eta: 0:42:52  lr: 0.000100  loss: 5.3460 (5.8494)  at: 5.3460 (5.8494)  at_unscaled: 5.3460 (5.8494)  time: 0.7893  data: 0.0355  max mem: 29528
Epoch: [0]  [ 570/3696]  eta: 0:42:43  lr: 0.000100  loss: 5.3509 (5.8408)  at: 5.3509 (5.8408)  at_unscaled: 5.3509 (5.8408)  time: 0.7901  data: 0.0359  max mem: 29528
Epoch: [0]  [ 580/3696]  eta: 0:42:31  lr: 0.000100  loss: 5.3509 (5.8328)  at: 5.3509 (5.8328)  at_unscaled: 5.3509 (5.8328)  time: 0.7762  data: 0.0358  max mem: 29528
Epoch: [0]  [ 590/3696]  eta: 0:42:22  lr: 0.000100  loss: 5.3572 (5.8243)  at: 5.3572 (5.8243)  at_unscaled: 5.3572 (5.8243)  time: 0.7785  data: 0.0351  max mem: 29528
Epoch: [0]  [ 600/3696]  eta: 0:42:11  lr: 0.000100  loss: 5.3541 (5.8163)  at: 5.3541 (5.8163)  at_unscaled: 5.3541 (5.8163)  time: 0.7857  data: 0.0343  max mem: 29528
Epoch: [0]  [ 610/3696]  eta: 0:41:59  lr: 0.000100  loss: 5.3445 (5.8085)  at: 5.3445 (5.8085)  at_unscaled: 5.3445 (5.8085)  time: 0.7585  data: 0.0351  max mem: 29528
Epoch: [0]  [ 620/3696]  eta: 0:41:54  lr: 0.000100  loss: 5.3499 (5.8015)  at: 5.3499 (5.8015)  at_unscaled: 5.3499 (5.8015)  time: 0.8055  data: 0.0354  max mem: 29528
Epoch: [0]  [ 630/3696]  eta: 0:41:42  lr: 0.000100  loss: 5.3499 (5.7940)  at: 5.3499 (5.7940)  at_unscaled: 5.3499 (5.7940)  time: 0.8031  data: 0.0343  max mem: 29528
Epoch: [0]  [ 640/3696]  eta: 0:41:31  lr: 0.000100  loss: 5.3273 (5.7865)  at: 5.3273 (5.7865)  at_unscaled: 5.3273 (5.7865)  time: 0.7553  data: 0.0356  max mem: 29528
Epoch: [0]  [ 650/3696]  eta: 0:41:22  lr: 0.000100  loss: 5.3314 (5.7792)  at: 5.3314 (5.7792)  at_unscaled: 5.3314 (5.7792)  time: 0.7825  data: 0.0378  max mem: 29528
Epoch: [0]  [ 660/3696]  eta: 0:41:16  lr: 0.000100  loss: 5.3259 (5.7719)  at: 5.3259 (5.7719)  at_unscaled: 5.3259 (5.7719)  time: 0.8199  data: 0.0371  max mem: 29528
Epoch: [0]  [ 670/3696]  eta: 0:41:06  lr: 0.000100  loss: 5.2930 (5.7651)  at: 5.2930 (5.7651)  at_unscaled: 5.2930 (5.7651)  time: 0.8170  data: 0.0351  max mem: 29528
Epoch: [0]  [ 680/3696]  eta: 0:40:57  lr: 0.000100  loss: 5.2930 (5.7582)  at: 5.2930 (5.7582)  at_unscaled: 5.2930 (5.7582)  time: 0.7851  data: 0.0354  max mem: 29528
Epoch: [0]  [ 690/3696]  eta: 0:40:49  lr: 0.000100  loss: 5.2727 (5.7514)  at: 5.2727 (5.7514)  at_unscaled: 5.2727 (5.7514)  time: 0.8068  data: 0.0353  max mem: 29528
Epoch: [0]  [ 700/3696]  eta: 0:40:41  lr: 0.000100  loss: 5.2917 (5.7451)  at: 5.2917 (5.7451)  at_unscaled: 5.2917 (5.7451)  time: 0.8184  data: 0.0348  max mem: 29528
Epoch: [0]  [ 710/3696]  eta: 0:40:31  lr: 0.000100  loss: 5.2949 (5.7387)  at: 5.2949 (5.7387)  at_unscaled: 5.2949 (5.7387)  time: 0.7904  data: 0.0358  max mem: 29528
Epoch: [0]  [ 720/3696]  eta: 0:40:21  lr: 0.000100  loss: 5.2874 (5.7325)  at: 5.2874 (5.7325)  at_unscaled: 5.2874 (5.7325)  time: 0.7719  data: 0.0376  max mem: 29528
Epoch: [0]  [ 730/3696]  eta: 0:40:10  lr: 0.000100  loss: 5.2801 (5.7262)  at: 5.2801 (5.7262)  at_unscaled: 5.2801 (5.7262)  time: 0.7581  data: 0.0372  max mem: 29528
Epoch: [0]  [ 740/3696]  eta: 0:40:02  lr: 0.000100  loss: 5.2634 (5.7196)  at: 5.2634 (5.7196)  at_unscaled: 5.2634 (5.7196)  time: 0.7769  data: 0.0357  max mem: 29528
Epoch: [0]  [ 750/3696]  eta: 0:39:53  lr: 0.000100  loss: 5.2367 (5.7135)  at: 5.2367 (5.7135)  at_unscaled: 5.2367 (5.7135)  time: 0.8039  data: 0.0365  max mem: 29528
Epoch: [0]  [ 760/3696]  eta: 0:39:43  lr: 0.000100  loss: 5.2874 (5.7082)  at: 5.2874 (5.7082)  at_unscaled: 5.2874 (5.7082)  time: 0.7800  data: 0.0367  max mem: 29528
Epoch: [0]  [ 770/3696]  eta: 0:39:33  lr: 0.000100  loss: 5.2954 (5.7024)  at: 5.2954 (5.7024)  at_unscaled: 5.2954 (5.7024)  time: 0.7681  data: 0.0356  max mem: 29528
Epoch: [0]  [ 780/3696]  eta: 0:39:23  lr: 0.000100  loss: 5.3127 (5.6975)  at: 5.3127 (5.6975)  at_unscaled: 5.3127 (5.6975)  time: 0.7632  data: 0.0361  max mem: 29528
Epoch: [0]  [ 790/3696]  eta: 0:39:14  lr: 0.000100  loss: 5.3130 (5.6919)  at: 5.3130 (5.6919)  at_unscaled: 5.3130 (5.6919)  time: 0.7715  data: 0.0359  max mem: 29528
Epoch: [0]  [ 800/3696]  eta: 0:39:06  lr: 0.000100  loss: 5.2498 (5.6860)  at: 5.2498 (5.6860)  at_unscaled: 5.2498 (5.6860)  time: 0.7954  data: 0.0369  max mem: 29528
Epoch: [0]  [ 810/3696]  eta: 0:38:58  lr: 0.000100  loss: 5.2336 (5.6804)  at: 5.2336 (5.6804)  at_unscaled: 5.2336 (5.6804)  time: 0.8095  data: 0.0380  max mem: 29528
Epoch: [0]  [ 820/3696]  eta: 0:38:50  lr: 0.000100  loss: 5.2354 (5.6755)  at: 5.2354 (5.6755)  at_unscaled: 5.2354 (5.6755)  time: 0.8130  data: 0.0356  max mem: 29528
Epoch: [0]  [ 830/3696]  eta: 0:38:39  lr: 0.000100  loss: 5.2691 (5.6704)  at: 5.2691 (5.6704)  at_unscaled: 5.2691 (5.6704)  time: 0.7757  data: 0.0355  max mem: 29528
Epoch: [0]  [ 840/3696]  eta: 0:38:31  lr: 0.000100  loss: 5.2588 (5.6653)  at: 5.2588 (5.6653)  at_unscaled: 5.2588 (5.6653)  time: 0.7692  data: 0.0369  max mem: 29528
Epoch: [0]  [ 850/3696]  eta: 0:38:23  lr: 0.000100  loss: 5.2564 (5.6606)  at: 5.2564 (5.6606)  at_unscaled: 5.2564 (5.6606)  time: 0.8133  data: 0.0363  max mem: 29528
Epoch: [0]  [ 860/3696]  eta: 0:38:15  lr: 0.000100  loss: 5.2448 (5.6556)  at: 5.2448 (5.6556)  at_unscaled: 5.2448 (5.6556)  time: 0.8129  data: 0.0352  max mem: 29528
Epoch: [0]  [ 870/3696]  eta: 0:38:05  lr: 0.000100  loss: 5.2326 (5.6506)  at: 5.2326 (5.6506)  at_unscaled: 5.2326 (5.6506)  time: 0.7795  data: 0.0351  max mem: 29528
Epoch: [0]  [ 880/3696]  eta: 0:37:56  lr: 0.000100  loss: 5.2049 (5.6456)  at: 5.2049 (5.6456)  at_unscaled: 5.2049 (5.6456)  time: 0.7750  data: 0.0364  max mem: 29528
Epoch: [0]  [ 890/3696]  eta: 0:37:47  lr: 0.000100  loss: 5.2049 (5.6407)  at: 5.2049 (5.6407)  at_unscaled: 5.2049 (5.6407)  time: 0.7812  data: 0.0367  max mem: 29528
Epoch: [0]  [ 900/3696]  eta: 0:37:37  lr: 0.000100  loss: 5.1690 (5.6354)  at: 5.1690 (5.6354)  at_unscaled: 5.1690 (5.6354)  time: 0.7607  data: 0.0348  max mem: 29528
Epoch: [0]  [ 910/3696]  eta: 0:37:31  lr: 0.000100  loss: 5.1836 (5.6309)  at: 5.1836 (5.6309)  at_unscaled: 5.1836 (5.6309)  time: 0.8035  data: 0.0355  max mem: 29528
Epoch: [0]  [ 920/3696]  eta: 0:37:22  lr: 0.000100  loss: 5.2129 (5.6261)  at: 5.2129 (5.6261)  at_unscaled: 5.2129 (5.6261)  time: 0.8221  data: 0.0381  max mem: 29528
Epoch: [0]  [ 930/3696]  eta: 0:37:13  lr: 0.000100  loss: 5.1586 (5.6210)  at: 5.1586 (5.6210)  at_unscaled: 5.1586 (5.6210)  time: 0.7758  data: 0.0377  max mem: 29528
Epoch: [0]  [ 940/3696]  eta: 0:37:05  lr: 0.000100  loss: 5.1586 (5.6162)  at: 5.1586 (5.6162)  at_unscaled: 5.1586 (5.6162)  time: 0.7975  data: 0.0355  max mem: 29528
Epoch: [0]  [ 950/3696]  eta: 0:36:56  lr: 0.000100  loss: 5.1713 (5.6120)  at: 5.1713 (5.6120)  at_unscaled: 5.1713 (5.6120)  time: 0.7970  data: 0.0358  max mem: 29528
Epoch: [0]  [ 960/3696]  eta: 0:36:47  lr: 0.000100  loss: 5.1839 (5.6077)  at: 5.1839 (5.6077)  at_unscaled: 5.1839 (5.6077)  time: 0.7714  data: 0.0367  max mem: 29528
Epoch: [0]  [ 970/3696]  eta: 0:36:38  lr: 0.000100  loss: 5.1800 (5.6036)  at: 5.1800 (5.6036)  at_unscaled: 5.1800 (5.6036)  time: 0.7812  data: 0.0363  max mem: 29528
Epoch: [0]  [ 980/3696]  eta: 0:36:30  lr: 0.000100  loss: 5.2028 (5.5995)  at: 5.2028 (5.5995)  at_unscaled: 5.2028 (5.5995)  time: 0.7996  data: 0.0349  max mem: 29528
Epoch: [0]  [ 990/3696]  eta: 0:36:23  lr: 0.000100  loss: 5.2028 (5.5954)  at: 5.2028 (5.5954)  at_unscaled: 5.2028 (5.5954)  time: 0.8110  data: 0.0353  max mem: 29528
Epoch: [0]  [1000/3696]  eta: 0:36:14  lr: 0.000100  loss: 5.1880 (5.5914)  at: 5.1880 (5.5914)  at_unscaled: 5.1880 (5.5914)  time: 0.7950  data: 0.0369  max mem: 29528
Epoch: [0]  [1010/3696]  eta: 0:36:04  lr: 0.000100  loss: 5.1773 (5.5870)  at: 5.1773 (5.5870)  at_unscaled: 5.1773 (5.5870)  time: 0.7645  data: 0.0368  max mem: 29528
Epoch: [0]  [1020/3696]  eta: 0:35:57  lr: 0.000100  loss: 5.2493 (5.5836)  at: 5.2493 (5.5836)  at_unscaled: 5.2493 (5.5836)  time: 0.7915  data: 0.0360  max mem: 29528
Epoch: [0]  [1030/3696]  eta: 0:35:49  lr: 0.000100  loss: 5.1982 (5.5793)  at: 5.1982 (5.5793)  at_unscaled: 5.1982 (5.5793)  time: 0.8164  data: 0.0363  max mem: 29528
Epoch: [0]  [1040/3696]  eta: 0:35:41  lr: 0.000100  loss: 5.1446 (5.5754)  at: 5.1446 (5.5754)  at_unscaled: 5.1446 (5.5754)  time: 0.8053  data: 0.0375  max mem: 29528
Epoch: [0]  [1050/3696]  eta: 0:35:31  lr: 0.000100  loss: 5.1319 (5.5714)  at: 5.1319 (5.5714)  at_unscaled: 5.1319 (5.5714)  time: 0.7766  data: 0.0359  max mem: 29528
Epoch: [0]  [1060/3696]  eta: 0:35:22  lr: 0.000100  loss: 5.2017 (5.5679)  at: 5.2017 (5.5679)  at_unscaled: 5.2017 (5.5679)  time: 0.7481  data: 0.0365  max mem: 29528
Epoch: [0]  [1070/3696]  eta: 0:35:13  lr: 0.000100  loss: 5.2017 (5.5642)  at: 5.2017 (5.5642)  at_unscaled: 5.2017 (5.5642)  time: 0.7754  data: 0.0387  max mem: 29528
Epoch: [0]  [1080/3696]  eta: 0:35:03  lr: 0.000100  loss: 5.1192 (5.5603)  at: 5.1192 (5.5603)  at_unscaled: 5.1192 (5.5603)  time: 0.7605  data: 0.0383  max mem: 29528
Epoch: [0]  [1090/3696]  eta: 0:34:56  lr: 0.000100  loss: 5.1105 (5.5560)  at: 5.1105 (5.5560)  at_unscaled: 5.1105 (5.5560)  time: 0.7700  data: 0.0379  max mem: 29528
Epoch: [0]  [1100/3696]  eta: 0:34:47  lr: 0.000100  loss: 5.1321 (5.5524)  at: 5.1321 (5.5524)  at_unscaled: 5.1321 (5.5524)  time: 0.8007  data: 0.0380  max mem: 29528
Epoch: [0]  [1110/3696]  eta: 0:34:39  lr: 0.000100  loss: 5.1603 (5.5489)  at: 5.1603 (5.5489)  at_unscaled: 5.1603 (5.5489)  time: 0.7850  data: 0.0382  max mem: 29528
Epoch: [0]  [1120/3696]  eta: 0:34:30  lr: 0.000100  loss: 5.1443 (5.5452)  at: 5.1443 (5.5452)  at_unscaled: 5.1443 (5.5452)  time: 0.7765  data: 0.0383  max mem: 29528
Epoch: [0]  [1130/3696]  eta: 0:34:21  lr: 0.000100  loss: 5.1185 (5.5413)  at: 5.1185 (5.5413)  at_unscaled: 5.1185 (5.5413)  time: 0.7790  data: 0.0372  max mem: 29528
Epoch: [0]  [1140/3696]  eta: 0:34:13  lr: 0.000100  loss: 5.0800 (5.5374)  at: 5.0800 (5.5374)  at_unscaled: 5.0800 (5.5374)  time: 0.7986  data: 0.0356  max mem: 29528
Epoch: [0]  [1150/3696]  eta: 0:34:04  lr: 0.000100  loss: 5.1101 (5.5337)  at: 5.1101 (5.5337)  at_unscaled: 5.1101 (5.5337)  time: 0.7654  data: 0.0345  max mem: 29528
Epoch: [0]  [1160/3696]  eta: 0:33:56  lr: 0.000100  loss: 5.1744 (5.5307)  at: 5.1744 (5.5307)  at_unscaled: 5.1744 (5.5307)  time: 0.7695  data: 0.0344  max mem: 29528
Epoch: [0]  [1170/3696]  eta: 0:33:47  lr: 0.000100  loss: 5.1829 (5.5277)  at: 5.1829 (5.5277)  at_unscaled: 5.1829 (5.5277)  time: 0.7968  data: 0.0362  max mem: 29528
Epoch: [0]  [1180/3696]  eta: 0:33:40  lr: 0.000100  loss: 5.1845 (5.5246)  at: 5.1845 (5.5246)  at_unscaled: 5.1845 (5.5246)  time: 0.8120  data: 0.0374  max mem: 29528
Epoch: [0]  [1190/3696]  eta: 0:33:32  lr: 0.000100  loss: 5.1798 (5.5216)  at: 5.1798 (5.5216)  at_unscaled: 5.1798 (5.5216)  time: 0.8169  data: 0.0371  max mem: 29528
Epoch: [0]  [1200/3696]  eta: 0:33:23  lr: 0.000100  loss: 5.1929 (5.5188)  at: 5.1929 (5.5188)  at_unscaled: 5.1929 (5.5188)  time: 0.7739  data: 0.0361  max mem: 29528
Epoch: [0]  [1210/3696]  eta: 0:33:16  lr: 0.000100  loss: 5.1929 (5.5158)  at: 5.1929 (5.5158)  at_unscaled: 5.1929 (5.5158)  time: 0.7985  data: 0.0340  max mem: 29528
Epoch: [0]  [1220/3696]  eta: 0:33:07  lr: 0.000100  loss: 5.1322 (5.5126)  at: 5.1322 (5.5126)  at_unscaled: 5.1322 (5.5126)  time: 0.8027  data: 0.0350  max mem: 29528
Epoch: [0]  [1230/3696]  eta: 0:32:59  lr: 0.000100  loss: 5.1595 (5.5096)  at: 5.1595 (5.5096)  at_unscaled: 5.1595 (5.5096)  time: 0.7881  data: 0.0374  max mem: 29528
Epoch: [0]  [1240/3696]  eta: 0:32:50  lr: 0.000100  loss: 5.1620 (5.5067)  at: 5.1620 (5.5067)  at_unscaled: 5.1620 (5.5067)  time: 0.7849  data: 0.0365  max mem: 29528
Epoch: [0]  [1250/3696]  eta: 0:32:42  lr: 0.000100  loss: 5.1620 (5.5038)  at: 5.1620 (5.5038)  at_unscaled: 5.1620 (5.5038)  time: 0.7893  data: 0.0357  max mem: 29528
Epoch: [0]  [1260/3696]  eta: 0:32:34  lr: 0.000100  loss: 5.1245 (5.5005)  at: 5.1245 (5.5005)  at_unscaled: 5.1245 (5.5005)  time: 0.8002  data: 0.0359  max mem: 29528
Epoch: [0]  [1270/3696]  eta: 0:32:26  lr: 0.000100  loss: 5.1023 (5.4975)  at: 5.1023 (5.4975)  at_unscaled: 5.1023 (5.4975)  time: 0.8015  data: 0.0362  max mem: 29528
Epoch: [0]  [1280/3696]  eta: 0:32:17  lr: 0.000100  loss: 5.1132 (5.4946)  at: 5.1132 (5.4946)  at_unscaled: 5.1132 (5.4946)  time: 0.7906  data: 0.0349  max mem: 29528
Epoch: [0]  [1290/3696]  eta: 0:32:09  lr: 0.000100  loss: 5.1292 (5.4918)  at: 5.1292 (5.4918)  at_unscaled: 5.1292 (5.4918)  time: 0.7743  data: 0.0334  max mem: 29528
Epoch: [0]  [1300/3696]  eta: 0:32:01  lr: 0.000100  loss: 5.1292 (5.4890)  at: 5.1292 (5.4890)  at_unscaled: 5.1292 (5.4890)  time: 0.7875  data: 0.0339  max mem: 29528
Epoch: [0]  [1310/3696]  eta: 0:31:54  lr: 0.000100  loss: 5.1232 (5.4863)  at: 5.1232 (5.4863)  at_unscaled: 5.1232 (5.4863)  time: 0.8117  data: 0.0343  max mem: 29528
Epoch: [0]  [1320/3696]  eta: 0:31:45  lr: 0.000100  loss: 5.1016 (5.4832)  at: 5.1016 (5.4832)  at_unscaled: 5.1016 (5.4832)  time: 0.8161  data: 0.0341  max mem: 29528
Epoch: [0]  [1330/3696]  eta: 0:31:38  lr: 0.000100  loss: 5.0905 (5.4805)  at: 5.0905 (5.4805)  at_unscaled: 5.0905 (5.4805)  time: 0.8149  data: 0.0343  max mem: 29528
Traceback (most recent call last):
  File "main.py", line 257, in <module>
    main(args)
  File "main.py", line 207, in main
    args.clip_max_norm, learning_rate_schedule)
  File "/opt/tiger/intro/Stable-Pix2Seq/engine.py", line 98, in train_one_epoch
    losses.backward()
  File "/home/tiger/.local/lib/python3.7/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/tiger/.local/lib/python3.7/site-packages/torch/autograd/__init__.py", line 147, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: CUDA out of memory. Tried to allocate 216.00 MiB (GPU 7; 31.75 GiB total capacity; 29.63 GiB already allocated; 213.75 MiB free; 29.95 GiB reserved in total by PyTorch)
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/tiger/.local/lib/python3.7/site-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/home/tiger/.local/lib/python3.7/site-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None)  # not coming back
  File "/home/tiger/.local/lib/python3.7/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-u', 'main.py', '--coco_path', './coco2017/', '--batch_size', '4', '--lr', '0.0005', '--output_dir', './output']' returned non-zero exit status 1.
Killing subprocess 5627
Killing subprocess 5628
Killing subprocess 5629
Killing subprocess 5630
Killing subprocess 5631
Killing subprocess 5632
Killing subprocess 5633

Have you tried to train from scratch and use large learning rate?

trained model

Is there any trained model for testing?
Thank you！

How to do visualization analysis?

How to do visualization analysis? Could you share some scripts?
Thanks

details of transformer code

Thank you for your work, I have a question about sequence embedding. The screenshot is from transformer.py
When you get sequence embedding, the position embedding has already been added to sequence embedding as fllows:

Why do you input the same position embedding into decoder layer ? After this operation, position embedding is added to sequence embedding twice.

CUDA out of memory during training

I was training 'Stable Pix2Seq', everything goes fine until the 3rd training epoch. I wonder if there's any accumulate operation or some tensors or variable should have been deleted.

How to using for panoptic segmentation

can you share the command line when training panoptic seg. I using the command:
python -m torch.distributed.launch --nproc_per_node=1 --use_env main.py --coco_path ... --coco_panoptic_path ... --masks
but, there are some error.

Does it support instance segmentation？

I try to use this code for instance segmentation。I using the command ：python -m torch.distributed.launch --nproc_per_node=1 --use_env main.py --coco_path ... --masks

But I have encountered some errors. Does this code still not support segmentation tasks?

best regards,
Zhao

Extract token embedding

I want to extract the token embedding as shown in figure 11 of the paper.

However, when looking at the code, I see that the tokens are predicted by feeding the output feature map to a mlp whose last layer's dimension is 2003 (maybe number of tokens). Hence, the model do not learn the token embedding actually and we can't get the learned token embedding.

Am I missing something ?

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.

	def __getitem__(self, idx):
	img, target = super(CocoDetection, self).__getitem__(idx)
	image_id = self.ids[idx]
	target = {'image_id': image_id, 'annotations': target}
	img, target = self.prepare(img, target)
	if self._transforms is not None:
	img1, target1 = self._transforms(img, target)
	img2, target2 = self._transforms(img, target)
	return img1, img2, target1, target2

	def collate_fn(batch):
	batch = list(zip(*batch))
	batch[0] = nested_tensor_from_tensor_list(batch[0] + batch[1])
	return tuple([batch[0], batch[2] + batch[3]])