gaopengcuhk / pretrained-pix2seq

Replication of Pix2Seq with Pretrained Model

Python 99.86% Shell 0.14%

pretrained-pix2seq's Issues

Why is it much slower than Stable-Pix2Seq?

I found that training 'Pre-trained Pix2Seq' takes about 75 minutes per epoch, while 'Stable Pix2Seq' takes only 50 minutes per epoch. Why? What are the differences between them?

About setting 'mask' to 'False'

Hi, thanks for sharing!
I am wondering why you set all elements of mask to False in pix2seq.py:

        src, mask = features[-1].decompose() 
        assert mask is not None
        mask = torch.zeros_like(mask).bool()

Will you try to match the paper's accuracy based on the official TensorFlow code?

The official code: https://github.com/google-research/pix2seq

Pretrained Model Link

Hi, will you make the checkpoint of the pretrained model public in this branch?

Hi, great work! We are also trying to reimplement Pix2Seq, and we find that absolute coordinates are useful. This is similar to your LargeScaleJitter (pad or crop the image to the fixed desired size).
Absolute coordinates mean that the position is normalized by dividing by the fixed size:
boxes = boxes / 1333. instead of boxes = boxes / torch.tensor([w, h, w, h], dtype=torch.float32). Then padding or cropping the image to the fixed desired size is not necessary.
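
The difference between the two normalizations can be sketched in plain Python. This is only an illustration of the arithmetic described above, not code from the repository: the function names and the sample box are hypothetical, and it assumes boxes are in (x0, y0, x1, y1) pixel format and that no image side exceeds 1333 pixels.

```python
def normalize_relative(box, w, h):
    """Divide (x0, y0, x1, y1) by the image's own width and height."""
    x0, y0, x1, y1 = box
    return (x0 / w, y0 / h, x1 / w, y1 / h)


def normalize_absolute(box, fixed_size=1333.0):
    """Divide every coordinate by one fixed size, whatever the image shape.

    Assumption: no image side is longer than fixed_size, so values stay in [0, 1].
    """
    return tuple(v / fixed_size for v in box)


# Hypothetical box in pixel coordinates, placed in two images of different size.
box = (100.0, 50.0, 300.0, 200.0)

rel_small = normalize_relative(box, w=640, h=480)
rel_large = normalize_relative(box, w=1333, h=800)
abs_any = normalize_absolute(box)

# Relative coordinates depend on the image size; absolute ones do not.
print("relative (640x480): ", rel_small)
print("relative (1333x800):", rel_large)
print("absolute (/1333):   ", abs_any)
```

With the relative scheme the same pixel box maps to different values in differently sized images, which is why the images are padded or cropped to one size first. With the absolute scheme the mapping is the same for every image.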

A detail about your code

Hi! Thank you for your great work.
I don't understand one detail about vocal_embed:
In your code, 'self.vocal_embed = nn.Embedding(self.num_vocal-2, d_model)', why do you subtract 2? What does the 2 mean?
Thanks

About Custom Datasets

Thanks for your great work. I successfully tested on the COCO dataset.
I want to test on my own dataset (COCO format), which has 13 classes with category IDs 1-13. I only changed num_classes (from 91 to 14) in the 'build' function in pix2seq.py. Is that correct?
But I got this error:

  File "main.py", line 260, in <module>
    main(args)
  File "main.py", line 194, in main
    train_stats = train_one_epoch(
  File "/home/dsm/graduate/Pretrained-Pix2Seq/engine.py", line 27, in train_one_epoch
    for samples, targets in metric_logger.log_every(data_loader, print_freq, header):
  File "/home/dsm/graduate/Pretrained-Pix2Seq/util/misc.py", line 224, in log_every
    for obj in iterable:
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1182, in _next_data
    idx, data = self._get_data()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1148, in _get_data
    success, data = self._try_get_data()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 986, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 116, in get
    return _ForkingPickler.loads(res)
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/reductions.py", line 282, in rebuild_storage_fd
    fd = df.detach()
  File "/usr/lib/python3.8/multiprocessing/resource_sharer.py", line 58, in detach
    return reduction.recv_handle(conn)
  File "/usr/lib/python3.8/multiprocessing/reduction.py", line 189, in recv_handle
    return recvfds(s, 1)[0]
  File "/usr/lib/python3.8/multiprocessing/reduction.py", line 164, in recvfds
    raise RuntimeError('received %d items of ancdata' %
RuntimeError: received 0 items of ancdata
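
The traceback ends in the DataLoader worker machinery, not in the model, so the error may be unrelated to num_classes. "received 0 items of ancdata" is commonly reported when the process runs out of open file descriptors while workers pass tensors between processes. The sketch below is one possible workaround under that assumption, not a confirmed fix for this issue: it raises the soft open-file limit with the standard `resource` module (Unix only). The helper name and the target value 4096 are hypothetical.

```python
import resource


def raise_open_file_limit(target=4096):
    """Raise the soft RLIMIT_NOFILE toward `target` without exceeding the hard limit.

    Returns the (soft, hard) limits in effect after the attempt.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        # Already unlimited; nothing to raise.
        return soft, hard
    if hard == resource.RLIM_INFINITY:
        new_soft = max(soft, target)
    else:
        new_soft = min(max(soft, target), hard)
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    except (ValueError, OSError):
        # Some systems reject the request; keep the current limits.
        pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


# Call once at the top of the training script, before any DataLoader is built.
print(raise_open_file_limit(4096))

# Alternative often suggested for the same symptom (assumes PyTorch is installed):
#   import torch.multiprocessing
#   torch.multiprocessing.set_sharing_strategy('file_system')
```

If the error persists, running with fewer DataLoader workers is another way to check whether file descriptors are the cause.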

Some questions about your great work

Hi! Thank you for your great work.
I have some questions about it:
In the paper, class tokens are sometimes dropped when building the input sequence, but I cannot find this in your code. Did you leave it out? If I want to use this dropout, do I need to add a drop token to the vocabulary?