intuitionmachines / origaminet

Public implementation of our CVPR Paper "OrigamiNet: Weakly-Supervised, Segmentation-Free, One-Step, Full Page Text Recognition by learning to unfold"

Python 98.80% Shell 1.20%

ocr handwritten-text-recognition text-recognition cvpr2020

origaminet's Issues

Why bAcc is always 0

I have trained the model on the ICDAR dataset, but bAcc is always 0, even at iteration 22000. Is this normal, or is something wrong? Could you shed some light on how to train the model?

Import error: No module named 'apex.parallel'

I installed all the dependencies with 'pip install -r requirements.txt', as described in README.md.
But when I run the training script with 'python train.py --gin iam/iam.gin',
an import error occurs: Import error: No module named 'apex.parallel'
I assumed I needed to install NVIDIA/apex, but after installing it the problem remains.
What else should I install, or did I install NVIDIA/apex incorrectly?
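
A quick way to narrow this down is to check whether the interpreter that runs train.py can actually locate the submodule. The sketch below uses only the standard library; the helper name `has_submodule` is hypothetical. One frequent cause of this symptom, though not confirmed for this report, is that an unrelated PyPI package that is also named `apex` was installed instead of NVIDIA's, in which case `apex` is found but `apex.parallel` is not.

```python
import importlib.util


def has_submodule(name: str) -> bool:
    """Return True if the dotted module `name` can be located for import.

    Note: for a dotted name, find_spec imports the parent package first.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ImportError:
        # Raised when a parent package (e.g. `apex`) is missing
        # or is a plain module rather than a package.
        return False


if __name__ == "__main__":
    for mod in ("apex", "apex.parallel", "apex.amp"):
        print(f"{mod}: {'found' if has_submodule(mod) else 'MISSING'}")
```

Running this with the same Python environment used for training shows whether the problem is a missing package or a package of the wrong kind.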

What GPUs were used for training on ICDAR2017 HTR?

How many GPUs were used during training on ICDAR2017 HTR and what was their memory capacity or other important stats (if it is necessary to mention)?

ZeroDivisionError: float division by zero

Hello, when I use the DP or DDP distributed mode, I get this error. Below is the detailed error log.

Traceback (most recent call last):
  File "train.py", line 353, in <module>
    train(opt)
  File "/ssd/exec/xuejt/home/anaconda3/envs/py36/lib/python3.6/site-packages/gin/config.py", line 1069, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/ssd/exec/xuejt/home/anaconda3/envs/py36/lib/python3.6/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/ssd/exec/xuejt/home/anaconda3/envs/py36/lib/python3.6/site-packages/gin/config.py", line 1046, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "train.py", line 227, in train
    scaled_loss.backward()
  File "/ssd/exec/xuejt/home/anaconda3/envs/py36/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/ssd/exec/xuejt/home/anaconda3/envs/py36/lib/python3.6/site-packages/apex/amp/handle.py", line 123, in scale_loss
    optimizer._post_amp_backward(loss_scaler)
  File "/ssd/exec/xuejt/home/anaconda3/envs/py36/lib/python3.6/site-packages/apex/amp/_process_optimizer.py", line 249, in post_backward_no_master_weights
    post_backward_models_are_masters(scaler, params, stashed_grads)
  File "/ssd/exec/xuejt/home/anaconda3/envs/py36/lib/python3.6/site-packages/apex/amp/_process_optimizer.py", line 135, in post_backward_models_are_masters
    scale_override=(grads_have_scale, stashed_have_scale, out_scale))
  File "/ssd/exec/xuejt/home/anaconda3/envs/py36/lib/python3.6/site-packages/apex/amp/scaler.py", line 183, in unscale_with_stashed
    out_scale/grads_have_scale,
ZeroDivisionError: float division by zero
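
The last frame divides by `grads_have_scale`, so that value was 0.0 when the error was raised. One plausible cause, which is an assumption and not confirmed in this thread, is that repeated gradient overflows kept halving the dynamic loss scale until it underflowed to zero. The toy below is not apex's code; `halve_on_overflow`, `unscale`, and the `min_scale` parameter are hypothetical names that only illustrate the arithmetic.

```python
def halve_on_overflow(scale: float, overflows: int, min_scale: float = 0.0) -> float:
    """Toy dynamic loss scaler: halve `scale` once per overflowing step.

    Illustration only. `min_scale` is a hypothetical lower bound that is
    applied after each halving.
    """
    for _ in range(overflows):
        scale = max(scale / 2.0, min_scale)
    return scale


def unscale(grad: float, scale: float) -> float:
    """Undo loss scaling on one gradient value; raises if scale is 0.0."""
    return grad / scale


if __name__ == "__main__":
    unbounded = halve_on_overflow(65536.0, overflows=1100)
    bounded = halve_on_overflow(65536.0, overflows=1100, min_scale=1.0)
    print(unbounded, bounded)
    try:
        unscale(0.5, unbounded)
    except ZeroDivisionError as exc:
        print("unbounded scale fails:", exc)
```

If this is what happened, the training log would show many consecutive overflow steps before the crash, and the underlying problem would be non-finite gradients, not the scaler itself. As far as I know, apex's `amp.initialize` accepts a `min_loss_scale` argument that bounds the scale from below, but that only hides the symptom.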

demo

I have trained the model on my own data, but the code does not include a demo. I would like to see the recognition results. Can you provide demo code?

Batch Replay

Thanks so much for the paper and repo.

I was wondering if you could explain the batch replay that appears in the training loop.

Thanks!

CUDA out of memory error

Hi, when I tried this on Colab, I encountered the following error with a batch size of 2:

RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 11.17 GiB total capacity; 10.76 GiB already allocated; 12.81 MiB free; 101.88 MiB cached)
In call to configurable 'train' (<function train at 0x7fb2fb7f1e60>)

There's no error when the batch size is 1. Is there a way to fix this, or do I really need a GPU with more memory?

For a GPU with 12 GB of memory, is the batch size limited to 1?
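
A common workaround when only a batch size of 1 fits in memory is gradient accumulation: run several small batches, sum their scaled gradients, and take one optimizer step. Whether train.py supports this is not stated in this thread, so the sketch below only demonstrates the arithmetic on a one-parameter model in plain Python. The function names are hypothetical.

```python
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error for the model y = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch):
    """Average per-micro-batch gradients over one effective batch."""
    chunks = [(xs[i:i + micro_batch], ys[i:i + micro_batch])
              for i in range(0, len(xs), micro_batch)]
    total = 0.0
    for cx, cy in chunks:
        # Dividing by the number of chunks mirrors scaling the loss by
        # 1 / accumulation_steps before each backward pass.
        total += grad_mse(w, cx, cy) / len(chunks)
    return total


if __name__ == "__main__":
    xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
    print(grad_mse(0.5, xs, ys))                          # full batch of 4
    print(accumulated_grad(0.5, xs, ys, micro_batch=1))   # 4 batches of 1
```

The two values agree when all micro-batches have the same size. In a real network the match is not exact if any layer depends on batch statistics, and accumulation trades memory for wall-clock time.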

When will the code be available?

Hi,

Thanks for the paper. I'd be interested in evaluating your results on datasets other than IAM. When will the code be made available?

Thanks,

Thomas Delteil

CER stuck at 6.5% for IAM

I used the config in iam.gin, but can only get 6.5% CER, instead of the 4.85% CER.
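
When comparing CER numbers it helps to pin down how the metric is computed. The sketch below assumes the usual definition, total character-level edit distance divided by total reference length, returned as a fraction (multiply by 100 for a percentage). It is not taken from this repository, and its evaluation code may normalize differently, for example by averaging per-sample rates.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, one row at a time."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1]


def cer(refs, hyps) -> float:
    """Total edit distance over total reference length, as a fraction."""
    chars = sum(len(r) for r in refs)
    if chars == 0:
        raise ValueError("reference text is empty")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / chars


if __name__ == "__main__":
    print(cer(["hello world"], ["hallo wrld"]))
```

Differences in text normalization (case, punctuation, whitespace) between two evaluations can shift CER noticeably, so they are worth ruling out before comparing numbers.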

Horovod with pytorch issue

Hi,
I followed all the installation steps in the repository README. It gave me the following error when I tried to run it:

  File "train.py", line 12, in <module>
    check_extension('horovod.torch', 'HOROVOD_WITH_PYTORCH', __file__, 'mpi_lib_v2')
  File "/home/annapurna/anaconda3/envs/origami/lib/python3.6/site-packages/horovod/common/util.py", line 56, in check_extension
    'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))
ImportError: Extension horovod.torch has not been built. If this is not expected, reinstall Horovod with HOROVOD_WITH_PYTORCH=1 to debug the build error.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "train.py", line 14, in <module>
    check_extension('horovod.torch', 'HOROVOD_WITH_PYTORCH', __file__, 'mpi_lib', '_mpi_lib')
  File "/home/annapurna/anaconda3/envs/origami/lib/python3.6/site-packages/horovod/common/util.py", line 56, in check_extension
    'Horovod with %s=1 to debug the build error.' % (ext_name, ext_env_var))
ImportError: Extension horovod.torch has not been built. If this is not expected, reinstall Horovod with HOROVOD_WITH_PYTORCH=1 to debug the build error.

Could you suggest what has gone wrong?
I also tried installing Horovod with PyTorch support, but I still face the same issue.

How long did you spend on training? And how many GPUs?

Inputting text into the model

I've been inspecting the code and was wondering, what is the purpose of inputting text into the model?

In the OrigamiNet class in cnv_model.py, the forward method seems to accept a list t=[], or text, that has no apparent use.

In train.py, text is passed into the model in this fashion:

preds = model(image,text).float()

And in validation, an empty string is passed:

preds = model(image, '')

Can you please provide pretrained model/weights/*.pth files?

This will help to save time and electric energy on checking the model.

P.S.: Sorry if this is an inappropriate question.

Training on IAM full paragraph, but the CER is stuck at ~0.7

Hi, I was trying to train OrigamiNet on the IAM full-paragraph dataset. Due to hardware limitations, I could only use a batch size of 1, and the CER has been stuck at ~0.7 for more than 20k steps. Is this normal? Are there any tweaks I can make to overcome this? Thanks!