simplify23 / cdistnet

Official PyTorch implementation of CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition (IJCV)

License: Apache License 2.0

cdistnet's Introduction

CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition

The official code of CDistNet.

Paper link: arXiv (http://arxiv.org/abs/2111.11011)

What's New

[2023-08]🌟 Our paper is accepted by IJCV.
[2022-01]🌟 Our code is released on GitHub.
[2021-11]🌟 The paper is available on arXiv: http://arxiv.org/abs/2111.11011

To Do List

Two New Datasets

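We test other SOTA methods on the HA-IC13 and CA-IC13 datasets.

CDistNet's advantage over other SOTA methods grows as the character distance increases (levels 1-6). On HA-IC13, for example, its margin over the best competing method widens from 0.35 points at level 1 to 5.25 points at level 6.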

HA-IC13

Method	1	2	3	4	5	6	Code & Pretrained model
VisionLAN (ICCV 2021)	93.58	92.88	89.97	82.26	72.23	61.03	Official Code
ABINet (CVPR 2021)	95.92	95.22	91.95	85.76	73.75	64.99	Official Code
RobustScanner* (ECCV 2020)	96.15	95.33	93.23	88.91	81.10	71.53	--
Transformer-baseline*	96.27	95.45	92.42	86.46	79.35	72.46	--
CDistNet	96.62	96.15	94.28	89.96	83.43	77.71	--

CA-IC13

Method	1	2	3	4	5	6	Code & Pretrained model
VisionLAN (ICCV 2021)	94.87	92.77	84.01	75.03	64.29	52.74	Official Code
ABINet (CVPR 2021)	96.62	95.92	87.86	76.31	65.46	54.49	Official Code
RobustScanner* (ECCV 2020)	95.22	94.87	85.30	76.55	68.38	60.79	--
Transformer-baseline*	95.68	94.40	85.88	75.85	65.93	58.58	--
CDistNet	96.27	95.57	88.45	79.58	70.36	63.13	--
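The trend in the two tables can be checked with a few lines of arithmetic. The sketch below copies the scores from the tables above and computes, for each level, the gap between CDistNet and the best of the other listed methods. The function name `margins` and the dictionary layout are illustrative choices, not part of the repository.

```python
# Scores copied from the HA-IC13 and CA-IC13 tables above (levels 1-6).
HA_IC13 = {
    "VisionLAN": [93.58, 92.88, 89.97, 82.26, 72.23, 61.03],
    "ABINet": [95.92, 95.22, 91.95, 85.76, 73.75, 64.99],
    "RobustScanner*": [96.15, 95.33, 93.23, 88.91, 81.10, 71.53],
    "Transformer-baseline*": [96.27, 95.45, 92.42, 86.46, 79.35, 72.46],
    "CDistNet": [96.62, 96.15, 94.28, 89.96, 83.43, 77.71],
}
CA_IC13 = {
    "VisionLAN": [94.87, 92.77, 84.01, 75.03, 64.29, 52.74],
    "ABINet": [96.62, 95.92, 87.86, 76.31, 65.46, 54.49],
    "RobustScanner*": [95.22, 94.87, 85.30, 76.55, 68.38, 60.79],
    "Transformer-baseline*": [95.68, 94.40, 85.88, 75.85, 65.93, 58.58],
    "CDistNet": [96.27, 95.57, 88.45, 79.58, 70.36, 63.13],
}


def margins(table, target="CDistNet"):
    """Per-level gap between `target` and the best of the other methods."""
    others = [scores for name, scores in table.items() if name != target]
    # zip(*others) groups the scores level by level across methods.
    best_other = [max(level_scores) for level_scores in zip(*others)]
    return [round(t - b, 2) for t, b in zip(table[target], best_other)]


ha_margins = margins(HA_IC13)
ca_margins = margins(CA_IC13)
print("HA-IC13 margins:", ha_margins)
print("CA-IC13 margins:", ca_margins)
```

On HA-IC13 the margin is positive at every level and largest at level 6. On CA-IC13 it is slightly negative at levels 1 and 2, where ABINet scores higher, and positive from level 3 onward.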

Datasets

The datasets are the same as those used by ABINet.

Training datasets
1. MJSynth (MJ):
  - LMDB dataset BaiduNetdisk(passwd:n23k)
2. SynthText (ST):
  - LMDB dataset BaiduNetdisk(passwd:n23k)
Evaluation & Test datasets: the LMDB datasets can be downloaded from BaiduNetdisk(passwd:1dbv) or GoogleDrive.
1. ICDAR 2013 (IC13)
2. ICDAR 2015 (IC15)
3. IIIT5K Words (IIIT)
4. Street View Text (SVT)
5. Street View Text-Perspective (SVTP)
6. CUTE80 (CUTE)
Augmented IC13
- HA-IC13 & CA-IC13 : BaiduNetdisk(passwd:d6jd), GoogleDrive

The dataset directory is structured as follows:

dataset
├── eval
│   ├── CUTE80
│   ├── IC13_857
│   ├── IC15_1811
│   ├── IIIT5k_3000
│   ├── SVT
│   └── SVTP
├── train
│   ├── MJ
│   │   ├── MJ_test
│   │   ├── MJ_train
│   │   └── MJ_valid
│   └── ST
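Before training, it can help to confirm that the downloaded data matches this layout. The following is a minimal sketch, assuming the dataset root is a local folder laid out exactly as in the tree above. `EXPECTED_DIRS` and `find_missing_dirs` are hypothetical names introduced here for illustration, not part of the repository.

```python
from pathlib import Path

# Sub-directories taken from the tree above. The contents of train/ST are
# not listed there, so only the folder itself is checked.
EXPECTED_DIRS = [
    "eval/CUTE80",
    "eval/IC13_857",
    "eval/IC15_1811",
    "eval/IIIT5k_3000",
    "eval/SVT",
    "eval/SVTP",
    "train/MJ/MJ_test",
    "train/MJ/MJ_train",
    "train/MJ/MJ_valid",
    "train/ST",
]


def find_missing_dirs(root):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```

Calling `find_missing_dirs("dataset")` returns an empty list when every folder is in place, and otherwise the relative paths that still need to be created or downloaded.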

Environment

The required packages are listed in env_cdistnet.yaml.

# Installation
conda create -n CDistNet python=3.7
# Activate the new environment so the packages below are installed into it
conda activate CDistNet
conda install pytorch==1.5.1 torchvision==0.6.1 cudatoolkit=9.2 -c pytorch
pip install opencv-python mmcv notebook numpy einops tensorboardX Pillow thop timm tornado tqdm matplotlib lmdb

Pretrained Models

Get the pretrained models from BaiduNetdisk(passwd:d6jd) or GoogleDrive. (The training log and result.csv are provided in the same archive.) Place the pretrained model in models/reconstruct_CDistNetv3_3_10

The performance of the pretrained models is summarized as follows:

Train

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --config=configs/CDistNet_config.py

Eval

CUDA_VISIBLE_DEVICES=0 python eval.py --config=configs/CDistNet_config.py

Citation

@article{Zheng2021CDistNetPM,
  title={CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition},
  author={Tianlun Zheng and Zhineng Chen and Shancheng Fang and Hongtao Xie and Yu-Gang Jiang},
  journal={ArXiv},
  year={2021},
  volume={abs/2111.11011}
}

cdistnet's Issues

Questions on Conv2d of Transformer Layers

Hello, while examining the code and comparing it with nn.Transformer, I noticed that most of the nn.Linear() operations are replaced with nn.Conv2d(kernel_size=(1,1)) operations.

Is there a benefit to this replacement?
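As background for this question: a 1x1 convolution applies the same affine map at every spatial position, so it computes the same function as nn.Linear applied along the channel dimension. The sketch below checks this numerically with arbitrary toy shapes. It only illustrates the equivalence and does not say why the authors chose Conv2d.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

d_in, d_out = 8, 16  # toy channel sizes, chosen arbitrarily
linear = nn.Linear(d_in, d_out)
conv = nn.Conv2d(d_in, d_out, kernel_size=(1, 1))

# Give both layers identical parameters:
# Linear weight is (out, in); Conv2d weight is (out, in, 1, 1).
with torch.no_grad():
    conv.weight.copy_(linear.weight.view(d_out, d_in, 1, 1))
    conv.bias.copy_(linear.bias)

x = torch.randn(2, d_in, 4, 5)  # (batch, channels, height, width)

out_conv = conv(x)
# Linear acts on the last dimension, so move channels last and back again.
out_linear = linear(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)

max_diff = (out_conv - out_linear).abs().max().item()
print("outputs match:", torch.allclose(out_conv, out_linear, atol=1e-5))
```

The practical difference is therefore the tensor layout: Conv2d works on (batch, channels, height, width) feature maps directly, while nn.Linear needs the channel dimension last.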

How many MDCDP layers did you use in the best-performing English model?

Any way to train only on language and not images?

I have a relatively small dataset in a different format (license plates), and the model often gets the license plate format wrong.

I was wondering if there was a way to train the model on just a bunch of text string data without feeding any images at all in order to enforce the format.

Please let me know if it is possible to train the language/semantic model independently, by just feeding string text data of words, without corresponding images.

Training for other languages

Hello, thanks for your paper and the released code.
I want to train your code on another language, but I see in lmdbdataset that you keep only English characters and limit the max length to 30. Is that right?
Should I change lines 245 and 246?

`def __len__(self):
    return self.length

def get(self, idx):
    with self.env.begin(write=False) as txn:
        image_key, label_key = f'image-{idx+1:09d}', f'label-{idx+1:09d}'
        label = str(txn.get(label_key.encode()), 'utf-8')  # label
        label = re.sub('[^0-9a-zA-Z]+', '', label)  # keep only ASCII letters and digits
        label = label[:30]  # truncate to 30 characters`
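To see what the two quoted lines do to non-English text, here is a small standalone sketch. `clean_label` is a hypothetical helper written for this illustration, not a function from the repository. With no arguments it mirrors the quoted regex and the 30-character cut. The `allowed_chars` option is an assumed alternative for other alphabets. The model's character dictionary would need to cover the same characters, which this sketch does not address.

```python
import re
import string

# Same pattern as in the quoted snippet: drop everything except ASCII letters and digits.
ASCII_ONLY = re.compile(r"[^0-9a-zA-Z]+")


def clean_label(label, allowed_chars=None, max_len=30):
    """Filter a label's characters, then truncate it to `max_len`."""
    if allowed_chars is None:
        label = ASCII_ONLY.sub("", label)
    else:
        allowed = set(allowed_chars)
        label = "".join(ch for ch in label if ch in allowed)
    return label[:max_len]


# Example alphabet for German; an assumption made purely for illustration.
german_chars = string.ascii_letters + string.digits + "äöüÄÖÜß"

print(clean_label("Grüße 2024!"))                # default filter drops ü and ß
print(clean_label("Grüße 2024!", german_chars))  # custom alphabet keeps them
```

With the default filter, any character outside 0-9, a-z, and A-Z is silently removed, so labels in other scripts can end up shortened or empty.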

(CDistNet) C:\<path>\CDistNet>python test.py --i_path ..\examples\300_0.jpg 
configs/CDistNet_config.py
<class 'str'>
Traceback (most recent call last):
  File "test.py", line 175, in <module>
    main()
  File "test.py", line 168, in main
    test_one(cfg, args)
  File "test.py", line 126, in test_one
    en = get_parameter_number(model.transformer.encoder)
  File "C:\<path>\miniconda3\envs\CDistNet\lib\site-packages\torch\nn\modules\module.py", line 1178, in __getattr__ 
    type(self).__name__, name))
AttributeError: 'CDistNet' object has no attribute 'transformer'
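The traceback shows that the model object has no attribute named `transformer`. One generic way to find out which submodule names a PyTorch model actually exposes is to iterate over `named_children()`. The sketch below does that and counts parameters per child. `count_parameters`, `summarize_children`, and `ToyModel` are hypothetical names, and `ToyModel` is only a stand-in, since the real CDistNet module layout is not shown here.

```python
import torch.nn as nn


def count_parameters(module):
    """Total and trainable parameter counts of a module."""
    total = sum(p.numel() for p in module.parameters())
    trainable = sum(p.numel() for p in module.parameters() if p.requires_grad)
    return {"total": total, "trainable": trainable}


def summarize_children(model):
    """Map each direct submodule name to its parameter counts."""
    return {name: count_parameters(child) for name, child in model.named_children()}


class ToyModel(nn.Module):
    """Stand-in model used only to demonstrate the helpers."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(4, 3)
        self.decoder = nn.Linear(3, 2)


summary = summarize_children(ToyModel())
for name, counts in summary.items():
    print(name, counts)
```

Running `summarize_children` on the loaded model lists the attribute names that exist, which can then be used in place of a missing one when counting parameters.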