simplify23 / cdistnet

Official PyTorch implementation of CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition (IJCV)

License: Apache License 2.0

Jupyter Notebook 53.39% Python 46.61%

cdistnet's Introduction

CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition

The official code of CDistNet.

Paper link: arXiv (http://arxiv.org/abs/2111.11011)

What's New

  • [2023-08]🌟 Our paper was accepted by IJCV.
  • [2022-01]🌟 Our code was released on GitHub.
  • [2021-11]🌟 The paper is available on arXiv: http://arxiv.org/abs/2111.11011

[Figure: CDistNet pipeline]

To Do List

  • HA-IC13 & CA-IC13
  • Pre-train model
  • Cleaned Code
  • Document
  • Distributed Training

Two New Datasets

We test other SOTA methods on the HA-IC13 and CA-IC13 datasets.

[Figure: HA_CA]

CDistNet has a performance advantage over other SOTA methods as the character distance increases (1-6).

HA-IC13

| Method | 1 | 2 | 3 | 4 | 5 | 6 | Code & pretrained model |
| --- | --- | --- | --- | --- | --- | --- | --- |
| VisionLAN (ICCV 2021) | 93.58 | 92.88 | 89.97 | 82.26 | 72.23 | 61.03 | Official Code |
| ABINet (CVPR 2021) | 95.92 | 95.22 | 91.95 | 85.76 | 73.75 | 64.99 | Official Code |
| RobustScanner* (ECCV 2020) | 96.15 | 95.33 | 93.23 | 88.91 | 81.10 | 71.53 | -- |
| Transformer-baseline* | 96.27 | 95.45 | 92.42 | 86.46 | 79.35 | 72.46 | -- |
| CDistNet | 96.62 | 96.15 | 94.28 | 89.96 | 83.43 | 77.71 | -- |

CA-IC13

| Method | 1 | 2 | 3 | 4 | 5 | 6 | Code & pretrained model |
| --- | --- | --- | --- | --- | --- | --- | --- |
| VisionLAN (ICCV 2021) | 94.87 | 92.77 | 84.01 | 75.03 | 64.29 | 52.74 | Official Code |
| ABINet (CVPR 2021) | 96.62 | 95.92 | 87.86 | 76.31 | 65.46 | 54.49 | Official Code |
| RobustScanner* (ECCV 2020) | 95.22 | 94.87 | 85.30 | 76.55 | 68.38 | 60.79 | -- |
| Transformer-baseline* | 95.68 | 94.40 | 85.88 | 75.85 | 65.93 | 58.58 | -- |
| CDistNet | 96.27 | 95.57 | 88.45 | 79.58 | 70.36 | 63.13 | -- |
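
One way to read the two tables is to compare how much accuracy each method loses between character distance 1 and character distance 6. The sketch below only re-uses the numbers listed above; the helper name `accuracy_drop` and the choice of "level 1 minus level 6" as a summary are illustrative assumptions, not a metric defined by the authors.

```python
# Accuracies copied from the HA-IC13 and CA-IC13 tables (levels 1..6).
HA_IC13 = {
    "VisionLAN": [93.58, 92.88, 89.97, 82.26, 72.23, 61.03],
    "ABINet": [95.92, 95.22, 91.95, 85.76, 73.75, 64.99],
    "RobustScanner*": [96.15, 95.33, 93.23, 88.91, 81.10, 71.53],
    "Transformer-baseline*": [96.27, 95.45, 92.42, 86.46, 79.35, 72.46],
    "CDistNet": [96.62, 96.15, 94.28, 89.96, 83.43, 77.71],
}
CA_IC13 = {
    "VisionLAN": [94.87, 92.77, 84.01, 75.03, 64.29, 52.74],
    "ABINet": [96.62, 95.92, 87.86, 76.31, 65.46, 54.49],
    "RobustScanner*": [95.22, 94.87, 85.30, 76.55, 68.38, 60.79],
    "Transformer-baseline*": [95.68, 94.40, 85.88, 75.85, 65.93, 58.58],
    "CDistNet": [96.27, 95.57, 88.45, 79.58, 70.36, 63.13],
}


def accuracy_drop(table):
    """Hypothetical summary: accuracy at level 1 minus accuracy at level 6."""
    return {method: round(acc[0] - acc[-1], 2) for method, acc in table.items()}


ha_drop = accuracy_drop(HA_IC13)
ca_drop = accuracy_drop(CA_IC13)

for name, drops in (("HA-IC13", ha_drop), ("CA-IC13", ca_drop)):
    smallest = min(drops, key=drops.get)
    print(name, drops, "smallest drop:", smallest)
```

On both datasets CDistNet shows the smallest drop (18.91 points on HA-IC13 and 33.14 on CA-IC13), even though ABINet is slightly ahead at levels 1 and 2 of CA-IC13.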

Datasets

The datasets are the same as those used by ABINet.

Environment

The required packages are listed in env_cdistnet.yaml.

# Create and activate the environment, then install the dependencies
conda create -n CDistNet python=3.7
conda activate CDistNet
conda install pytorch==1.5.1 torchvision==0.6.1 cudatoolkit=9.2 -c pytorch
pip install opencv-python mmcv notebook numpy einops tensorboardX Pillow thop timm tornado tqdm matplotlib lmdb

Pretrained Models

Get the pretrained models from BaiduNetdisk (passwd: d6jd) or GoogleDrive. (The training log and result.csv are both provided in the same file.) Place the pretrained model in models/reconstruct_CDistNetv3_3_10.

The performance of the pretrained models is summarized as follows:

Train

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --config=configs/CDistNet_config.py

Eval

CUDA_VISIBLE_DEVICES=0 python eval.py --config=configs/CDistNet_config.py

Citation

@article{Zheng2021CDistNetPM,
  title={CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition},
  author={Tianlun Zheng and Zhineng Chen and Shancheng Fang and Hongtao Xie and Yu-Gang Jiang},
  journal={ArXiv},
  year={2021},
  volume={abs/2111.11011}
}

cdistnet's Issues

Questions on Conv2d of Transformer Layers

Hello, while examining the code and comparing it with nn.Transformer, I noticed that most of the nn.Linear() operations are replaced with nn.Conv2d(kernel_size=(1,1)) operations.

Is there a benefit to this replacement?
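
For context on the question above: mathematically, a convolution with a 1x1 kernel applies the same affine map to the channel vector at every spatial position, which is what a linear layer does when applied per position. The sketch below checks this in plain Python with nested lists instead of PyTorch tensors; `conv2d` and `linear` are illustrative stand-ins written for this example, not code from the repository, and the example says nothing about why the authors made this choice. A commonly cited practical reason is that a 1x1 convolution can operate on a channels-first feature map without reshaping, but whether that applies here is not stated.

```python
import random


def conv2d(fmap, kernel, bias):
    """Plain 'valid' convolution (stride 1, no padding, no kernel flip).

    fmap: [C_in][H][W], kernel: [C_out][C_in][kH][kW], bias: [C_out].
    """
    c_in, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    kh, kw = len(kernel[0][0]), len(kernel[0][0][0])
    out = []
    for o, b in enumerate(bias):
        plane = []
        for i in range(h - kh + 1):
            row = []
            for j in range(w - kw + 1):
                acc = b
                for c in range(c_in):
                    for di in range(kh):
                        for dj in range(kw):
                            acc += kernel[o][c][di][dj] * fmap[c][i + di][j + dj]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def linear(x, weight, bias):
    """y = W x + b for a single feature vector x."""
    return [b + sum(w * v for w, v in zip(row, x)) for row, b in zip(weight, bias)]


random.seed(0)
C_IN, C_OUT, H, W = 4, 3, 2, 5
fmap = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C_IN)]
weight = [[random.uniform(-1, 1) for _ in range(C_IN)] for _ in range(C_OUT)]
bias = [random.uniform(-1, 1) for _ in range(C_OUT)]

# Reuse the linear weights as a 1x1 kernel: shape [C_out][C_in][1][1].
kernel = [[[[w_oc]] for w_oc in row] for row in weight]
conv_out = conv2d(fmap, kernel, bias)

# Compare against the linear layer applied to each pixel's channel vector.
max_diff = 0.0
for i in range(H):
    for j in range(W):
        pixel = [fmap[c][i][j] for c in range(C_IN)]
        for o, y in enumerate(linear(pixel, weight, bias)):
            max_diff = max(max_diff, abs(conv_out[o][i][j] - y))

outputs_match = max_diff < 1e-12
print("1x1 conv matches per-pixel linear:", outputs_match)
```

The two computations agree up to floating-point rounding, so any difference between the two layer types is a matter of tensor layout and speed rather than of what the model can represent.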

Any way to train only on language and not images?

I have a relatively small dataset in a different format (license plates), and the model often gets the license plate format wrong.

I was wondering if there was a way to train the model on just a bunch of text string data without feeding any images at all in order to enforce the format.

Please let me know if it is possible to train the language/semantic model independently, by just feeding string text data of words, without corresponding images.

Training for other languages

Hello, thanks for your paper and the released code.
I want to train your code on another language, but I see in lmdbdataset that you use only English characters and limit the maximum length to 30. Is that right?
Should I change lines 245 and 246?

`def __len__(self):
    return self.length

def get(self, idx):
    with self.env.begin(write=False) as txn:
        # LMDB keys are 1-based and zero-padded to nine digits
        image_key, label_key = f'image-{idx+1:09d}', f'label-{idx+1:09d}'
        label = str(txn.get(label_key.encode()), 'utf-8')  # label
        label = re.sub('[^0-9a-zA-Z]+', '', label)  # keep only ASCII letters and digits
        label = label[:30]  # truncate to at most 30 characters`
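
To make the effect of the quoted lines concrete, the sketch below reproduces the key formatting and the two label-cleaning steps outside the dataset class. The function `clean_label` and its parameters `charset_pattern` and `max_length` are hypothetical names introduced for illustration; they are not part of the repository, and any other place in the codebase that assumes an English character set is outside the scope of this sketch.

```python
import re


def clean_label(label, charset_pattern='[^0-9a-zA-Z]+', max_length=30):
    """Apply the two quoted steps: strip disallowed characters, then truncate.

    The defaults mirror the quoted code; making them parameters is an
    assumption for illustration, not the repository's API.
    """
    label = re.sub(charset_pattern, '', label)
    return label[:max_length]


# LMDB keys as built in the quoted code for the first sample (idx = 0).
idx = 0
image_key, label_key = f'image-{idx+1:09d}', f'label-{idx+1:09d}'

english = clean_label('Hello, World!')          # punctuation and spaces removed
cyrillic_default = clean_label('Привет, мир')   # every character is stripped
phone = clean_label('+1 (555) 010-2000')        # digits survive, separators do not
truncated = clean_label('a' * 40)               # capped at 30 characters

# A looser pattern: \w is Unicode-aware for str patterns in Python 3, so
# letters from other scripts are kept (digits and underscores are kept too).
cyrillic_kept = clean_label('Привет, мир', charset_pattern=r'[^\w]+')

print(image_key, label_key)
print(repr(english), repr(cyrillic_default), repr(phone), len(truncated), repr(cyrillic_kept))
```

With the default pattern, a label written in a non-Latin script becomes an empty string, which is why both the regular expression and the length cap are the first things to look at when adapting the loader to another language.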

Can it work with phone numbers?

I trained the model on billboard images, but at inference time it doesn't work well with sequences of digits or phone numbers. Can you help me? Thanks very much.

Accuracy is lower than other models

I tried to train your model, but the accuracy I got is lower than that of other transformer models. Could you please let me know how I can get higher accuracy?

CDistNetv2

@simplify23 Are you planning to release the CDistNetv2 code?
I am waiting for the lightweight and faster module.

Open source license?

Are you willing to specify an open-source license, such as the MIT License?
The GitHub repository has no license specified.

Attention Maps

Could you please release the code to generate the attention maps as published in the paper?

Inference Time

I tried your network and got good results, but inference is slow. Could you please let me know how I can increase the recognition speed?

Missing transformer

When trying to run test.py I get the following error:

(CDistNet) C:\<path>\CDistNet>python test.py --i_path ..\examples\300_0.jpg 
configs/CDistNet_config.py
<class 'str'>
Traceback (most recent call last):
  File "test.py", line 175, in <module>
    main()
  File "test.py", line 168, in main
    test_one(cfg, args)
  File "test.py", line 126, in test_one
    en = get_parameter_number(model.transformer.encoder)
  File "C:\<path>\miniconda3\envs\CDistNet\lib\site-packages\torch\nn\modules\module.py", line 1178, in __getattr__ 
    type(self).__name__, name))
AttributeError: 'CDistNet' object has no attribute 'transformer'

About inference

How should the input_char parameters be set, for example to predict on a new image?
