This project forked from ayanban011/swindocsegmenter

An End-to-End Unified Domain Adaptive Transformer for Document Instance Segmentation [ICDAR 2023] (Oral)

License: Apache License 2.0

SwinDocSegmenter

Description

PyTorch implementation of the paper SwinDocSegmenter: An End-to-End Unified Domain Adaptive Transformer for Document Instance Segmentation. The model is implemented on top of the detectron2 framework. It can be used to analyze complex layouts, including magazines, scientific reports, historical documents, and patents, as shown in the following examples.

[Example images: Magazines, Scientific Reports, Tables, Others]

Getting Started

Step 1: Clone this repository and change directory to repository root

git clone https://github.com/ayanban011/SwinDocSegmenter.git 
cd SwinDocSegmenter

Step 2: Set up and activate the conda environment with the required dependencies:

Follow the installation instructions.

Step 3: To test our model, download the best pretrained model weights from the Model Zoo and run the evaluation:

python ./train_net.py \
    --config-file maskdino_R50_bs16_50ep_4s_dowsample1_2048.yaml \
    --eval-only \
    --num-gpus 1 \
    MODEL.WEIGHTS ./model_final.pth
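
When evaluating several checkpoints, the command above can be assembled programmatically. The following is a minimal sketch: `build_eval_command` is a hypothetical helper that is not part of this repository, and it uses only the flags shown above.

```python
import shlex
import sys


def build_eval_command(config_file, weights, num_gpus=1):
    """Return the evaluation command as an argument list (hypothetical helper)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return [
        sys.executable, "./train_net.py",
        "--config-file", config_file,
        "--eval-only",
        "--num-gpus", str(num_gpus),
        # Trailing KEY VALUE pairs override entries of the config file.
        "MODEL.WEIGHTS", weights,
    ]


if __name__ == "__main__":
    cmd = build_eval_command(
        "maskdino_R50_bs16_50ep_4s_dowsample1_2048.yaml", "./model_final.pth"
    )
    # Print a copy-pasteable command; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps paths with spaces intact when the command is later passed to `subprocess.run`.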

Step 4: To train the model from scratch, use the following command, setting --num-gpus to the number of GPUs ('n') you want to train on:

python train_net.py --num-gpus 1 --config-file config_path SOLVER.IMS_PER_BATCH SET_TO_SOME_REASONABLE_VALUE SOLVER.BASE_LR SET_TO_SOME_REASONABLE_VALUE
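
The command leaves `SOLVER.IMS_PER_BATCH` and `SOLVER.BASE_LR` open. One common heuristic, which this repository does not prescribe, is the linear scaling rule: scale the learning rate in proportion to the total batch size. The sketch below assumes a reference setting of batch size 16 with learning rate 1e-4. Both reference values and both function names are assumptions for illustration, so check the config file you are using for its actual defaults.

```python
def scale_learning_rate(batch_size, ref_batch_size=16, ref_lr=1e-4):
    """Linear scaling rule: the learning rate grows with the total batch size.

    ref_batch_size and ref_lr are assumed reference values, not repository defaults.
    """
    if batch_size < 1 or ref_batch_size < 1:
        raise ValueError("batch sizes must be positive")
    return ref_lr * batch_size / ref_batch_size


def solver_overrides(num_gpus, images_per_gpu):
    """Build the SOLVER.* command-line overrides for a given hardware setup."""
    # IMS_PER_BATCH is the total number of images per step across all GPUs.
    batch_size = num_gpus * images_per_gpu
    return [
        "SOLVER.IMS_PER_BATCH", str(batch_size),
        "SOLVER.BASE_LR", f"{scale_learning_rate(batch_size):.6g}",
    ]


if __name__ == "__main__":
    # Example: 2 GPUs with 2 images each.
    print(" ".join(solver_overrides(num_gpus=2, images_per_gpu=2)))
```

The printed pairs can be appended to the training command in place of the `SET_TO_SOME_REASONABLE_VALUE` placeholders. Treat the result as a starting point and tune it on your own data.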

Step 5: To train on a custom dataset, register it in train_net.py and update the config file as follows:

In train_net.py

from detectron2.data import MetadataCatalog
from detectron2.data.datasets import register_coco_instances

def main(args):
    # Register the COCO-format training and validation splits under fixed names.
    register_coco_instances("dataset_train", {}, "path to the ground truth json file", "path to the training image folder")
    register_coco_instances("dataset_val", {}, "path to the ground truth json file", "path to the validation image folder")

    # Replace the placeholder with the list of class names in your dataset.
    MetadataCatalog.get("dataset_train").thing_classes = ['name of the classes']
    MetadataCatalog.get("dataset_val").thing_classes = ['name of the classes']
    ...

if __name__ == "__main__":
    ...
    MetadataCatalog.get("dataset_train").thing_classes = ['name of the classes']
    MetadataCatalog.get("dataset_val").thing_classes = ['name of the classes']
    ...
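
The `thing_classes` list has to agree with the categories in the COCO-format ground-truth file. As a sketch, the names can be read from the JSON itself instead of being typed by hand. `load_class_names` is a hypothetical helper, and ordering by category `id` is an assumption: confirm that it matches the order your training setup expects.

```python
import json


def load_class_names(annotation_path):
    """Return category names from a COCO-format JSON file, ordered by category id."""
    with open(annotation_path, "r", encoding="utf-8") as f:
        coco = json.load(f)
    # Each COCO category is a dict with at least an "id" and a "name".
    categories = sorted(coco["categories"], key=lambda c: c["id"])
    return [c["name"] for c in categories]
```

The result can then be assigned in place of the placeholder, for example `MetadataCatalog.get("dataset_train").thing_classes = load_class_names("path to the ground truth json file")`.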

In Config File

...
SEM_SEG_HEAD:
    ...
    NUM_CLASSES: #no. of classes
...
DATASETS:
  TRAIN: ("dataset_train",)
  TEST: ("dataset_val",)
...
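
A mismatch between `NUM_CLASSES` and the annotation file is an easy mistake to make when adapting the config. The following sketch checks a loaded COCO-format dictionary against the intended class count. `find_annotation_problems` is a hypothetical helper that is part of neither this repository nor detectron2.

```python
def find_annotation_problems(coco, num_classes):
    """Return a list of human-readable problems found in a COCO-format dict."""
    problems = []
    category_ids = {c["id"] for c in coco.get("categories", [])}
    image_ids = {img["id"] for img in coco.get("images", [])}

    # The config's NUM_CLASSES should equal the number of defined categories.
    if len(category_ids) != num_classes:
        problems.append(
            f"NUM_CLASSES is {num_classes} but the file defines {len(category_ids)} categories"
        )
    # Every annotation must point at a known category and a known image.
    for ann in coco.get("annotations", []):
        if ann["category_id"] not in category_ids:
            problems.append(f"annotation {ann['id']} uses unknown category {ann['category_id']}")
        if ann["image_id"] not in image_ids:
            problems.append(f"annotation {ann['id']} refers to missing image {ann['image_id']}")
    return problems


if __name__ == "__main__":
    # Toy input for illustration only; load your own file with json.load instead.
    example = {
        "images": [{"id": 1}],
        "categories": [{"id": 1, "name": "text"}, {"id": 2, "name": "table"}],
        "annotations": [{"id": 10, "image_id": 1, "category_id": 2}],
    }
    print(find_annotation_problems(example, num_classes=2))
```

An empty list means the file is consistent with the class count. Running this before training surfaces such errors earlier than a failure inside the data loader would.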

Model Zoo

In this section, we release the pre-trained weights for the best SwinDocSegmenter model variants trained on benchmark datasets.

Dataset      Config file     Weights   AP
PubLayNet    config-publay   model     93.72
PRImA        config-prima    model     54.39
HJ Dataset   config-hj       model     84.65
TableBank    config-table    model     98.04
DocLayNet    config-doclay   model     76.85

Citation

If you find this useful for your research, please cite it as follows:

@article{banerjee2023swindocsegmenter,
  title={SwinDocSegmenter: An End-to-End Unified Domain Adaptive Transformer for Document Instance Segmentation},
  author={Banerjee, Ayan and Biswas, Sanket and Llad{\'o}s, Josep and Pal, Umapada},
  journal={arXiv preprint arXiv:2305.04609},
  year={2023}
}

Acknowledgement

Many thanks to these excellent open-source projects.

Authors

Conclusion

Thank you for your interest in our work, and sorry if there are any bugs.
