
udop's Introduction

Unifying Vision, Text, and Layout for Universal Document Processing

Zineng Tang, Ziyi Yang, Guoxin Wang, Yuwei Fang, Yang Liu, Chenguang Zhu, Michael Zeng, Cha Zhang, Mohit Bansal

Code Release Here

Code is rehosted as part of the i-code project.

Open Source Checklist:

  • Release Model (Encoder + Text decoder)
  • Release Most Scripts
  • Vision Decoder / Weights (due to ethical concerns about fake document generation, we plan to release this functionality as an Azure API)
  • Demos

Introduction

UDOP unifies vision, text, and layout through a vision-text-layout Transformer and unified generative pretraining tasks, including a vision task, a text task, a layout task, and a mixed task. We show the task prompts (left) and task targets (right) for all self-supervised objectives (joint text-layout reconstruction, visual text recognition, layout modeling, and masked autoencoding) and two example supervised objectives (question answering and layout analysis).
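To make the prompt/target framing concrete, here is a minimal sketch of how a joint text-layout reconstruction example might be assembled. It assumes that bounding boxes are normalized to [0, 1] and quantized into 500 bins rendered as tokens like `<125>`, and that masked spans are replaced by T5-style sentinels named `<text_layout_i>`. The bin count, the token names, the prompt wording, and the helper functions (`layout_tokens`, `joint_text_layout_example`) are all hypothetical illustrations, not the format used by the released code.

```python
from typing import List, Set, Tuple

# Hypothetical number of quantization bins for normalized coordinates (assumption).
NUM_BINS = 500

# (x0, y0, x1, y1), each coordinate normalized to [0, 1]
BBox = Tuple[float, float, float, float]


def layout_tokens(box: BBox, num_bins: int = NUM_BINS) -> str:
    """Quantize a normalized bounding box into discrete layout tokens such as <125>."""
    # Clamp so a coordinate of exactly 1.0 maps to the last bin instead of overflowing.
    ids = [min(int(c * num_bins), num_bins - 1) for c in box]
    return "".join(f"<{i}>" for i in ids)


def joint_text_layout_example(
    words: List[str], boxes: List[BBox], masked: Set[int]
) -> Tuple[str, str]:
    """Build a (prompt, target) pair in the spirit of joint text-layout reconstruction.

    Masked words are replaced by hypothetical sentinel tokens in the prompt; the
    target asks the model to regenerate each masked word plus its layout tokens.
    """
    prompt_parts = ["Joint Text-Layout Reconstruction."]  # hypothetical task prefix
    target_parts = []
    sentinel = 0
    for i, (word, box) in enumerate(zip(words, boxes)):
        if i in masked:
            prompt_parts.append(f"<text_layout_{sentinel}>")
            target_parts.append(f"<text_layout_{sentinel}> {word} {layout_tokens(box)}")
            sentinel += 1
        else:
            prompt_parts.append(word)
    return " ".join(prompt_parts), " ".join(target_parts)


if __name__ == "__main__":
    words = ["Ship", "Date", "2021"]
    boxes = [
        (0.125, 0.5, 0.25, 0.625),
        (0.25, 0.5, 0.375, 0.625),
        (0.5, 0.5, 0.75, 0.625),
    ]
    prompt, target = joint_text_layout_example(words, boxes, {1})
    print(prompt)
    print(target)
```

Masking the word "Date" yields a prompt in which only the sentinel marks the gap, while the target pairs the missing text with its quantized position. This is the sense in which a single sequence-to-sequence objective can supervise text and layout jointly.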

udop's People

Contributors

eify, microsoft-github-operations[bot], microsoftopensource, zinengtang


udop's Issues

Happy to support this model in 🤗 Transformers

Hi there,

Impressive work! We've got multiple Document AI models in 🤗 Transformers, including:

  • LayoutLM
  • LayoutLMv2
  • LayoutLMv3
  • LiLT
  • Table Transformer
  • MarkupLM
  • Donut

So happy to add support for UDOP as well! Let me know whether you are interested in integrating the model into the library.

Number of Epochs for Fine-tuning Tasks

Hi Authors @zinengtang,
I appreciate your interesting paper, UDOP, and wanted to know the specifics of fine-tuning. Appendix C.6 of the paper provides some details, but the crucial detail of the number of epochs is missing. Specifically:

  1. How many epochs was each downstream task fine-tuned for?
  2. Are all downstream tasks fine-tuned for the same number of epochs?

P.S. I posted this question in the i-code repo too, but that repo seems to be inactive.

Question regarding Training of the

Thank you for publishing UDOP, a real step towards unified document processing!

Can you give information about the training time (on the hardware used) for the self-supervised learning and curriculum learning parts?

Thank you in advance!
