
docvqa's Introduction

Document Visual Question Answering (DocVQA)

This repo hosts the core code for our approach, HyperDQA, entered in the Document Visual Question Answering competition held as part of the Workshop on Text and Documents in the Deep Learning Era at CVPR 2020. Our approach stands at 4th place on the leaderboard.

Read more about our approach in this blogpost!

Installation

Virtual Environment Python 3 (Recommended)

  1. Clone the repository
git clone https://github.com/anisha2102/docvqa.git
  2. Install libraries
pip install -r requirements.txt

Downloads

  1. Download the dataset. The dataset for Task 1 can be downloaded from the Downloads section of the competition website. It consists of document images and their corresponding OCR transcriptions.

  2. Download the pretrained model. Download the pretrained LayoutLM-Base, Uncased model from here

Prepare dataset

python create_dataset.py \
         <data-ocr-folder> \
         <data-documents-folder> \
         <path-to-train_v1.0.json> \
         <train-output-json-path> \
         <validation-output-json-path>

Train the model

# Example <pretrained-model-path>: ./models/layoutlm-base-uncased
CUDA_VISIBLE_DEVICES=0 python run_docvqa.py \
    --data_dir <data-folder> \
    --model_type layoutlm \
    --model_name_or_path <pretrained-model-path> \
    --do_lower_case \
    --max_seq_length 512 \
    --do_train \
    --num_train_epochs 15 \
    --logging_steps 500 \
    --evaluate_during_training \
    --save_steps 500 \
    --do_eval \
    --output_dir  <data-folder>/<exp-folder> \
    --per_gpu_train_batch_size 8 \
    --overwrite_output_dir \
    --cache_dir <data-folder>/models \
    --skip_match_answers \
    --val_json <validation-output-json-path> \
    --train_json <train-output-json-path>

Model Checkpoints

Download the pytorch_model.bin file from the link below and copy it to the models folder. Google Drive Link

Demo

Try out the demo on a sample datapoint with demo.ipynb

Acknowledgements

The code and pretrained models are based on LayoutLM and HuggingFace Transformers. Many thanks for their amazing open source contributions.

docvqa's People

Contributors

anisha2102


docvqa's Issues

Prepare dataset

Hi, thank you for open-sourcing this great code!
I have a question about preparing the dataset.

In the Prepare dataset section, does each of the train, val, and test datasets have two JSON files (train, val)?

Using document chunks with no answer for training

Hi!

In the utils_docvqa.py script, the convert_examples_to_features function is responsible for generating the features for training. I see you use a sliding window approach to cut documents that are longer than the maximum sequence length, so you end up with doc parts that have no answer in them.

Why do you include those doc spans with no answer in the feature set at the end?
utils_docvqa.py lines 308-327: here you state in the comment that you throw those docs out:

      if is_training and not example.is_impossible:
        # For training, if our document chunk does not contain an annotation
        # we throw it out, since there is nothing to predict.

but then you just set the start and end positions to 0 and include the feature in the final feature set. Is there a reason for that?
Why not just skip those examples and leave them out of the feature set?

Thanks for the answer!
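
To make the question above concrete, here is a minimal sketch of a sliding-window split and the two labeling policies being compared. It is not the code from utils_docvqa.py: the names DocSpan, make_doc_spans, label_chunk, the offset parameter, and the keep_unanswerable switch are all hypothetical and exist only for illustration.

```python
from typing import List, NamedTuple, Optional, Tuple


class DocSpan(NamedTuple):
    start: int   # index of the first document token in the chunk
    length: int  # number of document tokens in the chunk


def make_doc_spans(num_tokens: int, max_tokens: int, stride: int) -> List[DocSpan]:
    """Cut a document into overlapping chunks of at most max_tokens tokens."""
    if max_tokens <= 0 or stride <= 0:
        raise ValueError("max_tokens and stride must be positive")
    spans: List[DocSpan] = []
    start = 0
    while start < num_tokens:
        length = min(num_tokens - start, max_tokens)
        spans.append(DocSpan(start, length))
        if start + length == num_tokens:
            break  # the last chunk reaches the end of the document
        start += min(length, stride)
    return spans


def label_chunk(
    span: DocSpan,
    answer_start: int,
    answer_end: int,
    offset: int = 0,
    keep_unanswerable: bool = True,
) -> Optional[Tuple[int, int]]:
    """Return chunk-relative (start, end) labels, or None if the chunk is skipped.

    offset is the (hypothetical) number of tokens placed before the document
    tokens in the model input, e.g. [CLS] + question + [SEP].
    """
    span_end = span.start + span.length - 1
    if answer_start >= span.start and answer_end <= span_end:
        return (answer_start - span.start + offset, answer_end - span.start + offset)
    if keep_unanswerable:
        return (0, 0)  # both labels point at position 0 of the input
    return None


spans = make_doc_spans(num_tokens=10, max_tokens=4, stride=2)
kept = [label_chunk(s, 5, 6, offset=3) for s in spans]
skipped = [label_chunk(s, 5, 6, offset=3, keep_unanswerable=False) for s in spans]
```

With keep_unanswerable=True, chunks without the answer act as negative examples whose target is position 0; with False they never reach the feature set. Which of the two the author intended is not stated in the source.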

F1 and EM scores are too low during evaluation

Hi, thanks for your code. When I use my trained model on the val dataset, the exact match score is lower than 3 and the F1 score is lower than 10. Is this normal, and how can I improve the scores?
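
For readers who want to spot-check a few predictions by hand, the sketch below shows the usual SQuAD-style definitions of exact match and token-level F1. It assumes the repo's evaluation follows those definitions, which the source does not confirm; the function names are illustrative, not the repo's.

```python
import collections
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(gold: str, pred: str) -> int:
    """1 if the normalized strings are identical, else 0."""
    return int(normalize_answer(gold) == normalize_answer(pred))


def f1_score(gold: str, pred: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    gold_tokens = normalize_answer(gold).split()
    pred_tokens = normalize_answer(pred).split()
    if not gold_tokens or not pred_tokens:
        # If either side is empty, score 1 only when both are empty.
        return float(gold_tokens == pred_tokens)
    common = collections.Counter(gold_tokens) & collections.Counter(pred_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Comparing a handful of gold and predicted strings this way can help tell a weak model apart from a data problem, such as gold answers that do not line up with the OCR tokens.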

Details of install requirements

Hi,
Thank you for open-sourcing your code.
Could you share the exact versions of the packages in the install requirements?
I am running into version conflicts between some modules.

Could you provide the output of 'conda list'?

Thank you.

Question in blogpost

Hi, thank you for open-sourcing this great code.
I have a question about your blog post.

As you mentioned in the blog post, before fine-tuning the language model (LayoutLM) on the DocVQA dataset, you pre-trained LayoutLM on the SQuAD dataset.

In that training phase, how did you set up the 2D positional encoding? As far as I know, there is no 2D positional information in the SQuAD dataset.

Thank you :)
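
The author's actual choice is not given here. As one possible option, and purely as an assumption, text-only data such as SQuAD could be given the same placeholder box for every token, while real OCR boxes are scaled to the 0-1000 coordinate range that LayoutLM expects. The names PLACEHOLDER_BOX, placeholder_boxes, and normalize_box below are hypothetical.

```python
from typing import List

# Hypothetical placeholder for tokens that have no layout information.
# LayoutLM takes one [x0, y0, x1, y1] box per token on a 0-1000 scale.
PLACEHOLDER_BOX = [0, 0, 0, 0]


def placeholder_boxes(tokens: List[str]) -> List[List[int]]:
    """Give every token of a text-only example the same dummy box."""
    return [list(PLACEHOLDER_BOX) for _ in tokens]


def normalize_box(box: List[int], page_width: int, page_height: int) -> List[int]:
    """Scale a pixel-space OCR box to the 0-1000 range."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]
```

With a constant box, the 2D position embeddings carry no signal for the text-only data, so the model would learn from text alone in that phase. Whether this matches what was done for HyperDQA is unknown.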

Usage

I notice the code is in Tensorflow but the Google Drive link is a PyTorch model. Will there be a TF Model in the future?
Is the PyTorch model a trained model?

AttributeError: 'DocvqaExample' object has no attribute 'answers'

I encountered an error.

Traceback (most recent call last):
  File "run_docvqa.py", line 850, in <module>
    main()
  File "run_docvqa.py", line 744, in main
    args, train_dataset, model, tokenizer, labels, pad_token_label_id
  File "run_docvqa.py", line 251, in train
    mode="dev",
  File "run_docvqa.py", line 366, in evaluate
    results = squad_evaluate(examples, predictions)
  File "my_path/squad_metrics.py", line 212, in squad_evaluate
    qas_id_to_has_answer = {example.qas_id: bool(example.answers) for example in examples}
  File "my_path/squad_metrics.py", line 212, in <dictcomp>
    qas_id_to_has_answer = {example.qas_id: bool(example.answers) for example in examples}
AttributeError: 'DocvqaExample' object has no attribute 'answers'

class DocvqaExample(object):
    """A single training/test example for token classification."""


    def __init__(self,
               qas_id,
               question_text,
               doc_tokens,
               orig_answer_text=None,
               start_position=None,
               end_position=None,
               is_impossible=False,
               boxes = []):
        self.qas_id = qas_id
        self.question_text = question_text
        self.doc_tokens = doc_tokens
        self.orig_answer_text = orig_answer_text
        self.start_position = start_position
        self.end_position = end_position
        self.is_impossible = is_impossible
        self.boxes = boxes

I checked that class DocvqaExample has no attribute 'answers'.

def squad_evaluate(examples, preds, no_answer_probs=None, no_answer_probability_threshold=1.0):
    qas_id_to_has_answer = {example.qas_id: bool(example.answers) for example in examples}
    has_answer_qids = [qas_id for qas_id, has_answer in qas_id_to_has_answer.items() if has_answer]
    no_answer_qids = [qas_id for qas_id, has_answer in qas_id_to_has_answer.items() if not has_answer]

    if no_answer_probs is None:
        no_answer_probs = {k: 0.0 for k in preds}

    exact, f1 = get_raw_scores(examples, preds)

    exact_threshold = apply_no_ans_threshold(
        exact, no_answer_probs, qas_id_to_has_answer, no_answer_probability_threshold
    )
    f1_threshold = apply_no_ans_threshold(f1, no_answer_probs, qas_id_to_has_answer, no_answer_probability_threshold)

    evaluation = make_eval_dict(exact_threshold, f1_threshold)

    if has_answer_qids:
        has_ans_eval = make_eval_dict(exact_threshold, f1_threshold, qid_list=has_answer_qids)
        merge_eval(evaluation, has_ans_eval, "HasAns")

    if no_answer_qids:
        no_ans_eval = make_eval_dict(exact_threshold, f1_threshold, qid_list=no_answer_qids)
        merge_eval(evaluation, no_ans_eval, "NoAns")

    if no_answer_probs:
        find_all_best_thresh(evaluation, preds, exact, f1, no_answer_probs, qas_id_to_has_answer)

    return evaluation

My transformers version is 2.8.0.
What should I do?
Could anyone please help?
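
One possible workaround, sketched under assumptions rather than taken from the repo: expose an answers attribute in the shape the SQuAD metrics appear to read, a list of dicts with a "text" key. The class below is a hypothetical stand-in for DocvqaExample, and the expected format should be checked against the squad_metrics.py that ships with the installed transformers version.

```python
from typing import Dict, List, Optional


class DocvqaExampleWithAnswers:
    """Hypothetical stand-in for DocvqaExample that also exposes `answers`."""

    def __init__(
        self,
        qas_id: str,
        question_text: str,
        doc_tokens: List[str],
        orig_answer_text: Optional[str] = None,
        start_position: Optional[int] = None,
        end_position: Optional[int] = None,
        is_impossible: bool = False,
        boxes: Optional[List[List[int]]] = None,
    ):
        self.qas_id = qas_id
        self.question_text = question_text
        self.doc_tokens = doc_tokens
        self.orig_answer_text = orig_answer_text
        self.start_position = start_position
        self.end_position = end_position
        self.is_impossible = is_impossible
        # Avoid sharing one mutable default list between instances.
        self.boxes = boxes if boxes is not None else []

    @property
    def answers(self) -> List[Dict[str, str]]:
        """Gold answers as a list of {"text": ...} dicts (assumed format)."""
        if self.is_impossible or not self.orig_answer_text:
            return []
        return [{"text": self.orig_answer_text}]


example = DocvqaExampleWithAnswers("q1", "What is the date?", ["Date:", "1999"], orig_answer_text="1999")
unanswerable = DocvqaExampleWithAnswers("q2", "Who signed it?", ["Date:", "1999"], is_impossible=True)
```

With this property, bool(example.answers) is true for answerable examples and false otherwise, which is what the failing dictionary comprehension in the traceback evaluates.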

Version of transformers library

Can I get more information about the required Transformers version? I get a lot of errors when running run_docvqa.py because my Transformers version does not match the other libraries. Thank you so much.

Not able to post process the model output.

I am not able to process the output. Please refer to the error log.

Traceback (most recent call last)
Input In [12], in <cell line: 7>()
     23 eval_feature = features[example_index.item()]
     24 unique_id = int(eval_feature.unique_id)
---> 26 output = [to_list(output[i]) for output in outputs]
     27 print("type: ",output)
     28 start_logits, end_logits = output

Input In [12], in <listcomp>(.0)
     23 eval_feature = features[example_index.item()]
     24 unique_id = int(eval_feature.unique_id)
---> 26 output = [to_list(output[i]) for output in outputs]
     27 print("type: ",output)
     28 start_logits, end_logits = output

Input In [12], in to_list(tensor)
      4 def to_list(tensor):
----> 5     return tensor.detach().cpu().tolist()

AttributeError: 'str' object has no attribute 'detach'
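
A plausible cause, stated as an assumption since the installed library version is not given: if the model returns a dict-like output object instead of a tuple, iterating over it yields its keys (strings), so output[i] is a single character and to_list fails. The sketch below reproduces that behavior with standard-library stand-ins; FakeTensor is hypothetical and only mimics the three tensor methods that to_list calls.

```python
from collections import OrderedDict


class FakeTensor:
    """Hypothetical stand-in for a tensor so the sketch runs without PyTorch."""

    def __init__(self, data):
        self.data = data

    def __getitem__(self, index):
        return FakeTensor(self.data[index])

    def detach(self):
        return self

    def cpu(self):
        return self

    def tolist(self):
        return self.data


def to_list(tensor):
    return tensor.detach().cpu().tolist()


# Dict-like model output, keyed by field name (assumed shape of the output).
outputs = OrderedDict(
    start_logits=FakeTensor([[0.1, 0.9]]),
    end_logits=FakeTensor([[0.7, 0.3]]),
)
i = 0

# Iterating a dict yields its keys, which is what triggers the error above.
keys_seen = [output for output in outputs]

# Iterating over the values instead gives the tensors themselves.
start_logits, end_logits = [to_list(output[i]) for output in outputs.values()]
```

If this diagnosis applies, reading the fields by name (for example outputs.start_logits and outputs.end_logits) or iterating over the values should avoid the error; pinning the transformers version the repo was written against is another option.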

How to create sample_data.json?

Firstly thanks for making this open-source! I was looking through the example and was wondering how you get the sample_data.json file, as the DocVQA dataset task 1's ocr results .json files look very different. Thanks!

Cannot find "test.txt" file

When I run run_docvqa.py, the following line fails:
with open(os.path.join(args.data_dir,"test.txt"),"r") as f:
The text file cannot be found. Can you tell me where to find this file or how to create it?

Model from the Google Drive link seems corrupted

Can you please upload the model for the demo again?
ValueError: The state dictionary of the model you are trying to load is corrupted. Are you sure it was properly saved?

I guess the model file is corrupted.
