anisha2102 / docvqa

Document Visual Question Answering

License: MIT License

Python 93.13% Jupyter Notebook 6.87%

visual-question-answering computer-vision deep-learning document-analysis

docvqa's Introduction

Document Visual Question Answering (DocVQA)

This repository hosts the core code for HyperDQA, our approach to the Document Visual Question Answering competition, held as part of the Workshop on Text and Documents in the Deep Learning Era at CVPR 2020. Our approach ranks 4th on the leaderboard.

Read more about our approach in this blogpost!

Installation

Virtual Environment Python 3 (Recommended)

Clone the repository

git clone https://github.com/anisha2102/docvqa.git

Install libraries

pip install -r requirements.txt

Downloads

Download the dataset: the dataset for Task 1 can be downloaded from the Downloads section of the competition website. It consists of document images and their corresponding OCR transcriptions.
Download the pretrained model: download the pretrained LayoutLM-Base, Uncased model from here

Prepare dataset

python create_dataset.py \
         <data-ocr-folder> \
         <data-documents-folder> \
         <path-to-train_v1.0.json> \
         <train-output-json-path> \
         <validation-output-json-path>

Train the model

# Example <pretrained-model-path>: ./models/layoutlm-base-uncased
CUDA_VISIBLE_DEVICES=0 python run_docvqa.py \
    --data_dir <data-folder> \
    --model_type layoutlm \
    --model_name_or_path <pretrained-model-path> \
    --do_lower_case \
    --max_seq_length 512 \
    --do_train \
    --num_train_epochs 15 \
    --logging_steps 500 \
    --evaluate_during_training \
    --save_steps 500 \
    --do_eval \
    --output_dir <data-folder>/<exp-folder> \
    --per_gpu_train_batch_size 8 \
    --overwrite_output_dir \
    --cache_dir <data-folder>/models \
    --skip_match_answers \
    --val_json <validation-output-json-path> \
    --train_json <train-output-json-path>

Model Checkpoints

Download the pytorch_model.bin file from the link below and copy it to the models folder. Google Drive Link

Demo

Try out the demo on a sample datapoint with demo.ipynb

Acknowledgements

The code and pretrained models are based on LayoutLM and HuggingFace Transformers. Many thanks for their amazing open source contributions.

docvqa's Issues

Prepare dataset

Hi, thank you for open-sourcing this great code!
I have a question about preparing the dataset.

In the Prepare dataset section,
does each of the train, val, and test datasets have two JSON files (train, val)?

Using document chunks with no answer for training

Hi!

In the utils_docvqa.py script, the convert_examples_to_features function is responsible for generating the features for training. I see you use a sliding-window approach to split documents that are longer than the maximum sequence length, so you end up with document chunks that contain no answer.

Why do you include those doc spans with no answer in the final feature set?
utils_docvqa.py, lines 308-327: here the comment states that you throw those chunks out:

      if is_training and not example.is_impossible:
        # For training, if our document chunk does not contain an annotation
        # we throw it out, since there is nothing to predict.

but then you just set the start and end positions to 0 and include the feature in the final feature set. Is there a reason for that?
Why not just skip those examples and leave them out of the feature set?

Thanks for the answer!
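
To make the trade-off in this question concrete, here is a minimal sketch of a sliding window over document tokens with both labelling options. It is not the code from utils_docvqa.py: make_doc_spans, label_span, and the skip_unanswerable flag are hypothetical names, and the question and special-token offsets that a real feature converter adds are left out. In SQuAD-style pipelines, position 0 is usually the [CLS] token, so a (0, 0) label typically means "no answer in this chunk".

```python
from typing import List, NamedTuple, Optional, Tuple


class DocSpan(NamedTuple):
    start: int   # index of the first document token in the chunk
    length: int  # number of document tokens in the chunk


def make_doc_spans(num_tokens: int, max_tokens: int, stride: int) -> List[DocSpan]:
    """Split a document of num_tokens tokens into overlapping chunks."""
    spans = []
    start = 0
    while start < num_tokens:
        length = min(max_tokens, num_tokens - start)
        spans.append(DocSpan(start, length))
        if start + length == num_tokens:
            break  # the last chunk reaches the end of the document
        start += min(length, stride)
    return spans


def label_span(span: DocSpan, ans_start: int, ans_end: int,
               skip_unanswerable: bool) -> Optional[Tuple[int, int]]:
    """Return chunk-relative (start, end) labels for one chunk.

    A chunk that does not fully contain the answer gets (0, 0),
    or None (meaning "drop this chunk") when skip_unanswerable is set.
    """
    span_end = span.start + span.length - 1
    if ans_start >= span.start and ans_end <= span_end:
        return ans_start - span.start, ans_end - span.start
    return None if skip_unanswerable else (0, 0)


# A 10-token document whose answer covers tokens 5..6.
spans = make_doc_spans(num_tokens=10, max_tokens=4, stride=2)
kept = [label_span(s, 5, 6, skip_unanswerable=False) for s in spans]
skipped = [label_span(s, 5, 6, skip_unanswerable=True) for s in spans]
print(kept)     # chunks without the answer are labelled (0, 0)
print(skipped)  # chunks without the answer are marked None
```

Keeping the (0, 0) chunks gives the model negative examples, which may help it avoid predicting a span in chunks that lack the answer at inference time; dropping them shortens training. Which option the authors intended is not stated in this thread.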

Your model from Google Drive seems corrupted

I cannot download the model from Google Drive. It reports that the model is corrupted.

The F1 and EM scores are too low during evaluation

Hi, thanks for your code. When I evaluate my trained model on the val dataset, the exact match score is lower than 3 and the F1 score is lower than 10. Is this normal, and how can I improve the scores?

Details of install requirements

Hi,
Thank you for opening your code.
I would like to know the exact versions of the packages listed in the install requirements.
There are some conflicts between modules.

Could you share the output of 'conda list'?

Thank you.

Question in blogpost

Hi, thank you for open-sourcing this great code.
I have a question about your blog post.

As you mentioned in the blog post, before fine-tuning the language model (LayoutLM) on the DocVQA dataset, you pre-trained it on the SQuAD dataset.

In that training phase, how do you set up the 2D positional encoding? As far as I know, there is no 2D positional information in the SQuAD dataset.

Thank you :)
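
The question is not answered in this thread. One possible setup, given purely as an assumption and not as the authors' confirmed method, is to give every token of a text-only dataset such as SQuAD the same placeholder box. LayoutLM takes one [x0, y0, x1, y1] box per token in a 0-1000 coordinate space, so an all-zero box carries no layout signal. The helper name dummy_boxes below is hypothetical.

```python
from typing import List


def dummy_boxes(num_tokens: int) -> List[List[int]]:
    """Return one [x0, y0, x1, y1] box per token, all zeros (no layout information)."""
    return [[0, 0, 0, 0] for _ in range(num_tokens)]


# Tokens from a text-only question; there is no OCR layout to draw boxes from.
tokens = ["when", "was", "the", "report", "issued", "?"]
boxes = dummy_boxes(len(tokens))
print(len(boxes), boxes[0])
```

With this choice the 2D position embeddings receive the same input for every token during the SQuAD stage, and real OCR boxes are only introduced once training moves to DocVQA.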

Usage

I notice the code is in TensorFlow, but the Google Drive link is a PyTorch model. Will there be a TF model in the future?
Is the PyTorch model a trained model?

Question about performance on DocVQA

Using your code, the performance on the DocVQA dataset only reaches 49. Can you provide your trained model?

AttributeError: 'DocvqaExample' object has no attribute 'answers'

I encountered an error.

Traceback (most recent call last):
  File "run_docvqa.py", line 850, in <module>
    main()
  File "run_docvqa.py", line 744, in main
    args, train_dataset, model, tokenizer, labels, pad_token_label_id
  File "run_docvqa.py", line 251, in train
    mode="dev",
  File "run_docvqa.py", line 366, in evaluate
    results = squad_evaluate(examples, predictions)
  File "my_path/squad_metrics.py", line 212, in squad_evaluate
    qas_id_to_has_answer = {example.qas_id: bool(example.answers) for example in examples}
  File "my_path/squad_metrics.py", line 212, in <dictcomp>
    qas_id_to_has_answer = {example.qas_id: bool(example.answers) for example in examples}
AttributeError: 'DocvqaExample' object has no attribute 'answers'

class DocvqaExample(object):
    """A single training/test example for token classification."""


    def __init__(self,
               qas_id,
               question_text,
               doc_tokens,
               orig_answer_text=None,
               start_position=None,
               end_position=None,
               is_impossible=False,
               boxes = []):
        self.qas_id = qas_id
        self.question_text = question_text
        self.doc_tokens = doc_tokens
        self.orig_answer_text = orig_answer_text
        self.start_position = start_position
        self.end_position = end_position
        self.is_impossible = is_impossible
        self.boxes = boxes

I checked that class DocvqaExample has no attribute 'answers'.

def squad_evaluate(examples, preds, no_answer_probs=None, no_answer_probability_threshold=1.0):
    qas_id_to_has_answer = {example.qas_id: bool(example.answers) for example in examples}
    has_answer_qids = [qas_id for qas_id, has_answer in qas_id_to_has_answer.items() if has_answer]
    no_answer_qids = [qas_id for qas_id, has_answer in qas_id_to_has_answer.items() if not has_answer]

    if no_answer_probs is None:
        no_answer_probs = {k: 0.0 for k in preds}

    exact, f1 = get_raw_scores(examples, preds)

    exact_threshold = apply_no_ans_threshold(
        exact, no_answer_probs, qas_id_to_has_answer, no_answer_probability_threshold
    )
    f1_threshold = apply_no_ans_threshold(f1, no_answer_probs, qas_id_to_has_answer, no_answer_probability_threshold)

    evaluation = make_eval_dict(exact_threshold, f1_threshold)

    if has_answer_qids:
        has_ans_eval = make_eval_dict(exact_threshold, f1_threshold, qid_list=has_answer_qids)
        merge_eval(evaluation, has_ans_eval, "HasAns")

    if no_answer_qids:
        no_ans_eval = make_eval_dict(exact_threshold, f1_threshold, qid_list=no_answer_qids)
        merge_eval(evaluation, no_ans_eval, "NoAns")

    if no_answer_probs:
        find_all_best_thresh(evaluation, preds, exact, f1, no_answer_probs, qas_id_to_has_answer)

    return evaluation

My transformers version is 2.8.0.
What should I do?
Could anyone please help?
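
One possible workaround, sketched under assumptions and not confirmed by the maintainers: squad_evaluate reads example.answers as a list of dicts with a "text" key, so the examples can be given that attribute before evaluation. The class below is a trimmed stand-in for the DocvqaExample quoted above, and attach_answers is a hypothetical helper. It assumes orig_answer_text holds the gold answer at evaluation time.

```python
class DocvqaExample(object):
    """Trimmed stand-in for the class quoted above (only the fields used here)."""

    def __init__(self, qas_id, orig_answer_text=None, is_impossible=False):
        self.qas_id = qas_id
        self.orig_answer_text = orig_answer_text
        self.is_impossible = is_impossible


def attach_answers(examples):
    """Give each example an `answers` list of {"text": ...} dicts.

    Examples without a gold answer get an empty list, which the
    SQuAD metrics treat as "no answer".
    """
    for example in examples:
        if example.is_impossible or not example.orig_answer_text:
            example.answers = []
        else:
            example.answers = [{"text": example.orig_answer_text}]
    return examples


examples = attach_answers([
    DocvqaExample("q1", orig_answer_text="12 March 1998"),
    DocvqaExample("q2", orig_answer_text=None, is_impossible=True),
])
has_answer = {ex.qas_id: bool(ex.answers) for ex in examples}
print(has_answer)
```

DocVQA questions can list several accepted answers, so scoring against a single orig_answer_text may understate the results; extending the answers list with every accepted answer would be the natural refinement.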

How to run it on the test sample image.

Version of transformers library

Can I get more information about the required Transformers version? I get a lot of errors when running run_docvqa.py because my Transformers version does not match the other libraries. Thank you so much.

Not able to post process the model output.

I am not able to process the output. Please refer to the error log.

Traceback (most recent call last)
Input In [12], in <cell line: 7>()
     23 eval_feature = features[example_index.item()]
     24 unique_id = int(eval_feature.unique_id)
---> 26 output = [to_list(output[i]) for output in outputs]
     27 print("type: ",output)
     28 start_logits, end_logits = output

Input In [12], in <listcomp>(.0)
     23 eval_feature = features[example_index.item()]
     24 unique_id = int(eval_feature.unique_id)
---> 26 output = [to_list(output[i]) for output in outputs]
     27 print("type: ",output)
     28 start_logits, end_logits = output

Input In [12], in to_list(tensor)
      4 def to_list(tensor):
----> 5     return tensor.detach().cpu().tolist()

AttributeError: 'str' object has no attribute 'detach'
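
A likely cause, stated as an assumption because the issue does not give the transformers version: newer releases return a dict-like output object from the model, and iterating over a dict yields its string keys, so to_list receives a string such as "start_logits" instead of a tensor. The sketch below reads the logits by name when the output is a mapping and by position when it is a tuple. per_example_logits is a hypothetical helper, and the key names start_logits and end_logits are assumed to match the question-answering output of the installed version.

```python
from collections import OrderedDict
from collections.abc import Mapping


def to_list(tensor):
    """Convert a torch tensor to a Python list; plain sequences pass through."""
    if hasattr(tensor, "detach"):
        return tensor.detach().cpu().tolist()
    return list(tensor)


def per_example_logits(outputs, i):
    """Return [start_logits, end_logits] for example i of a batch.

    Handles both tuple-style outputs and dict-like outputs, where
    iterating would yield key strings rather than tensors.
    """
    if isinstance(outputs, Mapping):
        values = [outputs["start_logits"], outputs["end_logits"]]
    else:
        values = list(outputs)[:2]
    return [to_list(value[i]) for value in values]


# Nested lists stand in for tensors so the sketch runs without torch.
outputs_tuple = ([[0.1, 0.9], [0.3, 0.7]], [[0.8, 0.2], [0.4, 0.6]])
outputs_dict = OrderedDict(start_logits=outputs_tuple[0], end_logits=outputs_tuple[1])

start_logits, end_logits = per_example_logits(outputs_dict, 1)
print(start_logits, end_logits)
```

In the notebook cell from the traceback, the failing list comprehension would be replaced by a call such as per_example_logits(outputs, i).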
