wenjinw / latin-prompt
License: MIT License
See results @ https://wandb.ai/jordy-vlan/Layout/runs/fw39mx08/overview?workspace=user-jordy-vlan
For Llama-v2-chat 13B you report a val ANLS on DocVQA of 0.4435, whereas my reproduction reaches 0.6239; any idea why or how that happens?
Exact command used:

```shell
python LATIN-Prompt/examples/llama_docvqa_due_azure.py \
    --model_name_or_path llama2-7b-chat \
    --dataset_name docvqa_due_azure \
    --output_dir outputs \
    --results_dir results \
    --datas_dir /data/users/sbiswas/DocVQA \
    --wandb_project Layout \
    --run_name llama2-7b-chat__Prompt_task_instruction_space__docvqa_due_azure \
    --prompt task_instruction_space \
    --per_device_eval_batch_size 2
```
Two things differ: eval_batch_size=2, and I am using "NousResearch/Llama-2-7b-chat-hf" instead of the official checkpoint. You can check my fork here: https://github.com/Jordy-VL/LATIN-Prompt; I have added some niceties there.
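For anyone comparing scores here, a minimal sketch of the DocVQA metric may help: ANLS (Average Normalized Levenshtein Similarity) takes, per question, the best similarity against any gold answer, with scores below the benchmark's threshold of tau = 0.5 clamped to zero. This is a standalone illustration of the standard metric, not the repo's own evaluation code.

```python
# Minimal ANLS sketch (standard DocVQA definition, tau = 0.5).
# Not the repository's evaluation code -- just the metric itself.

def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def anls(predictions, gold_answers, tau=0.5):
    """predictions: list[str]; gold_answers: list[list[str]] (one list per question)."""
    total = 0.0
    for pred, golds in zip(predictions, gold_answers):
        best = 0.0
        for g in golds:
            p, q = pred.strip().lower(), g.strip().lower()
            nl = levenshtein(p, q) / max(len(p), len(q), 1)
            best = max(best, 1 - nl if nl < tau else 0.0)
        total += best
    return total / len(predictions)

print(anls(["mit license"], [["MIT License"]]))  # → 1.0
```

Small differences in answer normalization (casing, stripping) can move the aggregate score, so it is worth checking that both runs use the same post-processing before comparing 0.4435 vs 0.6239.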
Hey guys!
Very interesting topic and a high-quality paper from the research team.
After reading the paper and reproducing it with reference to the repository, I had a few questions that led me to raise an issue.
1. Is it correct that you used the bounding boxes of lines rather than the bounding boxes of texts from the original OCR provided by the RRC leaderboard? If so, was the performance worse when you used the bounding boxes of texts?
2. Were the ANLS values in your paper calculated with your own code, or are they the results you submitted to the RRC leaderboard?
3. If the answer to 1 is LINE, is the overall process as follows: use LINE OCR TEXT and TEXT_BOXES to obtain LAYOUT_RECOVER via the SPACE_LAYOUT function, then use PROMPT_TASK to query the GPT-3.5 API?
Thanks.
(P.S. There seems to be a typo in the title of the README.md :) Promot )
Will you open-source your implementation for Alpaca LATIN-tuning? :)
Thank you for sharing the code and for the nice paper.
Can you direct me to where I can find the group division for the InfographicVQA dataset (Table 5)?
If it is not public yet, could you make it available?
First off, I wanted to say that this is a very interesting way to tackle the issue of layout within a prompt.
Do you have any plans to experiment with fine-tuning GPT3.5 Turbo with the DocVQA dataset? Would be interesting to see if results improve from zero-shot.
One of the innovations of your work is structure-preserving OCR (implemented as whitespace and newlines) for the documents as part of the prompts.
From the first figure in the paper I gathered that you literally pass the string "5_" to indicate 5 whitespaces. What is the correct interpretation? Doesn't the LLM's tokenizer simply collapse runs of whitespace?