tiger-ai-lab / mantis
Official code for the paper "Mantis: Multi-Image Instruction Tuning"
Home Page: https://tiger-ai-lab.github.io/Mantis/
License: Apache License 2.0
Excellent work! By the way, does the model support Chinese?
Hi, I want to ask: does Mantis use an image separator between the images sent to the LLM? From what I can tell, LLaVA doesn't have one, and the data used in Mantis doesn't provide a separator string either.
Also, which way do you think is better, especially when video frames are given as input as well?
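To make the question concrete, here is a minimal sketch of the two prompt-building strategies I mean; `IMAGE_TOKEN` and `SEPARATOR` are hypothetical placeholders, not Mantis's actual tokens:

```python
# Hypothetical illustration of "separator vs. no separator" between images.
# IMAGE_TOKEN and SEPARATOR are made-up placeholders, not Mantis's real tokens.
IMAGE_TOKEN = "<image>"   # placeholder a processor would later expand per image
SEPARATOR = "\n"          # hypothetical separator string between consecutive images

def build_prompt(question: str, num_images: int, use_separator: bool) -> str:
    if use_separator:
        image_block = SEPARATOR.join(IMAGE_TOKEN for _ in range(num_images))
    else:
        image_block = IMAGE_TOKEN * num_images  # LLaVA-style: placeholders back to back
    return f"{image_block}\n{question}"

print(build_prompt("What changed between the two frames?", 2, use_separator=True))
print(build_prompt("What changed between the two frames?", 2, use_separator=False))
```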
Hi,
Thank you for open-sourcing this great work. I appreciate the team's efforts in putting this together.
I have a question about the evaluation criteria in mantis-eval, specifically for the "short-answer" questions. It looks like the correctness of a "short-answer" question is judged by an exact match between the model's output and the reference answer, without further parsing (see the edit below). But the prompt template for this type of question also instructs the model to output both an analysis and a final answer.
In this case, I noticed that a model would give the correct answer (for example, "Yes") followed by some reasoning, but such an answer wouldn't be counted as correct because of how the exact match works.
Could you help me understand why it's written like this? Does it make sense to improve the matching rule? Thanks.
Edit:
I just saw that there is parsing of the model's output that only takes the text after "Final Answer: ". This makes much more sense. However, I noticed that sometimes a model answers correctly but with more than one word.
Do you think it makes sense to loosen the matching criteria (for example, as sketched below)? Alternatively, I think it also makes sense to make the instruction clearer in the prompt template, for example by adding one more sentence like "Answer the question in a single word or phrase."
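As an illustration, here is one possible loosened matcher. This is just a sketch of what I have in mind, not the repo's actual evaluation code:

```python
# Sketch of a loosened short-answer matcher (not the repo's actual code):
# take the text after "Final Answer:", then accept either an exact match or
# the reference answer appearing as a whole word inside a longer answer.
import re

def is_correct(model_output: str, reference: str) -> bool:
    match = re.search(r"Final Answer:\s*(.*)", model_output, re.IGNORECASE | re.DOTALL)
    answer = (match.group(1) if match else model_output).strip().lower()
    reference = reference.strip().lower()
    if answer == reference:
        return True
    return re.search(rf"\b{re.escape(reference)}\b", answer) is not None

# "Yes, the cat moved." would now count as correct for reference "Yes".
print(is_correct("Analysis: ...\nFinal Answer: Yes, the cat moved.", "Yes"))  # True
```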
Nice work! Thanks for the contribution.
We are carrying out instruction-tuning experiments with Mantis-8B-siglip-llama3. Pretraining and instruction finetuning with LoRA work fine, but full-parameter finetuning does not: the warning below came up and finetuning got stuck. I'm putting it here for others' reference.
Invalidate trace cache @ step 344: expected module 345, but got module 1
Referring to the issue, this might be due to how accelerate or deepspeed is installed. Since there are no version specifications in setup.py in this repo, may we ask the exact versions you used for fine-tuning, for dependencies such as torch, accelerate, and deepspeed?
Thanks in advance.
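For anyone else reporting this, it may help to attach the exact versions from your own environment. A small snippet for that, with the package list chosen to match the question above:

```python
# Print the installed versions of the dependencies asked about above.
import importlib.metadata as md

for pkg in ("torch", "accelerate", "deepspeed", "transformers"):
    try:
        print(pkg, md.version(pkg))
    except md.PackageNotFoundError:
        print(pkg, "not installed")
```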
Why not use q-bench2-a1-pair-test.json for q-bench2?
I'm working on fine-tuning Idefics2 with multiple images per instruction.
I followed this script for full fine-tuning: https://github.com/TIGER-AI-Lab/Mantis/blob/89d34077bd87b66eaadc13117add553e3a3d4c0b/mantis/train/scripts/train_idefics2_full.sh
Here is the command:
NCCL_DEBUG=WARN accelerate launch --config_file=./accelerate_configs/accelerate_config_zero3.yaml \
--machine_rank 0 --main_process_ip 10.29.35.44 --main_process_port 12956 \
--num_machines 1 --num_processes 8 \
train_idefics2.py \
--model_name_or_path HuggingFaceM4/idefics2-8b \
--data_config_file custom_data_config.yaml \
--data_format chat \
--run_name 240523_idefics2_mantis \
--output_dir 240523_idefics2_mantis \
--num_train_epochs 3 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 2 \
--evaluation_strategy "steps" \
--save_strategy "steps" \
--save_steps 200 \
--eval_steps 200 \
--save_total_limit 5 \
--learning_rate 2e-5 \
--weight_decay 0.01 \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 10 \
--gradient_checkpointing True \
--dataloader_num_workers 5 \
--report_to wandb \
--do_train \
--lora_enabled False \
--qlora_enabled False \
--dora_enabled False \
--max_seq_len 512 \
--fp16 \
--attn_implementation eager
The error I got is:
[rank0]: result = forward_call(*args, **kwargs)
[rank0]: File "/home/user/Mantis/mantis/models/idefics2/modeling_idefics2.py", line 1677, in forward
[rank0]: inputs_embeds = self.inputs_merger(
[rank0]: File "/home/user/Mantis/mantis/models/idefics2/modeling_idefics2.py", line 1564, in inputs_merger
[rank0]: new_inputs_embeds[special_image_token_mask] = reshaped_image_hidden_states
[rank0]: RuntimeError: shape mismatch: value tensor of shape [256, 4096] cannot be broadcast to indexing result of shape [192, 4096]
Any suggestions on how to fix it?
Thanks in advance.
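One thing worth checking (an assumption on my part, not a confirmed diagnosis): a mismatch like 256 vs. 192 would be consistent with the vision tower producing visual-token embeddings for every image while --max_seq_len 512 truncates some of the <image> placeholder positions out of input_ids, leaving fewer slots than embeddings. A rough diagnostic sketch, assuming the HuggingFaceM4/idefics2-8b processor:

```python
# Rough diagnostic (an assumption, not a confirmed fix): count how many <image>
# placeholder positions survive truncation to max_length=512 for a multi-image
# sample. If fewer survive than the vision tower produces embeddings for,
# inputs_merger fails with exactly this kind of shape mismatch.
from PIL import Image
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b")
image_token_id = processor.tokenizer.convert_tokens_to_ids("<image>")

images = [Image.new("RGB", (224, 224))] * 4        # stand-in for a 4-image sample
text = "<image>" * len(images) + "\nDescribe the differences between the images."

batch = processor(text=text, images=images, return_tensors="pt",
                  truncation=True, max_length=512)
n_slots = int((batch["input_ids"] == image_token_id).sum())
print("surviving <image> slots:", n_slots)  # fewer than expected => raise max_seq_len
```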
The FUYU model implementation currently lacks support for multi-GPU setups. This issue has already been addressed and fixed in Hugging Face's transformers repository.
Here's the link to the PR in the Hugging Face repository: huggingface/transformers#29880
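After upgrading to a transformers release that includes that PR, multi-GPU sharding should work through the standard device_map="auto" path. A minimal sketch (the model ID adept/fuyu-8b is an assumption here, chosen for illustration):

```python
# Minimal sketch: load Fuyu sharded across available GPUs via accelerate's
# device_map="auto" (requires a transformers version containing the fix above).
from transformers import FuyuForCausalLM, FuyuProcessor

model = FuyuForCausalLM.from_pretrained("adept/fuyu-8b", device_map="auto")
processor = FuyuProcessor.from_pretrained("adept/fuyu-8b")
print(model.hf_device_map)  # shows which layers landed on which GPU
```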