cerc-aai / robin
License: Apache License 2.0
Hi! Thanks for your great work!
Could you kindly share the mm_projector.bin file from your pretrained checkpoints for the cerc-aai/mistral-7b-oh-siglip-so400m-finetune-lora model?
What is the best way to create synthetic data using GPT-4V to improve multi-image reasoning capabilities?
Add an argument that can be passed to deepspeed in the pretrain and finetune launch scripts. This argument will be used to select which vision encoder (VE) class to use.
Types needed: CLIP, OpenCLIP, timm
For OpenCLIP models it is also necessary to pass the VE config to enable local models (see load_model in open_clip.py), and to indicate whether the model uses timm inside OpenCLIP.
Working directory: robin/model/multimodal_encoder/
Additional objectives:
model_name == ...
Strongly linked to #25
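One possible shape for the dispatch is a small parser layered on top of the launch-script arguments. This is only a sketch: the flag names (--vision-encoder-type, --vision-encoder-config) and the function name are hypothetical, not the repo's actual API.

```python
# Sketch of selecting a vision-encoder (VE) class from a launch-script argument.
# Flag names are hypothetical; the real builders live in
# robin/model/multimodal_encoder/.
import argparse

def select_vision_encoder_type(args_list):
    """Parse the VE type (and optional OpenCLIP config path) from CLI args."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--vision-encoder-type",
                        choices=["clip", "openclip", "timm"], default="clip")
    # OpenCLIP models may be local, so they also need an explicit config path.
    parser.add_argument("--vision-encoder-config", default=None)
    # parse_known_args lets the remaining deepspeed flags pass through untouched.
    args, _ = parser.parse_known_args(args_list)
    if args.vision_encoder_type == "openclip" and args.vision_encoder_config is None:
        raise ValueError("OpenCLIP encoders require --vision-encoder-config")
    return args.vision_encoder_type, args.vision_encoder_config
```

Using parse_known_args keeps the launch scripts compatible with every other flag deepspeed already consumes.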
Look into multi-GPU batch inference
Set up code for individual clusters more cleanly
Make rank0_print from train.py available in all the training files and replace all prints with rank0_print
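A minimal, shareable version of such a helper could look like the sketch below. It assumes the global rank is exposed via the RANK environment variable (as torch.distributed launchers set it); the actual helper in train.py may instead use a local_rank variable.

```python
# Sketch of a shared rank0_print helper, assuming the launcher exports RANK
# (torch.distributed-style). Falls back to rank 0 for single-process runs.
import os

def rank0_print(*args, **kwargs):
    """Print only on the rank-0 process so multi-GPU logs are not duplicated."""
    if int(os.environ.get("RANK", "0")) == 0:
        print(*args, **kwargs)
```

Placing this in a small utils module lets every training file import it instead of re-defining it.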
How much impact does training on paired vs. interleaved data have on performance?
A major documentation overhaul will be needed before the next release.
Hello there!
Thanks for the awesome work which I am currently using in my own work. I would like to cite your repo. How would you prefer I do this?
I've conducted additional evaluations for the various model configurations available in the Robin repository. My intention is to provide these results to the community for further insights and potential improvements.
Methodology:
The evaluations were conducted using three benchmarks: MM-Vet, SEED-Benchv1, and MMBench. Below is a summary of the models evaluated along with their corresponding results using the original LLaVA evaluation scripts:
Model Name | Image Model | Text Model | MM-Vet | SEED-Benchv1 | MMBench |
---|---|---|---|---|---|
liuhaotian/llava-v1.5-7b | CLIP-ViT-L/14 336 | lmsys/vicuna-7b-v1.5 | 31.1 | 58.60 | 64.3 |
liuhaotian/llava-v1-7b | CLIP-ViT-L/14 336 | lmsys/vicuna-7b-v1.3 | 28.1 | 33.52 | 59.2 |
liuhaotian/llava-v1.5-7b | CLIP-ViT-L/14 336 | meta-llama/Llama-2-7b-chat-hf | 30.1 | 54.68 | 56.78 |
agi-collective/mistral-7b-siglip-so400m-finetune-lora | SigLIP-ViT-L/14 384 | mistralai/Mistral-7B-v0.1 | 25.7 | 53.33 | 57.47 |
agi-collective/mistral-7b-oh-siglip-so400m-frozen-ve-finetune-lora | SigLIP-ViT-L/14 384 | teknium/OpenHermes-2.5-Mistral-7B | 35.8 | 57.39 | 63.8 |
Observations and Considerations:
I hope these results are helpful for the ongoing development and refinement of the models in the Robin repository. Your work in creating and maintaining these models is highly appreciated by the community.
Thank you for your dedication to advancing the field of AI.
Hi there!
Great work on training the model and the release!
LLaVA models should be compatible with the HF transformers library; we recently added LLaVA 1.5, BakLLaVA and VipLLaVA as well. If the architecture is exactly the same as LLaVA, with minor changes (different LM and different CLIP encoder), the integration should work out of the box using this conversion script: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llava/convert_llava_weights_to_hf.py
You just need to push or save the raw state_dict of your model.
Here are some examples of converted models: https://huggingface.co/llava-hf
I can also help if needed!
Thanks again and looking forward to hearing from you
Hello there!
Is it possible to prompt with multiple images? If so, it's not obvious to me how this can be done without altering your code.
For example:

```python
from robin.serve.pipeline import LlavaMistralPipeline

pipe = LlavaMistralPipeline(
    model_path="agi-collective/mistral-7b-oh-siglip-so400m-finetune-lora",
    model_base="teknium/OpenHermes-2.5-Mistral-7B",
)

messages = [
    {"role": "USER", "content": "compare and contrast these images",
     "image": ["path/to/img/1.jpg", "path/to/img/2.jpg"]},
]
messages = pipe(messages)
```
I can make a PR if this has not been implemented yet :)
Thank you!
Write a script (Python and/or bash) to convert an intermediate finetuning checkpoint to the final model format for upload to HF and evaluation.
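A possible argparse skeleton for such a converter is sketched below. The flag names are hypothetical, and the actual merge of LoRA/projector weights into the base model (e.g. via peft) is left as comments because it depends on the training setup.

```python
# Skeleton for converting an intermediate finetuning checkpoint into the final
# upload/eval format. Flag names are illustrative, not the repo's actual CLI.
import argparse
from pathlib import Path

def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Convert an intermediate checkpoint to the final model format.")
    parser.add_argument("--checkpoint-dir", required=True,
                        help="Directory of the intermediate finetuning checkpoint")
    parser.add_argument("--output-dir", required=True,
                        help="Where to write the converted model for HF upload/eval")
    parser.add_argument("--model-base", default=None,
                        help="Base LLM to merge LoRA adapters into, if applicable")
    return parser.parse_args(argv)

def convert(args):
    out = Path(args.output_dir)
    out.mkdir(parents=True, exist_ok=True)
    # 1. Load the base model and tokenizer (args.model_base).
    # 2. Load the LoRA adapter / mm_projector weights from args.checkpoint_dir.
    # 3. Merge adapters into the base weights and save to `out` in a
    #    save_pretrained-style layout so it can be pushed to HF directly.
    return out
```

The same skeleton can then be wrapped in a thin bash launcher if needed.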
Experiment with different architectures/data to improve OCR capabilities
Create a Python class that is the only thing the end user should interact with.
Instantiation should look like: model = Robin('agi-collective/[model_name]')
The base LLM, VE type, etc. can be found in the model's config.json
There should be optional arguments such as temperature, conversation type, etc.
There should be a .predict('[text]', '[image_path/URL]') method
Any other standard or useful arguments are welcome!
The forward pass of training and LlavaMistralPipeline can be good inspiration.
The aim is to simplify the overly complex serve folder, remove all the redundant code in cli, pipeline, etc., do the same in eval, and end up with a simple Robin.predict([text], [image]) call.
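A minimal skeleton of that user-facing class might look like this. All names are illustrative, and the heavy parts (loading the tokenizer, base LLM, vision encoder and projector from config.json) are stubbed out as comments.

```python
# Sketch of the single user-facing Robin class described above. Only the call
# shape is shown; model loading and generation are intentionally stubbed.
import json
from pathlib import Path

class Robin:
    def __init__(self, model_path, temperature=0.7, conv_mode="default"):
        self.model_path = model_path
        self.temperature = temperature
        self.conv_mode = conv_mode
        # The base LLM and VE type would be read from the model's config.json:
        config_file = Path(model_path) / "config.json"
        self.config = json.loads(config_file.read_text()) if config_file.exists() else {}
        # A real implementation would now load tokenizer, LLM, VE and projector.

    def predict(self, text, image=None):
        """Run one generation step on (text, optional image path/URL)."""
        # Would mirror the forward pass used in training and in
        # LlavaMistralPipeline; this stub only fixes the interface.
        raise NotImplementedError("model loading/generation not implemented in this sketch")
```

Usage would then reduce to `Robin('agi-collective/[model_name]').predict(text, image)`, replacing the serve/eval duplication with one entry point.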
Do a literature review on different methods that can be used to implement video understanding and implement a PoC
Create scripts that automatically download all of the data for each benchmark.
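A generic downloader could be table-driven, as sketched below. The URL table is deliberately left empty: the real benchmark URLs (MM-Vet, SEED-Bench, MMBench, ...) would need to be filled in by the maintainers.

```python
# Sketch of a table-driven benchmark downloader. BENCHMARKS is a placeholder;
# real download URLs must be supplied before this is useful.
import urllib.request
from pathlib import Path

BENCHMARKS = {
    # "mm-vet": "<archive URL goes here>",
}

def download_benchmarks(dest_dir, benchmarks=BENCHMARKS):
    """Fetch each benchmark archive into dest_dir, skipping files already present."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, url in benchmarks.items():
        target = dest / (name + Path(url).suffix)
        if not target.exists():  # idempotent: re-runs do not re-download
            urllib.request.urlretrieve(url, target)
        paths.append(target)
    return paths
```

Each benchmark's post-download extraction/layout step could then hang off the same table.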
I'm trying to run this on an ubuntu 20.04 instance with an Nvidia A10 GPU and am getting this error:
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/aiworker/monorepo/scripts/agi_llava.py", line 1, in <module>
from robin.serve.pipeline import LlavaMistralPipeline
File "/home/aiworker/miniconda/envs/new1/lib/python3.10/site-packages/robin/__init__.py", line 1, in <module>
from .model import LlavaLlamaForCausalLM
File "/home/aiworker/miniconda/envs/new1/lib/python3.10/site-packages/robin/model/__init__.py", line 1, in <module>
from .language_model.llava_llama import LlavaLlamaForCausalLM, LlavaConfig
File "/home/aiworker/miniconda/envs/new1/lib/python3.10/site-packages/robin/model/language_model/llava_llama.py", line 22, in <module>
from transformers import AutoConfig, AutoModelForCausalLM, \
File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
File "/home/aiworker/miniconda/envs/new1/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1336, in __getattr__
value = getattr(module, name)
File "/home/aiworker/miniconda/envs/new1/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1335, in __getattr__
module = self._get_module(self._class_to_module[name])
File "/home/aiworker/miniconda/envs/new1/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1347, in _get_module
raise RuntimeError(
RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
/home/aiworker/miniconda/envs/new1/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi
It seems to be some incompatibility with the flash-attn package but I have yet to find the root cause. Any ideas? Perhaps dependencies need to be updated?
If it doesn't work see fix in previous codebase: https://github.com/AGI-Collective/robin_llava-private/pull/3