robin's Issues

Pretrained projector checkpoint

Hi! Thanks for your great work!
Could you kindly share the mm_projector.bin file from your pretrained checkpoints for the cerc-aai/mistral-7b-oh-siglip-so400m-finetune-lora model?

Change config style in file/code

Add the option to run with a config file instead of passing all the arguments in the launch script (keep both options possible).
Try removing if statements that check for things like model_path, model_base, and model_name; move these to the config. (Linked to issues #18 and #26.)
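
A rough sketch of how the config file could work (the --config flag, the YAML format, and the merge behavior are assumptions, not existing code):

```python
# Hypothetical sketch: a YAML config file supplies defaults, while CLI
# arguments keep working and override the config. Flag names are placeholders.
import argparse

import yaml  # PyYAML

parser = argparse.ArgumentParser()
parser.add_argument("--config", default=None, help="optional YAML config file")
parser.add_argument("--model_path", default=None)
parser.add_argument("--model_base", default=None)
args = parser.parse_args()

if args.config is not None:
    with open(args.config) as f:
        config = yaml.safe_load(f) or {}
    # CLI values take precedence; the config only fills in unset arguments.
    for key, value in config.items():
        if getattr(args, key, None) is None:
            setattr(args, key, value)
```

This would also make most of the model_path/model_base/model_name if statements unnecessary, since the resolved values would live in one place.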

Add VE-Type argument

Add an argument that can be passed to deepspeed in the pretrain and finetune launch scripts; this argument will be used to pick which VE (vision encoder) class to use.
Types needed: CLIP, OCLIP, timm.
For OCLIP models it is also necessary to pass the VE's config to enable local models (see load_model in open_clip.py) and to indicate whether the model uses a timm backbone inside OpenCLIP. A possible dispatcher is sketched after the objectives list below.
Working directory: robin/model/multimodal_encoder/
Additional objectives:

  1. remove the encoder_info.py file
  2. remove all conditions with model_name == ...
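
A rough sketch of what the dispatch could look like (the function name and how robin would wire it in are assumptions; the open_clip, timm, and transformers calls themselves are real library APIs):

```python
# Hypothetical sketch of a --ve_type dispatcher; only the library calls are
# real, the wiring into robin is an assumption.
import open_clip
import timm
from transformers import CLIPVisionModel


def build_vision_encoder(ve_type: str, model_name: str, ve_config: str | None = None):
    if ve_type == "clip":
        return CLIPVisionModel.from_pretrained(model_name)
    if ve_type == "oclip":
        # OpenCLIP needs the extra config/pretrained tag to support local
        # checkpoints; some OpenCLIP models wrap a timm backbone internally.
        model, _, _ = open_clip.create_model_and_transforms(
            model_name, pretrained=ve_config
        )
        return model.visual
    if ve_type == "timm":
        return timm.create_model(model_name, pretrained=True, num_classes=0)
    raise ValueError(f"unknown ve_type: {ve_type!r}")
```

With a dispatcher like this, encoder_info.py and the model_name == ... conditions become unnecessary.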

Improve printing quality

Make rank0_print from train.py available in all the training files, and replace all plain print calls with rank0_print.
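
For reference, a minimal version of such a helper (assuming torch.distributed is the backend deepspeed initializes; the actual train.py may gate on a local-rank variable instead):

```python
import torch.distributed as dist


def rank0_print(*args, **kwargs):
    """Print only on the global rank-0 process to avoid N copies of every log line."""
    if not dist.is_available() or not dist.is_initialized() or dist.get_rank() == 0:
        print(*args, **kwargs)
```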

Documentation

A major documentation overhaul will be needed before the next release.

Citation

Hello there!
Thanks for the awesome work, which I am currently using in my own work. I would like to cite your repo. How would you prefer I do this?

Evaluation Results of Different Model Configurations

I've conducted additional evaluations for the various model configurations available in the Robin repository. My intention is to provide these results to the community for further insights and potential improvements.

Methodology:
The evaluations were conducted using three benchmarks: MM-Vet, SEED-Benchv1, and MMBench. Below is a summary of the models evaluated along with their corresponding results, obtained using the original LLaVA evaluation scripts:

| Model Name | Image Model | Text Model | MM-Vet | SEED-Benchv1 | MMBench |
|---|---|---|---|---|---|
| liuhaotian/llava-v1.5-7b | CLIP-ViT-L/14 336 | lmsys/vicuna-7b-v1.5 | 31.1 | 58.60 | 64.3 |
| liuhaotian/llava-v1-7b | CLIP-ViT-L/14 336 | lmsys/vicuna-7b-v1.3 | 28.1 | 33.52 | 59.2 |
| liuhaotian/llava-v1.5-7b | CLIP-ViT-L/14 336 | meta-llama/Llama-2-7b-chat-hf | 30.1 | 54.68 | 56.78 |
| agi-collective/mistral-7b-siglip-so400m-finetune-lora | SigLIP-ViT-L/14 384 | mistralai/Mistral-7B-v0.1 | 25.7 | 53.33 | 57.47 |
| agi-collective/mistral-7b-oh-siglip-so400m-frozen-ve-finetune-lora | SigLIP-ViT-L/14 384 | teknium/OpenHermes-2.5-Mistral-7B | 35.8 | 57.39 | 63.8 |

Observations and Considerations:

  • The results vary across different benchmarks and model configurations.
  • It's important to note that while some numbers are lower, this does not necessarily imply inferior model performance; at this level, evaluations can be quite subjective.
  • Users and developers are encouraged to interact with the models directly for a more comprehensive understanding of their capabilities and characteristics.
  • Further exploration and fine-tuning might be beneficial for certain model configurations.
  • Community feedback on these configurations can be valuable for future improvements.

I hope these results are helpful for the ongoing development and refinement of the models in the Robin repository. Your work in creating and maintaining these models is highly appreciated by the community.

Thank you for your dedication to advancing the field of AI.

🤗 transformers compatibility

Hi there!
Great work on training the model and the release!

Llava models should be compatible with the HF transformers library; we recently added Llava 1.5, BakLlava, and Vip-llava as well. If the architecture is exactly the same as Llava, with minor changes (a different LM and a different CLIP encoder), the integration should work out of the box using this conversion script: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llava/convert_llava_weights_to_hf.py
You just need to push or save the raw state_dict of your model.
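
For reference, a rough sketch of what that could look like (load_pretrained_model is assumed to mirror the LLaVA loader; paths and names are placeholders):

```python
# Hypothetical sketch: load a finetuned Robin checkpoint and dump the raw
# state_dict in the layout the HF conversion script expects.
import torch

from robin.model.builder import load_pretrained_model  # assumed loader

tokenizer, model, image_processor, _ = load_pretrained_model(
    model_path="agi-collective/mistral-7b-oh-siglip-so400m-finetune-lora",
    model_base="teknium/OpenHermes-2.5-Mistral-7B",
    model_name="mistral-7b-oh-siglip-so400m-finetune-lora",
)
torch.save(model.state_dict(), "pytorch_model.bin")
```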

Here are some examples of converted models: https://huggingface.co/llava-hf

I can also help if needed!
Thanks again and looking forward to hearing from you

multiple images per prompt

Hello there!

Is it possible to prompt with multiple images? If so, it's not obvious to me how this can be done without altering your code. For example:

```python
from robin.serve.pipeline import LlavaMistralPipeline

pipe = LlavaMistralPipeline(
    model_path="agi-collective/mistral-7b-oh-siglip-so400m-finetune-lora",
    model_base="teknium/OpenHermes-2.5-Mistral-7B",
)

messages = [
    {
        "role": "USER",
        "content": "compare and contrast these images",
        "image": ["path/to/img/1.jpg", "path/to/img/2.jpg"],
    },
]
messages = pipe(messages)
```

I can make a PR if this has not been implemented yet :)

Thank you!

Convert checkpoint to usable model

Write a script (Python and/or bash) to convert an intermediate finetuning checkpoint to the final model format, ready for upload to HF and for evaluation.
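
A rough sketch of the conversion, assuming the checkpoint is a PEFT/LoRA adapter on top of a known base model (paths and model names are placeholders):

```python
# Hypothetical sketch: merge a LoRA finetuning checkpoint into its base model
# and save the result in HF format for upload and evaluation.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("teknium/OpenHermes-2.5-Mistral-7B")
model = PeftModel.from_pretrained(base, "checkpoints/checkpoint-1000")
model = model.merge_and_unload()  # fold the LoRA deltas into the base weights

model.save_pretrained("robin-final")
AutoTokenizer.from_pretrained(
    "teknium/OpenHermes-2.5-Mistral-7B"
).save_pretrained("robin-final")
```

The real script would likely also need to handle the mm_projector weights, which a plain LoRA merge does not cover.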

Improve OCR capabilities

Experiment with different architectures/data to improve OCR capabilities

  • examine OCR failure cases in benchmarks
  • come up with theory on why things are failing
  • devise solution
  • implement solution

Create an end-user model class

Create a Python class that is the only thing the end user should interact with.

Creating an instance of this class should look like: model = Robin('agi-collective/[model_name]')

The base LLM, VE type, etc. can be found in the model's config.json.

There should be optional arguments like temperature, conversation type, etc.

There should be a .predict('[text]', '[image_path/URL]') function.

Any other standard or useful arguments are welcome!

The forward pass of training and LlavaMistralPipeline can be good inspiration.

The aim is to simplify the overly complex serve folder, remove all the redundant code in cli, pipeline, etc., do the same in eval, and end up with a simple Robin.predict([text], [image]) call. A skeleton of what this class could look like is sketched below.
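
A possible skeleton, purely as a sketch (constructor arguments, config keys, and internals are assumptions about the eventual design, not existing code):

```python
# Hypothetical skeleton of the end-user class described above.
class Robin:
    def __init__(self, model_path: str, temperature: float = 0.2,
                 conv_mode: str | None = None):
        # The base LLM, VE type, etc. would be read from the checkpoint's
        # config.json instead of being passed explicitly.
        self.model_path = model_path
        self.temperature = temperature
        self.conv_mode = conv_mode
        self._pipeline = None  # built lazily from model_path's config

    def predict(self, text: str, image: str) -> str:
        """Answer one text query about one image (local path or URL)."""
        messages = [{"role": "USER", "content": text, "image": image}]
        return self._run(messages)

    def _run(self, messages) -> str:
        # Would wrap the training forward pass / LlavaMistralPipeline logic.
        raise NotImplementedError


# Intended usage:
# model = Robin("agi-collective/mistral-7b-oh-siglip-so400m-finetune-lora")
# print(model.predict("What is in this image?", "path/to/img.jpg"))
```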

Video understanding

Do a literature review on different methods that can be used to implement video understanding and implement a PoC

Eval download scripts

Create scripts that automatically download all of the data for each benchmark.

Issues running on Nvidia A10

I'm trying to run this on an Ubuntu 20.04 instance with an NVIDIA A10 GPU and am getting this error:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/aiworker/monorepo/scripts/agi_llava.py", line 1, in <module>
    from robin.serve.pipeline import LlavaMistralPipeline
  File "/home/aiworker/miniconda/envs/new1/lib/python3.10/site-packages/robin/__init__.py", line 1, in <module>
    from .model import LlavaLlamaForCausalLM
  File "/home/aiworker/miniconda/envs/new1/lib/python3.10/site-packages/robin/model/__init__.py", line 1, in <module>
    from .language_model.llava_llama import LlavaLlamaForCausalLM, LlavaConfig
  File "/home/aiworker/miniconda/envs/new1/lib/python3.10/site-packages/robin/model/language_model/llava_llama.py", line 22, in <module>
    from transformers import AutoConfig, AutoModelForCausalLM, \
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "/home/aiworker/miniconda/envs/new1/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1336, in __getattr__
    value = getattr(module, name)
  File "/home/aiworker/miniconda/envs/new1/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1335, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/home/aiworker/miniconda/envs/new1/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1347, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
/home/aiworker/miniconda/envs/new1/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi

It seems to be some incompatibility with the flash-attn package, but I have yet to find the root cause. Any ideas? Perhaps dependencies need to be updated?
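
In case it helps triage, a quick diagnostic sketch (assuming the usual cause of this undefined-symbol error: a flash-attn wheel built against a different torch/CUDA version than the one installed):

```python
# Print the versions involved; a mismatch between the torch that flash-attn
# was built against and the torch installed typically produces undefined
# C++ symbols like the one in the traceback above.
import torch

print("torch:", torch.__version__, "| CUDA:", torch.version.cuda)
try:
    import flash_attn
    print("flash-attn:", flash_attn.__version__)
except ImportError as exc:
    print("flash-attn import failed:", exc)
```

If the versions disagree, uninstalling flash-attn and rebuilding it against the installed torch (pip install flash-attn --no-build-isolation) typically resolves this kind of symbol error.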
