michaelnny / mm-llama
Bring multimodality to the LLaMA model by leveraging ImageBind as the modal encoder. This project supports vision input (both images and short videos) to the LLaMA model, with text output generated by LLaMA.
License: MIT License