mPLUG-Owl🦉: Modularization Empowers Large Language Models with Multimodality

Home Page: https://www.modelscope.cn/studios/damo/mPLUG-Owl

License: Apache License 2.0


Qinghao Ye*, Haiyang Xu*, Guohai Xu*, Jiabo Ye, Ming Yan†, Yiyang Zhou, Junyang Wang, Anwen Hu, Pengcheng Shi, Yaya Shi, Chenliang Li, Yuanhong Xu, Hehong Chen, Junfeng Tian, Qian Qi, Ji Zhang, Fei Huang

DAMO Academy, Alibaba Group

*Equal Contribution; † Corresponding Author

English | 简体中文


Examples


News

  • We provide an online demo on ModelScope for the public to experience.
  • We released the code of mPLUG-Owl🦉 together with its pre-trained and instruction-tuned checkpoints.

Spotlights

  • A new training paradigm with a modularized design for large multi-modal language models.
  • Learns visual knowledge while supporting multi-turn conversations that mix different modalities.
  • Observed abilities such as multi-image correlation, scene text understanding, and vision-based document comprehension.
  • Releases a visually related instruction evaluation set, OwlEval.

Training paradigm and model overview

Online Demo

Demo of mPLUG-Owl on Modelscope

Checkpoints

Model            Phase               Download link
mPLUG-Owl 7B     Pre-training        Download link
mPLUG-Owl 7B     Instruction tuning  Download link
Tokenizer model  N/A                 Download link

Usage

Install Requirements

Core library dependencies:

  • PyTorch=1.12.1
  • transformers=4.28.1
  • Apex
  • einops
  • icecream
  • flask
  • ruamel.yaml
  • uvicorn
  • fastapi
  • markdown2
  • gradio

You can also refer to the exported Conda environment configuration file env.yaml to prepare your environments.
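As a quick sanity check (a minimal sketch based on the pinned versions above; adjust if your environment differs), you can verify the two core pins from a Python shell:

# Sanity check for the pinned core dependencies.
import torch
import transformers

print("torch:", torch.__version__)                # expected: 1.12.1
print("transformers:", transformers.__version__)  # expected: 4.28.1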

Local Demo

We provide a script to deploy a simple demo on your local machine.

python -m server_mplug.owl_demo --debug --port 6363 --checkpoint_path 'your checkpoint path' --tokenizer_path 'your tokenizer path'

Inference

Build the model, tokenizer, and processor.

from interface import get_model
model, tokenizer, img_processor = get_model(
        checkpoint_path='checkpoint path', tokenizer_path='tokenizer path')
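
If you run inference on a GPU, the following is a minimal sketch (assuming the returned model behaves like a standard torch.nn.Module) for moving it to the device and switching to evaluation mode:

import torch

# Assumption: `model` is a regular torch.nn.Module returned by get_model().
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()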

Prepare model inputs.

# We use a human/AI template to organize the context as a multi-turn conversation.
# <image> denotes an image placeholder.
prompts = [
'''The following is a conversation between a curious human and AI assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
Human: <image>
Human: Explain why this meme is funny.
AI: ''']

# The image paths should be placed in image_list and kept in the same order as in the prompts.
# We support URLs, local file paths, and base64 strings. You can customize the image pre-processing by modifying mplug_owl.modeling_mplug_owl.ImageProcessor.
image_list = ['https://xxx.com/image.jpg',]
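
For a multi-image conversation, a hypothetical sketch (the URLs below are placeholders) is to repeat the <image> placeholder once per image and keep image_list in the same order as the placeholders appear in the prompt:

# Hypothetical two-image prompt: each <image> is paired, in order,
# with one entry of image_list.
prompts = [
'''The following is a conversation between a curious human and AI assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
Human: <image>
Human: <image>
Human: What do these two images have in common?
AI: ''']
image_list = ['https://xxx.com/image_1.jpg', 'https://xxx.com/image_2.jpg']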

Get response.

# Generation kwargs (the same as in transformers) can be passed to do_generate().
from interface import do_generate
sentence = do_generate(prompts, image_list, model, tokenizer,
                               img_processor, max_length=512, top_k=5, do_sample=True)
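
Putting the pieces together, a minimal end-to-end script could look like the sketch below; the checkpoint path, tokenizer path, and image URL are placeholders you need to replace:

from interface import get_model, do_generate

# Placeholders: point these at the downloaded checkpoint and tokenizer.
model, tokenizer, img_processor = get_model(
        checkpoint_path='checkpoint path', tokenizer_path='tokenizer path')

prompts = [
'''The following is a conversation between a curious human and AI assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
Human: <image>
Human: Explain why this meme is funny.
AI: ''']
image_list = ['https://xxx.com/image.jpg']

sentence = do_generate(prompts, image_list, model, tokenizer,
                       img_processor, max_length=512, top_k=5, do_sample=True)
print(sentence)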

Performance Comparison

The comparison results of 50 single-turn responses (left) and 52 multi-turn responses (right) between mPLUG-Owl and baselines, based on manual evaluation metrics. A/B/C/D denote the rating of each response.

Coming Soon

  • Instruction tuning code.
  • Multilingual support (e.g., Chinese, Japanese, German, French).
  • A visually related evaluation set, OwlEval, to comprehensively evaluate various models.

Related Projects

  • LLaMA. An open-source collection of state-of-the-art large pre-trained language models.
  • Baize. An open-source chat model trained with LoRA on 100k dialogs generated by letting ChatGPT chat with itself.
  • Alpaca. A fine-tuned model trained from a 7B LLaMA model on 52K instruction-following data.
  • LoRA. A plug-and-play module that can greatly reduce the number of trainable parameters for downstream tasks.
  • LLaVA. A visual-instruction-tuned vision-language model that achieves GPT-4-level capabilities.
  • mPLUG. A vision-language foundation model for both cross-modal understanding and generation.
  • mPLUG-2. A multimodal model with a modular design, which inspired our project.

Citation

If you find this work useful, please consider giving this repository a star and citing our paper as follows:

@article{ye2023mplugowl,
  title={mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality},
  author={Qinghao Ye and Haiyang Xu and Guohai Xu and Jiabo Ye and Ming Yan and Yiyang Zhou and Junyang Wang and Anwen Hu and Pengcheng Shi and Yaya Shi and Chenliang Li and Yuanhong Xu and Hehong Chen and Junfeng Tian and Qian Qi and Ji Zhang and Fei Huang},
  year={2023}
}
