eqtpartners / genception

License: MIT License

⚠️ This repository has migrated ⚠️

This repository is no longer maintained, and issues and pull requests may be ignored. For an up-to-date codebase, issues, and pull requests, please continue to the new repository.


Evaluate Multimodal LLMs with Unlabeled Unimodal Data


🔥🏅️🤗 Leaderboard🏅️🔥 •  Contribute •  Paper •  Citation

GenCeption is an annotation-free MLLM (Multimodal Large Language Model) evaluation framework that merely requires unimodal data to assess inter-modality semantic coherence and inversely reflects the models' inclination to hallucinate.

[Figure: GenCeption procedure]

GenCeption is inspired by the popular multiplayer game DrawCeption. Using the image modality as an example, the process begins with a seed image $\mathbf{X}^{(0)}$ from a unimodal image dataset. In each iteration $t$ (starting at $t=1$), the MLLM writes a detailed description $\mathbf{Q}^{(t)}$ of the preceding image $\mathbf{X}^{(t-1)}$, and an image generator then uses that description to produce $\mathbf{X}^{(t)}$. After $T$ iterations, we calculate the GC@T score to measure the MLLM's performance on $\mathbf{X}^{(0)}$.
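The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's implementation: `run_genception`, `describe`, `generate`, and `similarity` are hypothetical names, and the toy stand-ins at the bottom use strings in place of images purely so the sketch runs without any model.

```python
from typing import Callable, List, TypeVar

Image = TypeVar("Image")


def run_genception(
    seed: Image,
    describe: Callable[[Image], str],
    generate: Callable[[str], Image],
    similarity: Callable[[Image, Image], float],
    num_iterations: int,
) -> List[float]:
    """Return the similarity scores [s^(1), ..., s^(T)] for one seed X^(0)."""
    scores: List[float] = []
    current = seed  # X^(t-1), starting from the seed X^(0)
    for _ in range(num_iterations):
        description = describe(current)           # Q^(t): MLLM describes X^(t-1)
        current = generate(description)           # X^(t): generator renders Q^(t)
        scores.append(similarity(seed, current))  # s^(t): compared against X^(0)
    return scores


# Toy stand-ins (assumptions for illustration only): strings play the role of
# images, and every "description" loses the last character of its input.
def toy_describe(image: str) -> str:
    return image[:-1]


def toy_generate(description: str) -> str:
    return description


def toy_similarity(a: str, b: str) -> float:
    shared = sum(1 for x, y in zip(a, b) if x == y)
    return shared / max(len(a), 1)


scores = run_genception("abcdef", toy_describe, toy_generate, toy_similarity, 3)
```

In the toy run, each iteration drifts further from the seed, so the scores decrease. In a real setup, the three callables would wrap the MLLM under test, an image generator, and an image similarity measure.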

The GenCeption ranking on the MME benchmarking dataset (computed without using any labels) correlates strongly with other sophisticated benchmarks such as OpenCompass and HallusionBench. Moreover, the negative correlation with MME scores suggests that GenCeption measures distinct aspects not covered by MME, even though it uses the same set of samples. For detailed experimental analysis, please read our paper.

We demonstrate below a 5-iteration GenCeption procedure run on a seed image to evaluate 4 VLLMs. Each iteration $t$ shows the generated image $\mathbf{X}^{(t)}$, the description $\mathbf{Q}^{(t)}$ of the preceding image $\mathbf{X}^{(t-1)}$, and the similarity score $s^{(t)}$ relative to $\mathbf{X}^{(0)}$. The GC@5 metric for each VLLM is also presented. Hallucinated elements within descriptions $\mathbf{Q}^{(1)}$ and $\mathbf{Q}^{(2)}$, as compared to the seed image, are underlined in red.

[Figure: GenCeption example]
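To make the step from per-iteration similarity scores $s^{(t)}$ to a single GC@T number concrete, here is a small sketch. This README does not spell out the formula, so the code assumes an iteration-weighted mean in which $s^{(t)}$ receives weight $t$; treat that weighting as an assumption and consult the paper for the authoritative definition. The function name `gc_at_t` is hypothetical.

```python
from typing import Sequence


def gc_at_t(similarities: Sequence[float], T: int) -> float:
    """Aggregate s^(1..T) into one score.

    Assumption: GC@T = sum(t * s^(t)) / sum(t) for t = 1..T, i.e. later
    iterations count more. Check the paper before relying on this.
    """
    if T < 1 or T > len(similarities):
        raise ValueError("T must be between 1 and the number of scores")
    weighted = sum(t * s for t, s in enumerate(similarities[:T], start=1))
    return weighted / sum(range(1, T + 1))


example = gc_at_t([0.9, 0.6, 0.3], T=3)
```

Under this assumed weighting, a model whose similarity stays constant across iterations gets exactly that constant as its score, while a model that drifts early but recovers is penalized less than one that keeps drifting.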

Contribute

The GenCeption evaluation uses MME images, which you can request as described here. We recommend starting by creating a virtual environment and installing the dependencies:

conda create --name genception python=3.10 -y
conda activate genception
pip install -r requirements.txt

First, make sure the MLLM is set up properly. For example, follow this to set up mPLUG-OWL2, follow this to configure ChatGPT-4v, follow this to configure Claude-3, and so on.

Second, create your evaluation code by referring to how it is done for GPT, LLaVa, mPLUG, Claude, and so on. Then run your code. For example, GenCeption on GPT-4o (assuming OPENAI_API_KEY is properly configured) is run with:

python -m genception.exp_gpt --dataset=datasets/examples --model=gpt-4o
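Because the run above depends on OPENAI_API_KEY being set, it can help to fail fast before launching a long experiment. The wrapper below is a minimal sketch: `require_api_key`, `build_command`, and `launch` are hypothetical helper names that are not part of the repository, and the command they assemble is the one shown above.

```python
import os
import subprocess
import sys
from typing import List, Mapping


def require_api_key(env: Mapping[str, str]) -> None:
    """Raise early if the key the GPT run needs is missing or empty."""
    if not env.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is not set")


def build_command(dataset: str, model: str) -> List[str]:
    """Assemble the module invocation shown above as an argument list."""
    return [
        sys.executable,
        "-m",
        "genception.exp_gpt",
        f"--dataset={dataset}",
        f"--model={model}",
    ]


def launch(dataset: str, model: str) -> None:
    """Check the environment, then run the experiment as a subprocess."""
    require_api_key(os.environ)
    subprocess.run(build_command(dataset, model), check=True)
```

Calling `launch("datasets/examples", "gpt-4o")` from the repository root would then run the same experiment, stopping immediately with a clear error if the key is absent.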

Finally, run the following to calculate the GC@T metric (here T=3):

python -m genception.evaluation --results_path=datasets/examples/results_gpt-4o --t=3

This will generate a GC@3 result file under the same path.

Contribute to leaderboard

After evaluating a model, please create a pull request (PR) in the 🤗 Space and add your model details and results to leaderboard/leaderboard.json. This will add your results to the 🔥🏅️Leaderboard🏅️🔥.

Contribute to code base

To add your evaluation code, please submit a PR in this GitHub repository.

Cite This Work

@article{cao2023genception,
    author = {Lele Cao and
              Valentin Buchner and
              Zineb Senane and
              Fangkai Yang},
    title = {{GenCeption}: Evaluate Multimodal LLMs with Unlabeled Unimodal Data},
    year={2023},
    journal={arXiv preprint arXiv:2402.14973},
    primaryClass={cs.AI,cs.CL,cs.LG}
}
