opengvlab / chartast

ChartAssistant is a chart-based vision-language model for universal chart comprehension and reasoning.

License: Other

Python 99.27% Shell 0.73%

chartast's Introduction

ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning

This is the PyTorch implementation of the paper ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning. The paper is available at https://arxiv.org/abs/2401.02384

We have developed ChartAssistant, which uses large-scale chart data for alignment and instruction tuning. The model has strong mathematical computation capabilities and achieves state-of-the-art performance on multiple datasets without fine-tuning on the downstream datasets.

This repo is built upon LLaMA2-Accessory.

ChartSFT

We have released ChartSFT at https://huggingface.co/datasets/FanqingM/ChartAssistant 🔥🔥🔥

Note

Some users may have tested on our ChartQA data incorrectly. We have made our ChartQA test results public as chartqa_all.json.
An online demo is available at http://106.14.127.192:7861/ 🔥🔥🔥

5.20: We have released ChartSFT at https://huggingface.co/datasets/FanqingM/ChartAssistant 🔥🔥🔥 Here are some examples:

MathQA:

{
    "image": "/mnt/petrelfs/share_data/mengfanqing1/mengfanqing/plotQA/train/png/png/35651.png",
    "conversations": [
        {
            "from": "human",
            "value": "Please use commandline to solve the math question: What is the difference between the highest and the second highest % of children under 5 who received treatment of Malaria ?"
        },
        {
            "from": "gpt",
            "value": "{'step1': {'func1': 'select', 'arg1': 'malaria', 'output1': ['10.000', '0.900']}, 'step2': {'func2': 'numpy.sort', 'arg2': 'output1', 'output2': 'sorted_arr'}, 'step3': {'func3': 'getitem', 'arg3':['sorted_arr', '-1'], 'output3': 'maximum'}, 'step4': {'func4': 'numpy.delete', 'arg4': ['sorted_arr', '-1'], 'output4': 'new_arr'}, 'step5': {'func5': 'getitem', 'arg5': ['new_arr', '-1'], 'output5': 'second_maximum'}, 'step6': {'func6': 'numpy.subtract', 'arg6': ['maximum', 'second_maximum'], 'output6': 'diff'}}"
        }
    ]
}
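The gpt answer above is a step program written as a Python-style dict literal. The sketch below shows one way such a program could be executed. It is not the authors' evaluation code, and whether they run these programs at evaluation time is not stated here. The program is parsed with `ast.literal_eval`, and each function name is mapped to a NumPy or `operator` call. The `FUNCS` table, the helpers `resolve` and `run_program`, and the treatment of `select` as simply exposing the values listed in its output are all hypothetical.

```python
import ast
import operator

import numpy as np

# Hypothetical mapping from function names seen in ChartSFT step programs to callables.
FUNCS = {
    "numpy.sort": np.sort,
    "numpy.delete": np.delete,
    "numpy.subtract": np.subtract,
    "numpy.median": np.median,
    "getitem": operator.getitem,
}


def resolve(arg, env):
    """Return a stored variable if `arg` names one, otherwise parse it as a number."""
    if arg in env:
        return env[arg]
    try:
        return int(arg)  # index literals such as '-1'
    except ValueError:
        return float(arg)


def run_program(program):
    """Execute a step program given as a dict-literal string; return the last step's value."""
    steps = ast.literal_eval(program)
    env = {}
    result = None
    for i in range(1, len(steps) + 1):
        step = steps[f"step{i}"]
        func, arg, out = step[f"func{i}"], step[f"arg{i}"], step[f"output{i}"]
        if func == "select":
            # Assumption: 'select' only exposes the chart values listed inline in its output.
            result = np.array([float(v) for v in out])
        else:
            args = arg if isinstance(arg, list) else [arg]
            result = FUNCS[func](*(resolve(a, env) for a in args))
            env[out] = result  # named outputs such as 'sorted_arr' or 'maximum'
        env[f"output{i}"] = result  # positional references such as 'output1'
    return result


MATHQA_PROGRAM = (
    "{'step1': {'func1': 'select', 'arg1': 'malaria', 'output1': ['10.000', '0.900']}, "
    "'step2': {'func2': 'numpy.sort', 'arg2': 'output1', 'output2': 'sorted_arr'}, "
    "'step3': {'func3': 'getitem', 'arg3': ['sorted_arr', '-1'], 'output3': 'maximum'}, "
    "'step4': {'func4': 'numpy.delete', 'arg4': ['sorted_arr', '-1'], 'output4': 'new_arr'}, "
    "'step5': {'func5': 'getitem', 'arg5': ['new_arr', '-1'], 'output5': 'second_maximum'}, "
    "'step6': {'func6': 'numpy.subtract', 'arg6': ['maximum', 'second_maximum'], 'output6': 'diff'}}"
)

print(run_program(MATHQA_PROGRAM))  # 9.1, i.e. 10.000 - 0.900
```

The same helper also handles the ReferQA program below, which is a `select` followed by `numpy.median`.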

ReferQA:

{
    "image": "/mnt/petrelfs/share_data/luquanfeng/referring_box/train/img/52342_1_arrow.png",
    "conversations": [
        {
            "from": "human",
            "value": "Please use commandline to solve the math question about the referring box: What is the median years of the legend represented by the blue arrow per amount(current us$)?"
        },
        {
            "from": "gpt",
            "value": "{'step1': {'func1': 'select', 'arg1': 'ecuador', 'output1': ['139370000.000', '160050000.000', '165410000.000', '174120000.000', '214220000.000']}, 'step2': {'func2': 'numpy.median', 'arg2': 'output1', 'output2': 'median'}}"
        }
    ]
}

Arxiv QA:

{
    "image": "/mnt/petrelfs/share_data/mengfanqing1/mengfanqing/arxiv/Img_QA/vbar/1806.09244_table_0.png",
    "conversations": [
        {
            "from": "human",
            "value": "Please answer my question based on the chart: How many counties are included in each dataset?"
        },
        {
            "from": "gpt",
            "value": "The number of counties included in each dataset are as follows: \n- Brazil-Soybean: 1529 counties \n- US-Soybean: 1814 counties \n- US-Maize: 2204 counties."
        }
    ]
}
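All three examples share one record layout: an `image` path plus a `conversations` list of alternating human and gpt turns. The minimal sketch below assumes the released JSON follows exactly this layout. It pairs each question with its answer and flags answers that are step programs rather than free text. The helpers `load_records`, `is_step_program`, and `qa_pairs` are hypothetical names, and treating "parses as a dict whose keys start with `step`" as a step program is a heuristic.

```python
import ast
import json


def load_records(path):
    """Assumption: the file is a single JSON array of records like the examples above."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def is_step_program(answer):
    """True if the answer parses as a dict literal whose keys look like 'step1', 'step2', ..."""
    try:
        parsed = ast.literal_eval(answer)
    except (ValueError, SyntaxError, TypeError):
        return False  # ordinary free-text answer
    return isinstance(parsed, dict) and bool(parsed) and all(
        isinstance(key, str) and key.startswith("step") for key in parsed
    )


def qa_pairs(record):
    """Yield (question, answer, is_program) tuples from one ChartSFT-style record."""
    turns = record["conversations"]
    for human, gpt in zip(turns[::2], turns[1::2]):
        if human["from"] != "human" or gpt["from"] != "gpt":
            raise ValueError(f"unexpected turn order in {record.get('image')}")
        yield human["value"], gpt["value"], is_step_program(gpt["value"])


example = {
    "image": "1806.09244_table_0.png",
    "conversations": [
        {"from": "human", "value": "Please answer my question based on the chart: How many counties are included in each dataset?"},
        {"from": "gpt", "value": "The number of counties included in each dataset are as follows: ..."},
    ],
}
for question, answer, is_program in qa_pairs(example):
    print(is_program, question)
```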

2.15: We updated the paper with better performance, more experiments, and corrected experimental results.
1.11: ChartAssistant, trained in two stages on ChartSFT, has been open-sourced. You can download it from either of the following sources:
- Baidu Netdisk: https://pan.baidu.com/s/1t0QPLDfULNovnYKtsQxjOQ (password: 10el)
- HuggingFace: put consolidated.00-of-02.model.pth and consolidated.01-of-02.model.pth in one directory, and set pretrained_path in the script to that directory.
1.10: We updated the paper (ChartAssistant.pdf), mainly updating the model, correcting some errors, and adding more detailed explanations.
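Before pointing `pretrained_path` at the HuggingFace download, it can help to confirm that both consolidated shards sit in the same directory. The small sketch below uses only the two file names given in the note above. The helpers `missing_shards` and `checkpoint_dir_for_script` are hypothetical. The sketch checks only that the files are present, not that they load correctly.

```python
from pathlib import Path

# Shard file names from the HuggingFace download note.
SHARDS = ("consolidated.00-of-02.model.pth", "consolidated.01-of-02.model.pth")


def missing_shards(directory):
    """Return the expected shard files that are absent from `directory`."""
    root = Path(directory)
    return [name for name in SHARDS if not (root / name).is_file()]


def checkpoint_dir_for_script(directory):
    """Return an absolute path to use as pretrained_path, or raise if a shard is missing."""
    missing = missing_shards(directory)
    if missing:
        raise FileNotFoundError(f"{directory} is missing: {', '.join(missing)}")
    return str(Path(directory).resolve())
```

For example, `checkpoint_dir_for_script("/data/chartast")` (a hypothetical path) returns the absolute directory to paste into the script, or raises an error that names the missing shard.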

ChartAssistant

Charts play a vital role in data visualization, understanding data patterns, and informed decision-making. However, their unique combination of graphical elements (e.g., bars, lines) and textual components (e.g., labels, legends) poses challenges for general-purpose multimodal models. While vision-language models trained on chart data excel at comprehension, they struggle to generalize. To address these challenges, we propose ChartAssistant, a chart-based vision-language model for universal chart comprehension and reasoning. ChartAssistant leverages ChartSFT, a comprehensive dataset covering diverse chart-related tasks with basic (e.g., bars and pies) and specialized (e.g., radars and bubbles) chart types. It undergoes a two-stage training process: pre-training on chart-to-table parsing to align charts and text, followed by multitask instruction-following fine-tuning. This approach enables ChartAssistant to achieve competitive performance across various chart tasks. Experimental results demonstrate significant performance gains over the state-of-the-art UniChart and ChartLlama methods, especially on real-world chart data in the zero-shot setting.

Environment

The environment is the same as for LLaMA2-Accessory.

Inference

Set pretrained_path in the script to the path of the pretrained model.

sh accessory/exps/finetune/mm/test.sh
# Please use the parameters set in test.sh
# test.sh runs accessory/single_turn_eval.py
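Instead of editing test.sh by hand, the path can also be substituted programmatically. This sketch assumes that test.sh assigns the path as a shell variable on its own line (`pretrained_path=...`), so check the script before relying on it. The function names `set_pretrained_path` and `patch_script` are hypothetical. The sketch raises an error rather than silently doing nothing if no such line exists.

```python
import re
from pathlib import Path

# Matches a whole line such as `pretrained_path=$1` or `  pretrained_path="/old/path"`.
_ASSIGNMENT = re.compile(r"^([ \t]*)pretrained_path=.*$", flags=re.MULTILINE)


def set_pretrained_path(script_text, new_path):
    """Rewrite every `pretrained_path=...` line so it points at new_path."""
    updated, count = _ASSIGNMENT.subn(
        lambda m: f'{m.group(1)}pretrained_path="{new_path}"', script_text
    )
    if count == 0:
        raise ValueError("no 'pretrained_path=' assignment found; edit the script by hand")
    return updated


def patch_script(script, new_path):
    """Apply set_pretrained_path to a script file in place."""
    path = Path(script)
    path.write_text(set_pretrained_path(path.read_text()), new_path) if False else path.write_text(
        set_pretrained_path(path.read_text(), new_path)
    )
```

A typical call would be `patch_script("accessory/exps/finetune/mm/test.sh", "/data/chartast")` (hypothetical checkpoint directory), followed by running the shell command above.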

Training

sh accessory/exps/finetune/mm/chart.sh
# chart.sh runs accessory/main_finetune.py

Gradio demo

sh accessory/demo/start.sh

Contact

If you have any questions about this work, you can email Fanqing Meng at [email protected] or reach out on WeChat: mfq2052063742.

To Do List

Create the git repository.
Open source the model and model weight.
Open source the inference script.
Open source the dataset (ChartSFT).

chartast's People

Contributors


fanqingm ai-jie01 knowledgehacker huangdaoqin xenos-code

chartast's Issues

About the functions defined in numerical QA

Thank you for the great work! After reading the paper carefully, I have the following questions:

How many functions are defined when constructing CoT for numerical QA? They seem to be NumPy functions (Fig. 2). Could you provide a list of them?
Do you invoke these NumPy functions during evaluation on numerical questions?
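One way to approximate such a list from the released data is to scan the step programs in the gpt answers and count the `func` values. The sketch below assumes that records follow the ChartSFT layout shown in the examples above and that programs are Python-style dict literals. It reflects only what appears in the data, not an official list from the authors. `count_functions` is a hypothetical helper.

```python
import ast
from collections import Counter


def count_functions(records):
    """Count func names used in step-program answers across ChartSFT-style records."""
    counts = Counter()
    for record in records:
        for turn in record["conversations"]:
            if turn["from"] != "gpt":
                continue
            try:
                program = ast.literal_eval(turn["value"])
            except (ValueError, SyntaxError, TypeError):
                continue  # free-text answer, not a step program
            if not isinstance(program, dict):
                continue
            for name, step in program.items():
                if str(name).startswith("step") and isinstance(step, dict):
                    # Keys are numbered per step: func1, func2, ...
                    counts.update(v for k, v in step.items() if str(k).startswith("func"))
    return counts
```

Given the loaded dataset (for example, the chart_upload.json file mentioned in a later issue, loaded with `json.load`), `count_functions(records).most_common()` lists each function name with its frequency.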

Inquiry about the Possibility of Sharing Generated Code Dataset Publicly

Hello,

I hope this message finds you well. I am reaching out to inquire about the possibility of publicly sharing a dataset consisting of generated code.

Could you please provide guidance on whether it is permissible to share such a dataset publicly? Additionally, are there any specific considerations or requirements that I should be aware of when sharing code datasets?

Thank you for your assistance.

Best regards,

Questions about the training details of ChartAst-S

Thanks for your excellent work!
After carefully reading the paper, I still have questions about the trainable settings of ChartAst-S.
Which parts of the model (vision tower, projection layer, LLM) are trainable or frozen in the pre-training and instruction tuning stages, respectively?

Thx again.

When will you opensource the ChartSFT dataset?

Thanks for the work, everyone. Could you please share the dataset "ChartSFT" as promised in the paper? Thank you.

Will there be an online demo?

Will there be an online demo? Thanks!

Questions about Installation / Use

I am looking to test ChartAssistant on a certain type of graph my research group is interested in, but I am having trouble installing your model. Some questions:

Do you need GPUs to use your model?
Can you provide any help with installing it for testing on a Mac, or on Google Colab if a GPU is needed?

I have downloaded the consolidated models from HuggingFace and (hopefully correctly) went through the set-up process for LLaMA2-Accessory.

Chinese support?

Does this model support Chinese chart QA?

when to leverage COT for answering

Thank you for the excellent work! After a thorough review of the paper, I have some inquiries:

I understand that the CoT command line is used to generate responses to numerical questions about charts, and I noticed that CoT is applied to the MathQA dataset, as mentioned in your paper. However, was CoT used consistently for the other datasets? How do you decide when to use CoT and when to answer directly? Specifically, was CoT used when evaluating on ChartQA?

I eagerly await your response.

When will ChartSFT be released?

Nice work! When will ChartSFT be released? Thanks

Issue with Accessing the ChartSumm Dataset

Hello,

The ChartSumm dataset is part of your ChartSFT dataset. I am trying to access it, but the "chart images.zip" file appears to be corrupted. When I try to unzip it, I get the following error message:

warning [chart images.zip]: 187898719 extra bytes at beginning or within zipfile (attempting to process anyway) error [chart images.zip]: start of central directory not found; zipfile corrupt. (please check that you have transferred or created the zipfile in the appropriate BINARY mode and that you have compiled UnZip properly)

It seems that the central directory of the zip file cannot be located, suggesting that the file may be corrupt. I have tried downloading the file again and using different unzipping tools, but the issue persists.

Have you encountered the same problem with the "chart images.zip" file from the ChartSumm dataset repository? Any assistance or guidance you could provide would be greatly appreciated.

Thank you for your attention to this matter.

If I want to fine-tune on Chinese data, how much data and how many GPU resources do I need?

thanks

Asking for DATA～

Hi! This work is great. May I ask when you will open-source your SFT data? It seems to be of very high quality, and we hope to use it in our work.

About the Chart-Table Pretrain Data

Hello, thanks for your great work! I'm curious about how you re-plotted ChartQA and PlotQA for the chart-to-table alignment pre-training stage. To my knowledge, the PlotQA dataset doesn't include .csv files, so how did you get the table data? Looking forward to your reply!

Discrepancy in Data Count Between Paper and Huggingface Dataset

First of all, thank you for your outstanding work!

I noticed that the chart_upload.json file in the Huggingface dataset contains 2,633,068 entries. However, Table 1 in your paper mentions a total of 39M data samples. So I'm wondering which part of the data has not yet been released and are there any plans to release the remaining data samples?

Many thanks!
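To check the entry count reported above and get a rough breakdown by task, one can load chart_upload.json and group records by the instruction text before the first colon (for example, "Please use commandline to solve the math question"). The sketch below assumes the file is a single JSON array of records in the format of the ChartSFT examples. Grouping by prefix is a heuristic, not the paper's task taxonomy. The helpers `instruction_prefix` and `summarize` are hypothetical.

```python
import json
from collections import Counter


def instruction_prefix(record):
    """Heuristic task key: the first human turn up to its first colon."""
    first_human = next(t["value"] for t in record["conversations"] if t["from"] == "human")
    return first_human.split(":", 1)[0].strip()


def summarize(path, top=10):
    """Return the total record count and the most common instruction prefixes."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)  # loads the whole array into memory
    prefixes = Counter(instruction_prefix(r) for r in records)
    return len(records), prefixes.most_common(top)
```

With a file of about 2.6 million entries, `json.load` needs enough RAM to hold the entire array. A streaming JSON parser would be the alternative if that is a constraint.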

How many epochs and what is the best loss of both models in the Chart-to-Table Translation stage?

Hi,
I am trying to replicate the Chart-to-Table Translation stage as described in your paper. It's my first time pre-training such a big model on a large-scale dataset, so I don't know when I should stop the pre-training loop. Could you please share the number of epochs and the best loss you reached when pre-training ChartAst-D and ChartAst-S in the Chart-to-Table Translation stage?

Thank you for your attention to this matter.

Got killed at initializing meta model

Hi, thank you for sharing your work. I am using single_turn_eval.py for chart summarization, but the process keeps getting killed while initializing the model at 'model = MetaModel(args.llama_type, args.llama_config, args.tokenizer_path, with_visual=True)'. My GPU has 64 GB of RAM. Would it be possible to switch to a smaller model? Can you give some hints?

Thank you

Issues about evaluating on ChartQA

Thank you for open-sourcing this! I want to reproduce the ChartQA performance of ChartAst-S. I noticed that the inference script accessory/exps/finetune/mm/test.sh refers to a YAML file named ./chart_multitask_mixed_othertypebasetype.yaml, but I cannot find it anywhere. What should it contain if I want to run inference on ChartQA?
In addition, are the evaluation results of ChartQA in the paper produced by this code ./accessory/eval_mm/evaluate.py?

About metrics of ChartQA

Good job!
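Hello, a question about the ChartQA metrics: the paper lists two numbers, split into -M and -H, while Qwen and Gemini report only a single ChartQA number. It's unclear to me whether that number is the average of -M and -H. I noticed this while reading the paper and wanted to discuss it.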
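ChartQA is commonly scored with relaxed accuracy. Under that convention, a numeric prediction counts as correct if it is within 5% of the gold value, and other answers need a case-insensitive exact match. Assuming that definition, the simplified sketch below scores the human-written (-H) and augmented (-M) splits separately and reports their unweighted mean. It is not the repository's evaluate.py. Reference implementations differ in details such as percent handling. Whether the single numbers reported by other models are computed this way is not confirmed here. All function names are hypothetical.

```python
def _to_float(text):
    """Parse a number, tolerating a trailing % and thousands separators; None if not numeric."""
    try:
        return float(text.strip().rstrip("%").replace(",", ""))
    except ValueError:
        return None


def relaxed_match(prediction, target, tolerance=0.05):
    """5% relative tolerance for numeric answers, case-insensitive exact match otherwise."""
    pred, gold = _to_float(prediction), _to_float(target)
    if pred is not None and gold:  # a gold value of 0 falls back to string comparison
        return abs(pred - gold) / abs(gold) <= tolerance
    return prediction.strip().lower() == target.strip().lower()


def relaxed_accuracy(pairs):
    """Fraction of (prediction, target) pairs that pass relaxed_match."""
    pairs = list(pairs)
    return sum(relaxed_match(p, t) for p, t in pairs) / len(pairs)


def chartqa_average(human_pairs, augmented_pairs):
    """Unweighted mean of the -H and -M split accuracies."""
    return (relaxed_accuracy(human_pairs) + relaxed_accuracy(augmented_pairs)) / 2
```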

How did you get instructions for some parts of ChartSFT

Hi,
Congratulations on your great work! I am curious how you created the instructions for Chart Summarization, since the datasets for this task (e.g., Chart-to-Text, ChartSumm) do not come with instructions. Also, in the pre-training stage with Chart-to-Table Translation, did you use one fixed instruction for all samples, or generate instructions the same way as for Chart Summarization?

Thank you for your attention to this matter.

arxiv data question

I have obtained the arXiv data from the link, but there are only 23105 arXiv samples for Chart-to-Table Translation, which does not match the 132719 stated in the paper. Looking forward to your reply, thank you. @FanqingM

Seeking Assistance: Location of 'pdf_only_epoch0-iter9999' Checkpoint in Repository

Hello everyone,

I hope this message finds you well. I am currently looking for the 'pdf_only_epoch0-iter9999' checkpoint in the repository. Could someone kindly guide me to its location or provide information on where I can find it?

Thank you in advance for your assistance.

Best regards,

Leandro Takeshi Hattori.
