Question Translation Training for Better Multilingual Reasoning

📃 Paper | 🤗 Huggingface | 📭 Contact

โ›ฐ๏ธ Overview

  • This repository shares the code and models of our latest work on multilingual reasoning. In this work, we present a novel X-English question alignment fine-tuning step, which performs targeted language alignment to make the best use of the LLM's English reasoning abilities.
  • Using this library, you can fine-tune open-source LLMs into strong multilingual reasoning systems. For example, our fine-tuned LLaMA2-7B/13B models achieve superior multilingual performance, significantly outperforming baseline models of equivalent size.
  • Overall, our method effectively narrows the performance gap of LLMs between English and non-English languages, pointing to a new paradigm for unlocking LLMs' capabilities on multilingual tasks.

📈 Benchmarks

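Below we present LLMs' average answer accuracy (zero-shot) on multilingual reasoning benchmarks. With question alignment, our fine-tuned LLM surpasses both its unaligned counterpart (MetaMath) and the translate-training baseline (MathOctopus) by a large margin: at 13B, for example, QAlign reaches 57.1 on mGSM, compared with 43.9 for MetaMath and 45.8 for MathOctopus.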

Our model has been open-sourced on Huggingface.

| System (13B) | Monolingual Supervision | Multilingual Supervision | mGSM | mSVAMP |
|---|---|---|---|---|
| QAlign (ours) | MetaMathQA | - | 57.1 | 62.6 |
| MetaMath | MetaMathQA | - | 43.9 | 51.8 |
| MathOctopus | - | GSM8KInstruct | 45.8 | 46.5 |
| WizardMath | GSM8K & MATH | - | 28.3 | 35.7 |
| MAmmoTH | MathInstruct | - | 28.9 | 38.6 |
| RFT | GSM8k-ScRel | - | 29.5 | 37.1 |
| SFT | GSM8K | - | 29.7 | 38.1 |

| System (7B) | Monolingual Supervision | Multilingual Supervision | mGSM | mSVAMP |
|---|---|---|---|---|
| QAlign (ours) | MetaMathQA | - | 49.6 | 57.2 |
| MetaMath | MetaMathQA | - | 38.4 | 46.2 |
| MathOctopus | - | GSM8KInstruct | 40.0 | 44.1 |
| WizardMath | GSM8K & MATH | - | 23.0 | 32.5 |
| MAmmoTH | MathInstruct | - | 21.3 | 26.3 |
| RFT | GSM8k-ScRel | - | 20.6 | 31.3 |
| SFT | GSM8K | - | 22.6 | 30.9 |

📂 Dataset

In the table below, we list datasets that are used in this project. All datasets are available within this repository, with the exception of MetaMathQA. To use MetaMathQA, please download the file MetaMathQA-395K.json with the provided link and place it in the ./data/metamath directory.

| Dataset | Usage | Size | Languages | Path |
|---|---|---|---|---|
| MetaMathQA | Training | 395,000 | En | ./data/metamath |
| GSM8KInstruct | Training | 73,559 | En, Bn, Th, Sw, Ja, Zh, De, Fr, Ru, Es | ./data/gsm8kinstruct |
| mGSM | Evaluation | 2,500 | En, Bn, Th, Sw, Ja, Zh, De, Fr, Ru, Es | ./evaluate/scripts/data/mgsm |
| mSVAMP | Evaluation | 10,000 | En, Bn, Th, Sw, Ja, Zh, De, Fr, Ru, Es | ./evaluate/scripts/data/msvamp |
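
If you prefer to script the MetaMathQA placement described above, the sketch below shows one way to do it. `place_metamath` is a hypothetical helper, not part of the repository; it assumes MetaMathQA-395K.json has already been downloaded from the provided link, and it takes the downloaded file and, optionally, the QAlign repository root (default: the current directory).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the QAlign repository): move a downloaded
# MetaMathQA-395K.json into <repo_root>/data/metamath, creating the directory if needed.
place_metamath() {
  local src="$1"
  local repo_root="${2:-.}"
  if [ ! -f "$src" ]; then
    echo "not found: $src" >&2
    return 1
  fi
  mkdir -p "$repo_root/data/metamath"
  mv "$src" "$repo_root/data/metamath/MetaMathQA-395K.json"
}
```

For example, running `place_metamath ~/Downloads/MetaMathQA-395K.json .` from the repository root (the download location is only an example) leaves the file at ./data/metamath/MetaMathQA-395K.json.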

🧩 Installation

To install this repository, follow these steps:

git clone git@github.com:NJUNLP/QAlign.git
cd QAlign
pip install --editable ./

For detailed information about the conda environment, refer to the environment.yaml file.

🛠️ Training

We developed our training pipeline based on the stanford_alpaca repository.

To perform question alignment and response alignment on pre-trained LLMs, use the following commands. Before running them, replace $PROJECT_PATH in finetune.sh or finetune_dp.sh with the appropriate path; the scripts will not run otherwise. When fine-tuning the 13B model, we use DeepSpeed to reduce memory usage; our DeepSpeed configuration is in ./config/ds.json.

  • finetuning LLaMA2-7B
bash ./training_scripts/finetune_llama2_7B.sh
  • finetuning LLaMA2-13B
bash ./training_scripts/finetune_llama2_13B.sh
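
Filling in $PROJECT_PATH can be done by hand or with sed. The minimal sketch below uses a hypothetical helper, `set_project_path`, that is not part of the repository. It assumes the scripts contain the literal text `$PROJECT_PATH` (not `${PROJECT_PATH}`) and that your path contains no `|` or `&` characters, which sed would interpret specially in the replacement.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the QAlign repository): replace every literal
# occurrence of "$PROJECT_PATH" in the given scripts with a concrete path.
# Assumes the path contains no '|' or '&' (special characters in sed replacements).
set_project_path() {
  local path="$1"
  shift
  local script
  for script in "$@"; do
    # '\$' matches a literal dollar sign; -i.bak edits in place and keeps a
    # backup copy, and this spelling is accepted by both GNU and BSD sed.
    sed -i.bak 's|\$PROJECT_PATH|'"$path"'|g' "$script"
  done
}
```

For example, `set_project_path "$(pwd)" path/to/finetune.sh path/to/finetune_dp.sh`, where the script locations are placeholders for wherever these files live in your checkout. The .bak copies let you diff or revert the change.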

📝 Evaluation

We use the evaluation code provided by Chen et al., which measures answer accuracy by comparing the last number that appears in the LLM-generated response with the gold answer.
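
To make the metric concrete, here is a simplified sketch of last-number matching in shell. It is not Chen et al.'s code: the function names are hypothetical, and the number pattern (optional minus sign, optional decimal part, commas stripped as thousands separators) is an assumption that may differ from the official scorer on edge cases.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of last-number answer matching (not the official scorer).

# Print the last number in a response. Commas are removed first so that
# "1,234" is read as 1234 (assumption: commas only appear as thousands separators).
last_number() {
  printf '%s\n' "$1" | tr -d ',' | grep -oE -- '-?[0-9]+(\.[0-9]+)?' | tail -n 1
}

# Exit with status 0 if the last number in the response ($1) equals the gold
# answer ($2) numerically, and 1 otherwise (including when no number is found).
is_correct() {
  local pred
  pred=$(last_number "$1")
  [ -n "$pred" ] || return 1
  awk -v p="$pred" -v g="$2" 'BEGIN { exit !(p + 0 == g + 0) }'
}
```

Comparing numerically through awk, rather than as strings, means a prediction such as 18.0 still matches a gold answer of 18; accuracy is then the fraction of test questions for which `is_correct` succeeds.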

To evaluate the model on the mGSM and mSVAMP datasets, use the following commands. Before running them, replace $PROJECT_PATH and $MODEL_PATH in the scripts with the appropriate paths; the scripts will not run otherwise.

  • evaluating with mGSM
cd evaluate/scripts

bash evaluate_mgsm.sh
  • evaluating with mSVAMP
cd evaluate/scripts

bash evaluate_msvamp.sh

🌲 Citation

If you find this repository helpful, feel free to cite our paper:

@misc{zhu2024question,
      title={Question Translation Training for Better Multilingual Reasoning}, 
      author={Wenhao Zhu and Shujian Huang and Fei Yuan and Shuaijie She and Jiajun Chen and Alexandra Birch},
      year={2024},
      eprint={2401.07817},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
