Question Translation Training for Better Multilingual Reasoning

⛰️ Overview

This repository shares the code and models from our latest work on multilingual reasoning. In this work, we present a novel X-English question alignment fine-tuning step, which performs targeted language alignment to make the best use of the LLM's English reasoning abilities.
Using this library, you can fine-tune open-source LLMs into strong multilingual reasoning systems. For example, our fine-tuned LLaMA2-7B/13B models achieve superior multilingual performance, significantly outperforming baseline models of equivalent size.
Overall, our method effectively reduces the performance gap of LLMs between English and non-English languages, showing a new paradigm for unlocking LLMs' capabilities to accomplish multilingual tasks.

📈 Benchmarks

Below we present LLMs' average answer accuracy (zero-shot) on multilingual reasoning benchmarks. With question alignment, our fine-tuned LLM surpasses the unaligned counterpart and the translate-training baseline (MathOctopus) by a large margin.

Our models have been open-sourced on Hugging Face.

System (13B)	Monolingual Supervision	Multilingual Supervision	mGSM	mSVAMP
QAlign (ours)	MetaMathQA	-	57.1	62.6
MetaMath	MetaMathQA	-	43.9	51.8
MathOctopus	-	GSM8KInstruct	45.8	46.5
WizardMath	GSM8K & MATH	-	28.3	35.7
MAmmoTH	MathInstruct	-	28.9	38.6
RFT	GSM8k-ScRel	-	29.5	37.1
SFT	GSM8K	-	29.7	38.1

System (7B)	Monolingual Supervision	Multilingual Supervision	mGSM	mSVAMP
QAlign (ours)	MetaMathQA	-	49.6	57.2
MetaMath	MetaMathQA	-	38.4	46.2
MathOctopus	-	GSM8KInstruct	40.0	44.1
WizardMath	GSM8K & MATH	-	23.0	32.5
MAmmoTH	MathInstruct	-	21.3	26.3
RFT	GSM8k-ScRel	-	20.6	31.3
SFT	GSM8K	-	22.6	30.9
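
The margins reported above can be checked with a few lines of arithmetic. The sketch below copies the mGSM and mSVAMP scores of QAlign, MetaMath (the unaligned counterpart) and MathOctopus (the translate-training baseline) from the two tables and computes the gap in accuracy points. The names `SCORES` and `gain_over` are illustrative and are not part of this repository.

```python
# Scores copied from the benchmark tables above: (mGSM, mSVAMP) accuracy.
SCORES = {
    "13B": {
        "QAlign": (57.1, 62.6),
        "MetaMath": (43.9, 51.8),
        "MathOctopus": (45.8, 46.5),
    },
    "7B": {
        "QAlign": (49.6, 57.2),
        "MetaMath": (38.4, 46.2),
        "MathOctopus": (40.0, 44.1),
    },
}


def gain_over(size, baseline):
    """Return QAlign's lead over `baseline` as (mGSM gain, mSVAMP gain)."""
    ours = SCORES[size]["QAlign"]
    theirs = SCORES[size][baseline]
    # Round to one decimal place to match the precision of the tables.
    return tuple(round(o - t, 1) for o, t in zip(ours, theirs))


for size in ("13B", "7B"):
    for baseline in ("MetaMath", "MathOctopus"):
        mgsm_gain, msvamp_gain = gain_over(size, baseline)
        print(f"{size} vs {baseline}: mGSM +{mgsm_gain}, mSVAMP +{msvamp_gain}")
```

At the 13B scale this gives a lead of 13.2 (mGSM) and 10.8 (mSVAMP) points over MetaMath.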

📂 Dataset

In the table below, we list the datasets used in this project. All datasets are available within this repository, with the exception of MetaMathQA. To use MetaMathQA, please download the file MetaMathQA-395K.json from the provided link and place it in the ./data/metamath directory.

Dataset	Usage	Size	Languages	Path
MetaMathQA	Training	395,000	En	./data/metamath
GSM8KInstruct	Training	73,559	En, Bn, Th, Sw, Ja, Zh, De, Fr, Ru, Es	./data/gsm8kinstruct
mGSM	Evaluation	2,500	En, Bn, Th, Sw, Ja, Zh, De, Fr, Ru, Es	./evaluate/scripts/data/mgsm
mSVAMP	Evaluation	10,000	En, Bn, Th, Sw, Ja, Zh, De, Fr, Ru, Es	./evaluate/scripts/data/msvamp
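
Since MetaMathQA has to be downloaded separately, it can help to confirm the file is in place before launching training. The following is a minimal sketch, assuming that MetaMathQA-395K.json is a single JSON array of records; the helper name `load_metamathqa` is hypothetical and not part of this repository.

```python
import json
from pathlib import Path


def load_metamathqa(data_dir="./data/metamath", filename="MetaMathQA-395K.json"):
    """Load the manually downloaded MetaMathQA file and return its records.

    Assumption: the file holds one JSON array with one object per example.
    """
    path = Path(data_dir) / filename
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; download {filename} and place it in {data_dir}"
        )
    with path.open(encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError(f"expected a JSON array in {path}")
    return records
```

Calling `len(load_metamathqa())` from the repository root should then report a count close to the 395,000 examples listed in the table.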

🧩 Installation

To install this repository, follow these steps:

git clone git@github.com:NJUNLP/QAlign.git
cd QAlign
pip install --editable ./

For detailed information about the conda environment, refer to the environment.yaml file.

🛠️ Training

We developed our training pipeline based on the stanford_alpaca repository.

To perform question alignment and response alignment on pre-trained LLMs, use the following commands. Please note that you must replace $PROJECT_PATH with the appropriate paths in finetune.sh or finetune_dp.sh to ensure the scripts are executable. When fine-tuning the 13B model, we use DeepSpeed to save memory. You can find our DeepSpeed configuration in the ./config/ds.json file.

finetuning LLaMA2-7B

bash ./training_scripts/finetune_llama2_7B.sh

finetuning LLaMA2-13B

bash ./training_scripts/finetune_llama2_13B.sh

📏 Evaluation

We use the evaluation code provided by Chen et al., which measures answer accuracy by comparing the last number that appears in the LLM-generated response with the gold answer.
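
To make the metric concrete, here is a minimal sketch of last-number matching. It is an illustration written for this README, not the evaluation code of Chen et al.; the function names, the number pattern, and the tolerance `tol` are all assumptions.

```python
import re

# Integers or decimals, with an optional sign and thousands separators.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_last_number(text):
    """Return the last number found in `text` as a float, or None if absent."""
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    # Drop thousands separators (and any trailing comma) before converting.
    return float(matches[-1].replace(",", ""))


def answer_accuracy(responses, gold_answers, tol=1e-4):
    """Fraction of responses whose last number equals the gold answer."""
    if len(responses) != len(gold_answers):
        raise ValueError("responses and gold_answers must have the same length")
    if not responses:
        raise ValueError("no responses to evaluate")
    correct = 0
    for response, gold in zip(responses, gold_answers):
        predicted = extract_last_number(response)
        if predicted is not None and abs(predicted - float(gold)) <= tol:
            correct += 1
    return correct / len(responses)


if __name__ == "__main__":
    demo = ["She buys 4 more, so 3 + 4 = 7.", "The total is 10."]
    print(answer_accuracy(demo, [7, 12]))
```

Note that a response counts as correct only when its final number matches, so a model that states the right answer and then keeps generating further numbers is scored as wrong under this scheme.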

To evaluate the model on the mGSM and mSVAMP datasets, use the following commands. Please note that you must replace $PROJECT_PATH and $MODEL_PATH with the appropriate paths in the scripts to ensure they are executable.

evaluating with mGSM

cd evaluate/scripts

bash evaluate_mgsm.sh

evaluating with mSVAMP

cd evaluate/scripts

bash evaluate_msvamp.sh

🌲 Citation

If you find this repository helpful, feel free to cite our paper:

@misc{zhu2024question,
      title={Question Translation Training for Better Multilingual Reasoning}, 
      author={Wenhao Zhu and Shujian Huang and Fei Yuan and Shuaijie She and Jiajun Chen and Alexandra Birch},
      year={2024},
      eprint={2401.07817},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
