OFA-Sys / gsm8k-ScRel

Code and data for "Scaling Relationship on Learning Mathematical Reasoning with Large Language Models"
Paper: https://arxiv.org/abs/2308.01825
Do you have any plans to release the training data?
It seems nobody has tried your 13b2-u13b version yet, and I may be the first. I got 'RuntimeError: mat1 and mat2 shapes cannot be multiplied (111x5120 and 1x2560)' during inference, while the 7b version works fine.
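For anyone hitting the same error: 5120 is the LLaMA-13B hidden size and 2560 is exactly half of it, so the mismatch could point to a half-merged or tensor-parallel-split checkpoint. A minimal diagnostic sketch, assuming a locally downloaded checkpoint (both paths below are hypothetical placeholders):

import torch
from transformers import AutoConfig

# Hypothetical paths; substitute the real checkpoint directory and shard file.
config = AutoConfig.from_pretrained("path/to/13b2-u13b")
state = torch.load("path/to/13b2-u13b/pytorch_model-00001-of-00003.bin",
                   map_location="cpu")

# Flag any 2-D weight whose shape does not involve the expected hidden
# size (5120 for 13B); a 2560 dimension would match the mat1/mat2 error.
for name, tensor in state.items():
    if tensor.ndim == 2 and config.hidden_size not in tensor.shape:
        print(name, tuple(tensor.shape))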
Hi there, is there any chance you could share your RFT-7B model with us?
Hi, I want to reproduce the results of the RFT model for LLaMA 13B. Do you have any plans for that?
Thanks for this great work. I have two questions. First, the generation code for 7B/13B seems to be missing. Second, about the specific hyperparameter settings: the default hyperparameters in single_inference_30b.py are not suitable for generating diverse reasoning paths (see the sampling sketch after this issue).
Thank you for your help!
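Regarding the second question, a minimal sketch of sampling multiple reasoning paths with standard Hugging Face generation arguments; the temperature/top_p values are illustrative assumptions, not the repo's settings:

from transformers import AutoModelForCausalLM, LlamaTokenizer

model_path = "OFA-Sys/gsm8k-rft-llama7b-u13b"  # one of the released checkpoints
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

inputs = tokenizer("Question: ...\nAnswer:", return_tensors="pt").to(model.device)

# do_sample=True with a nonzero temperature is what yields *different*
# reasoning paths across the returned sequences; greedy decoding would
# return the same path every time.
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.7,        # illustrative value
    top_p=0.9,              # illustrative value
    num_return_sequences=8, # number of paths per question
    max_new_tokens=512,
)
paths = tokenizer.batch_decode(outputs, skip_special_tokens=True)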
Hi,
I am trying to run the script to fine-tune the 70B model, and I am getting this error:
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2699234) of binary:
Any idea what could be the issue?
I am able to train 13B models, and I followed all the dependency versions mentioned in past issues.
Thanks.
Hi, after completing SFT and multi-path reasoning, I have some doubts about the data under the data/rft path in your GitHub code base. How were these data generated? I see that four datasets are produced by the "filter reasoning paths" step; were the files under data/rft created from those four datasets?
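For context, a rough sketch of the rejection-sampling filter the paper describes: keep sampled paths whose final answer matches the ground truth, then deduplicate by the calculator equations they contain. The helper names are hypothetical and the repo's exact dedup criterion may differ:

import re

def extract_answer(text):
    # Assumes a "The answer is N" convention; adjust to the repo's format.
    m = re.search(r"The answer is\s*(-?[\d,.]+)", text)
    return m.group(1).replace(",", "") if m else None

def filter_paths(gold_answer, sampled_paths):
    # Rejection sampling: keep only correct paths, deduplicated by the
    # set of <<...>> calculator equations as a proxy for distinct reasoning.
    kept, seen = [], set()
    for path in sampled_paths:
        if extract_answer(path) != gold_answer:
            continue
        key = tuple(sorted(set(re.findall(r"<<[^>]*>>", path))))
        if key in seen:
            continue
        seen.add(key)
        kept.append(path)
    return kept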
When both of these values are False, shouldn't the model's output be identical on every run?
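Assuming the two values in question are sampling switches like do_sample (an assumption; the issue does not name them): with sampling disabled, generate falls back to greedy decoding, which is deterministic for fixed weights and inputs. Reusing the model and inputs from the sampling sketch above:

# Greedy decoding: repeated calls return identical tokens, so any
# run-to-run variation must come from elsewhere (e.g. non-deterministic
# CUDA kernels or a still-enabled sampling flag).
out1 = model.generate(**inputs, do_sample=False, max_new_tokens=256)
out2 = model.generate(**inputs, do_sample=False, max_new_tokens=256)
assert (out1 == out2).all()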
Hi, thank you for your excellent work!
I would like to know if augmented datasets like MuggleMath or RFT are suitable for pre-training?
Does SFT include an instruction-tuning process?
Could you please directly release the RFT datasets that contain the various reasoning paths?
This issue is closely related to #9 and #8. However, after taking their insights into consideration, I still only achieve scores of 24.86 (llama-7b) and 26.99 (llama2-7b) when training on the GSM8K training set (7.4K examples, 3 epochs), versus the 41.6% reported in the paper. Here are the specifics:
Environment:
Hardware: 2 X A100 80G GPUs
Software: transformers==4.29.2
Training configuration:
CUDA_VISIBLE_DEVICES=0,1 python3 -m torch.distributed.launch --master_addr ${MASTER_ADDR} \
    --master_port ${MASTER_PORT} --nproc_per_node=2 --use_env train.py \
    --model_name_or_path $MODEL_PATH \
    --data_path $2 \
    --bf16 True \
    --output_dir $SAVE_PATH \
    --num_train_epochs 1 \
    --per_device_train_batch_size 32 \
    --per_device_eval_batch_size 32 \
    --gradient_accumulation_steps 16 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 50000 \
    --save_total_limit 1 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
    --tf32 True \
    --gradient_checkpointing True
For both training and testing, I used the tokenizer from huggyllama/llama-7b. No significant issues were detected during training. However, I suspect some underlying difference in environment or methodology may be causing this performance gap.
I would appreciate any insights or suggestions to help bridge this discrepancy and achieve the expected performance.
Thanks
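One factor worth checking (an observation from the numbers above, not an official answer): with 2 GPUs, a per-device batch of 32, and 16 accumulation steps, the effective batch size is 1024, which leaves very few optimizer updates over the ~7.4K-example GSM8K training set:

# Effective batch = per-device batch * grad accumulation * number of GPUs.
num_examples = 7473  # GSM8K training-set size
for per_device, accum, gpus in [(32, 16, 2), (32, 2, 8)]:  # reported setup vs. a hypothetical 8-GPU one
    eff = per_device * accum * gpus
    print(f"effective batch {eff}: ~{num_examples // eff} updates per epoch")
# effective batch 1024: ~7 updates per epoch
# effective batch 512: ~14 updates per epoch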
Summary:
A score of 49+ can be reproduced. Two things to note: 1. use LlamaTokenizer; 2. the pad_token behaves incorrectly, and its interference must be excluded.
https://github.com/Haskely/gsm8k-rft-llama7b-u13b_evaluation/tree/main
Using
from transformers import AutoTokenizer
model_path = "OFA-Sys/gsm8k-rft-llama7b-u13b"
tokenizer = AutoTokenizer.from_pretrained(model_path)
raises the following error:
File "/data/public/zhangzixin/conda_envs/nova/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 250, in convert_tokens_to_ids
return self._convert_token_to_id_with_added_voc(tokens)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/public/zhangzixin/conda_envs/nova/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 257, in _convert_token_to_id_with_added_voc
return self.unk_token_id
^^^^^^^^^^^^^^^^^
File "/data/public/zhangzixin/conda_envs/nova/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1155, in unk_token_id
return self.convert_tokens_to_ids(self.unk_token)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/public/zhangzixin/conda_envs/nova/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 250, in convert_tokens_to_ids
return self._convert_token_to_id_with_added_voc(tokens)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/public/zhangzixin/conda_envs/nova/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 257, in _convert_token_to_id_with_added_voc
return self.unk_token_id
^^^^^^^^^^^^^^^^^
File "/data/public/zhangzixin/conda_envs/nova/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1155, in unk_token_id
return self.convert_tokens_to_ids(self.unk_token)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/public/zhangzixin/conda_envs/nova/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 250, in convert_tokens_to_ids
return self._convert_token_to_id_with_added_voc(tokens)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/public/zhangzixin/conda_envs/nova/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 257, in _convert_token_to_id_with_added_voc
return self.unk_token_id
^^^^^^^^^^^^^^^^^
RecursionError: maximum recursion depth exceeded
But when I check this repository's source code, the loading method is the same:
Line 185 in f4d0176
My transformers version: transformers 4.31.0
P.S. Manually using LlamaTokenizer.from_pretrained(model_path) does not raise an error, so I am scoring with this approach for now.
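In code, the workaround from the summary above, plus a common pad_token guard (the eos fallback is a general convention, not something this repo prescribes):

from transformers import LlamaTokenizer

model_path = "OFA-Sys/gsm8k-rft-llama7b-u13b"
# Loading the slow LlamaTokenizer avoids the unk_token recursion that
# AutoTokenizer hits on this checkpoint.
tokenizer = LlamaTokenizer.from_pretrained(model_path)
# The summary also flags the pad_token as problematic; one common guard
# is falling back to eos so padding does not corrupt generation.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token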
Hello, I'm trying to reproduce your results for two settings with llama2-7b, but I cannot reach scores as high as those reported in the paper.
By the way, while training on 8 NVIDIA A800 80G GPUs, I always got torch.cuda.OutOfMemoryError, so I halved the micro-batch-size-per-gpu and doubled the gradient-accumulation-steps.
Is this because we are using different GPUs/environments?
Could you please share a requirements.txt for your environment, or specific checkpoints/seeds, to help reproduce your results?
Thanks!
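For what it's worth, halving the per-GPU micro batch while doubling gradient accumulation keeps the effective batch size constant, so that change alone should not explain a score gap (illustrative numbers below):

def effective_batch(per_device, grad_accum, num_gpus):
    return per_device * grad_accum * num_gpus

# Halved micro batch + doubled accumulation on the same 8 GPUs:
assert effective_batch(32, 16, 8) == effective_batch(16, 32, 8)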
[Embedded code references at commit f4d0176; the snippets themselves were not captured: lines 43 to 45 of an uncaptured file; gsm8k-ScRel/train_llama_30b_65b.py, lines 51 to 53; gsm8k-ScRel/train_llama2_70b.py, lines 51 to 53; gsm8k-ScRel/group_test_7b_13b.py, lines 109 to 110; whereas gsm8k-ScRel/single_inference_30b.py, lines 114 to 115, and gsm8k-ScRel/single_inference_65b.py, lines 113 to 114.]
Could you please provide the official environment for your project, e.g. a requirements.txt?
Hello,
Thank you for sharing the invaluable code. While attempting to replicate your work, I noticed that the test_7b_13b.sh script references a test.py file, but it seems to be missing from the repository. Would you be able to add this file? It would be immensely helpful for researchers like us who are trying to replicate your work.
Best