Coder Social home page Coder Social logo

jackaduma / alpaca-lora-rlhf-pytorch Goto Github PK

View Code? Open in Web Editor NEW
53.0 5.0 6.0 19.14 MB

A full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware. Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the Alpaca architecture. Basically ChatGPT but with Alpaca

License: MIT License

Python 100.00%
alpaca chatgpt llama llm lora pytorch rlhf gpt finetune deepspeed peft ppo reward-models

alpaca-lora-rlhf-pytorch's Introduction

Alpaca-LoRA-RLHF-PyTorch

a full pipeline to finetune Alpaca LLM with LoRA and RLHF on consumer hardware


Table of Contents


Environment Setup

穷人卡:2080Ti 12G
torch==2.0.0
cuda==11.8

Todo List

  • SFT: Supervised Finetune
  • Merge Adapter into Model
  • RLHF
    • train reward model
    • tuning with RL

Run


Supervised Finetune

 check src/peft/utils/save_and_load.py , Only comment the line 52 to # #to_return = {k: v for k, v in to_return.items() if (("lora_" in k and adapter_name in k) or ("bias" in k))}
python supervised_finetune.py --base_model 'decapoda-research/llama-7b-hf' --data_path 'yahma/alpaca-cleaned' --output_dir './lora-alpaca' --num_epochs 1

Merge PEFT adapter into Model

pip uninstall peft -y
pip install peft==0.2.0  # 0.3.0.dev0 raise many errors
python merge_peft_adapter.py --model_name ./alpaca-lora

Train Reward Model

python train_reward_model.py --model_name 'decapoda-research/llama-7b-hf' --gradient_accumulation_steps 32 --per_device_train_batch_size 1 --train_subset 100 --eval_subset 10 --local_rank 0 --bf16 False

Merge Reward adapter into Model

python merge_peft_adapter.py --model_name ./alpaca-lora-reward-model

Tuning LM with PPO

python tuning_lm_with_rl.py --model_name './lora-alpaca-adapter-merged' --reward_model_name './lora-alpaca-reward-model-adapter-merged' --adafactor False --tokenizer_name 'decapoda-research/llama-7b-hf' --save_freq 100 --output_max_length 128 --batch_size 1 --gradient_accumulation_steps 1 --batched_gen True --ppo_epochs 1 --seed 0 --learning_rate 1.4e-5 --early_stopping True --output_dir './checkpoints/tuning_llama_rl'

Notes

  1. 第一步SFT之前,切记有个注意事项,需要检查下 安装的peft代码, src/peft/utils/save_and_load.py , 如果 line 52 有这行代码 #to_return = {k: v for k, v in to_return.items() if (("lora_" in k and adapter_name in k) or ("bias" in k))},需要将其注释掉,否则在finetune完之后,保存不了 adapter model 的参数。切记!
  2. PEFT的版本,目前从git上安装的是 0.3.0.dev0 版本,在merge_peft_adapter的时候有问题,需要切换到peft==0.2.0 (0.3.0.dev0 没有 _get_submodules()这个函数)
  3. train reward model的时候 会发生另一个问题: ValueError: weight is on the meta device, we need a value to put in on 0. 需要参看 transformer 在github上的最新代码,我在发现这个问题的时候,隔天发现在transformer的github上 8小时前才刚刚修复了这个问题。
  4. 最后一步,代码上基本是ok的,但是本人只有2080Ti的卡,加载完finetune model之后,再加载Reward model的时候 直接CUDA out of memory了,所以并未执行。

Reference

utils & templates 来自 alpaca-lora

requirements 主要是按照 [alpaca-lora](https://github.com/tloen/ alpaca-lora) 来配环境。


Star-History

star-history


Donation

If this project help you reduce time to develop, you can give me a cup of coffee :)

AliPay(支付宝)

ali_pay

WechatPay(微信)

wechat_pay

License

MIT © Kun

alpaca-lora-rlhf-pytorch's People

Contributors

jackaduma avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

alpaca-lora-rlhf-pytorch's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.