
Abnormal DPO training · about llms_tool · HOT · 16 comments · CLOSED

stanleylsx commented on May 16, 2024
Abnormal DPO training

from llms_tool.

Comments (16)

tuzeao commented on May 16, 2024 · 1

I haven't run into this. On my side the code basically hasn't hit any deepspeed-related problems.
A quick Google search suggests it may be a port misconfiguration. Try specifying deepspeed's host file, or the master_port, include, etc. parameters when launching the script.

tuzeao commented on May 16, 2024

The main code hasn't been changed much, and I'm using our example data.

stanleylsx commented on May 16, 2024

> The main code hasn't been changed much, and I'm using our example data.

Got it. My dataset is only a simple debugging dataset, so a run on it isn't guaranteed to produce meaningful results. If you need to train a valid baseline, I can replace this debugging dataset with a proper one.

tuzeao commented on May 16, 2024

I tried the comparison_gpt4_data_zh.json dataset (about 48k samples) from https://github.com/Instruction-Tuning-with-GPT-4/GPT-4-LLM, and also tried swapping chosen and rejected (to verify the code works). The result is still what the screenshot above shows: the loss doesn't change, with 'rewards/chosen': 0.0 and 'rewards/rejected': 0.0.

stanleylsx commented on May 16, 2024

> I tried the comparison_gpt4_data_zh.json dataset (about 48k samples) ... the loss doesn't change and both rewards stay at 0.0.

What learning rate are you using?

tuzeao commented on May 16, 2024

As follows:

deepspeed \
    --include="localhost:"${gpus} \
    --master_port=9909 \
    main.py \
    --deepspeed deepspeed_configs/zero_stage2_config.json \
    --mode dpo_train \
    --fine_tuning_type full \
    --model_path ${model_path} \
    --output_dir ${output_dir} \
    --cache_data_path ${cache_dir} \
    --task_name ${task_name} \
    --do_train \
    --num_train_epochs 1.0 \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --learning_rate 1e-5 \
    --save_strategy epoch \
    --logging_steps 1 \
    --model_type baichuan \
    --prompt_template baichuan


tuzeao commented on May 16, 2024

> (same deepspeed launch script as above)

Addendum: the model is Baichuan-13B-Chat, full-parameter fine-tuning, DPO training.

stanleylsx commented on May 16, 2024

> (same deepspeed launch script and addendum as above)

The cause has been found; a fix is in progress.

LuJunru commented on May 16, 2024

@tuzeao Hello, could you share how much GPU memory and host RAM DPO fine-tuning of the 13B model needs? I see a deepcopy in the code, and I keep hitting OOM when I run it.

tuzeao commented on May 16, 2024

The deepcopy is there because DPO needs a ref model to serve as the reference; whether you deepcopy or call model.from_pretrained again, the purpose is the same.
It runs with ZeRO stage 3 but not with stage 2, and basically maxes out 8×A100s either way; a single card definitely won't work.
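The deepcopy-vs-reload point can be illustrated with a plain Python sketch (the Model class below is a hypothetical stand-in, not this project's actual code): what matters is that the reference model is an independent snapshot, not an alias of the policy that keeps training.

```python
import copy

class Model:
    def __init__(self, weights):
        self.weights = weights  # stands in for real parameters

policy = Model({"w": 1.0})

alias = policy                     # NOT a copy: tracks every update
ref_model = copy.deepcopy(policy)  # independent snapshot, stays frozen

policy.weights["w"] += 0.5         # one "training step" on the policy

print(alias.weights["w"])      # 1.5 — the alias moved with the policy
print(ref_model.weights["w"])  # 1.0 — the deepcopy kept the original weights
```

Re-calling model.from_pretrained gives the same independence at the cost of re-reading the checkpoint; either way the ref model roughly doubles the memory footprint, which is why full-parameter DPO on a 13B model needs ZeRO stage 3 to shard the weights.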


LuJunru commented on May 16, 2024

> The deepcopy is there because DPO needs a ref model to serve as the reference; stage 3 works, stage 2 doesn't, and it basically maxes out 8×A100s.

@tuzeao Thanks. After fixing the OOM problem I hit another issue: the program bails out during the deepspeed initialization phase, with no other error message at all. Have you ever seen anything like this? My code is basically identical to SFT; I only swapped in the DPO trainer:

Time to load utils op: 0.4331231117248535 seconds
[2023-08-31 19:02:27,894] [INFO] [utils.py:785:see_memory_usage] DeepSpeedZeRoOffload initialize [begin]
[2023-08-31 19:02:27,895] [INFO] [utils.py:786:see_memory_usage] MA 12.74 GB         Max_MA 12.74 GB         CA 14.31 GB         Max_CA 14 GB 
[2023-08-31 19:02:27,895] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 16.27 GB, percent = 4.3%
Parameter Offload: Total persistent parameters: 266240 in 65 params
[2023-08-31 19:02:28,030] [INFO] [utils.py:785:see_memory_usage] DeepSpeedZeRoOffload initialize [end]
[2023-08-31 19:02:28,031] [INFO] [utils.py:786:see_memory_usage] MA 12.74 GB         Max_MA 12.74 GB         CA 14.31 GB         Max_CA 14 GB 
[2023-08-31 19:02:28,031] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 16.28 GB, percent = 4.3%
[2023-08-31 19:02:28,148] [INFO] [utils.py:785:see_memory_usage] Before creating fp16 partitions
[2023-08-31 19:02:28,149] [INFO] [utils.py:786:see_memory_usage] MA 12.74 GB         Max_MA 12.74 GB         CA 14.31 GB         Max_CA 14 GB 
[2023-08-31 19:02:28,149] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 16.27 GB, percent = 4.3%
[2023-08-31 19:02:40,532] [INFO] [utils.py:785:see_memory_usage] After creating fp16 partitions: 7
[2023-08-31 19:02:40,533] [INFO] [utils.py:786:see_memory_usage] MA 12.74 GB         Max_MA 12.74 GB         CA 13.54 GB         Max_CA 14 GB 
[2023-08-31 19:02:40,533] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 16.26 GB, percent = 4.3%
[2023-08-31 19:02:40,654] [INFO] [utils.py:785:see_memory_usage] Before creating fp32 partitions
[2023-08-31 19:02:40,655] [INFO] [utils.py:786:see_memory_usage] MA 12.74 GB         Max_MA 12.74 GB         CA 13.54 GB         Max_CA 14 GB 
[2023-08-31 19:02:40,655] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 16.26 GB, percent = 4.3%
[2023-08-31 19:02:55,692] [INFO] [utils.py:785:see_memory_usage] After creating fp32 partitions
[2023-08-31 19:02:55,693] [INFO] [utils.py:786:see_memory_usage] MA 12.74 GB         Max_MA 12.74 GB         CA 13.54 GB         Max_CA 14 GB 
[2023-08-31 19:02:55,693] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 41.39 GB, percent = 11.0%
[2023-08-31 19:02:55,814] [INFO] [utils.py:785:see_memory_usage] Before initializing optimizer states
[2023-08-31 19:02:55,815] [INFO] [utils.py:786:see_memory_usage] MA 12.74 GB         Max_MA 12.74 GB         CA 13.54 GB         Max_CA 14 GB 
[2023-08-31 19:02:55,815] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory:  used = 41.42 GB, percent = 11.0%
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 80359) of binary: /opt/conda/bin/python
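One clue in the trace above: exitcode -9 means the worker was killed by signal 9 (SIGKILL), which on Linux is typically the kernel OOM killer reclaiming host RAM, consistent with CPU virtual memory jumping from ~16 GB to ~41 GB while the fp32 partitions and optimizer states were being created. A quick stdlib check of the signal number:

```python
import signal

# Decode the status from the elastic-launch error line:
# "failed (exitcode: -9)" means the worker died from signal 9.
print(signal.Signals(9).name)  # SIGKILL — on Linux, usually the OOM killer
```

Since a SIGKILLed process prints no Python traceback, checking dmesg for oom-killer entries (or watching host RAM during optimizer-state creation) is a reasonable next step.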


Darren-w commented on May 16, 2024

How was the problem of loss stuck at 0.6931 with 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0 fixed?

{'loss': 0.6931, 'learning_rate': 1.8467489107293509e-06, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': -1307.955322265625, 'logps/chosen': -2537.4287109375, 'logits/rejected': 46.67363739013672, 'logits/chosen': 47.7917366027832, 'epoch': 0.5}
{'loss': 0.6931, 'learning_rate': 8.217156947590064e-07, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': -566.4472045898438, 'logps/chosen': -863.7874755859375, 'logits/rejected': 47.26816177368164, 'logits/chosen': 47.84038543701172, 'epoch': 0.51}
{'loss': 0.6931, 'learning_rate': 2.05569786813925e-07, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': -1334.9326171875, 'logps/chosen': -2294.96533203125, 'logits/rejected': 46.71002960205078, 'logits/chosen': 47.56126022338867, 'epoch': 0.52}
{'loss': 0.6931, 'learning_rate': 0.0, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': -2079.239501953125, 'logps/chosen': -4097.9521484375, 'logits/rejected': 46.693851470947266, 'logits/chosen': 47.634830474853516, 'epoch': 0.52}
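The stuck value is no accident: 0.6931 is ln 2, which is exactly what the DPO loss produces whenever the policy and reference log-probs are identical, e.g. when the deepcopied ref model ends up sharing (or losing) its weights under ZeRO sharding. A minimal sketch of the standard DPO statistics (dpo_stats is a hypothetical helper, not this project's API):

```python
import math

def dpo_stats(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Rewards are the beta-scaled log-ratios of policy vs reference.
    reward_chosen = beta * (policy_chosen - ref_chosen)
    reward_rejected = beta * (policy_rejected - ref_rejected)
    margin = reward_chosen - reward_rejected
    loss = -math.log(1 / (1 + math.exp(-margin)))  # -log(sigmoid(margin))
    return loss, reward_chosen, reward_rejected

# If the ref model is effectively the same network as the policy,
# every log-prob cancels, whatever its absolute value:
loss, rc, rr = dpo_stats(-2537.43, -1307.96, -2537.43, -1307.96)
print(round(loss, 4), rc, rr)  # 0.6931 0.0 0.0 — exactly the stuck logs above
```

So a loss frozen at 0.6931 with all rewards at 0 points at the ref model being identical to (or entangled with) the policy, not at the data or the learning rate.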


Darren-w commented on May 16, 2024

I replaced ref_model = deepcopy(model) with loading the model again via

ref_model = model_class.from_pretrained(
        args.model_name_or_path,
        torch_dtype=torch_dtype,
        device_map=args.device_map,
        trust_remote_code=args.trust_remote_code,
    )

After this change the rewards are no longer 0, but the loss is absurdly large. Is that normal?
Training logs below:

{'loss': 0.6931, 'learning_rate': 0.0004998072590601808, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': -1000.675537109375, 'logps/chosen': -2531.12158203125, 'logits/rejected': 71.77708435058594, 'logits/chosen': 89.86573791503906, 'epoch': 0.06}
{'loss': 27.3212, 'learning_rate': 0.000499229333433282, 'rewards/chosen': 2.425292491912842, 'rewards/rejected': 23.138628005981445, 'rewards/accuracies': 0.390625, 'rewards/margins': -20.713336944580078, 'logps/rejected': -1084.484375, 'logps/chosen': -2350.399658203125, 'logits/rejected': 119.3428955078125, 'logits/chosen': 148.06057739257812, 'epoch': 0.12}
{'loss': 16.8992, 'learning_rate': 0.0004982671142387316, 'rewards/chosen': -2.7379796504974365, 'rewards/rejected': -34.28330993652344, 'rewards/accuracies': 0.578125, 'rewards/margins': 31.545333862304688, 'logps/rejected': -1494.697021484375, 'logps/chosen': -2707.468505859375, 'logits/rejected': 116.12063598632812, 'logits/chosen': 105.90425109863281, 'epoch': 0.18}
{'loss': 631.0259, 'learning_rate': 0.0004969220851487844, 'rewards/chosen': -819.938720703125, 'rewards/rejected': -214.3435821533203, 'rewards/accuracies': 0.046875, 'rewards/margins': -605.5950317382812, 'logps/rejected': -3010.5205078125, 'logps/chosen': -10806.6650390625, 'logits/rejected': 108.48920440673828, 'logits/chosen': 106.30279541015625, 'epoch': 0.24}
{'loss': 249.3126, 'learning_rate': 0.0004951963201008077, 'rewards/chosen': -336.6590881347656, 'rewards/rejected': -88.30933380126953, 'rewards/accuracies': 0.0625, 'rewards/margins': -248.34974670410156, 'logps/rejected': -1614.24853515625, 'logps/chosen': -5640.9951171875, 'logits/rejected': 97.84693908691406, 'logits/chosen': 96.17440032958984, 'epoch': 0.3}
{'loss': 29.9124, 'learning_rate': 0.0004930924800994192, 'rewards/chosen': -69.85091400146484, 'rewards/rejected': -53.566314697265625, 'rewards/accuracies': 0.34375, 'rewards/margins': -16.284597396850586, 'logps/rejected': -1685.541748046875, 'logps/chosen': -3788.849609375, 'logits/rejected': 93.85322570800781, 'logits/chosen': 92.70535278320312, 'epoch': 0.36}
{'loss': 64.2579, 'learning_rate': 0.0004906138091134118, 'rewards/chosen': -111.27135467529297, 'rewards/rejected': -97.27839660644531, 'rewards/accuracies': 0.453125, 'rewards/margins': -13.992941856384277, 'logps/rejected': -2088.65576171875, 'logps/chosen': -4169.7021484375, 'logits/rejected': 65.00901794433594, 'logits/chosen': 64.33920288085938, 'epoch': 0.42}
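For scale: the logged values are batch means, so they won't match a single-pair formula exactly, but the shape of the DPO loss explains the magnitudes. Near a zero margin the loss sits at ln 2; once rewards/margins goes strongly negative, -log(sigmoid(margin)) grows roughly linearly in |margin|, so margins around -600 produce losses in the hundreds. A learning rate near 5e-4 (as in these logs, versus the 1e-5 used earlier in the thread) for full-parameter training makes that kind of divergence much more likely. A sketch:

```python
import math

def dpo_loss(margin):
    # -log(sigmoid(margin)); log1p keeps it numerically stable,
    # and for very negative margins the loss is simply -margin.
    return math.log1p(math.exp(-margin)) if margin > -30 else -margin

# ln 2 at zero margin, then roughly linear growth as the margin collapses:
for m in (0.0, -20.7, -605.6):
    print(round(dpo_loss(m), 4))
```

This is why an exploding DPO loss usually signals a too-aggressive learning rate (or a broken ref model) rather than a bug in the loss itself.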


JingyuLi-code commented on May 16, 2024

Thanks for the project's very clean and well-organized code; it has been a pleasure both to read and to adapt.

During DPO training, though, my loss and rewards/chosen always look like the following. Is this normal?

{'loss': 0.6931, 'learning_rate': 9.99231529256779e-06, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': -445.9979248046875, 'logps/chosen': -30.411256790161133, 'logits/rejected': 2.6535236835479736, 'logits/chosen': 1.1344398260116577, 'epoch': 0.0}
{'loss': 0.6931, 'learning_rate': 9.991217477220333e-06, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': -200.39610290527344, 'logps/chosen': -33.55436706542969, 'logits/rejected': 1.5881015062332153, 'logits/chosen': 3.952385187149048, 'epoch': 0.0}
{'loss': 0.6931, 'learning_rate': 9.990119661872873e-06, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': -155.58612060546875, 'logps/chosen': -40.69193649291992, 'logits/rejected': 2.194831132888794, 'logits/chosen': 1.8327357769012451, 'epoch': 0.0}
{'loss': 0.6931, 'learning_rate': 9.989021846525415e-06, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': -258.0289306640625, 'logps/chosen': -41.872779846191406, 'logits/rejected': 4.575325965881348, 'logits/chosen': 0.9270402789115906, 'epoch': 0.0}
{'loss': 0.6931, 'learning_rate': 9.987924031177957e-06, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': -319.283203125, 'logps/chosen': -37.61365509033203, 'logits/rejected': 2.6409215927124023, 'logits/chosen': 2.5549163818359375, 'epoch': 0.0}
{'loss': 0.6931, 'learning_rate': 9.986826215830499e-06, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': -215.65521240234375, 'logps/chosen': -34.17694091796875, 'logits/rejected': 5.640789985656738, 'logits/chosen': 0.18884596228599548, 'epoch': 0.0}
{'loss': 0.6931, 'learning_rate': 9.986826215830499e-06, 'rewards/chosen': 0.0, 'rewards/rejected': 0.0, 'rewards/accuracies': 0.0, 'rewards/margins': 0.0, 'logps/rejected': -422.649658203125, 'logps/chosen': -43.95429992675781, 'logits/rejected': 1.0700151920318604, 'logits/chosen': 1.0940637588500977, 'epoch': 0.0}
I'm hitting the same problem: the loss stays at 0.6931 and rewards/chosen stays at 0. Lowering the learning rate and switching bf16 didn't help. Any ideas on how to debug this?


