Comments (2)
Yeah sorry, this would only be used for continued pretraining. I forgot to mention that.
That's not a bad idea and would be more robust across different dataset sizes. I wonder (and I don't know the answer) though if there is a minimum number of warmup steps that should always be done, and a maximum number that shouldn't be exceeded.
For example, if we use 0.05 and pretrain on 3T tokens, that's 150 billion warmup tokens (0.05 × 3T), which is a bit large :D
For reference, here are the numbers of warmup steps used by popular LLMs (taken from the OLMo paper):
![Table of warmup steps for popular LLMs, from the OLMo paper](https://private-user-images.githubusercontent.com/5618407/318925389-e6be75a7-7a28-4b9e-88be-870e284b8e6b.png)
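For what it's worth, here is a minimal sketch of the clamping idea: derive the warmup length from a fraction of total steps, but clamp it to a [min, max] range so small runs still warm up and very long runs don't spend billions of tokens warming up. The function names and threshold values are illustrative assumptions, not litgpt's actual API or defaults:

```python
import math

def warmup_steps(total_steps: int,
                 warmup_fraction: float = 0.05,
                 min_warmup: int = 100,
                 max_warmup: int = 5000) -> int:
    """Fraction of total steps, clamped to [min_warmup, max_warmup]."""
    return max(min_warmup, min(int(total_steps * warmup_fraction), max_warmup))

def lr_at_step(step: int, total_steps: int, max_lr: float) -> float:
    """Linear warmup to max_lr, then cosine decay to zero."""
    warmup = warmup_steps(total_steps)
    if step < warmup:
        return max_lr * (step + 1) / warmup  # linear ramp during warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return 0.5 * max_lr * (1.0 + math.cos(math.pi * progress))  # cosine decay
```

With these (made-up) bounds, a 1,000-step run warms up for 100 steps instead of 50, and a 1M-step run warms up for 5,000 steps instead of 50,000.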