Hello, I'm using the pretrain code to train falcon-7B, I've already

Training time is unexpectedly very slow compared to lit-llama (litgpt issue, closed)

lightning-ai commented on May 19, 2024

Training time is unexpectedly very slow compared to lit-llama

Comments (4)

iskandr commented on May 19, 2024

I'm also hitting some CUDA out-of-memory errors on models + data that I'd expect to fit easily on a 40GB A100 MIG.

I'm not familiar with the lit-llama codebase, so I'm not sure what's potentially different in lit-parrot but wanted to note that I'm seeing something similar.

carmocca commented on May 19, 2024

Do you still see this behaviour, and if so, can you share exactly the code you ran and the arguments passed?

carmocca commented on May 19, 2024

This is because LLaMA fine-tuning is hardcoded to use a max_seq_length of 256:
https://github.com/Lightning-AI/lit-llama/blob/main/scripts/prepare_alpaca.py#L26
https://github.com/Lightning-AI/lit-llama/blob/main/finetune/adapter.py#L52

Whereas this repository is configured to use the longest sequence length in alpaca: 1037. If you override it to 256 in https://github.com/Lightning-AI/lit-gpt/blob/main/finetune/adapter.py#L30, you should see the times match.
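
To get a rough feel for why the sequence length matters so much, here is a minimal sketch that counts how many token positions one batch occupies under each limit. It assumes every sample is truncated to the limit and padded to the longest remaining sample in the batch; the helper `padded_tokens` and the sample lengths are hypothetical and not taken from either repository. Only 1037 and 256 come from the discussion above.

```python
def padded_tokens(lengths, max_seq_length):
    """Token positions in one batch when each sample is truncated to
    max_seq_length and then padded to the longest sample in the batch.

    Hypothetical helper for illustration; not part of lit-gpt or lit-llama.
    """
    truncated = [min(n, max_seq_length) for n in lengths]
    return len(truncated) * max(truncated)


# Made-up token lengths for a batch of four samples. 1037 is the longest
# Alpaca sequence mentioned above; the other values are assumptions.
batch_lengths = [120, 310, 1037, 64]

long_cost = padded_tokens(batch_lengths, 1037)  # limit used in this repository
short_cost = padded_tokens(batch_lengths, 256)  # limit hardcoded in lit-llama

print(f"max_seq_length=1037: {long_cost} token positions")
print(f"max_seq_length=256:  {short_cost} token positions")
print(f"ratio: {long_cost / short_cost:.2f}x")
```

Under these assumptions, a batch containing the longest sample occupies about four times as many token positions at 1037 as at 256. Attention cost grows faster than linearly with sequence length, so the real gap in step time may be larger.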

LamOne1 commented on May 19, 2024

Actually I was using the pretrain script, and I think the max token length is fixed in both lit-llama and lit-gpt?
