Comments (3)
Greetings!
Regarding the test results on PTB after pruning, the reason for the worse score lies with the unpruned llama2-7B model itself: its PTB perplexity is approximately 47, significantly higher than that of llama-7B (~22).
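For context, the perplexity numbers quoted here are the exponential of the average per-token negative log-likelihood over the test set. A minimal sketch (the per-token losses below are hypothetical, chosen only to reproduce the rough magnitudes mentioned, not actual PTB evaluation results):

```python
import math

def perplexity(nll_per_token):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(nll_per_token) / len(nll_per_token))

# Hypothetical per-token losses: a mean NLL of ~3.85 gives PPL ~ 47,
# while a mean NLL of ~3.09 gives PPL ~ 22 -- the same order of gap
# reported between unpruned llama2-7B and llama-7B on PTB.
print(round(perplexity([3.85, 3.85, 3.85]), 1))
print(round(perplexity([3.09, 3.09, 3.09]), 1))
```

So a higher base perplexity on PTB carries through to the pruned model; the pruned llama2-7B score should be judged against its own (already higher) baseline.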
As for the issue of NaN during post-training, we encountered the same problem you reported. We are currently searching for appropriate hyperparameters to fine-tune the pruned model; if we obtain any new findings or find any bugs in our code, we will update you promptly.
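While searching for stable hyperparameters, a common stopgap for this kind of loss explosion is to guard the training loop: skip any batch whose loss is non-finite and clip the gradient norm before the optimizer step. A minimal, framework-agnostic sketch (the function and its float-list "gradients" are hypothetical stand-ins for real framework tensors, not LLM-Pruner's actual training code):

```python
import math

def safe_step(loss, grads, max_norm=1.0):
    """Skip the update when the loss is NaN/Inf; otherwise clip gradients.

    `loss` is a float and `grads` a list of floats -- stand-ins for a real
    framework's scalar loss and parameter gradients (illustration only).
    Returns the (possibly clipped) gradients, or None if the step is skipped.
    """
    if not math.isfinite(loss):
        return None  # NaN/Inf loss: drop this batch entirely
    norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grads]

print(safe_step(float("nan"), [1.0, 2.0]))  # step skipped
print(safe_step(0.5, [3.0, 4.0]))           # norm 5.0 clipped down to 1.0
```

In PyTorch the clipping half of this corresponds to `torch.nn.utils.clip_grad_norm_`; the skip-on-NaN guard has to be added around the backward/step calls.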
from llm-pruner.
Thank you for the timely reply! I hope to come back with good news.
Cheers.
@mmichaelzhang Have you resolved this issue? I also observed the training loss exploding and encountered performance deterioration for llama2-7b using the default llama settings:
WikiText2 w/o tune | PTB w/o tune | BoolQ Acc | PIQA Acc | HellaSwag Acc_norm | WinoGrande Acc | ARC-e Acc | ARC-c Acc_norm | OBQA Acc_norm |
---|---|---|---|---|---|---|---|---|
19.24 | 72.61 | 37.83 | 52.34 | 26.64 | 49.41 | 25.08 | 27.82 | 28.40 |
Since the pruned model's weights are quantized to int8 and frozen during post-training, I think the phenomenon is unrelated to the BF16/FP16 dtype, which the authors consider to be the cause.
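For reference, int8 weight quantization of the kind mentioned here can be sketched as a symmetric absmax round-trip: each weight is stored as an integer code in [-127, 127] plus a per-tensor scale, and the frozen weights are dequantized on the fly. A generic illustration (this is the standard absmax scheme, not necessarily the exact one used for the pruned checkpoints):

```python
def quantize_int8(weights):
    """Symmetric absmax int8 quantization: w ~ scale * q, q in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes and the scale."""
    return [scale * v for v in q]

w = [0.5, -1.27, 0.03]
q, s = quantize_int8(w)
print(q)                 # integer codes, e.g. [50, -127, 3]
print(dequantize(q, s))  # close to the original weights
```

Because the int8 codes are frozen and only dequantized for the forward pass, the base weights themselves cannot drift into NaN; any instability would have to come from the trainable (e.g. LoRA) parameters or the loss computation.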
Related Issues (20)
- Question related to the model tuning HOT 2
- Pruning MQA?
- After pruning some layers, the model cannot be loaded directly via TGI
- Adding a tutorial for adapting new models?
- 401 Client Error: Unauthorized for url: https://huggingface.co/decapoda-research/llama-7b-hf/resolve/main/tokenizer_config.json HOT 1
- cannot import name 'SiLUActivation' from 'transformers.activations' HOT 1
- Issue: Missing Generation of `pytorch_model.bin` File During Model Tuning HOT 3
- Cannot use huggingface to load
- OSError: Can't load tokenizer for 'baffo32/decapoda-research-llama-7B-hf'. HOT 1
- ConnectionError: Couldn't reach https://raw.githubusercontent.com/wojzaremba/lstm/master/data/ptb.train.txt (ReadTimeout(ReadTimeoutError("HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Read timed out. (read timeout=100)"))) HOT 1
- The quantization of the compressed models
- Latency evaluation HOT 1
- Question about the pruning-ratio value
- Unable to reproduce the results for param_first and param_second in the paper after finetuning.
- RecursionError: maximum recursion depth exceeded HOT 1
- Is this method implementable on multi-GPUs?
- How to prune the embedding and lm_head?
- I tried the Mistral 7b model, but I got this issue
- Pruning llama3
- Evaluation:UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte