chatglm-tuning's Issues

bitsandbytes error when training with the test data. Does anyone know what is going on?

Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib64'), PosixPath('/usr/local/nvidia/lib')}
warn(msg)
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain libcudart.so as expected! Searching further paths...
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
warn(msg)
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
warn(msg)
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 116
CUDA SETUP: Loading binary /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
Overriding torch_dtype=None with torch_dtype=torch.float16 due to requirements of bitsandbytes to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
Loading checkpoint shards: 0%| | 0/8 [00:01<?, ?it/s]
Traceback (most recent call last):
File "/home/python/finetune.py", line 162, in
main()
File "/home/python/finetune.py", line 121, in main
model = ChatGLMForConditionalGeneration.from_pretrained(
File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2646, in from_pretrained
) = cls._load_pretrained_model(
File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2969, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 676, in _load_state_dict_into_meta_model
set_module_8bit_tensor_to_device(model, param_name, param_device, value=param)
File "/opt/conda/lib/python3.10/site-packages/transformers/utils/bitsandbytes.py", line 70, in set_module_8bit_tensor_to_device
new_value = bnb.nn.Int8Params(new_value, requires_grad=False, has_fp16_weights=has_fp16_weights).to(device)
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 196, in to
return self.cuda(device)
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 160, in cuda
CB, CBt, SCB, SCBt, coo_tensorB = bnb.functional.double_quant(B)
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1616, in double_quant
row_stats, col_stats, nnz_row_ptr = get_colrow_absmax(
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1505, in get_colrow_absmax
lib.cget_col_row_stats(ptrA, ptrRowStats, ptrColStats, ptrNnzrows, ct.c_float(threshold), rows, cols)
File "/opt/conda/lib/python3.10/ctypes/init.py", line 387, in getattr
func = self.getitem(name)
File "/opt/conda/lib/python3.10/ctypes/init.py", line 392, in getitem
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats
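
The warnings earlier in the log are the key: the CPU-only libbitsandbytes_cpu.so was loaded because libcudart.so was not found, and the CPU build does not export int8 kernels such as cget_col_row_stats. A minimal check, assuming an environment like the one in the trace (pointing LD_LIBRARY_PATH at a directory containing libcudart.so before retrying is the usual fix):

    # Sanity check: confirm whether bitsandbytes picked up the CUDA build.
    import torch
    print(torch.cuda.is_available())  # should print True on a GPU machine

    # COMPILED_WITH_CUDA comes from bitsandbytes.cextension (the same module
    # visible in the import chain of the Windows trace further below); False
    # means the CPU-only binary was loaded and the int8 symbols are missing.
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
    print(COMPILED_WITH_CUDA)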

Question-answering dataset

A sincere question: if I want to fine-tune with a question-answering dataset, roughly which parts do I need to modify?
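
For reference, the training data in this repo is JSON lines with a context/target pair (the exact shape is quoted in a later issue in this list). A hedged sketch of converting a generic question/answer pair into that shape; the question/answer field names are assumptions about your source data:

    # Sketch: convert (question, answer) pairs into the repo's context/target
    # JSON-lines format. "question"/"answer" are assumed source field names.
    import json

    def to_chatglm_example(question: str, answer: str) -> dict:
        return {
            "context": f"[Round 0]\n问:{question}\n答:",
            "target": answer,
        }

    with open("data/qa_dataset.jsonl", "w", encoding="utf-8") as f:
        for q, a in [("你的名字", "我叫大山。")]:  # replace with your own pairs
            f.write(json.dumps(to_chatglm_example(q, a), ensure_ascii=False) + "\n")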

Is finetuning required before inference?

[Errno 2] No such file or directory: 'output/chatglm-lora.pt'
Do I have to finetune and save the model to output/chatglm-lora.pt before I can run inference?
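
infer.py expects output/chatglm-lora.pt to exist, so yes, it has to be produced first. If training finished but the file was not written, a hedged sketch of extracting and saving just the LoRA tensors yourself (assumes model is the get_peft_model(...)-wrapped model from finetune.py):

    # Sketch: save only the LoRA weights so infer.py can find them.
    # Assumes `model` is the PEFT-wrapped model after training.
    import torch

    lora_state = {k: v for k, v in model.state_dict().items() if "lora_" in k}
    torch.save(lora_state, "output/chatglm-lora.pt")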

I removed the LoRA part and fine-tuned the original model; it keeps failing with ValueError: Attempting to unscale FP16 gradients.

I checked the code: the failure is at the spot below, where allow_fp16 cannot be set to true (setting it lets training proceed). Where should I configure this? Any guidance would be appreciated.

/home/ubuntu/venv/lib/python3.8/site-packages/torch/cuda/amp/grad_scaler.py:285 in unscale_

    282         inv_scale = self._scale.double().reciprocal().float()
    283         found_inf = torch.full((1,), 0.0, dtype=torch.float32, device=self._scale.device)
    284
  ❱ 285         optimizer_state["found_inf_per_device"] = self._unscale_grads_(optimizer, inv_scale, found_inf, False)
    286         optimizer_state["stage"] = OptState.UNSCALED
    287
    288     def _maybe_opt_step(self, optimizer, optimizer_state, *args, **kwargs):

/home/ubuntu/venv/lib/python3.8/site-packages/torch/cuda/amp/grad_scaler.py:213 in _unscale_grads_

    210                     continue
    211                     # allow_fp16 = True
    212                     if (not allow_fp16) and param.grad.dtype == torch.float16:
  ❱ 213                         raise ValueError("Attempting to unscale FP16 gradients.")
    214                     if param.grad.is_sparse:
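
Rather than patching allow_fp16 inside torch, the usual workaround is to keep the trainable parameters in fp32: GradScaler refuses to unscale gradients that are themselves fp16. A minimal sketch, an assumption about your setup rather than the repo's official fix:

    # Sketch: cast only the trainable parameters to fp32 before training, so
    # their gradients are fp32 and GradScaler can unscale them.
    for param in model.parameters():
        if param.requires_grad:
            param.data = param.data.float()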

Which .pt file should be used when loading the model?

If training only produced intermediate checkpoint files and there is no chatglm-lora.pt, which file should be loaded when loading the model?
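
The Trainer writes intermediate checkpoints under output/checkpoint-<step>/, and the weights live in pytorch_model.bin inside that directory. A hedged sketch (the step number below is hypothetical), loading non-strictly so only the LoRA tensors need to match:

    # Sketch: load an intermediate Trainer checkpoint instead of
    # output/chatglm-lora.pt. The checkpoint step below is hypothetical.
    import torch

    state = torch.load("output/checkpoint-10000/pytorch_model.bin", map_location="cpu")
    model.load_state_dict(state, strict=False)  # non-LoRA keys may be absent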

Error when using int8 at inference time

At inference time (load_in_8bit=True):
expected scalar type Float but found Half

Also, how can I use int8 or int4? I only have a 1080 Ti and the VRAM is not enough.
Many thanks!
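
As far as I know, bitsandbytes' int8 kernels target newer GPU architectures, so load_in_8bit may simply not work on a 1080 Ti (Pascal). ChatGLM ships its own quantization helper that does not depend on bitsandbytes; a hedged sketch, assuming the THUDM/chatglm-6b remote code (which exposes .quantize()):

    # Sketch: ChatGLM's built-in quantization instead of load_in_8bit.
    # Assumes the THUDM/chatglm-6b remote code; int4 is reported to need
    # roughly 6 GB of VRAM, which should fit a 1080 Ti's 11 GB.
    from transformers import AutoModel

    model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
    model = model.half().quantize(4).cuda()  # use quantize(8) for int8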

ValueError: Please specify `target_modules` in `peft_config`

Traceback (most recent call last):
File "finetune.py", line 96, in
main()
File "finetune.py", line 76, in main
model = get_peft_model(model, peft_config)
File "/home/bocheng/softinstalled/anaconda3/envs/py38/lib/python3.8/site-packages/peft/mapping.py", line 142, in get_peft_model
peft_config = _prepare_lora_config(peft_config, model_config)
File "/home/bocheng/softinstalled/anaconda3/envs/py38/lib/python3.8/site-packages/peft/mapping.py", line 117, in _prepare_lora_config
raise ValueError("Please specify target_modules in peft_config")
ValueError: Please specify target_modules in peft_config
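
This error usually means the installed peft version has no default target_modules mapping for ChatGLM's model type, so the modules must be named explicitly. A hedged sketch; the LoRA hyperparameters are placeholders, but query_key_value is ChatGLM's fused attention projection (the same name appears in the "GET was unable to find an engine" trace further below):

    # Sketch: name ChatGLM's attention projection explicitly, since older
    # peft releases cannot infer target_modules for this architecture.
    from peft import LoraConfig, TaskType

    peft_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        target_modules=["query_key_value"],  # ChatGLM's fused QKV linear
        r=8,                                 # placeholder hyperparameters
        lora_alpha=32,
        lora_dropout=0.1,
    )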

Error at inference time

Traceback (most recent call last):
File "infer.py", line 27, in
model = get_peft_model(model, peft_config)
File "/root/miniconda3/envs/torch_1.13/lib/python3.8/site-packages/peft/mapping.py", line 143, in get_peft_model
peft_config = _prepare_lora_config(peft_config, model_config)
File "/root/miniconda3/envs/torch_1.13/lib/python3.8/site-packages/peft/mapping.py", line 118, in _prepare_lora_config
raise ValueError("Please specify target_modules in peft_config")
ValueError: Please specify target_modules in peft_config

Error on Colab: ModuleNotFoundError: No module named 'modeling_chatglm'


ModuleNotFoundError Traceback (most recent call last)
in
1 from transformers import AutoTokenizer, AutoModel, TrainingArguments, AutoConfig
----> 2 from modeling_chatglm import ChatGLMForConditionalGeneration
3 import torch
4 import torch.nn as nn
5 from peft import get_peft_model, LoraConfig, TaskType

ModuleNotFoundError: No module named 'modeling_chatglm'


NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.

Error at inference time (SentencePiece)

│ 334 │ │ return _sentencepiece.SentencePieceProcessor__EncodeAsImmutableProtoBatch(self, │
│ 335 │ │
│ 336 │ def _DecodeIds(self, ids): │
│ ❱ 337 │ │ return _sentencepiece.SentencePieceProcessor__DecodeIds(self, ids) │
│ 338 │ │
│ 339 │ def _DecodePieces(self, pieces): │
│ 340 │ │ return _sentencepiece.SentencePieceProcessor__DecodePieces(self, pieces) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
IndexError: Out of range: piece id is out of range.
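
This usually means the generated ids include values outside the SentencePiece vocabulary (for example special or padding ids that the tokenizer wrapper normally strips). A hedged workaround sketch; sp and output_ids are assumed names for the raw SentencePieceProcessor and the generated token ids:

    # Sketch: drop out-of-vocabulary ids before handing them to SentencePiece.
    valid_ids = [i for i in output_ids if 0 <= i < sp.vocab_size()]
    text = sp.decode(valid_ids)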

Error when loading the pretrained model

AttributeError: /root/anaconda3/envs/big_project/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats

Questions about the fine-tuning code

The source code is as follows:

class ModifiedTrainer(Trainer):

    def compute_loss(self, model, inputs, return_outputs=False):
        return model(
            input_ids=inputs["input_ids"],
            attention_mask=torch.ones_like(inputs["input_ids"]).bool(),
            labels=inputs["input_ids"],
        ).loss

Question 1: shouldn't the attention mask here be causal (lower-triangular) or the UniLM-style mask, rather than all ones?
Question 2: shouldn't part of the labels be set to -100?
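
For what it's worth, a later revision of the repo's collator (quoted in the "KeyError: 'seq_len'" issue further below) does exactly that: the prompt part of the labels is set to -100 so only the target tokens contribute to the loss. A compressed sketch of the idea:

    # Sketch of prompt masking: positions before the answer get -100 so
    # CrossEntropyLoss ignores them; only target tokens (plus eos) are scored.
    labels = [-100] * (seq_len - 1) + ids[(seq_len - 1):] + [tokenizer.eos_token_id]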

Error when finetune is run with --per_device_train_batch_size greater than 1

Traceback (most recent call last):
File "finetune.py", line 93, in
main()
File "finetune.py", line 85, in main
trainer.train()
File "/opt/conda/envs/alpa/lib/python3.8/site-packages/transformers/trainer.py", line 1633, in train
return inner_training_loop(
File "/opt/conda/envs/alpa/lib/python3.8/site-packages/transformers/trainer.py", line 1872, in _inner_training_loop
for step, inputs in enumerate(epoch_iterator):
File "/opt/conda/envs/alpa/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 628, in next
data = self._next_data()
File "/opt/conda/envs/alpa/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 671, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/opt/conda/envs/alpa/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 61, in fetch
return self.collate_fn(data)
File "finetune.py", line 35, in data_collator
"input_ids": torch.stack([
RuntimeError: stack expects each tensor to be equal size, but got [51] at entry 0 and [55] at entry 1
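
torch.stack needs equal-length tensors, so with batch sizes above 1 the collator has to pad each sequence to the longest in the batch. A minimal sketch (assumes tokenizer.pad_token_id is valid for ChatGLM):

    # Sketch: pad every sequence to the batch maximum before stacking.
    import torch

    def data_collator(features: list) -> dict:
        longest = max(len(f["input_ids"]) for f in features)
        input_ids = torch.stack([
            torch.cat([
                torch.tensor(f["input_ids"], dtype=torch.long),
                torch.full((longest - len(f["input_ids"]),),
                           tokenizer.pad_token_id, dtype=torch.long),
            ])
            for f in features
        ])
        return {"input_ids": input_ids}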

How to continue training from a previous run

1. The previous run did not converge well, and I want to continue training from its best checkpoint.
2. The data has changed and the new fine-tuning set is smaller, so I want to continue fine-tuning from the earlier checkpoint.
Passing --overwrite_output_dir True with resume_from_checkpoint=True does not work: the loss jumps back to a very large value (although training does resume from the previous epoch and continues the earlier learning-rate schedule). How should I adjust this?
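
If the loss resets, one thing to check is passing an explicit checkpoint path instead of the boolean flag; this is standard Trainer usage (the step number below is hypothetical) and restores the optimizer and LR-scheduler state along with the weights:

    # Sketch: resume from a concrete checkpoint directory.
    trainer.train(resume_from_checkpoint="output/checkpoint-10000")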

Question about requirements

Could you post a pip list with the exact torch, peft, etc. versions? peft has a hard dependency on torch>=1.13.

Is a lower torch version simply not going to work?

Training finishes in about 5 hours, but the loss is always 0. Is that normal?

{'loss': 0.0, 'learning_rate': 1.9230769230769234e-07, 'epoch': 0.99}
{'loss': 0.0, 'learning_rate': 1.730769230769231e-07, 'epoch': 0.99}
{'loss': 0.0, 'learning_rate': 1.5384615384615387e-07, 'epoch': 0.99}
{'loss': 0.0, 'learning_rate': 1.3461538461538464e-07, 'epoch': 0.99}
{'loss': 0.0, 'learning_rate': 1.153846153846154e-07, 'epoch': 0.99}
{'loss': 0.0, 'learning_rate': 9.615384615384617e-08, 'epoch': 1.0}
{'loss': 0.0, 'learning_rate': 7.692307692307694e-08, 'epoch': 1.0}
{'loss': 0.0, 'learning_rate': 5.76923076923077e-08, 'epoch': 1.0}
{'loss': 0.0, 'learning_rate': 3.846153846153847e-08, 'epoch': 1.0}
{'loss': 0.0, 'learning_rate': 1.9230769230769234e-08, 'epoch': 1.0}
{'loss': 0.0, 'learning_rate': 0.0, 'epoch': 1.0}

@mymusise

Environment setup question

Does CUDA have to be higher than 11.6?
My environment is CUDA 11.2, and fine-tuning fails with: ImportError: cannot import name 'skip_init' from 'torch.nn.utils'
Is the skip_init function only usable on torch 2.0?
Could someone help answer this? Many thanks!

Data preprocessing error

Using the default configuration.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

52002it [00:27, 1911.56it/s]
Traceback (most recent call last):
File "/home/dspwasc/Public/chat/ChatGLM-Tuning/tokenize_dataset_rows.py", line 45, in
main()
File "/home/dspwasc/Public/chat/ChatGLM-Tuning/tokenize_dataset_rows.py", line 38, in main
arr = np.array(all_tokenized)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (52002,) + inhomogeneous part.
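
The tokenized rows have different lengths, so NumPy cannot build a rectangular array out of them. A likely one-line fix (an assumption about tokenize_dataset_rows.py's intent, since downstream code would need to accept object arrays):

    # Sketch: ragged token lists need an object array, not a rectangular one.
    import numpy as np

    arr = np.array(all_tokenized, dtype=object)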

When bs is set to 1 or 3, the loss is always 0. Is something wrong?

{'loss': 0.0, 'learning_rate': 1.9e-05, 'epoch': 0.15}
{'loss': 0.0, 'learning_rate': 1.8e-05, 'epoch': 0.3}
{'loss': 0.0, 'learning_rate': 1.7e-05, 'epoch': 0.45}
{'loss': 0.0, 'learning_rate': 1.6000000000000003e-05, 'epoch': 0.6}
{'loss': 0.0, 'learning_rate': 1.5000000000000002e-05, 'epoch': 0.75}
{'loss': 0.0, 'learning_rate': 1.4e-05, 'epoch': 0.9}

The code basically works, but the model also outputs unrelated questions

@mymusise First of all, thank you very much for the code; I have basically got it running. Since the V100 cannot run int8 inference, I only set batch_size=2.
The JSON format of my data is as follows:
{"context": "[Round 0]\n问:你的名字\n答:", "target": "我叫大山,是一个代表我的虚拟身份的名称。\n"}
i.e. it copies chatglm's input format exactly.

I used a very small dataset of only 16 dialogues. In my tests there is one small problem: although finetune.py already appends an eos_token_id to the labels, when running inference on a question from the training set the model outputs the correct answer and then keeps going, emitting other questions and answers as well.

So I added one more eos_token_id on line 16 of tokenize_dataset_rows.py, i.e.:

    input_ids = prompt_ids + target_ids + [tokenizer.eos_token_id] * 2

and then the output was normal. I don't know why; possibly the original program never learns to predict eos.

RuntimeError: GET was unable to find an engine to execute this computation

Traceback (most recent call last):
File "/home/guangzhao/ChatGLM-Tuning-master/finetune.py", line 162, in
main()
File "/home/guangzhao/ChatGLM-Tuning-master/finetune.py", line 153, in main
trainer.train()
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/transformers/trainer.py", line 1633, in train
return inner_training_loop(
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/transformers/trainer.py", line 1902, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/transformers/trainer.py", line 2645, in training_step
loss = self.compute_loss(model, inputs)
File "/home/guangzhao/ChatGLM-Tuning-master/finetune.py", line 100, in compute_loss
return model(
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/peft/peft_model.py", line 529, in forward
return self.base_model(
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/guangzhao/ChatGLM-Tuning-master/modeling_chatglm.py", line 1033, in forward
transformer_outputs = self.transformer(
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/guangzhao/ChatGLM-Tuning-master/modeling_chatglm.py", line 878, in forward
layer_ret = layer(
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/guangzhao/ChatGLM-Tuning-master/modeling_chatglm.py", line 573, in forward
attention_outputs = self.attention(
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/guangzhao/ChatGLM-Tuning-master/modeling_chatglm.py", line 398, in forward
mixed_raw_layer = self.query_key_value(hidden_states)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/peft/tuners/lora.py", line 613, in forward
after_B = self.lora_B(after_A.transpose(-2, -1)).transpose(-2, -1)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 313, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 309, in _conv_forward
return F.conv1d(input, weight, bias, self.stride,
RuntimeError: GET was unable to find an engine to execute this computation

I am using cuda 11.2. Is this error caused by the CUDA version?

Error when running finetune.py

CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cuda_setup\main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}
warn(msg)
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cuda_setup\main.py:136: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
warn(msg)
C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cuda_setup\main.py:136: UserWarning: WARNING: No GPU detected! Check your CUDA paths. Proceeding to load CPU-only library...
warn(msg)
CUDA SETUP: Loading binary C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
CUDA SETUP: Loading binary C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
CUDA SETUP: Loading binary C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
CUDA SETUP: Problem: The main issue seems to be that the main CUDA library was not detected.
CUDA SETUP: Solution 1): Your paths are probably not up-to-date. You can update them via: sudo ldconfig.
CUDA SETUP: Solution 2): If you do not have sudo rights, you can do the following:
CUDA SETUP: Solution 2a): Find the cuda library via: find / -name libcuda.so 2>/dev/null
CUDA SETUP: Solution 2b): Once the library is found add it to the LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:FOUND_PATH_FROM_2a
CUDA SETUP: Solution 2c): For a permanent solution add the export from 2b into your .bashrc file, located at ~/.bashrc
Traceback (most recent call last):
File "A:\ChatGLM-Tuning-master\finetune.py", line 6, in
from peft import get_peft_model, LoraConfig, TaskType
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\peft_init_.py", line 22, in
from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING, PEFT_TYPE_TO_CONFIG_MAPPING, get_peft_config, get_peft_model
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\peft\mapping.py", line 16, in
from .peft_model import (
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\peft\peft_model.py", line 31, in
from .tuners import LoraModel, PrefixEncoder, PromptEmbedding, PromptEncoder
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\peft\tuners_init_.py", line 20, in
from .lora import LoraConfig, LoraModel
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\peft\tuners\lora.py", line 36, in
import bitsandbytes as bnb
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes_init_.py", line 7, in
from .autograd.functions import (
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\autograd_init
.py", line 1, in
from ._functions import undo_layout, get_inverse_transform_indices
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\autograd_functions.py", line 9, in
import bitsandbytes.functional as F
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\functional.py", line 17, in
from .cextension import COMPILED_WITH_CUDA, lib
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cextension.py", line 22, in
raise RuntimeError('''
RuntimeError:
CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment!
If you cannot find any issues and suspect a bug, please open an issue with detals about your environment:
https://github.com/TimDettmers/bitsandbytes/issues

CUDA out of memory when training on a 3090

File "L:\PycharmProjects\chatglm\venv\lib\site-packages\bitsandbytes\functional.py", line 361, in get_transform_buffer
return init_func((rows, cols), dtype=dtype, device=device), state
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 24.00 GiB total capacity; 22.76 GiB already allocated; 0 bytes free; 23.20 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
0%| | 0/52000 [00:09<?, ?it/s]
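
Two standard VRAM savers that may help on a 24 GB card, sketched under the assumption that model is the ChatGLM model before training (these are generic transformers calls, not repo-specific settings):

    # Sketch: trade compute for memory with gradient checkpointing.
    model.gradient_checkpointing_enable()
    model.config.use_cache = False  # the KV cache is incompatible with checkpointing

    # Also consider lowering --per_device_train_batch_size and raising
    # --gradient_accumulation_steps to keep the effective batch size.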

bug: _IncompatibleKeys(missing_keys:.......................

Thanks to the author for this great work.
I ran into a bug at inference time; I'm not sure which step went wrong, so please point out any mistakes.

I trained a LoRA with https://github.com/mymusise/ChatGLM-Tuning/blob/master/finetune.py and loaded the LoRA file in https://github.com/mymusise/ChatGLM-Tuning/blob/master/infer.ipynb, which produced the message shown below:
(screenshot omitted)
Is this a problem with the LoRA training, or with how it is loaded at inference?

The training command is as follows:

    CUDA_VISIBLE_DEVICES=2 python finetune.py \
        --dataset_path data/need_demo \
        --lora_rank 8 \
        --per_device_train_batch_size 4 \
        --gradient_accumulation_steps 1 \
        --max_steps 50000 \
        --save_steps 10000 \
        --save_total_limit 2 \
        --learning_rate 2e-5 \
        --fp16 \
        --logging_steps 50 \
        --output_dir output

The inference code is as follows:
(screenshot omitted)
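
If the screenshot shows _IncompatibleKeys with a long missing_keys list, that is expected when loading only the LoRA tensors with strict=False: the frozen base weights are reported as missing even though they are already in the model. A hedged sketch of a quick check:

    # Sketch: when loading LoRA-only weights, missing_keys (the frozen base
    # weights) is harmless; unexpected_keys should be empty.
    import torch

    result = model.load_state_dict(torch.load("output/chatglm-lora.pt"), strict=False)
    print(len(result.missing_keys))  # large number: the frozen base weights
    print(result.unexpected_keys)    # should be []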

KeyError: 'seq_len'

Using the latest finetune code, I hit this error:

dataset:

Dataset({ features: ['input_ids', 'seq_len'], num_rows: 52002 })

    def data_collator(features: list) -> dict:
        len_ids = [len(feature["input_ids"]) for feature in features]
        longest = max(len_ids) + 1
        input_ids = []
        attention_mask_list = []
        position_ids_list = []
        labels_list = []
        for ids_l, feature in sorted(zip(len_ids, features), key=lambda x: -x[0]):
            ids = feature["input_ids"]
            seq_len = feature["seq_len"]
            labels = (
                [-100] * (seq_len - 1)
                + ids[(seq_len - 1):]
                + [tokenizer.eos_token_id]
                + [-100] * (longest - ids_l - 1)
            )

File "finetune.py", line 71, in data_collator
seq_len = feature["seq_len"]
KeyError: 'seq_len'
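
A likely cause (an assumption): the dataset on disk was produced by an older tokenize_dataset_rows.py that stored only input_ids. Rebuilding it with the current script, or storing the prompt length yourself, should fix it; a sketch of the required shape:

    # Sketch: each feature needs the prompt length so the collator can mask it.
    feature = {
        "input_ids": prompt_ids + target_ids,  # names follow the repo's script
        "seq_len": len(prompt_ids),            # length of the prompt portion
    }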

The eos token is not generated correctly

After training, the model cannot generate the eos token correctly at inference time. An earlier issue raised this problem, but it was closed.

Error when preprocessing the dataset

Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
52002it [00:20, 2492.40it/s]
Traceback (most recent call last):
File "A:\ChatGLM-Tuning-master\tokenize_dataset_rows.py", line 45, in
main()
File "A:\ChatGLM-Tuning-master\tokenize_dataset_rows.py", line 38, in main
arr = np.array(all_tokenized)
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
