chatglm-tuning's Issues

bitsandbytes error when training with the test data. Does anyone know what is going on?

Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib64'), PosixPath('/usr/local/nvidia/lib')}
warn(msg)
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain libcudart.so as expected! Searching further paths...
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
warn(msg)
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
warn(msg)
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 116
CUDA SETUP: Loading binary /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
Overriding torch_dtype=None with torch_dtype=torch.float16 due to requirements of bitsandbytes to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
Loading checkpoint shards: 0%| | 0/8 [00:01<?, ?it/s]
Traceback (most recent call last):
File "/home/python/finetune.py", line 162, in
main()
File "/home/python/finetune.py", line 121, in main
model = ChatGLMForConditionalGeneration.from_pretrained(
File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2646, in from_pretrained
) = cls._load_pretrained_model(
File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2969, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 676, in _load_state_dict_into_meta_model
set_module_8bit_tensor_to_device(model, param_name, param_device, value=param)
File "/opt/conda/lib/python3.10/site-packages/transformers/utils/bitsandbytes.py", line 70, in set_module_8bit_tensor_to_device
new_value = bnb.nn.Int8Params(new_value, requires_grad=False, has_fp16_weights=has_fp16_weights).to(device)
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 196, in to
return self.cuda(device)
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 160, in cuda
CB, CBt, SCB, SCBt, coo_tensorB = bnb.functional.double_quant(B)
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1616, in double_quant
row_stats, col_stats, nnz_row_ptr = get_colrow_absmax(
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1505, in get_colrow_absmax
lib.cget_col_row_stats(ptrA, ptrRowStats, ptrColStats, ptrNnzrows, ct.c_float(threshold), rows, cols)
File "/opt/conda/lib/python3.10/ctypes/init.py", line 387, in getattr
func = self.getitem(name)
File "/opt/conda/lib/python3.10/ctypes/init.py", line 392, in getitem
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats
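
The warnings earlier in the log are the key: the CPU-only libbitsandbytes_cpu.so was loaded because libcudart.so was not found, and the CPU build does not export int8 kernels such as cget_col_row_stats. A minimal check, assuming an environment like the one in the trace (pointing LD_LIBRARY_PATH at a directory containing libcudart.so before retrying is the usual fix):

    # Sanity check: confirm whether bitsandbytes picked up the CUDA build.
    import torch
    print(torch.cuda.is_available())  # should print True on a GPU machine

    # COMPILED_WITH_CUDA comes from bitsandbytes.cextension (the same module
    # visible in the import chain of the Windows trace further below); False
    # means the CPU-only binary was loaded and the int8 symbols are missing.
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
    print(COMPILED_WITH_CUDA)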

Question-answering dataset

A sincere question: if I want to fine-tune with a question-answering dataset, roughly which parts do I need to modify?
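
For reference, the training data in this repo is JSON lines with a context/target pair (the exact shape is quoted in a later issue in this list). A hedged sketch of converting a generic question/answer pair into that shape; the question/answer field names are assumptions about your source data:

    # Sketch: convert (question, answer) pairs into the repo's context/target
    # JSON-lines format. "question"/"answer" are assumed source field names.
    import json

    def to_chatglm_example(question: str, answer: str) -> dict:
        return {
            "context": f"[Round 0]\n问:{question}\n答:",
            "target": answer,
        }

    with open("data/qa_dataset.jsonl", "w", encoding="utf-8") as f:
        for q, a in [("你的名字", "我叫大山。")]:  # replace with your own pairs
            f.write(json.dumps(to_chatglm_example(q, a), ensure_ascii=False) + "\n")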

Is finetuning required before inference?

[Errno 2] No such file or directory: 'output/chatglm-lora.pt'
Do I have to finetune and save the model to output/chatglm-lora.pt before I can run inference?
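
infer.py expects output/chatglm-lora.pt to exist, so yes, it has to be produced first. If training finished but the file was not written, a hedged sketch of extracting and saving just the LoRA tensors yourself (assumes model is the get_peft_model(...)-wrapped model from finetune.py):

    # Sketch: save only the LoRA weights so infer.py can find them.
    # Assumes `model` is the PEFT-wrapped model after training.
    import torch

    lora_state = {k: v for k, v in model.state_dict().items() if "lora_" in k}
    torch.save(lora_state, "output/chatglm-lora.pt")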

I removed the LoRA part and fine-tuned the original model; it keeps failing with ValueError: Attempting to unscale FP16 gradients.

I checked the code: the failure is at the spot below, where allow_fp16 cannot be set to true (setting it lets training proceed). Where should I configure this? Any guidance would be appreciated.

/home/ubuntu/venv/lib/python3.8/site-packages/torch/cuda/amp/grad_scaler.py:285 in unscale_

    282         inv_scale = self._scale.double().reciprocal().float()
    283         found_inf = torch.full((1,), 0.0, dtype=torch.float32, device=self._scale.device)
    284
  ❱ 285         optimizer_state["found_inf_per_device"] = self._unscale_grads_(optimizer, inv_scale, found_inf, False)
    286         optimizer_state["stage"] = OptState.UNSCALED
    287
    288     def _maybe_opt_step(self, optimizer, optimizer_state, *args, **kwargs):

/home/ubuntu/venv/lib/python3.8/site-packages/torch/cuda/amp/grad_scaler.py:213 in _unscale_grads_

    210                     continue
    211                     # allow_fp16 = True
    212                     if (not allow_fp16) and param.grad.dtype == torch.float16:
  ❱ 213                         raise ValueError("Attempting to unscale FP16 gradients.")
    214                     if param.grad.is_sparse:
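
Rather than patching allow_fp16 inside torch, the usual workaround is to keep the trainable parameters in fp32: GradScaler refuses to unscale gradients that are themselves fp16. A minimal sketch, an assumption about your setup rather than the repo's official fix:

    # Sketch: cast only the trainable parameters to fp32 before training, so
    # their gradients are fp32 and GradScaler can unscale them.
    for param in model.parameters():
        if param.requires_grad:
            param.data = param.data.float()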

Which .pt file should be used when loading the model?

If training only produced intermediate checkpoint files and there is no chatglm-lora.pt, which file should be loaded when loading the model?
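
The Trainer writes intermediate checkpoints under output/checkpoint-<step>/, and the weights live in pytorch_model.bin inside that directory. A hedged sketch (the step number below is hypothetical), loading non-strictly so only the LoRA tensors need to match:

    # Sketch: load an intermediate Trainer checkpoint instead of
    # output/chatglm-lora.pt. The checkpoint step below is hypothetical.
    import torch

    state = torch.load("output/checkpoint-10000/pytorch_model.bin", map_location="cpu")
    model.load_state_dict(state, strict=False)  # non-LoRA keys may be absent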

Error when using int8 at inference time

At inference time (load_in_8bit=True):
expected scalar type Float but found Half

Also, how can I use int8 or int4? I only have a 1080 Ti and the VRAM is not enough.
Many thanks!
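
As far as I know, bitsandbytes' int8 kernels target newer GPU architectures, so load_in_8bit may simply not work on a 1080 Ti (Pascal). ChatGLM ships its own quantization helper that does not depend on bitsandbytes; a hedged sketch, assuming the THUDM/chatglm-6b remote code (which exposes .quantize()):

    # Sketch: ChatGLM's built-in quantization instead of load_in_8bit.
    # Assumes the THUDM/chatglm-6b remote code; int4 is reported to need
    # roughly 6 GB of VRAM, which should fit a 1080 Ti's 11 GB.
    from transformers import AutoModel

    model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
    model = model.half().quantize(4).cuda()  # use quantize(8) for int8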

ValueError: Please specify `target_modules` in `peft_config`

Traceback (most recent call last):
File "finetune.py", line 96, in
main()
File "finetune.py", line 76, in main
model = get_peft_model(model, peft_config)
File "/home/bocheng/softinstalled/anaconda3/envs/py38/lib/python3.8/site-packages/peft/mapping.py", line 142, in get_peft_model
peft_config = _prepare_lora_config(peft_config, model_config)
File "/home/bocheng/softinstalled/anaconda3/envs/py38/lib/python3.8/site-packages/peft/mapping.py", line 117, in _prepare_lora_config
raise ValueError("Please specify target_modules in peft_config")
ValueError: Please specify target_modules in peft_config
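
This error usually means the installed peft version has no default target_modules mapping for ChatGLM's model type, so the modules must be named explicitly. A hedged sketch; the LoRA hyperparameters are placeholders, but query_key_value is ChatGLM's fused attention projection (the same name appears in the "GET was unable to find an engine" trace further below):

    # Sketch: name ChatGLM's attention projection explicitly, since older
    # peft releases cannot infer target_modules for this architecture.
    from peft import LoraConfig, TaskType

    peft_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        target_modules=["query_key_value"],  # ChatGLM's fused QKV linear
        r=8,                                 # placeholder hyperparameters
        lora_alpha=32,
        lora_dropout=0.1,
    )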

Error at inference time

Traceback (most recent call last):
File "infer.py", line 27, in
model = get_peft_model(model, peft_config)
File "/root/miniconda3/envs/torch_1.13/lib/python3.8/site-packages/peft/mapping.py", line 143, in get_peft_model
peft_config = _prepare_lora_config(peft_config, model_config)
File "/root/miniconda3/envs/torch_1.13/lib/python3.8/site-packages/peft/mapping.py", line 118, in _prepare_lora_config
raise ValueError("Please specify target_modules in peft_config")
ValueError: Please specify target_modules in peft_config

Error on Colab: ModuleNotFoundError: No module named 'modeling_chatglm'


ModuleNotFoundError Traceback (most recent call last)
in
1 from transformers import AutoTokenizer, AutoModel, TrainingArguments, AutoConfig
----> 2 from modeling_chatglm import ChatGLMForConditionalGeneration
3 import torch
4 import torch.nn as nn
5 from peft import get_peft_model, LoraConfig, TaskType

ModuleNotFoundError: No module named 'modeling_chatglm'


NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.

Error at inference time (SentencePiece)

│ 334 │ │ return _sentencepiece.SentencePieceProcessor__EncodeAsImmutableProtoBatch(self, │
│ 335 │ │
│ 336 │ def _DecodeIds(self, ids): │
│ ❱ 337 │ │ return _sentencepiece.SentencePieceProcessor__DecodeIds(self, ids) │
│ 338 │ │
│ 339 │ def _DecodePieces(self, pieces): │
│ 340 │ │ return _sentencepiece.SentencePieceProcessor__DecodePieces(self, pieces) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
IndexError: Out of range: piece id is out of range.
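
This usually means the generated ids include values outside the SentencePiece vocabulary (for example special or padding ids that the tokenizer wrapper normally strips). A hedged workaround sketch; sp and output_ids are assumed names for the raw SentencePieceProcessor and the generated token ids:

    # Sketch: drop out-of-vocabulary ids before handing them to SentencePiece.
    valid_ids = [i for i in output_ids if 0 <= i < sp.vocab_size()]
    text = sp.decode(valid_ids)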

Error when loading the pretrained model

AttributeError: /root/anaconda3/envs/big_project/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats

Questions about the fine-tuning code

The source code is as follows:

class ModifiedTrainer(Trainer):

    def compute_loss(self, model, inputs, return_outputs=False):
        return model(
            input_ids=inputs["input_ids"],
            attention_mask=torch.ones_like(inputs["input_ids"]).bool(),
            labels=inputs["input_ids"],
        ).loss

Question 1: shouldn't the attention mask here be causal (lower-triangular) or the UniLM-style mask, rather than all ones?
Question 2: shouldn't part of the labels be set to -100?
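
For what it's worth, a later revision of the repo's collator (quoted in the "KeyError: 'seq_len'" issue further below) does exactly that: the prompt part of the labels is set to -100 so only the target tokens contribute to the loss. A compressed sketch of the idea:

    # Sketch of prompt masking: positions before the answer get -100 so
    # CrossEntropyLoss ignores them; only target tokens (plus eos) are scored.
    labels = [-100] * (seq_len - 1) + ids[(seq_len - 1):] + [tokenizer.eos_token_id]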

Error when finetune is run with --per_device_train_batch_size greater than 1

Traceback (most recent call last):
File "finetune.py", line 93, in
main()
File "finetune.py", line 85, in main
trainer.train()
File "/opt/conda/envs/alpa/lib/python3.8/site-packages/transformers/trainer.py", line 1633, in train
return inner_training_loop(
File "/opt/conda/envs/alpa/lib/python3.8/site-packages/transformers/trainer.py", line 1872, in _inner_training_loop
for step, inputs in enumerate(epoch_iterator):
File "/opt/conda/envs/alpa/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 628, in next
data = self._next_data()
File "/opt/conda/envs/alpa/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 671, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/opt/conda/envs/alpa/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 61, in fetch
return self.collate_fn(data)
File "finetune.py", line 35, in data_collator
"input_ids": torch.stack([
RuntimeError: stack expects each tensor to be equal size, but got [51] at entry 0 and [55] at entry 1
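
torch.stack needs equal-length tensors, so with batch sizes above 1 the collator has to pad each sequence to the longest in the batch. A minimal sketch (assumes tokenizer.pad_token_id is valid for ChatGLM):

    # Sketch: pad every sequence to the batch maximum before stacking.
    import torch

    def data_collator(features: list) -> dict:
        longest = max(len(f["input_ids"]) for f in features)
        input_ids = torch.stack([
            torch.cat([
                torch.tensor(f["input_ids"], dtype=torch.long),
                torch.full((longest - len(f["input_ids"]),),
                           tokenizer.pad_token_id, dtype=torch.long),
            ])
            for f in features
        ])
        return {"input_ids": input_ids}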

How to continue training from a previous run

1. The previous run did not converge well, and I want to continue training from its best checkpoint.
2. The data has changed and the new fine-tuning set is smaller, so I want to continue fine-tuning from the earlier checkpoint.
Passing --overwrite_output_dir True with resume_from_checkpoint=True does not work: the loss jumps back to a very large value (although training does resume from the previous epoch and continues the earlier learning-rate schedule). How should I adjust this?
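
If the loss resets, one thing to check is passing an explicit checkpoint path instead of the boolean flag; this is standard Trainer usage (the step number below is hypothetical) and restores the optimizer and LR-scheduler state along with the weights:

    # Sketch: resume from a concrete checkpoint directory.
    trainer.train(resume_from_checkpoint="output/checkpoint-10000")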

Question about requirements

Could you post a pip list with the exact torch, peft, etc. versions? peft has a hard dependency on torch>=1.13.

Is a lower torch version simply not going to work?

Training finishes in about 5 hours, but the loss is always 0. Is that normal?

{'loss': 0.0, 'learning_rate': 1.9230769230769234e-07, 'epoch': 0.99}
{'loss': 0.0, 'learning_rate': 1.730769230769231e-07, 'epoch': 0.99}
{'loss': 0.0, 'learning_rate': 1.5384615384615387e-07, 'epoch': 0.99}
{'loss': 0.0, 'learning_rate': 1.3461538461538464e-07, 'epoch': 0.99}
{'loss': 0.0, 'learning_rate': 1.153846153846154e-07, 'epoch': 0.99}
{'loss': 0.0, 'learning_rate': 9.615384615384617e-08, 'epoch': 1.0}
{'loss': 0.0, 'learning_rate': 7.692307692307694e-08, 'epoch': 1.0}
{'loss': 0.0, 'learning_rate': 5.76923076923077e-08, 'epoch': 1.0}
{'loss': 0.0, 'learning_rate': 3.846153846153847e-08, 'epoch': 1.0}
{'loss': 0.0, 'learning_rate': 1.9230769230769234e-08, 'epoch': 1.0}
{'loss': 0.0, 'learning_rate': 0.0, 'epoch': 1.0}

@mymusise

Environment setup question

Does CUDA have to be higher than 11.6?
My environment is CUDA 11.2, and fine-tuning fails with: ImportError: cannot import name 'skip_init' from 'torch.nn.utils'
Is the skip_init function only usable on torch 2.0?
Could someone help answer this? Many thanks!

Data preprocessing error

Using the default configuration.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

52002it [00:27, 1911.56it/s]
Traceback (most recent call last):
File "/home/dspwasc/Public/chat/ChatGLM-Tuning/tokenize_dataset_rows.py", line 45, in
main()
File "/home/dspwasc/Public/chat/ChatGLM-Tuning/tokenize_dataset_rows.py", line 38, in main
arr = np.array(all_tokenized)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (52002,) + inhomogeneous part.
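
The tokenized rows have different lengths, so NumPy cannot build a rectangular array out of them. A likely one-line fix (an assumption about tokenize_dataset_rows.py's intent, since downstream code would need to accept object arrays):

    # Sketch: ragged token lists need an object array, not a rectangular one.
    import numpy as np

    arr = np.array(all_tokenized, dtype=object)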

When bs is set to 1 or 3, the loss is always 0. Is something wrong?

{'loss': 0.0, 'learning_rate': 1.9e-05, 'epoch': 0.15}
{'loss': 0.0, 'learning_rate': 1.8e-05, 'epoch': 0.3}
{'loss': 0.0, 'learning_rate': 1.7e-05, 'epoch': 0.45}
{'loss': 0.0, 'learning_rate': 1.6000000000000003e-05, 'epoch': 0.6}
{'loss': 0.0, 'learning_rate': 1.5000000000000002e-05, 'epoch': 0.75}
{'loss': 0.0, 'learning_rate': 1.4e-05, 'epoch': 0.9}

The code basically works, but the model also outputs unrelated questions

@mymusise First of all, thank you very much for the code; I have basically got it running. Since the V100 cannot run int8 inference, I only set batch_size=2.
The JSON format of my data is as follows:
{"context": "[Round 0]\n问:你的名字\n答:", "target": "我叫大山,是一个代表我的虚拟身份的名称。\n"}
i.e. it copies chatglm's input format exactly.

I used a very small dataset of only 16 dialogues. In my tests there is one small problem: although finetune.py already appends an eos_token_id to the labels, when running inference on a question from the training set the model outputs the correct answer and then keeps going, emitting other questions and answers as well.

So I added one more eos_token_id on line 16 of tokenize_dataset_rows.py, i.e.:

    input_ids = prompt_ids + target_ids + [tokenizer.eos_token_id] * 2

and then the output was normal. I don't know why; possibly the original program never learns to predict eos.

RuntimeError: GET was unable to find an engine to execute this computation

Traceback (most recent call last):
File "/home/guangzhao/ChatGLM-Tuning-master/finetune.py", line 162, in
main()
File "/home/guangzhao/ChatGLM-Tuning-master/finetune.py", line 153, in main
trainer.train()
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/transformers/trainer.py", line 1633, in train
return inner_training_loop(
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/transformers/trainer.py", line 1902, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/transformers/trainer.py", line 2645, in training_step
loss = self.compute_loss(model, inputs)
File "/home/guangzhao/ChatGLM-Tuning-master/finetune.py", line 100, in compute_loss
return model(
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/peft/peft_model.py", line 529, in forward
return self.base_model(
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/guangzhao/ChatGLM-Tuning-master/modeling_chatglm.py", line 1033, in forward
transformer_outputs = self.transformer(
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/guangzhao/ChatGLM-Tuning-master/modeling_chatglm.py", line 878, in forward
layer_ret = layer(
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/guangzhao/ChatGLM-Tuning-master/modeling_chatglm.py", line 573, in forward
attention_outputs = self.attention(
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/guangzhao/ChatGLM-Tuning-master/modeling_chatglm.py", line 398, in forward
mixed_raw_layer = self.query_key_value(hidden_states)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/peft/tuners/lora.py", line 613, in forward
after_B = self.lora_B(after_A.transpose(-2, -1)).transpose(-2, -1)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 313, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 309, in _conv_forward
return F.conv1d(input, weight, bias, self.stride,
RuntimeError: GET was unable to find an engine to execute this computation

I am using cuda 11.2. Is this error caused by the CUDA version?

Error when running finetune.py

CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cuda_setup\main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}
warn(msg)
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cuda_setup\main.py:136: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
warn(msg)
C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cuda_setup\main.py:136: UserWarning: WARNING: No GPU detected! Check your CUDA paths. Proceeding to load CPU-only library...
warn(msg)
CUDA SETUP: Loading binary C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
CUDA SETUP: Loading binary C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
CUDA SETUP: Loading binary C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
CUDA SETUP: Problem: The main issue seems to be that the main CUDA library was not detected.
CUDA SETUP: Solution 1): Your paths are probably not up-to-date. You can update them via: sudo ldconfig.
CUDA SETUP: Solution 2): If you do not have sudo rights, you can do the following:
CUDA SETUP: Solution 2a): Find the cuda library via: find / -name libcuda.so 2>/dev/null
CUDA SETUP: Solution 2b): Once the library is found add it to the LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:FOUND_PATH_FROM_2a
CUDA SETUP: Solution 2c): For a permanent solution add the export from 2b into your .bashrc file, located at ~/.bashrc
Traceback (most recent call last):
File "A:\ChatGLM-Tuning-master\finetune.py", line 6, in
from peft import get_peft_model, LoraConfig, TaskType
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\peft_init_.py", line 22, in
from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING, PEFT_TYPE_TO_CONFIG_MAPPING, get_peft_config, get_peft_model
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\peft\mapping.py", line 16, in
from .peft_model import (
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\peft\peft_model.py", line 31, in
from .tuners import LoraModel, PrefixEncoder, PromptEmbedding, PromptEncoder
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\peft\tuners_init_.py", line 20, in
from .lora import LoraConfig, LoraModel
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\peft\tuners\lora.py", line 36, in
import bitsandbytes as bnb
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes_init_.py", line 7, in
from .autograd.functions import (
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\autograd_init
.py", line 1, in
from ._functions import undo_layout, get_inverse_transform_indices
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\autograd_functions.py", line 9, in
import bitsandbytes.functional as F
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\functional.py", line 17, in
from .cextension import COMPILED_WITH_CUDA, lib
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cextension.py", line 22, in
raise RuntimeError('''
RuntimeError:
CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment!
If you cannot find any issues and suspect a bug, please open an issue with detals about your environment:
https://github.com/TimDettmers/bitsandbytes/issues

CUDA out of memory when training on a 3090

File "L:\PycharmProjects\chatglm\venv\lib\site-packages\bitsandbytes\functional.py", line 361, in get_transform_buffer
return init_func((rows, cols), dtype=dtype, device=device), state
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 24.00 GiB total capacity; 22.76 GiB already allocated; 0 bytes free; 23.20 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
0%| | 0/52000 [00:09<?, ?it/s]
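
Two standard VRAM savers that may help on a 24 GB card, sketched under the assumption that model is the ChatGLM model before training (these are generic transformers calls, not repo-specific settings):

    # Sketch: trade compute for memory with gradient checkpointing.
    model.gradient_checkpointing_enable()
    model.config.use_cache = False  # the KV cache is incompatible with checkpointing

    # Also consider lowering --per_device_train_batch_size and raising
    # --gradient_accumulation_steps to keep the effective batch size.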

bug: _IncompatibleKeys(missing_keys:.......................

Thanks to the author for this great work.
I ran into a bug at inference time; I'm not sure which step went wrong, so please point out any mistakes.

I trained a LoRA with https://github.com/mymusise/ChatGLM-Tuning/blob/master/finetune.py and loaded the LoRA file in https://github.com/mymusise/ChatGLM-Tuning/blob/master/infer.ipynb, which produced the message shown below:
(screenshot omitted)
Is this a problem with the LoRA training, or with how it is loaded at inference?

The training command is as follows:

    CUDA_VISIBLE_DEVICES=2 python finetune.py \
        --dataset_path data/need_demo \
        --lora_rank 8 \
        --per_device_train_batch_size 4 \
        --gradient_accumulation_steps 1 \
        --max_steps 50000 \
        --save_steps 10000 \
        --save_total_limit 2 \
        --learning_rate 2e-5 \
        --fp16 \
        --logging_steps 50 \
        --output_dir output

The inference code is as follows:
(screenshot omitted)
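
If the screenshot shows _IncompatibleKeys with a long missing_keys list, that is expected when loading only the LoRA tensors with strict=False: the frozen base weights are reported as missing even though they are already in the model. A hedged sketch of a quick check:

    # Sketch: when loading LoRA-only weights, missing_keys (the frozen base
    # weights) is harmless; unexpected_keys should be empty.
    import torch

    result = model.load_state_dict(torch.load("output/chatglm-lora.pt"), strict=False)
    print(len(result.missing_keys))  # large number: the frozen base weights
    print(result.unexpected_keys)    # should be []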

KeyError: 'seq_len'

Using the latest finetune code, I hit this error:

dataset:

Dataset({ features: ['input_ids', 'seq_len'], num_rows: 52002 })

    def data_collator(features: list) -> dict:
        len_ids = [len(feature["input_ids"]) for feature in features]
        longest = max(len_ids) + 1
        input_ids = []
        attention_mask_list = []
        position_ids_list = []
        labels_list = []
        for ids_l, feature in sorted(zip(len_ids, features), key=lambda x: -x[0]):
            ids = feature["input_ids"]
            seq_len = feature["seq_len"]
            labels = (
                [-100] * (seq_len - 1)
                + ids[(seq_len - 1):]
                + [tokenizer.eos_token_id]
                + [-100] * (longest - ids_l - 1)
            )

File "finetune.py", line 71, in data_collator
seq_len = feature["seq_len"]
KeyError: 'seq_len'
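
A likely cause (an assumption): the dataset on disk was produced by an older tokenize_dataset_rows.py that stored only input_ids. Rebuilding it with the current script, or storing the prompt length yourself, should fix it; a sketch of the required shape:

    # Sketch: each feature needs the prompt length so the collator can mask it.
    feature = {
        "input_ids": prompt_ids + target_ids,  # names follow the repo's script
        "seq_len": len(prompt_ids),            # length of the prompt portion
    }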

The eos token is not generated correctly

After training, the model cannot generate the eos token correctly at inference time. An earlier issue raised this problem, but it was closed.

Error when preprocessing the dataset

Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
52002it [00:20, 2492.40it/s]
Traceback (most recent call last):
File "A:\ChatGLM-Tuning-master\tokenize_dataset_rows.py", line 45, in
main()
File "A:\ChatGLM-Tuning-master\tokenize_dataset_rows.py", line 38, in main
arr = np.array(all_tokenized)
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
